Manipulating files and directories with the command line

mv: moving and renaming files

Since we’re currently doing a command line tutorial, let’s go into that directory and see what is there.

cd command_line_recitation
ls

We see that we have a directory called sequences, as well as a FASTA file named some sequence.fasta. This file name has an annoying space in it. We would like to rename it to something without a space, say some_sequence.fasta. To do this, we use the mv command, short for “move”. We enter mv, followed by the name of the file we want to rename, and then its new name.

mv some sequence.fasta some_sequence.fasta

Uh-oh! That gave us some strange output, talking about the usage of mv. This is because the space in the file name some sequence.fasta was interpreted as a gap between arguments of the mv command, so mv received three arguments instead of two. To specify that the space is part of the file name, we need to use an escape character. The escape character for macOS or Linux is \. With Windows, you can use a caret ^ as an escape character. A space following the escape character is not treated as an argument separator. Alternatively, you can enclose the file name containing the space in single quotes. To change the name, do the following.

  • macOS or Linux:
    mv some\ sequence.fasta some_sequence.fasta
    or
    mv 'some sequence.fasta' some_sequence.fasta
  • Windows:
    mv 'some sequence.fasta' some_sequence.fasta
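
To see how the shell splits a command line into arguments, here is a minimal sketch for macOS or Linux. It works in a throwaway directory made with mktemp (my addition, not part of the lesson files), so it is safe to run anywhere. The printf lines are only there to show how many arguments the shell hands over in each case.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a file whose name contains a space (the quotes keep it one argument).
touch 'some sequence.fasta'

# printf repeats its format for every argument it receives,
# so each bracketed line below is one argument as the shell saw it.
printf '[%s]\n' some sequence.fasta      # two arguments
printf '[%s]\n' some\ sequence.fasta     # one argument (escaped space)
printf '[%s]\n' 'some sequence.fasta'    # one argument (quoted)

# Rename using the escaped form, then list the directory to confirm.
mv some\ sequence.fasta some_sequence.fasta
ls
```

The first printf prints two bracketed lines, while the other two print one each, which is exactly the difference mv sees.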

You will also work with some special directories, such as the one for your team’s repo. These directories are managed by Git, which will synchronize them with your GitHub repo. Because Git keeps track of the versions of the files in them, they are said to be “under version control”. When you are working on files under version control, you should precede the mv command with git. That way, Git will keep track of the naming changes you made. So, you would do this:

  • macOS or Linux:
    git mv some\ sequence.fasta some_sequence.fasta
    or
    git mv 'some sequence.fasta' some_sequence.fasta
  • Windows:
    git mv 'some sequence.fasta' some_sequence.fasta
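
As a rough sketch of what git mv does, the following builds a scratch repository in a temporary directory (a stand-in I made up, not your team’s repo), puts a file under version control, and renames it. It assumes git is installed and that you are on macOS or Linux.

```shell
# Build a scratch repository; this is a stand-in, not a real project repo.
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet

# Create a file with a space in its name and start tracking it.
touch 'some sequence.fasta'
git add 'some sequence.fasta'

# Rename it through Git so the change of name is recorded.
git mv 'some sequence.fasta' some_sequence.fasta

# Show which file names Git is tracking now.
git ls-files
```

After the git mv, Git tracks the file under its new name only, with no separate step needed to tell it that the old name is gone.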

Now, we probably want this file in the sequences directory. We can also move files into directories (without changing their file names) using the mv command.

mv some_sequence.fasta sequences/

The trailing slash is not necessary, but I always include it out of habit to remind myself that I am moving a file to a directory. (Again, if these files were under version control, you would precede the above command with git.)

Now let’s go into the sequences directory and see what we have.

cd sequences
ls

We see that some_sequence.fasta is there, along with other FASTA files.

Word to the wise: NO SPACES

Look at what is in the directory using ls.

ls

The directory command_line_recitation/ has some files that will help us through this lesson. Note that there are no spaces in the directory name. In general, you should avoid spaces in directory and file names, even though your operating system often has them in there. Trust me on this, they can make things a total mess, especially on the command line, since a space also separates the arguments of a command. Really. NO SPACES.
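
Here is a small sketch of the kind of mess a space can cause, for macOS or Linux. The file names are made up for illustration, and everything happens in a throwaway directory. It loops over the files in two ways and counts how many items each loop sees.

```shell
# Throwaway directory with two files, one of which has a space in its name.
workdir=$(mktemp -d)
cd "$workdir"
touch 'my sequence.fasta' dengue.fasta

# Unquoted command substitution: the shell splits the output of ls on
# whitespace, so the name with a space is seen as two separate words.
count_split=0
for f in $(ls); do
    count_split=$((count_split + 1))
done

# A glob hands over each file name intact, spaces and all.
count_glob=0
for f in *.fasta; do
    count_glob=$((count_glob + 1))
done

echo "words seen when splitting on spaces: $count_split"   # 3
echo "files seen with a glob: $count_glob"                 # 2
```

There are only two files, but the first loop runs three times because my sequence.fasta is treated as two words. Names without spaces never run into this.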

Exploring file content

We would like to see what is in the sequence files. Bash offers various ways to display the content of files. We’ll look at the genome of the dengue virus in the file dengue.fasta. There are lots of ways to do it. We’ll start with less. It got its name because it is more feature-rich than more, which was used to look at files before less came to be. (“less is more,” get it?) It allows you to use the arrow up and arrow down keys to move up or down by line. It also allows scrolling by touchpad or mouse. Since it doesn’t require the whole file to be read before displaying the top content, it’s ideal for larger files.

  • macOS or Linux: less dengue.fasta

  • Windows: more dengue.fasta

It also supports searching initiated by “/” followed by the query: /AAAA. You can move down by a number of lines by “:” followed by the number of lines: :40.

To show line numbers, type -N. Other useful commands are shift+g, which goes to the end of the file, and gg, which goes to the beginning. To exit less or more, hit q.

We’ll now look at several other ways to look at files. Just substitute them for less in the above command.

cat

cat prints the entire file to the standard output (terminal). This is especially useful if the files are very small. Windows users, use !type instead of cat.

tail

tail prints the last lines of a file, just as head prints the first lines. Windows users, note the alternative command: gc myfile.txt -tail 5.

You will notice that less does not show the text in the terminal once you exit, whereas cat, head, and tail do. This might be useful if you need something from the file for your next command.
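
To make the difference concrete, here is a minimal sketch for macOS or Linux using a tiny stand-in file. The name toy.fasta and its contents are made up so the example runs without the lesson files. The -n flag tells head and tail how many lines to print.

```shell
# Throwaway directory with a tiny stand-in file (toy.fasta is hypothetical).
workdir=$(mktemp -d)
cd "$workdir"
printf '>toy_sequence\nAAAA\nCCCC\nGGGG\nTTTT\n' > toy.fasta

cat toy.fasta          # prints all five lines
head -n 2 toy.fasta    # prints the first two lines
tail -n 2 toy.fasta    # prints the last two lines

# Because the text goes to standard output, another command can use it.
last_line=$(tail -n 1 toy.fasta)
echo "The last line is $last_line"
```

The last two lines show the point made above: since tail writes to the terminal rather than to a separate viewer, its output can be captured and used in the next command.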

Copying files and directories: cp

If you want to retain a copy of the folder/file in the original folder you can use the copy command cp. It works straightforwardly with files. Applied to directories it requires a flag: cp -r, meaning “recursive.” A flag typically begins with a hyphen (-) and gives the command some extra directions on how you want to do things. In this case, we are telling cp to work recursively.

Let’s have a look at the cp command in action.

cp dengue.fasta copy_of_dengue.fasta

Maybe we want a copy of the entire sequences directory. To do that, we will cd one directory up to the command_line_recitation directory.

cd ..

We went up one directory using “..”. This is an example of a relative path. The current directory is “.”, “..” is one directory up, “../..” is two directories up, “../../..” is three directories up, and so on. This is very useful when navigating directory structures. Now let’s try copying an entire directory with the -r flag.

cp -r sequences copy_of_sequences
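
If you want to try relative paths and cp -r without touching the lesson files, here is a minimal sketch for macOS or Linux. It rebuilds a small imitation of the directory layout inside a throwaway directory; the layout is my assumption of what is needed for the demonstration, with an empty file standing in for the real sequence.

```shell
# Imitate the lesson's layout inside a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/command_line_recitation/sequences"
touch "$workdir/command_line_recitation/sequences/dengue.fasta"

# Start inside sequences, then go one directory up with a relative path.
cd "$workdir/command_line_recitation/sequences"
cd ..
here=$(basename "$PWD")
echo "$here"           # command_line_recitation

# Copy the whole directory, including the file inside it.
cp -r sequences copy_of_sequences
ls copy_of_sequences   # dengue.fasta
```

Without the -r flag, cp refuses to copy a directory and reports an error instead.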

We can also rename directories with the mv command. Let’s rename copy_of_sequences to sequences_copy. This is silly, but illustrates how things work.

mv copy_of_sequences sequences_copy

One more thing… rsync is a generally better version of cp, and it’s available on most distributions and OSes. Knowing that cp exists is necessary; using it every day, not so much. rsync -avzP is a great setup: -a (archive) copies directories recursively while maintaining file properties such as permissions and timestamps, -v prints what is being copied, -z compresses data during transfer (which can speed up copies over a network), and -P shows progress and keeps partially transferred files.

Removing files and directories with rm

Yes, some of the things we just did are silly. We have no need for having a copy of a given sequence or a copy of the whole sequences directory. We can clean things up by deleting them. First, let’s get rid of our copy of the dengue sequence. Let’s cd into the sequences directory and make sure it’s there.

cd sequences
ls

Now let’s remove the file and verify it is gone.

rm copy_of_dengue.fasta
ls

And poof! It’s gone! And I mean gone. It is pretty much irrecoverable. Warning: rm is a wrecking ball. It will destroy any files you have that do not have restrictive permissions. This is so important, I will say it again.

rm is unforgiving.

Therefore, I always like to use the -i flag, which means that rm will ask me if I’m sure before deletion.

rm -i some_sequence.fasta

You will get a prompt. Answer “n” if you do not want to delete it.
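
Here is a minimal sketch of the difference between rm and rm -i, for macOS or Linux. The file names are made up, and the answer “n” is piped into rm so that the example can run unattended; at a real prompt you would type the answer yourself.

```shell
# Throwaway directory with two made-up files.
workdir=$(mktemp -d)
cd "$workdir"
touch keep_me.fasta delete_me.fasta

# Plain rm removes the file with no questions asked.
rm delete_me.fasta

# rm -i asks first. Answering "n" leaves the file alone.
printf 'n\n' | rm -i keep_me.fasta

# Only keep_me.fasta is left.
ls
```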

Now, cd into the parent directory with cd ../, and let’s use rm to remove an entire directory. To do this, we need to use the -r flag.

rm -r sequences_copy

In the same way, copying onto a previously existing file with cp or rsync is also irreversible, so be very careful when using them too! cp -i also exists and, if you are unsure about what you might be doing, choose this over rsync.
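
To illustrate, here is a minimal sketch for macOS or Linux with two made-up files in a throwaway directory. As with rm -i, the answer “n” is piped in so the example can run unattended. The || true is my own precaution, since I believe some versions of cp report a declined overwrite as a failure.

```shell
# Throwaway directory with two made-up files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'original\n' > important.txt
printf 'replacement\n' > other.txt

# cp -i asks before overwriting. Answering "n" keeps the old contents.
printf 'n\n' | cp -i other.txt important.txt || true
before=$(cat important.txt)
echo "$before"         # original

# Without -i, the old contents are replaced silently.
cp other.txt important.txt
after=$(cat important.txt)
echo "$after"          # replacement
```

Once the second cp has run, there is no way to get the word “original” back from important.txt.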

rm -

Copyright note: In addition to the copyright shown below, this recitation was developed based on materials from Axel Müller.