15 most useful Linux commands for file system maintenance

One of the most common and tedious tasks of a sysadmin is preventing file systems from becoming completely full, because when a server runs out of space the consequences are unpredictable. Depending on how you structured the root file system and whether it is divided into different partitions or volumes, those consequences will be more or less severe, but in any case undesirable.

It is always better to be safe than sorry, so use tools that perform automatic log rotation, such as logrotate, and custom scripts that monitor usage and run periodic cleanups to keep file systems from filling up. However, even with these prevention methods in place, there will certainly be many times when you have to act manually to troubleshoot problems.

Below I gather a collection of Linux commands that I find most useful in my day-to-day work for freeing up disk space and keeping my file systems in optimal health from the command line.

1. Check free space available

To find the free space available on all filesystems within a computer execute the following command:

~$ df -h
Filesystem       Size   Used Avail Use% Mounted on
/dev/sdb5        3,9G   842M  3,0G  22% /
udev             2,0G   4,0K  2,0G   1% /dev
tmpfs            791M   956K  790M   1% /run
none             5,0M      0  5,0M   0% /run/lock
none             2,0G    72M  1,9G   4% /run/shm
/dev/sdb1        118M    91M   28M  77% /boot
/dev/sdb10        31G   6,8G   24G  23% /home
/dev/sdb6         23G    12G   12G  52% /usr
/dev/sdb7        9,6G   1,5G  8,1G  16% /var
/dev/sdb8        2,9G    62M  2,8G   3% /tmp
/dev/sdb11       102G    11G   91G  11% /vmware
/dev/sda1        230G    93G  125G  43% /var/datos

For a specific directory:

~$ df -h /home
Filesystem       Size   Used Avail Use% Mounted on
/dev/sdb10        31G   6,8G   24G  23% /home

To list the file systems sorted by usage and see which ones are fullest:

~$ df -h | awk '{print $5 " " $6}' | sort -n | tail -5
22% /
23% /home
43% /var/datos
52% /usr
77% /boot
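
The same check can be turned into a small alert for the monitoring scripts mentioned in the introduction. The following is only a minimal sketch: the function name over_threshold and the 80% limit in the usage line are illustrative assumptions, and it expects df -P output on standard input (the -P parameter keeps each file system on a single line).

```shell
# Print "mount-point usage%" for every file system at or above a limit.
# Reads `df -P` output on stdin; mount points containing spaces are not handled.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "77%" -> "77"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, to flag everything at or above 80%:

~$ df -P | over_threshold 80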

2. Calculate directory size

The -s parameter prints a single total for the directory, and -h shows the size in a human-readable way, in kilobytes, megabytes, gigabytes or terabytes.

~# du -h -s /var/log
9,6M	/var/log

3. Remove vs empty files

We usually use the rm command to remove files in order to free up space. However, it is very common that we should not delete a file because it is being used at that time by an application, which is most common with log files on production systems which cannot be stopped. Removing them directly can have harmful effects, such as hanging the application, or milder but also undesirable ones, as dumping data to these files is interrupted and they are no longer useful. In addition, the space taken by a deleted file is not released until every process that has it open closes it.

In order to not alter application behavior and achieve our goal of freeing up disk space we will empty files instead of deleting them:

~# >/var/log/syslog

After this the file size will be 0 bytes.

If you need to empty multiple files at once with a single command:

~# for I in /var/log/*.log; do >"$I"; done
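
If you do this often, it may be worth wrapping the loop in a function that skips anything that is not a file. A minimal sketch, assuming a hypothetical helper name empty_logs and the .log suffix used above:

```shell
# Truncate every *.log file directly inside the given directory.
# The files keep their inode, owner and permissions, so applications
# that have them open can keep writing.
empty_logs() {
    for logfile in "$1"/*.log; do
        [ -f "$logfile" ] || continue   # skips directories and an unmatched pattern
        : >"$logfile"
    done
}
```

~# empty_logs /var/log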

4. Count the number of files in a directory

~# ls -l /var/log | wc -l
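80

Note that ls -l also prints a leading "total" line, so the real number of entries is one less than the figure shown. Use ls -1 instead of ls -l to get the exact count.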
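
ls only looks at the entries directly inside the directory and leaves hidden files out. To count regular files in the whole tree, hidden ones included, find can be used instead. A minimal sketch (count_files is a made-up name, and file names containing newlines would be counted more than once):

```shell
# Count regular files under a directory, recursing into subdirectories.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

~# count_files /var/log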

5. Get the largest directories in a filesystem

This command is also useful when you want to free up space, as it shows the largest directories under a given directory, including all its subdirectories. Add the -a parameter to du if you also want individual files listed.

~# du -k /var/log | sort -n | tail -5
516	/var/log/apache2
5256	/var/log/exim4
12884	/var/log/installer/cdebconf
13504	/var/log/installer
21456	/var/log

Sizes must be displayed in kilobytes (parameter -k). Use this parameter instead of -h; otherwise sort -n will not sort the list the way you expect, because it compares only the leading number and ignores the unit suffix (K, M, G).

It’s important to limit the number of lines displayed with tail -X, where X is that number. If the directory in question contains hundreds or thousands of entries, printing them all can flood your terminal and slow down the response considerably, especially if you are connected remotely via telnet or ssh over a slow connection.
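
To rank individual files rather than directories, one option is the -printf action of GNU find, which prints the exact size in bytes. This is a sketch that assumes GNU find; largest_files and its two arguments (directory and number of lines, 5 by default) are illustrative names:

```shell
# Print the N largest regular files under a directory as "bytes<TAB>path",
# smallest first, without crossing into other file systems.
# Requires GNU find for -printf.
largest_files() {
    find "$1" -xdev -type f -printf '%s\t%p\n' | sort -n | tail -n "${2:-5}"
}
```

~# largest_files /var/log 5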

6. List the largest files in a directory

Similar to above, but in this case subdirectories are not included.

~# ls -lSr | tail -5
-rw-r----- 1 syslog      adm   118616 sep 29 22:05 auth.log
-rw-r--r-- 1 root        root  149012 sep  9 17:12 udev
-rw-r--r-- 1 root        root  160128 ago  4 19:27 faillog
-rw-r----- 1 syslog      adm   499400 sep 28 06:25 auth.log.1
-rw-rw-r-- 1 root        utmp 1461168 sep 29 21:54 lastlog

If the -r parameter is removed, the smallest files are listed instead of the largest ones.

7. Calculate the size of only certain files

For example, if you want to get the total size of only .log files in a directory use the following command:

~# du -ch /var/log/*.log | grep total
468K	total
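
Keep in mind that grep total would also match any file whose name contains the word total, and that the figure printed with -h is rounded. A possible alternative that adds up exact sizes in bytes, assuming GNU find (the function name total_bytes is invented for this example):

```shell
# Sum the apparent sizes, in bytes, of the regular files matching a name
# pattern directly inside a directory (no recursion). Requires GNU find.
total_bytes() {
    find "$1" -maxdepth 1 -type f -name "$2" -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```

~# total_bytes /var/log '*.log'

The result can differ slightly from du, which reports the disk blocks in use rather than the length of each file.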

8. Find large files setting boundaries

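For example, files larger than 100 MB, or files between 100 MB and 1 GB. The upper limit is written as -1024M rather than -1G because GNU find rounds sizes up to the unit given, so -size -1G would match only empty files:

~$ find . -type f -size +100M -ls
~$ find . -type f -size +100M -size -1024M -ls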
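
When the limits need to be exact, sizes can be given in bytes with the c suffix, which is not affected by the rounding find applies to larger units. A minimal sketch; files_between is a hypothetical helper and both bounds are exclusive:

```shell
# List regular files strictly larger than MIN bytes and strictly smaller
# than MAX bytes under a directory.
# Usage: files_between DIR MIN MAX
files_between() {
    find "$1" -type f -size +"$2"c -size -"$3"c
}
```

For instance, files between 100 MiB (104857600 bytes) and 1 GiB (1073741824 bytes):

~$ files_between . 104857600 1073741824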

9. List the most recently modified files

~# ls -larth /var/log | tail -5
drwxr-xr-x 20 root              root     4,2K sep 30 12:27 .
-rw-rw-r--  1 root              utmp      11K sep 30 13:03 wtmp
-rw-r-----  1 syslog            adm       13K sep 30 13:03 syslog
-rw-r-----  1 syslog            adm       13K sep 30 13:03 kern.log
-rw-r-----  1 syslog            adm      1000 sep 30 13:03 auth.log

The -t parameter sorts by modification time and -r reverses the order, so the most recent files come last. The -a parameter makes hidden files show up as well.

10. Find old files (I)

Many times we need to know which files were modified within a given time interval. In the following example, files last modified more than 90 days ago are located in order to find old files no longer in use, which can be safely removed to free up space.

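~# find /var/log -type f -mtime +90 -ls
~# find /var/log -type f -mtime +90 -ls -exec rm {} \;

The first command only lists the files; the second one also removes them. The -type f test restricts the search to regular files, since rm cannot remove directories.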
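
Because the second command deletes without asking, it can be safer to wrap both steps in one helper that only lists by default. The sketch below is one way to do that, not part of the original commands; purge_old and the word delete used as a switch are invented for the example:

```shell
# List regular files not modified for more than DAYS days; remove them
# only when the third argument is the word "delete".
# Usage: purge_old DIR DAYS [delete]
purge_old() {
    if [ "${3:-}" = delete ]; then
        find "$1" -type f -mtime +"$2" -exec rm -f -- {} +
    else
        find "$1" -type f -mtime +"$2" -print
    fi
}
```

~# purge_old /var/log 90
~# purge_old /var/log 90 delete

Running it first without the switch gives you a chance to review the list before anything is removed.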

11. Find old files (II)

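Same as above, but using the last access time instead of the last modification time, so files that have been read within the interval, modified or not, are left out of the result. Bear in mind that file systems mounted with the noatime option do not update access times.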

~# find /var/log -atime +90 -ls

12. Find empty files

The following command finds files under the current directory with a size of 0 bytes, i.e. empty. This is useful in anomalous situations in which such files are generated, for example after a file system became 100% full and applications tried unsuccessfully to write to disk, or because of abnormal application behavior. Cleaning up is necessary in these scenarios because, although empty files take up no disk space, they can consume all available file system inodes if they are created in large numbers, which in turn prevents any more files from being created.

~$ find . -type f -size 0b -ls

Or:

~$ find . -type f -empty -ls

To check the number of free inodes in a file system, use the df -i command.
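
When df -i shows that inodes are running out, the next question is which directory holds all those files. A rough sketch, assuming a hypothetical helper called inode_hogs, that counts the entries under each immediate (non-hidden) subdirectory:

```shell
# For each immediate subdirectory, print "entries<TAB>path", fewest first.
# The count includes the subdirectory itself and everything below it on
# the same file system.
inode_hogs() {
    for sub in "$1"/*/; do
        [ -d "$sub" ] || continue
        printf '%s\t%s\n' "$(find "$sub" -xdev | wc -l | tr -d ' ')" "$sub"
    done | sort -n
}
```

~# inode_hogs /var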

13. Package and compress directory content

Sometimes it’s useful to package all log files in a directory into a single compressed tar file to preserve the state of that directory at a given point in time and then safely remove or empty all those files in order to free up space.

~# tar -zcvf var_log.`date +%Y%m%d`.tar.gz /var/log/*.log

The command above packs all log files into a single file with a .tar.gz extension and today’s date in its name, to make it easier to locate in the future. Let’s see how much space is saved, going in this example from 468 MB to 35 MB:

~# du -ch /var/log/*.log | grep total
468M	total
~# ls -lh var_log.20140930.tar.gz 
-rw-r--r-- 1 root root 35M sep 30 13:36 var_log.20140930.tar.gz

After that we can proceed and empty all log files as in section #3.
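
Since emptying the logs is irreversible, it makes sense to do it only once the archive has been written and can be read back. The following sketch chains both steps under that assumption; archive_and_empty, the logs. prefix of the archive name and the two arguments (source and destination directory) are illustrative choices:

```shell
# Archive every *.log file in SRC into DEST/logs.YYYYMMDD.tar.gz and empty
# the originals only if the archive was created and can be listed.
# Usage: archive_and_empty SRC DEST
archive_and_empty() {
    archive=$2/logs.$(date +%Y%m%d).tar.gz
    set -- "$1"/*.log                 # positional parameters now hold the log files
    [ -e "$1" ] || return 1           # nothing matched the pattern
    tar -czf "$archive" "$@" || return 1
    tar -tzf "$archive" >/dev/null || return 1
    for logfile in "$@"; do
        : >"$logfile"
    done
}
```

~# archive_and_empty /var/log /root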

14. Find files in Recycle Bin

Normally when we send a file to the recycle bin it is simply moved to a hidden folder in your home directory, such as ~/.local/share/Trash in Ubuntu. However, some applications use their own trash directories, named with the word trash in upper or lower case combined with a sequence of numbers, such as .Trash001, .trash-002, .Trash_0003, etc.

Also, when file systems from external hard drives or SD cards are mounted, the recycle bin’s name can differ from one operating system to another, which can cause it not to be recognized. As a result, although the bin is emptied, the device continues to have a large amount of space used for no apparent reason.

Therefore, the solution lies in searching for all *trash* subdirectories on your system, ignoring case, and analyzing their contents to see if you can get rid of them (not everything found is necessarily garbage).

The following is the required command. Its execution can be very time consuming, so you may want to restrict it to a specific file system or directory instead of /:

~$ find / -iname "*trash*" -ls
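
A narrower variant, sketched here with a made-up function name (trash_dirs), matches directories only, stays on a single file system and does not descend into the directories it finds, which shortens the output considerably:

```shell
# Print directories whose name contains "trash" (in any case) under a
# starting point, without crossing file systems or listing their contents.
trash_dirs() {
    find "$1" -xdev -type d -iname '*trash*' -prune -print
}
```

~$ trash_dirs /home

Replacing -print with -exec du -sh {} + would show how much space each of them takes.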

15. Find duplicate files

Finally, here’s a huge command that will allow you to find and delete duplicate files under a directory to avoid unnecessary redundancies that can be very costly in terms of disk space consumed.

~$ find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -f3-100 -d ' ' | tr '\n.' '\t.' | sed 's/\t\t/\n/g' | cut -f2-100 | tr '\t' '\n' | perl -pe 's/([ (){}-])/\\$1/g' | perl -pe 's/'\''/\\'\''/g' | xargs -pr rm -v
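
Before letting a pipeline like this delete anything, it can help to just list the duplicate groups and review them by hand. Here is a shorter, read-only sketch that assumes GNU md5sum and uniq (list_duplicates is an illustrative name). Unlike the command above, it checksums every non-empty file instead of pre-filtering by size, so it is slower on large trees:

```shell
# Print groups of files with identical MD5 checksums, one "hash  path"
# line per file and a blank line between groups. Nothing is deleted.
list_duplicates() {
    find "$1" -type f ! -empty -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate
}
```

~$ list_duplicates .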

And what about you? Do you have other useful commands that you use to keep your file systems as empty as you can?
