Mininook

Musings on Christianity, Politics, and Computer Science Geekery

Tag: Linux

Quick Bar-Chart of disk usage

Today I went looking for a command I had used a long time ago, but ran into a much more interesting one instead.  At the time, I must have needed to find out which files were the biggest disk hogs and whether there was a long tail (i.e. how many of the 3.7M files in this directory, not my fault by the way, were inconsequential).  That brings us to this wonderful "one-line" command:

find /dir/ -name "*.xml" -exec du -s {} \; | perl -ne 'if (/^(\d+)\s+(.*)/) { $h{$2} = $1; if ($max < $1) { $max = $1; } if (length($2) > $maxfname) { $maxfname = length($2); } } END { map { $barlen = ($h{$_} / $max) * 50; $bar = "*" x $barlen; printf ("%" . $maxfname . "s" . "(%5d): %s", $_, $h{$_}, $bar); print "\n"; } sort { $h{$b} <=> $h{$a} } keys %h }' 2> /dev/null > report.txt

Specifically, it finds every XML file under /dir/ and runs the Linux du command on each one to get its size (du reports disk blocks, not bytes).  That list of sizes and filenames is piped to a hacky Perl script that pulls out each size, scales a horizontal histogram bar against the largest size (at most 50 *s wide), and sorts the list from largest to smallest.  Lastly, the result is saved to report.txt.

It's a quick and dirty trick, but it produces nice command-line output like this:

/dir/w6bz9whg.xml(36560): **************************************************
/dir/w6km312r.xml(31772): *******************************************
/dir/w68d03gz.xml(27728): *************************************
/dir/w6vt5fhv.xml(27076): *************************************
/dir/w6m07v80.xml(17420): ***********************
/dir/w68m0zj8.xml(15276): ********************
/dir/w6mq7qpz.xml(15052): ********************
/dir/w6vq30tq.xml(13808): ******************
/dir/w6tb51hr.xml(13160): *****************
...
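
For anyone who finds the one-liner hard to read, here is a rough multi-line sketch of the same idea using awk and sort instead of Perl. It is not the command I actually ran: bar_chart is a made-up function name, and it assumes input lines in du -s format (a size, whitespace, then a path with no tabs in it).

```shell
#!/bin/sh
# Sketch only: bar_chart is a hypothetical helper, not the original command.
# Reads "size<whitespace>path" lines on stdin (the shape of `du -s` output)
# and prints one bar per path, largest first, scaled to at most 50 stars.
bar_chart() {
    awk '
        /^[0-9]+[ \t]+/ {
            size = $1 + 0
            name = $0
            sub(/^[0-9]+[ \t]+/, "", name)   # keep the whole path, spaces included
            sizes[name] = size
            if (size > max) max = size
            if (length(name) > width) width = length(name)
        }
        END {
            # Right-align names to the longest one, like the Perl printf does.
            fmt = "%" width "s(%5d): %s\n"
            for (name in sizes) {
                n = (max > 0) ? int(sizes[name] / max * 50) : 0
                bar = ""
                for (i = 0; i < n; i++) bar = bar "*"
                printf "%d\t", sizes[name]   # temporary sort key, cut off below
                printf fmt, name, sizes[name], bar
            }
        }
    ' | sort -k1,1nr | cut -f2-
}

# Possible usage, mirroring the one-liner above:
#   find /dir/ -name "*.xml" -exec du -s {} \; | bar_chart > report.txt
```

The temporary numeric column exists only so that sort can order the rows; awk's for-in loop makes no promise about ordering, which is the job the Perl sort block does in the original.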

 

Command Line Master

I wanted to post the craziest command-line script I've used in a long time.  It takes the names listed in <relationEntry> tags of EAC-CPF records, converts them to filenames, and copies those files.

grep -h -o -P "<relationEntry>(.*?)</relationEntry>" *.xml |
 sed -e 's/<[a-zA-Z0-9\/\+]*>//g' |
 awk '{print tolower($0)}' |
 sed -e 's/[ ,.:]\+/\-/g' |
 sed -e 's/$/cr.xml/g' |
 while read -r x ; do cp "/data/production/data/$x" eac_data/. ; done
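
To see what the middle of that pipeline does to a single name, here is a small sketch that wraps the lowercase, replace, and append steps in a function. The function name name_to_filename and the sample name are made up for illustration; the sed pattern is spelled out as [ ,.:][ ,.:]* so it doesn't rely on GNU sed's \+.

```shell
#!/bin/sh
# Sketch only: name_to_filename is a hypothetical helper, and the sample
# name below is invented, not taken from real EAC-CPF data.
# Takes a name with the XML tags already stripped and prints the filename
# that the pipeline would try to copy.
name_to_filename() {
    printf '%s\n' "$1" |
        awk '{print tolower($0)}' |
        sed -e 's/[ ,.:][ ,.:]*/-/g' -e 's/$/cr.xml/'
}

name_to_filename 'Smith, John, 1900-1950.'
# prints smith-john-1900-1950-cr.xml
```

Each run of spaces, commas, periods, or colons collapses into a single hyphen, so a trailing period on the name is what ends up as the hyphen in front of cr.xml.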