Wednesday, June 2, 2010

One-liner: Finding files that include a match

These ARE the files you're looking for

Today I am going to share a one-liner I use often to find the names of files that contain a match.
Say, for example, that you would like to print the names of all files in the current directory that contain the word Fred. You could use this one-liner:

perl -wnl -e '/Fred/ and print $ARGV and close ARGV' *

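-w is for warnings
-n is for looping: it wraps your code in a loop over every line of every file named on the command line
-l is for line-end processing: it chomps each input line and adds a newline to everything you print
-e is for the program text itself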
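If you are curious what those switches actually build, here is a rough sketch (using made-up scratch files in a temporary directory) that runs the one-liner next to a hand-written version of the loop. The long-hand version is an approximation: use warnings is lexically scoped, whereas -w turns warnings on globally, but for this little program the two behave the same.

```shell
#!/bin/sh
# Hypothetical scratch data: a.txt mentions Fred, b.txt does not.
dir=$(mktemp -d)
printf 'hello Fred\nFred again\n' > "$dir/a.txt"
printf 'nobody here\n' > "$dir/b.txt"
cd "$dir" || exit 1

# The one-liner from the post.
short=$(perl -wnl -e '/Fred/ and print $ARGV and close ARGV' *)

# Roughly what -w, -n and -l expand to (an approximation, not perl's exact code).
long=$(perl -e '
    use warnings;                   # roughly -w
    $\ = "\n";                      # -l: newline appended to every print
    while (defined($_ = <ARGV>)) {  # -n: loop over every line of every file
        chomp;                      # -l: strip the input line ending
        /Fred/ and print $ARGV and close ARGV;
    }
' *)

echo "one-liner: $short"
echo "long-hand: $long"

cd / && rm -rf "$dir"
```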

The "and close ARGV" at the end saves time and keeps you from printing the same file name over and over. Without it, if 'Fred' pops up 50 times in a file, the file name shows up 50 times on standard out. Closing ARGV after the first match also makes perl skip the rest of that file and move straight on to the next one, so you don't waste time reading lines you no longer care about.

Finding all matching lines

Another way I use this kind of one-liner is to pull examples out of config files. For example, say I have a directory full of configuration files and I want to see how many of them use the same option:

perl -wnl -e '/^option_name/ and print' /path/to/configs/*

This one-liner instead prints every matching line to standard out. This time we drop the "and close ARGV", because we really don't need it: if we are matching a complete config option name, it should only show up once per file anyway. And if we are matching several similar options (e.g. option_name_1 and option_name_2), we want to print each of them, while "and close ARGV" would only let us print the first one.
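Here is a small sketch of that trade-off, using two made-up config files where app.conf sets both option_name_1 and option_name_2. To make the output easier to read, this variation prints the file name ($ARGV) in front of each match; that prefix is an addition for the demo, not part of the one-liner above.

```shell
#!/bin/sh
# Hypothetical config files; app.conf sets two similar options.
dir=$(mktemp -d)
printf 'option_name_1 = yes\noption_name_2 = no\nother = 1\n' > "$dir/app.conf"
printf 'option_name_1 = no\n' > "$dir/web.conf"

# Every matching line, prefixed with the file it came from.
all=$(perl -wnl -e '/^option_name/ and print "$ARGV: $_"' "$dir"/*)

# Adding "and close ARGV" keeps only the first match in each file,
# so app.conf loses its option_name_2 line.
first=$(perl -wnl -e '/^option_name/ and print "$ARGV: $_" and close ARGV' "$dir"/*)

printf '%s\n' "$all"
echo ---
printf '%s\n' "$first"

rm -rf "$dir"
```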

I hope this helps you the next time you need to glean information from a large number of files.

1 comment:

  1. The problem with this method is that it's going to be VERY slow and will be especially painful when examining large numbers of files. It's much more efficient to use find in conjunction with xargs.

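    Here is a fast way to search through a bunch of files that is also safe with file names containing spaces or even newlines, because -print0 and xargs -0 pass the names along NUL-separated: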

       find /path/to/something -type f -iname '*some_pattern*' -print0 | xargs -0 grep -H '^option_name'

    If you just want the file names, change the grep option -H to -l. This is handy for sub-searches: you can find all files containing one pattern, then print the lines in those files that contain another pattern. Adding -Z makes grep end each file name with a NUL byte, so the list can be handed safely to a second xargs -0:

       find /path -type f -iname '*pattern*' -print0 | xargs -0 grep -lZ '^option_name' | xargs -0 grep -H 'another pattern'
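A self-contained sketch of the chained search above, assuming GNU find, xargs and grep (grep -Z is GNU's short form of --null). The directory and file names are invented for the demo, and one path deliberately contains spaces to show why the NUL-separated hand-offs matter.

```shell
#!/bin/sh
# Assumes GNU find, xargs and grep. All names below are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/conf dir"                                  # space in the path on purpose
printf 'option_name = 1\nanother pattern here\n' > "$dir/conf dir/my pattern.cfg"
printf 'option_name = 2\n' > "$dir/pattern-only.cfg"
printf 'another pattern\n' > "$dir/unrelated.txt"

# Files named *pattern* that contain ^option_name, then the lines in
# those files that contain "another pattern".
result=$(find "$dir" -type f -iname '*pattern*' -print0 \
    | xargs -0 grep -lZ '^option_name' \
    | xargs -0 grep -H 'another pattern')

printf '%s\n' "$result"
rm -rf "$dir"
```

Only "my pattern.cfg" survives both filters, and its path comes through intact despite the spaces; a plain find | xargs pipeline would split that path at each space.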

    If you want to stick with perl to handle your matching, no problem:

       find /path -type f -iname '*pattern*' -print0 | xargs -0 perl -wnl -e '/^option_name/ and print'

    Keep in mind that grep/egrep is normally MUCH faster than perl (though I've noticed big improvements in perl 5.10+). If you want to stick with Perl-style patterns, pcregrep is slightly faster than running perl directly, and for complicated patterns I've found pcregrep to be quite a bit faster than grep/egrep as well.

    I use this method fairly regularly to search millions of files at a time.