Brian Broderick

Using grep to find the number of occurrences in a file

While doing some Linux server maintenance, I noticed that the server was using quite a bit more of its CPU resources than normal, yet my Analytics wasn't showing much of a spike in traffic. I have a rather large Apache access_log file, and I wanted to see how many times a particular bot had spidered my web pages. Looking through it by hand isn't practical, since the log is over 1 GB in size.

Instead, what I did was this simple grep command:

grep -c "regularexpression" access_log

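In the quotes, I put the real string that I was searching for. The -c flag stands for "count": it returns the number of lines in the file that match the regular expression, not the total number of matches. Since each line of an access log is one request, the result is the number of requests that matched.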
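To make the difference between counting lines and counting matches concrete, here is a small sketch. The sample file and its contents are made up for illustration; only the grep flags (-c and -o) and wc -l are the real tools.

```shell
# Build a tiny sample log (hypothetical data) to compare the two counts.
sample_log=$(mktemp)
printf '%s\n' \
  'bot bot GET /index.html' \
  'human GET /about.html' \
  'bot GET /contact.html' > "$sample_log"

# -c counts matching LINES: two of the three lines contain "bot".
line_count=$(grep -c "bot" "$sample_log")

# -o prints every match on its own line, so wc -l counts individual matches.
match_count=$(grep -o "bot" "$sample_log" | wc -l)

echo "matching lines: $line_count"
echo "total matches:  $match_count"

# Clean up the temporary file.
rm -f "$sample_log"
```

For an access log the two numbers are usually the same, because a bot's name normally appears once per request line.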

In this case, the spider that I thought was the culprit had downloaded only 50 web pages, but the true culprit had downloaded many more. It was using a regular browser's User-Agent, so it's either a really active visitor, a macro plugin, or a spider spoofing a real browser. Either way, if that IP address keeps up that level of activity on the site, an easy solution is to block it with iptables.
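One way to spot a heavy hitter like this is to count requests per client address. This is a rough sketch, not necessarily the exact steps I ran: it assumes the log uses Apache's common or combined format, where the client IP is the first field, and the sample entries and addresses below are invented.

```shell
# Hypothetical sample in Apache's common log layout (client IP is the first field).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512
203.0.113.5 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 512
198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 512
203.0.113.5 - - [01/Jan/2024:00:00:04 +0000] "GET /b HTTP/1.1" 200 512
EOF

# Count requests per client IP and list the busiest addresses first.
top_ips=$(awk '{print $1}' "$sample_log" | sort | uniq -c | sort -rn | head -n 10)
echo "$top_ips"

# Pull out the single busiest address for follow-up.
busiest_ip=$(echo "$top_ips" | awk 'NR==1 {print $2}')
echo "busiest client: $busiest_ip"

# Clean up the temporary file.
rm -f "$sample_log"
```

On a real server you would point the pipeline at access_log instead of the sample file. If the address does need blocking, a rule along the lines of iptables -A INPUT -s 203.0.113.5 -j DROP (run as root, with the real address substituted) drops its traffic; keep in mind that -A appends the rule, so any earlier rule that accepts the traffic will still win.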

 

Tags: grep, linux, server administration