
How to use user-agent strings as a network monitoring tool

User-agent strings, elements of HTTP headers, can be used as a network monitoring tool to reveal information about client networks without affecting network performance or privacy. Learn more in this tip by Richard Bejtlich.

More "Traffic Talk" tips on network monitoring from Richard Bejtlich
Network security monitoring: Know your network

Network security monitoring using transaction data

How to deploy NetFlow v5 and v9 probes and analyzers

Clients of network services often want to know more about their network. In this edition of Traffic Talk, I will demonstrate how user-agent strings can be used as a network monitoring tool to reveal an enormous amount of information with little or no impact on network performance or privacy.

A user-agent string is an element of an HTTP header sent by HTTP clients such as Web browsers. The following HTTP request includes a user-agent string from a Windows XP SP3 system running Firefox, talking to a Squid proxy server.

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Cookie: BIGipServerlive=2768357386.41503.0000

The user-agent string carries enough detail to identify the operating system version (Windows NT 5.1 is Windows XP) and the application and version (Firefox 3.5.3) that made the request.
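As a rough illustration of reading those tokens mechanically, the sketch below pulls the Windows NT version and the Firefox version out of a user-agent string. The function name `ua_summary` and the short version-to-name table are assumptions made for this example; real-world user-agent strings vary far more than these two patterns cover.

```shell
#!/bin/sh
# ua_summary (hypothetical helper): print "<operating system> | <application>"
# for a Mozilla-style user-agent string passed as the first argument.
ua_summary() {
    ua=$1
    # Windows NT version number, e.g. "5.1"
    nt=$(printf '%s\n' "$ua" | sed -n 's/.*Windows NT \([0-9][0-9.]*\).*/\1/p')
    # Application token, e.g. "Firefox/3.5.3"
    app=$(printf '%s\n' "$ua" | sed -n 's/.*\(Firefox\/[0-9][0-9.]*\).*/\1/p')
    # Map a few NT version numbers to their product names.
    case $nt in
        5.1) os='Windows XP' ;;
        6.0) os='Windows Vista' ;;
        6.1) os='Windows 7' ;;
        '')  os='unknown' ;;
        *)   os="Windows NT $nt" ;;
    esac
    printf '%s | %s\n' "$os" "${app:-unknown}"
}

ua_summary 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)'
```

Note that the string reveals the Windows release but not the service pack level.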

Collecting user-agent strings
Network administrators can collect user-agent strings in two ways. The first is to extract them from proxy logs. For example, a Squid proxy log might contain an entry like the following:

1256175164.757 ::: 38 ::: ::: TCP_MISS/302 ::: 748 ::: GET ::: ::: - ::: DIRECT/ ::: text/html ::: "" ::: "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)"

The triple colons ( ::: ) were added intentionally, for reasons that will appear next. The entry in the squid.conf file used to generate this log format is the following:

logformat squid-extended %ts.%03tu ::: %6tr ::: %>a ::: %Ss/%03Hs ::: %<st ::: %rm ::: %ru ::: %un ::: %Sh/%<A ::: %mt ::: "%{Referer}>h" ::: "%{User-Agent}>h"

The second way to gather user-agent strings is to examine network traffic, perhaps using a tool like Httpry. I explained this method in the tip "Network security monitoring using transaction data."

Once you have logs, what can you do with them? Consider the following command that examines Squid proxy logs, extracts the source IP addresses and user-agents, counts unique appearances, and sorts them.

cat /usr/local/squid/logs/access.log | awk 'FS=":::" {print $3 $12}' | sort -k 2 | uniq -c

Here the field separator (FS) is set to triple colons. In my experience, "traditional" field separators like commas or pipes appear too frequently in HTTP requests to be useful for logging, but you are free to use whatever field separator you like. An excerpt of the output of running a command like this on a small live network appears next. I describe a few interesting elements of each group of entries after they are listed.
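One caveat: when FS is assigned in the pattern position, as in the command above, most awk implementations have already split the first record on whitespace before the new separator takes effect. That would account for the stray `103:::` line at the top of the output. A minimal sketch that sets the separator up front with `-F` follows; the function name `count_agents` is an assumption for this example, and it relies on the 12-field layout of the log entry shown earlier (client address in field 3, user-agent in field 12).

```shell
#!/bin/sh
# count_agents (hypothetical helper): read ":::"-delimited proxy log lines on
# stdin and print a count for each distinct client / user-agent pair.
count_agents() {
    awk -F ':::' '{
        client = $3
        agent = $12
        # Trim the padding that surrounds each separator.
        gsub(/^[ \t]+|[ \t]+$/, "", client)
        gsub(/^[ \t]+|[ \t]+$/, "", agent)
        print client "\t" agent
    }' | LC_ALL=C sort | uniq -c
}

# Usage, with the log path from this tip:
#   count_agents < /usr/local/squid/logs/access.log
```

Because the separator is in place before the first record is read, every line is split the same way.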

r200a:/root# cat /usr/local/squid/logs/access.log | awk 'FS=":::" {print $3 $12}' | sort -k 2 | uniq -c

1 103:::
14 "-"
7 "AVGINET8-WVSHX86 85FREE AVI=270.14.10/2429 BUILD=421 LOC=1033 LIC=8FREE--[...key obscured...] DIAG=90 OPF=0 PCA="
6 "AVGINET8-WVSHX86 85FREE AVI=270.14.11/2430 BUILD=421 LOC=1033 LIC=8FREE--[...key obscured...] DIAG=90 OPF=0 PCA="
10 "AVGINET8-WVSHX86 85FREE AVI=270.14.12/2431 BUILD=421 LOC=1033 LIC=8FREE--[...key obscured...] DIAG=90 OPF=0 PCA="
...edited...

The three entries above show that one system has been updating its AVG antivirus product.

4 "AVGINET8-WXPPX86 85FREE AVI=270.14.25/2450 BUILD=423 LOC=1033 LIC=8FREE--[...key obscured...] DIAG=380 OPF=0 PCA="

Now we see a different PC running AVG. It has a different LIC (license) key. Google searches for both keys reveal they are not unique to these systems.
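To follow those updates over time, the version tokens can be pulled out of the AVG strings. The sketch below assumes that the `AVI=` token tracks the antivirus database release and that `BUILD=` is the program build; the tip does not say so, and `avg_versions` is a name made up for this example.

```shell
#!/bin/sh
# avg_versions (hypothetical helper): read user-agent strings on stdin, one per
# line, and print the distinct "AVI BUILD" pairs found in AVG update agents.
avg_versions() {
    # Only lines containing AVGINET are considered; everything else is dropped.
    sed -n '/AVGINET/s/.*AVI=\([^ ]*\) BUILD=\([0-9]*\).*/\1 \2/p' |
        LC_ALL=C sort -u
}

# Usage: feed it the user-agent column extracted from the proxy log.
```

A system whose AVI value stops changing while others advance may deserve a closer look.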

8 "Adobe Update Manager 6"
1 "Client"
...edited...

The Adobe program is interesting because it must have checked local proxy settings to do its update. The "Client" entry is extremely interesting because it appears only once.
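Entries that appear only once or twice are easy to miss in a long listing. A minimal sketch for surfacing them, assuming the user-agent strings have already been extracted one per line, is shown here; `rare_agents` and its threshold argument are illustrative names, not part of any tool mentioned in this tip.

```shell
#!/bin/sh
# rare_agents (hypothetical helper): read user-agent strings on stdin, one per
# line, and print those that occur at most N times (N defaults to 1).
rare_agents() {
    max=${1:-1}
    LC_ALL=C sort | uniq -c |
        awk -v max="$max" '$1 + 0 <= max + 0 {
            # Strip the count that uniq -c puts in front of each line.
            sub(/^[ \t]*[0-9]+[ \t]/, "")
            print
        }'
}

# Usage: print agents seen three times or fewer.
#   ... | rare_agents 3
```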

We can search the proxy logs for that entry:

r200a:/root# grep \"Client /usr/local/squid/logs/access.log

1255628697.525 ::: 226 ::: ::: TCP_MISS/204 ::: 406 ::: GET ::: ::: - ::: DIRECT/ ::: - ::: "-" ::: "Client"

We can see that the system accessed an IP address that belongs to Microsoft's netblock, so this entry appears to be related to a Microsoft application.

16 "MSDW"

This entry is also obscure.

r200a:/root# grep \"MSDW /usr/local/squid/logs/access.log

1255516537.205 ::: 1185 ::: ::: TCP_MISS/200 ::: 7640 ::: CONNECT ::: ::: - ::: DIRECT/ ::: - ::: "-" ::: "MSDW"
1255517315.250 ::: 680 ::: ::: TCP_MISS/200 ::: 7640 ::: CONNECT ::: ::: - ::: DIRECT/ ::: - ::: "-" ::: "MSDW"
1255674327.981 ::: 210 ::: ::: TCP_MISS/200 ::: 500 ::: GET ::: ::: - ::: DIRECT/ ::: text/html ::: "-" ::: "MSDW"
...truncated...

Checking the logs, we see another Microsoft application, perhaps related to Dr. Watson and Windows Defender.
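The grep lookups shown here can be turned into a summary of which destinations a given agent contacts. The sketch below assumes the 12-field log layout from this tip (client in field 3, method in field 6, URL in field 7, user-agent in field 12); the function name `agent_destinations` is hypothetical.

```shell
#!/bin/sh
# agent_destinations (hypothetical helper): read ":::"-delimited proxy log
# lines on stdin and count requests per client, method and URL for every
# user-agent that starts with the prefix given as $1 (without quotes).
agent_destinations() {
    awk -F ':::' -v want="$1" '{
        agent = $12
        # Drop the surrounding quotes and padding from the user-agent field.
        gsub(/^[ \t]*"|"[ \t]*$/, "", agent)
        if (index(agent, want) == 1) {
            client = $3; method = $6; url = $7
            gsub(/[ \t]/, "", client)
            gsub(/[ \t]/, "", method)
            gsub(/^[ \t]+|[ \t]+$/, "", url)
            print client, method, url
        }
    }' | LC_ALL=C sort | uniq -c
}

# Usage, with the log path from this tip:
#   agent_destinations MSDW < /usr/local/squid/logs/access.log
```

Matching on the field rather than the whole line avoids false hits when the same text appears in a URL or referer.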

2 "Python-urllib/2.5"
6 "Python-urllib/2.6"

These Python entries are probably not caused by a Windows application. Checking the logs, we see that they come from Ubuntu.

r200a:/root# grep \"Python-urllib /usr/local/squid/logs/access.log

1256133834.436 ::: 188 ::: ::: TCP_MISS/304 ::: 275 ::: GET ::: ::: - ::: DIRECT/ ::: - ::: "-" ::: "Python-urllib/2.5"
1256133856.075 ::: 206 ::: ::: TCP_MISS/304 ::: 275 ::: GET ::: ::: - ::: DIRECT/ ::: - ::: "-" ::: "Python-urllib/2.5"
1256173578.838 ::: 187 ::: ::: TCP_MISS/304 ::: 345 ::: GET ::: ::: - ::: DIRECT/ ::: - ::: "-" ::: "Python-urllib/2.6"
...truncated...

As you can see, you can learn a lot about a network simply by looking at user-agent strings. The very simple network used to generate the logs for this story offered more than 60 different entries for analysis, but I displayed only nine for the sake of brevity. User-agent string mining can be used passively to identify and track applications and systems, for both inventory and security purposes. Consider ways you can use user-agent strings for network monitoring when working with client networks!
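For inventory purposes, one simple starting point is to count how many distinct user-agents each client presents. The sketch below is one way to do that under the same 12-field log layout assumed throughout this tip; `agents_per_client` is a hypothetical name.

```shell
#!/bin/sh
# agents_per_client (hypothetical helper): read ":::"-delimited proxy log lines
# on stdin and print each client address with its number of distinct agents.
agents_per_client() {
    awk -F ':::' '{
        client = $3
        agent = $12
        gsub(/[ \t]/, "", client)
        gsub(/^[ \t]+|[ \t]+$/, "", agent)
        # Count each client / agent pair only the first time it is seen.
        if (!((client, agent) in seen)) {
            seen[client, agent] = 1
            n[client]++
        }
    }
    END { for (c in n) print c, n[c] }' | LC_ALL=C sort
}

# Usage, with the log path from this tip:
#   agents_per_client < /usr/local/squid/logs/access.log
```

A client whose count jumps between two runs has started running software that was not there before.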

Richard Bejtlich is director of incident response for General Electric and author of the TaoSecurity blog.
