Parsing the user agent string using PHP
Recently I experimented a bit with an Apache log file analyzer written in PHP. It would not be all that difficult were it not for parsing the browser, or user agent, string. There are in fact two RFC documents, RFC 1945 (HTTP/1.0) and RFC 2068 (HTTP/1.1), that define how a user agent string should be written: a sequence of product tokens, each optionally followed by a slash and a version, interleaved with comments in parentheses. Still, many browsers do not adhere to these standards, and many put the interesting details in the comment field only. There is a good article at texSoft.it on how to identify the user agent and I have tried to steal bits and pieces of my time to implement the algorithm.
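To make the product/version/comment structure concrete, here is a minimal sketch that splits one sample string into its tokens. It uses preg_match_all() rather than the approach in the function further down. The helper name tokenizeUserAgent, the sample string and the pattern are my own assumptions for illustration, and the pattern ignores nested parentheses.

```php
<?php
// Hypothetical helper, not part of the parser shown in this post.
// Splits a user agent string into product / version / comment tokens.
function tokenizeUserAgent($ua)
{
    // product[/version], optionally followed by a parenthesised comment.
    // Simplification: comments with nested parentheses are not handled.
    $pattern = '~([^/\s()]+)(?:/(\S*))?\s*(?:\(([^()]*)\))?~';

    preg_match_all($pattern, $ua, $matches, PREG_SET_ORDER);

    $tokens = array();
    foreach ($matches as $m) {
        $tokens[] = array(
            'product' => $m[1],
            // Trailing groups that did not match are absent from $m.
            'version' => isset($m[2]) ? $m[2] : '',
            'comment' => isset($m[3]) ? $m[3] : '',
        );
    }
    return $tokens;
}

// Illustrative sample string, not taken from a real log file.
$sample = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0';
print_r(tokenizeUserAgent($sample));
```

For this sample the sketch yields three tokens: Mozilla with its version and the parenthesised comment, then Gecko and Firefox with a version only.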
For my purposes I don’t care much for the operating system details. This is the result so far. I’m still not very satisfied with it, but I thought other people might be interested and maybe help out. Maybe there is something better out there? I’d be happy for any input.
function parseUserAgent($ua)
{
    $userAgent = array();
    $agent = $ua;
    $products = array();

    // product[/version] [two-letter language tag] [(comment)], anchored at
    // the start of the remaining string. ereg() was removed in PHP 7, so
    // preg_match() is used here.
    $pattern = '~^([^/\s]*)(/(\S*))?'
             . '(\s*\[[a-zA-Z][a-zA-Z]\])?\s*'
             . '(\((([^()]|(\([^()]*\)))*)\))?\s*~';

    while (strlen($agent) > 0)
    {
        if (preg_match($pattern, $agent, $a))
        {
            // product, version, comment (unmatched trailing groups are unset)
            array_push($products, array($a[1],                       // Product
                                        isset($a[3]) ? $a[3] : '',   // Version
                                        isset($a[6]) ? $a[6] : '')); // Comment
            // Consume the match; always advance at least one character
            // so that an empty match cannot loop forever.
            $agent = substr($agent, max(strlen($a[0]), 1));
        }
        else
        {
            $agent = "";
        }
    }

    // Directly catch these (the last matching product wins)
    foreach ($products as $product)
    {
        switch ($product[0])
        {
            case 'Firefox':
            case 'Netscape':
            case 'Safari':
            case 'Camino':
            case 'Mosaic':
            case 'Galeon':
            case 'Opera':
                $userAgent[0] = $product[0];
                $userAgent[1] = $product[1];
                break;
        }
    }

    // Fall back to the first product; skipped when the string was empty
    if (count($userAgent) == 0 && count($products) > 0)
    {
        // Mozilla compatible (MSIE, konqueror, etc)
        if ($products[0][0] == 'Mozilla' &&
            !strncmp($products[0][2], 'compatible;', 11) &&
            preg_match('~compatible; ([^ ]*)[ /]([^;]*).*~',
                       $products[0][2], $ca))
        {
            $userAgent[0] = $ca[1];
            $userAgent[1] = $ca[2];
        }
        else
        {
            $userAgent[0] = $products[0][0];
            $userAgent[1] = $products[0][1];
        }
    }

    return $userAgent;
}
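One limitation of the switch above: a Chrome user agent string carries both a Chrome and a Safari product token, and Chrome is not in the list, so such a string is reported as Safari. A possible workaround is to check product names in a priority order, with the more specific names first. The sketch below assumes the products have already been reduced to name => version pairs. The helper name pickBrowser and the priority list are hypothetical, and the list is certainly not exhaustive.

```php
<?php
// Hypothetical helper: pick the most specific known browser
// from an array of product name => version pairs.
function pickBrowser(array $products)
{
    // Assumed ordering: more specific names first, because a Chrome
    // string also contains a "Safari" product token.
    $priority = array('Opera', 'Chrome', 'Camino', 'Galeon',
                      'Netscape', 'Firefox', 'Safari');

    foreach ($priority as $name) {
        if (isset($products[$name])) {
            return array($name, $products[$name]);
        }
    }
    return array(); // nothing recognised
}

// Product tokens as they appear in a typical Chrome string
// (illustrative values).
$products = array(
    'Mozilla'     => '5.0',
    'AppleWebKit' => '537.36',
    'Chrome'      => '120.0.0.0',
    'Safari'      => '537.36',
);

$browser = pickBrowser($products);
echo $browser[0] . ' ' . $browser[1] . "\n"; // prints "Chrome 120.0.0.0"
```

The same idea could be folded into parseUserAgent() by replacing the switch with a lookup against such a list, at the cost of having to maintain the ordering as new browsers appear.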

Handy script — cheers. Just thought I’d let you know that it doesn’t recognise Chrome (it says it’s Safari)
One other thing, often it’s better to check for the rendering engine (Gecko, Trident, Presto, WebKit/KHTML) than the actual browser. Camino, Firefox, and SeaMonkey, all based on the same Gecko backend, should display pages the same way, so unless you’re wanting to interact with the browser rather than the renderer, there’s no reason to check for Camino vs Firefox vs SeaMonkey (same thing with Chrome, Konqueror, Safari).
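A rough sketch of such an engine-first check, assuming plain substring matching is good enough. The helper name detectEngine and the list of needles are hypothetical. Older Internet Explorer versions that do not announce Trident would come back as 'unknown' here.

```php
<?php
// Hypothetical helper: map a user agent string to a rendering engine name.
function detectEngine($ua)
{
    // Order matters: WebKit and KHTML strings usually say "like Gecko",
    // so Gecko has to be tested last.
    $engines = array(
        'Presto'  => 'Presto',
        'Trident' => 'Trident',
        'WebKit'  => 'AppleWebKit',
        'KHTML'   => 'KHTML',
        'Gecko'   => 'Gecko',
    );

    foreach ($engines as $engine => $needle) {
        // Case-insensitive substring search
        if (stripos($ua, $needle) !== false) {
            return $engine;
        }
    }
    return 'unknown';
}

// Illustrative sample strings
echo detectEngine('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0') . "\n";
echo detectEngine('Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko)') . "\n";
```

With these two samples the sketch reports Gecko for the Firefox string and KHTML for the Konqueror string.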
Oh this is very cool. Thanks. I haven’t messed around with any coding in several months, but I’m trying to tweak it to directly catch Chrome and the Spinn3r bot.
You should mention the source when using customized copypaste code.
To Phurious: Luckily I have. If you actually read the blog post you’ll see the below text including a link to the very source…
“There is a good article at texSoft.it on how to identify the user agent and I have tried to steal bits and pieces of my time to implement the algorithm.”