Recently I experimented a bit with an Apache log file analyzer written in PHP. It would not be all that difficult were it not for parsing the browser, or user agent, string. There are in fact two RFC documents, RFC 1945 and RFC 2068, that define how a user agent string should be written. Still, many clients do not adhere to these standards, and many put the interesting details only in the comment field. There is a good article at texSoft.it on how to identify the user agent and I have tried to steal bits and pieces of my time to implement the algorithm.
For my purposes I don’t care much for the operating system details. This is the result so far. I’m still not very satisfied but I thought maybe other people might be interested and maybe help out. Maybe there is something better out there? I’d be happy for any input.
function parseUserAgent($ua)
{
    $userAgent = array();
    $agent = $ua;
    $products = array();

    // One token: product[/version], optional [language tag], optional (comment).
    // ereg() was removed in PHP 7, so the pattern is anchored and used with preg_match().
    $pattern = "~^([^/[:space:]]*)" . "(/([^[:space:]]*))?"
        . "([[:space:]]*\\[[a-zA-Z][a-zA-Z]\\])?" . "[[:space:]]*"
        . "(\\((([^()]|(\\([^()]*\\)))*)\\))?" . "[[:space:]]*~";

    // Consume the string token by token from the left
    while (strlen($agent) > 0)
    {
        if (preg_match($pattern, $agent, $a) && strlen($a[0]) > 0)
        {
            // product, version, comment (unmatched groups default to '')
            array_push($products, array($a[1],          // Product
                                        $a[3] ?? '',    // Version
                                        $a[6] ?? ''));  // Comment
            $agent = substr($agent, strlen($a[0]));
        }
        else
        {
            $agent = "";
        }
    }

    // Directly catch these
    foreach ($products as $product)
    {
        switch ($product[0])
        {
            case 'Firefox':
            case 'Netscape':
            case 'Safari':
            case 'Camino':
            case 'Mosaic':
            case 'Galeon':
            case 'Opera':
                $userAgent[0] = $product[0];
                $userAgent[1] = $product[1];
                break;
        }
    }

    if (count($userAgent) == 0 && count($products) > 0)
    {
        // Fall back to the first product token
        $userAgent[0] = $products[0][0];
        $userAgent[1] = $products[0][1];

        // Mozilla compatible (MSIE, konqueror, etc): take the name and
        // version from the comment, split on a space or a slash
        if ($products[0][0] == 'Mozilla' &&
            !strncmp($products[0][2], 'compatible;', 11) &&
            preg_match("~compatible; ([^ /]*)[ /]([^;]*)~", $products[0][2], $ca))
        {
            $userAgent[0] = $ca[1];
            $userAgent[1] = $ca[2];
        }
    }

    return $userAgent;
}

Handy script — cheers. Just thought I’d let you know that it doesn’t recognise Chrome (it says it’s Safari)
One other thing, often it’s better to check for the rendering engine (Gecko, Trident, Presto, WebKit/KHTML) than the actual browser. Camino, Firefox, and Seamonkey based on the same gecko backend should display pages the same way, so unless you’re wanting to interact with the browser rather than the renderer, there’s no reason to check for Camino vs Firefox vs Seamonkey (same thing with Chrome, Konqueror, Safari).
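A minimal sketch of that engine check, assuming simple token matching is good enough. The function name detectRenderingEngine() and the token list are hypothetical and not part of the script above; the function only reports what the user agent string advertises, which may differ from the engine actually in use.

```php
<?php
// Hypothetical helper, not part of the original script: maps a user agent
// string to a rendering engine name by looking for well-known tokens.
function detectRenderingEngine(string $ua): string
{
    // Order matters: WebKit/KHTML browsers say "like Gecko" and old Opera
    // versions can pose as MSIE, so the more specific tokens come first.
    $checks = array(
        'Presto'  => '~Presto/|Opera[/ ]~',
        'WebKit'  => '~AppleWebKit/~',
        'KHTML'   => '~KHTML/|Konqueror/~',
        'Trident' => '~Trident/|MSIE ~',
        'Gecko'   => '~Gecko/~',
    );

    foreach ($checks as $engine => $pattern) {
        if (preg_match($pattern, $ua)) {
            return $engine;
        }
    }

    return 'unknown';
}

// Example: an early Chrome user agent string
$chrome = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) '
        . 'AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13';
echo detectRenderingEngine($chrome), PHP_EOL; // WebKit
```

The "Gecko/" pattern requires the slash so that the "like Gecko" comment sent by WebKit and KHTML browsers does not count as a match.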
Oh this is very cool. Thanks. I haven’t messed around with any coding in several months, but I’m trying to tweak it to directly catch Chrome and the Spinn3r bot.
You should mention the source when using customized copypaste code.
To Phurious: Luckily I have. If you actually read the blog post you’ll see the below text including a link to the very source…
“There is a good article at texSoft.it on how to identify the user agent and I have tried to steal bits and pieces of my time to implement the algorithm.”
Thanks for the script. I used it in my own access_log analyser
Possible fix for matching Chrome; it assumes the Chrome token precedes other matches like Safari.
    // Directly catch these
    foreach ($products as $product)
    {
        switch ($product[0])
        {
            case 'Firefox':
            case 'Netscape':
            case 'Chrome': // find Chrome too
            case 'Safari':
            case 'Camino':
            case 'Mosaic':
            case 'Galeon':
            case 'Opera':
                $userAgent[0] = $product[0];
                $userAgent[1] = $product[1];
                break 2; // 2 is used to also break out of the foreach loop on first find
        }
    }
Hi, how can I get the OS of the user?
http://php.net/manual/en/function.get-browser.php
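Note that get_browser() only works when the browscap setting in php.ini points to a browscap.ini file. If that is not an option, a rough guess can be made from substrings in the user agent. The sketch below is only an illustration: guessOperatingSystem() is a hypothetical name and the token list is an assumption that is far from exhaustive.

```php
<?php
// Hypothetical helper: guesses the operating system from substrings in
// the user agent string. The token list is illustrative, not exhaustive.
function guessOperatingSystem(string $ua): string
{
    // needle => reported name; more specific needles come first
    $tokens = array(
        'Windows'   => 'Windows',
        'Mac OS X'  => 'Mac OS X',
        'Macintosh' => 'Mac OS',
        'Linux'     => 'Linux',
        'FreeBSD'   => 'FreeBSD',
    );

    foreach ($tokens as $needle => $os) {
        // stripos() is case-insensitive and returns false when not found
        if (stripos($ua, $needle) !== false) {
            return $os;
        }
    }

    return 'unknown';
}

echo guessOperatingSystem('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'), PHP_EOL; // Windows
```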