Parsing the user agent string using PHP

Recently I experimented a bit with an Apache log file analyzer written in PHP. It would not be all that difficult were it not for parsing the browser, or user agent, string. There are in fact two RFC documents, RFC 1945 and RFC 2068, that define how a user agent string should be written. Still, many clients do not adhere to these standards, and many put the interesting details in the comment field only. There is a good article at texSoft.it on how to identify the user agent and I have tried to steal bits and pieces of my time to implement the algorithm.
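To make the format concrete: both RFCs describe the user agent as a series of product tokens, each a name with an optional "/version", interleaved with parenthesised comments. Here is a minimal sketch of pulling one such token apart. The sample string and the variable names are my own illustrations, and the pattern is deliberately simpler than the one used in the function further down (no nested parentheses, no language tag).

```php
<?php
// Illustrative sample string, not taken from a real log.
$ua = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

// One product token: name, optional "/version", optional "(comment)".
$pattern = '~^([^/\s]+)(?:/(\S+))?\s*(?:\(([^()]*)\))?~';

$product = $version = $comment = '';
if (preg_match($pattern, $ua, $m)) {
    $product = $m[1];        // "Mozilla"
    $version = $m[2] ?? '';  // "4.0"
    $comment = $m[3] ?? '';  // everything between the parentheses
}

printf("product=%s version=%s comment=%s\n", $product, $version, $comment);
```

For this sample, the interesting part (MSIE 6.0) sits in the comment rather than in the product token, which is exactly the situation that makes parsing awkward.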

For my purposes I don’t care much for the operating system details. This is the result so far. I’m still not very satisfied but I thought maybe other people might be interested and maybe help out. Maybe there is something better out there? I’d be happy for any input.

function parseUserAgent($ua)
  {

    $userAgent = array();
    $agent = $ua;
    $products = array();

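    // PCRE version of the pattern (ereg() was removed in PHP 7):
    // product, optional "/version", optional "[xx]" language tag,
    // optional "(comment)" with one level of nested parentheses.
    $pattern  = '~^([^/[:space:]]*)' . '(/([^[:space:]]*))?'
      . '([[:space:]]*\[[a-zA-Z][a-zA-Z]\])?' . '[[:space:]]*'
      . '(\((([^()]|(\([^()]*\)))*)\))?' . '[[:space:]]*~';

    while( strlen($agent) > 0 )
      {
        // Require a non-empty match so the loop always makes progress
        if (preg_match($pattern, $agent, $a) && $a[0] !== '')
          {
            // product, version, comment
            array_push($products, array($a[1],          // Product
                                        $a[3] ?? '',    // Version
                                        $a[6] ?? ''));  // Comment
            $agent = substr($agent, strlen($a[0]));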
          }
        else
          {
            $agent = "";
          }
      }

    // Directly catch these
    foreach($products as $product)
      {
        switch($product[0])
          {
          case 'Firefox':
          case 'Netscape':
          case 'Safari':
          case 'Camino':
          case 'Mosaic':
          case 'Galeon':
          case 'Opera':
            $userAgent[0] = $product[0];
            $userAgent[1] = $product[1];
            break;
          }
      }

    if (count($userAgent) == 0 && count($products) > 0)
      {
        // Mozilla compatible (MSIE, konqueror, etc)
        if ($products[0][0] == 'Mozilla' &&
            !strncmp($products[0][2], 'compatible;', 11))
          {
            $userAgent = array();
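            // Name stops at a space, slash or semicolon so that both
            // "MSIE 6.0" and "Konqueror/3.5" split into name and version
            if (preg_match('~compatible; ([^ /;]*)[ /]([^;]*)~',
                           $products[0][2], $ca))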
              {
                $userAgent[0] = $ca[1];
                $userAgent[1] = $ca[2];
              }
            else
              {
                $userAgent[0] = $products[0][0];
                $userAgent[1] = $products[0][1];
              }
          }
        else
        {
          $userAgent = array();
          $userAgent[0] = $products[0][0];
          $userAgent[1] = $products[0][1];
        }
      }

    return $userAgent;
  }
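The "Mozilla compatible" branch is the least obvious part, so here is a small sketch of it in isolation. The helper name extractCompatible and the sample comment strings are assumptions of mine for illustration. The name part of the pattern excludes "/" and ";", which is a little stricter than a plain "anything but a space" class, so that the "Konqueror/3.5" form splits the same way as "MSIE 6.0".

```php
<?php
// Hypothetical helper: pull (name, version) out of a "compatible; ..." comment.
function extractCompatible(string $comment): ?array
{
    // Name stops at a space, slash or semicolon; version runs to the next semicolon.
    if (preg_match('~^compatible; ([^ /;]*)[ /]([^;]*)~', $comment, $m)) {
        return array($m[1], $m[2]);
    }
    return null; // not a "compatible" comment
}

// Sample comment fields (illustrative only).
print_r(extractCompatible('compatible; MSIE 6.0; Windows NT 5.1'));
print_r(extractCompatible('compatible; Konqueror/3.5; Linux'));
```

Returning null for anything that is not a "compatible" comment leaves the fallback (use the first product as is) to the caller, as in the function above.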
Posted in PHP
9 comments on “Parsing the user agent string using PHP”
  1. Greg says:

    Handy script — cheers. Just thought I’d let you know that it doesn’t recognise Chrome (it says it’s Safari)

  2. Hanspeter says:

    One other thing: often it’s better to check for the rendering engine (Gecko, Trident, Presto, WebKit/KHTML) than the actual browser. Camino, Firefox, and SeaMonkey are all based on the same Gecko backend and should display pages the same way, so unless you want to interact with the browser rather than the renderer, there’s no reason to check for Camino vs Firefox vs SeaMonkey (same thing with Chrome, Konqueror, Safari).

  3. Devon Young says:

    Oh this is very cool. Thanks. I haven’t messed around with any coding in several months, but I’m trying to tweak it to directly catch Chrome and the Spinn3r bot.

  4. Phurious says:

    You should mention the source when using customized copypaste code.

  5. Danne says:

    To Phurious: Luckily I have. If you actually read the blog post you’ll see the below text including a link to the very source…

    “There is a good article at texSoft.it on how to identify the user agent and I have tried to steal bits and pieces of my time to implement the algorithm.”

  6. bart says:

    Thanks for the script. I used it in my own access_log analyser

  7. Rob Harrigan says:

    Possible fix for matching Chrome; it assumes the Chrome string precedes other matches like Safari.

    // Directly catch these
    foreach($products as $product)
      {
        switch($product[0])
          {
          case 'Firefox':
          case 'Netscape':
          case 'Chrome': // find Chrome too
          case 'Safari':
          case 'Camino':
          case 'Mosaic':
          case 'Galeon':
          case 'Opera':
            $userAgent[0] = $product[0];
            $userAgent[1] = $product[1];
            break 2; // 2 is used to also break out of the foreach loop on first find
          }
      }

  8. IdeasMX says:

    Hi, how can I get the OS of the user?
