Parsing the user agent string using PHP

Recently I experimented a bit with an Apache log file analyzer written in PHP. It would not be all that difficult were it not for parsing the browser, or user agent, string. There are in fact two RFC documents, RFC 1945 and RFC 2068, that define how a user agent string should be written: a sequence of product tokens of the form product/version, interleaved with comments in parentheses. Still, many clients do not adhere to these standards, and many put the interesting details in the comment field only. There is a good article at texSoft.it on how to identify the user agent, and I have stolen bits and pieces of my time to implement the algorithm.
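As a rough illustration of the format those RFCs describe, a user agent string is a series of product tokens (a name, optionally followed by a slash and a version) mixed with parenthesised comments. The sketch below splits a sample string into those pieces with preg_match_all. The helper name splitUserAgent and the sample string are made up for illustration, and the regular expression is a simplification that does not handle nested comments.

```php
<?php
// Hypothetical helper (not from the RFCs or the texSoft.it article):
// split a user agent string into product tokens and comments.
// Simplified on purpose: nested parentheses are not handled.
function splitUserAgent($ua)
{
    $parts = array();
    // Each match is either a "(comment)" or a "product[/version]" token.
    preg_match_all('~\(([^()]*)\)|([^/\s()]+)(?:/(\S+))?~',
                   $ua, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (isset($m[2]) && $m[2] !== '') {
            // Product token; the version part is optional.
            $parts[] = array('product' => $m[2],
                             'version' => isset($m[3]) ? $m[3] : '');
        } else {
            // Comment, stored without the surrounding parentheses.
            $parts[] = array('comment' => $m[1]);
        }
    }
    return $parts;
}

// Sample string invented for illustration.
$sample = 'Mozilla/5.0 (X11; Linux i686) Gecko/20061201 Firefox/2.0';
$parts = splitUserAgent($sample);
```

For this sample, $parts holds three product tokens (Mozilla, Gecko, Firefox) and one comment (X11; Linux i686), in the order they appear in the string.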

For my purposes I don’t care much about the operating system details. This is the result so far. I’m still not very satisfied, but I thought other people might be interested and maybe help out. Maybe there is something better out there? I’d be happy for any input.

function parseUserAgent($ua)
  {

    $userAgent = array();
    $agent = $ua;
    $products = array();

    // One "product[/version] [xx] (comment)" group, anchored at the start
    // of the remaining string. ereg() was removed in PHP 7, so this uses
    // preg_match() with the same pattern in PCRE syntax.
    $pattern  = '~^([^/[:space:]]*)' . '(/([^[:space:]]*))?'
      .'([[:space:]]*\[[a-zA-Z][a-zA-Z]\])?' . '[[:space:]]*'
      .'(\((([^()]|(\([^()]*\)))*)\))?' . '[[:space:]]*~';

    while( strlen($agent) > 0 )
      {
        // Stop on a failed or empty match so the loop always terminates
        if (preg_match($pattern, $agent, $a) && strlen($a[0]) > 0)
          {
            // product, version, comment (unmatched groups default to '')
            array_push($products, array($a[1],          // Product
                                        $a[3] ?? '',    // Version
                                        $a[6] ?? ''));  // Comment
            $agent = substr($agent, strlen($a[0]));
          }
        else
          {
            $agent = "";
          }
      }

    // Directly catch these (the last matching product wins)
    foreach($products as $product)
      {
        switch($product[0])
          {
          case 'Firefox':
          case 'Netscape':
          case 'Safari':
          case 'Camino':
          case 'Mosaic':
          case 'Galeon':
          case 'Opera':
            $userAgent[0] = $product[0];
            $userAgent[1] = $product[1];
            break;
          }
      }

    // Fall back to the first product; an empty string yields an empty array
    if (count($userAgent) == 0 && count($products) > 0)
      {
        // Mozilla compatible (MSIE, konqueror, etc)
        if ($products[0][0] == 'Mozilla' &&
            !strncmp($products[0][2], 'compatible;', 11))
          {
            // The name ends at the first space or slash and the version at
            // the next ';', so "Konqueror/3.5; Linux" splits correctly
            if (preg_match('~compatible; ([^ /;]*)[ /]([^;]*)~',
                           $products[0][2], $ca))
              {
                $userAgent[0] = $ca[1];
                $userAgent[1] = $ca[2];
              }
            else
              {
                $userAgent[0] = $products[0][0];
                $userAgent[1] = $products[0][1];
              }
          }
        else
          {
            $userAgent[0] = $products[0][0];
            $userAgent[1] = $products[0][1];
          }
      }

    return $userAgent;
  }
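
The "Mozilla compatible" branch above relies on the first comment starting with compatible; followed by the real browser name and version. As a small, self-contained sketch of just that step, the snippet below pulls the name and version out of such a comment with preg_match. The helper name parseCompatibleComment and the sample comments are hypothetical, and the pattern is only one possible way to write it, not necessarily what the texSoft.it article prescribes.

```php
<?php
// Hypothetical helper: extract name and version from a comment such as
// "compatible; MSIE 6.0; Windows NT 5.1".
function parseCompatibleComment($comment)
{
    // The name runs up to the first space or slash,
    // the version up to the next ";" (or the end of the comment).
    if (preg_match('~^compatible; ([^ /;]+)[ /]([^;]*)~', $comment, $m)) {
        return array($m[1], trim($m[2]));
    }
    return null; // not a "compatible;" comment
}

// Sample comments invented for illustration.
$msie = parseCompatibleComment('compatible; MSIE 6.0; Windows NT 5.1');
$konq = parseCompatibleComment('compatible; Konqueror/3.5; Linux');
```

For these two samples the helper returns array('MSIE', '6.0') and array('Konqueror', '3.5'). Excluding the slash and semicolon from the name is what keeps a slash-separated version such as Konqueror/3.5 from being swallowed into the name.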
