Getting to grips with an existing XML structure

I often find myself writing input filters for large XML files in PHP. It is a common enough task, and PHP offers a great variety of tools to do it effectively, depending on the situation. Unfortunately, a lack of documentation for those XML files is almost as common.

Documenting an XML structure by hand is not a fun job. I looked around for a simple tool but never found one that gave me the quick and dirty overview I wanted. A year or so ago I finally wrote a small PHP class to analyze large XML files.

It is simple, but quickly creates a good overview of the XML structure as well as the kind of data and attributes of the elements in the XML file. Even for a very large XML file it does a good job of showing just the relevant structure of the XML.

It is best described using an example.

Example XML

<?xml version="1.0" encoding="UTF-8" ?>
<company xmlns:dvoidcomp="http://dotvoid.se/schema/company" name="MegaCorp" employeeOfTheMonth="E0003">
<department name="Advanced Technologies" location="NY" number="123">
<employee name="Al Smith" SN="E0004" manager="true">
<title>Tech manager</title>
</employee>
<employee name="John Jones" SN="E0001">
<title>Worker</title>
</employee>
<employee name="Jane Doe" SN="E0003">
<title>Worker</title>
</employee>
</department>
<department name="Research &amp; Development" location="JS" number="789">
<employee name="Anders Andersson" SN="E0005">
<title>Lead technologist</title>
</employee>
<employee name="Joe Schmoe" SN="E0008" manager="true"/>
<employee name="Sven Svensson" SN="E0006">
<title>Worker</title>
</employee>
</department>
</company>

Example output in HTML

The output shows the overall structure of all possible elements and all possible attributes of the different elements. Every attribute is shown with an example value so that you can see what kind of data to expect. In this example I have also asked for all possible values of the title element to be included. This is useful when you know that certain XML elements only take a limited set of values.

XML Structure
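
To give an idea of how an overview like this can be gathered, here is a minimal sketch. It is not the XmlStructure class itself: the function name summarizeXml() and the array it returns are made up for illustration, and it assumes PHP 7.4 or later with the XMLReader extension enabled. It streams the file, records every distinct element path, keeps the first value seen for each attribute as an example, and collects all distinct text values for the element names you ask for.

```php
<?php
// Hypothetical helper for illustration only; not the XmlStructure class from this post.
// XMLReader streams the document, so large files do not have to fit in memory.
function summarizeXml(string $file, array $collectValues = []): array
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    $summary = []; // element path => ['attributes' => [...], 'values' => [...]]
    $stack   = []; // names of the currently open elements

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // Read this before moving the cursor onto the attributes.
            $isEmpty = $reader->isEmptyElement;

            $stack[] = $reader->name;
            $path = '/' . implode('/', $stack);
            $summary[$path] ??= ['attributes' => [], 'values' => []];

            // Keep the first value seen for each attribute as an example.
            if ($reader->hasAttributes) {
                while ($reader->moveToNextAttribute()) {
                    $summary[$path]['attributes'][$reader->name] ??= $reader->value;
                }
                $reader->moveToElement();
            }

            // Self-closing elements such as <employee ... /> never emit END_ELEMENT.
            if ($isEmpty) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        } elseif ($reader->nodeType === XMLReader::TEXT
            && in_array(end($stack), $collectValues, true)) {
            // Collect distinct text values for the requested element names.
            $path = '/' . implode('/', $stack);
            $summary[$path]['values'][$reader->value] = true;
        }
    }

    $reader->close();
    return $summary;
}
```

Calling summarizeXml(__DIR__ . '/XmlStructure.xml', array('title')) on the example file above should return one entry per distinct element path, such as /company/department/employee, each with its example attribute values. Turning that array into HTML is left out here.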

Usage of the PHP class

The class is simple to use. Really the only thing you can do is call parse() with the path to an XML file and an optional array of XML element names for which you want all values listed in the HTML output.

<?php
	use Void\File\XmlStructure;
	require __DIR__ . '/XmlStructure.php';

	$xmlStruct = new XmlStructure();
	$xmlStruct->parse(
		__DIR__ . '/XmlStructure.xml',
		array('title')
	);

?>
<html>
	<head>
		<style type="text/css">
		 <?php echo $xmlStruct->css();?>
		</style>
	</head>
	<body>
	<?php echo $xmlStruct->html(); ?>
	</body>
</html>

Download: xmlgrips.tar.gz

Eventually I might, or might not, put it on Bitbucket or GitHub. The PHP class is not well documented, but it is small enough to read. Bear in mind that I wrote it quickly and only use it as a simple means of boiling down a large XML file to the bare essentials. Hopefully you will find it useful, or it will give you ideas on how to improve it.

Posted in PHP
5 comments on “Getting to grips with an existing XML structure”
  1. Kore says:

    Hi,

    May I suggest taking a look at https://github.com/kore/XML-Schema-learner, a simple PHP tool to learn an XML schema from input XML.

    At least if you generate a DTD, this should provide you with a description of the structure of the XML which should be easy to understand. Depending on your experience the XSD might even provide more insight. More details here: http://kore-nordmann.de/blog/0104_generating_xml_schemas_from_xml.html

  2. Tim Strehle says:

    Hi,

    thanks for posting this!

    I have created a PHP command line script to solve the same problem, but my solution is less beautiful :-)

    If you want to take a look:
    https://github.com/digicol/xml_explorer

    Regards,
    Tim

  3. Danne says:

    Cool to see other ideas around the same problems. What is not mentioned in my post is that this class also supports XML namespaces.

  4. Danne says:

    Your solution Kore is a lot more extensive and seems to be a really good tool. I’ll definitely have a closer look at your project later.

  5. Steve Reynolds says:

    A valuable tool. I realized I had a need for such a tool in the shower this morning. I felt sure someone else has already done something similar. I would suggest 3 things to give a better picture of the XML structure shape, as opposed to just the schema: report 1) attributes that appear in the schema optionally, 2) min/max/average occurrence counts for each element within the scope of its parent element, and 3) total occurrence counts. Thank you for sharing.
