Unit 13: Extensible Markup Language
<?xml version='1.0' encoding='ISO-8859-1' ?>
<!DOCTYPE rss PUBLIC '-//Netscape Communications//DTD RSS 0.91//EN'
'http://my.netscape.com/publish/formats/rss-0.91.dtd'>
<rss version="0.91">
<channel>
<item>
<title>Man Bites Dog</title>
<link>http://www.example.com/dog.php</link>
<description>Ironic turnaround!</description>
<language>en-us</language>
</item>
<item>
<title>Medical Breakthrough!</title>
<link>http://www.example.com/doc.php</link>
<description>Doctors announced a cure for me.</description>
<language>en-us</language>
</item>
</channel>
</rss>
13.4 Parsing XML
Say you have a collection of books written in XML, and you want to build an index showing each
document's title and author. You need to parse the XML files to recognize the title and author
elements and their contents. You could do this by hand with regular expressions and string
functions such as strtok(), but handling nesting, attributes, and entities correctly is a lot more
complex than it seems. The easiest and quickest solution is to use the XML parser that ships with PHP.
PHP’s XML parser is based on the Expat C library, which lets you parse but not validate XML
documents. This means you can find out which XML tags are present and what they surround,
but you cannot find out if they are the right XML tags in the right structure for this type of
document. In practice, this is not generally a big problem.
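The difference between parsing and validating can be illustrated with a short sketch. The helper name is_well_formed() and both sample strings are hypothetical, chosen only for illustration; the sketch assumes PHP's xml extension is enabled. The first string uses a tag that has no place in an RSS document, yet the parser accepts it because the markup is well-formed. The second string is rejected because its tags do not nest properly.

```php
<?php
// Hypothetical helper: reports whether a string is well-formed XML.
// It checks syntax only; it cannot tell whether the tags are the
// right ones for a given document type.
function is_well_formed($xml) {
    $parser = xml_parser_create();
    // xml_parse() returns 1 on success and 0 on failure;
    // true marks this as the final chunk of data.
    return xml_parse($parser, $xml, true) === 1;
}

// Well-formed, but <banana> is not part of any RSS vocabulary.
$odd = '<rss><banana>text</banana></rss>';

// Not well-formed: <item> is never closed.
$broken = '<rss><item></rss>';

var_dump(is_well_formed($odd));     // bool(true)
var_dump(is_well_formed($broken));  // bool(false)
```

When a parse fails, xml_get_error_code() and xml_error_string() can be used on the parser to find out what went wrong.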
The following sections discuss the handlers you can provide, the functions that set those handlers,
and the events that trigger calls to them. We also provide sample functions for
creating a parser that builds a map of the XML document in memory, tied together in a sample
application that pretty-prints XML.
13.4.1 Element Handlers
When the parser encounters the beginning or end of an element, it calls the start and end element
handlers. You set the handlers through the xml_set_element_handler() function:
xml_set_element_handler(parser, start_element, end_element);
The start_element and end_element parameters are the names of the handler functions.
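As a minimal sketch of how the two handlers fit together, the script below records each start and end event for a small document. The sample XML, the $events array, and the handler bodies are assumptions made for illustration. The start handler is called with the parser, the element name, and an array of the element's attributes; the end handler is called with the parser and the element name.

```php
<?php
// Hypothetical sample document; any well-formed XML string would do.
$xml = '<book><title>Example Title</title><author>A. Writer</author></book>';

$events = [];  // records the order in which the handlers fire

// Start handler: parser, element name, array of attributes.
function start_element($parser, $name, $attributes) {
    global $events;
    $events[] = "start:$name";
}

// End handler: parser, element name.
function end_element($parser, $name) {
    global $events;
    $events[] = "end:$name";
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_parse($parser, $xml, true);  // true marks this as the final chunk

echo implode("\n", $events), "\n";
// start:BOOK
// start:TITLE
// end:TITLE
// start:AUTHOR
// end:AUTHOR
// end:BOOK
```

The element names arrive in uppercase because case folding is enabled by default on a newly created parser. The text between the tags is not reported here; element handlers see only the tags themselves.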