Simplexml load string doesn't work. Processing SimpleXML with PHP

SimpleXML is a simple yet powerful way to process XML data. Its essence is that an XML document is converted into a PHP object, which makes it easy to work with. SimpleXML works with UTF-8 internally, so strings read from or assigned to it should be UTF-8 encoded.

Most often, the conversion to a PHP object is done with the simplexml_load_file() function; examples of working with it are given below. You can also use simplexml_load_string(), which creates a PHP object from an XML string.

Let's first create an XML file.

In the following example, only the price of the second car will be displayed.
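A minimal sketch of that lookup, assuming a small cars document. The element names and values here are made up for illustration and are not taken from the original file.

```php
<?php
// Hypothetical document: element names and values are assumptions.
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<cars>
  <car id="1"><model>Alpha</model><year>2008</year><price>12000</price></car>
  <car id="2"><model>Beta</model><year>2011</year><price>18500</price></car>
</cars>
XML;

$cars = simplexml_load_string($source);

// Children with the same name are indexed from zero, so [1] is the second car.
$secondPrice = (string) $cars->car[1]->price;

echo $secondPrice, "\n";
```

The cast to string matters when the value is compared or stored, because the property itself is a SimpleXMLElement object.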

To display the entire xml code at once or a single node, the asXML() method is used.

simpleXML also supports addressing using the XPath language. The following example will select all "year" nodes and return an array of them.
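A short sketch of such a query, again on a hypothetical cars document (the element names are assumptions):

```php
<?php
// Hypothetical document; element names and values are assumptions.
$cars = simplexml_load_string(
    '<cars><car><year>2008</year></car><car><year>2011</year></car></cars>'
);

// xpath() returns an array of SimpleXMLElement objects.
$nodes = $cars->xpath('//year');

$years = array();
foreach ($nodes as $node) {
    $years[] = (string) $node; // cast each node to get its text
}

echo implode(', ', $years), "\n";
```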

Replacing element values is done by simply assigning a value.

When replacing nodes that have child nodes, care must be taken as all child nodes will be removed.
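Both cases can be sketched as follows. The document is hypothetical; the point is only the difference between assigning to a leaf element and assigning to an element that has children.

```php
<?php
// Hypothetical document; element names and values are assumptions.
$cars = simplexml_load_string(
    '<cars><car><model>Alpha</model><price>12000</price></car></cars>'
);

// Assigning to a leaf element only changes its text.
$cars->car->price = '9900';
$priceAfter = (string) $cars->car->price;

// Assigning a string to an element that has children replaces its content,
// so <model> and <price> are gone afterwards.
$cars->car = 'sold';
$childrenLeft = count($cars->car->children());

echo $priceAfter, ' / ', $childrenLeft, ' / ', $cars->car, "\n";
```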

In the two previous examples, the XML data was changed only in memory and was not written to disk. To overwrite the data in a file, use the file_put_contents() function.
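A minimal sketch of saving a modified document, assuming a temporary file as the target so that the example is self-contained:

```php
<?php
$cars = simplexml_load_string('<cars><car><price>12000</price></car></cars>');
$cars->car->price = '9900';

// Hypothetical target path; a temporary file keeps the sketch self-contained.
$path = tempnam(sys_get_temp_dir(), 'cars');

// asXML() without arguments returns the document as a string.
$bytes = file_put_contents($path, $cars->asXML());

// Reload from disk to confirm that the change was persisted.
$reloaded = simplexml_load_file($path);
$savedPrice = (string) $reloaded->car->price;
unlink($path);

echo $savedPrice, "\n";
```

Passing a filename to asXML() writes the file directly and is an alternative to file_put_contents().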

It is also possible to integrate SimpleXML and DOM using the simplexml_import_dom() function.

This example will show how to get the value of element attributes.
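A small sketch of both ways to read attributes, on a hypothetical element (names and values are assumptions): array-style access for one attribute, and attributes() to walk all of them.

```php
<?php
// Hypothetical document; attribute names and values are assumptions.
$cars = simplexml_load_string(
    '<cars><car id="7" color="red"><model>Alpha</model></car></cars>'
);
$car = $cars->car[0];

// A single attribute is read with array syntax.
$id = (string) $car['id'];

// attributes() lets you iterate over every attribute of the element.
$all = array();
foreach ($car->attributes() as $name => $value) {
    $all[$name] = (string) $value;
}

echo $id, "\n";
print_r($all);
```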

I'm dealing with a third-party PHP library that I can't edit, and it has been working fine for almost a year. It uses simplexml_load_string to parse the response from a remote server. Lately it has been choking on large responses. This is a data feed for real estate listings, and the format looks something like this:

sysid 1 2 3 4 5 6 252370080 Residential 0.160 No ADDR0 06051 252370081 Residential 0.440 Yes ADDR0 06043 252370082 Residential 1.010 No ADDR0 06023 More tab delimited text

I downloaded an example response file (about 22MB); here is where I ended up with my debugging and common sense. Both servers are running PHP version 5.3.8, but note the different results. I'm pretty sure both files are the same (I assume the differing file sizes, strlen values and last 50 characters can be explained by Windows line endings, which have an additional carriage return character). Test script:

error_reporting(-1);
ini_set("display_errors", 1);
$file = "error-example.xml";
$xml = file_get_contents($file);
echo "filesize: ";
var_dump(filesize($file));
echo "strlen: ";
var_dump(strlen($xml));
echo "simplexml object? ";
var_dump(is_object(simplexml_load_string($xml)));
echo "Last 50 characters: ";
var_dump(substr($xml, -50));

Output locally on Windows:

filesize: int(21893604)
strlen: int(21893604)
simplexml object? bool(true)
Last 50 characters: string(50) "RD DR CT Watertown 203-555-5555 "

Output on the remote UNIX server:

filesize: int(21884093)
strlen: int(21884093)
simplexml object?
Warning: simplexml_load_string(): Entity: line 9511: parser error: internal error in /path/to/test.php on line 19
Warning: simplexml_load_string(): AULTED CEILING IN FOYER, BRICK FP IN FR, NEW FLOORING IN LR DR FR FOYER KITCHEN in /path/to/test.php on line 19
Warning: simplexml_load_string(): ^ in /path/to/test.php on line 19
Warning: simplexml_load_string(): Entity: line 9511: parser error: Extra content at the end of the document in /path/to/test.php on line 19
Warning: simplexml_load_string(): AULTED CEILING IN FOYER, BRICK FP IN FR, NEW FLOORING IN LR DR FR FOYER KITCHEN in /path/to/test.php on line 19
Warning: simplexml_load_string(): ^ in /path/to/test.php on line 19
bool(false)
Last 50 characters: string(50) "ORD DR CT Watertown 203-555-5555 "

Some responses to comments and additional information:

    The XML itself appears to be valid as far as I can tell (and it does work on my system).

    magic_quotes_runtime is definitely off.

    The working server has libxml version 2.7.7 and the other one is 2.7.6. Can it really make a difference? I couldn't find the libxml changelog, but that seems unlikely.

    This only happens if the response/file is over a certain size, and the error always occurs on the next line.

    I don't run into memory issues, the test script runs instantly.

There are differences in PHP configurations that I can post if I knew which ones were relevant. Any idea what could be the problem, or know of anything else that I can check?
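One thing that could be checked is the full list of libxml errors rather than the PHP warnings alone. The sketch below is only a diagnostic aid: loadXmlWithErrors is a hypothetical helper, not part of the library in question, and trying the LIBXML_PARSEHUGE option (which relaxes libxml's built-in size limits) is an assumption to test, not a confirmed cause of the failure above.

```php
<?php
// Hypothetical helper, not part of the library discussed above.
function loadXmlWithErrors($xml, $options = 0)
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml, 'SimpleXMLElement', $options);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting

    return array($doc, $messages);
}

// A deliberately malformed sample to show what the helper reports.
list($doc, $messages) = loadXmlWithErrors('<feed><row>unterminated</feed>');
var_dump($doc === false);
print_r($messages);

// The same call with LIBXML_PARSEHUGE, here on a well-formed sample.
list($hugeDoc, $hugeMessages) = loadXmlWithErrors('<feed><row>ok</row></feed>', LIBXML_PARSEHUGE);
```

Running the real 22MB file through the helper on both servers, with and without the option, would show whether the two libxml versions report the same errors.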

SimpleXMLElement->asXML

SimpleXMLElement->asXML -- Returns a well-formed XML document

Description

mixed SimpleXMLElement->asXML([string filename])

The asXML method generates data in XML version 1.0.

Parameter List
filename
If specified, the method will write data to the specified file.
Return Values
If a filename is specified, the method will write the XML data to the specified file. Otherwise, the method will return the XML data as a string.
Remarks
If the source document specified the encoding of the XML document in the headers using the encoding parameter, then the asXML method will return the XML document in the specified encoding. Changing the encoding of an XML document using the SIMPLEXML extension is not possible.
Examples
Example 1: Output XML

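<?php
$string = <<<XML
<a>
 <b>
  <c>text</c>
  <c>stuff</c>
 </b>
 <d>
  <c>code</c>
 </d>
</a>
XML;

$xml = simplexml_load_string($string);

echo $xml->asXML(); // prints the whole document: <a><b><c>text</c><c>stuff</c> ...
?>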

The asXML method can also work with Xpath:

Example 2: Using the asXML() method with Xpath

<?php
// Continuation of the example above.
/* Search for <a><b><c> */
$result = $xml->xpath("/a/b/c");
foreach ($result as $node) {
    echo $node->asXML(); // <c>text</c> and <c>stuff</c>
}
?>

SimpleXMLElement->attributes

SimpleXMLElement->attributes -- Returns the element's attributes.

Description

SimpleXMLElement simplexml_element->attributes()

This function returns the names and values of the attributes of the selected XML element. Note: SimpleXML has made a rule of adding iterative properties to most methods. They cannot be inspected using var_dump() or anything else that examines objects.

Example 1: Interpreting an XML string

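<?php
// Markup reconstructed from the output below; the root element name is an assumption.
$string = <<<XML
<a>
 <users name="Evgen" age="27">[email protected]</users>
</a>
XML;
$xml = simplexml_load_string($string);
foreach ($xml->users->attributes() as $a => $b) {
    echo $a, "=\"", $b, "\"\n";
}
?>

This example will output:

name="Evgen"
age="27"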

SimpleXMLElement->children

SimpleXMLElement->children -- Returns the child elements for the given element

Description

SimpleXMLElement simplexml_element->children()

This method finds the child elements for the given element.

Note: SimpleXML has made a rule of adding iterative properties to most methods. They cannot be inspected using var_dump() or anything else that examines objects.

Example 1: Using the children() method

$xml = simplexml_load_string(
"










");
echo "

";
?>

This example will output:

php-help.ru
links.php-help.ru
forum.php-help.ru
server.php-spravka.ru
yandex.ru
money.yandex.ru
map.yandex.ru
market.yandex.ru

SimpleXMLElement->xpath

SimpleXMLElement->xpath -- Performs an Xpath query on XML data

Description

Array SimpleXMLElement->xpath (string path)

The xpath method searches for the child elements of the SimpleXML element whose path is specified in the path parameter. The method returns an array of SimpleXMLElement objects.

Example 1. Xpath

<?php
// Markup reconstructed; the placement of "plain" is an assumption.
$string = <<<XML
<a>
 <b>
  <c>text</c>
  <c>stuff</c>
 </b>
 <d>
  <c>code</c>
  <e>plain</e>
 </d>
</a>
XML;
$xml = simplexml_load_string($string);
/* Search for <a><b><c> */
$result = $xml->xpath("/a/b/c");
foreach ($result as $node) {
    echo "/a/b/c: " . $node . "\n";
}
/* Relative paths also work... */
$result = $xml->xpath("b/c");
foreach ($result as $node) {
    echo "b/c: " . $node . "\n";
}
?>

This script will output:

/a/b/c: text
/a/b/c: stuff
b/c: text
b/c: stuff

The two results are the same in this case.

simplexml_import_dom (PHP 5)

simplexml_import_dom -- Returns a SimpleXMLElement object created from a DOM object.

Description

SimpleXMLElement simplexml_import_dom(DOMNode node[, string class_name])

This function takes a DOM object and creates a SimpleXML object based on it.

This new object can be used like a normal SimpleXML object.

If errors occurred during object creation, the method will return false.

Example 1 Import DOM

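<?php
$dom = new DOMDocument;
// Markup reconstructed from the access path below; the root element name is an assumption.
$dom->loadXML("<sites><site><url>php-spravka.ru</url></site></sites>");
if (!$dom) {
    echo "Error parsing document!";
    exit;
}
$s = simplexml_import_dom($dom);
echo $s->site->url; // php-spravka.ru
?>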

simplexml_load_file (PHP 5)

simplexml_load_file -- Interprets an XML file into an object

Description

Object simplexml_load_file(string filename[, string class_name[, int options]])

This function interprets the file filename with well-formed XML data into a SimpleXMLElement object. If there are errors in the XML data, the function will return FALSE.

You can use the optional class_name parameter in the simplexml_load_file() function to have the function return an object of the specified class. In this case, the class must be an extension of the SimpleXMLElement class.

Since PHP 5.1.0 and Libxml 2.6.0 you can use the optional options parameter, the specification of which is described in additional Libxml parameters.

Note: Libxml 2 unescapes the URI. That is, if you want to pass b&c as the value of the URI parameter a, you would otherwise have to call:

simplexml_load_file(rawurlencode("http://example.com/?a=" . urlencode("b&c")));

Since PHP 5.1.0 this is done automatically, so you do not have to do it yourself.

Example 1: Interpreting an XML Document

<?php
// The file test.xml contains an XML document with a root element
// and a nested title element.
if (file_exists("test.xml")) {
    $xml = simplexml_load_file("test.xml");

    var_dump($xml);
} else {
    exit("Error opening test.xml.");
}
?>
This example will output the following:
SimpleXMLElement Object
(
  [title] => Test header
  ...
)

In this example, you can refer to the title element as follows: $xml->title.

simplexml_load_string(PHP 5)

simplexml_load_string -- Interprets an XML string into an object

Description

Object simplexml_load_string(string data[, string class_name[, int options]])

This function takes a well-formed XML document in the data string and returns an object of the SimpleXMLElement class whose properties contain the contents of the XML document. If the XML document has errors, the function will return FALSE.

You can use the optional class_name parameter to have the simplexml_load_string() function return an object of the given class. This class must extend the SimpleXMLElement class.
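A minimal sketch of the class_name parameter. The TitledDocument class and its titleText() method are hypothetical names chosen for illustration:

```php
<?php
// Hypothetical subclass; the class and method names are assumptions.
class TitledDocument extends SimpleXMLElement
{
    // Returns the trimmed text of the <title> child, or '' if there is none.
    public function titleText()
    {
        return trim((string) $this->title);
    }
}

$doc = simplexml_load_string(
    '<document><title> Forty What? </title></document>',
    'TitledDocument'
);

echo get_class($doc), ': ', $doc->titleText(), "\n";
```

The same second argument works for simplexml_load_file().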

Starting with PHP 5.1.0 and Libxml 2.6.0, you can also use the optional options parameter whose content is defined in additional Libxml parameters.

Example 1: Transforming an XML String

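<?php
$string = <<<XML
<?xml version='1.0'?>
<document>
 <title>Forty What?</title>
 <from>Joe</from>
 <to>Jane</to>
 <body>
  I know that's the answer -- but what's the question?
 </body>
</document>
XML;
$xml = simplexml_load_string($string);
var_dump($xml);
?>
This example will output:
SimpleXMLElement Object
(
  [title] => Forty What?
  [from] => Joe
  [to] => Jane
  [body] =>
   I know that's the answer -- but what's the question?
)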

In this example, you can also use the $xml->body constructs, etc.

Markup library for parsing XML with PHP

PHP version 5 introduces SimpleXML, a new application programming interface (API) for reading and writing XML. In SimpleXML, expressions like

$doc->rss->channel->item->title

select elements from the document. As long as you have a good idea of the structure of your document, writing these expressions is easy. However, if you don't know exactly where the elements of interest appear (as is the case with Docbook, HTML, and other similar text documents), SimpleXML can use XPath expressions to find those elements.

Getting started with SimpleXML

Let's say you want to create a PHP page that converts an RSS feed into HTML. RSS is a straightforward XML format for publishing syndicated content. The root element of the document is rss, which contains a single channel element. The channel element contains metadata about the feed, including its title, language, and URL. It also contains the individual stories, each wrapped in an item element. Each item element contains a link element with a URL, and either a title or a description (usually both) containing readable text. Namespaces are not used. Of course, there is much more to be said about RSS, but for the purposes of this article this is enough. Listing 1 shows a typical example with a couple of news items.

Listing 1. RSS feed
<rss>
<channel>
  <title>Mokka mit Schlag</title>
  <link>http://www.elharo.com/blog</link>
  <language>en</language>
  <item>
    <title>Penn Station: Gone but not Forgotten</title>
    <description>The old Penn Station in New York was torn down before I was born. Looking at these pictures, that feels like a mistake. The current site is functional, but no more; really just some office towers and underground corridors of no particular interest or beauty. The new Madison Square...</description>
    <link>http://www.elharo.com/blog/new-york/2006/07/31/penn-station</link>
  </item>
  <item>
    <title>Personal for Elliotte Harold</title>
    <description>Some people use very obnoxious spam filters that require you to type some random string in your subject such as E37T to get through. Needless to say neither I nor most other people bother to communicate with these paranoids. They are grossly overreacting to the spam problem. Personally I won't...</description>
    <link>http://www.elharo.com/blog/tech/2006/07/28/personal-for-elliotte-harold/</link>
  </item>
</channel>
</rss>

Let's make a PHP page that formats each RSS feed as HTML. Listing 2 shows the structure of the future page.

Listing 2. Static structure for PHP code
<?php // The title will be read from the RSS ?>

Parsing an XML document

The first step is to parse the XML document and store it in a variable. This requires writing just one line of code that passes the URL to the simplexml_load_file() function:

$rss = simplexml_load_file("http://partners.userland.com/nytRss/nytHomepage.xml");
Warning

The scheme used here is far from optimal. I really shouldn't download and parse the RSS feed every time the page is visited. This slows down the page for readers and amounts to a potential denial-of-service attack on the RSS feeds I download, since most of them set their maximum update rate to about once an hour. The real solution to this problem is to cache either the generated HTML page, or the RSS feeds, or both. However, that issue is separate from the use of the SimpleXML library, so I'm glossing over it here.

For this example, I took the Userland New York Times feed at http://partners.userland.com/nytRss/nytHomepage.xml. Of course, you can use the URL of any other RSS feed instead.

Note that despite the name simplexml_load_file(), this function actually parses an XML document at a remote HTTP URL. But that's not the only surprise in this function. The return value of the function, which is stored here in the $rss variable, does not point to the entire document, as you might expect from experience with other APIs such as the Document Object Model (DOM). Rather, it points to the root element of the document. Content in the prolog and epilog of the document is not accessible from SimpleXML.

Finding the channel name

The title of the entire channel (as opposed to the titles of the individual items in that channel) is in the title child element of the channel element, which is itself a child of the rss root element. You can load this title as if the XML document were just the serialized form of an rss object with a channel field, which in turn has a title field. Using regular PHP object reference syntax, this statement finds the title:

$title = $rss->channel->title;

Once you have found the title, you must add it to the HTML output. That's easy to do: just echo the $title variable:

<?php echo $title; ?>

This line outputs the element's string value, not the entire element. That is, the text is output, but the tags are not.

You can even omit the $title intermediate variable entirely:

<?php echo $rss->channel->title; ?>

Because this page reuses this value in many places, I find it more convenient to store it as a descriptively titled variable.

Iterating through elements

The items of the feed are found with this expression:

$rss->channel->item

However, channels usually contain more than one item. Or there may be none at all. Accordingly, this expression returns a list of elements, which you can iterate over with a foreach loop:

foreach ($rss->channel->item as $item) {
  echo "<h2>" . $item->title . "</h2>";
  echo "<p>" . $item->description . "</p>";
}
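Listing 3. A simple but complete PHP RSS reader
<?php
// Load and parse the XML document
$rss = simplexml_load_file("http://partners.userland.com/nytRss/nytHomepage.xml");
$title = $rss->channel->title;
?>
<html>
<head>
  <title><?php echo $title; ?></title>
</head>
<body>
<h1><?php echo $title; ?></h1>
<?php
// Loop over the items and print each title and description
foreach ($rss->channel->item as $item) {
  echo "<h2><a href=\"" . $item->link . "\">" . $item->title . "</a></h2>";
  echo "<p>" . $item->description . "</p>";
}
?>
</body>
</html>

That's all it takes to write a simple RSS reader in PHP: a few lines of HTML and a few lines of PHP. Not counting white space, it's a total of 20 lines. Of course, this is not the most feature-rich, optimized, or robust implementation. Let's see what we can do to fix that.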

Error processing

Not all RSS feeds are as well-formed as they should be. The XML specification requires processors to stop processing a document as soon as a well-formedness error is detected, and SimpleXML is a conforming XML processor. However, it doesn't give you much help when an error is found. Typically, it writes a warning to the PHP error log (but without a detailed error message) and the simplexml_load_file() function returns FALSE. If you're not sure that the file you're parsing is well-formed, check for this error before using the file's data, as shown in Listing 4.

Listing 4. Beware of malformed input
<?php
$rss = simplexml_load_file("http://partners.userland.com/nytRss/nytHomepage.xml");
if ($rss) {
  foreach ($rss->xpath("//title") as $title) {
    echo "<h2>" . $title . "</h2>";
  }
} else {
  echo "Oops! The input is malformed!";
}
?>

Another common problem occurs when a document is well-formed but doesn't contain the elements you expect where you expect them. What happens, for example, to an expression such as $doc->rss->channel->item->title when an item has no title (as happens in at least one of the hundred most popular RSS feeds)? The simplest approach is to always treat the return value as a list and wrap it in a loop. Then you are protected whether there are more or fewer elements than you expected. However, if you know that you want only the first element in the document, even if there is more than one, you can request it with a zero-based index. For example, to request the title of the first item, you could write:

$doc->rss->channel->item[0]->title

If the first item is absent, or has no title, this is treated the same way as any other out-of-bounds index in a PHP array. That is, the result is null, which turns into an empty string when you insert it into the HTML output.
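A small sketch of guarding such a lookup, assuming a hypothetical feed whose only item has no title. Here $rss is the root element returned by the loader, so the path starts at channel.

```php
<?php
// Hypothetical feed in which the only item has no title.
$rss = simplexml_load_string(
    '<rss><channel><item><description>No title here</description></item></channel></rss>'
);

$items = $rss->channel->item;

// isset() distinguishes a missing element from an empty one.
$hasFirstItem = isset($items[0]);
$hasTitle = isset($items[0]->title);
$hasSecondItem = isset($items[1]);

// Casting a missing child to string yields an empty string.
$title = (string) $items[0]->title;

var_dump($hasFirstItem, $hasTitle, $hasSecondItem, $title);
```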

Recognizing and rejecting unexpected formats that you are not prepared to handle is usually the domain of a validating XML parser. However, SimpleXML cannot validate against a document type definition (DTD) or a schema. It checks only for well-formedness.

Working with namespaces

Many sites today are moving from RSS to Atom. Listing 5 shows an example Atom document. This document is similar in many ways to the RSS example. However, there is more metadata here, and the root element is feed instead of rss. The feed element contains entries instead of items. The content element replaces the description element. More importantly, an Atom document uses a namespace while RSS does not. As a result, an Atom document can embed real, unescaped Extensible HTML (XHTML) content.

Listing 5. Document in Atom
2006-08-04T16:00:04-04:00 http://www.cafeconleche.org/ Cafe con Leche XML News and Resources Copyright 2006 Elliotte Rusty Harold Steve Palmer has posted a beta of Vienna 2.1, an open source RSS/Atom client for Mac OS X.

Steve Palmer has posted a beta of Vienna 2.1, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) 2.1 focuses on improving the user interface with a unified layout that lets you scroll through several articles, article filtering (e.g. read all articles since the last refresh), manual folder reordering, a new get info window, and an improved condensed layout.

http://www.cafeconleche.org/#August_1_2006_25279 2006-08-01T07:01:19Z
Matt Mullenweg has released Wordpress 2.0.4, a blog engine based on PHP and MySQL.

Matt Mullenweg has released Wordpress 2.0.4, a blog engine based on PHP and MySQL. 2.0.4 plugs various security holes, mostly involving plugins.

http://www.cafeconleche.org/#August_1_2006_21750 2006-08-01T06:02:30Z

Although the element names have changed, the basic approach to working with Atom documents in SimpleXML is the same as with RSS. The only difference is that you must specify the namespace Uniform Resource Identifier (URI) as well as the local name when you request a named element. This is a two-step process: first, request the child elements in the given namespace by passing the namespace URI to the children() function. Then request the elements with the right local name in that namespace. Imagine that you first loaded the Atom feed into the $feed variable, like this:

$feed = simplexml_load_file("http://www.cafeconleche.org/today.atom");

These two lines now find the title element:

$children = $feed->children("http://www.w3.org/2005/Atom");
$title = $children->title;

You can condense this code into a single statement if you like, although the line gets a bit long. All other elements in namespaces should be handled in the same way. Listing 6 shows a complete PHP page that displays the titles from a namespaced Atom feed.

Listing 6. A simple PHP reader for Atom titles
<?php
$feed = simplexml_load_file("http://www.cafeconleche.org/today.atom");
$children = $feed->children("http://www.w3.org/2005/Atom");
$title = $children->title;
?>
<h1><?php echo $title; ?></h1>
<?php
$entries = $children->entry;
foreach ($entries as $entry) {
  $details = $entry->children("http://www.w3.org/2005/Atom");
  echo "<h2>" . $details->title . "</h2>";
}
?>

Mixed content

Why did I only show the titles in this example? Because in Atom, the content of an entry can contain the full text of the story, and not only the text itself but all of its markup. It is narrative content: words in a row meant to be read by humans. As with most data of this kind, it is mixed content. The XML is no longer simple, and so the SimpleXML approach starts to falter. It cannot handle mixed content properly, and this omission rules out its use in many cases.

You can do one thing, but it is only a partial solution. It works only because the content element contains real XHTML. You can copy this XHTML as unparsed source directly to the output using the asXML() function, for example, as follows:

echo "<p>" . $details->content->asXML() . "</p>";

The result will be something like Listing 7.

Listing 7. XML output

Nikolai Grigoriev has released SVGMath 0.3, a presentation MathML formatter that produces SVG written in pure Python and published under an MIT license. According to Grigoriev, "The new version can work with multiple-namespace documents (e.g. replace all MathML subtrees with SVG in an XSL-FO or XHTML document); configuration is made more flexible, and several bugs are fixed. There is also a stylesheet to adjust the vertical position of the resulting SVG image in XSL-FO."

It's not pure XHTML. The content element is carried over from the Atom document, and you would really rather not have it. Even worse, it ends up in the wrong namespace, so it can't be recognized for what it is. Fortunately, this extra element doesn't hurt much in practice, because Web browsers simply ignore any tags they don't recognize. The finished document is invalid, but that doesn't really matter. If it really bothers you, strip it out with string operations, like this:

$description = $details->content->asXML();
$tags = array("<content type=\"xhtml\">", "</content>");
$notags = array("", "");
$description = str_replace($tags, $notags, $description);

To make the code a little more robust, use a regular expression rather than assuming that the start tag is exactly as shown above. In particular, you can account for a variety of possible attributes:

// the end tag has a fixed form, so it's easy to replace
$description = str_replace("</content>", "", $description);
// remove the start tag, including any attributes and white space
// (preg_replace is used because ereg_replace was removed in PHP 7)
$description = preg_replace("/<content[^>]*>/", "", $description);

Even with this improvement, your code can still be tripped up by comments, processing instructions, and CDATA sections. Any way you slice it, I'm afraid this just isn't simple any more. Mixed content simply goes beyond the boundaries SimpleXML was designed to work within.

XPath

Expressions like $rss->channel->item->title are great only if you know exactly what elements are in the document and exactly where they are. However, you don't always know this. For example, in XHTML, heading elements (h1, h2, h3, etc.) can be children of body, div, table, and several other elements. Moreover, div, table, blockquote and other elements can be nested multiple times. For many less specific use cases, it's easier to use XPath expressions such as //h1. SimpleXML exposes this functionality via the xpath() function.

Listing 9. Using XPath with namespaces
$atom = simplexml_load_file("http://www.cafeconleche.org/today.atom");
$atom->registerXPathNamespace("atm", "http://www.w3.org/2005/Atom");
$titles = $atom->xpath("//atm:title");
foreach ($titles as $title) {
  echo "<h2>" . $title . "</h2>";
}

One final warning: XPath in PHP is quite slow. Page loading can take anywhere from a moment to a few seconds when I switch to this XPath expression, even on an unloaded local server. If you are using these tricks, you must use some sort of caching to work intelligently. Dynamically generating every page just won't work.
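One simple form such caching could take is a file cache with a maximum age. The sketch below is an assumption about how you might do it, not the author's method: cachedXml is a hypothetical helper, and the download is replaced by a stand-in closure so that the example runs without network access.

```php
<?php
// Hypothetical helper: returns cached XML if it is fresh enough, otherwise
// calls $fetch (any callable returning an XML string or false) and stores the result.
function cachedXml($cacheFile, $maxAgeSeconds, $fetch)
{
    clearstatcache();
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }
    $xml = call_user_func($fetch);
    if ($xml !== false) {
        file_put_contents($cacheFile, $xml, LOCK_EX);
    }
    return $xml;
}

$cacheFile = sys_get_temp_dir() . '/feed-cache-' . uniqid() . '.xml';
$fetchCount = 0;

// Stand-in for a real download such as file_get_contents($url).
$fetch = function () use (&$fetchCount) {
    $fetchCount++;
    return '<feed><title>Cached title</title></feed>';
};

$first = cachedXml($cacheFile, 3600, $fetch);  // cache miss: calls $fetch
$second = cachedXml($cacheFile, 3600, $fetch); // cache hit: reads the file
$title = (string) simplexml_load_string($second)->title;
unlink($cacheFile);

echo $fetchCount, ' fetch, title: ', $title, "\n";
```

This caches the raw feed only; caching the generated HTML would avoid the XPath work as well.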

Conclusion

SimpleXML is a useful addition to the PHP developer's toolkit, as long as you don't need to work with mixed content. It covers a large number of use cases. It works especially well with simple, record-like data. As long as the document isn't too deep, too complex, and doesn't have mixed content, SimpleXML is much simpler than its DOM alternative. It also helps if you know the structure of the document ahead of time, although using XPath can ease this requirement a lot. The lack of validation and of support for mixed content is a nuisance, but not always a hindrance. Many simple formats do not have mixed content, and many use cases involve only very predictable data formats. If that describes your work, feel free to give SimpleXML a try. With a little attention to error handling and a bit of caching effort to keep performance issues to a minimum, SimpleXML can be a robust, error-tolerant XML processing tool within PHP.

Parsing XML essentially means going through an XML document and returning the corresponding data. Although more and more web services return data in JSON format, many still use XML, so it's important to master XML parsing if you want to use the full range of available APIs.

With the SimpleXML extension, which was added back in PHP 5.0, working with XML in PHP is very easy. In this article, I will show you how to do it.

Usage Basics

Let's start with the following example languages.xml:


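<?xml version="1.0" encoding="utf-8"?>
<languages>
  <lang name="C">
    <appeared>1972</appeared>
    <creator>Dennis Ritchie</creator>
  </lang>
  <lang name="PHP">
    <appeared>1995</appeared>
    <creator>Rasmus Lerdorf</creator>
  </lang>
  <lang name="Java">
    <appeared>1995</appeared>
    <creator>James Gosling</creator>
  </lang>
</languages>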

This XML document contains a list of programming languages with some information about each language: the year it appeared and the name of its creator.

The first step is to load the XML using either simplexml_load_file() or simplexml_load_string(). As the names of the functions imply, the first loads XML from a file, and the second loads XML from a string.

Both functions read the entire DOM tree into memory and return a SimpleXMLElement object. In this example, the object is stored in the $languages variable. You can use var_dump() or print_r() to get detailed information about the returned object, if you like.
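The loading step might look like the following sketch. It uses an inline copy of the languages document so that it runs without a file on disk; with the file in place, simplexml_load_file("languages.xml") would be called instead.

```php
<?php
// Inline copy of the languages document so the sketch runs without a file.
$xml = <<<XML
<languages>
  <lang name="C"><appeared>1972</appeared><creator>Dennis Ritchie</creator></lang>
  <lang name="PHP"><appeared>1995</appeared><creator>Rasmus Lerdorf</creator></lang>
  <lang name="Java"><appeared>1995</appeared><creator>James Gosling</creator></lang>
</languages>
XML;

$languages = simplexml_load_string($xml);
if ($languages === false) {
    exit("Could not parse the XML.\n");
}

// Number of <lang> children of the root element.
$count = count($languages->lang);
echo $count, "\n";
```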

SimpleXMLElement Object
(
    [lang] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => C
                        )
                    [appeared] => 1972
                    [creator] => Dennis Ritchie
                )
            [1] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => PHP
                        )
                    [appeared] => 1995
                    [creator] => Rasmus Lerdorf
                )
            [2] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => Java
                        )
                    [appeared] => 1995
                    [creator] => James Gosling
                )
        )
)

This XML contains the root element languages, which contains three lang elements. Each array element corresponds to a lang element in the XML document.

You can access the properties of an object using the -> operator. For example, $languages->lang will return a SimpleXMLElement object that matches the first lang element. This object contains two properties: appeared and creator.

$languages->lang[0]->appeared;
$languages->lang[0]->creator;

Displaying the list of languages and their properties is very easy with a standard loop such as foreach.

foreach ($languages->lang as $lang) {
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $lang->creator
    );
}

Notice how I accessed the lang element's attribute name to get the name of the language. This way you can access any attribute of an element represented as a SimpleXMLElement object.

Working with namespaces

While working with the XML of various web services, you will often encounter element namespaces. Let's change our languages.xml to show an example of using a namespace:



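<?xml version="1.0" encoding="utf-8"?>
<languages
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <lang name="C">
    <appeared>1972</appeared>
    <dc:creator>Dennis Ritchie</dc:creator>
  </lang>
  <lang name="PHP">
    <appeared>1995</appeared>
    <dc:creator>Rasmus Lerdorf</dc:creator>
  </lang>
  <lang name="Java">
    <appeared>1995</appeared>
    <dc:creator>James Gosling</dc:creator>
  </lang>
</languages>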

Now the creator element is placed in the dc namespace, which points to http://purl.org/dc/elements/1.1/. If you try to print the language creators using our previous code, it won't work. To read namespaced elements you need to use one of the following approaches.

The first approach is to use the namespace URI directly in the code when referring to the element's namespace. The following example shows how this is done:

$dc = $languages->lang[1]->children("http://purl.org/dc/elements/1.1/");
echo $dc->creator;

The children() method takes a namespace and returns the child elements that belong to it. It takes two arguments: the first is the XML namespace, and the second is an optional flag that defaults to false. If the second argument is TRUE, the first argument is treated as a prefix. If it is FALSE, the first argument is treated as the namespace URI.
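The effect of the second argument can be seen in a short sketch. It uses a one-language copy of the namespaced document so that it is self-contained:

```php
<?php
$xml = <<<XML
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
  <lang name="PHP">
    <appeared>1995</appeared>
    <dc:creator>Rasmus Lerdorf</dc:creator>
  </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);
$lang = $languages->lang[0];

// Without a namespace the element is not found and the result is empty.
$plain = (string) $lang->creator;

// First argument treated as the namespace URI (second argument defaults to false).
$byUri = (string) $lang->children('http://purl.org/dc/elements/1.1/')->creator;

// First argument treated as a prefix because the second argument is true.
$byPrefix = (string) $lang->children('dc', true)->creator;

var_dump($plain, $byUri, $byPrefix);
```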

The second approach is to read the namespace URIs from the document and use them when referring to the element's namespace. This is actually the better way to access elements because you don't have to hard-code the URI.

$namespaces = $languages->getNamespaces(true);
$dc = $languages->lang[1]->children($namespaces["dc"]);

echo $dc->creator;

The getNamespaces() method returns an array of namespace prefixes and their associated URIs. It takes an optional parameter that defaults to false. If you set it to true, the method returns the namespaces used in the parent and child nodes. Otherwise, it returns only the namespaces used in the parent node.
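The difference between the two modes shows up when the namespace is declared on the root element but only used further down, as in the sketch below (a one-language copy of the document, for brevity):

```php
<?php
$languages = simplexml_load_string(
    '<languages xmlns:dc="http://purl.org/dc/elements/1.1/">'
    . '<lang name="C"><dc:creator>Dennis Ritchie</dc:creator></lang>'
    . '</languages>'
);

// Non-recursive: the root element itself uses no namespace, so nothing is returned.
$rootOnly = $languages->getNamespaces();

// Recursive: namespaces used by descendant elements are included.
$all = $languages->getNamespaces(true);

$dc = $languages->lang[0]->children($all['dc']);

print_r($rootOnly);
print_r($all);
echo $dc->creator, "\n";
```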

Now you can iterate through the list of languages like this:

$languages = simplexml_load_file("languages.xml");
$ns = $languages->getNamespaces(true);

foreach ($languages->lang as $lang) {
    $dc = $lang->children($ns["dc"]);
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $dc->creator
    );
}

Case Study - Parsing a YouTube Video Channel

Let's look at an example that fetches an RSS feed from a YouTube channel and displays links to all of its videos. To do this, request the following address:

http://gdata.youtube.com/feeds/api/users/xxx/uploads

The URL returns a list of the latest videos from the given channel in XML format. We will parse the XML and get the following information for each video:

  • Link to video
  • Thumbnail
  • Title

We'll start by searching and loading the XML:

$channel = "ChannelName";
$url = "http://gdata.youtube.com/feeds/api/users/" . $channel . "/uploads";
$xml = file_get_contents($url);

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

If you look at the XML feed, you can see that it contains several entry elements, each of which stores detailed information about a specific video from the channel. But we only use the thumbnail image, the video address and the title. These three elements are children of the group element, which in turn is a child of entry:

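<entry>
  ...
  <media:group>
    ...
    <media:player url="..."/>
    <media:thumbnail url="..."/>
    <media:title>Title...</media:title>
    ...
  </media:group>
  ...
</entry>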

We'll just go through all the entry elements and extract the necessary information from each of them. Note that player, thumbnail and title are in the media namespace. Thus, we must proceed as in the previous example: we get the namespaces from the document and use the namespace when referring to the elements.

foreach ($feed->entry as $entry) {
    $group = $entry->children($ns["media"]);
    $group = $group->group;
    $thumbnail_attrs = $group->thumbnail[1]->attributes();
    $image = $thumbnail_attrs["url"];
    $player = $group->player->attributes();
    $link = $player["url"];
    $title = $group->title;
    printf(
        "<p><a href=\"%s\"><img src=\"%s\" alt=\"%s\"></a></p>",
        $link, $image, $title
    );
}

Conclusion

Now that you know how to use SimpleXML to parse XML data, you can improve your skills by parsing different XML feeds with different APIs. But it's important to keep in mind that SimpleXML reads the entire DOM into memory, so if you're parsing a large dataset, you may run out of memory. To learn more about SimpleXML read the documentation.
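For documents too large to load at once, one common alternative is to stream the file with XMLReader and hand only one element at a time to SimpleXML. This is a different technique from the one used in the article, shown here as a minimal sketch; the tiny sample file stands in for a large document.

```php
<?php
// Write a small sample file; in practice this would be the large document.
$path = tempnam(sys_get_temp_dir(), 'langs');
file_put_contents(
    $path,
    '<languages><lang name="C"/><lang name="PHP"/><lang name="Java"/></languages>'
);

$reader = new XMLReader();
$reader->open($path);

$names = array();
while ($reader->read()) {
    // Only <lang> start tags are of interest here.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'lang') {
        // Hand just this one element to SimpleXML.
        $lang = simplexml_load_string($reader->readOuterXml());
        $names[] = (string) $lang['name'];
    }
}

$reader->close();
unlink($path);

echo implode(', ', $names), "\n";
```

Only the current element is materialized as a SimpleXMLElement, so memory use depends on the size of one record rather than of the whole file.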

