Parsing XML with SimpleXML. What XML parsers are for and how they can be useful. How to parse an uploaded XML file in PHP

Many examples in this reference require an XML string. Instead of repeating this string in every example, we put it into a file which we include in each example. This included file is shown in the following example section. Alternatively, you could create an XML document and read it with simplexml_load_file().

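Example #1 Include file example.php with XML string

<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El ActÓr</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>
XML;
?>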

The simplicity of SimpleXML appears most clearly when one extracts a string or number from a basic XML document.

Example #2 Getting <plot>

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

echo $movies->movie[0]->plot;
?>

The above example will output:

So, this language. It's like, a programming language. Or is it a scripting language? All is revealed in this thrilling horror spoof of a documentary.

Accessing elements within an XML document that contain characters not permitted under PHP's naming convention (e.g. the hyphen) can be accomplished by encapsulating the element name within braces and the apostrophe.

Example #3 Getting <line>

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

echo $movies->movie->{"great-lines"}->line;
?>

The above example will output:

PHP solves all my web problems

Example #4 Accessing non-unique elements in SimpleXML

When multiple instances of an element exist as children of a single parent element, normal iteration techniques apply.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

/* For each <character> node, we echo a separate <name>. */
foreach ($movies->movie->characters->character as $character) {
    echo $character->name, " played by ", $character->actor, PHP_EOL;
}

?>

The above example will output:

Ms. Coder played by Onlivia Actora
Mr. Coder played by El ActÓr

Note: Properties ($movies->movie in the previous example) are not arrays. They are iterable and accessible objects.

Example #5 Using attributes

So far, we have only covered the work of reading element names and their values. SimpleXML can also access element attributes. Access attributes of an element just as you would elements of an array.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

/* Access the <rating> nodes of the first movie.
 * Output the rating scale, too. */
foreach ($movies->movie[0]->rating as $rating) {
    switch ((string) $rating["type"]) { // Get attributes as element indices
    case "thumbs":
        echo $rating, " thumbs up";
        break;
    case "stars":
        echo $rating, " stars";
        break;
    }
}
?>

The above example will output:

7 thumbs up5 stars

Example #6 Comparing Elements and Attributes with Text

To compare an element or attribute with a string or pass it into a function that requires a string, you must cast it to a string using (string). Otherwise, PHP treats the element as an object.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

if ((string) $movies->movie->title == "PHP: Behind the Parser") {
    print "My favorite movie.";
}

echo htmlentities((string) $movies->movie->title);
?>

The above example will output:

My favorite movie.PHP: Behind the Parser

Example #7 Comparing Two Elements

Two SimpleXMLElements are considered different even if they point to the same element since PHP 5.2.0.

<?php
include "example.php";

$movies1 = new SimpleXMLElement($xmlstr);
$movies2 = new SimpleXMLElement($xmlstr);
var_dump($movies1 == $movies2); // false since PHP 5.2.0
?>

The above example will output:

bool(false)

Example #8 Using XPath

SimpleXML includes built-in XPath support. To find all <character> elements:

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

foreach ($movies->xpath("//character") as $character) {
    echo $character->name, " played by ", $character->actor, PHP_EOL;
}
?>

"//" serves as a wildcard. To specify absolute paths, omit one of the slashes.

The above example will output:

Ms. Coder played by Onlivia Actora
Mr. Coder played by El ActÓr

Example #9 Setting values

Data in SimpleXML doesn't have to be constant. The object allows for manipulation of all of its elements.

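<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);

$movies->movie[0]->characters->character[0]->name = "Miss Coder";

echo $movies->asXML();
?>

The above example will output:

<?xml version="1.0" standalone="yes"?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Miss Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act&#xD3;r</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>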

Example #10 Adding elements and attributes

Since PHP 5.1.3, SimpleXML has had the ability to easily add children and attributes.

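<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);

$character = $movies->movie[0]->characters->addChild("character");
$character->addChild("name", "Mr. Parser");
$character->addChild("actor", "John Doe");

$rating = $movies->movie[0]->addChild("rating", "PG");
$rating->addAttribute("type", "mpaa");

echo $movies->asXML();
?>

The above example will output:

<?xml version="1.0" standalone="yes"?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act&#xD3;r</actor>
   </character>
  <character><name>Mr. Parser</name><actor>John Doe</actor></character></characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 <rating type="mpaa">PG</rating></movie>
</movies>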

Example #11 DOM Interoperability

PHP has a mechanism to convert XML nodes between SimpleXML and DOM formats. This example shows how one might change a DOM element to SimpleXML.

<?php
$dom = new DOMDocument;
$dom->loadXML("<books><book><title>blah</title></book></books>");
if (!$dom) {
    echo "Error while parsing the document";
    exit;
}

$books = simplexml_import_dom($dom);

echo $books->book[0]->title;
?>

The above example will output:

blah

4 years ago

There is a common "trick" often proposed to convert a SimpleXML object to an array, by running it through json_encode() and then json_decode(). I'd like to explain why this is a bad idea.

Most simply, because the whole point of SimpleXML is to be easier to use and more powerful than a plain array. For instance, you can write $foo->bar->baz["bing"] and it means the same thing as $foo->bar[0]->baz[0]["bing"], regardless of how many bar or baz elements there are in the XML; and if you write (string)$foo->bar[0]->baz[0] you get all the string content of that node - including CDATA sections - regardless of whether it also has child elements or attributes. You also have access to namespace information, the ability to make simple edits to the XML, and even the ability to "import" into a DOM object, for much more powerful manipulation. All of this is lost by turning the object into an array rather than reading and understanding the examples on this page.

Additionally, because it is not designed for this purpose, the conversion to JSON and back will actually lose information in some situations. For instance, any elements or attributes in a namespace will simply be discarded, and any text content will be discarded if an element also has children or attributes. Sometimes, this won't matter, but if you get in the habit of converting everything to arrays, it's going to sting you eventually.

Of course, you could write a smarter conversion, which didn't have these limitations, but at that point, you are getting no value out of SimpleXML at all, and should just use the lower level XML Parser functions, or the XMLReader class, to create your structure. You still won't have the extra convenience functionality of SimpleXML, but that's your loss.
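
The namespace loss mentioned above can be seen in a minimal sketch. The sample document, the urn:example namespace and the variable names are all made up for illustration; the point is only that the JSON round trip drops the prefixed element, while direct access through children() still reaches it.

```php
<?php
// Hypothetical sample document: one namespaced element, one plain element.
$xml = simplexml_load_string(
    '<root xmlns:ns="urn:example">'
    . '<ns:item>1</ns:item>'
    . '<plain>2</plain>'
    . '</root>'
);

// The JSON round trip keeps only what SimpleXML exposes as plain properties.
$array = json_decode(json_encode($xml), true);

// Direct access still reaches the namespaced element.
$item = (string) $xml->children('urn:example')->item;

var_dump(array_key_exists('plain', $array)); // the plain element survives
var_dump(array_key_exists('item', $array));  // the namespaced one is missing from the array
var_dump($item);
```

Whether this matters depends on the documents you handle, but it shows why the conversion is not a drop-in replacement for the object.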

9 years ago

If you need to output valid XML in your response, don't forget to set your Content-Type header to XML in addition to echoing out the result of asXML():

<?php
$xml = simplexml_load_file("...");
...
... xml stuff
...

// output xml in your response:
header("Content-Type: text/xml");
echo $xml->asXML();
?>

1 year ago

If your XML string contains booleans encoded as "0" and "1", you will run into problems when you cast the element directly to bool:

$xmlstr = <<<XML
<values>
 <truevalue>1</truevalue>
 <falsevalue>0</falsevalue>
</values>
XML;
$values = new SimpleXMLElement($xmlstr);
$truevalue = (bool)$values->truevalue; // true
$falsevalue = (bool)$values->falsevalue; // also true!!!

Instead, you need to cast to string or int first:

$truevalue = (bool)(int)$values->truevalue; // true
$falsevalue = (bool)(int)$values->falsevalue; // false

9 years ago

From the README file:

SimpleXML is meant to be an easy way to access XML data.

SimpleXML objects follow four basic rules:

1) properties denote element iterators
2) numeric indices denote elements
3) non numeric indices denote attributes
4) string conversion allows to access TEXT data

When iterating properties then the extension always iterates over
all nodes with that element name. Thus method children() must be
called to iterate over subnodes. But also doing the following:
foreach ($obj->node_name as $elem) {
// do something with $elem
}
always results in iteration of "node_name" elements. So no further
check is needed to distinguish the number of nodes of that type.

When an element's TEXT data is being accessed through a property
then the result does not include the TEXT data of subelements.

Known issues
============

Due to engine problems it is currently not possible to access
a subelement by index 0: $object->property[0].

8 years ago

A quick tip on XPath queries and default namespaces. It looks like the XML system behind SimpleXML works the same way as, I believe, the XML system in .NET: when you need to address something in the default namespace, you have to declare the namespace using registerXPathNamespace and then use its prefix to address the element that otherwise lives in the default namespace.

<?php
$string = <<<XML
<?xml version='1.0'?>
<document xmlns="http://www.w3.org/2005/Atom">
 <title>Forty What?</title>
 <from>Joe</from>
 <to>Jane</to>
 <body>
  I know that's the answer -- but what's the question?
 </body>
</document>
XML;

$xml = simplexml_load_string($string);
$xml->registerXPathNamespace("def", "http://www.w3.org/2005/Atom");

$nodes = $xml->xpath("//def:document/def:title");

?>

8 years ago

Using something like is_object($xml->module->admin) to check whether there actually is a node called "admin" doesn't work as expected, since SimpleXML always returns an object (in that case an empty one) even if the node does not exist.
For me the good old empty() function seems to work just fine in such cases.
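
A small sketch of the behaviour described in this note, with isset() and count() shown as further ways to test for a node. The <config> document and the variable names are invented for the illustration.

```php
<?php
// Hypothetical config document: <module> has a <user/> child but no <admin>.
$xml = simplexml_load_string('<config><module><user/></module></config>');

// A missing node still comes back as an (empty) SimpleXMLElement object.
$isObject = is_object($xml->module->admin);

// isset() and count() do tell a missing node from an existing one.
$adminExists = isset($xml->module->admin);
$userExists  = isset($xml->module->user);
$adminCount  = count($xml->module->admin);

var_dump($isObject, $adminExists, $userExists, $adminCount);
```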

9 years ago

While SimpleXMLElement claims to be iterable, it does not seem to implement the standard Iterator interface functions like ::next and ::reset properly. Therefore, while foreach() works, functions like next(), current(), or each() don't seem to work as you would expect: the pointer never seems to move, or keeps getting reset.

5 years ago

If the XML document's encoding is other than UTF-8, the encoding declaration must come immediately after version="..." and before standalone="...". This is a requirement of the XML standard.



Ok

Russian language. English language
Fatal error: Uncaught exception "Exception" with message "String could not be parsed as XML" in...

An XML parser is a program that extracts data from a source file in XML format and then saves it or uses it for subsequent actions.

Why are XML parsers needed?

First of all, because the XML format itself is a popular standard. An XML file looks like this:

That is, there are tags, and there are rules governing which tags may follow one another.

XML files are popular because they are easy for a person to read and relatively easy for programs to process.

Cons of XML files.

The main downside is the large amount of disk space the data occupies. Because the same tags are repeated over and over, large data sets take up many megabytes, all of which have to be downloaded from the source and then processed. Are there alternatives? There are, of course, but XML and XML parsers remain among the simplest, most reliable and most widely used technologies today.

How are XML parsers written?

Parsers are written in programming languages. They can be written in any language, although some languages are used for this more often than others. Some programming languages already have built-in libraries for parsing XML files. Even if there is no built-in library, you can always find a suitable third-party one and use it to extract data from a file.

Broadly, there are two different approaches to parsing XML files.

The first is to load the XML file completely into memory and then extract the data from it.

The second is the streaming approach. In this case, the programmer specifies the tags that the parser's functions should respond to and decides what to do when a particular tag is found.

The advantage of the first approach is speed. You load the file at once, quickly walk through it in memory and find what you need, and most importantly, it is easy to program. But there is a very important drawback: a large amount of memory is required. Quite often it is simply impossible to process an XML file this way, that is, to write an XML parser that works correctly using the first method. Why is that? Well, for example, a 32-bit application under Windows can occupy at most 2 gigabytes of memory.

However, programming the streaming approach is difficult. The complexity of a sufficiently serious extraction grows many times over, which affects both the schedule and the budget.
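
The two approaches can be put side by side in a minimal sketch. It assumes PHP with the SimpleXML and XMLReader extensions; the <items> document and the variable names are invented, and a real streaming script would call $reader->open() on a file instead of parsing a string.

```php
<?php
// Hypothetical sample data; a real file would be far larger.
$xml = '<items><item>first</item><item>second</item><item>third</item></items>';

// Approach 1: load the whole document into memory, then query it.
$inMemory = [];
foreach (simplexml_load_string($xml)->item as $item) {
    $inMemory[] = (string) $item;
}

// Approach 2: stream through the document and react to the tags of interest.
$streamed = [];
$reader = new XMLReader();
$reader->XML($xml);
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $streamed[] = $reader->readString();
    }
}
$reader->close();

print_r($inMemory);
print_r($streamed);
```

Both loops collect the same values; the difference is that the second one never holds the whole tree in memory, at the cost of more bookkeeping as the extraction logic grows.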

Validity of XML files and parsers.

Everything would be fine with XML files and XML parsers, but there is a problem. Because anyone, even "any schoolboy", can create an XML file (and in reality a lot of code is indeed written by beginners), invalid, that is, incorrect, files appear. What does this mean and what does it lead to? The biggest problem is that sometimes it is simply impossible to parse an invalid file correctly. For example, its tags are not closed as the standard requires, or the encoding is declared incorrectly. Another problem arises if, for example, you write a parser in .NET, where you can create so-called wrappers. The most annoying thing is that you build such a wrapper and then read a file that the "student" created, and the file is invalid and impossible to read. You then have to resort to very unpopular ways of parsing such files, all because many people create XML files without using standard libraries and with complete disregard for the XML standards. This is difficult to explain to customers. They are waiting for the result: an XML parser that converts data from the original file to another format.
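
In PHP, one way to find out that a file is broken before processing it is to let libxml collect its errors. This is only a sketch: the broken <worker> string and the $messages array are made up for the example.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical broken input: the <name> tag is never closed.
$broken = '<worker><name>Kolya</worker>';

$xml = simplexml_load_string($broken);
$messages = [];
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        $messages[] = 'line ' . $error->line . ': ' . trim($error->message);
    }
    libxml_clear_errors();
}

print_r($messages);
```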

How to create XML parsers (first option)

There is a query language for XML data called XPath. The language exists in two versions; we will not go into the features of each. Examples of extracting data are the best way to understand it. For example:

//div[@class="supcat guru"]/a

What does this query do? It takes all a tags whose href contains the text catalog.xml?hid= and which are children of a div whose class is supcat guru.

It may not be clear enough the first time, but you can figure it out if you want. My starting point was http://ru.wikipedia.org/wiki/XPath, and I recommend it to you.
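
A minimal sketch of running such a query from PHP with SimpleXML. The page fragment is invented, and the contains(@href, ...) predicate is an assumption about how the href condition described above would be written, since the query shown does not include it.

```php
<?php
// Hypothetical page fragment, written as well-formed XML for SimpleXML.
$page = simplexml_load_string(
    '<body>'
    . '<div class="supcat guru">'
    . '<a href="catalog.xml?hid=1">Phones</a>'
    . '<a href="about.xml">About</a>'
    . '</div>'
    . '<div class="other"><a href="catalog.xml?hid=2">Laptops</a></div>'
    . '</body>'
);

// Links that are children of the "supcat guru" div and whose href
// contains "catalog.xml?hid=" (the contains() filter is an assumption).
$links = $page->xpath('//div[@class="supcat guru"]/a[contains(@href, "catalog.xml?hid=")]');

foreach ($links as $a) {
    echo $a['href'], ' ', $a, PHP_EOL;
}
```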


publication of this article is allowed only with a link to the site of the author of the article

In this article, I will show an example of how to parse a large XML file. If your server (hosting) does not prohibit increasing the script running time, you can parse an XML file of gigabytes in size; I have personally parsed only files from Ozon of about 450 megabytes.

There are two problems when parsing large XML files:
1. Not enough memory.
2. Not enough time allotted for the script to run.

The second problem, time, can be solved if the server does not prohibit it.
The memory problem is harder to solve. Even on your own server, handling files of 500 megabytes is not easy, and on shared hosting or a VDS you simply cannot increase the memory.

PHP has several built-in XML processing options: SimpleXML, DOM and SAX.
All of these options are described in detail in many articles with examples, but all of the examples show how to work with a complete XML document.

Here is one example, where we get an object from an XML file
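
A minimal sketch of what such an example typically looks like, using simplexml_load_file(). The temporary file and its contents are made up so that the snippet runs on its own; in practice the path would point at the real document.

```php
<?php
// Write a small hypothetical document to a temporary file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<?xml version="1.0"?><recipe><title>Simple bread</title></recipe>');

// The whole file is read and parsed into one object in memory.
$xml = simplexml_load_file($path);
$title = (string) $xml->title;
echo $title, PHP_EOL;

unlink($path);
```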

Now you can process this object, BUT...
As you can see, the entire XML file is read into memory, and then everything is parsed into an object.
That is, all the data ends up in memory, and if the allocated memory is not enough, the script stops.

This option is not suitable for processing large files; you need to read the file piece by piece and process the data in turn.
In that case the validity check is also carried out as the data is processed, so you need to be able to roll back, for example, to delete everything already entered into the database if the XML file turns out to be invalid, or make two passes through the file: first read it to check validity, then read it to process the data.

Here is a theoretical example of parsing a large XML file.
This script reads a file one character at a time, collects the data into blocks and sends them to the XML parser.
This approach completely solves the memory problem and does not cause any load, but it makes the time problem worse. Read below for how to try to solve the time problem.

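<?php
function webi_xml($file)
{

    ####################################################
    ### data handling function
    function data($parser, $data)
    {
        print $data;
    }
    ############################################

    ####################################################
    ### opening tag function
    function startElement($parser, $name, $attrs)
    {
        print $name;
        print_r($attrs);
    }
    ############################################

    ####################################################
    ## closing tag function
    function endElement($parser, $name)
    {
        print $name;
    }
    ############################################

    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);

    // specify which functions will work when opening and closing tags
    xml_set_element_handler($xml_parser, "startElement", "endElement");

    // specify a function for working with data
    xml_set_character_data_handler($xml_parser, "data");

    // open file
    $fp = fopen($file, "r");

    $perviy_vxod = 1; $data = "";

    // loop until the end of the file is found
    while ($fp and !feof($fp))
    {
        // read one character and add it to the data to be sent
        $simvol = fgetc($fp); $data .= $simvol;

        // keep collecting until the end of a tag is found
        if ($simvol != ">") { continue; }

        // on the first pass, drop everything before <?xml
        if ($perviy_vxod) { $data = strstr($data, "<?xml"); $perviy_vxod = 0; }

        if (!xml_parse($xml_parser, $data, feof($fp))) {
            echo "<br>XML Error: " . xml_error_string(xml_get_error_code($xml_parser));
            echo " at line " . xml_get_current_line_number($xml_parser);
            break;
        }

        $data = "";
    }
    fclose($fp);
    xml_parser_free($xml_parser);
}

webi_xml("1.xml");

?>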

In this example, I put everything into one function, webi_xml(), and its call is visible at the very bottom.
The script itself consists of three main functions:
1. startElement(), which catches the opening of a tag
2. endElement(), which catches the closing of a tag
3. data(), which receives the data

Let's assume that the content of the file 1.xml is some recipe

<?xml version="1.0"?>
<recipe>
<title>Simple bread</title>
<ingredient amount="3" unit="glass">Flour</ingredient>
<ingredient amount="0.25" unit="gram">Yeast</ingredient>
<ingredient amount="1.5" unit="glass">Warm water</ingredient>
<ingredient amount="1" unit="teaspoon">Salt</ingredient>
<instructions>
<step>Mix all ingredients and knead thoroughly.</step>
<step>Cover with a cloth and leave for one hour in a warm room.</step>
<step>Knead again, put on a baking sheet and put in the oven.</step>
<step>Visit site site</step>
</instructions>
</recipe>

We start by calling the general function webi_xml("1.xml");
Inside this function the parser is created, and all tag names are converted to uppercase so that all tags have the same case.

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);

Now we specify which functions will catch the opening of a tag, the closing of a tag, and the data

xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "data");

Next, the specified file is opened and read one character at a time; each character is appended to a string variable until the character > is found.
If this is the very first read from the file, everything superfluous at the beginning of the file, that is, everything that stands before <?xml, is deleted along the way, because this is the tag that XML should start with.
The first time, the string variable will collect the string
<?xml version="1.0"?>
and send it to the parser
xml_parse($xml_parser, $data, feof($fp));
After the data is processed, the string variable is reset and the collection of data begins again. The second time the string will be
<recipe>
the third time
<title>
and the fourth time
Simple bread</title>

Note that the string variable always ends with a completed tag >, and it is not necessary to send an opening tag and a closing tag together with their data to the parser, for example
<title>Simple bread</title>
What matters is that this script passes whole, unbroken tags to the handler. It can pass just one opening tag and then the closing tag in the next step, or 1000 lines of the file at once; the point is that no tag is cut in the middle, for example
<tit
le>Simple bread
The script never sends data to the handler like this, because the tag is broken.
You can come up with your own way of sending data to the handler, for example, collecting 1 megabyte of data before sending it to increase speed. Just make sure that the chunks always end on a complete tag; the text data itself may be split
<title>Simple
bread</title>

In this way you can send a large file to the handler in parts of whatever size you wish.
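
As a point of comparison, the PHP manual's own xml_parse() examples feed the parser fixed-size fread() chunks, which suggests the parser buffers incomplete tags itself. The following is only a sketch under that assumption: the sample string, the 7-byte chunk size and the closure handlers are all chosen for illustration, and a real script would read chunks from a file handle.

```php
<?php
// Hypothetical sample; a real script would fread() from a file handle instead.
$xml = '<recipe><title>Simple bread</title><step>Mix</step></recipe>';

$names = [];
$text = '';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$names) { $names[] = $name; },
    function ($parser, $name) {}
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) { $text .= $data; }
);

// Feed 7-byte chunks; most of them end in the middle of a tag.
$chunks = str_split($xml, 7);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), PHP_EOL;
        break;
    }
}

print_r($names);
echo $text, PHP_EOL;
```

If this holds for your PHP build, larger fixed-size reads are a simple way to cut the per-character overhead of the script above.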

Now let's look at how this data is processed and how to get at it.

Let's start with the opening-tag function startElement($parser, $name, $attrs)
Let's assume that processing has reached the line
<ingredient amount="3" unit="glass">Flour</ingredient>
Then inside the function the variable $name will be equal to INGREDIENT, that is, the name of the opened tag (in uppercase, because case folding is enabled; the closing of the tag has not been reached yet).
In this case the array of attributes of this tag, $attrs, will also be available; it will contain AMOUNT = "3" and UNIT = "glass".

After that, the data of the opened tag is processed by the function data($parser, $data)
The $data variable will contain everything between the opening and closing tags, in our case the text Flour

The processing of our line is completed by the function endElement($parser, $name)
$name is the name of the closed tag, in our case it will be equal to INGREDIENT

And after that, everything goes round the same cycle again.
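
The sequence of calls for that one line can be recorded in a short sketch. The closure handlers and the $events and $text variables are stand-ins invented for this illustration; they play the roles of startElement(), endElement() and data().

```php
<?php
$events = [];
$text = '';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = ['start', $name, $attrs];
    },
    function ($parser, $name) use (&$events) {
        $events[] = ['end', $name];
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) {
        $text .= $data; // text may arrive in several pieces
    }
);

xml_parse($parser, '<ingredient amount="3" unit="glass">Flour</ingredient>', true);

print_r($events);
echo $text, PHP_EOL;
```

With case folding switched on, both the tag name and the attribute names arrive in uppercase.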

The above example only demonstrates the principle of XML processing; for real use it needs to be developed further.
Usually you parse large XML in order to enter data into a database, and to process the data properly you need to know which open tag the data belongs to, the level of tag nesting, and which tags are open higher up in the hierarchy. With this information, you can process the file correctly without any problems.
To do this, you need to introduce several global variables that will collect information about open tags, nesting and data.
Here is an example that can be used

<?php
function webi_xml($file)
{
    global $webi_depth; // counter to keep track of nesting depth
    $webi_depth = 0;
    global $webi_tag_open; // will contain an array of the tags open at this moment
    $webi_tag_open = array();
    global $webi_data_temp; // this array will contain the data of one tag

    ####################################################
    ### data handling function
    function data($parser, $data)
    {
        global $webi_depth;
        global $webi_tag_open;
        global $webi_data_temp;
        // add data to the array, keyed by nesting level and currently open tag
        $tag = $webi_tag_open[$webi_depth];
        if (!isset($webi_data_temp[$webi_depth][$tag]["data"])) {
            $webi_data_temp[$webi_depth][$tag]["data"] = "";
        }
        $webi_data_temp[$webi_depth][$tag]["data"] .= $data;
    }
    ############################################

    ####################################################
    ### opening tag function
    function startElement($parser, $name, $attrs)
    {
        global $webi_depth;
        global $webi_tag_open;
        global $webi_data_temp;

        // if the nesting level is already non-zero, then one tag is already open
        // and the data from it is already in the array, so it can be processed
        if ($webi_depth)
        {
            $tag = $webi_tag_open[$webi_depth];
            print "data " . $tag . "--" . ($webi_data_temp[$webi_depth][$tag]["data"] ?? "") . "<br>\n";
            print_r($webi_data_temp[$webi_depth][$tag]["attrs"] ?? array()); // tag attributes
            print "<br>\n";
            print_r($webi_tag_open); // array of open tags
            print "<hr><br>\n";

            // after processing the data, delete it to free memory
            unset($GLOBALS["webi_data_temp"][$webi_depth]);
        }

        // now the opening of the next tag has begun and further processing will occur in the next step
        $webi_depth++; // increase nesting

        $webi_tag_open[$webi_depth] = $name; // add open tag to info array
        $webi_data_temp[$webi_depth][$name]["attrs"] = $attrs; // now add tag attributes

    }
    ###############################################

    #################################################
    ## closing tag function
    function endElement($parser, $name) {
        global $webi_depth;
        global $webi_tag_open;
        global $webi_data_temp;

        // data processing starts here, for example, adding to the database, saving to a file, etc.
        // $webi_tag_open contains a chain of open tags by nesting level
        // for example $webi_tag_open[$webi_depth] contains the name of the open tag whose information is currently being processed
        // $webi_depth tag nesting level
        // $webi_data_temp[$webi_depth][$webi_tag_open[$webi_depth]]["attrs"] array of tag attributes
        // $webi_data_temp[$webi_depth][$webi_tag_open[$webi_depth]]["data"] tag data

        $tag = $webi_tag_open[$webi_depth];
        print "data " . $tag . "--" . ($webi_data_temp[$webi_depth][$tag]["data"] ?? "") . "<br>\n";
        print_r($webi_data_temp[$webi_depth][$tag]["attrs"] ?? array());
        print "<br>\n";
        print_r($webi_tag_open);
        print "<hr><br>\n";

        unset($GLOBALS["webi_data_temp"]); // after processing the data, delete the array with the data as a whole, since the tag was closed
        unset($GLOBALS["webi_tag_open"][$webi_depth]); // remove information about this opened tag... since it closed

        $webi_depth--; // reduce nesting
    }
    ############################################

    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);

    // specify which functions will work when opening and closing tags
    xml_set_element_handler($xml_parser, "startElement", "endElement");

    // specify a function for working with data
    xml_set_character_data_handler($xml_parser, "data");

    // open file
    $fp = fopen($file, "r");

    $perviy_vxod = 1; // flag for checking the first read from the file
    $data = ""; // here we collect parts of the data from the file and send them to the xml parser

    // loop until end of file found
    while ($fp and !feof($fp))
    {
        $simvol = fgetc($fp); // read one character from file
        $data .= $simvol; // add this character to the data to be sent

        // if the character is not the end of a tag, return to the beginning of the loop and add one more character to the data, and so on until the end of a tag is found
        if ($simvol != ">") { continue; }
        // if the end of a tag was found, now send this collected data for processing

        // check if this is the first read from the file, then delete everything before the <?xml tag
        // since sometimes there may be garbage before the beginning of the XML (clumsy editors, or the file was received by the script from another server)
        if ($perviy_vxod) { $data = strstr($data, "<?xml"); $perviy_vxod = 0; }

        // now we pass the data to the xml parser
        if (!xml_parse($xml_parser, $data, feof($fp))) {

            // here you can process and get errors for validity...
            // as soon as an error is encountered, parsing stops
            echo "<br>XML Error: " . xml_error_string(xml_get_error_code($xml_parser));
            echo " at line " . xml_get_current_line_number($xml_parser);
            break;
        }

        // after parsing, we reset the collected data for the next step of the loop.
        $data = "";
    }
    fclose($fp);
    xml_parser_free($xml_parser);
    // delete global variables
    unset($GLOBALS["webi_depth"]);
    unset($GLOBALS["webi_tag_open"]);
    unset($GLOBALS["webi_data_temp"]);
}

webi_xml("1.xml");

?>

The whole example is accompanied by comments; now test and experiment.
Note that in the data handling function the data is not simply assigned to the array but appended using ".=", because the data may not arrive in one piece; if you used plain assignment, from time to time you would get only fragments of the data.

Well, that's all. Now there will be enough memory to process a file of any size, and the script's running time can be increased in several ways.
Insert a function call at the beginning of the script
set_time_limit(6000);
or
ini_set("max_execution_time", "6000");

Or add a line to the .htaccess file
php_value max_execution_time 6000

These examples increase the script running time to 6000 seconds.
You can increase the time in this way only when safe mode is off.

If you have access to php.ini, you can increase the time with
max_execution_time = 6000

For example, on masterhost hosting, at the time of writing, increasing the script time is prohibited even with safe mode disabled. If you are a pro, you can make your own PHP build on masterhost, but that is beyond the scope of this article.

Now we will study working with XML. XML is a format for exchanging data between sites. It is very similar to HTML, except that XML allows you to use your own tags and attributes.

Why do you need XML when parsing? Sometimes the site you need to parse has an API that lets you get what you want without much effort. So here is a piece of advice right away: before parsing a site, check whether it has an API.

What is an API? It is a set of functions with which you can send a request to the site and get the desired response. This response most often comes in XML format. So let's start studying it.

Working with XML in PHP

Let's say you have XML. It can be in a string, stored in a file, or served on request to a specific URL.

Let the XML be stored in a string. In this case, you need to create an object from this string using new SimpleXMLElement:

$str = "<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>";
$xml = new SimpleXMLElement($str);

Now the variable $xml holds an object with the parsed XML. By accessing the properties of this object, you can access the content of the XML tags. How exactly is covered a little further on.

If the XML is stored in a file or returned by accessing a URL (which is most often the case), then you should use the function simplexml_load_file, which produces the same $xml object:

<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>

$xml = simplexml_load_file(file path or URL);

Working methods

In the examples below, our XML is stored in a file or at a URL.

Let the following XML be given:

<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>

Let's get the name, age and salary of the employee:

$xml = simplexml_load_file(file path or URL);
echo $xml->name; //outputs "Kolya"
echo $xml->age; //outputs 25
echo $xml->salary; //outputs 1000

As you can see, the $xml object has properties corresponding to the tags.

You may have noticed that the <worker> tag does not appear anywhere in these accesses. This is because it is the root tag. You can rename it, for example, to <root>, and nothing will change:

<root><name>Kolya</name><age>25</age><salary>1000</salary></root>

$xml = simplexml_load_file(file path or URL);
echo $xml->name; //outputs "Kolya"
echo $xml->age; //outputs 25
echo $xml->salary; //outputs 1000

There can be only one root tag in XML, just like the <html> tag in plain HTML.

Let's modify our XML a bit:

<root><worker><name>Kolya</name><age>25</age><salary>1000</salary></worker></root>

In this case, we get a chain of calls:

$xml = simplexml_load_file(file path or URL);
echo $xml->worker->name; //outputs "Kolya"
echo $xml->worker->age; //outputs 25
echo $xml->worker->salary; //outputs 1000

Working with Attributes

Let some data be stored in attributes:

<root><worker name="Kolya" age="25" salary="1000">Number 1</worker></root>

$xml = simplexml_load_file(file path or URL);
echo $xml->worker["name"]; //outputs "Kolya"
echo $xml->worker["age"]; //outputs 25
echo $xml->worker["salary"]; //outputs 1000
echo $xml->worker; //outputs "Number 1"

Tags with hyphens

In XML, tags (and attributes) with a hyphen are allowed. Such tags are accessed like this:

<root><worker><first-name>Kolya</first-name><last-name>Ivanov</last-name></worker></root>

$xml = simplexml_load_file(file path or URL);
echo $xml->worker->{"first-name"}; //outputs "Kolya"
echo $xml->worker->{"last-name"}; //outputs "Ivanov"

Loop iteration

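Now suppose we have not one worker but several. In this case, we can iterate over our object with a foreach loop:

<root>
<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>
<worker><name>Vasya</name><age>26</age><salary>2000</salary></worker>
<worker><name>Petya</name><age>27</age><salary>3000</salary></worker>
</root>

$xml = simplexml_load_file(file path or URL);
foreach ($xml as $worker) {
    echo $worker->name; //outputs "Kolya", "Vasya", "Petya"
}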
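
The same loop as a self-contained sketch that can be run without a file. The XML is passed as a string with simplexml_load_string(); the <root> element name and the $names array are assumptions made for the illustration.

```php
<?php
// Hypothetical XML with the same shape as the workers example.
$xml = simplexml_load_string(
    '<root>'
    . '<worker><name>Kolya</name><age>25</age></worker>'
    . '<worker><name>Vasya</name><age>26</age></worker>'
    . '<worker><name>Petya</name><age>27</age></worker>'
    . '</root>'
);

$names = [];
foreach ($xml as $worker) {
    $names[] = (string) $worker->name; // cast, otherwise an object is stored
}

echo implode(', ', $names), PHP_EOL;
```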

From object to normal array

If you are not comfortable working with an object, you can convert it to an ordinary PHP array with the following trick:

$xml = simplexml_load_file($path); // $path: file path or URL
var_dump(json_decode(json_encode($xml), true));
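To see what the trick produces, here is a sketch on the simple worker XML (inlined, root name assumed). Note that every value comes back as a string, so numbers still need casting.

```php
<?php
// Assumed XML; the same simple structure used earlier in the lesson.
$xml = simplexml_load_string(
    '<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>'
);

// Encode the object to JSON, then decode it into an associative array.
$array = json_decode(json_encode($xml), true);

var_dump($array);
```

For documents with attributes the result is less tidy: attributes typically end up under an "@attributes" key, so it is worth dumping the array for your own XML before relying on its shape.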

More information

Parsing based on sitemap.xml

Often, a site has a sitemap.xml file. This file lists links to all pages of the site so that search engines can index them conveniently (indexing is, in essence, Yandex and Google parsing the site).

In general, we need not care much why this file exists. The main thing is that if it does exist, you do not have to crawl the pages of the site by any tricky methods: you can simply use this file.

How to check whether this file exists: suppose we are parsing the site site.ru. Open site.ru/sitemap.xml in the browser. If you see something, the file is there; if not, then alas, it is not.

If there is a sitemap, it contains links to all pages of the site in XML format. Feel free to take this XML, parse it, and pick out the links to the pages you need in any way convenient for you (for example, by analyzing the URL, as described for the spider method).

As a result, you get a list of links for parsing. All that remains is to visit them and parse the content you need.
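A minimal sketch of this approach, assuming the sitemap follows the common sitemaps.org layout (a urlset root with url entries, each holding a loc tag). The sample is inlined so the code runs without network access; the page addresses and the /articles/ filter are hypothetical. For a real site you would load the file with simplexml_load_file() instead.

```php
<?php
// Hypothetical sitemap content; a real one would be fetched from the site.
$sitemap = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://site.ru/</loc></url>
  <url><loc>https://site.ru/articles/first</loc></url>
  <url><loc>https://site.ru/articles/second</loc></url>
  <url><loc>https://site.ru/contacts</loc></url>
</urlset>
XML;

$xml = simplexml_load_string($sitemap);
if ($xml === false) {
    exit('Could not parse the sitemap');
}

// Collect only the links we care about, here those under /articles/.
$links = [];
foreach ($xml->url as $url) {
    $loc = trim((string) $url->loc);
    if (strpos($loc, '/articles/') !== false) {
        $links[] = $loc;
    }
}

print_r($links);
```

Large sites sometimes publish a sitemap index that points to several sitemap files, in which case the same loop would be applied to each of them in turn.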

Read more about the structure of sitemap.xml on Wikipedia.


What XML parsers are for and how they can be useful

If you are into website development, you have probably heard of XML, even if you have not used it in your work yet. If so, it is time to get acquainted: after a real boom over the past ten years, this format has grown from a pioneering project into a true industry standard, with success stories reported almost daily.

One of the most important components of XML technology is a special class of programs responsible for parsing documents and extracting the necessary information: parsers. They are the subject of this article. Let's figure out what parsers are for, what kinds exist and where you can get them.

In general, an XML document is a simple text file in which the required data structure is stored with the help of special syntactic constructions (called "tags"). This allows you to store information not as one continuous block but as hierarchically linked fragments. Since text files are very easy to create and transfer over a network, they are an extremely convenient way to store information and are widely used in building complex distributed applications.

But the versatility of the XML text format comes with an obvious inconvenience: before extracting data from a document, you have to struggle with parsing the text and determining its structure. Implementing all the necessary procedures by hand is a non-trivial task that requires considerable effort. Parsers are one of the standard mechanisms that make life easier for developers.

What is a parser? An XML parser is a program designed to parse the contents of a text document that conforms to the XML specification. It does all the "dirty" work: obtaining general information about the document, analyzing the text, finding service structures (elements, attributes, entities, etc.), checking compliance with the syntax rules, and providing an interface for accessing the document. As a result, carefully extracted data is passed to the user application, which may know nothing at all about what XML is.
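The syntax checking mentioned above can be observed directly. The following sketch, written in PHP with SimpleXML (which relies on the libxml library), feeds the parser a deliberately broken document in which the age tag is never closed, then collects the errors the parser reports instead of letting them surface as warnings. The sample XML is an assumption made for illustration.

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Deliberately malformed: <age> is never closed.
$broken = '<worker><name>Kolya</name><age>25</worker>';

$xml = simplexml_load_string($broken);

$messages = [];
if ($xml === false) {
    // Each entry is a LibXMLError with line, column and message fields.
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

print_r($messages);
```

The application never touches the raw text; it only learns that parsing failed and why.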

A parser can be implemented as a separate program module or an ActiveX component, and it can be connected to the application through special class libraries at compile time or at run time. Parsers are divided into validating and non-validating ones. The former can check the structure of a document against DTDs or data schemas, while the latter do not bother with this and are therefore usually smaller. Many modern parsers are loaded with additional features (advanced error handling, adding and editing data), which makes them more convenient to use, although it increases the size of the programs. Almost all common parsers also support a number of important XML-related standards (XSLT, data schemas, namespaces, XPath, etc.) or are bundled with processors for other languages derived from XML.
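The difference between the two kinds of parsing can be sketched in PHP. The example assumes the DOM extension is available and uses a small inline DTD invented for illustration (a worker must contain a name followed by an age); the helper isValidAgainstDtd() is hypothetical. The document that lacks the age tag is still well-formed, so a non-validating load accepts it, while DTD validation rejects it.

```php
<?php
// Keep validation messages out of the output.
libxml_use_internal_errors(true);

// Hypothetical helper: true only if the document is well-formed AND matches its DTD.
function isValidAgainstDtd(string $xmlString): bool
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xmlString)) {
        return false; // not even well-formed
    }
    return $dom->validate(); // checks the document against its DTD
}

// Illustrative DTD: a worker must contain a name followed by an age.
$dtd = <<<'DTD'
<!DOCTYPE worker [
  <!ELEMENT worker (name, age)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
]>
DTD;

$complete   = $dtd . '<worker><name>Kolya</name><age>25</age></worker>';
$incomplete = $dtd . '<worker><name>Kolya</name></worker>';

$completeOk   = isValidAgainstDtd($complete);
$incompleteOk = isValidAgainstDtd($incomplete);

// A non-validating load only checks well-formedness.
$loadsAnyway = simplexml_load_string($incomplete) !== false;

var_dump($completeOk, $incompleteOk, $loadsAnyway);
```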

If you have come to appreciate how useful an XML parser is, it is time to start practical experiments. Where can you get one? Finding suitable software should not be a problem: the Internet is full of freely distributed parsers written in various programming languages, running on all platforms and differing in features and purpose.

The most common and best known is the Expat parser, written by James Clark, one of the creators of the XML specification. It is implemented in the C programming language and is distributed along with its source code. By the way, XML support in such well-known environments as PHP and Perl has been built on it. Another common parser is Xerces, available from the Apache XML Project (implemented in Java and C++). You can find many parsers for C++, Perl and Python. Most parsers, however, are written in Java and are suitable for any platform that runs Java. The market leaders (Microsoft, Oracle, Sun), always distinguished by their scale and monumentality, did not stand aside either. They released more "heavyweight" and feature-rich packages which contain, in addition to the parsers themselves, many additional utilities that make life easier for developers.

Of course, it is impossible to cover everything about parsers in one short article. But I hope you now see that working with XML is not as difficult as it might seem. All the complexities of this format are hidden from us inside the parsers, and there is no reason to be afraid of introducing the new format into existing projects.
