Found this in the IXmlReader docs at msdn but it's also valid for XMLReader in PHP.
You should save the value of $isEmptyElement before processing attributes, or call moveToElement to make $isEmptyElement valid after processing attributes.
$isEmptyElement returns FALSE when XMLReader is positioned on an attribute node, even if attribute's parent element is empty.
La classe XMLReader
(No version information available, might only be in SVN)
Introduction
L'extension XMLReader est un analyseur XML. L'analyseur fonctionne comme un curseur qui parcourt le document et s'arrête sur chaque noeud.
Synopsis de la classe
Propriétés
- attributeCount
-
Le nombre d'attributs dans le noeud
- baseURI
-
La base URI du noeud
- depth
-
Profondeur du noeud dans l'arbre démarrant à 0
- hasAttributes
-
Indique si le noeud a des attributs
- hasValue
-
Indique si le noeud a une valeur de texte
- isDefault
-
Indique si l'attribut est par défaut à partir du DTD
- isEmptyElement
-
Indique si le noeud est un élément vide
- localName
-
Le nom local du noeud
- name
-
Le noeud qualifié du noeud
- namespaceURI
-
L'URI de l'espace de nom associé avec le noeud
- nodeType
-
Le type de noeud pour le noeud
- prefix
-
Le préfixe de l'espace de nom associé avec le noeud
- value
-
La valeur du texte du noeud
- xmlLang
-
La portée xml:lang dans lequel le noeud réside
Constantes pré-définies
Types de noeud XMLReader
-
XMLReader::NONE -
Pas de type de noeud
-
XMLReader::ELEMENT -
Élément de départ
-
XMLReader::ATTRIBUTE -
Noeud Attribut
-
XMLReader::TEXT -
Noeud texte
-
XMLReader::CDATA -
Noeud CDATA
-
XMLReader::ENTITY_REF -
Noeud de référence d'entité
-
XMLReader::ENTITY -
Noeud de déclaration d'entité
-
XMLReader::PI -
Noeud d'instruction de processus
-
XMLReader::COMMENT -
Noeud de commentaire
-
XMLReader::DOC -
Noeud document
-
XMLReader::DOC_TYPE -
Noeud de type de document
-
XMLReader::DOC_FRAGMENT -
Noeud de fragment de document
-
XMLReader::NOTATION -
Noeud de notation
-
XMLReader::WHITESPACE -
Noeud "espace"
-
XMLReader::SIGNIFICANT_WHITESPACE -
Noeud "espace" significatif
-
XMLReader::END_ELEMENT -
Élément de fin
-
XMLReader::END_ENTITY -
Entité de fin
-
XMLReader::XML_DECLARATION -
Noeud de déclaration XML
Options de l'analyseur XMLReader
-
XMLReader::LOADDTD -
Charge une DTD mais ne la valide pas
-
XMLReader::DEFAULTATTRS -
Charge une DTD et les attributs par défaut mais ne la valide pas
-
XMLReader::VALIDATE -
Charge une DTD et valide le document au moment de l'analyse
-
XMLReader::SUBST_ENTITIES -
Substitue les entités et étend les références
Sommaire
- XMLReader::close — Ferme l'entrée XMLReader
- XMLReader::expand — Retourne une copie du noeud courant comme un noeud d'objet DOM
- XMLReader::getAttribute — Récupère la valeur d'un attribut par nom
- XMLReader::getAttributeNo — Récupère la valeur d'un attribut par index
- XMLReader::getAttributeNs — Récupère la valeur d'un attribut par nom local et URI
- XMLReader::getParserProperty — Indique si la propriété spécifiée a été fixée
- XMLReader::isValid — Indique si le document analysé est valide
- XMLReader::lookupNamespace — Consulte l'espace de nom pour un préfixe
- XMLReader::moveToAttribute — Déplace un curseur à un attribut nommé
- XMLReader::moveToAttributeNo — Déplace le curseur à un attribut par index
- XMLReader::moveToAttributeNs — Déplace le curseur à un attribut d'espace de nom
- XMLReader::moveToElement — Positionne le curseur sur l'élément parent de l'attribut courant
- XMLReader::moveToFirstAttribute — Positionne le curseur sur le premier attribut
- XMLReader::moveToNextAttribute — Positionne le curseur sur le prochain attribut
- XMLReader::next — Déplace le curseur au prochain noeud en sautant tous les sous arbres
- XMLReader::open — Fixe le URI contenant le XML à analyser
- XMLReader::read — Déplace le curseur sur le prochain noeud du document
- XMLReader::readInnerXML — Lit le code XML du noeud courant
- XMLReader::readOuterXML — Lit le code XML du noeud courant, y compris lui-même
- XMLReader::readString — Lit le contenu du noeud courant sous forme de chaîne
- XMLReader::setParserProperty — Fixe des options pour l'analyseur
- XMLReader::setRelaxNGSchema — Fixe le nom du fichier ou l'URI pour le Schéma RelaxNG
- XMLReader::setRelaxNGSchemaSource — Spécifie le schéma RelaxNG
- XMLReader::setSchema — Valide le document avec XSD
- XMLReader::XML — Fixe les données contenant le XML à analyser
Guys, I hope this example will help
you can erase prints showing the process-
and it will be a piece of nice code.
<?php
function xml2assoc($xml, $name)
{
print "<ul>";
$tree = null;
print("I'm inside " . $name . "<br>");
while($xml->read())
{
if($xml->nodeType == XMLReader::END_ELEMENT)
{
print "</ul>";
return $tree;
}
else if($xml->nodeType == XMLReader::ELEMENT)
{
$node = array();
print("Adding " . $xml->name ."<br>");
$node['tag'] = $xml->name;
if($xml->hasAttributes)
{
$attributes = array();
while($xml->moveToNextAttribute())
{
print("Adding attr " . $xml->name ." = " . $xml->value . "<br>");
$attributes[$xml->name] = $xml->value;
}
$node['attr'] = $attributes;
}
if(!$xml->isEmptyElement)
{
$childs = xml2assoc($xml, $node['tag']);
$node['childs'] = $childs;
}
print($node['tag'] . " added <br>");
$tree[] = $node;
}
else if($xml->nodeType == XMLReader::TEXT)
{
$node = array();
$node['text'] = $xml->value;
$tree[] = $node;
print "text added = " . $node['text'] . "<br>";
}
}
print "returning " . count($tree) . " childs<br>";
print "</ul>";
return $tree;
}
echo "<PRE>";
$xml = new XMLReader();
$xml->open('test.xml');
$assoc = xml2assoc($xml, "root");
$xml->close();
print_r($assoc);
echo "</PRE>";
?>
It reads this xml:
<test>
<hallo volume="loud"> me <br/> lala </hallo>
<hallo> me </hallo>
</test>
The "XML2Assoc" functions noted here should be used with caution... basically they are duplicating the functionality already present in SimpleXML. They may work but they won't scale.
Their are two main uses cases for parsing XML, each suited to either XMLReader or SimpleXML.
1. SimpleXML is an excellent tool for easy access to an XML document tree using native PHP data types. It starts to flounder with massive (> 50M or so) XML documents, as it reads the entire document into memory before it can be processed. SimpleXML will just laugh at you then die when your server runs out of memory (or it will cause a load spike).
2. Aside from the reasoning behind massive XML documents, if you have to deal with massive XML documents, use XMLReader to process them. Don't try and gather an entire XML document into a PHP data structure using XMLReader and a PHP xml2assoc() function, you are reinventing the SimpleXML wheel.
When parsing massive XML documents using XMLReader, gather the data you need to perform an operation then perform it before skipping to the next node. Do not build massive data structures from a massive XML document, your server (and it's admins) will not like you.
Next version xml2assoc with some improve fixes:
- no doubled data
- no buffer arrays
<?php
/*
Read XML structure to associative array
--
Using:
$xml = new XMLReader();
$xml->open([XML file]);
$assoc = xml2assoc($xml);
$xml->close();
*/
function xml2assoc($xml) {
$assoc = null;
while($xml->read()){
switch ($xml->nodeType) {
case XMLReader::END_ELEMENT: return $assoc;
case XMLReader::ELEMENT:
$assoc[$xml->name][] = array('value' => $xml->isEmptyElement ? '' : xml2assoc($xml));
if($xml->hasAttributes){
$el =& $assoc[$xml->name][count($assoc[$xml->name]) - 1];
while($xml->moveToNextAttribute()) $el['attributes'][$xml->name] = $xml->value;
}
break;
case XMLReader::TEXT:
case XMLReader::CDATA: $assoc .= $xml->value;
}
}
return $assoc;
}
?>
make some modify from Sergey Aikinkulov's note
<?php
function xml2assoc(&$xml){
$assoc = NULL;
$n = 0;
while($xml->read()){
if($xml->nodeType == XMLReader::END_ELEMENT) break;
if($xml->nodeType == XMLReader::ELEMENT and !$xml->isEmptyElement){
$assoc[$n]['name'] = $xml->name;
if($xml->hasAttributes) while($xml->moveToNextAttribute()) $assoc[$n]['atr'][$xml->name] = $xml->value;
$assoc[$n]['val'] = xml2assoc($xml);
$n++;
}
else if($xml->isEmptyElement){
$assoc[$n]['name'] = $xml->name;
if($xml->hasAttributes) while($xml->moveToNextAttribute()) $assoc[$n]['atr'][$xml->name] = $xml->value;
$assoc[$n]['val'] = "";
$n++;
}
else if($xml->nodeType == XMLReader::TEXT) $assoc = $xml->value;
}
return $assoc;
}
?>
add else if($xml->isEmptyElement)
may be some xml has emptyelement
For those of you getting xml files that do not contain duplicate elements (in the same element), the following converter converts to arrays with key/value mapping (thus overwriting duplicate elements!):
Note this is untested with attributes although I built in support.
<?php
function xml2assoc($xml, array &$target = array()) {
while ($xml->read()) {
switch ($xml->nodeType) {
case XMLReader::END_ELEMENT:
return $target;
case XMLReader::ELEMENT:
$name = $xml->name;
$target[$name] = $xml->hasAttributes ? array() : '';
if (!$xml->isEmptyElement) {
$target[$name] = array();
xml2assoc($xml, $target[$name]);
}
if ($xml->hasAttributes)
while($xml->moveToNextAttribute())
$target[$name]['@'.$xml->name] = $xml->value;
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
$target = $xml->value;
}
}
return $target;
}
?>
To verify that all nodes are read without error/warning you can use this code:
<?php
$endofxml = false;
$xml_url = "example.xml";
$reader = new XMLReader();
if(!$reader->open($xml_url)){
print "Error to open XML: $xml_url\n";
} else {
while ($reader->read()) {
$firstnode = (!isset($firstnode)) ? $reader->name : $firstnode;
/*
DO SOMETHING
*/
if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == $firstnode) {
$endofxml = true;
}
}
}
if($endofxml) {
print "no error found";
} else {
print "error found";
}
?>
This code is useful to trap $reader->read() error/warning.
Sometimes you have an unusual URL that doesn't actually point to an xml file but still returns xml as output (Like the Battlefield Heroes generated syndication urls). Using get_file_contents(url) you can retrieve the xml data from these urls and pass it as a variable for processing as an XML String.
Unfortunately simpleXML or xml DOM cannot process all xml strings. Some have error boxes added to the end of them (such as Battlefield Heroes syndicated news). These boxes cause an end of file sort of error and closes out the script. XMLReader grabs data from these strings without error.
Take care about how to use XMLReader::$isElementEmpty. I don't know if it is a bug or not, but $isElementEmpty is set for the current context and NOT just for the element. If you move your cursor to an attribute, $isElementEmpty will ALWAYS be false.
<?php
$xml = new XMLReader();
$xml->XML('<tag attr="value" />');
$xml->read();
var_dump($xml->isEmptyElement);
$xml->moveToNextAttribute();
var_dump($xml->isEmptyElement);
?>
will output
(bool) true
(bool) false
So be sure to store $isEmptyElement before moving the cursor.
Just in case someone is confused, if you're wanting to simply pass a string of XML instead of an entire file, you would do this.
<?php
$foo = new XMLReader();
$foo->xml($STRING);
?>
.... where $STRING holds your XML. You cannot pass it like $foo = $STRING or $foo->xml = $STRING.
A basic parser
<?php
function xml2assoc($xml) {
$arr = array();
if (!preg_match_all('|\<\s*?(\w+).*?\>(.*)\<\/\s*\\1.*?\>|s', $xml, $m)) return $xml;
if (is_array($m[1]))
for ($i = 0;$i < sizeof($m[1]); $i++) $arr[$m[1][$i]] = xml2assoc($m[2][$i]);
else $arr[$m[1]] = xml2assoc($m[2]);
return $arr;
}
?>
XML to ASSOCIATIVE ARRAY
Improved algorithm based on Sergey Aikinkulov's. The problem was that it would overwrite nodes if they had the same tag name. Because of that <a><b/><b/><a> would be read as if <a><b/><a/>. This algorithm handles it better and outputs an easy to understand array:
<?php
function xml2assoc($xml) {
$tree = null;
while($xml->read())
switch ($xml->nodeType) {
case XMLReader::END_ELEMENT: return $tree;
case XMLReader::ELEMENT:
$node = array('tag' => $xml->name, 'value' => $xml->isEmptyElement ? '' : xml2assoc($xml));
if($xml->hasAttributes)
while($xml->moveToNextAttribute())
$node['attributes'][$xml->name] = $xml->value;
$tree[] = $node;
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
$tree .= $xml->value;
}
return $tree;
}
?>
Usage:
myxml.xml:
------
<PERSON>
<NAME>John</NAME>
<PHONE type="home">555-555-555</PHONE>
</PERSON>
----
<?
$xml = new XMLReader();
$xml->open('myxml.xml');
$assoc = xml2assoc($xml);
$xml->close();
print_r($assoc);
?>
Outputs:
Array
(
[0] => Array
(
[tag] => PERSON
[value] => Array
(
[0] => Array
(
[tag] => NAME
[value] => John
)
[1] => Array
(
[tag] => PHONE
[value] => 555-555-555
[attributes] => Array
(
[type] => home
)
)
)
)
)
For reasons that have to do with recursion, it returns an array with the ROOT xml node as the first childNode, rather than to return only the ROOT node.
<?php
//Pull certain elements
$reader = new XMLReader();
$reader->open($xmlfile);
while ($reader->read()) {
switch ($reader->nodeType) {
case (XMLREADER::ELEMENT):
if ($reader->name == "Code")
{
$reader->read();
$code = trim($reader->value);
echo "$code\n";
break;
}
if ($reader->name == "Name")
{
$reader->read();
$customername = trim( $reader->value );
echo "$name\n";
break;
}
if ($reader->name == "Camp")
{
$camp = trim($reader->getAttribute("ID"));
echo "$camp\n";
break;
}
}
}
?>
Thanks rein_baarsma33 AT hotmail DOT com for bugfixes.
This is my new child of XML parsing method based on my and yours modification.
XML2ASSOC Is a complete solution for parsing ordinary XML
<?php
/**
* XML2Assoc Class to creating
* PHP Assoc Array from XML File
*
* @author godseth (AT) o2.pl & rein_baarsma33 (AT) hotmail.com (Bugfixes in parseXml Method)
* @uses XMLReader
*
*/
class Xml2Assoc {
/**
* Optimization Enabled / Disabled
*
* @var bool
*/
protected $bOptimize = false;
/**
* Method for loading XML Data from String
*
* @param string $sXml
* @param bool $bOptimize
*/
public function parseString( $sXml , $bOptimize = false) {
$oXml = new XMLReader();
$this -> bOptimize = (bool) $bOptimize;
try {
// Set String Containing XML data
$oXml->XML($sXml);
// Parse Xml and return result
return $this->parseXml($oXml);
} catch (Exception $e) {
echo $e->getMessage();
}
}
/**
* Method for loading Xml Data from file
*
* @param string $sXmlFilePath
* @param bool $bOptimize
*/
public function parseFile( $sXmlFilePath , $bOptimize = false ) {
$oXml = new XMLReader();
$this -> bOptimize = (bool) $bOptimize;
try {
// Open XML file
$oXml->open($sXmlFilePath);
// // Parse Xml and return result
return $this->parseXml($oXml);
} catch (Exception $e) {
echo $e->getMessage(). ' | Try open file: '.$sXmlFilePath;
}
}
/**
* XML Parser
*
* @param XMLReader $oXml
* @return array
*/
protected function parseXml( XMLReader $oXml ) {
$aAssocXML = null;
$iDc = -1;
while($oXml->read()){
switch ($oXml->nodeType) {
case XMLReader::END_ELEMENT:
if ($this->bOptimize) {
$this->optXml($aAssocXML);
}
return $aAssocXML;
case XMLReader::ELEMENT:
if(!isset($aAssocXML[$oXml->name])) {
if($oXml->hasAttributes) {
$aAssocXML[$oXml->name][] = $oXml->isEmptyElement ? '' : $this->parseXML($oXml);
} else {
if($oXml->isEmptyElement) {
$aAssocXML[$oXml->name] = '';
} else {
$aAssocXML[$oXml->name] = $this->parseXML($oXml);
}
}
} elseif (is_array($aAssocXML[$oXml->name])) {
if (!isset($aAssocXML[$oXml->name][0]))
{
$temp = $aAssocXML[$oXml->name];
foreach ($temp as $sKey=>$sValue)
unset($aAssocXML[$oXml->name][$sKey]);
$aAssocXML[$oXml->name][] = $temp;
}
if($oXml->hasAttributes) {
$aAssocXML[$oXml->name][] = $oXml->isEmptyElement ? '' : $this->parseXML($oXml);
} else {
if($oXml->isEmptyElement) {
$aAssocXML[$oXml->name][] = '';
} else {
$aAssocXML[$oXml->name][] = $this->parseXML($oXml);
}
}
} else {
$mOldVar = $aAssocXML[$oXml->name];
$aAssocXML[$oXml->name] = array($mOldVar);
if($oXml->hasAttributes) {
$aAssocXML[$oXml->name][] = $oXml->isEmptyElement ? '' : $this->parseXML($oXml);
} else {
if($oXml->isEmptyElement) {
$aAssocXML[$oXml->name][] = '';
} else {
$aAssocXML[$oXml->name][] = $this->parseXML($oXml);
}
}
}
if($oXml->hasAttributes) {
$mElement =& $aAssocXML[$oXml->name][count($aAssocXML[$oXml->name]) - 1];
while($oXml->moveToNextAttribute()) {
$mElement[$oXml->name] = $oXml->value;
}
}
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
$aAssocXML[++$iDc] = $oXml->value;
}
}
return $aAssocXML;
}
/**
* Method to optimize assoc tree.
* ( Deleting 0 index when element
* have one attribute / value )
*
* @param array $mData
*/
public function optXml(&$mData) {
if (is_array($mData)) {
if (isset($mData[0]) && count($mData) == 1 ) {
$mData = $mData[0];
if (is_array($mData)) {
foreach ($mData as &$aSub) {
$this->optXml($aSub);
}
}
} else {
foreach ($mData as &$aSub) {
$this->optXml($aSub);
}
}
}
}
}
?>
[EDIT BY danbrown AT php DOT net: Fixes were also provided by "Alex" and (qdog AT qview DOT org) in user notes on this page (since removed).]
Some more documentation (i.e. examples) would be nice :-)
This is how I read some mysql parameters in an xml file:
<?php
$xml = new XMLReader();
$xml->open("config.xml");
$xml->setParserProperty(2,true); // This seems a little unclear to me - but it worked :)
while ($xml->read()) {
switch ($xml->name) {
case "mysql_host":
$xml->read();
$conf["mysql_host"] = $xml->value;
$xml->read();
break;
case "mysql_username":
$xml->read();
$conf["mysql_user"] = $xml->value;
$xml->read();
break;
case "mysql_password":
$xml->read();
$conf["mysql_pass"] = $xml->value;
$xml->read();
break;
case "mysql_database":
$xml->read();
$conf["mysql_db"] = $xml->value;
$xml->read();
break;
}
}
$xml->close();
?>
The XML file used:
<?xml version='1.0'?>
<MySQL_INIT>
<mysql_host>localhost</mysql_host>
<mysql_database>db_database</mysql_database>
<mysql_username>root</mysql_username>
<mysql_password>password</mysql_password>
</MySQL_INIT>
<?php
function parseXML($node,$seq,$path) {
global $oldpath;
if (!$node->read())
return;
if ($node->nodeType != 15) {
print '<br/>'.$node->depth;
print '-'.$seq++;
print ' '.$path.'/'.($node->nodeType==3?'text() = ':$node->name);
print $node->value;
if ($node->hasAttributes) {
print ' [hasAttributes: ';
while ($node->moveToNextAttribute()) print '@'.$node->name.' = '.$node->value.' ';
print ']';
}
if ($node->nodeType == 1) {
$oldpath=$path;
$path.='/'.$node->name;
}
parseXML($node,$seq,$path);
}
else parseXML($node,$seq,$oldpath);
}
$source = "<tag1>this<tag2 id='4' name='foo'>is</tag2>a<tag2 id='5'>common</tag2>record</tag1>";
$xml = new XMLReader();
$xml->XML($source);
print htmlspecialchars($source).'<br/>';
parseXML($xml,0,'');
?>
Output:
<tag1>this<tag2 id='4' name='foo'>is</tag2>a<tag2 id='5'>common</tag2>record</tag1>
0-0 /tag1
1-1 /tag1/text() = this
1-2 /tag1/tag2 [hasAttributes: @id = 4 @name = foo ]
2-3 /tag1/text() = is
1-4 /text() = a
1-5 /tag2 [hasAttributes: @id = 5 ]
2-6 /text() = common
1-7 /text() = record
