The "XML to array" code from another note does not work properly if you have several tags with the same name at the same level, for example:
<currenterrors>
<error>
<description>This is a real error...</description>
</error>
<error>
<description>This is a second error...</description>
</error>
<error>
<description>Lots of errors today...</description>
</error>
<error>
<description>This is the last error...</description>
</error>
</currenterrors>
It will then display only the first <error> tag.
In this case you need to number the tags automatically, or perhaps keep a separate array for each new element.
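A minimal sketch of the "number the tags automatically" idea, assuming a UTF-8 document and that attributes and text mixed in between child elements can be ignored. The names xml_to_array_multi() and xml_children() are hypothetical. A tag that occurs once maps straight to its value; when a sibling name repeats, its entries are collected in a numerically indexed list, so none of the <error> tags are lost.

```php
<?php
// Hypothetical helper: turns the flat output of xml_parse_into_struct() into
// nested arrays. Repeated sibling tags become a numbered list instead of
// overwriting each other. Attributes and mixed-in text are ignored.
function xml_children($vals, &$i)
{
    $children = array();
    $repeated = array(); // tag names already promoted to a list
    while ($i < count($vals)) {
        $el = $vals[$i++];
        if ($el['type'] === 'close') {
            break; // end of the current parent element
        }
        if ($el['type'] === 'cdata') {
            continue; // text between child elements is skipped
        }
        $value = $el['type'] === 'open'
            ? xml_children($vals, $i)
            : (isset($el['value']) ? $el['value'] : '');
        $tag = $el['tag'];
        if (!array_key_exists($tag, $children)) {
            $children[$tag] = $value;                         // first occurrence
        } elseif (isset($repeated[$tag])) {
            $children[$tag][] = $value;                       // third and later
        } else {
            $children[$tag] = array($children[$tag], $value); // second: start a list
            $repeated[$tag] = true;
        }
    }
    return $children;
}

function xml_to_array_multi($xml)
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    if (!xml_parse_into_struct($parser, $xml, $vals)) {
        $message = xml_error_string(xml_get_error_code($parser));
        xml_parser_free($parser);
        throw new RuntimeException($message);
    }
    xml_parser_free($parser);
    $i = 0;
    return xml_children($vals, $i);
}

$xml = '<currenterrors>'
     . '<error><description>This is a real error...</description></error>'
     . '<error><description>This is a second error...</description></error>'
     . '<error><description>Lots of errors today...</description></error>'
     . '<error><description>This is the last error...</description></error>'
     . '</currenterrors>';
print_r(xml_to_array_multi($xml));
```

With this input all four descriptions end up under $tree['currenterrors']['error'][0] to [3]. Promoting to a list only on the second occurrence keeps single tags easy to read, at the cost of callers having to check whether a key holds one value or a list.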
xml_parser_create
(PHP 4, PHP 5)
xml_parser_create - Create an XML parser
Description
resource xml_parser_create ([ string $encoding ] )
xml_parser_create() creates a new XML parser and returns a resource handle referencing it, to be used by the other XML functions.
Parameters
encoding
The optional encoding parameter specifies the character encoding for input/output in PHP 4. Starting with PHP 5, the input encoding is detected automatically, so the encoding parameter specifies only the output encoding. In PHP 4, the default output encoding is the same as the input encoding. If an empty string is passed, the parser attempts to identify which encoding the document uses by examining its first 3 or 4 bytes. In PHP 5.0.0 and 5.0.1, the default output encoding is ISO-8859-1, while in PHP 5.0.2 and later it is UTF-8. The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.
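A sketch of how the encoding argument behaves in PHP 5 and later, assuming a document that declares itself as UTF-8: the input encoding comes from the document, and the argument only selects the encoding of the strings handed to the handlers. The helper name text_with_target() is made up for this example.

```php
<?php
// Hypothetical helper: collects all character data, with the output
// (target) encoding chosen by the xml_parser_create() argument.
function text_with_target($xml, $target)
{
    $text = '';
    $parser = xml_parser_create($target);
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data; // text can arrive in several pieces, so append
    });
    xml_parse($parser, $xml, true);
    xml_parser_free($parser);
    return $text;
}

// "café" encoded as UTF-8; the declaration tells the parser the input encoding.
$doc = '<?xml version="1.0" encoding="UTF-8"?><word>caf' . "\xC3\xA9" . '</word>';
echo bin2hex(text_with_target($doc, 'UTF-8')), PHP_EOL;      // é as two UTF-8 bytes
echo bin2hex(text_with_target($doc, 'ISO-8859-1')), PHP_EOL; // é as one ISO-8859-1 byte
```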
Return Values
Returns a resource handle for the new XML parser.
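A minimal sketch of the usual life cycle (create, register handlers, parse, free), assuming a small inline document. The sample XML and the $events/$text collectors are made up for illustration.

```php
<?php
// Sample document, made up for this example.
$xml = '<note priority="high"><to>Tove</to><body>Hi</body></note>';

$events = array();
$text = '';
$parser = xml_parser_create('UTF-8');

// Start handlers get (parser, name, attributes); end handlers get (parser, name).
// Case folding is on by default, so tag and attribute names arrive uppercased.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) {
        $events[] = "start $name";
        foreach ($attrs as $k => $v) {
            $events[] = "attr $k=$v";
        }
    },
    function ($p, $name) use (&$events) {
        $events[] = "end $name";
    }
);
// Character data may be delivered in several chunks, so append rather than assign.
xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
    $text .= $data;
});

if (!xml_parse($parser, $xml, true)) {
    $message = sprintf('XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser));
    xml_parser_free($parser);
    throw new RuntimeException($message);
}
xml_parser_free($parser);

echo implode("\n", $events), "\n", $text, "\n";
```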
See Also
- xml_parser_create_ns() - Create an XML parser with namespace support
- xml_parser_free() - Free an XML parser
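I created a function that replaces xml_parser_create() and all the functions around it.
<?php
// Minimal tag scanner: calls start_tag(), end_tag() and between() for every
// tag and text run. It does no validation at all.
function html_parse($file)
{
    $count = false; // true right after "<", so a following "/" marks an end tag
    $text = "";     // buffer for the current tag contents or text run
    $end = false;   // true while reading an end tag such as </b>
    foreach (str_split($file, 1) as $temp) {
        switch ($temp) {
            case "<":
                between($text); // flush the text collected before this tag
                $text = "";
                $count = true;
                $end = false;
                break;
            case ">":
                if ($end) {
                    end_tag($text);
                } else {
                    start_tag($text);
                }
                $text = "";
                break;
            case "/":
                if ($count) {
                    $end = true;
                } else {
                    $text .= "/";
                }
                break;
            default:
                $count = false;
                $text .= $temp;
        }
    }
    // Flush any text that follows the last tag (otherwise it would be lost).
    if ($text !== "") {
        between($text);
    }
}
?>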
The input value is a string.
It calls the functions start_tag(), between() and end_tag(), just like the original XML parser calls its handlers.
But it has a few differences:
- It does NOT validate the markup. It just passes values on to those three functions, whether or not they are correct.
- It passes attributes through as part of the tag string. For example, from the tag <sth b="42"> it sends sth b="42".
- It works with diacritics. The original parser sometimes split the text at the first diacritic.
- It works with any ASCII-compatible encoding: the input is split byte by byte and only <, > and / are treated specially, so if the input is UTF-8, the output will be UTF-8 too.
- It works on strings, not on file pointers.
- No "Reserved XML name" error.
- No doctype needed.
- It does not handle comments, notes, processing instructions, etc. Just the tags.
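The handler functions are defined like this:
<?php
function start_tag($stuff) {}
function end_tag($stuff) {}
function between($stuff) {}
?>
Each one receives a single string and no other arguments.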
In PHP 5, when your XML file includes the declaration '<?xml version="1.0" encoding="ISO-8859-1" ?>', I'd also recommend setting the option below:
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "ISO-8859-1");
It works fine!
If your encoding is 'UTF-8', just replace 'ISO-8859-1'.
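A sketch of what the option changes, assuming PHP 5.0.2 or later, where xml_parser_create() without an argument produces UTF-8 output. The document ("München" stored as ISO-8859-1) and the helper name latin1_city() are made up for illustration.

```php
<?php
// Hypothetical helper: parse the same ISO-8859-1 document with or without
// XML_OPTION_TARGET_ENCODING set to ISO-8859-1.
function latin1_city($xml, $setTarget)
{
    $text = '';
    $parser = xml_parser_create();
    if ($setTarget) {
        xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');
    }
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;
    });
    xml_parse($parser, $xml, true);
    xml_parser_free($parser);
    return $text;
}

$doc = '<?xml version="1.0" encoding="ISO-8859-1"?><city>M' . "\xFC" . 'nchen</city>';
echo bin2hex(latin1_city($doc, false)), PHP_EOL; // default target: the umlaut comes back as UTF-8
echo bin2hex(latin1_city($doc, true)), PHP_EOL;  // target set: the umlaut stays one ISO-8859-1 byte
```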
I'd also recommend adding this option:
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
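A sketch of the effect on xml_parse_into_struct(), assuming an indented document. Without the option, the whitespace-only text between tags produces an extra 'cdata' entry; with it, that entry disappears. The helper name struct_types() is hypothetical.

```php
<?php
// Hypothetical helper: list the entry types xml_parse_into_struct() produces.
function struct_types($xml, $skipWhite)
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, $skipWhite ? 1 : 0);
    xml_parse_into_struct($parser, $xml, $vals);
    xml_parser_free($parser);
    $types = array();
    foreach ($vals as $entry) {
        $types[] = $entry['type'];
    }
    return $types;
}

// Indented input: a newline and two spaces after <a>, a newline after </b>.
$xml = "<a>\n  <b>x</b>\n</a>";
echo implode(',', struct_types($xml, false)), PHP_EOL;
echo implode(',', struct_types($xml, true)), PHP_EOL;
```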
Even though I passed "UTF-8" as the encoding, PHP (version 4.3.3) did *not* treat the input file as UTF-8. The input file was missing the BOM header bytes (which may indeed be omitted according to RFC 3629, but things are a bit unclear there: the RFC seems to make mere recommendations concerning the BOM). If you want to be sure that PHP treats a UTF-8 encoded file correctly, make sure that it begins with the corresponding 3-byte BOM (0xEF 0xBB 0xBF).
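A small sketch of that advice, which was reported for PHP 4.3.3 and may not apply to other versions. The helper name ensure_utf8_bom() is hypothetical; it only adds the BOM when it is missing, so calling it twice is harmless.

```php
<?php
// Hypothetical helper: prepend the 3-byte UTF-8 byte order mark
// (0xEF 0xBB 0xBF) unless the data already starts with it.
function ensure_utf8_bom($data)
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($data, $bom, 3) === 0 ? $data : $bom . $data;
}

$xml = ensure_utf8_bom('<note>ok</note>');
echo bin2hex(substr($xml, 0, 3)), PHP_EOL;
```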
To maintain compatibility between PHP 4 and PHP 5 you should always pass a string argument to this function. PHP 4 autodetects the input format if you leave it out, whereas PHP 5 assumes ISO-8859-1 (and chokes on the byte order mark of UTF-8 files).
Calling the function as <?php $res = xml_parser_create('') ?> will cause both versions of PHP to autodetect the format.
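A sketch of the empty-string form, assuming PHP 5.0.2 or later, where the output defaults to UTF-8. The same character stored in two input encodings comes back identical because the parser reads each document's own declaration. The helper name detected_text() is hypothetical.

```php
<?php
// Hypothetical helper: parse with xml_parser_create('') so the input
// encoding is detected from the document itself.
function detected_text($xml)
{
    $text = '';
    $parser = xml_parser_create('');
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;
    });
    xml_parse($parser, $xml, true);
    xml_parser_free($parser);
    return $text;
}

// The same character (é) stored in two different input encodings.
$utf8   = '<?xml version="1.0" encoding="UTF-8"?><w>' . "\xC3\xA9" . '</w>';
$latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><w>' . "\xE9" . '</w>';
var_dump(detected_text($utf8) === detected_text($latin1));
```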
Thought I'd share this small piece of PHP code that builds a proper array from XML data
(it uses xml_parse_into_struct() to get the raw array).
Features: 1) it adapts easily to multiple nesting levels; 2) it is simple.
<?php
$file = "data.xml";
$xml_parser = xml_parser_create();
// file_get_contents() also copes with empty files, unlike fread() with filesize() == 0.
if (($data = file_get_contents($file)) === false) {
    die("could not open XML input");
}
xml_parse_into_struct($xml_parser, $data, $vals, $index);
xml_parser_free($xml_parser);
$params = array();
$level = array();
foreach ($vals as $xml_elem) {
    if ($xml_elem['type'] == 'open') {
        if (array_key_exists('attributes', $xml_elem)) {
            // Use the value of the first attribute (e.g. id="ZZ") as the key.
            $attrs = array_values($xml_elem['attributes']);
            $level[$xml_elem['level']] = $attrs[0];
        } else {
            $level[$xml_elem['level']] = $xml_elem['tag'];
        }
    }
    if ($xml_elem['type'] == 'complete') {
        // Build a statement such as $params[$level[1]][$level[2]][tag] = value; and run it.
        $start_level = 1;
        $php_stmt = '$params';
        while ($start_level < $xml_elem['level']) {
            $php_stmt .= '[$level[' . $start_level . ']]';
            $start_level++;
        }
        // Empty elements such as <x/> have no 'value' key.
        $php_stmt .= '[$xml_elem[\'tag\']] = isset($xml_elem[\'value\']) ? $xml_elem[\'value\'] : \'\';';
        eval($php_stmt);
    }
}
echo "<pre>";
print_r($params);
echo "</pre>";
?>
Example:
Input XML:
<country id="ZZ">
<name>My Land</name>
<location>15E</location>
<area>40000</area>
<state1>
<name>Hi State</name>
<area>1000</area>
<population>2000</population>
<city1>
<location>13E</location>
<population>500</population>
<area>500</area>
</city1>
<city2>
<location>13E</location>
<population>500</population>
<area>5000</area>
</city2>
</state1>
<state2>
<name>Low State</name>
<area>3000</area>
<population>20000</population>
<city1>
<location>15E</location>
<population>5000</population>
<area>1500</area>
</city1>
</state2>
</country>
Output array:
Array
(
[ZZ] => Array
(
[NAME] => My Land
[LOCATION] => 15E
[AREA] => 40000
[STATE1] => Array
(
[NAME] => Hi State
[AREA] => 1000
[POPULATION] => 2000
[CITY1] => Array
(
[LOCATION] => 13E
[POPULATION] => 500
[AREA] => 500
)
[CITY2] => Array
(
[LOCATION] => 13E
[POPULATION] => 500
[AREA] => 5000
)
)
[STATE2] => Array
(
[NAME] => Low State
[AREA] => 3000
[POPULATION] => 20000
[CITY1] => Array
(
[LOCATION] => 15E
[POPULATION] => 5000
[AREA] => 1500
)
)
)
)
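A sketch of the same conversion without eval(), assuming the same rules as the note (the first attribute value becomes the key for an open tag, and an element without a value gets an empty string). It walks down $params with a reference instead of generating code. The name xml_struct_to_params() is hypothetical, and like the original, repeated sibling tags still overwrite each other.

```php
<?php
// Hypothetical eval-free variant of the conversion above.
function xml_struct_to_params($xml)
{
    $parser = xml_parser_create();
    xml_parse_into_struct($parser, $xml, $vals);
    xml_parser_free($parser);

    $params = array();
    $level = array(); // key to use at each nesting level
    foreach ($vals as $el) {
        if ($el['type'] === 'open') {
            // First attribute value (e.g. id="ZZ") if any, else the tag name.
            $level[$el['level']] = isset($el['attributes'])
                ? reset($el['attributes'])
                : $el['tag'];
        } elseif ($el['type'] === 'complete') {
            $node = &$params; // walk down to the parent of this element
            for ($l = 1; $l < $el['level']; $l++) {
                $node = &$node[$level[$l]];
            }
            $node[$el['tag']] = isset($el['value']) ? $el['value'] : '';
            unset($node); // drop the reference before the next element
        }
    }
    return $params;
}

$xml = '<country id="ZZ"><name>My Land</name><state1><name>Hi State</name>'
     . '<city1><area>500</area></city1></state1></country>';
print_r(xml_struct_to_params($xml));
```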
xml_parser_create() on PHP 5 sometimes detects the wrong input format for me. For example, when I try to parse data that my script has fetched from a database and that contains only a handful of special ISO-8859-1 characters, it seems to think the input is something else, and xml_parse() chokes on things like umlauts.
The only reason I have been able to find so far is that, unlike my data files, the XML generated by my script doesn't contain the <?xml [...] encoding="..." ?> declaration. Every data source with that declaration seemed just fine; it's odd that it worked *sometimes* without it *shrugs*.
Whatever the reason, running the string through utf8_encode() made it work, and prepending '<?xml version="1.0" encoding="ISO-8859-1" ?>' worked as well.
This problem shouldn't occur in PHP 4, since there you specify the input encoding along with the output encoding.
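A sketch of the second workaround, assuming the generated data really is ISO-8859-1. Without a declaration or BOM the parser reads input as UTF-8, so a byte like 0xFC is invalid while pure-ASCII data parses fine, which would explain why it only failed sometimes. The helper with_declaration() is hypothetical. Prepending a declaration also avoids utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: give generated XML a declaration naming its real
// encoding (ISO-8859-1 is an assumption about the source data).
function with_declaration($xml, $encoding = 'ISO-8859-1')
{
    if (strncmp($xml, '<?xml', 5) === 0) {
        return $xml; // already has a declaration; leave it alone
    }
    return '<?xml version="1.0" encoding="' . $encoding . '"?>' . $xml;
}

// "Jürgen" as ISO-8859-1 bytes, with no declaration, as if read from a database.
$fromDb = '<name>J' . "\xFC" . 'rgen</name>';

$text = '';
$parser = xml_parser_create('UTF-8'); // deliver the text as UTF-8
xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
    $text .= $data;
});
$ok = xml_parse($parser, with_declaration($fromDb), true);
xml_parser_free($parser);
var_dump($ok, bin2hex($text));
```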
