This class has a getAttribute method.
Assume that a DOMNode object $ref contains an anchor taken from a DOMNodeList. Then
$url = $ref->getAttribute('href');
isolates the URL held in the href attribute of the anchor.
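A minimal sketch of the idea above, assuming the anchors come from getElementsByTagName() on a document loaded from a made-up HTML string. The markup and the variable names are illustrative only. Since getAttribute() is defined on DOMElement, the instanceof check guards against other node types.

```php
<?php
// Illustrative HTML; the markup is an assumption for this example.
$html = '<html><body><a href="http://www.example.com/">Example</a>'
      . '<a href="/about">About</a></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$urls = array();
foreach ($doc->getElementsByTagName('a') as $ref) {
    // getAttribute() lives on DOMElement, not on DOMNode itself.
    if ($ref instanceof DOMElement) {
        $urls[] = $ref->getAttribute('href');
    }
}

print_r($urls);
?>
```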
The DOMNode class
(PHP 5)
Class synopsis
Properties
- nodeName: Returns the most accurate name for the current node type.
- nodeValue: The value of this node, depending on its type.
- nodeType: Gets the type of the node. One of the predefined XML_xxx_NODE constants.
- parentNode: The parent of this node.
- childNodes: A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.
- firstChild: The first child of this node. If there is no such node, this returns NULL.
- lastChild: The last child of this node. If there is no such node, this returns NULL.
- previousSibling: The node immediately preceding this node. If there is no such node, this returns NULL.
- nextSibling: The node immediately following this node. If there is no such node, this returns NULL.
- attributes: A DOMNamedNodeMap containing the attributes of this node (if it is a DOMElement) or NULL otherwise.
- ownerDocument: The DOMDocument object associated with this node.
- namespaceURI: The namespace URI of this node, or NULL if it is unspecified.
- prefix: The namespace prefix of this node, or NULL if it is unspecified.
- localName: Returns the local part of the qualified name of this node.
- baseURI: The absolute base URI of this node or NULL if the implementation wasn't able to obtain an absolute URI.
- textContent: This attribute returns the text content of this node and its descendants.
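As a rough illustration of the navigation properties, the sketch below loads a small XML string (made up for this example) and reads a few of them. It is only meant to show how the properties relate to each other, not to cover every case.

```php
<?php
$doc = new DOMDocument();
// Sample markup with no whitespace between elements, so no extra #text nodes appear.
$doc->loadXML('<list><item>one</item><item>two</item></list>');

$first = $doc->documentElement->firstChild;

$names = array(
    $first->nodeName,                  // name of the first child element
    $first->parentNode->nodeName,      // name of its parent
    $first->nextSibling->textContent,  // text of the following sibling
);
print_r($names);

// There is no node before the first child, so this is NULL.
var_dump($first->previousSibling);

// An element reports the XML_ELEMENT_NODE type.
var_dump($first->nodeType === XML_ELEMENT_NODE);
?>
```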
Notes
Note:
The DOM extension uses UTF-8 encoding. Use utf8_encode() and utf8_decode() to work with texts in ISO-8859-1 encoding or Iconv for other encodings.
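A minimal sketch of the conversion described in the note, assuming the input is ISO-8859-1 and the iconv extension is available. The sample string is made up for this example. iconv() is used here rather than utf8_encode() and utf8_decode(), which are deprecated as of PHP 8.2.

```php
<?php
// "caf\xE9" is "café" encoded as ISO-8859-1 (sample data for this sketch).
$latin1 = "caf\xE9";

// Convert to UTF-8 before passing the text to the DOM extension.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$doc  = new DOMDocument('1.0', 'UTF-8');
$node = $doc->createElement('word');
$node->appendChild($doc->createTextNode($utf8));
$doc->appendChild($node);

// Convert back when the surrounding application expects ISO-8859-1.
$roundTrip = iconv('UTF-8', 'ISO-8859-1', $node->textContent);

var_dump($roundTrip === $latin1);
?>
```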
Table of Contents
- DOMNode::appendChild — Adds new child at the end of the children
- DOMNode::C14N — Canonicalize nodes to a string
- DOMNode::C14NFile — Canonicalize nodes to a file
- DOMNode::cloneNode — Clones a node
- DOMNode::getLineNo — Get line number for a node
- DOMNode::getNodePath — Get an XPath for a node
- DOMNode::hasAttributes — Checks if node has attributes
- DOMNode::hasChildNodes — Checks if node has children
- DOMNode::insertBefore — Adds a new child before a reference node
- DOMNode::isDefaultNamespace — Checks if the specified namespaceURI is the default namespace or not
- DOMNode::isSameNode — Indicates if two nodes are the same node
- DOMNode::isSupported — Checks if feature is supported for specified version
- DOMNode::lookupNamespaceURI — Gets the namespace URI of the node based on the prefix
- DOMNode::lookupPrefix — Gets the namespace prefix of the node based on the namespace URI
- DOMNode::normalize — Normalizes the node
- DOMNode::removeChild — Removes child from list of children
- DOMNode::replaceChild — Replaces a child
Although the missing readonly flag suggests otherwise, you cannot simply overwrite $textContent to replace the text content of a DOMNode. Instead, you have to do something like this:
<?php
$node->removeChild($node->firstChild);
$node->appendChild(new DOMText('new text content'));
?>
This example shows what happens:
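<?php
// loadXML() is an instance method; calling it statically fails on current PHP versions.
$doc = new DOMDocument();
$doc->loadXML('<node>old content</node>');
$node = $doc->getElementsByTagName('node')->item(0);
echo "Content 1: ".$node->textContent."\n";
// Attempt to overwrite the property directly.
$node->textContent = 'new content';
echo "Content 2: ".$node->textContent."\n";
// Append a new text node without removing the old one.
$newText = new DOMText('new content');
$node->appendChild($newText);
echo "Content 3: ".$node->textContent."\n";
// Remove the first child, then append the new text node.
$node->removeChild($node->firstChild);
$node->appendChild($newText);
echo "Content 4: ".$node->textContent."\n";
?>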
The output observed when this note was written is shown below. Later PHP releases made textContent writable, so Content 2 and Content 3 may differ there:
Content 1: old content // starting content
Content 2: old content // trying to replace by overwriting $node->textContent
Content 3: old contentnew content // simply appending the new text node
Content 4: new content // removing firstChild before appending the new text node
If you want to have a CDATA section, use this:
<?php
// loadXML() is an instance method, so create the document first.
$doc = new DOMDocument();
$doc->loadXML('<node>old content</node>');
$node = $doc->getElementsByTagName('node')->item(0);
$node->removeChild($node->firstChild);
$newText = $doc->createCDATASection('new cdata content');
$node->appendChild($newText);
echo "Content with CDATA: ".$doc->saveXML($node)."\n";
?>
Just discovered that $node->nodeValue strips out all the tags: for an element, it returns only the concatenated text of its descendants.
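To see the difference, here is a small sketch (the markup is made up for this example) that compares nodeValue with the serialized node. Passing the node to saveXML() is one way to get the tags back.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p>This is <b>bold</b> text.</p>');
$p = $doc->documentElement;

// nodeValue concatenates the descendant text and drops the markup.
$text = $p->nodeValue;

// saveXML() with a node argument serializes that node, tags included.
$markup = $doc->saveXML($p);

echo $text, "\n";
echo $markup, "\n";
?>
```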
For a reference with more information about the XML DOM node types, see http://www.w3schools.com/dom/dom_nodetype.asp
(When using PHP DOMNode, these constants need to be prefaced with "XML_")
For clarification:
The seemingly undocumented methods that previous posters 'discovered' on this class (getElementsByTagName and getAttribute) are in fact methods of the class DOMElement, which inherits from DOMNode.
See: http://www.php.net/manual/en/class.domelement.php
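A short sketch of how this plays out in practice, using a made-up XML string: both nodes below are DOMNodes, but only the element is a DOMElement, so element-only methods are best guarded with instanceof.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root id="r1">text</root>');

$element = $doc->documentElement;  // a DOMElement
$text    = $element->firstChild;   // a DOMText

var_dump($element instanceof DOMNode);     // every node is a DOMNode
var_dump($text instanceof DOMNode);
var_dump($element instanceof DOMElement);  // only elements are DOMElements
var_dump($text instanceof DOMElement);

// Guard element-only methods such as getAttribute().
$id = ($element instanceof DOMElement) ? $element->getAttribute('id') : null;
echo $id, "\n";
?>
```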
It took me forever to find a mapping for the XML_*_NODE constants, so I thought it'd be handy to paste it here:
1 XML_ELEMENT_NODE
2 XML_ATTRIBUTE_NODE
3 XML_TEXT_NODE
4 XML_CDATA_SECTION_NODE
5 XML_ENTITY_REFERENCE_NODE
6 XML_ENTITY_NODE
7 XML_PROCESSING_INSTRUCTION_NODE
8 XML_COMMENT_NODE
9 XML_DOCUMENT_NODE
10 XML_DOCUMENT_TYPE_NODE
11 XML_DOCUMENT_FRAGMENT_NODE
12 XML_NOTATION_NODE
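To check the mapping against a real tree, the sketch below (with markup made up for this example) records the nodeType of each child of a small document and compares a couple of nodes with the named constants.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root a="1">text<!-- note --><![CDATA[raw]]></root>');

// Map each child's nodeName to its numeric nodeType.
$types = array();
foreach ($doc->documentElement->childNodes as $child) {
    $types[$child->nodeName] = $child->nodeType;
}
print_r($types);

// Comparing against the constants reads better than using bare numbers.
var_dump($doc->nodeType === XML_DOCUMENT_NODE);
var_dump($doc->documentElement->nodeType === XML_ELEMENT_NODE);
?>
```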
And apparently there is a setAttribute method too (on DOMElement nodes):
$node->setAttribute( 'attrName' , 'value' );
The issues around mixed content took me some experimentation to remember, so I thought I'd add this note to save others time.
When your markup is something like: <div><p>First text.</p><ul><li><p>First bullet</p></li></ul></div>, you'll get XML_ELEMENT_NODEs that are quite regular. The <div> has children <p> and <ul> and the nodeValue for both <p>s yields the text you expect.
But when your markup is more like <p>This is <b>bold</b> and this is <i>italic</i>.</p>, you realize that the nodeValue for XML_ELEMENT_NODEs is not reliable. In this case, you need to look at the <p>'s child nodes. For this example, the <p> has children: #text, <b>, #text, <i>, #text.
In this example, the nodeValue of <b> and <i> is the same as their #text children. But you could have markup like: <p>This <b>is bold and <i>bold italic</i></b>, you see?</p>. In this case, you need to look at the children of <b>, which will be #text, <i>, because the nodeValue of <b> will not be sufficient.
XML_TEXT_NODEs have no children and are always named '#text'. Depending on how whitespace is handled, your tree may have "empty" #text nodes as children of <body> and elsewhere.
Attributes are nodes, but I had forgotten that they are not in the tree expressed by childNodes. Walking the full tree using childNodes will not visit any attribute nodes.
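The points above can be sketched as follows, using the nested markup from this note plus a class attribute added for illustration. The describe() helper is hypothetical, written only for this example. It walks childNodes recursively, and the attributes are read separately because the walk never reaches them.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p class="intro">This <b>is bold and <i>bold italic</i></b>, you see?</p>');

// Hypothetical helper: returns one indented line per node in the subtree.
function describe(DOMNode $node, $depth = 0)
{
    $lines = array(str_repeat('  ', $depth) . $node->nodeName);
    // Text nodes have no children, so check before iterating.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describe($child, $depth + 1));
        }
    }
    return $lines;
}

$lines = describe($doc->documentElement);
echo implode("\n", $lines), "\n";

// Attributes are reached through ->attributes, not ->childNodes.
$attrNames = array();
foreach ($doc->documentElement->attributes as $attr) {
    $attrNames[] = $attr->nodeName;
}
print_r($attrNames);
?>
```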
This class apparently also has a getElementsByTagName method.
I was able to confirm this by evaluating the output from DOMNodeList->item() against various tests with the is_a() function.
Try canonicalization:
<?php
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com/');
echo $dom->documentElement->C14N();
?>
Or write it to a file using C14NFile().
Undocumented stuff ;)
If $node->textContent and $node->nodeValue are empty, check whether the loaded document is UTF-8 encoded.
getAttribute() returns an empty string if the requested attribute doesn't exist in the node.
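Because of this, an empty attribute and a missing attribute look the same through getAttribute(). A small sketch with made-up markup, using hasAttribute() to tell them apart:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<a href="" title="Home">Home</a>');
$a = $doc->documentElement;

$emptyHref  = $a->getAttribute('href');  // attribute present but empty
$missingRel = $a->getAttribute('rel');   // attribute not present at all

var_dump($emptyHref);
var_dump($missingRel);

// hasAttribute() distinguishes the two cases.
var_dump($a->hasAttribute('href'));
var_dump($a->hasAttribute('rel'));
?>
```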
