Quick meta data grabber
[code]
// call get_meta_tags() once and reuse the result; @ hides the warning for unreachable pages
$metadata = @get_meta_tags('http://'.$_POST['pagina']);
if($metadata){
print '<font class="midden">Meta data from http://'.htmlspecialchars($_POST['pagina']).'</font>';
echo '<table width="100%">';
print '<tr><td>Meta</td><td>Value</td></tr>';
foreach($metadata as $naam => $waarde){
// escape values taken from the remote page before echoing them into our own HTML
echo '<tr><td valign="top">'.htmlspecialchars($naam).'</td><td>'.htmlspecialchars($waarde).'</td></tr>';
}
print '</table>';
}else{
print '
<div class="red_h">Incorrect</div>
';
}
[/code]
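The snippet above builds the URL straight from `$_POST['pagina']`. A minimal sketch of checking the posted value first, assuming the same `pagina` form field; `pagina_to_url()` is a hypothetical helper name. It only checks that the result is a well-formed http:// URL with a host, it does not decide whether that host is safe to fetch.

```php
<?php
// Hypothetical helper: build the http:// URL from the posted 'pagina' value,
// or return null when the result is not a well-formed URL with a host.
function pagina_to_url(string $pagina): ?string
{
    $url = 'http://' . trim($pagina);
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // parse_url() extracts the host part; reject URLs without one.
    return parse_url($url, PHP_URL_HOST) ? $url : null;
}

// Usage: only call get_meta_tags() when the input passed the check.
$pagina = $_POST['pagina'] ?? '';
$url = is_string($pagina) ? pagina_to_url($pagina) : null;
if ($url === null) {
    print '<div class="red_h">Incorrect</div>';
} else {
    $metadata = @get_meta_tags($url);
}
```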
get_meta_tags
diel at caroes dot be
16-Jan-2008 10:38
dev at No_SpAm dot phpartist dot com
03-Feb-2007 07:05
Please note that get_meta_tags() and get_headers() return an error when you try to connect to a secure (https) site if you haven't enabled SSL support (the OpenSSL extension). The same error occurs with PHP socket functions such as fsockopen().
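A quick way to check this before fetching anything is to ask PHP which stream wrappers and socket transports it has. A small sketch; the two function names are made up for illustration, while `extension_loaded()`, `stream_get_wrappers()` and `stream_get_transports()` are standard PHP functions.

```php
<?php
// True when this PHP build can open https:// URLs, which
// get_meta_tags(), get_headers() and file_get_contents() rely on.
function https_supported(): bool
{
    // The https:// stream wrapper is only registered when OpenSSL support is available.
    return extension_loaded('openssl') && in_array('https', stream_get_wrappers(), true);
}

// fsockopen('ssl://host', 443) needs the ssl:// socket transport instead.
function ssl_transport_supported(): bool
{
    return in_array('ssl', stream_get_transports(), true);
}

if (!https_supported()) {
    echo "No https:// wrapper: enable the openssl extension before fetching https sites.\n";
}
```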
roganty at gmail dot com
19-Aug-2006 03:33
This is a slight amendment to the function by jimmyxx at gmail dot com.
When I tried the regex in his code, PHP threw a couple of errors.
Below is a corrected regular expression that works.
(Please note that I had to split the regex into strings because php.net was complaining about the line being too long.)
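<?php
// $html holds the page source; the final [^>]* (rather than [^>]+) also
// matches tags that end right after the closing quote, e.g. content="x">
preg_match_all(
"|<meta[^>]+name=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>|i",
$html, $out, PREG_PATTERN_ORDER);
?>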
The problem was caused by incorrectly escaped quotes.
I hope this helps anyone who has had problems with his code.
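To check the pattern without fetching a page, it can be run on a sample string. This sketch uses a single-quoted PHP string, so the double quotes inside the pattern need no escaping at all, and ends with `[^>]*>` so tags without a trailing space or slash also match. The sample HTML is made up.

```php
<?php
// Made-up markup standing in for a fetched page.
$html = '<head><meta name="description" content="A demo page" />'
      . '<meta name="keywords" content="php, meta"></head>';

// Single quotes: the " characters inside the pattern need no backslashes.
preg_match_all(
    '|<meta[^>]+name="([^"]*)"[^>]+content="([^"]*)"[^>]*>|i',
    $html, $out, PREG_PATTERN_ORDER
);

// PREG_PATTERN_ORDER: $out[1] holds all names, $out[2] all content values.
$meta = array_combine(array_map('strtolower', $out[1]), $out[2]);
print_r($meta);
```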
jimmyxx at gmail dot com
20-Sep-2005 12:37
I used get_meta_tags() as part of my mini PHP-based search engine, and it really slowed the whole thing down. I wrote this function to read the HTML (just fetch the file or use something like Snoopy) and extract the meta data with a simple regex; it works a treat and made my crawler much faster:
<?php
function get_meta_data($html) {
$meta = array(); // so the function returns an array even when nothing matches
// quote escaping fixed as in roganty's note above; [^>]* at the end also accepts tags ending in content="...">
preg_match_all(
"|<meta[^>]+name=\"([^\"]*)\"[^>]+content=\"([^\"]*)\"[^>]*>|i", $html, $out, PREG_PATTERN_ORDER);
for ($i=0;$i < count($out[1]);$i++) {
// loop through the meta data - add your own tags here if you need
if (strtolower($out[1][$i]) == "keywords") $meta['keywords'] = $out[2][$i];
if (strtolower($out[1][$i]) == "description") $meta['description'] = $out[2][$i];
}
return $meta;
}
?>
mariano at cricava dot com
12-Sep-2005 05:18
Based on Michael Knapp's code, and adding some regex, here's a function that will get all meta tags and the title for a URL. If there's an error, it will return false. Using the function getUrlContents(), also included, it takes care of META REFRESH redirections, following up to the specified number of redirections. Please note that the regular expressions included were split into strings because php.net was complaining about the line being too long ;)
<?php
function getUrlData($url)
{
$result = false;
$contents = getUrlContents($url);
if (isset($contents) && is_string($contents))
{
$title = null;
$metaTags = null;
preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );
if (isset($match) && is_array($match) && count($match) > 0)
{
$title = strip_tags($match[1]);
}
preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
if (isset($match) && is_array($match) && count($match) == 3)
{
$originals = $match[0];
$names = $match[1];
$values = $match[2];
if (count($originals) == count($names) && count($names) == count($values))
{
$metaTags = array();
for ($i=0, $limiti=count($names); $i < $limiti; $i++)
{
$metaTags[$names[$i]] = array (
'html' => htmlentities($originals[$i]),
'value' => $values[$i]
);
}
}
}
$result = array (
'title' => $title,
'metaTags' => $metaTags
);
}
return $result;
}
function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
$result = false;
$contents = @file_get_contents($url);
// Check if we need to go somewhere else
if (isset($contents) && is_string($contents))
{
preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
{
if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
{
return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
}
$result = false;
}
else
{
$result = $contents;
}
}
return $result; // false when fetching failed or the redirection limit was reached
}
?>
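The refresh regex in getUrlContents() expects no spaces around `=` and double (or no) quotes. A slightly more tolerant sketch of just the detection step, still assuming `http-equiv` comes before `content`; `find_meta_refresh()` is a hypothetical name. The URL it returns may be relative, in which case it has to be resolved against the page URL before fetching it.

```php
<?php
// Hypothetical helper: return the target of a <meta http-equiv="refresh"> tag,
// or null when $html has none. Allows spaces around "=", single or double
// quotes and any letter case; http-equiv must still come before content.
function find_meta_refresh(string $html): ?string
{
    $pattern = '/<\s*meta\s+http-equiv\s*=\s*["\']?refresh["\']?\s+'
             . 'content\s*=\s*["\']?\s*\d*\s*;\s*url\s*=\s*([^"\'>\s]+)/i';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

var_dump(find_meta_refresh('<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://example.org/new/">'));
var_dump(find_meta_refresh('<meta name="description" content="no redirect here">'));
```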
Here's an example of its usage. Note that the URL used here has a META REFRESH redirection:
<?php
$result = getUrlData('http://www.marianoiglesias.com.ar/');
echo '<pre>'; print_r($result); echo '</pre>';
?>
For the above code the output would be:
<?php
Array
(
[title] => Mariano Iglesias: El Eternauta
[metaTags] => Array
(
[description] => Array
(
[html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
[value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
)
[DC.title] => Array
(
[html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
[value] => Mariano Iglesias - Weblog
)
[ICBM] => Array
(
[html] => <meta name="ICBM" content="-34.6017, -58.3956" />
[value] => -34.6017, -58.3956
)
[geo.position] => Array
(
[html] => <meta name="geo.position" content="-34.6017;-58.3956" />
[value] => -34.6017;-58.3956
)
[geo.region] => Array
(
[html] => <meta name="geo.region" content="AR-BA">
[value] => AR-BA
)
[geo.placename] => Array
(
[html] => <meta name="geo.placename" content="Buenos Aires">
[value] => Buenos Aires
)
)
)
?>
Michael Knapp
12-Mar-2005 12:47
Tim's code is good (thanks Tim), except it won't work very well if the tag is part of a long non-breaking string.
E.g. try getting the title from Google Maps (http://www.google.com/maps).
A better solution is:
<?php
$title = "";
// validate $_POST['url'] first: fopen() will also open local files
if ($fp = @fopen( $_POST['url'], 'r' )) {
$cont = "";
// read the contents
while( !feof( $fp ) ) {
$buf = trim(fgets( $fp, 4096 )) ;
$cont .= $buf;
}
fclose( $fp );
// get tag contents; the lazy (.*?) accepts any characters up to the first </title>
if (preg_match( "/<title>(.*?)<\/title>/si", $cont, $match )) {
// tag contents
$title = strip_tags($match[ 1 ]);
}
}
?>
Note the strip_tags(). Also be careful with ", < and >: you will need to strip or escape them (e.g. with htmlspecialchars()) if you are posting the output to a form.
Also, it is probably best to use the /i modifier, because some people might write <TITLE> etc.
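The advice about ", < and > can be followed with htmlspecialchars() instead of stripping characters. A minimal sketch, assuming the page has already been read into a string; `title_for_form()` is a hypothetical name. It uses a lazy `(.*?)` so punctuation in the title does not stop the match.

```php
<?php
// Hypothetical helper: extract the <title> text and make it safe to echo
// into a form field such as <input value="...">.
function title_for_form(string $html): string
{
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $m)) {
        return '';
    }
    // Collapse internal whitespace (titles often span several lines).
    $title = trim(preg_replace('/\s+/', ' ', strip_tags($m[1])));
    // Decode entities from the page, then re-escape &, ", ', < and > for the form.
    return htmlspecialchars(html_entity_decode($title, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

echo title_for_form("<HTML><TITLE>Maps &amp; \"Directions\"\n</TITLE></HTML>"), "\n";
```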
rehfeld
05-Feb-2005 06:45
In response to jp at webgraphe dot com:
this function grabs meta tags, not HTTP headers.
If you need the headers:
<?php
$fp = fopen('http://example.org/somepage.html', 'r');
// the variable $http_response_header magically appears
print_r($http_response_header);
// or
$meta_data = stream_get_meta_data($fp);
print_r($meta_data);
?>
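Regarding the redirect problem jp describes: as far as I know, when the http:// wrapper follows redirects, `$http_response_header` holds the header lines of every response in the chain, each starting with its own `HTTP/...` status line. A sketch of splitting them up; `response_hops()` is a hypothetical helper and the header lines in the usage are made up.

```php
<?php
// Hypothetical helper: split header lines shaped like $http_response_header
// into one entry per response, recording its status code and Location header.
function response_hops(array $headers): array
{
    $hops = [];
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // Each status line starts a new response in the redirect chain.
            $hops[] = ['status' => (int) $m[1], 'location' => null];
        } elseif ($hops && stripos($line, 'Location:') === 0) {
            $hops[count($hops) - 1]['location'] = trim(substr($line, strlen('Location:')));
        }
    }
    return $hops;
}

// Made-up headers for one redirect followed by the final page.
$headers = [
    'HTTP/1.1 302 Found',
    'Location: sections.php?section=home',
    'Content-Length: 0',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
print_r(response_hops($headers));
```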
tim dot bennett at haveaniceplay dot com
01-Feb-2005 07:43
If you want to get the contents of tags other than meta you can use:
<?php
$page = "http://www.mysite.com/apage.php";
// tags (the slash in the closing tag is escaped for the / regex delimiter)
$start = '<atag>';
$end = '<\/atag>';
// open the file
$fp = fopen( $page, 'r' );
$cont = "";
if ( $fp ) {
// read the contents
while( !feof( $fp ) ) {
$buf = trim( fgets( $fp, 4096 ) );
$cont .= $buf;
}
fclose( $fp );
}
// get tag contents; the lazy (.*?) stops at the first closing tag instead of the last one
$contents = preg_match( "/$start(.*?)$end/s", $cont, $match ) ? $match[ 1 ] : '';
?>
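For tags other than meta, an HTML parser sidesteps the greedy-match and line-joining pitfalls of a regex. A sketch using PHP's DOMDocument (the dom extension, enabled in most builds) in place of preg_match(); `tag_texts()` is a hypothetical name and the sample markup is made up.

```php
<?php
// Hypothetical helper: return the text of every <$tag> element in $html.
function tag_texts(string $html, string $tag): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $texts[] = $node->textContent; // text of the element and all its children
    }
    return $texts;
}

print_r(tag_texts('<p>one</p><div><p>two <b>bold</b></p></div>', 'p'));
```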
jp at webgraphe dot com
12-Dec-2003 05:37
If the URL redirects using headers (as you would with PHP's header("Location: URL");), the page generally has no content. It appears get_meta_tags() doesn't follow that kind of redirection (as cURL would), and it led to a timeout in my script.
I ran into this in a spider I wrote to fill my database with all the pages available on my site: one link pointed to a page that simply contained the following code:
<?php
header("Location: sections.php?section=home");
exit();
?>
That made my script hang for a while, and get_meta_tags() apparently did not even return an error.
JP.
bill.neumann at hatworld.com
13-Mar-2003 07:50
get_meta_tags() does not seem to be able to grab values if there are spaces between the attribute name, the equals sign, and the opening quote.
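If the markup is already in a string, DOMDocument's HTML parser accepts spaces around `=` as well as unquoted attribute values. A sketch of collecting name/content pairs that way; `dom_meta_tags()` is a hypothetical name, and unlike get_meta_tags() it only lower-cases the names rather than also replacing special characters with `_`.

```php
<?php
// Hypothetical helper: map lower-cased meta names to their content values.
function dom_meta_tags(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->hasAttribute('name')) {
            $tags[strtolower($meta->getAttribute('name'))] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

print_r(dom_meta_tags('<head><meta name = "description" content = "Spaced out">'
    . '<meta name=keywords content=php></head>'));
```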
20-Dec-2001 11:01
Tested with PHP 4.0.6.
get_meta_tags() seems to look only at the beginning of a file, so if there is a lot of PHP code before the HTML header, it returns nothing.
I tested get_meta_tags() on local files with about 9000 characters of PHP code before the HTML header.
Workaround: if possible, move the code after the header; if not, put it in an included file.
Ben dot Davis at furman dot edu
30-Jul-2001 03:29
I have found that for large searches, get_meta_tags() is very slow. I created a large search engine for a website that couldn't use a database, and I first tried pulling out the meta tags with get_meta_tags().
I have found that it is actually much faster to use eregi to pull out the meta tags (eregi() has since been removed in PHP 7; preg_match() with the /i modifier does the same job). The code below pulls out the description:
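// eregi() was removed in PHP 7; preg_match() with the /i modifier is the replacement
if (preg_match('/<meta name="description" content="([^"]*)"/i', $contents, $descresult))
{
$description = $descresult[1]; // the captured content value, without the quotes
echo '<font face="Arial" size=2>' . htmlspecialchars($description) . '</font>';
}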
richard at pifmagazine dot com
21-Apr-2000 05:17
Something that is not mentioned above and should be: when using get_meta_tags() on a remote PHP page, the page is executed by the remote server before the meta tags are returned, so you can capture meta tags generated dynamically (e.g. by PHP) on the remote end.
This does NOT work the same way for files on the local file system. Local files are not passed through the web server before get_meta_tags() reads them. If the META tag is hard-coded into the page you'll be fine, but if it is dynamically generated you will not be able to capture it unless you use the full URL when calling your local files.
richard at pifmagazine dot com
21-Apr-2000 05:00
An important note about META tags and this function: if your META tag contains newline ("\n") characters, get_meta_tags() will return a NULL value for that name property. Removing the newlines from the source META tag corrects the problem.
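When the source page can't be edited, the newlines can be removed on your side instead. A sketch, assuming the page has already been fetched into a string; `flatten_meta_newlines()` is a hypothetical name, and the temporary file is only there because get_meta_tags() reads from a file name or URL rather than a string.

```php
<?php
// Hypothetical helper: replace line breaks (and the spaces around them)
// inside every <meta ...> tag with a single space.
function flatten_meta_newlines(string $html): string
{
    return preg_replace_callback(
        '/<meta\b[^>]*>/i',
        function (array $m): string {
            return preg_replace('/\s*[\r\n]+\s*/', ' ', $m[0]);
        },
        $html
    );
}

$html = "<html><head><meta name=\"description\"\n  content=\"Line one\nline two\"></head></html>";

// Write the cleaned HTML to a temporary file so get_meta_tags() can read it.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, flatten_meta_newlines($html));
$tags = get_meta_tags($file);
unlink($file);
print_r($tags);
```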
