Last updated: Fri, 25 Jul 2008


get_meta_tags

(PHP 4, PHP 5)

get_meta_tags - Extracts all meta tag content attributes from a file and returns an array

Description

array get_meta_tags ( string $filename [, bool $use_include_path ] )

Opens filename and parses it line by line for <meta> tags in the file. The parsing stops at </head>.

Parameters

filename

The path to the HTML file, as a string. This can be a local file or a URL.

Example #1 What get_meta_tags() parses

<meta name="author" content="name">
<meta name="keywords" content="php documentation">
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
</head> <!-- parsing stops here -->
(Pay attention to line endings: PHP uses a native function to parse the input, so a Mac file won't work on Unix.)

use_include_path

Setting use_include_path to TRUE will result in PHP trying to open the file along the standard include path as per the include_path directive. This is used for local files, not URLs.
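
As a rough illustration of the include path lookup, the sketch below creates a throwaway directory, adds it to include_path, and then opens the file by its bare name. The directory and file names are made up for this example.

```php
<?php
// Hypothetical directory and file, created here so the sketch is self-contained.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'meta_include_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . DIRECTORY_SEPARATOR . 'page.html',
    "<head>\n<meta name=\"author\" content=\"name\">\n</head>\n"
);

// Put the directory first on include_path, then look the file up by name only.
$oldPath = set_include_path($dir . PATH_SEPARATOR . get_include_path());
$tags = get_meta_tags('page.html', true);
set_include_path($oldPath);

// Clean up the temporary file and directory.
unlink($dir . DIRECTORY_SEPARATOR . 'page.html');
rmdir($dir);

print_r($tags);
```

Without the second argument, the same call would only look for page.html relative to the current working directory.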

Return Values

Returns an array with all the parsed meta tags.

The value of the name property becomes the key, and the value of the content property becomes the value of the returned array, so you can easily use standard array functions to traverse it or access single values. Special characters in the value of the name property are substituted with '_', and the rest is converted to lower case. If two meta tags have the same name, only the last one is returned.
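
The key rules above can be checked against a local file. This is a minimal sketch with a made-up sample page written to a temporary file, so it needs no network access.

```php
<?php
// Hypothetical sample page; the second "author" tag and the tag after </head>
// are there to show the duplicate and end-of-head rules.
$html = <<<PAGE
<html>
<head>
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
<meta name="author" content="first">
<meta name="author" content="second">
</head>
<body>
<meta name="ignored" content="appears after the closing head tag">
</body>
</html>
PAGE;

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

$tags = get_meta_tags($path);
unlink($path);

// Keys are lowercased, "." becomes "_", the last duplicate wins,
// and nothing after </head> is read.
print_r($tags);
```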

Changelog

Version Description
4.0.5 Added support for unquoted HTML attributes.

Examples

Example #2 What get_meta_tags() returns

<?php
// Assuming the above tags are at www.example.com
$tags = get_meta_tags('http://www.example.com/');

// Notice how the keys are all lowercase now, and how . was
// replaced by _ in the key.
echo $tags['author'];       // name
echo $tags['keywords'];     // php documentation
echo $tags['description'];  // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>



User Contributed Notes
get_meta_tags
diel at caroes dot be
16-Jan-2008 10:38
Quick meta data grabber
<?php
// Fetch the tags once and reuse the result; escape everything echoed back to the page.
$url = 'http://' . $_POST['pagina'];
$metadata = get_meta_tags($url);
if ($metadata) {
    print '<font class="midden">Meta data from ' . htmlspecialchars($url) . '</font>';
    echo '<table width="100%">';
    print '<tr><td>Meta</td><td>Value</td></tr>';
    foreach ($metadata as $naam => $waarde) {
        echo '<tr><td valign="top">' . htmlspecialchars($naam) . '</td><td>' . htmlspecialchars($waarde) . '</td></tr>';
    }
    print '</table>';
} else {
    print '<div class="red_h">Incorrect</div>';
}
?>
dev at No_SpAm dot phpartist dot com
03-Feb-2007 07:05
Please note that get_meta_tags and get_headers return an error when you try to connect to a secure site (https) if you didn't enable SSL support. The same error occurs with PHP socket functions like fsockopen.
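
One way to check for this up front, as a rough sketch: stream_get_wrappers() lists the registered stream wrappers, and https only appears there when SSL support (the openssl extension) is available. The helper name has_stream_wrapper() is made up for this illustration.

```php
<?php
// Hypothetical helper: true when the given stream wrapper (e.g. "https") is registered.
function has_stream_wrapper($scheme)
{
    return in_array(strtolower($scheme), stream_get_wrappers(), true);
}

if (has_stream_wrapper('https')) {
    echo "https URLs can be opened\n";
} else {
    echo "https is unavailable; SSL support is not enabled\n";
}
```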
roganty at gmail dot com
19-Aug-2006 03:33
This is a slight amendment to the function by jimmyxx at gmail dot com.

I tried using the regex displayed in his code, and PHP threw up a couple of errors.

Below is the corrected regular expression
(please note that I had to split the regex into strings because php.net was complaining about the line being too long):
<?php
// The pattern ends in [^>]*> rather than [^>]+> so that tags closing
// right after the content quote, such as content="x">, also match.
preg_match_all(
    "|<meta[^>]+name=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>|i",
    $html, $out, PREG_PATTERN_ORDER);
?>

The problem was due to the quotes being incorrectly escaped.
I hope this helps anyone who has been having problems with his code.
jimmyxx at gmail dot com
20-Sep-2005 12:37
I used this as part of my mini PHP-based search engine, and it really slowed the whole thing down. I wrote this function to read HTML (just fetch the file or use something like Snoopy) and extract the meta data via a simple regex. It works a treat and made my crawler much faster:

<?php

function get_meta_data($html) {

    // Quote escaping corrected as described in roganty's note above.
    preg_match_all(
        "|<meta[^>]+name=\"([^\"]*)\"[^>]+content=\"([^\"]*)\"[^>]*>|i",
        $html, $out, PREG_PATTERN_ORDER);

    // Start from an empty array so the function always returns an array.
    $meta = array();

    for ($i = 0; $i < count($out[1]); $i++) {
        // loop through the meta data - add your own tags here if you need
        if (strtolower($out[1][$i]) == "keywords") $meta['keywords'] = $out[2][$i];
        if (strtolower($out[1][$i]) == "description") $meta['description'] = $out[2][$i];
    }

    return $meta;
}

?>
mariano at cricava dot com
12-Sep-2005 05:18
Based on Michael Knapp's code, and adding some regex, here's a function that will get all meta tags and the title based on a URL. If there's an error, it will return false. It uses the function getUrlContents(), also included, which takes care of META REFRESH redirections, following up to the specified number of redirections. Please note that the regular expressions included were split into strings because php.net was complaining about the line being too long ;)

<?php
function getUrlData($url)
{
    $result = false;

    $contents = getUrlContents($url);

    if (isset($contents) && is_string($contents))
    {
        $title = null;
        $metaTags = null;

        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match);

        if (isset($match) && is_array($match) && count($match) > 0)
        {
            $title = strip_tags($match[1]);
        }

        preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);

        if (isset($match) && is_array($match) && count($match) == 3)
        {
            $originals = $match[0];
            $names = $match[1];
            $values = $match[2];

            if (count($originals) == count($names) && count($names) == count($values))
            {
                $metaTags = array();

                for ($i=0, $limiti=count($names); $i < $limiti; $i++)
                {
                    $metaTags[$names[$i]] = array (
                        'html' => htmlentities($originals[$i]),
                        'value' => $values[$i]
                    );
                }
            }
        }

        $result = array (
            'title' => $title,
            'metaTags' => $metaTags
        );
    }

    return $result;
}

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
    $result = false;

    $contents = @file_get_contents($url);

    // Check if we need to go somewhere else

    if (isset($contents) && is_string($contents))
    {
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);

        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
        {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
            {
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }

            // Redirection limit reached: report failure.
            $result = false;
        }
        else
        {
            $result = $contents;
        }
    }

    // Return $result (not $contents) so that hitting the redirection limit yields false.
    return $result;
}
?>

Here's an example of its usage. Check that the included URL has a META REFRESH redirection:

<?php
$result = getUrlData('http://www.marianoiglesias.com.ar/');

echo '<pre>'; print_r($result); echo '</pre>';

?>

For the above code the output would be:

Array
(
    [title] => Mariano Iglesias: El Eternauta
    [metaTags] => Array
        (
            [description] => Array
                (
                    [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
                    [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
                )

            [DC.title] => Array
                (
                    [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
                    [value] => Mariano Iglesias - Weblog
                )

            [ICBM] => Array
                (
                    [html] => <meta name="ICBM" content="-34.6017, -58.3956" />
                    [value] => -34.6017, -58.3956
                )

            [geo.position] => Array
                (
                    [html] => <meta name="geo.position" content="-34.6017;-58.3956" />
                    [value] => -34.6017;-58.3956
                )

            [geo.region] => Array
                (
                    [html] => <meta name="geo.region" content="AR-BA">
                    [value] => AR-BA
                )

            [geo.placename] => Array
                (
                    [html] => <meta name="geo.placename" content="Buenos Aires">
                    [value] => Buenos Aires
                )

        )

)
Michael Knapp
12-Mar-2005 12:47
Tim's code is good (thanks Tim), except it won't work very well if the tag is part of a long non-breaking string.

E.g. try getting the title from Google Maps (http://www.google.com/maps).

A better solution is:

<?php
$title = "";

if ($fp = @fopen( $_POST['url'], 'r' )) {

    $cont = "";

    // read the contents
    while( !feof( $fp ) ) {
        $buf = trim(fgets( $fp, 4096 )) ;
        $cont .= $buf;
    }

    // get tag contents
    @preg_match( "/<title>([a-z 0-9]*)<\/title>/si", $cont, $match );

    // tag contents
    $title = strip_tags(@$match[ 1 ]);
}

?>

Note the strip_tags. Another thing to be careful of is to check for ", <, and >. You will need to strip those out if you are posting the output to a form.

Also, it is probably best to use the /i modifier, because some people might code <TITLE> etc...
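
For the point about ", <, and >, a minimal sketch of one common approach is to pass the value through htmlspecialchars() before writing it into a form field. The sample title below is made up.

```php
<?php
// Hypothetical title pulled from a remote page; it contains quotes and markup.
$title = 'Maps "beta" <b>preview</b>';

// Drop any tags first, then encode quotes and angle brackets for use in an attribute.
$safe = htmlspecialchars(strip_tags($title), ENT_QUOTES, 'UTF-8');

echo '<input type="text" name="title" value="' . $safe . '">';
```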
rehfeld
05-Feb-2005 06:45
In response to jp at webgraphe dot com:

This function grabs meta tags, not HTTP headers.

If you need the headers:

<?php

$fp = fopen('http://example.org/somepage.html', 'r');

// the variable $http_response_header magically appears
print_r($http_response_header);

// or
$meta_data = stream_get_meta_data($fp);
print_r($meta_data);

?>
tim dot bennett at haveaniceplay dot com
01-Feb-2005 07:43
If you want to get the contents of tags other than meta you can use:

<?php

$page = "http://www.mysite.com/apage.php";

// tags
$start = '<atag>';
$end = '<\/atag>';

// open the file
$fp = fopen( $page, 'r' );

$cont = "";

// read the contents
while( !feof( $fp ) ) {
    $buf = trim( fgets( $fp, 4096 ) );
    $cont .= $buf;
}

// get tag contents
preg_match( "/$start(.*)$end/s", $cont, $match );

// tag contents
$contents = $match[ 1 ];

?>
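
A variation on the same idea, as a sketch only: wrapping the match in a function, quoting the tag name with preg_quote(), and using a non-greedy match so that only the first element is returned. The function name get_tag_contents() is made up for this illustration.

```php
<?php
// Hypothetical helper: return the text between <$tag ...> and </$tag>, or null if absent.
function get_tag_contents($html, $tag)
{
    $name = preg_quote($tag, '/');

    // (.*?) is non-greedy, so matching stops at the first closing tag.
    if (preg_match('/<' . $name . '\b[^>]*>(.*?)<\/' . $name . '>/si', $html, $match)) {
        return $match[1];
    }

    return null;
}

echo get_tag_contents('<html><title>My page</title></html>', 'title');
```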
jp at webgraphe dot com
12-Dec-2003 05:37
If the URL redirects using headers (as you would do with the PHP function header("Location: URL");), the page generally has no content. It appears that get_meta_tags() doesn't follow that kind of redirection (as cURL would), and this caused my script to time out.

I experienced this in a spider I wrote in order to feed my database of all available pages on my site and one link was linking to a page that simply has the following code:

<?php
  header("Location: sections.php?section=home");
  exit();
?>

That made my script hang for a moment and apparently, get_meta_tags() wasn't even able to return me an error.

JP.
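
A possible workaround, sketched under assumptions: download the page with file_get_contents() and an HTTP stream context that sets a timeout and a redirect limit, then let get_meta_tags() parse a local copy. The function name and default values are made up, and this has not been checked against the setup described above.

```php
<?php
// Hypothetical helper: fetch with a timeout and redirect limit, then parse a local copy.
function get_meta_tags_with_timeout($url, $timeoutSeconds = 5, $maxRedirects = 5)
{
    $context = stream_context_create(array(
        'http' => array(
            'timeout'         => $timeoutSeconds, // seconds to wait for the read
            'follow_location' => 1,               // follow Location: redirects
            'max_redirects'   => $maxRedirects,
        ),
    ));

    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        return false; // unreachable, timed out, or too many redirects
    }

    // get_meta_tags() only accepts a file name or URL, so write a temporary copy.
    $path = tempnam(sys_get_temp_dir(), 'meta');
    file_put_contents($path, $html);
    $tags = get_meta_tags($path);
    unlink($path);

    return $tags;
}
```

The caller can then treat a false return as "skip this page" instead of waiting on it.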
bill.neumann at hatworld.com
13-Mar-2003 07:50
The get_meta_tags function does not seem to be able to grab values if there are spaces between the attribute, the equal sign, and the opening quote marks.
20-Dec-2001 11:01
Tested PHP 4.0.6

get_meta_tags() seems to look only in the beginning of a file, meaning that e.g. if there is a lot of PHP code before the HTML header it will return nothing ...
Tested using get_meta_tags() on local files with about 9000 characters of PHP code before HTML HEADER.

Workaround: if possible move code after header or if not: include a file.
Ben dot Davis at furman dot edu
30-Jul-2001 03:29
I have found that for large searches, get_meta_tags is very slow. I created a large search engine for a website that couldn't use a database, and I first tried pulling out the meta tags with it.
I have found that it is actually much faster to use eregi to pull out the meta tags. The code below pulls out the description:

if (eregi ("<meta name=\"description\" content=[^>]*", $contents, $descresult))
{
    $description = explode("<meta name=\"description\" content=", $descresult[0]);
    echo "<font face=\"Arial\" size=2>$description[1]</font>";
}
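
Since the ereg family was removed in PHP 7, the same idea can be sketched with preg_match(). The sample HTML is made up, and the pattern assumes the name attribute comes before content and that both are double-quoted.

```php
<?php
// Hypothetical page source; in practice $contents would hold the fetched HTML.
$contents = '<html><head><META NAME="description" CONTENT="a php manual"></head></html>';

$description = null;

// The /i modifier makes the match case-insensitive, as eregi did.
if (preg_match('/<meta\s+name="description"\s+content="([^"]*)"/i', $contents, $match)) {
    $description = $match[1];
}

echo $description;
```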
richard at pifmagazine dot com
21-Apr-2000 05:17
Something that is not mentioned above and should be: when using get_meta_tags on a remote PHP page, the page will be parsed before the meta tags are returned, so you can capture meta tags generated dynamically (by PHP??) on the remote end.

This DOES NOT work the same way when getting meta tags on local file systems. Local files are not parsed through the web server before being returned to get_meta_tags(). If the META tag is hard-coded into the page, you'll be fine, but if it is dynamically generated you will not be able to capture it unless you use the full URL when calling your local files.
richard at pifmagazine dot com
21-Apr-2000 05:00
An Important Note about META tags and this function :  if your META tag contains newline "\n"  characters, get_meta_tags() will return a NULL value for that name property.  Removing the newlines from the source META tag corrects the problem.
