Last updated: Fri, 25 May 2012

mb_check_encoding

(PHP 4 >= 4.4.3, PHP 5 >= 5.1.3)

mb_check_encoding - Check if the string is valid for the specified encoding

Description

bool mb_check_encoding ([ string $var = NULL [, string $encoding = mb_internal_encoding() ]] )

Checks if the specified byte stream is valid for the specified encoding. It is useful for preventing so-called "Invalid Encoding Attacks".

Parameters

var

The byte stream to check. If it is omitted, this function checks all the input from the beginning of the request.

encoding

The expected encoding.

Return Values

Returns TRUE if the byte stream is valid for the specified encoding, or FALSE otherwise.
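
As an illustration of the description above, here is a minimal sketch of rejecting malformed input before it is used. The helper name `require_valid_encoding` and the sample strings are assumptions made for this example, not part of the mbstring API, and the sketch assumes a PHP version recent enough that invalid UTF-8 is actually reported as invalid.

```php
<?php
// Hypothetical helper (not part of mbstring): return the input unchanged
// if it is valid in the expected encoding, otherwise throw.
function require_valid_encoding($input, $encoding = 'UTF-8')
{
    if (!mb_check_encoding($input, $encoding)) {
        throw new InvalidArgumentException("Input is not valid $encoding");
    }
    return $input;
}

// Sample byte streams chosen for illustration only.
$samples = array(
    'ascii'    => 'hello',
    'hiragana' => "\xE3\x81\x82", // U+3042 encoded as UTF-8
    'bad byte' => "abc\xFF",      // 0xFF never occurs in UTF-8
);

foreach ($samples as $label => $bytes) {
    echo $label, ': ', mb_check_encoding($bytes, 'UTF-8') ? 'valid' : 'invalid', "\n";
}
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() happens to return at the time of the call.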



User Contributed Notes
richard at phase dot org 08-May-2012 03:40
The issue whereby mb_check_encoding($string, 'UTF-8') falsely returns true for invalid UTF-8 byte sequences was resolved somewhere between PHP 5.2.0 and 5.2.6.

The following equivalence seems to work in PHP 5.2.0 and 5.1.6:
$valid_utf8 = (@iconv('UTF-8','UTF-8',$string) === $string);

 (with apologies for the @)
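
The comparison above can be wrapped in a small function, sketched here. The name `is_valid_utf8_iconv` is hypothetical, and the sketch assumes the iconv extension is loaded and that iconv() returns false when it meets an illegal or incomplete sequence.

```php
<?php
// Hypothetical wrapper name; the comparison is the one from the note.
function is_valid_utf8_iconv($string)
{
    // On an illegal or incomplete sequence iconv() returns false and
    // raises a notice (silenced by @), so the strict comparison fails.
    return @iconv('UTF-8', 'UTF-8', $string) === $string;
}

var_dump(is_valid_utf8_iconv("caf\xC3\xA9")); // "é" encoded as the two bytes C3 A9
var_dump(is_valid_utf8_iconv("caf\xE9"));     // a lone Latin-1 "é" byte
```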
javalc6 at gmail dot com 24-Dec-2009 04:52
To check whether a string is correctly encoded in UTF-8, I suggest the following function, which implements RFC 3629 better than mb_check_encoding():

<?php
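function check_utf8($str) {
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        // >= 128 (not > 128) so that a lone 0x80 byte is rejected as well
        if ($c >= 128) {
            // Lead byte: work out how many bytes the sequence occupies
            if ($c > 247) return false;       // 0xF8-0xFF never start a sequence
            elseif ($c > 239) $bytes = 4;     // 0xF0-0xF7
            elseif ($c > 223) $bytes = 3;     // 0xE0-0xEF
            elseif ($c > 191) $bytes = 2;     // 0xC0-0xDF
            else return false;                // 0x80-0xBF: stray continuation byte
            if (($i + $bytes) > $len) return false; // sequence runs past the end
            // Each remaining byte must be a continuation byte (0x80-0xBF)
            while ($bytes > 1) {
                $i++;
                $b = ord($str[$i]);
                if ($b < 128 || $b > 191) return false;
                $bytes--;
            }
        }
    }
    return true;
}
// end of check_utf8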
?>
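
For comparison, here is a sketch of a different technique from the byte loop above: letting PCRE validate the string through the `/u` modifier. The helper name `is_strict_utf8` and the sample inputs are assumptions for illustration. The samples cover sequence classes that RFC 3629 forbids: overlong forms, UTF-16 surrogates, and code points above U+10FFFF.

```php
<?php
// Hypothetical helper; relies on PCRE's own UTF-8 validation under /u.
function is_strict_utf8($str)
{
    // The empty pattern matches any subject. With /u, preg_match()
    // returns false instead of 1 when the subject is not well-formed UTF-8.
    return preg_match('//u', $str) === 1;
}

// Sample byte streams chosen for illustration only.
$cases = array(
    'plain ASCII'      => "abc",
    'U+3042 (3 bytes)' => "\xE3\x81\x82",
    'overlong NUL'     => "\xC0\x80",
    'UTF-16 surrogate' => "\xED\xA0\x80",
    'above U+10FFFF'   => "\xF4\x90\x80\x80",
);

foreach ($cases as $label => $bytes) {
    echo str_pad($label, 18), is_strict_utf8($bytes) ? 'valid' : 'invalid', "\n";
}
```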
jbricci at ya-right dot com 01-Mar-2009 06:52
This function does not check for bad byte sequences; it only checks whether the byte stream is valid. If you want to verify that an encoded string is valid (i.e. does not contain any bad byte sequences), do the following:

<?php

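/* check a string's encoded value */

function checkEncoding ( $string, $string_encoding )
{
    // choose the pair of encodings used for the round trip
    $fs = $string_encoding == 'UTF-8' ? 'UTF-32' : $string_encoding;
    $ts = $string_encoding == 'UTF-32' ? 'UTF-8' : $string_encoding;

    // convert to $fs and back to $ts; the string passes only if the
    // round trip reproduces it byte for byte
    return $string === mb_convert_encoding ( mb_convert_encoding ( $string, $fs, $ts ), $ts, $fs );
}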

/* test 1 */

/* test 1 variables */

$string = "\x00\x81";

$encoding = "Shift_JIS";

/* test 1 mb_check_encoding (test for bad byte stream) */

if ( true === mb_check_encoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte stream!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte stream!<br />';
}

/* test 1 checkEncoding (test for bad byte sequence(s)) */

if ( true === checkEncoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte sequence!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte sequence!<br />';
}

/* test 2 */

/* test 2 variables */

$string = "\x00\xE3";

$encoding = "UTF-8";

/* test 2 mb_check_encoding (test for bad byte stream) */

if ( true === mb_check_encoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte stream!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte stream!<br />';
}

/* test 2 checkEncoding (test for bad byte sequence(s)) */

if ( true === checkEncoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte sequence!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte sequence!<br />';
}

?>
