downloads | documentation | faq | getting help | mailing lists | licenses | wiki | reporting bugs | php.net sites | conferences | my php.net

search for in the

mb_convert_case> <Multibyte String Functions
[edit] Last updated: Fri, 26 Apr 2013

view this page in

mb_check_encoding

(PHP 4 >= 4.4.3, PHP 5 >= 5.1.3)

mb_check_encodingCheck if the string is valid for the specified encoding

Description

bool mb_check_encoding ([ string $var = NULL [, string $encoding = mb_internal_encoding() ]] )

Checks if the specified byte stream is valid for the specified encoding. It is useful to prevent so-called "Invalid Encoding Attack".

Parameters

var

The byte stream to check. If it is omitted, this function checks all the input from the beginning of the request.

encoding

The expected encoding.

Return Values

Returns TRUE on success or FALSE on failure.



mb_convert_case> <Multibyte String Functions
[edit] Last updated: Fri, 26 Apr 2013
 
add a note add a note User Contributed Notes mb_check_encoding - [3 notes]
up
0
javalc6 at gmail dot com
3 years ago
In order to check if a string is encoded correctly in utf-8, I suggest the following function, that implements the RFC3629 better than mb_check_encoding():

<?php
function check_utf8($str) {
   
$len = strlen($str);
    for(
$i = 0; $i < $len; $i++){
       
$c = ord($str[$i]);
        if (
$c > 128) {
            if ((
$c > 247)) return false;
            elseif (
$c > 239) $bytes = 4;
            elseif (
$c > 223) $bytes = 3;
            elseif (
$c > 191) $bytes = 2;
            else return
false;
            if ((
$i + $bytes) > $len) return false;
            while (
$bytes > 1) {
               
$i++;
               
$b = ord($str[$i]);
                if (
$b < 128 || $b > 191) return false;
               
$bytes--;
            }
        }
    }
    return
true;
}
// end of check_utf8
?>
up
0
jbricci at ya-right dot com
4 years ago
This function does not check for bad byte sequence(s), it only checks if the byte stream is valid. If you want to verify a encoded string is valid, (IE: does not contain any bad byte sequences do the following...

<?php

/* check a strings encoded value */

function checkEncoding ( $string, $string_encoding )
{
   
$fs = $string_encoding == 'UTF-8' ? 'UTF-32' : $string_encoding;

   
$ts = $string_encoding == 'UTF-32' ? 'UTF-8' : $string_encoding;

    return
$string === mb_convert_encoding ( mb_convert_encoding ( $string, $fs, $ts ), $ts, $fs );
}

/* test 1 variables */

$string = "\x00\x81";

$encoding = "Shift_JIS";

/* test 1 mb_check_encoding (test for bad byte stream) */

if ( true === mb_check_encoding ( $string, $encoding ) )
{
    echo
'valid (' . $encoding . ') encoded byte stream!<br />';
}
else
{
    echo
'invalid (' . $encoding . ') encoded byte stream!<br />';
}

/* test 1 checkEncoding (test for bad byte sequence(s)) */

if ( true === checkEncoding ( $string, $encoding ) )
{
    echo
'valid (' . $encoding . ') encoded byte sequence!<br />';
}
else
{
    echo
'invalid (' . $encoding . ') encoded byte sequence!<br />';
}

/* test 2 */

/* test 2 variables */

$string = "\x00\xE3";

$encoding = "UTF-8";

/* test 2 mb_check_encoding (test for bad byte stream) */

if ( true === mb_check_encoding ( $string, $encoding ) )
{
    echo
'valid (' . $encoding . ') encoded byte stream!<br />';
}
else
{
    echo
'invalid (' . $encoding . ') encoded byte stream!<br />';
}

/* test 2 checkEncoding (test for bad byte sequence(s)) */

if ( true === checkEncoding ( $string, $encoding ) )
{
    echo
'valid (' . $encoding . ') encoded byte sequence!<br />';
}
else
{
    echo
'invalid (' . $encoding . ') encoded byte sequence!<br />';
}

?>
up
-1
richard at phase dot org
11 months ago
The issue whereby mb_check_encoding($string,'UTF-8') falsely returns true for invalid UTF8 byte sequences was resolved somewhere between
PHP 5.2.0 and 5.2.6

The following equivalence seems to work in PHP 5.2.0 and 5.1.6
$valid_utf8 = (@iconv('UTF-8','UTF-8',$string) === $string);

 (with apologies for the @)

 
show source | credits | stats | sitemap | contact | advertising | mirror sites