utf8_encode

(PHP 4, PHP 5, PHP 7, PHP 8)

utf8_encode - Converts a string from ISO-8859-1 to UTF-8

Warning

This function has been deprecated as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

#[\Deprecated]
function utf8_encode(string $string): string

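This function converts the string string from the ISO-8859-1 encoding to UTF-8.

Note:
This function does not attempt to guess the current character encoding of the provided string. Instead, it assumes the string is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this function never results in an error. However, it will not produce a useful result if a different encoding was intended.

Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers interpret ISO-8859-1 web pages as Windows-1252. In place of certain ISO-8859-1 control characters, Windows-1252 adds printable characters such as the Euro sign (€) and curly quotes (“ ”). This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.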
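Because the input encoding is never checked, a string that is already UTF-8 is silently encoded a second time. The following is a minimal sketch of that failure and of one possible guard. It uses mb_convert_encoding() rather than the deprecated function and assumes the mbstring extension is available; to_utf8_once() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): convert ISO-8859-1 input to UTF-8,
// but leave input that already validates as UTF-8 untouched.
function to_utf8_once(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$utf8 = "Zo\xC3\xAB"; // 'Zoë' already encoded as UTF-8

// Treating UTF-8 bytes as ISO-8859-1 encodes them a second time.
echo bin2hex(mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1')), "\n"; // 5a6fc383c2ab

// The guard returns valid UTF-8 unchanged and converts ISO-8859-1 bytes.
echo bin2hex(to_utf8_once($utf8)), "\n";    // 5a6fc3ab
echo bin2hex(to_utf8_once("Zo\xEB")), "\n"; // 5a6fc3ab
```

The guard is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so knowing the real source encoding is always preferable to detecting it.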

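Parameters

string: An ISO-8859-1 string.

Return Values

Returns string converted to UTF-8.

Changelog

Version	Description
8.2.0	This function has been deprecated.
7.2.0	This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.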

Examples

Example #1 Basic example

<?php
// Convert the string 'Zoë' from ISO 8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab
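
The output above follows from how ISO-8859-1 maps onto Unicode: each byte value equals the code point of the character it represents. Bytes below 0x80 are therefore copied unchanged, while bytes from 0x80 to 0xFF become two-byte UTF-8 sequences. The following sketch illustrates that mapping in plain PHP. latin1_to_utf8_sketch() is a hypothetical name, and the code is for illustration only, not PHP's internal implementation.

```php
<?php
// Hypothetical illustration (not part of PHP) of the ISO-8859-1 to UTF-8 mapping.
function latin1_to_utf8_sketch(string $bytes): string
{
    $out = '';
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $code = ord($bytes[$i]);
        if ($code < 0x80) {
            // ASCII range: identical in ISO-8859-1 and UTF-8.
            $out .= $bytes[$i];
        } else {
            // U+0080..U+00FF: lead byte 110xxxxx, continuation byte 10xxxxxx.
            $out .= chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
        }
    }
    return $out;
}

echo bin2hex(latin1_to_utf8_sketch("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```

Here 0xEB ('ë') splits into the lead byte 0xC3 and the continuation byte 0xAB, which is why the three input bytes yield four output bytes.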

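Notes

Note: This function is deprecated. The alternatives are described below.

As of PHP 8.2.0, this function is deprecated and is scheduled for removal in a future version. Existing uses should be checked and replaced with appropriate alternatives.

Similar functionality can be achieved with mb_convert_encoding(), which supports ISO-8859-1 and many other character encodings.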
<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same string in ISO-8859-7 represents 'λ' (Greek lower-case lambda)
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (Euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>
The above example will output:

c3ab
cebb
e282ac

Other options, depending on the extensions installed, are UConverter::transcode() and iconv().

The following all give the same result:
<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab
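
Since mb_convert_encoding(), UConverter::transcode() and iconv() each depend on an extension being installed, portable code may want to check for them at run time. The following sketch shows one way this could look. convert_latin1_to_utf8() is a hypothetical helper name, and the order of preference is an arbitrary choice.

```php
<?php
// Hypothetical helper (not part of PHP): use whichever converter is installed.
function convert_latin1_to_utf8(string $bytes): string
{
    $result = false;
    if (function_exists('mb_convert_encoding')) {
        $result = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
    } elseif (class_exists('UConverter')) {
        $result = UConverter::transcode($bytes, 'UTF8', 'ISO-8859-1');
    } elseif (function_exists('iconv')) {
        $result = iconv('ISO-8859-1', 'UTF-8', $bytes);
    }

    // The converters can return false; fail loudly instead of passing it on.
    if (!is_string($result)) {
        throw new RuntimeException('No converter is available, or the conversion failed');
    }
    return $result;
}

echo bin2hex(convert_latin1_to_utf8("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```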

See Also

utf8_decode() - Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
mb_convert_encoding() - Convert a string from one character encoding to another
UConverter::transcode() - Convert a string from one character encoding to another
iconv() - Convert a string from one character encoding to another


User Contributed Notes 3 notes

deceze at gmail dot com

15 years ago

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.

Aidan Kehoe <php-manual at parhasard dot net>

21 years ago

Here's some code that addresses the issue that Steven describes in the previous comment:

<?php

/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
   as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
   the UTF-8 encoding of the non-control characters that Windows-1252 places
   at the equivalent code points. */

$cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
    "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
    "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
    "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
    "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
    "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
    "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
    "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
    "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
    "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
    "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
    "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
    "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
    "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
    "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
    "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
    "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
    "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
    "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
    "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

    "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
    "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
    "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
    "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION*/
    "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
    "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
    "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS*/
);

function cp1252_to_utf8($str) {
    global $cp1252_map;
    return strtr(utf8_encode($str), $cp1252_map);
}

?>

Mark AT modernbill DOT com

21 years ago

If you haven't guessed already: If the UTF-8 character has no representation in the ISO-8859-1 codepage, a ? will be returned. You might want to wrap a function around this to make sure you aren't saving a bunch of ???? into your database.
