utf8_encode

(PHP 4, PHP 5, PHP 7, PHP 8)

utf8_encode - Converts a string from ISO-8859-1 to UTF-8

Warning

This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

#[\Deprecated]
function utf8_encode(string $string): string

This function converts the string string from the ISO-8859-1 encoding to UTF-8.
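
As a rough illustration of what this conversion amounts to, the following sketch re-implements it byte by byte. The function name latin1_to_utf8_sketch() is hypothetical and not part of PHP; the sketch relies only on the fact that ISO-8859-1 byte values coincide with the Unicode code points U+0000 to U+00FF.

```php
<?php
// Hypothetical helper, for illustration only; not part of PHP.
function latin1_to_utf8_sketch(string $string): string
{
    $result = '';
    $length = strlen($string);

    for ($i = 0; $i < $length; $i++) {
        $byte = ord($string[$i]);

        if ($byte < 0x80) {
            // ASCII bytes are identical in ISO-8859-1 and UTF-8
            $result .= $string[$i];
        } else {
            // Code points U+0080 to U+00FF become a two-byte UTF-8 sequence
            $result .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }

    return $result;
}

// 'Zoë' in ISO-8859-1
echo bin2hex(latin1_to_utf8_sketch("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```

Every byte from 0x80 upwards grows to two bytes, so the result can be up to twice as long as the input.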

Note:
This function does not attempt to guess the current encoding of the given string. It assumes that the string is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but it will not produce a useful string if a different encoding was intended.
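
A typical way to run into this is converting a string that is already UTF-8. The sketch below uses mb_convert_encoding() with 'ISO-8859-1' as the source encoding in place of the deprecated utf8_encode(); it assumes the mbstring extension is loaded, and the variable names are only illustrative.

```php
<?php
// Assumption: the mbstring extension is available.
$already_utf8 = "\x5A\x6F\xC3\xAB"; // 'Zoë', already encoded as UTF-8

// Treating the UTF-8 bytes as ISO-8859-1 raises no error,
// but each of the two bytes of 'ë' is converted separately.
$double_encoded = mb_convert_encoding($already_utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($double_encoded), "\n"; // 5a6fc383c2ab, displayed as 'ZoÃ«'
```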

Many web pages declared as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 provides additional printable characters, such as the Euro sign (€) and curly quotes (“”), in place of certain ISO-8859-1 control characters. This function does not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
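
The difference can be made visible with a short sketch. It assumes the mbstring extension is loaded and uses mb_convert_encoding() with 'ISO-8859-1' to stand in for the deprecated utf8_encode(); the variable names are only illustrative.

```php
<?php
// Assumption: the mbstring extension is available.
$bytes = "\x80"; // '€' in Windows-1252, a control character in ISO-8859-1

// Interpreted as ISO-8859-1, the byte becomes the control character U+0080
$as_latin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
echo bin2hex($as_latin1), "\n"; // c280

// Interpreted as Windows-1252, the byte becomes the Euro sign U+20AC
$as_cp1252 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
echo bin2hex($as_cp1252), "\n"; // e282ac
```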

Parameters

string: An ISO-8859-1 encoded string.

Return Values

Returns the UTF-8 translation of string.

Changelog

Version	Description
8.2.0	This function has been deprecated.
7.2.0	This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.

Examples

Example #1 Basic example

<?php
// Convert the string "Zoë" from ISO 8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab

Notes

Note: Deprecation and alternatives

As of PHP 8.2.0, this function is deprecated and will be removed in a future version. Existing uses should be checked and replaced with appropriate alternatives.

The function mb_convert_encoding() offers similar functionality and supports ISO-8859-1 as well as many other character encodings.
<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same byte represents 'λ' (Greek lower-case lambda) in ISO-8859-7
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (Euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
c3ab
cebb
e282ac
Other options, which may be available depending on the installed extensions, are UConverter::transcode() and iconv().

The following examples all give the same result:
<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab
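
Since these alternatives depend on which extensions are installed, code that has to run in several environments could select one at runtime. The following is a minimal sketch of that idea; the function name latin1_to_utf8() is hypothetical and not part of PHP, and the order in which the extensions are tried is an arbitrary choice.

```php
<?php
// Hypothetical helper, for illustration only; not part of PHP.
function latin1_to_utf8(string $string): string
{
    // mbstring extension
    if (function_exists('mb_convert_encoding')) {
        $converted = mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
        if (is_string($converted)) {
            return $converted;
        }
    }

    // intl extension
    if (class_exists('UConverter')) {
        $converted = UConverter::transcode($string, 'UTF8', 'ISO-8859-1');
        if ($converted !== false) {
            return $converted;
        }
    }

    // iconv extension
    if (function_exists('iconv')) {
        $converted = iconv('ISO-8859-1', 'UTF-8', $string);
        if ($converted !== false) {
            return $converted;
        }
    }

    throw new RuntimeException('No usable encoding conversion extension found');
}

echo bin2hex(latin1_to_utf8("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```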

See Also

utf8_decode() - Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
mb_convert_encoding() - Convert a string from one character encoding to another
UConverter::transcode() - Convert a string from one character encoding to another
iconv() - Converts a string from one character set to another

User Contributed Notes 3 notes

deceze at gmail dot com ¶

15 years ago

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.
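
One way to avoid converting text that is already UTF-8 is to check it first. This is only a heuristic sketch: the name ensure_utf8() is made up for illustration, it assumes the mbstring extension is loaded, and it assumes that anything that is not valid UTF-8 really is ISO-8859-1, which may not hold for your data.

```php
<?php
// Hypothetical helper, for illustration only; not part of PHP.
// Assumption: input is either valid UTF-8 or ISO-8859-1.
function ensure_utf8(string $text): string
{
    // Leave text alone if it already is well-formed UTF-8
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Otherwise treat the bytes as ISO-8859-1 and convert
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\x5A\x6F\xC3\xAB")), "\n"; // already UTF-8: 5a6fc3ab
echo bin2hex(ensure_utf8("\x5A\x6F\xEB")), "\n";     // ISO-8859-1:    5a6fc3ab
```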

Aidan Kehoe <php-manual at parhasard dot net> ¶

21 years ago

Here's some code that addresses the issue that Steven describes in the previous comment:

<?php

/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
   as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
   the UTF-8 encoding of the non-control characters that Windows-1252 places
   at the equivalent code points. */

$cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
    "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
    "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
    "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
    "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
    "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
    "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
    "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
    "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
    "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
    "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
    "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
    "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
    "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
    "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
    "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
    "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
    "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
    "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
    "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

    "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
    "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
    "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
    "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION*/
    "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
    "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
    "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS*/
);

function cp1252_to_utf8($str) {
    global $cp1252_map;
    return strtr(utf8_encode($str), $cp1252_map);
}

?>

Mark AT modernbill DOT com ¶

21 years ago

If you haven't guessed already: If the UTF-8 character has no representation in the ISO-8859-1 codepage, a ? will be returned. You might want to wrap a function around this to make sure you aren't saving a bunch of ???? into your database.
