UConverter::transcode

(PHP 5 >= 5.5.0, PHP 7, PHP 8, PECL >= 3.0.0a1)

UConverter::transcode - Convert a string from one character encoding to another

Description

public static UConverter::transcode(
    string $str,
    string $toEncoding,
    string $fromEncoding,
    ?array $options = null
): string|false

Converts str from fromEncoding to toEncoding.

Parameters

str

The string to be converted.

toEncoding

The desired encoding of the result.

fromEncoding

The current encoding used to interpret str.

options

An optional array, which may contain the following keys:

  • 'to_subst' - the substitution character to use in place of any character of str which cannot be encoded in toEncoding. If specified, it must represent a single character in the target encoding.

Return Values

Returns the converted string, or false on failure.
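
Because the return type is string|false, callers should compare the result strictly against false. The following is a minimal sketch of one way to do that; transcodeOrFail() is a hypothetical helper introduced only for illustration and is not part of the intl extension.

```php
<?php
// Hypothetical helper (not part of intl): wraps UConverter::transcode()
// and turns a false return value into an exception.
function transcodeOrFail(
    string $str,
    string $toEncoding,
    string $fromEncoding,
    ?array $options = null
): string {
    $result = UConverter::transcode($str, $toEncoding, $fromEncoding, $options);

    // Compare strictly: a converted string such as "0" is falsy in PHP
    // and must not be mistaken for failure.
    if ($result === false) {
        throw new RuntimeException(
            "Could not convert from $fromEncoding to $toEncoding"
        );
    }

    return $result;
}

// 'Zoë' in UTF-8, converted to UTF-16BE
echo bin2hex(transcodeOrFail("\x5A\x6F\xC3\xAB", 'UTF-16BE', 'UTF-8')), "\n";
```

Throwing an exception is only one possible design; returning null or logging the failure would work equally well, depending on the calling code.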

Examples

Example #1 Converting from UTF-8 to UTF-16 and back

<?php
$utf8_string = "\x5A\x6F\xC3\xAB"; // 'Zoë' in UTF-8
$utf16_string = UConverter::transcode($utf8_string, 'UTF-16BE', 'UTF-8');
echo bin2hex($utf16_string), "\n";

$new_utf8_string = UConverter::transcode($utf16_string, 'UTF-8', 'UTF-16BE');
echo bin2hex($new_utf8_string), "\n";
?>

The above example will output:

005a006f00eb
5a6fc3ab

Example #2 Invalid characters in input

If the input string contains byte sequences which are not valid in the encoding specified by fromEncoding, they are replaced by the Unicode code point U+FFFD (Replacement Character) before the string is converted to toEncoding.

<?php
$invalid_utf8_string = "\xC3"; // incomplete multi-byte UTF-8 sequence
$utf16_string = UConverter::transcode($invalid_utf8_string, 'UTF-16BE', 'UTF-8');
echo bin2hex($utf16_string), "\n";
?>

The above example will output:

fffd
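
Since invalid input is replaced rather than reported, a caller may want to detect that a substitution took place. The following is a rough sketch under stated assumptions: yieldsReplacementCharacter() is a hypothetical helper, not part of the intl extension, and it also reports true for input that legitimately contains U+FFFD.

```php
<?php
// Hypothetical helper (not part of intl): reports whether $str, interpreted
// as $encoding, contains U+FFFD once it has been converted to UTF-8.
function yieldsReplacementCharacter(string $str, string $encoding): bool
{
    $utf8 = UConverter::transcode($str, 'UTF-8', $encoding);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from $encoding");
    }

    // U+FFFD is encoded as the bytes EF BF BD in UTF-8.
    return strpos($utf8, "\xEF\xBF\xBD") !== false;
}

var_dump(yieldsReplacementCharacter("\xC3", 'UTF-8'));             // incomplete sequence
var_dump(yieldsReplacementCharacter("\x5A\x6F\xC3\xAB", 'UTF-8')); // valid 'Zoë'
```

This check cannot distinguish a substituted U+FFFD from one that was already present in the input, so it is best treated as a heuristic.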

Example #3 Characters which cannot be encoded

If the input string contains characters which cannot be represented in toEncoding, each of them is replaced with a single substitution character. The default substitution character depends on the encoding, and can be overridden using the 'to_subst' option.

<?php
$utf8_string = "\xE2\x82\xAC"; // € (Euro Sign) does not exist in ISO 8859-1

// Default replacement in ISO 8859-1 is "\x1A" (Substitute)
$iso8859_1_string = UConverter::transcode($utf8_string, 'ISO-8859-1', 'UTF-8');
echo bin2hex($iso8859_1_string), "\n";

// Specify a replacement of '?' ("\x3F") instead
$iso8859_1_string = UConverter::transcode(
    $utf8_string, 'ISO-8859-1', 'UTF-8', ['to_subst' => '?']
);
echo bin2hex($iso8859_1_string), "\n";

// Since ISO 8859-1 cannot map U+FFFD, invalid input is also replaced by to_subst
$invalid_utf8_string = "\xC3"; // incomplete multi-byte UTF-8 sequence
$iso8859_1_string = UConverter::transcode(
    $invalid_utf8_string, 'ISO-8859-1', 'UTF-8', ['to_subst' => '?']
);
echo bin2hex($iso8859_1_string), "\n";
?>

The above example will output:

1a
3f
3f

See Also

  • mb_convert_encoding() - Convert a string from one character encoding to another
  • iconv() - Convert a string from one character encoding to another
