
idn_to_ascii

(PHP 5 >= 5.3.0, PHP 7, PHP 8, PECL intl >= 1.0.2, PECL idn >= 0.1)

idn_to_ascii: Convert a domain name to IDNA ASCII form

Description

Procedural style

idn_to_ascii(
    string $domain,
    int $flags = IDNA_DEFAULT,
    int $variant = INTL_IDNA_VARIANT_UTS46,
    array &$idna_info = null
): string|false

This function converts a Unicode domain name to an IDNA ASCII-compatible format.

Parameters

domain

The domain to convert, which must be UTF-8 encoded.

flags

Conversion options: a combination of the IDNA_* constants (excluding the IDNA_ERROR_* constants).

variant

Either INTL_IDNA_VARIANT_2003 (deprecated as of PHP 7.2.0) for IDNA 2003, or INTL_IDNA_VARIANT_UTS46 (only available as of ICU 4.6) for UTS #46.

idna_info

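This parameter can be used only if INTL_IDNA_VARIANT_UTS46 was used for variant. In that case, it will be filled with an array with the following keys: 'result', the result of the conversion (possibly an illegal one); 'isTransitionalDifferent', a boolean indicating whether the transitional mechanisms of UTS #46 either have changed or would have changed the result; and 'errors', an int representing a bitset of the IDNA_ERROR_* constants.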
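As an illustration of the keys described above, the following minimal sketch reads idna_info after a successful conversion. It assumes the intl extension is loaded; the variable names $info and $ascii are chosen for this example only.

```php
<?php
// Assumes the intl extension is loaded.
$info = [];
$ascii = idn_to_ascii('täst.de', IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46, $info);

echo $ascii, "\n";                           // xn--tst-qla.de
echo $info['result'], "\n";                  // the same string, since no error occurred
var_dump($info['isTransitionalDifferent']);  // bool
var_dump($info['errors'] === 0);             // true when no IDNA_ERROR_* bit is set
```

Because 'errors' is a bitset, individual conditions can be tested with a bitwise AND against the relevant IDNA_ERROR_* constant.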

Return Values

The domain name encoded in ASCII-compatible form, or false on failure.
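Since the function can return false, callers should compare the result strictly before using it. The sketch below shows one way to do this and to inspect the reason through idna_info. It assumes the intl extension is loaded; describe_idn_failure() is a hypothetical helper written for this example, not part of the extension.

```php
<?php
// Hypothetical helper: reports either the converted name or why conversion failed.
function describe_idn_failure(string $domain): string
{
    $info = [];
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46, $info);

    if ($ascii !== false) {
        return "ok: $ascii";
    }

    // On failure, 'errors' holds a bitset of IDNA_ERROR_* constants.
    $errors = $info['errors'] ?? 0;
    if ($errors & IDNA_ERROR_LABEL_TOO_LONG) {
        return 'failed: a label exceeds 63 characters in ASCII form';
    }
    return "failed: error bitset $errors";
}

echo describe_idn_failure('täst.de'), "\n";
echo describe_idn_failure(str_repeat('a', 64) . '.com'), "\n";
```

The strict comparison with false matters because a loose check would also treat other falsy strings as failures.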

Changelog

Version Description
7.4.0 The default value of variant is now INTL_IDNA_VARIANT_UTS46 instead of the deprecated INTL_IDNA_VARIANT_2003.
7.2.0 INTL_IDNA_VARIANT_2003 has been deprecated; use INTL_IDNA_VARIANT_UTS46 instead.

Examples

Example #1 idn_to_ascii() example

<?php

echo idn_to_ascii('täst.de');

?>

The above example will output:

xn--tst-qla.de


User Contributed Notes 4 notes

mschrieck at gmail dot com
6 years ago
To convert IDN domains according to the IDNA2008 definition, use the following call:

idn_to_ascii('teßt.com', IDNA_NONTRANSITIONAL_TO_ASCII, INTL_IDNA_VARIANT_UTS46);

The result is then as expected:

xn--tet-6ka.com
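To see why the flag matters, the two processing modes can be compared side by side. This is a minimal sketch assuming the intl extension is loaded; the variable names are chosen for this example. Under transitional processing, UTS #46 maps ß to ss, so the default result is expected to differ from the nontransitional one.

```php
<?php
// Assumes the intl extension is loaded.
$domain = 'teßt.com';

// Default flags: transitional processing.
$transitional = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

// Nontransitional processing keeps ß and encodes it with Punycode.
$nontransitional = idn_to_ascii($domain, IDNA_NONTRANSITIONAL_TO_ASCII, INTL_IDNA_VARIANT_UTS46);

var_dump($transitional);
var_dump($nontransitional); // string(15) "xn--tet-6ka.com"
```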
edible dot email at gmail dot com
11 years ago
The notes on this function are not very clear and a little misleading.

Firstly, on PHP <= 5.3 you will need to use one of the several scripts or classes available on the internet, which might or might not require installing the intl and idn PECL extensions. You will also need at least PHP 4.0 to be able to install both.

Secondly, on PHP >= 5.4 you do not need the PECL extensions.

Third, use of utf8_encode() is not necessary. In fact, it will potentially prevent idn_to_ascii() from working at all.

On my setup it was necessary to change the charset in the script meta tags to UTF-8:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

...and to change default_charset in the php.ini file (for example /usr/local/lib/php.ini; it can be located with whereis php.ini or find / -name php.ini):

default_charset = "UTF-8"

The above changes mean that idn_to_ascii() can now be used with that syntax (no need for utf8_encode()). Previously, the function worked to convert some IDNs, but failed to convert Japanese and Cyrillic IDNs. Further, no additional locales were enabled or added, and Apache's charset file was left unmodified.

It is also important to remember only to apply the function where required, eg:

idn_to_ascii('cåsino.com') // is wrong

...whereas...

idn_to_ascii('cåsino') // is right

...and also be aware of text editors that don't support UTF-8 encoding, or the $domain = 'cåsino' value will end up as $domain = '??????' ...and the function will fail.

I have found that Notepad++ easily and reliably handles UTF-8 encoding that works for this function using UTF-8 as the encoding option, not UTF-8 without BOM.
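The encoding problem described in this note can be caught before calling the function. The following sketch assumes both the intl and mbstring extensions are loaded; safe_idn_to_ascii() is a hypothetical helper written for this example. It rejects input that is not valid UTF-8 and shows a conversion from ISO-8859-1 with mb_convert_encoding().

```php
<?php
// Hypothetical helper: returns the ASCII form, or false if the input
// is not valid UTF-8 or the conversion fails.
function safe_idn_to_ascii(string $domain)
{
    if (!mb_check_encoding($domain, 'UTF-8')) {
        return false;
    }
    return idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

// 'täst.de' encoded as ISO-8859-1: the byte 0xE4 is not valid UTF-8 here.
$latin1 = "t\xE4st.de";
var_dump(safe_idn_to_ascii($latin1)); // bool(false)

// Convert to UTF-8 first, then encode.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump(safe_idn_to_ascii($utf8));
```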
alexchexes at gmail dot com
6 months ago
The idn_to_ascii and idn_to_utf8 functions don't properly handle full URLs (i.e. with a scheme and path), so here are helper functions that handle all URLs, including ones with a path but without a scheme:

<?php
/**
 * Converts the host part of a URL to Punycode.
 * The other parts of the URL are not URL-encoded.
 * Based on code from the snipp dot ru website; this modified version also handles URLs without a scheme.
 */
function punycode_encode($url)
{
    // parse_url() needs a scheme or a leading '//' to recognize the host.
    $no_scheme = false;
    if (!preg_match('/^.+?:\/\//', $url) && substr($url, 0, 2) !== '//') {
        $url = '//' . $url;
        $no_scheme = true;
    }

    $parts = parse_url($url);

    // Reassemble the URL, converting only the host.
    $out = '';
    if (!empty($parts['scheme'])) $out .= $parts['scheme'] . ':';
    if (!empty($parts['host'])) $out .= '//';
    if (!empty($parts['user'])) $out .= $parts['user'];
    if (!empty($parts['pass'])) $out .= ':' . $parts['pass'];
    if (!empty($parts['user'])) $out .= '@';
    if (!empty($parts['host'])) $out .= idn_to_ascii($parts['host']);
    if (!empty($parts['port'])) $out .= ':' . $parts['port'];
    if (!empty($parts['path'])) $out .= $parts['path'];
    if (!empty($parts['query'])) $out .= '?' . $parts['query'];
    if (!empty($parts['fragment'])) $out .= '#' . $parts['fragment'];

    if ($no_scheme) {
        // Strip the '//' that was added above.
        $out = substr($out, 2);
    }

    return $out;
}

/**
 * Converts the Punycode host part of a URL back to Unicode.
 */
function punycode_decode($url)
{
    // parse_url() needs a scheme or a leading '//' to recognize the host.
    $no_scheme = false;
    if (!preg_match('/^.+?:\/\//', $url) && substr($url, 0, 2) !== '//') {
        $url = '//' . $url;
        $no_scheme = true;
    }

    $parts = parse_url($url);

    // Reassemble the URL, converting only the host.
    $out = '';
    if (!empty($parts['scheme'])) $out .= $parts['scheme'] . ':';
    if (!empty($parts['host'])) $out .= '//';
    if (!empty($parts['user'])) $out .= $parts['user'];
    if (!empty($parts['pass'])) $out .= ':' . $parts['pass'];
    if (!empty($parts['user'])) $out .= '@';
    if (!empty($parts['host'])) $out .= idn_to_utf8($parts['host']);
    if (!empty($parts['port'])) $out .= ':' . $parts['port'];
    if (!empty($parts['path'])) $out .= $parts['path'];
    if (!empty($parts['query'])) $out .= '?' . $parts['query'];
    if (!empty($parts['fragment'])) $out .= '#' . $parts['fragment'];

    if ($no_scheme) {
        // Strip the '//' that was added above.
        $out = substr($out, 2);
    }

    return $out;
}
mpf at mk dot de
4 months ago
The documentation is not clear about what "failure" means in the Return Values section. It should be replaced with something like this:

"Returns failure if the given string could not be converted".