mb_strcut

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)

mb_strcut - Get part of string

Description

function mb_strcut(
    string $string,
    int $start,
    ?int $length = null,
    ?string $encoding = null
): string

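mb_strcut() extracts a substring from a string in the same way as mb_substr(), except that it operates on bytes rather than characters. If the cut position happens to fall on the second or a later byte of a multibyte character, the cut starts from the first byte of that character. This is also where it differs from substr(): substr() cuts at that exact position even when it lies inside a multibyte character, and so returns a malformed byte sequence.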
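A minimal sketch of the difference described above, assuming the string literal is saved as UTF-8. Byte 2 falls inside the two-byte "é", so substr() returns a fragment that starts with a stray continuation byte, while mb_strcut() backs up to the first byte of "é".

```php
<?php
// "Déjà" in UTF-8: D = byte 0, é = bytes 1-2, j = byte 3, à = bytes 4-5
$s = "Déjà";

// substr() cuts blindly at byte 2, keeping only the trailing byte of "é"
$raw = substr($s, 2);

// mb_strcut() moves the start back to byte 1, the first byte of "é"
$cut = mb_strcut($s, 2, null, 'UTF-8');

var_dump(mb_check_encoding($raw, 'UTF-8')); // bool(false)
var_dump($cut);                             // string(5) "éjà"
```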

Parameters

string

The string to extract from.

start

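If start is non-negative, the returned string begins at byte start of string, counting from zero. For example, in the string 'abcdef', the byte at position 0 is 'a' and the byte at position 2 is 'c'.

If start is negative, the returned string begins start bytes counted back from the end of string. If the absolute value of a negative start is greater than the length of the string, the returned string begins at the start of string.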
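The rules for start can be checked with a short sketch. The encoding is passed explicitly as 'UTF-8' so the result does not depend on the internal encoding; the last call assumes a UTF-8 string literal.

```php
<?php
$ascii = 'abcdef';
$a = mb_strcut($ascii, 2, null, 'UTF-8');   // "cdef": begins at byte 2 ('c')
$b = mb_strcut($ascii, -2, null, 'UTF-8');  // "ef": begins 2 bytes before the end
$c = mb_strcut($ascii, -10, null, 'UTF-8'); // "abcdef": |start| > length, so begins at byte 0

// A negative start that lands inside a multibyte character is moved back
// to that character's first byte: byte 5 is the second byte of "à" (bytes 4-5)
$d = mb_strcut("Déjà", -1, null, 'UTF-8');  // "à"
```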

length

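Length in bytes. If omitted or null, all bytes up to the end of the string are extracted.

If length is negative, the returned string ends length bytes counted back from the end of string. However, if the absolute value of a negative length is greater than the number of bytes after the start position, an empty string is returned.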
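A minimal sketch of how length behaves, using an ASCII string so that every byte is also a character.

```php
<?php
$s = 'abcdef';
$a = mb_strcut($s, 1, 3, 'UTF-8');    // "bcd": 3 bytes starting at byte 1
$b = mb_strcut($s, 1, -2, 'UTF-8');   // "bcd": stops 2 bytes before the end
$c = mb_strcut($s, 4, -5, 'UTF-8');   // "": only 2 bytes follow byte 4, and |-5| > 2
$d = mb_strcut($s, 1, null, 'UTF-8'); // "bcdef": null means "up to the end"
```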

encoding

The encoding parameter is the character encoding. If it is omitted or null, the internal character encoding is used.
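The encoding decides where character boundaries are, and therefore how offsets get adjusted. In this sketch the same bytes are cut as UTF-8 and as "8bit" (mbstring's single-byte pass-through encoding, in which every byte counts as one character); the string literal is assumed to be saved as UTF-8.

```php
<?php
$s = "Déjà"; // UTF-8: "é" occupies bytes 1-2

// As UTF-8, the end offset (byte 2) falls inside "é", so it is moved back
$asUtf8 = mb_strcut($s, 0, 2, 'UTF-8'); // "D"

// As "8bit", every byte is a boundary, so nothing is adjusted
$as8bit = mb_strcut($s, 0, 2, '8bit');  // "D\xC3", a broken fragment of "é"

// Omitting encoding falls back to mb_internal_encoding(), set explicitly here
mb_internal_encoding('UTF-8');
$asDefault = mb_strcut($s, 0, 2);       // same as $asUtf8
```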

Return Values

mb_strcut() returns the portion of string specified by the start and length parameters.

Changelog

Version	Description
8.4.0	The behavior on invalid `UTF-8` and `UTF-16` strings is now more consistent.
8.0.0	`encoding` is now nullable.

See Also

mb_substr() - Get part of string
mb_internal_encoding() - Set/Get internal character encoding


User Contributed Notes 4 notes

olivthill at gmail dot com

8 years ago

Here is an example with UTF-8 characters showing how the start and length arguments work:

<?php
  // "Déjà_vu" as UTF-8; the \u{...} escapes avoid depending on the source file's encoding
  $str_utf8 = "D\u{E9}j\u{E0}_vu";
  $str_utf8_0 = mb_strcut($str_utf8, 0, 4, "UTF-8"); // Déj
  $str_utf8_1 = mb_strcut($str_utf8, 1, 4, "UTF-8"); // éj
  $str_utf8_2 = mb_strcut($str_utf8, 2, 4, "UTF-8"); // éj
  $str_utf8_3 = mb_strcut($str_utf8, 3, 4, "UTF-8"); // jà_
  $str_utf8_4 = mb_strcut($str_utf8, 4, 4, "UTF-8"); // à_v
?>

The string includes two special characters, "é" and "à", each encoded internally as two bytes.
Note that a multibyte character that would be cut in half at the end of the output is dropped rather than kept.
Note also that cuts (1, 4) and (2, 4) give the same result with this string.

t dot starling at physics dot unimelb dot edu dot au

21 years ago

What the manual and the first commenter are trying to say is that mb_strcut uses byte offsets, as opposed to mb_substr which uses character offsets. 

Both mb_strcut and mb_substr appear to treat negative and out-of-range offsets and lengths in basically the same way as substr. One exception is that if start is too large, an empty string is returned rather than FALSE. Testing indicates that mb_strcut first works out the start and end byte offsets, then moves each offset left to the nearest character boundary.
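A short sketch of byte offsets versus character offsets on the same UTF-8 string, plus the out-of-range case; it assumes PHP 8, where mb_strcut() is declared to return string.

```php
<?php
$s = "Déjà_vu"; // UTF-8; "é" and "à" are two bytes each

$chars = mb_substr($s, 1, 2, 'UTF-8'); // "éj": offset and length count characters
$bytes = mb_strcut($s, 1, 2, 'UTF-8'); // "é": offset and length count bytes (bytes 1-2)

// A start past the end of the string yields an empty string
$past = mb_strcut($s, 100, 2, 'UTF-8'); // ""
```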

David Juhasz

4 years ago

This was driving me crazy, because mb_strcut() kept returning an empty string. The $length parameter seems to have a maximum value of 2^31-1 (2147483647).

Works:
<?php
  # output: Полуустав
  echo mb_strcut('Полуустав', 0, pow(2,31)-1);
?>

Doesn't work:
<?php
  # nothing is output
  echo mb_strcut('Полуустав', 0, pow(2,31));
?>

My PHP_INT_MAX value is much larger than 2^31-1, so I'm not sure why larger values for $length don't work. :(

<?php
  # output: 9223372036854775807
  echo PHP_INT_MAX;
?>
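When the goal is "everything from start to the end", passing null for length (or omitting it) avoids picking an arbitrary large number at all. A minimal sketch; it does not test whether very large lengths misbehave on a given PHP build, and the named argument in the last call requires PHP 8.0 or later.

```php
<?php
$s = 'Полуустав'; // UTF-8: each Cyrillic letter is 2 bytes

$whole = mb_strcut($s, 0, null, 'UTF-8');       // null length: up to the end
$tail  = mb_strcut($s, 4, null, 'UTF-8');       // "луустав": byte 4 starts the third letter
$same  = mb_strcut($s, 4, encoding: 'UTF-8');   // omitted length behaves like null
```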

oyag02 at yahoo dot co dot jp

22 years ago

Difference between mb_strcut and mb_substr

Example:
mb_strcut('I_ROHA', 1, 2) returns 'I_' (treated as a byte stream).
mb_substr('I_ROHA', 1, 2) returns 'ROHA' (treated as a character stream).

# 'I_', 'RO', and 'HA' each stand for one two-byte multibyte character
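The same comparison with real characters, assuming 'I_ROHA' stands for the kana い, ろ, は. In UTF-8 each of these kana takes 3 bytes rather than the 2 used in the note, so the mb_strcut() length is 3 here instead of 2.

```php
<?php
$s = 'いろは'; // UTF-8: each kana is 3 bytes

// Byte offset 1 is inside "い", so the cut starts at byte 0; 3 bytes = one kana
$cut = mb_strcut($s, 1, 3, 'UTF-8'); // "い"

// Character offset 1, length 2: the second and third characters
$sub = mb_substr($s, 1, 2, 'UTF-8'); // "ろは"
```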
