PCRE has very good UTF-8 support: simply add the /u modifier to your pattern.
preg_match('/non-utf-8 matching pattern/', $string);
preg_match('/utf-8 matching pattern/u', $string);
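The following is a small sketch of what /u changes. It assumes PHP 7.0 or later for the \u{...} string escape, and the variable names are only illustrative. Without /u, PCRE works on bytes, so "." consumes just one byte of a multibyte character. With /u, "." matches a whole character, and a subject that is not valid UTF-8 makes preg_match() return false instead of being matched byte by byte.

```php
<?php
// "é" written as a Unicode escape, so the string holds the UTF-8 bytes
// 0xC3 0xA9 no matter how this source file is saved (PHP 7.0+ syntax).
$string = "\u{00E9}";

// Without /u the subject is treated as bytes: "." matches one byte,
// so the two-byte "é" does not satisfy /^.$/.
$bytes = preg_match('/^.$/', $string);   // 0

// With /u pattern and subject are treated as UTF-8: "." matches one character.
$chars = preg_match('/^.$/u', $string);  // 1

// With /u an invalid UTF-8 subject makes preg_match() fail (return false);
// preg_last_error() then reports PREG_BAD_UTF8_ERROR.
$invalid = preg_match('/./u', "\xFF");   // false
$error = preg_last_error() === PREG_BAD_UTF8_ERROR; // true

var_dump($bytes, $chars, $invalid, $error);
```

Checking for false (with ===) after a /u match is therefore worthwhile whenever the input's encoding is not guaranteed.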
Regular Expression (POSIX Extended)
- Introduction
- Installing/Configuring
- Predefined Constants
- Examples
- POSIX Regex Functions
- ereg_replace — Replace regular expression
- ereg — Regular expression match
- eregi_replace — Replace regular expression case insensitive
- eregi — Case insensitive regular expression match
- split — Split string into array by regular expression
- spliti — Split string into array by regular expression case insensitive
- sql_regcase — Make regular expression for case insensitive match
Daniel Klein
1 year ago
Ray dot Paseur at Gmail dot com
1 year ago
The POSIX regex functions are deprecated (as of PHP 5.3.0) and were removed in PHP 7.0.0. Instead of the "ereg" family, use the PCRE functions (preg_match(), preg_replace(), preg_split() and so on).
http://www.php.net/manual/en/book.pcre.php
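A rough sketch of how the ereg calls listed above map onto PCRE, using hypothetical sample strings. PCRE patterns need delimiters (here /.../), the case-insensitive "i" variants become the i modifier, and preg_match() returns 1, 0, or false on error rather than ereg()'s match length, so compare with === 1.

```php
<?php
// Hypothetical input strings for illustration only.
$id   = '12345';
$csv  = 'a,b,,c';
$name = 'Hello World';

// ereg('^[0-9]+$', $id)                 -> preg_match() with delimiters.
$isNumeric = preg_match('/^[0-9]+$/', $id) === 1;

// eregi('^hello', $name)                -> same pattern plus the i modifier.
$startsHello = preg_match('/^hello/i', $name) === 1;

// ereg_replace('o', '0', $name)         -> preg_replace().
$zeroed = preg_replace('/o/', '0', $name);

// eregi_replace('world', 'PHP', $name)  -> preg_replace() with /i.
$renamed = preg_replace('/world/i', 'PHP', $name);

// split(',', $csv)                      -> preg_split(); empty fields are kept.
$parts = preg_split('/,/', $csv);

// spliti('x', 'aXbxc')                  -> preg_split() with /i.
$fields = preg_split('/x/i', 'aXbxc');

// If a pattern must contain the delimiter, escape it or choose another
// delimiter; preg_quote($text, '/') escapes literal text for a pattern.
```

For a purely literal delimiter, explode() is a simpler alternative to preg_split().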
arekm
1 year ago
If you switch to the PCRE functions, note that PCRE does not support UTF-8 well in every respect.
There are limitations: read the "POSIX CHARACTER CLASSES" and "UNICODE CHARACTER PROPERTY SUPPORT" chapters at http://www.pcre.org/pcre.txt.
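One way to sidestep these limitations is to use explicit Unicode property escapes such as \p{L} (any letter) together with /u instead of ASCII ranges. This sketch assumes a PCRE build with Unicode property support (PHP's bundled PCRE has it) and PHP 7.0+ for the \u{...} escape. Whether a POSIX class like [[:alpha:]] matches non-ASCII letters varies with the PHP/PCRE version, so that case is deliberately left out.

```php
<?php
// "café" as UTF-8 bytes, built with a PHP 7.0+ Unicode escape.
$word = "caf\u{00E9}";

// An ASCII-only range never covers "é", even with /u.
$asciiOnly = preg_match('/^[a-z]+$/u', $word);  // 0

// \p{L} matches any Unicode letter; it requires /u and Unicode property
// support in the PCRE library.
$anyLetter = preg_match('/^\p{L}+$/u', $word);  // 1
```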
