
Parle pattern matching

Parle supports regex matching similar to flex. The following POSIX character sets are also supported: [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], [:xdigit:].

The Unicode character classes are currently not enabled by default; pass --enable-parle-utf32 at compile time to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression [\xe2][\x82][\xac] can be used. The pattern for a UTF-8 encoded string could be [ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+.
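The byte values in the EURO example can be checked independently of Parle. The sketch below is written in Python, not PHP, and does not call Parle at all: it encodes U+20AC as UTF-8 and confirms that a byte-oriented pattern with the same spelling matches the result. Python's re module is only a stand-in here, and the variable names are illustrative.

```python
import re

# Encode the EURO SIGN (U+20AC) as UTF-8 and keep the raw bytes.
euro_bytes = "\u20ac".encode("utf-8")

# Byte-oriented pattern mirroring [\xe2][\x82][\xac] from the text.
# Assumption: Python's re module stands in for Parle; only the byte
# values are verified, not Parle's own matching behaviour.
euro_pattern = re.compile(rb"[\xe2][\x82][\xac]")
euro_match = euro_pattern.fullmatch(euro_bytes)

print(euro_bytes.hex(" "))     # e2 82 ac
print(euro_match is not None)  # True
```

Each bracket expression in the pattern covers exactly one byte, which is why a three-byte UTF-8 sequence needs three of them.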

Character representations

Sequence    Description
\a          Alert (bell).
\b          Backspace.
\e          ESC character, \x1b.
\n          Newline.
\r          Carriage return.
\f          Form feed, \x0c.
\t          Horizontal tab, \x09.
\v          Vertical tab, \x0b.
\oct        Character specified by a three-digit octal code.
\xhex       Character specified by a hex code.
\cchar      Named control character.

Character classes

Sequence    Description
[...]       A single character listed or contained within a listed range. Ranges can be combined with the {+} and {-} operators. For example, [a-z]{+}[0-9] is the same as [0-9a-z] and [a-z]{-}[aeiou] is the same as [b-df-hj-np-tv-z].
[^...]      A single character not listed and not contained within a listed range.
.           Any character, default [^\n].
\d          Digit character, [0-9].
\D          Non-digit character, [^0-9].
\s          White space character, [ \t\n\r\f\v].
\S          Non-white space character, [^ \t\n\r\f\v].
\w          Word character, [a-zA-Z0-9_].
\W          Non-word character, [^a-zA-Z0-9_].
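The {+} and {-} operators described for [...] behave like set union and set difference on the characters each range contains. The following sketch, in Python rather than PHP, models the two examples from the table with ordinary sets to confirm the stated equivalences. It does not use Parle, and char_range is a hypothetical helper introduced only for this illustration.

```python
def char_range(first, last):
    """Return the set of characters from first to last, inclusive."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}

# [a-z]{+}[0-9] is described as the same as [0-9a-z]: a set union.
union = char_range("a", "z") | char_range("0", "9")

# [a-z]{-}[aeiou] is described as the same as [b-df-hj-np-tv-z]:
# a set difference.
difference = char_range("a", "z") - set("aeiou")

# The right-hand side of the second equivalence, built range by range.
expected_difference = (
    char_range("b", "d") | char_range("f", "h") | char_range("j", "n")
    | char_range("p", "t") | char_range("v", "z")
)

print("".join(sorted(union)))             # 0123456789abcdefghijklmnopqrstuvwxyz
print("".join(sorted(difference)))        # bcdfghjklmnpqrstvwxyz
print(difference == expected_difference)  # True
```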

Unicode character classes

Sequence    Description
\p{C}       Other.
\p{Cc}      Other, control.
\p{Cf}      Other, format.
\p{Co}      Other, private use.
\p{Cs}      Other, surrogate.
\p{L}       Letter.
\p{LC}      Letter, cased.
\p{Ll}      Letter, lowercase.
\p{Lm}      Letter, modifier.
\p{Lo}      Letter, other.
\p{Lt}      Letter, titlecase.
\p{Lu}      Letter, uppercase.
\p{M}       Mark.
\p{Mc}      Mark, spacing combining.
\p{Me}      Mark, enclosing.
\p{Mn}      Mark, nonspacing.
\p{N}       Number.
\p{Nd}      Number, decimal digit.
\p{Nl}      Number, letter.
\p{No}      Number, other.
\p{P}       Punctuation.
\p{Pc}      Punctuation, connector.
\p{Pd}      Punctuation, dash.
\p{Pe}      Punctuation, close.
\p{Pf}      Punctuation, final quote.
\p{Pi}      Punctuation, initial quote.
\p{Po}      Punctuation, other.
\p{Ps}      Punctuation, open.
\p{S}       Symbol.
\p{Sc}      Symbol, currency.
\p{Sk}      Symbol, modifier.
\p{Sm}      Symbol, math.
\p{So}      Symbol, other.
\p{Z}       Separator.
\p{Zl}      Separator, line.
\p{Zp}      Separator, paragraph.
\p{Zs}      Separator, space.

These character classes are available only if the option --enable-parle-utf32 was passed at compile time.

Alternation and repetition

Sequence    Greedy    Description
...|...     -         Try sub-patterns in alternation.
*           yes       Match 0 or more times.
+           yes       Match 1 or more times.
?           yes       Match 0 or 1 times.
{n}         no        Match exactly n times.
{n,}        yes       Match at least n times.
{n,m}       yes       Match at least n times but no more than m times.
*?          no        Match 0 or more times.
+?          no        Match 1 or more times.
??          no        Match 0 or 1 times.
{n,}?       no        Match at least n times.
{n,m}?      no        Match at least n times but no more than m times.
{MACRO}     -         Include the regex MACRO in the current regex.
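The Greedy column is easiest to read with a concrete input. The sketch below uses Python's re module as a stand-in, on the assumption that greedy and non-greedy quantifiers carry their usual meaning; it does not exercise Parle, whose lexer may pick matches differently in some cases. The input string is made up for the illustration.

```python
import re

text = "<a><b>"

# Greedy: + consumes as much as possible while the pattern still matches.
greedy = re.match(r"<.+>", text).group()

# Non-greedy: +? consumes as little as possible while still matching.
lazy = re.match(r"<.+?>", text).group()

# The same contrast for a bounded repetition.
greedy_bounded = re.match(r"a{2,4}", "aaaaa").group()
lazy_bounded = re.match(r"a{2,4}?", "aaaaa").group()

print(greedy)          # <a><b>
print(lazy)            # <a>
print(greedy_bounded)  # aaaa
print(lazy_bounded)    # aa
```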

Anchors

Sequence    Description
^           Start of string or after a newline.
$           End of string or before a newline.
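The anchor semantics described here correspond to what many regex engines call multiline mode. As a rough illustration, the Python sketch below enables re.MULTILINE so that ^ and $ also match around newlines. This is an analogy under that assumption, not a test of Parle itself, and the sample text is invented.

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ matches at the start of the string and after each
# newline, and $ matches at the end of the string and before each newline.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(line_starts)  # ['first', 'second']
print(line_ends)    # ['line', 'line']
```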

Grouping

Sequence          Description
(...)             Group a regular expression to override default operator precedence.
(?r-s:pattern)    Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x. i means case-insensitive; -i means case-sensitive. s alters the meaning of . to match any character whatsoever; -s alters the meaning of . to match any character except \n. x ignores comments and whitespace in patterns; whitespace is ignored unless it is backslash-escaped, contained within double quotes, or appears inside a character range. These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer.
(?# comment )     Omit everything within (). The first ) character encountered ends the pattern, so the comment cannot contain a ) character. The comment may span lines.
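Scoped options are easier to follow with a short experiment. Python's re module happens to accept the same (?i:...), (?s:...), (?-s:...) and (?#...) spellings, so the sketch below uses it as a stand-in. It is an assumption that Parle behaves the same way on these inputs; nothing here calls Parle, and the x option is left out.

```python
import re

# (?i:...) applies case-insensitivity only inside the group.
scoped_icase = re.compile(r"(?i:abc)def")
icase_inside = scoped_icase.fullmatch("ABCdef") is not None
icase_outside = scoped_icase.fullmatch("ABCDEF") is not None

# (?s:.) lets . match a newline; (?-s:.) excludes the newline.
dot_with_s = re.fullmatch(r"(?s:.)", "\n") is not None
dot_without_s = re.fullmatch(r"(?-s:.)", "\n") is not None

# (?#...) is skipped entirely when the pattern is interpreted.
with_comment = re.fullmatch(r"abc(?# a comment)", "abc") is not None

print(icase_inside)   # True
print(icase_outside)  # False
print(dot_with_s)     # True
print(dot_without_s)  # False
print(with_comment)   # True
```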
