PHP: Performance - Manual

Performance

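Some items that may appear in a pattern are more efficient than others. For example, a character class such as [aeiou] is more efficient than a set of alternatives such as (a|e|i|o|u). In general, the simplest construction that provides the required behaviour is usually the most efficient. Jeffrey Friedl's book (Mastering Regular Expressions) contains a lot of discussion about regular expression performance.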
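As a rough illustration of the two forms above, the following sketch checks that both patterns find the same matches and times them. The helper name timePattern, the sample subject, and the iteration count are assumptions made for this example; actual timings depend on the PCRE version and on whether JIT is enabled, so no particular result is claimed.

```php
<?php
// Hypothetical helper (not part of PHP): time repeated runs of one pattern.
function timePattern(string $pattern, string $subject, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match_all($pattern, $subject);
    }
    return microtime(true) - $start;
}

$subject = 'regular expressions can be surprisingly expensive';

// Character class form.
$classCount = preg_match_all('/[aeiou]/', $subject);

// Alternation form: describes the same set of characters with branches.
$alternationCount = preg_match_all('/(a|e|i|o|u)/', $subject);

// Both patterns locate the same vowels, so the counts agree.
var_dump($classCount === $alternationCount);

printf("character class: %.4f s\n", timePattern('/[aeiou]/', $subject));
printf("alternation:     %.4f s\n", timePattern('/(a|e|i|o|u)/', $subject));
```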

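When a pattern begins with .* and the PCRE_DOTALL option is set, the pattern is implicitly anchored by PCRE, since it can match only at the start of a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this optimization, because the . metacharacter does not then match a newline, and if the subject string contains newlines, the pattern may match from the character immediately following one of them instead of from the very start. For example, the pattern (.*) second matches the subject "first\nand second" (where \n stands for a newline character) with the first captured substring being "and". In order to do this, PCRE has to retry the match starting after every newline in the subject.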

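If you are using such a pattern with subject strings that do not contain newlines, the best performance is obtained by setting PCRE_DOTALL, or by starting the pattern with ^.* to indicate explicit anchoring. That saves PCRE from having to scan along the subject looking for a newline to restart at.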
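The behaviour described above can be reproduced with a short sketch. It assumes the usual PHP spelling of PCRE_DOTALL, which is the s pattern modifier; the variable names and the second, newline-free subject are made up for the example.

```php
<?php
$subject = "first\nand second";

// Without the s modifier, . does not match a newline, so the successful
// match starts just after the newline and group 1 is "and".
preg_match('/(.*) second/', $subject, $withoutDotall);

// With the s modifier (PCRE_DOTALL), .* can cross the newline, so the
// match starts at the beginning and group 1 is "first\nand".
preg_match('/(.*) second/s', $subject, $withDotall);

var_dump($withoutDotall[1] === 'and');
var_dump($withDotall[1] === "first\nand");

// For subjects that contain no newline, either form of anchoring
// gives the same capture.
$line = 'first and second';
preg_match('/(.*) second/s', $line, $viaDotall);
preg_match('/^(.*) second/', $line, $viaCaret);

var_dump($viaDotall[1] === $viaCaret[1]); // both capture "first and"
```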

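Beware of patterns that contain nested indefinite repeats. These can take a long time to run when applied to a string that does not match. Consider the pattern fragment (a+)*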

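This pattern can match "aaaa" in 33 different ways, and this number increases very rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4 times, and for each of those cases other than 0, the + repeats can match different numbers of times.) When the remainder of the pattern is such that the entire match is going to fail, PCRE has in principle to try every possible variation, and this can take an extremely long time.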

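An optimization catches some of the simpler cases, such as (a+)*b, where a literal character follows. Before embarking on the standard matching procedure, PCRE checks that there is a "b" later in the subject string, and if there is not, it fails the match immediately. However, when there is no following literal this optimization cannot be used. You can see the difference by comparing the behaviour of (a+)*\d with that of (a+)*b. The pattern (a+)*b reports failure almost instantly when applied to a whole line of "a" characters, whereas (a+)*\d takes an appreciable time with subject strings longer than about 20 characters.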
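A minimal sketch of that comparison follows. To keep the run short and predictable it makes two assumptions that are not part of the text above: JIT is switched off with pcre.jit, and pcre.backtrack_limit is lowered to an arbitrary illustrative value, so that the slow pattern is cut off by the limit instead of running to completion.

```php
<?php
// Assumptions for this demo only: no JIT, and an arbitrary backtrack limit.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100000');

// A whole line of "a" characters: neither pattern can match it.
$subject = str_repeat('a', 40);

// A literal "b" must follow, so PCRE can reject the subject up front.
$literalResult = preg_match('/(a+)*b/', $subject);
$literalError  = preg_last_error();

// No literal follows, so PCRE has to explore the nested repeats.
$digitResult = preg_match('/(a+)*\d/', $subject);
$digitError  = preg_last_error();

// 0 means "no match"; false means matching was abandoned with an error.
var_dump($literalResult, $literalError === PREG_NO_ERROR);
var_dump($digitResult, $digitError === PREG_BACKTRACK_LIMIT_ERROR);
```

Note that preg_match() returns 0 for an ordinary failed match and false when it gives up, so the two outcomes have to be told apart with a strict comparison.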

User Contributed Notes 1 note

arthur200126 at gmail dot com ¶

2 years ago

> Beware of patterns that contain nested indefinite repeats. These can take a long time to run when applied to a string that does not match.

To say that it takes a "long time" is an understatement: the time taken would be exponential, specifically 2^n, where n is the number of "a" characters. This behavior could lead to a "regular expression denial of service" (ReDoS) if you run such an expression on user-provided input.

To avoid being hit by ReDoS, do one (or maybe more than one) of these three things:

* Write your expression so that it is not vulnerable. https://www.regular-expressions.info/redos.html is a good resource (both the "atomic" and "possessive" options are available in PHP/PCRE). Use a "ReDoS detector" or "regex linter" if your eyeballs can't catch all the issues.
* Set up some limits for preg_match. Use `ini_set(...)` on the values mentioned on https://www.php.net/manual/en/pcre.configuration.php. Reducing the limits might cause regexes to fail, but that is usually better than stalling your whole server.
* Use a different regex implementation. There used to be an RE2 extension; not any more!
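
The first two suggestions could look roughly like the sketch below. The helper name matchOrFail, the limit value, and the sample input are all made up for illustration; the atomic group (?>a+) and the possessive quantifier a++ are standard PCRE syntax, but whether such a rewrite preserves the meaning of your own pattern has to be checked case by case.

```php
<?php
// Hypothetical helper: true/false for match/no match, and an exception
// when preg_match() gives up (for example because a limit was exceeded).
function matchOrFail(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }
    return $result === 1;
}

// Illustrative value only; pick one that suits your own patterns.
ini_set('pcre.backtrack_limit', '100000');

$input = str_repeat('a', 40); // stands in for user-provided input

// Rewrites that cannot give characters back once a+ has consumed them.
$atomic     = matchOrFail('/(?>a+)*\d/', $input);
$possessive = matchOrFail('/(a++)*\d/', $input);

// The original nested repeat is stopped by the limit instead of stalling.
try {
    matchOrFail('/(a+)*\d/', $input);
    $vulnerableGaveUp = false;
} catch (RuntimeException $e) {
    $vulnerableGaveUp = true;
}

var_dump($atomic, $possessive, $vulnerableGaveUp);
```

Throwing on a false return keeps "the engine gave up" from being silently treated as "the input did not match".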