Compiling Regular Expressions
Randal L. Schwartz
Perl's regular expressions set Perl apart from most other scripting languages.
Some features (like the positive and negative lookahead, and the optional lazy
evaluation of multipliers) make matching troublesome strings trivial in Perl.
This power has not gone unnoticed by the open software community -- a package
called PCRE available on the 'Net (use your favorite search engine) claims to
implement a regular expression matching engine that's compatible with
Perl 5.004's syntax.
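To make the lookahead and lazy-multiplier features concrete, here's a small sketch of my own. The sample strings are made up for illustration and have nothing to do with the word list used below. It compares a greedy match with a lazy one, then picks out words with a positive and a negative lookahead:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to show the difference.
my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* grabs as much as it can, so the capture runs to the last '>'.
my ($greedy) = $html =~ /<(.*)>/;

# Lazy: .*? gives up as early as it can, so the capture stops at the first '>'.
my ($lazy) = $html =~ /<(.*?)>/;

# Positive lookahead: match 'foo' only when 'bar' follows it.
my @pos = grep { /foo(?=bar)/ } qw(foobar foobaz food);

# Negative lookahead: match 'foo' only when 'bar' does NOT follow it.
my @neg = grep { /foo(?!bar)/ } qw(foobar foobaz food);

print "greedy: $greedy\n";               # greedy: b>bold</b> and <i>italic</i
print "lazy: $lazy\n";                   # lazy: b
print "followed by bar: @pos\n";         # followed by bar: foobar
print "not followed by bar: @neg\n";     # not followed by bar: foobaz food
```

The lookaheads check what comes next without consuming it, which is why they are handy for "this, but only if (or unless) that follows" tests that are awkward to write otherwise.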
Just like the strings they are matching, regular expressions can come from
many sources. The most common source is inside the program:
    @ARGV = qw(/usr/dict/words);
    while (<>) {
        print if /foo/ || /bar/;
    }
This little guy ran in about a quarter of a CPU second for me, and generated
a nice list of words that contain foo or bar. Notice that I wrote
/foo/ and /bar/ as separate regular expressions, instead of the
seemingly identical /foo|bar/. Why did I do that? Experience. As reported
by the following program:
    @ARGV = qw(/usr/dict/words);
    @words = <>;
    use Benchmark;
    timethese (10 => {
        'expression or' =>
            '@x = grep /foo/ || /bar/, @words',
        'regex or' =>
            '@x = grep /foo|bar/, @words',
    });
We get the following output from Benchmark:
    Benchmark: timing 10 iterations of expression or, regex or...
    expression or: 1 wallclock secs ( 0.97 usr + 0.00 sys = 0.97 CPU)
    regex or: 3 wallclock secs ( 2.8