Parsing Interesting Things
Randal L. Schwartz
Someone recently popped into one of the newsgroups I frequent and asked how to parse an INI file. You might have seen those before, with sections and keyword=value lines, like:
[login]
timeout=30
remote=yes
[password]
minlength=6
I think they started in the Microsoft world, since no sane UNIX hacker would have come up with something like that. No, we come up with things like .Xdefaults and sendmail.cf and termcap. But the request seemed simple: parse the file and gather the information into a hash for quick access, two levels deep, of course.
Now, I usually carry the banner here for "use the CPAN", and in fact, there are numerous CPAN modules that parse INI files (too many, I think). But let's take a different route here. Suppose we were parsing a file that wasn't already CPANned to death. What tools could we use?
Well, certainly Perl's regular expressions are pretty powerful in the first place, and this task really wouldn't be that difficult with hand-written code, but we can go a bit further and pull out a nifty tool from the CPAN: Parse::RecDescent, from the madman of Perl, Damian Conway. This module permits extremely complex parsers to be built by specifying a nice hierarchical description of the data (as a grammar), and a series of actions to be taken as each portion of the data is returned. I find it very simple to use, and whipped up a parser in no time.
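For comparison, here's roughly what the hand-written route might look like. Treat it as a sketch rather than a finished parser: the name parse_ini is made up for illustration, and the code assumes no comment lines, no continuation lines, and that every keyword=value line comes after some section header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the column): parse INI text into a
# two-level hash, so that $config->{section}{keyword} gives the value.
sub parse_ini {
    my ($text) = @_;
    my (%config, $section);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                      # skip blank lines
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {         # [section] header
            $section = $1;
            $config{$section} ||= {};
        }
        elsif ($line =~ /^\s*([^=\s]+)\s*=\s*(.*?)\s*$/) {   # keyword=value
            die "keyword before any section: $line\n"
                unless defined $section;
            $config{$section}{$1} = $2;
        }
        else {
            die "unrecognized line: $line\n";
        }
    }
    return \%config;
}

my $config = parse_ini(<<'END');
[login]
timeout=30
remote=yes
[password]
minlength=6
END

print "login timeout is $config->{login}{timeout}\n";
```

That works for the sample above, but every new wrinkle in the format means another regex and another branch, which is where a grammar starts to pay off.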
The key to a useful grammar is getting the description right, and then deciding what to do once you've seen each piece. First, let's look at a file. A file is a series of sections, so in the grammar language, that's given as:
file: sections
Actually, a file is a bit more than that. If we just used that, the grammar would match any prefix of the input that also had sections.