[Pyrex] (from c.l.py) SeeGramWrap, a C parser
Bob Ippolito
bob at redivi.com
Fri Mar 5 17:00:21 CET 2004
This is something I have been waiting a long time for. I haven't tried
it yet, but here is a post I saw today on c.l.py:
On 2004-03-05 08:43:26 -0500, "Edward C. Jones" <edcjones at erols.com>
said:
I have uploaded "SeeGramWrap-03.02.2004.tgz" to my webpage at
"http://members.tripod.com/~edcjones/pycode.html". SeeGramWrap parses a
piece of C code and outputs the resulting parse tree in human- and
machine-readable form. The result can be used for program
transformations. Since a particular transformation algorithm may not
require all the information present in the tree, the user can select
what to output.
This program has been written and tested only under Linux.
Thanks to John Mitchell and Monty Zukowski for "cgram.tgz". Every
parser generator needs to have a good C grammar. Also thanks to Terence
Parr for ANTLR (http://www.antlr.org/).
Note: I have also uploaded:
pydocs.tar.gz, which searches the Python documentation.
linenum.py, which prints a line number and a message. Useful for
debugging.
========================================================================
CONTEXT
A number of large C libraries have been wrapped so they can be called
from Python. The wrapping code is repetitive and there may be a lot of
it, so methods have been developed for automated wrapping.
The best-known approach is SWIG (http://www.swig.org/). For complex
wrappings, SWIG requires the writing of "typemaps", an unintuitive
process where pieces of C code you write are spliced into the wrapper
code generated by SWIG.
Another wrapper-related approach is Pyrex, which is found at
http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Pyrex has its own repetitive boilerplate that has to be written. But
the Pyrex boilerplate is so straightforward that it can be taught
algorithmically. See "Michael's Quick Guide to Pyrex" at
"http://ldots.org/pyrex-guide/".
I think that the Pyrex boilerplate is _so_ straightforward that it can
be machine generated. Therefore I have been sporadically developing
software to do this. A thoroughly buggy version of this is on my web
page, "http://members.tripod.com/~edcjones". It is called
"cgram.tar.gz" (the name will be changed). Look at it but don't use it.
"SeeGramWrap" is a major revision of the front end of "cgram.tar.gz".
I think the automatic-wrapper program can be made to work. It might be
easier to use than SWIG. It is still a lot of work to prepare complex C
header files. What we have is really a "program transformation" or
"tree transformation" problem.
I think some of the issues are:
1. Since parser generators have a long and steep learning curve, I
prefer to use them as black boxes which generate parsers which output
results that I can analyze using Python. The parser created by a parser
generator should output trees in two formats: one easy to look at and
another that a program can easily read. For examples, see below.
2. I find trees very easy to work with. I want the trees to be front
and center and highly visible. I prefer to "manipulate a tree" rather
than "fire a rule".
3. The most common type of C macro has a type as one of its arguments:
#define CAST(x, type) (type *) x
How can these be automatically wrapped for Python which is a
dynamically typed language?
========================================================================
TECHNICAL OVERVIEW
I use some C grammars associated with ANTLR. The grammar package is
called "cgram". See "http://www.antlr.org/resources.html".
In "cgram" there is a Java program, "TestThrough.java", which parses C
code into an AST, then runs a tree grammar on the AST and outputs the
original code. The tree grammar is named "GnuCEmitter.g". I work with
this grammar because the terminal tokens are printed in the correct
order. I modified the grammar turning it into a template. A piece of
the original "GnuCEmitter.g" is:
----
typeQualifier
: a:"const" { print( a ); }
| b:"volatile" { print( b ); }
;
----
The modified version is:
----
typeQualifier
: a:"const" { <@ a @> }
| b:"volatile" { <@ b @> }
;
----
In this template, each string of the form "<@ ... @>" is replaced by a
set of print statements. Moreover, the entire rule is wrapped in
prints. The template is used in "emitter/insert_prints.py". When
"insert_prints.py" is run, the result is:
----
typeQualifier
{ if ( inputState.guessing==0 ) {
print(Open);
print("typeQualifier");
}
}
: (
a:"const" { print(Open);
print("typeQualifier.0"); print( a ); print(Close); }
| b:"volatile" { print(Open);
print("typeQualifier.1"); print( b ); print(Close); }
)
{ currentOutput.print(Close + MyTokenSep); }
;
----
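The marker substitution can be pictured with a short Python sketch.
This is not the code in "emitter/insert_prints.py": the function name
"expand_markers" and the regular expression are assumptions made for
illustration, and the prints that wrap the whole rule are left out.

```python
import re

# Matches a template marker such as "<@ a @>" and captures the label inside.
MARKER = re.compile(r"<@\s*(\w+)\s*@>")


def expand_markers(rule_name, rule_body):
    """Replace each <@ label @> marker with a group of print statements.

    Hypothetical helper: alternatives are numbered in the order their
    markers appear, giving names like "typeQualifier.0".
    """
    counter = 0

    def replacement(match):
        nonlocal counter
        label = match.group(1)
        text = ('print(Open); print("%s.%d"); print( %s ); print(Close);'
                % (rule_name, counter, label))
        counter += 1
        return text

    return MARKER.sub(replacement, rule_body)


template = '''typeQualifier
: a:"const" { <@ a @> }
| b:"volatile" { <@ b @> }
;'''

expanded = expand_markers("typeQualifier", template)
print(expanded)
```

Numbering the alternatives by marker order is the simplest scheme that
reproduces the "typeQualifier.0" and "typeQualifier.1" labels shown
above; a real tool may number them differently.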
If the original C program, "temp2.c", is
char* s = "ab";
The output of the modified emitter grammar is "temp2.c.data" (shown
here in three columns; read down each column in turn):
----
<<OPEN>>            <<OPEN>>            <<OPEN>>
externalList        declarator          expr
<<OPEN>>            <<OPEN>>            <<OPEN>>
externalDef         pointerGroup        primaryExpr
<<OPEN>>            <<OPEN>>            <<OPEN>>
declaration         pointerGroup.0      stringConst
<<OPEN>>            *                   <<OPEN>>
declSpecifiers      <<CLOSE>>           stringConst.0
<<OPEN>>            <<CLOSE>>           "ab"
typeSpecifier       <<OPEN>>            <<CLOSE>>
<<OPEN>>            declarator.0        <<CLOSE>>
typeSpecifier.1     s                   <<CLOSE>>
char                <<CLOSE>>           <<CLOSE>>
<<CLOSE>>           <<CLOSE>>           <<CLOSE>>
<<CLOSE>>           <<OPEN>>            <<CLOSE>>
<<CLOSE>>           initDecl.0          <<CLOSE>>
<<OPEN>>            =                   ;
initDeclList        <<CLOSE>>           <<CLOSE>>
<<OPEN>>            <<OPEN>>            <<CLOSE>>
initDecl            initializer         <<CLOSE>>
----
This output can be processed by "tree.py" to produce "temp2.c.nest":
----
(externalList,
  (externalDef,
    (declaration,
      (declSpecifiers,
        (typeSpecifier,
          (typeSpecifier.1, |char|))),
      (initDeclList,
        (initDecl,
          (declarator,
            (pointerGroup,
              (pointerGroup.0, |*|)),
            (declarator.0, |s|)),
          (initDecl.0, |=|),
          (initializer,
            (expr,
              (primaryExpr,
                (stringConst,
                  (stringConst.0, |"ab"|))))))), |;|)))
----
or "temp2.c.src":
char * s = "ab" ;
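The step from the flat ".data" stream to the nested ".nest" form can be
sketched in a few lines of Python. This is not "tree.py" itself: the
names "build_tree" and "format_tree" are hypothetical, and the sketch
assumes the stream is available as a list of strings with "<<OPEN>>"
and "<<CLOSE>>" as the only markers.

```python
OPEN, CLOSE = "<<OPEN>>", "<<CLOSE>>"


def build_tree(tokens):
    """Turn a flat token stream into nested lists.

    Each node is [rule_name, child, ...]; a child is either another
    node or a plain string holding a token from the C source.
    Returns the list of top-level nodes.
    """
    root = []
    stack = [root]
    for tok in tokens:
        if tok == OPEN:
            node = []
            stack[-1].append(node)   # attach to the enclosing node
            stack.append(node)       # and descend into it
        elif tok == CLOSE:
            if len(stack) == 1:
                raise ValueError("unmatched " + CLOSE)
            stack.pop()
        else:
            # The first string after OPEN is the rule name; later
            # strings are source tokens.
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched " + OPEN)
    return root


def format_tree(node):
    """Render one node in the single-line "(name, ..., |token|)" style."""
    parts = [node[0]]
    for child in node[1:]:
        if isinstance(child, list):
            parts.append(format_tree(child))
        else:
            parts.append("|%s|" % child)
    return "(" + ", ".join(parts) + ")"


# A fragment of the stream for "char"; split() is fine here only
# because no token in this fragment contains whitespace.
fragment = ("<<OPEN>> declSpecifiers <<OPEN>> typeSpecifier "
            "<<OPEN>> typeSpecifier.1 char "
            "<<CLOSE>> <<CLOSE>> <<CLOSE>>").split()

tree = build_tree(fragment)[0]
print(format_tree(tree))
```

A stack keeps the code short because every "<<OPEN>>" is closed by
exactly one "<<CLOSE>>", so the node being filled is always on top.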
If "temp2.c.src" is itself put through the entire process, we get
"temp2.c.src.src", which is identical to "temp2.c.src". This test is
done by "docheck.py".
In the ".data" and ".nest" files, the tokens from the original C code
are in the correct order. It is easy to recover:
('char', '*', 's', '=', '"ab"', ';')
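As a rough illustration of that recovery, here is a Python sketch that
works on the flat stream. The function name "source_tokens" is
hypothetical, and the sketch assumes every rule name directly follows
an "<<OPEN>>" marker, as it does in the "temp2.c.data" listing.

```python
OPEN, CLOSE = "<<OPEN>>", "<<CLOSE>>"


def source_tokens(stream):
    """Return the C source tokens in order, skipping markers and rule names."""
    tokens = []
    previous = None
    for item in stream:
        is_marker = item in (OPEN, CLOSE)
        is_rule_name = previous == OPEN   # names come right after OPEN
        if not is_marker and not is_rule_name:
            tokens.append(item)
        previous = item
    return tuple(tokens)


# The "temp2.c.data" stream, written out in reading order. split() is
# acceptable here because no token contains whitespace.
stream = '''
<<OPEN>> externalList <<OPEN>> externalDef <<OPEN>> declaration
<<OPEN>> declSpecifiers <<OPEN>> typeSpecifier
<<OPEN>> typeSpecifier.1 char <<CLOSE>> <<CLOSE>> <<CLOSE>>
<<OPEN>> initDeclList <<OPEN>> initDecl
<<OPEN>> declarator <<OPEN>> pointerGroup
<<OPEN>> pointerGroup.0 * <<CLOSE>> <<CLOSE>>
<<OPEN>> declarator.0 s <<CLOSE>> <<CLOSE>>
<<OPEN>> initDecl.0 = <<CLOSE>>
<<OPEN>> initializer <<OPEN>> expr <<OPEN>> primaryExpr
<<OPEN>> stringConst <<OPEN>> stringConst.0 "ab"
<<CLOSE>> <<CLOSE>> <<CLOSE>> <<CLOSE>> <<CLOSE>> <<CLOSE>> <<CLOSE>>
; <<CLOSE>> <<CLOSE>> <<CLOSE>>
'''.split()

recovered = source_tokens(stream)
print(recovered)
```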