[Pyrex] lxml's patches against cython

Stefan Behnel stefan_ml at behnel.de
Fri Jul 27 00:02:42 CEST 2007


Hi,

here is an updated C-API patch for cython (capi-diff.patch). It is actually a
diff against sagex-20070710, as I hadn't downloaded cython yet when I did the
work offline.

However, when I compile lxml with the resulting translator, it reports a number
of errors. Some of them are legitimate (and easily fixed in lxml), but others
should be fixed in cython (and some of those also exist in Pyrex).

The largest number of errors (note that lxml wraps C libraries) results from
the fact that Pyrex doesn't treat enums as ints, so you can't apply |, &, etc.
to enum values. That was a bug in 0.9.5 that is still not fixed in an official
release. The attached "enum.patch" fixes it (not the first time it has gone
through this list, BTW).
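
To illustrate what "enums as ints" means in practice, here is a minimal sketch
in plain Python (not Pyrex, and not taken from enum.patch). The flag names and
values are made up for the example; the point is only that enum members should
combine with | and & like the integers they are in C.

```python
from enum import IntFlag


class Flag(IntFlag):
    # Hypothetical flags standing in for a C enum of option bits.
    READ = 1
    WRITE = 2
    EXEC = 4


# Combine two options with |, as C code wrapping a library would.
options = Flag.READ | Flag.EXEC

# Test individual bits with &.
has_exec = bool(options & Flag.EXEC)
has_write = bool(options & Flag.WRITE)

# The combined value is still usable as a plain int.
as_int = int(options)
```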

A cython-specific feature seems to be that it knows about (C-implemented)
builtins and requires calls to them to match a specific signature. However, the
"unicode" function can be called without arguments in Python, so the easiest
way to create an empty unicode object in Pyrex is to say "unicode()". cython
maps this to PyObject_Unicode, which requires an argument. That is easy to work
around in the code with unicode('') or a direct call to the C function, though.
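
For reference, the two spellings are equivalent at the Python level. The sketch
below uses Python 3, where the type is called str rather than unicode; that
substitution is an assumption made so the example runs on a current
interpreter.

```python
# Python 3 sketch: str is the type that Python 2 called unicode.

# No-argument call, which the Python builtin allows.
empty_no_arg = str()

# Workaround form: pass an explicit empty string.
empty_with_arg = str('')

same_result = (empty_no_arg == empty_with_arg)
```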

I think cython should support unicode literals, preferably following the
source encoding PEPs for Py3k (defaulting to UTF-8 etc.), but allowing only
ASCII escapes would be fine for a start. It would at least allow you to
create unicode strings directly in the source, without explicitly
wrapping them in unicode("literal", encoding).

Although I was opposed to it at the beginning, I'm actually quite happy with
the compile-time globals/builtins detection now. cython found two long-standing
typos in never-tested corner-case code of lxml.etree. :)
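
The kind of typo this catches can be sketched in plain Python. The checker
below is a deliberately crude illustration, not how cython implements it: it
ignores scoping and simply reports names that are read somewhere but never
bound in the module or in the builtins. The function name undefined_globals and
the sample source are invented for the example.

```python
import ast
import builtins


def undefined_globals(source):
    """Return names that are read but bound neither in the source nor in builtins."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    read = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                read.append(node.id)
            else:
                bound.add(node.id)  # assignment or deletion target
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)  # function parameter
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return sorted({name for name in read if name not in bound})


SOURCE = '''
def describe(value):
    if value is None:
        return reprr(value)   # typo, only reached in a rarely used branch
    return repr(value)
'''

typos = undefined_globals(SOURCE)
```

At runtime, Python would only raise a NameError for reprr if describe(None)
were ever called, which is why such mistakes can survive for a long time in
untested branches.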

I also added a patch that allows switching off assertions at compile time
based on a compiler define (nicely supported by distutils).
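
On the build side, this could look roughly like the sketch below. The macro
name WITHOUT_ASSERTIONS and the source path are assumptions for illustration
(the name actually used by assertions.patch is not stated in this mail);
define_macros is the regular Extension option for passing -D style defines to
the C compiler.

```python
def extension_options(disable_assertions):
    """Build keyword arguments for a distutils/setuptools Extension."""
    macros = []
    if disable_assertions:
        # Hypothetical macro name; (name, None) defines it without a value,
        # like passing -DWITHOUT_ASSERTIONS to the compiler.
        macros.append(("WITHOUT_ASSERTIONS", None))
    return {
        "name": "lxml.etree",
        "sources": ["src/lxml/etree.c"],  # assumed path to the generated C file
        "define_macros": macros,
    }


release_options = extension_options(disable_assertions=True)
debug_options = extension_options(disable_assertions=False)
```

The resulting dictionary would be passed on as Extension(**release_options) in
setup.py, so a release build and a debug build differ only in that one define.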

One remaining problem is that the module is now named "src.lxml.etree"
internally (visible in exceptions, especially doctests). But "src" is not a
package, just the main source directory. What is the best way to fix that?

Apart from that, the C-API implementation is working nicely with cython, so
now the trunk of lxml can be compiled with cython plus the attached patches.

In case cython wants a bug tracker for project management, I'm currently
getting a very good impression of Ubuntu's launchpad. Simple, non-intrusive
sign-up, pretty good features and a close-to-intuitive interface.

Happily,
Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: capi-diff.patch
Type: text/x-diff
Size: 11537 bytes
Desc: not available
Url : http://lists.copyleft.no/pipermail/pyrex/attachments/20070727/732b825c/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: assertions.patch
Type: text/x-diff
Size: 765 bytes
Desc: not available
Url : http://lists.copyleft.no/pipermail/pyrex/attachments/20070727/732b825c/attachment-0001.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: enum.patch
Type: text/x-diff
Size: 658 bytes
Desc: not available
Url : http://lists.copyleft.no/pipermail/pyrex/attachments/20070727/732b825c/attachment-0002.bin 

