[Pyrex] faster in / output from objects [long post + code!]
Robert Bradshaw
robertwb at math.washington.edu
Thu Jan 24 21:10:17 CET 2008
On Jan 24, 2008, at 4:24 AM, Martijn Meijers wrote:
> Dear list members,
>
> Currently I'm working in the geo-informatics field and I'm doing
> research on storage of vector data in a DBMS. For this my programming
> language of choice is Python. Although there are some vector libraries
> in C with Python bindings, I feel that those are not really comfortable
> to work with (due to their API). Therefore, I decided to roll my own
> library for educational and research purposes and I'm using Cython for
> this purpose (as I'm not really proficient in C or C++, and I'm not
> really willing to go that route, as it involves quite a steep learning
> curve).
Sounds like a good choice.
> Below, you'll find my library that I created. Creation of objects is
> fairly fast, compared with the C-lib-with-python bindings that I have
> available for comparison (my approach is around 1.5 times faster with
> object creation). However, I'm stuck with in/output of my objects: Two
> formats I'd like to support: a text based format and a binary format.
> Here, I have the feeling I don't understand how I can use Cython to push
> the throughput to the limits. My approach (with Visitors) is fairly
> slow. As I understand it, Cython is more geared towards (mathematical)
> computations than to text processing...
Our Sage branch of Pyrex used to be called SageX, and we were all
surprised after the first year how little our improvements were
specific to the mathematics infrastructure we were supporting.
However, it is true that the Python/C API doesn't make it easy to
naively do fast string processing without having to think about the
underlying string representation.
>
> I'd like to know some things about my code:
> (a) Did I do things the right way, or can the code be optimized more
> (while staying in Cython)?
Lots.
I didn't read all of your code, but here are some things that jumped
out at me:
1) Use a more object-oriented style (this should clean up the code as
well as speed it up). E.g.
    def is_empty(Geometry geom):
        if geom.type == __POINT:
            return False # Point cannot be empty, at the moment
        elif geom.type == __LINESTRING:
            return num_points(geom) == 0
        elif geom.type == __POLYGON:
            return num_rings(geom) == 0
would be better as a method of Point, LineString, and Polygon, rather
than a single function branching on geom.type.
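
A minimal sketch of that restructuring, in plain Python so it runs as is.
The class layout and the attribute names (points, rings) are assumptions
for illustration, not the original library's code; in Cython these would
be cdef classes with typed attributes.

```python
# Illustrative sketch only: class and attribute names are assumptions.

class Geometry:
    def is_empty(self):
        # Each concrete geometry type supplies its own answer.
        raise NotImplementedError


class Point(Geometry):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def is_empty(self):
        # Point cannot be empty, at the moment
        return False


class LineString(Geometry):
    def __init__(self, points=()):
        self.points = list(points)

    def is_empty(self):
        return len(self.points) == 0


class Polygon(Geometry):
    def __init__(self, rings=()):
        self.rings = list(rings)

    def is_empty(self):
        return len(self.rings) == 0
```

Callers then write geom.is_empty() and the type dispatch is done by the
method lookup instead of a chain of comparisons on geom.type.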
2) Store just the actual data, rather than a list of Python objects
wrapping the data. E.g. in LineString, rather than points being a
Python list, let it be a C array of Coordinate structs. Only
construct the Point class in __getitem__ or other methods that
expose it to the outside.
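
A rough sketch of the same idea in plain Python. Here the standard
array module stands in for the C array of Coordinate structs (that
substitution, the flat x, y layout, and the add_point name are all
assumptions for illustration); in Cython one would malloc the structs
directly.

```python
from array import array


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class LineString:
    def __init__(self):
        # Flat buffer of C doubles laid out as x0, y0, x1, y1, ...
        # No Point objects are stored.
        self._coords = array('d')

    def add_point(self, x, y):
        # Hypothetical helper: append raw coordinates only.
        self._coords.append(x)
        self._coords.append(y)

    def __len__(self):
        return len(self._coords) // 2

    def __getitem__(self, i):
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("point index out of range")
        # A Point wrapper is built only when the outside asks for one.
        return Point(self._coords[2 * i], self._coords[2 * i + 1])
```

Internal code (output, length, bounding box, ...) can then loop over
the raw buffer without creating a Python object per vertex.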
3) You're using def functions all over the place. Consider using more
cdef (or cpdef) functions.
> (b) Is it possible to speed up the in- and output of text and binary
> formats (here a lot of python functions are still used, but I can't
> seem
> to find examples of how to do text/binary stream processing with
> Cython)...?
See above, especially (3).
If one's writing to a file, one can access the C FILE* pointer and
operate on that directly. I notice you keep converting back and forth
between strings and streams--this has got to be expensive.
I had to write something that is very similar to what you're doing
(but in 3d) and the fastest way I found was to output a (possibly)
nested list of strings, which are then joined at the very end. See
http://www.sagemath.org/hg/sage-main/file/a66354d13708/sage/plot/plot3d/index_face_set.pyx
specifically [tachyon | obj | jmol]_repr(). The result is passed to an
extremely optimized "flatten_list" command at the end of
http://www.sagemath.org/hg/sage-main/file/a66354d13708/sage/plot/plot3d/base.pyx
Also relevant is
http://www.sagemath.org/hg/sage-main/file/a66354d13708/sage/plot/plot3d/point_c.pxi
(Note: your code doesn't need to be nearly as tightly written, or to
use the Python/C API directly, to take advantage of the ideas
illustrated.)
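
The "nested list of strings, joined once at the end" idea can be
sketched in plain Python as follows. This is not the Sage code: the
flatten helper, the function names, and the WKT-like output format are
assumptions made up for illustration.

```python
def flatten(nested):
    """Flatten arbitrarily nested lists of strings into one flat list."""
    flat = []
    stack = [iter(nested)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                # Descend into the sublist; resume the parent later.
                stack.append(iter(item))
                break
            flat.append(item)
        else:
            # Current iterator is exhausted: go back up one level.
            stack.pop()
    return flat


def point_repr(x, y):
    # Each piece returns a list of fragments, not a finished string.
    return ["%s %s" % (x, y)]


def linestring_repr(coords):
    parts = []
    for i, (x, y) in enumerate(coords):
        if i:
            parts.append(", ")
        parts.append(point_repr(x, y))
    return ["LINESTRING (", parts, ")"]


def to_text(linestrings):
    # Build the whole nested structure first, join exactly once.
    nested = [[linestring_repr(coords), "\n"] for coords in linestrings]
    return "".join(flatten(nested))
```

No intermediate string is concatenated or wrapped in a stream along
the way; the only large allocation is the final join, whose result can
be written to the file in one call.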
There have been several requests about streaming/fast IO, but no
examples of using buffers/StringIO in Cython directly, so I hope the
above is useful to lots of people.
- Robert
>
> Thanks very much for your time and advice in advance!
>
> Kind regards,
>
> Martijn Meijers
> Delft University of Technology, The Netherlands
> OTB, Section GIS-technology
>