[Pyrex] Serialization of a wrapped C structure

Samuele Kaplun Samuele.Kaplun at cern.ch
Fri Jul 27 09:26:10 CEST 2007


Dear list,
I'm taking my first steps in the Pyrex world. I'm writing a C abstract data 
structure for managing a set of integers, stored as a stream of bits (I 
want to exploit the power of 64-bit machines, so I work on an array of 
unsigned long long int (word_t) for all my operations).
Wrapping it into a new Python extension, which I called intbitset, is going 
fine.
Now, one of the things I'm most interested in is the ability to quickly save 
this object to a database (as a BLOB) and retrieve it.
So far I have been able to make my extension iterable, so that I can convert 
it to a list and serialize that. But this is rather slow and space-consuming.

What would be the best way to obtain my raw array of word_t directly, in 
order to serialize it (and to zip it: the array will contain few bits set to 
1, so it's a sparse array, easily compressible)? Every interface I have seen 
for writing something to a stream requires converting my data to a Python 
string, which I think is a resource-consuming operation.
I also found that I can "pack" my data into a string, but I don't think that 
can work.
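
As a rough illustration of the idea above (packing the raw words and zipping 
them), here is a minimal sketch in plain Python rather than Pyrex. The names 
to_words, dump_words and load_words, the 64-bit word width and the 
little-endian layout are assumptions made for the example, not part of 
intbitset.

```python
import struct
import zlib

WORD_BITS = 64  # assumed width of word_t


def to_words(ints, nwords):
    """Build a list of 64-bit words with one bit set per integer in ints."""
    words = [0] * nwords
    for i in ints:
        words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def dump_words(words):
    """Pack the words as raw little-endian 64-bit values, then compress."""
    raw = struct.pack("<%dQ" % len(words), *words)
    return zlib.compress(raw)


def load_words(blob):
    """Inverse of dump_words: decompress and unpack into a list of words."""
    raw = zlib.decompress(blob)
    return list(struct.unpack("<%dQ" % (len(raw) // 8), raw))


# A sparse set: four members spread over 1024 words (8192 raw bytes).
members = [3, 64, 1000, 65535]
words = to_words(members, 1024)
blob = dump_words(words)
```

Because almost all of the words are zero, the compressed blob should come out 
far smaller than the raw array, and load_words(blob) gives back the same 
words.
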
My C structure is:
[...]
typedef struct {
    size_t size;
    word_t *bitset;
} IntBitSet;
[...]
where word_t *bitset is obviously the word_t array I want to serialize. In 
Pyrex I wrapped everything with:
[...]
cdef class intbitset:
    cdef IntBitSet *bitset
[...]
So I'm always handling a pointer to my C structure, which in turn holds a 
pointer to the array.

My problem is that when I write the method:
[...]
    def __getstate__(intbitset self):
        return pack("I%is" % (intBitSetGetSize(self.bitset) / wordbytesize),
                    self.bitset.size, <char *>self.bitset.bitset)
[...]
I think that casting self.bitset.bitset (i.e. my pointer to the word_t array) 
to <char *>, which is the only way I found to obtain a raw stream of bytes 
from memory, is wrong, because Pyrex will convert it to a Python string 
before passing it to the pack function.
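
The concern can be reproduced outside Pyrex. The sketch below uses ctypes as 
a stand-in (an assumption of this example, not what intbitset does): a 
conversion that treats the buffer as a NUL-terminated C string stops at the 
first zero byte, while a copy with an explicit length keeps every byte. In 
Pyrex the length-aware counterpart would presumably be the C API function 
PyString_FromStringAndSize, declared through a cdef extern block; that part 
is untested here.

```python
import ctypes
import struct

# Hypothetical stand-in for the word_t array: two 64-bit words in C memory.
WordArray = ctypes.c_uint64 * 2
words = WordArray(1, 2)
nbytes = ctypes.sizeof(words)  # 2 words * 8 bytes

# Treating the buffer as a NUL-terminated C string stops at the first
# zero byte, so most of the data is silently lost.
as_c_string = ctypes.cast(words, ctypes.c_char_p).value

# Copying with an explicit length keeps every byte, zeros included.
as_raw_bytes = ctypes.string_at(ctypes.addressof(words), nbytes)

# The full copy can be unpacked back into the original words
# ("=" selects native byte order with standard sizes).
restored = struct.unpack("=2Q", as_raw_bytes)
```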

So what do you suggest? How can I get at a binary sequence of data in memory 
in order to serialize it?

Best regards,
	Samuele Kaplun



