[Pyrex] intern(str) fails if string is not a C string
robertwb at math.washington.edu
Thu Oct 22 06:27:33 CEST 2009
On Oct 21, 2009, at 1:10 PM, John Arbash Meinel wrote:
> This bug seems to affect both cython and pyrex.
> Namely, I'm parsing a string that has NULL characters in it (known
> width), which is also likely to be redundant within the data stream.
> I'm doing something like:
> mystr = PyString_FromStringAndSize(NULL, count+other_count)
> buf = PyString_AS_STRING(mystr)
> memcpy(buf, some_bytes, count)
> memcpy(buf+count, more_bytes, other_count)
> mystr = intern(mystr)
> This fails because Pyrex and Cython both effectively translate this into:
> char *temp;
> temp = PyString_AsString(mystr);
> mystr = PyString_InternFromString(temp);
> With, of course, appropriate error checking and incref/decref.
> I would, of course, like to use PyString_InternInPlace(PyObject **),
> however that fails for other reasons ("taking address of a non l-value")
> if you try to do:
> cdef extern from "Python.h":
>     ctypedef struct PyObject:
>         pass
>     void PyString_InternInPlace(PyObject **)
> st = 'my string'
> PyString_InternInPlace(&<PyObject *>st)
> Now I can probably do some trickery with
> cdef PyObject *as_ptr
> as_ptr = <PyObject *>st
> st = <object>as_ptr
> However, because InternInPlace may destroy 'st', and that final
> assignment will be doing a DECREF on the 'st' object, I'm pretty sure
> it will blow up.
> It feels like the only thing left to do is define a macro in a header
> with something like:
> #define INTERN_STRING(obj) (PyString_InternInPlace(&(obj)))
> and then
> cdef extern from "myheader.h":
> Is this true?
Good catch. I've disabled optimizing the intern builtin in Cython for
now. We could re-enable it for char* arguments only, if someone finds
interning strings to be a bottleneck.
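
For illustration, the failure in the quoted message comes down to the round trip through a NUL-terminated char *: everything after the first NUL byte is lost before the string is ever interned. Below is a minimal sketch in plain Python, using ctypes.c_char_p as a stand-in for the PyString_AsString / PyString_InternFromString pair. The helper name roundtrip_via_c_string is hypothetical, and the sketch does not call the interning API itself; it only shows the truncation.

```python
import ctypes


def roundtrip_via_c_string(data):
    """Return what survives when `data` is read back as a NUL-terminated C string.

    Hypothetical helper for illustration only; it mimics a char * round trip
    and does not intern anything.
    """
    # c_char_p points at the bytes buffer; .value reads up to the first NUL byte.
    return ctypes.c_char_p(data).value


original = b"key\x00tail"            # fixed-width data with an embedded NUL
survivor = roundtrip_via_c_string(original)
print(len(original), len(survivor))  # prints: 8 3
```

A string without embedded NUL bytes survives the same round trip unchanged, which would explain why the optimized path went unnoticed for ordinary strings.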