[Pyrex] [Cython] intern(str) fails if string is not a C string

Stefan Behnel stefan_ml at behnel.de
Fri Oct 23 10:31:42 CEST 2009


Robert Bradshaw wrote:
> On Oct 21, 2009, at 1:10 PM, John Arbash Meinel wrote:
>> I'm doing something like:
>>
>> mystr = PyString_FromStringAndSize(NULL, count+other_count)
>> memcpy(mystr, some_bytes, count)
>> memcpy(mystr+count, more_bytes, other_count)
>>
>> mystr = intern(mystr)
> 
> Good catch. I've disabled optimizing the intern builtin in Cython for  
> now. We could re-enable it for char* only if someone finds interning  
> strings to be a bottleneck.

I've reimplemented it for byte strings now (in Cython), so using a char*
will lead to a coercion and work in Py2. However, note that the intern()
builtin was removed in Py3, so using it in your code may not be portable.
Also, if we choose to implement it somehow, it will necessarily return a
unicode string in Py3, which may not be what you want, given the above code
fragment.
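
To illustrate the portability point, here is a minimal sketch, assuming plain
Python rather than Cython. In Py3 the function lives on as sys.intern() and
accepts only unicode strings. The names intern_text and try_intern are made up
for this example.

```python
import sys

try:
    intern_text = sys.intern      # Python 3: intern() moved into the sys module
except AttributeError:
    intern_text = intern          # Python 2: the original builtin

def try_intern(value):
    """Return the interned object, or None if this Python rejects the type."""
    try:
        return intern_text(value)
    except TypeError:
        return None

# Build two equal text strings at runtime so they start out as separate objects.
a = try_intern("".join(["ab", "c"]))
b = try_intern("".join(["a", "bc"]))
same_object = a is b              # both names now refer to the interned copy

# Under Python 3 a byte string is rejected, so this yields None there.
bytes_result = try_intern(b"abc")
```

Under Py2 the last call succeeds, because b"abc" is an ordinary str there. That
difference is why code that interns byte strings does not carry over unchanged.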

I recommend using a dedicated dict instead and putting your byte strings
there. This will not change the performance in any way, given that intern()
on a char* has always created a Python byte string before interning (and
possibly dropping) it. But it will make it clearer in the code what is
actually happening.

Stefan


