[Pyrex] [Cython] intern(str) fails if string is not a C string

John Arbash Meinel john at arbash-meinel.com
Fri Oct 23 16:31:10 CEST 2009


...
> I recommend using a dedicated dict instead, and put your byte strings
> there. This will not change the performance in any way, given that intern()
> on a char* has always been creating a Python byte string before interning
> (and possibly dropping) it. But it will make it clearer in the code what is
> actually happening.
> 
> Stefan
> 

So I can't intern() a char* because it has NUL bytes in the array.

I don't want to use a dedicated dict, because then the strings become
immortal.

At the moment, I don't care about Py3 compatibility, because we are a
long way off from there.

I do understand that interning in Python is really meant for internal
use: attributes, etc. are all managed via Python strings (becoming
Unicode in Py3), and thus lookups in dicts, etc. are faster if you
intern everything.

However, there is no way to implement de-duping without immortality in
Python, other than something like weakrefs (which strings and tuples
don't support, and which would really exacerbate the memory problems
that interning is supposed to make *better*), or being truly evil and
poking at the 'tp_dealloc' slot of the object you are working with (to
teach the deallocator that there is a copy over in a special dict that
it should remove).
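
The weakref limitation is easy to check from plain Python. The sketch
below uses only the standard `weakref` module; `supports_weakref` and
`Node` are hypothetical names added for illustration:

```python
import weakref

def supports_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        # Raised for types that have no weak reference support.
        return False
    return True

class Node(object):
    """Ordinary user-defined class; instances are weak-referenceable."""

results = {
    "str": supports_weakref("some key"),
    "bytes": supports_weakref(b"some\x00key"),
    "tuple": supports_weakref(("a", "b")),
    "Node": supports_weakref(Node()),
}
print(results)
```

Since the built-in string and tuple types refuse weak references, a
`weakref.WeakValueDictionary` cannot hold them directly as values.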

Anyway, thanks for looking at my issue and responding. It was certainly
confusing why adding the line:

  s = intern(s)

John
=:->


