[Pyrex] [Cython] Python 3

Stefan Behnel stefan_ml at behnel.de
Wed Apr 30 09:44:11 CEST 2008


Christian Heimes wrote:
> The UTF-8 default encoding is hard coded in Python 3.0. IMHO it's the
> most sensible encoding for users from the Western world. Asian users
> would probably prefer UTF-16 but that's a waste of memory for the rest.

I agree that it's a sensible encoding. However, allowing

    cdef char* s = some_object

would still yield surprising results if "some_object" is an ISO-encoded
bytes object instead of a unicode string. In one case, decoding the
resulting char* as UTF-8 would work; in the other, it wouldn't.
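
To make the difference concrete, here is a small plain-Python sketch (not
Cython, and not how any implementation does it). The sample string and the
helper name decodes_as_utf8 are made up for illustration only.

```python
# The same text, once encoded as UTF-8 and once as ISO-8859-1 (Latin-1).
text = "caf\u00e9"  # 'cafe' with an acute accent on the final e

utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9'
latin1_bytes = text.encode("iso-8859-1")  # b'caf\xe9'


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as_utf8(utf8_bytes))    # the UTF-8 bytes decode fine
print(decodes_as_utf8(latin1_bytes))  # the Latin-1 bytes do not
```

Code receiving the char* has no way to tell which of the two byte
sequences it was handed.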

Since I find it very helpful and straightforward to support the above
for byte strings, the best way to deal with this in Cython is to raise an
exception if "some_object" is a unicode string.
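
As a rough sketch of that rule, again in plain Python rather than in
Cython's generated C code: the function name as_char_buffer and the choice
of TypeError are assumptions for illustration, not a statement of what
Cython does or will do.

```python
def as_char_buffer(some_object):
    """Hypothetical stand-in for the 'cdef char* s = some_object' coercion.

    Byte strings pass through unchanged; unicode strings are rejected
    instead of being encoded implicitly.
    """
    if isinstance(some_object, str):
        # Assumption: TypeError is used here; the proposal only says
        # "raise an exception".
        raise TypeError("expected a byte string, got a unicode string")
    if not isinstance(some_object, bytes):
        raise TypeError("expected a byte string")
    return some_object


print(as_char_buffer(b"abc"))  # accepted as-is

try:
    as_char_buffer("abc")
except TypeError as exc:
    print("rejected:", exc)
```

Users who do want UTF-8 would then encode explicitly before the
assignment, which keeps the choice of encoding visible in their code.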


> In my opinion wchar_t support is much more important than casting
> PyUnicode objects to char*. Especially Windows developers need wchar_t
> for the wide Windows API. Python 2.6 and 3.0 have dropped support for
> the Windows 9x/ME/NT series. Only 2k SP4 and newer are supported.
> wchar_t support is an important step for the poor souls ... err Windows
> developers. ;)

But that still isn't portable, is it? According to unicodeobject.h in
Python 3.0a4, a unicode character can be either wchar_t (unsigned 16-bit)
or "Py_UCS4" (unsigned int or unsigned long).

So automatic conversion to wchar_t would fail on some platforms.
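
The width difference is easy to check from Python with ctypes. This is
only an illustrative sketch: it reports the size of the platform's C
wchar_t, not how a given Python build stores unicode characters
internally, and the sample character is an arbitrary choice.

```python
import ctypes

# Size in bytes of the platform's C wchar_t, as seen by ctypes.
# Commonly 2 on Windows and 4 on most Unix-like systems.
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# U+1D11E lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"

# Number of code units needed for that one character.
utf16_units = len(clef.encode("utf-16-le")) // 2  # surrogate pair
utf32_units = len(clef.encode("utf-32-le")) // 4  # single unit

print("sizeof(wchar_t):", wchar_size)
print("UTF-16 code units:", utf16_units)
print("UTF-32 code units:", utf32_units)
```

With a 16-bit wchar_t, one such character occupies two array elements;
with a 32-bit unit it occupies one. A one-to-one automatic conversion
cannot hold on both kinds of platform.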

Stefan



