[Pyrex] Speeding up custom string lowercasing with Pyrex

Stefan Behnel stefan_ml at behnel.de
Wed Oct 31 14:25:30 CET 2007


Alexy Khrabrov wrote:
> Greetings -- I'm counting Russian words, encoded in Cyrillics, and  
> found that lowercasing them in Python slows down the program 5-10  
> times.
>     ngram = ' '.join(words[i:(i+self.opts.n)]) # mashed
>     if self.opts.lower:
>         # x = ngram.lower() # TODO does nothin'!

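Are you using a unicode string or an 8-bit encoded byte string? Unicode
objects know how to lower-case non-ASCII characters, so I can't see why
lower() should do nothing on one of those. On an 8-bit string, lower() only
changes the bytes it recognises as letters (plain ASCII by default), which
would explain what you are seeing.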

   >>> print "ÄÜÖ".lower()
   ÄÜÖ
   >>> print u"ÄÜÖ".lower()
   äüö
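
If your words come in as UTF-8 byte strings, something along these lines
might already be enough. This is only a sketch: lower_ngram() is a name I
made up, UTF-8 is an assumption about your input, and it relies on the
bytes type (Python 2.6 or later, or Python 3).

```python
# -*- coding: utf-8 -*-

def lower_ngram(raw, encoding="utf-8"):
    """Lower-case an n-gram (hypothetical helper, not from the original code).

    Byte strings are decoded first, lower-cased as unicode, and encoded
    back, so the caller gets the same type it passed in. The default
    encoding is an assumption about the input data.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding).lower().encode(encoding)
    # Already a unicode string: lower() handles Cyrillic directly.
    return raw.lower()


raw = u"Привет Мир".encode("utf-8")

# lower() on the byte string leaves the Cyrillic bytes untouched ...
unchanged = (raw.lower() == raw)

# ... while going through unicode lower-cases them as expected.
lowered = lower_ngram(raw)
```

Decoding once when you read the file and working on unicode objects
throughout would avoid the repeated decode/encode round trip.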

Using Pyrex here is not a good idea, IMHO.

Stefan



