[Pyrex] array.array vs long ints for bit vectors

Nitin Madnani nmadnani at wam.umd.edu
Sun Oct 30 21:07:11 CET 2005


Thanks for your comments. I would like to spend time on the  
computational linguistics part of the project which is the real  
"research". I will just go with arrays for now.

Thanks again,
Nitin

On Oct 30, 2005, at 2:56 PM, John Machin wrote:

> Nitin Madnani wrote:
>
>> Thanks for the detailed reply, John. Let me answer some of your  
>> questions and ask some more of mine.
>>
>>> For a start, what operations do you want to perform on this "bit  
>>> vector"?
>>>
>> The only operations I need are setting bits, getting the bits that  
>> are not set etc.
>>
>
> What does "getting the bits that are not set" mean? Is this a loose  
> description of x = avector[k], or something else? Have you  
> contemplated designing the API you think you will need?
>
>
>>> "grow" and "needed" are a little contradictory. Do you want an  
>>> *exact* size, so that you can do this:
>>>
>>> zero = Bitvector(size=20, contents=[0])
>>> all_ones = ~zero
>>>
>>> and know exactly how long the result will be?
>>>
>>> What makes you think a long int would grow unconscionably longer  
>>> than needed? How long is your longest map going to be? In any  
>>> case, just think int, which morphs seamlessly into long as and  
>>> when required. A 20-bit map would fit into an int.
>>>
>>> Another thing to worth contemplating: given the amount of going- 
>>> over by Tim Peters that the long int code has had, if long ints  
>>> waste more memory on leading zeroes than n bits [where n is < 32  
>>> and could be determined by inspecting the code], then the end of  
>>> the world is nigh.
>>>
>>>
>> My bit-vector sizes can vary from 10-bits to 60-bits with an  
>> average length of around 25-30 bits.
>>
>
> Any chance of using a 64-bit machine?
>
>
>> So, would a long int be the best way then ?
>>
>
> Note Josiah Carlson's comment re immutability -- which probably  
> isn't a problem if your bit vectors will always fit in an int, but  
> "always" sometimes never happens. Design your API, implement a  
> prototype, RTF long int source code to see what happens with  
> longint & (1 << k) ... you did mention "research", didn't you?
>
>
>> I was just worried that I would be wasting too much memory.
>>
>
> Stop worrying and do some rough calculations! How many bit vectors?  
> How much memory on your machine?
>
>
>>> "This" wouldn't be doing the storing. *You* would be [mis]using a  
>>> byte array as a bit array.
>>>
>>>
>> Hmm. So, is there a correct way to use array.array for bit vectors?
>>
>
> Correct in what sense? [Rhetorical question]
>
> Use shifts, ands, and ors to pack multiple bits into a storage  
> container.
>
>
>




More information about the Pyrex mailing list