[Pyrex] array.array vs long ints for bit vectors
John Machin
sjmachin at lexicon.net
Sun Oct 30 10:55:05 CET 2005
Nitin Madnani wrote:
> Hi Guys
>
> I am planning to use a bit vector as a member of an extension type.
For a start, what operations do you want to perform on this "bit vector"?
Let's assume at the very least it should have get/set like a list, but
contained objects are restrained to belong to set([0, 1]).
Maybe it should also support something like this:
three = Bitvector([1, 1,])
six = Bitvector([ 0, 1, 1])
seven = Bitvector([1, 1, 1])
two = Bitvector([0, 1])
assert three | six == seven
assert three & six == two
Which operations are required only inside the extension, and which
operations need to be exposed to the user of the extension?
> I first thought that I can easily do this by just using a long int for
> the vector but then I realized that I would like the bit vector to grow
> only to the size needed (which would be defined by an argument to the
> __init__ method of the type.
"grow" and "needed" are a little contradictory. Do you want an *exact*
size, so that you can do this:
zero = Bitvector(size=20, contents=[0])
all_ones = ~zero
and know exactly how long the result will be?
What makes you think a long int would grow unconscionably longer than
needed? How long is your longest map going to be? In any case, just
think int, which morphs seamlessly into long as and when required. A
20-bit map would fit into an int.
Another thing to worth contemplating: given the amount of going-over by
Tim Peters that the long int code has had, if long ints waste more
memory on leading zeroes than n bits [where n is < 32 and could be
determined by inspecting the code], then the end of the world is nigh.
> So, then I remembered the 'array' module
> but then this would also store each bit as a byte (according to the
> python documentation). So, if I had a 20-bit map, 'array' would take
> >20 bytes to store this bitmap (only the buffer would be 20 bytes) ?
"This" wouldn't be doing the storing. *You* would be [mis]using a byte
array as a bit array.
> I have also read that a long int also comes with a bit of baggage.
*Every* type necessarily has more than one bit of baggage. [Sorry; the
cheer squad was hollerin' "Pun! Pun! Pun!"]
>
> So, is there no other memory-efficient option other than to write an
> extension myself?
What language would you write it in? C? If so, think how you would pack
the bits into the elements of a C byte array, and then think how you
could use the elements of a Python array.array('B', .....) instead. Then
think how much easier it would be to write it in Pyrex.
> However, I am guessing whatever I come up with would
> be slower compared to the both long ints and arrays.
Not necessarily. Such guesses can be wildly wrong.
> So, should I just
> bite the bullet and live with the wasted memory ?
*If* you can live with it. How many bitvector instances will you have?
How long will each be? How much memory will be wasted by which
implementation? How much memory do you have, anyway? The cost ratio of
(say) 256MB of memory to a contract developer's hourly rate is what?
>
>
> Thanks !
> Nitin
>
> PS: I know this question is sort of out of place here but since I am
> planning to use it in Pyrex, I wanted to ask it here. I also know
> little about either long ints or the 'array' module, so please correct
> me if I am being completely naive.
Correct yourself; RTF source code. That's one of the big advantages of
open source software. You get to stand on the shoulders of giants, like
the afore-mentioned timbot.
Cheers,
John
More information about the Pyrex
mailing list