[Pyrex] array.array vs long ints for bit vectors
Josiah Carlson
jcarlson at uci.edu
Sun Oct 30 19:31:01 CET 2005
Nitin Madnani <nmadnani at wam.umd.edu> wrote:
>
> Thanks, Josiah ! That's a very cool way to use array.arrays. However,
> do you mean that I should use 2-bytes for each bit ? Isn't that a lot
> of waste ?
No, you don't use 2 bytes per bit, you use 1 bit per bit. Here is an
example in pure Python.
class BitVector:
def __init__(self, lst_or_len):
if isinstance(lst_or_len, (int, long)):
bits = lst_or_len*[0]
else:
bits = lst_or_len
blocks = (len(bits)+15)//16
tblock = blocks*[0]
self.vector = array.array('H', tblock)
map(self.SetBit, enumerate(bits))
def SetBit(self, posn, val):
val = val and 1 or 0
block = posn//16
shift = posn&15
cv = self.vector[block]
if (cv>>shift)&1 != val:
if val:
self.vector[block] = cv | (1 << shift)
else:
self.vector[block] = cv ^ (1 << shift)
def GetBit(self, posn):
return (self.vector[posn//16]>>(posn&15))&1
I'm pretty sure there are ways to reduce the bit operations, but I've
been ill, so my brain is a bit out of sorts.
If you don't care about bit packing (saving all space possible, etc.),
then an array.array('B',...) is your best bet, again because it bypasses
the far slower long integer processing.
- Josiah
> On Oct 30, 2005, at 3:35 AM, Josiah Carlson wrote:
>
> >
> > Use an array + bit masking. When using the array, use 16 bit unsigned
> > ints. Why? While 32 bit ints would be faster when running in pure
> > Python, you commonly end up with having to deal with the sign of your
> > ints, which is easy, but annoying. On little-endian platforms, you
> > even
> > get the convenience of getting consistant behavior regardless of
> > whether
> > you want to deal with bytes, shorts, longs, or long longs when
> > masking.
> > (use the buffer interface [1] to get the raw pointer).
> >
> > Why wouldn't you want to use a long int instead of an array? It's
> > immutable, and you have to deal with doing Python long-int math all
> > the
> > time. With the array, all of your math can be in pure C, which will be
> > /far/ faster.
> >
> > - Josiah
> >
> >
> > [1]
> > cdef extern from "Python.h":
> > int PyObject_AsCharBuffer(object obj, void **buffer, int
> > *buffer_len)
> >
> > cdef int get_buffer(obj, char **data, int *length) except -1:
> > cdef int result
> > cdef void *vd
> > result = PyObject_AsCharBuffer(obj, &vd, length)
> > data[0] = <char *>vd
> > return result
> >
> >
> > Nitin Madnani <nmadnani at wam.umd.edu> wrote:
> >
> >> Hi Guys
> >>
> >> I am planning to use a bit vector as a member of an extension type. I
> >> first thought that I can easily do this by just using a long int for
> >> the vector but then I realized that I would like the bit vector to
> >> grow only to the size needed (which would be defined by an argument
> >> to the __init__ method of the type. So, then I remembered the 'array'
> >> module but then this would also store each bit as a byte (according
> >> to the python documentation). So, if I had a 20-bit map, 'array'
> >> would take >20 bytes to store this bitmap (only the buffer would be
> >> 20 bytes) ? I have also read that a long int also comes with a bit of
> >> baggage.
> >>
> >> So, is there no other memory-efficient option other than to write an
> >> extension myself? However, I am guessing whatever I come up with
> >> would be slower compared to the both long ints and arrays. So, should
> >> I just bite the bullet and live with the wasted memory ?
> >>
> >>
> >> Thanks !
> >> Nitin
> >>
> >> PS: I know this question is sort of out of place here but since I am
> >> planning to use it in Pyrex, I wanted to ask it here. I also know
> >> little about either long ints or the 'array' module, so please
> >> correct me if I am being completely naive.
> >>
> >>
> >> _______________________________________________
> >> Pyrex mailing list
> >> Pyrex at lists.copyleft.no
> >> http://lists.copyleft.no/mailman/listinfo/pyrex
> >>
> >
> >
More information about the Pyrex
mailing list