[Pyrex] Sparse vectors using dicts in Pyrex.
Robert Bradshaw
robertwb at math.washington.edu
Sun Jan 20 10:36:59 CET 2008
On Jan 19, 2008, at 2:01 PM, Daniel Yarlett wrote:
> Hi,
>
> I'm having trouble finding any information on this question, so I hope
> the solution isn't too obvious.
>
> I have decided to use Python dictionaries to encode sparse vectors
> that I work with (initial exploration suggested this was faster than
> using the sparse vector implementation in scipy). If a vector has a
> non-zero value at a given index, the integer index is used as key to
> the appropriate value. E.g. if the 7th element of a vector is equal to
> 3.2 then I would write something like:
>
> vector = {}
> vector[7] = 3.2
>
> This works pretty well, except that I wish to expose the dictionary
> type using Pyrex so that I can hopefully improve the speed of my
> implementation. I was wondering if anyone could point me to a simple
> example in which a Pyrex function receives a dictionary and (a) gets
> the integer keys (indices) of this dictionary, and (b) does something
> simple like computes the sum of the vector (where each value is cast
> as a double, say). As I said, I haven't been able to find any
> documentation on this, and my knowledge of the C API for Python is too
> underdeveloped to allow me to figure this out on my own.
If you're going the dictionary route, I would recommend getting
familiar with http://docs.python.org/api/dictObjects.html
For example, if you want to get an item from a dictionary, put
cdef extern from "Python.h":
void* PyDict_GetItem(object, object) # don't declare as object
because it's a "borrowed" reference
then in your code, you can write
value = <object>PyDict_GetItem(vector, index)
However, this will not be a huge savings from just writing value =
vector[index] because it will still be using all the python
dictionary machinery and wrapping integers/floats in python objects.
> Also, does anyone have any thoughts on the potential speed benefits of
> using Pyrex in this instance as opposed to using Python's built-in
> dictionary methods?
Really, if you want to make things fast (easily a 25x speedup over
the above) you should write your own data structure. If you will be
getting much more often then setting, or creating vectors all at once
then reading from it a lot, something like
cdef int n # number of non-zero entries
cdef int* indices # e.g. 1,5,19,33,...
cdef double* values # e.g. v[1], v[5], v[19], v[33], ...
with a binary search on __getitem__ will be much, much faster. If
you need to set entries a lot, then writing a custom B-tree or simple
hashtable will still beat storing objects in Python dictionaries by a
huge margin. All of these will be super fast to enumerate over to
compute, e.g. the sum of the elements in your vector.
- Robert
More information about the Pyrex
mailing list