[Pyrex] Sparse vectors using dicts in Pyrex.

Robert Bradshaw robertwb at math.washington.edu
Sun Jan 20 10:36:59 CET 2008


On Jan 19, 2008, at 2:01 PM, Daniel Yarlett wrote:

> Hi,
>
> I'm having trouble finding any information on this question, so I hope
> the solution isn't too obvious.
>
> I have decided to use Python dictionaries to encode sparse vectors
> that I work with (initial exploration suggested this was faster than
> using the sparse vector implementation in scipy). If a vector has a
> non-zero value at a given index, the integer index is used as key to
> the appropriate value. E.g. if the 7th element of a vector is equal to
> 3.2 then I would write something like:
>
> vector = {}
> vector[7] = 3.2
>
> This works pretty well, except that I wish to expose the dictionary
> type using Pyrex so that I can hopefully improve the speed of my
> implementation. I was wondering if anyone could point me to a simple
> example in which a Pyrex function receives a dictionary and (a) gets
> the integer keys (indices) of this dictionary, and (b) does something
> simple like computes the sum of the vector (where each value is cast
> as a double, say). As I said, I haven't been able to find any
> documentation on this, and my knowledge of the C API for Python is too
> underdeveloped to allow me to figure this out on my own.

If you're going the dictionary route, I would recommend getting  
familiar with http://docs.python.org/api/dictObjects.html

For example, if you want to get an item from a dictionary, put

cdef extern from "Python.h":
     void* PyDict_GetItem(object, object)  # don't declare as object  
because it's a "borrowed" reference

then in your code, you can write

value = <object>PyDict_GetItem(vector, index)

However, this will not be a huge savings from just writing value =  
vector[index] because it will still be using all the python  
dictionary machinery and wrapping integers/floats in python objects.

> Also, does anyone have any thoughts on the potential speed benefits of
> using Pyrex in this instance as opposed to using Python's built-in
> dictionary methods?

Really, if you want to make things fast (easily a 25x speedup over  
the above) you should write your own data structure. If you will be  
getting much more often then setting, or creating vectors all at once  
then reading from it a lot, something like

cdef int n # number of non-zero entries
cdef int* indices # e.g. 1,5,19,33,...
cdef double* values # e.g. v[1], v[5], v[19], v[33], ...

with a binary search on __getitem__  will be much, much faster. If  
you need to set entries a lot, then writing a custom B-tree or simple  
hashtable will still beat storing objects in Python dictionaries by a  
huge margin. All of these will be super fast to enumerate over to  
compute, e.g. the sum of the elements in your vector.

- Robert
  



More information about the Pyrex mailing list