[Pyrex] any ideas to speed this up?

Thu Feb 23 03:20:23 CET 2006

Hello,

I am a bit new to python and pyrex, and was wondering if anyone can help me here.  I 
am trying to translate some Matlab/mex code to Python, for doing neural simulations. 
  This application is definitely computing-time limited, and I need to optimize at 
least one inner loop of the code, or perhaps even rethink the algorithm.  I've coded 
it into Pyrex, because it seems straightforward and efficient.  In doing so, I have 
made sure to

1) take any python objects, and extract C datatypes from them outside of the main 
loop, like:

cdef double eta
eta=params['eta']

2) all of the variables in the loop are C datatypes, either ints or pointers.

Is there something else that I could do to make it speed up?

For my algorithm, the procedure is very simple, after initializing any variables:

1) select a random input vector, which I will call "x".  right now I have it as an 
array, and I choose columns from that array randomly.  in other cases, I may need to 
take an image, select a patch, and then make that a column vector.

2) calculate an output value, which is the dot product of the "x" and a weight 
vector, "w", so

     y=dot(x,w)

3) modify the weight vector based on a matrix equation, like:

     w=w+ eta * (y*x - y**2*w)
               ^
               |
               +---- learning rate constant

4) repeat steps 1-3 many times

I've organized it like:

for e in 100:   # outer loop
     for i in 1000:  # inner loop
         (steps 1-3)

     display things.

so that the bulk of the computation is in the inner loop, and is amenable to 
converting to a faster language.  This is my issue:

straight python, in the example posted below for 250000 inner-loop steps, takes 20 
seconds for each outer-loop step.  I tried Pyrex, which should work very fast on such 
a problem, takes about 8.5 seconds per outer-loop step.  The same code as a C-mex 
file in matlab takes 1.5 seconds per outer-loop step.

Given the huge difference between the Pyrex and the Mex, I feel that there is 
something I am doing wrong, because the C-code for both should run comparably. 
Perhaps the approach is wrong?  I'm willing to take any suggestions!

The code is below.

		thanks,

			Brian Blais

#--------------------------------
#  main.py

from dohebb import *
import pylab as p
from Numeric import *
from RandomArray import *
import time

import numpy

x=random((100,1000))    # 1000 input vectors

numpats=x.shape[0]
w=random((numpats,1));

th=random((1,1))

params={}
params['eta']=0.001;
params['tau']=100.0;
old_mx=0;
for e in range(100):

     rnd=randint(0,numpats,250000)
     t1=time.time()
     if 0:  # straight python
         for i in range(len(rnd)):
             pat=rnd[i]
             xx=reshape(x[:,pat],(1,-1))
             y=matrixmultiply(xx,w)
             w+=params['eta']*(y*transpose(xx)-y**2*w);
             th=th+(1.0/params['tau'])*(y**2-th);
     elif 1: # pyrex with Numeric
         dohebb(params,w,th,x,rnd)

     print time.time()-t1

p.plot(w,'o-')
p.xlabel('weights')
p.show()

#----------------------------------------------------

# dohebb.pyx

cdef extern from "Numeric/arrayobject.h":

   struct PyArray_Descr:
     int type_num, elsize
     char type

   ctypedef class Numeric.ArrayType [object PyArrayObject]:
     cdef char *data
     cdef int nd
     cdef int *dimensions, *strides
     cdef object base
     cdef PyArray_Descr *descr
     cdef int flags

def dohebb(params,ArrayType w,ArrayType th,ArrayType X,ArrayType rnd):

     cdef int num_iterations
     cdef int num_inputs
     cdef int offset
     cdef double *wp,*xp,*thp
     cdef int *rndp
     cdef double eta,tau

     eta=params['eta']  # learning rate
     tau=params['tau']  # used for variance estimate

     cdef double y
     num_iterations=rnd.dimensions[0]
     num_inputs=w.dimensions[0]

     # get the pointers
     wp=<double *>w.data
     xp=<double *>X.data
     rndp=<int *>rnd.data
     thp=<double *>th.data

     for it from 0 <= it < num_iterations:

         offset=rndp[it]*num_inputs

         # calculate the output
         y=0.0
         for i from 0 <= i < num_inputs:
             y=y+wp[i]*xp[i+offset]

         # change in the weights
         for i from 0 <= i < num_inputs:
             wp[i]=wp[i]+eta*(y*xp[i+offset] - y*y*wp[i])

         # estimate the variance
         thp[0]=thp[0]+(1.0/tau)*(y**2-thp[0])

-- 
-----------------

             bblais at bryant.edu
             http://web.bryant.edu/~bblais