[Pyrex] Question, plus suggestions for improvement?

Josiah Carlson jcarlson at uci.edu
Wed Aug 2 00:50:07 UTC 2006


"Giovanni Bajo" <rasky at develer.com> wrote:
> 
> Josiah Carlson wrote:
> 
> >> Depends. What if you have an unbounded size, and it's often going to
> >> be larger than Python standard pools so PyMem_Malloc won't help?
> >
> > PyMem_Malloc isn't any better than malloc.
> 
> Yeah, maybe I got the API wrong. What I meant is that if you have an
> unbounded size your *only* way to go is to use malloc(). You can't even try
> to somehow reuse Python pool-based allocator which is faster.
> 
> 
> > What I meant by 'stack
> > allocation', and "a maximum fixed size" is something like...
> >
> > int fcn(...) {
> >     int stack_memory[maximum_fixed_size];
> >     ...
> > }
> 
> Yes, this is why I said "what if you have unbounded size". I'm not sure of
> what algorithms you are thinking of, but the ones I think can't really have
> some hardcoded limit. One strategy is to get the best of the two worlds: if
> the memory is smaller than a certain amount, use the stack, otherwise go
> through malloc/free.

From the look of the OP's source, it iterates through a predefined
algorithm for generating all of the possible permutations of a sequence.
As such, only short sequences would have any chance of executing to
completion. Any sequence with more than about 16 unique values likely
won't execute to completion, and any significantly larger sequence would
only permute through an insignificant fraction of the total permutation
space.

So in this case, there is actually a practical limit to the maximum size
of the sequence, and that is somewhere on the order of 16 (about 2**44
possible outputs), certainly not more than 21 (about 2**65 possible
outputs).


> >> It also depends on how many times you access the list. If you do it
> >> many times, it's much better to pay for the penalty of malloc once,
> >> rather than going through PyList_GetItem + PyFloat_AsFloat thousands
> >> of times.
> >
> > I'm not familliar with the algorithm that the OP has chosen to
> > implement for permutations, but I wasn't speaking directly to the
> > OP's particular application with regards to performance.  I was
> > stating that generally malloc/free tends to have some performance
> > gotchas that most people
> > don't realize until they stop using it.
> 
> I agree that, generically speaking, incrementing a register (the stack
> pointer) is faster than going through a complex library routine.
> Nonetheless, in this specific context (and in that of converting arbitrary
> Python structure to C sturctures in Pyrex for speed reasons), I believe that
> malloc() is the only possible compromise, with some possible cheating like
> proposed above.

I disagree.  Considering what the algorithm actually does, unless the OP
plans on running his algorithm for years, or plans on parallelizing it
for millions of machines, the liklihood of it being worth it to generate
permutations for large sequences is fairly low.

Further, one doesn't need to actually permute the elements until they
are returned.  In fact, permuting a single integer sequence, whose
values are the position of the original elements*, would solve this
particular problem for all possible inputs regardless of type.  Also,
removing the repeated self.list references could also speed up access 
(I never used pyrex classes, so I don't know for certain).


> I also would like to reiterate that the overhead of malloc, even if larger
> than a stack-based allocation, can still be in the noise, compared with the
> whole execution of the algorithm. Before even implementing the simple
> fallback "use the stack if size < N", I would suggest some profiling.

In this case it is certainly likely.

 - Josiah

* Here is what I would try using...

cdef class Permute2:

    cdef int n, first
    cdef int lst3[32]
    cdef object lst, lst2

    def __init__(self, lst):
        cdef int i
        self.lst = lst
        self.lst2 = lst[:]
        self.first = 0
        self.n = len(lst) - 1
        if n >= 31:
            raise Exception, "Are you kidding?"
        for i from 0 <= i < len(lst):
            self.lst3[i] = i

    def __iter__(self):
        return self

    def __next__(self):
        cdef int j, l, k, x, y, z, sn
        cdef int* sl
        
        if self.first == 0:
            self.first = 1
            return self.lst2
        if self.n == 1:
            return [self.lst[1], self.lst[0]]
        
        sn = self.n
        sl = self.lst3
        while 1:
            if sl[-2] < sl[-1]:
                sl[-2], sl[-1] = sl[-1], sl[-2]
            elif sl[-3] < sl[-2]:
                if sl[-3] < sl[-1]:
                    sl[-3], sl[-2], sl[-1] = sl[-1], sl[-3], sl[-2]
                else:
                    sl[-3], sl[-2], sl[-1] = sl[-2], sl[-1], sl[-3]
            else:
                j = sn - 3
                if j < 0: raise StopIteration
                y = sl[j]
                x = sl[-3]
                z = sl[-1]
                while y >= x:
                    j = j - 1
                    if j < 0: raise StopIteration
                    x = y
                    y = sl[j]
                if y < z:
                    sl[j] = z
                    sl[j+1] = y
                    sl[sn] = x
                else:
                    l = sn - 1
                    while y >= sl[l]:
                        l = l - 1
                    sl[j], sl[l] = sl[l], y
                    sl[self.n], sl[j+1] = sl[j+1], sl[sn]
                k = j + 2
                l = sn - 1
                while k < l:
                    sl[k], sl[l] = sl[l], sl[k]
                    k = k + 1
                    l = l - 1
            
            lst = self.lst
            lst2 = self.lst2
            z = self.n + 1
            for j from 0 <= j < z:
                lst2[j] = lst[sl[j]]
            
            return sl




More information about the Pyrex mailing list