[Pyrex] Pyrex and char *x[]

Matt Hammond matt.hammond at rd.bbc.co.uk
Wed May 16 12:53:23 UTC 2007


On Wed, 16 May 2007 12:45:51 +0100, Georg Grabler <ggrabler at gmail.com>  
wrote:

> Hello Matt,
>
> Thank you for all your answers. I really appreciate your help, since
> the tutorials did not cover any of these issues, and I'm not an expert
> on bindings, as you already realized; rather the other way round, I'm
> just starting with it. But I'm doing my best to become better.

That's OK - no problem. Thanks really go to Greg for making writing
bindings as relatively easy as it is :-)

> Well, now, when I remove a package - do I need to free the pkgname
> char* as well? It's basically the memory space I allocated by using
> malloc. But I call free() on the whole structure. I'm not that used to
> C development anymore, having just developed C++ and Perl the past few
> years. And with new/delete it behaves a little bit differently, it seems...

> If I free the ptr.pkgname, I get a glibc error (see below), so I
> expect I must not free this memory space, but just the ptr object /
> structure object. What happens with the pointers in this structure? If
> I remember right (5 years ago, -sigh-), I must free space I allocated
> using malloc, but this triggers the glibc error.

Yes, you do need to free each thing separately - as you would when writing
ordinary C code. I'm not sure why you're getting an error. That said, since
you're manipulating pointers, anything else going wrong could have
unexpected side effects that cause this error.
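
To illustrate freeing each allocation separately, here is a minimal sketch
in plain C rather than Pyrex. The struct layout and the names pkg_node,
pkg_node_new and pkg_node_free are made up for illustration and only
loosely mirror the pyalpm_list structure discussed in this thread. The
idea is simply that a string obtained from malloc is freed first, and then
the node that holds the pointer to it:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, loosely modelled on the list in this thread. */
typedef struct pkg_node {
    char *pkgname;            /* owned copy, allocated with malloc */
    struct pkg_node *next;
    struct pkg_node *last;
} pkg_node;

/* Allocate a node that owns its own copy of name; NULL on failure. */
pkg_node *pkg_node_new(const char *name)
{
    assert(name != NULL);
    size_t length = strlen(name);
    pkg_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->pkgname = malloc(length + 1);
    if (node->pkgname == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->pkgname, name, length);
    node->pkgname[length] = '\0';   /* terminate the copy, not the source */
    node->next = NULL;
    node->last = NULL;
    return node;
}

/* Free the string first, then the node that points at it. */
void pkg_node_free(pkg_node *node)
{
    if (node == NULL)
        return;
    free(node->pkgname);
    free(node);
}
```

This only works if pkgname really does point at memory returned by malloc.
If it points at anything else, calling free() on it is undefined behaviour,
and glibc will often abort with an error.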

I think there might be a mistake in how you copy the string and add the  
terminator:

     cString = in_pkgname
     length = len(in_pkgname)

     --- skip stuff about getting ptr set up --

     ptr.pkgname = <char*> malloc(length+1)
     memcpy(ptr.pkgname, cString, length)
     cString[length] = 0
     ptr.pkgname = cString

This copies the string from cString into ptr.pkgname; but it then tries to
add the null terminator to cString, and makes ptr.pkgname point at cString
(which is the string data inside the Python object), so the copy you just
allocated with malloc is lost. You actually want to add the terminator to
the copy of the string, in ptr.pkgname, and then leave it at that:

     cString = in_pkgname
     length = len(in_pkgname)

     --- skip stuff about getting ptr set up --

     ptr.pkgname = <char*> malloc(length+1)
     memcpy(ptr.pkgname, cString, length)
     ptr.pkgname[length] = 0

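This could conceivably be the cause of the issue: with the original code,
free(ptr.pkgname) is handed a pointer into the Python string object rather
than one returned by malloc. It is especially likely if Python is garbage
collecting the Python string object, and is therefore deallocating the
string storage inside it with free().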
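
For comparison, the same copy-then-terminate pattern can be sketched in
plain C. The helper name copy_with_terminator is hypothetical; it takes an
explicit length because the source bytes are not assumed to carry their own
terminator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly malloc'd, null-terminated copy of
   the first length bytes of src, or NULL if the allocation fails.
   The caller owns the result and must free() it. */
char *copy_with_terminator(const char *src, size_t length)
{
    assert(src != NULL);
    char *copy = malloc(length + 1);   /* one extra byte for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, length);
    copy[length] = '\0';               /* written to the copy, never to src */
    return copy;
}
```

The result is independent of the source buffer, so it stays valid whatever
happens to the original string, and it is safe to pass to free() later.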

A separate issue: I'm not sure if this comparison will work:

     if ptr.pkgname == in_pkgname:

I'd guess what happens is that in_pkgname gets typecast to a char*,
thereby making it comparable with ptr.pkgname. However, all that then
happens is a check of whether the two pointers are the same - not whether
the strings are the same. To be sure, I'd recommend using the C standard
library string comparison functions (strcmp, for example). If you're
finding it does work, or Greg wants to correct me, I'll happily defer :-)
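
The difference between the two kinds of comparison can be shown in a few
lines of plain C. The function names same_name and same_pointer are
invented for this sketch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: true when both strings hold the same characters. */
int same_name(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Pointer identity: true only when a and b are the very same address. */
int same_pointer(const char *a, const char *b)
{
    return a == b;
}
```

Two separately stored copies of "new1" satisfy same_name but not
same_pointer. From Pyrex, strcmp would presumably need declaring in a
cdef extern from "string.h" block, in the same way as memcpy.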


regards


Matt

>
> On 5/16/07, Matt Hammond <matt.hammond at rd.bbc.co.uk> wrote:
>> Hi Georg,
>>
>> As I've just mentioned in a followup email, I missed the rather crucial
>> malloc needed to actually copy the string (oops).
>>
>> As an aside, I misread my own use of memcpy too - assigning the result
>> of this call to a variable is not necessary.
>>
>> It doesn't surprise me that it appears to work better before the copy
>> operation: before using memcpy, you'll be handling the pointers to the
>> string data held inside the Python string object. So it appears to work.
>> The garbage you see suffixed on each string is probably because of the
>> null-termination issue I mentioned (more on that in a moment). The worse
>> looking result when using memcpy is because the copy operation was
>> copying without first actually allocating storage for the data to be
>> copied to. This probably resulted in it being copied to some random
>> place, and then overwriting the same location with each new copy
>> operation - so what you'll get will be complete garbage.
>>
>> The garbage suffixed on each string printed is not unexpected. Remember
>> that the moment you stop using a Python string object and just start
>> using a char* string, you've lost all the information about the length
>> of the string. Some C code and libraries assume you can work out how
>> long the string is because they assume you'll have put a null/zero byte
>> at the end of the string. Python does not do that. You therefore either
>> need to store (and use) the length of the string, or suffix your own
>> zero byte at the end of the string when you make the copy of it. That is
>> why I specify the string length in the call to memcpy. If the C library
>> you are interfacing to needs zero/null terminators at the end of the
>> string, then you definitely need to add these. Something like this I
>> guess:
>>
>>      cdef char *copy
>>      cdef char *cString
>>      cdef int length
>>
>>      cString = pyString
>>      length = len(pyString)
>>      copy = <char *> malloc(length+1)
>>      memcpy(copy,cString,length)
>>      copy[length] = 0
>>
>> Regarding garbage collection: true, I don't think it will run whilst
>> your function is being called. But you can't guarantee that it won't be
>> run between calls to the functions you have created - i.e. when
>> execution returns to the Python code calling your Pyrex-created
>> functions. Pyrex isn't clever enough (someone correct me if I'm wrong!)
>> to deduce that a char* C variable you've been manipulating is actually
>> related to a Python string object - so no, it won't inform Python that
>> the string is still in use. Consider, for example, what happens if you
>> decide to do pointer arithmetic (yuk!) - it's really rather hard for
>> Pyrex to keep track of your intentions.
>>
>> regards
>>
>>
>> Matt
>>
>> On Wed, 16 May 2007 10:03:37 +0100, Georg Grabler <ggrabler at gmail.com>
>> wrote:
>>
>> > Hello, first of all - thank you for this very detailed answer to my
>> > question. It helped me out a little, but I now want to give code
>> > examples, since it's OSS anyway (I'm currently trying a Python binding
>> > for libalpm, the Arch Linux package manager, with its own API
>> > interface, not reflecting the crappy original one .. and the original
>> > one is really something to hate).
>> >
>> > To the garbage collection issue: It makes me think a little about my
>> > garbage collection at all. My current code structure works fine, don't
>> > really want to run into troubles with this, so i wanted to ask
>> > further, and i don't expect the garbage collection running while i'm
>> > just running test scripts.
>> >
>> > The current addList function defines as follows (different name ofc):
>> > def addIgnorePkg(self, char *in_pkgname):
>> >     cdef t_config.pyalpm_list *ptr
>> >
>> >     if self.IgnorePkg != NULL:
>> >       ptr = self.IgnorePkg
>> >       while ptr.next != NULL:
>> >         ptr = ptr.next
>> >       ptr.next = <t_config.pyalpm_list *> malloc(sizeof(pyalpm_list))
>> >       ptr.next.last = ptr
>> >       ptr = ptr.next
>> >       ptr.next = NULL
>> >       ptr.pkgname = in_pkgname
>> >     else:
>> >       self.IgnorePkg = <t_config.pyalpm_list *> malloc(sizeof(pyalpm_list))
>> >       self.IgnorePkg.pkgname = in_pkgname
>> >       self.IgnorePkg.next = NULL
>> >       self.IgnorePkg.last = NULL
>> >     return
>> >
>> >   def addIgnorePkgList(self, ignorelist):
>> >     cdef char *cString
>> >     cdef char *copy
>> >     cdef int length
>> >
>> >     for pyString in ignorelist:
>> >         self.addIgnorePkg(pyString)
>> >     return
>> >
>> > Accordingly, on destruction time the list self.IgnorePkg is iterated,
>> > and the objects destroyed, by using
>> > def remIgnorePkg(self, char *in_pkgname):
>> >     cdef t_config.pyalpm_list *ptr
>> >
>> >     if self.IgnorePkg == NULL:
>> >       return 1
>> >
>> >     ptr = self.IgnorePkg
>> >     while ptr != NULL:
>> >       if ptr.pkgname == in_pkgname:
>> >         if (ptr.next != NULL):
>> >           ptr.next.last = ptr.last
>> >         if (ptr.last != NULL):
>> >           ptr.last.next = ptr.next
>> >
>> >         if self.IgnorePkg == ptr and self.IgnorePkg.next != NULL:
>> >           self.IgnorePkg = self.IgnorePkg.next
>> >           self.IgnorePkg.last = NULL
>> >
>> >         ptr.next = NULL
>> >         ptr.last = NULL
>> >         free(ptr)
>> >         if ptr == self.IgnorePkg:
>> >           self.IgnorePkg = NULL
>> >         ptr = NULL
>> >         return 0
>> >       ptr = ptr.next
>> >     return 1
>> >
>> >
>> > Since i'm experiencing problems with the memcpy, i didn't use it. Does
>> > this mean, that i actually could come into troubles for python
>> > releasing my objects in the structure? Doesn't pyrex tell python that
>> > the object is still in use?
>> >
>> > To be true, the malloc:
>> > I used the following structure you wrote, which gave me the error that
>> > the void* can't be converted into a char*
>> > copy = memcpy(copy, cString, length)
>> > So i casted the memcpy result
>> > copy = <char*> memcpy(copy, cString, length)
>> >
>> > resulting in
>> > new1 ���
>> > new2 ���
>> > new3 ���
>> > new4 ���
>> >
>> > what's quite strange. So well, it did not work out, since the values
>> > were new1, new2, new3, new4. After the copy, by printing the values, i
>> > got the following (even more strange output)
>> > oew4 ���
>> > stdout
>> > stdout
>> > stdout
>> >
>> > By using the structure above, it works flawlessly.
>> > I've attached the current config.pyx file to the e-mail, sometimes
>> > it's easier to see the whole thing.
>> >
>> > Thank you,
>> > Georg
>> >
>> > On 5/16/07, Matt Hammond <matt.hammond at rd.bbc.co.uk> wrote:
>> >> On Wed, 16 May 2007 06:29:41 +0100, Georg Grabler <ggrabler at gmail.com>
>> >> wrote:
>> >>
>> >> > Hello everybody.
>> >> >
>> >> > I want an array to be passed to a function, so basically I started
>> >> > the function as follows:
>> >> >
>> >> > def addToList (self, char *array[]):
>> >> > ....
>> >> >
>> >> > This throws an error compiling:
>> >> > "Cannot convert Python object argument to type 'char(*(*))'"
>> >>
>> >> Functions and classes defined Python style (without a "cdef" prefix)
>> >> are made available to Python and therefore can only deal with Python
>> >> objects - which C-style arrays are not. Similarly, C-style strings
>> >> are not the same as Python strings.
>> >>
>> >> Could you give a simple example of how you would use addToList from
>> >> Python? Would you be passing it something like a list or tuple, or
>> >> using a library-defined specialist array type?
>> >>
>> >> Assuming you'd be passing it a list/tuple, eg:
>> >>
>> >>      items=["hello","doctor","yesterday","tomorrow","continue"]
>> >>      X.addToList(items)
>> >>
>> >> There is also the complication, since you're dealing with strings, of
>> >> handling garbage collection issues: if you don't copy the string data
>> >> out of the Python string object, you need to inform Python that
>> >> you're still using that object, otherwise it may get garbage
>> >> collected and the storage containing the string might be re-used!
>> >>
>> >> Of course, the simplest, though not necessarily most efficient,
>> >> solution is to copy the string data out of the string into a newly
>> >> allocated character array.
>> >>
>> >> If you want to declare C datatypes, or functions with C datatype
>> >> arguments, then they need to be prefixed with cdef; and they will not
>> >> be accessible from Python (the specific variables, functions or
>> >> classes declared with cdef).
>> >>
>> >>
>> >> Here's roughly what I end up doing (in the .pyx file):
>> >>
>> >>
>> >>      cdef extern from "string.h":
>> >>          cdef void *memcpy(void *, void *, int)
>> >>
>> >>
>> >>      class MyClass:
>> >>
>> >>          def addToList(self, listitems):
>> >>              cdef char *cString
>> >>              cdef char *copy
>> >>              cdef int length
>> >>
>> >>              for pyString in listitems:
>> >>                  cString = pyString        # pyrex auto converts
>> >>                  length = len(pyString)
>> >>
>> >>                  # now make our own copy of the string data
>> >>                  copy = memcpy(copy, cString, length)
>> >>
>> >>                  self.addList(copy)
>> >>
>> >>
>> >>          cdef addList(self, char *str):
>> >>              # your code
>> >>
>> >> Caveat: I've not tested the code above; but as I said, I've found
>> >> this general approach works for me.
>> >>
>> >> Remember Python strings don't use null termination conventions, so if
>> >> you're retrieving them from your list for later use, you probably
>> >> also need to store the length of the string somewhere too. Hope
>> >> that's not teaching you to suck eggs :-)
>> >>
>> >>
>> >> Hope this helps
>> >>
>> >>
>> >> Matt
>> >> --
>> >> | Matt Hammond
>> >> | Research Engineer, FM&T, BBC, Kingswood Warren, Tadworth, Surrey, UK
>> >> | http://kamaelia.sf.net/
>> >> | http://www.bbc.co.uk/rd/
>> >>



-- 
| Matt Hammond
| Research Engineer, FM&T, BBC, Kingswood Warren, Tadworth, Surrey, UK
| http://kamaelia.sf.net/
| http://www.bbc.co.uk/rd/


