Python memory handling

  1. Python memory handling

    Greets,

    I'm having trouble getting Python to free memory. How can I force it
    to release the memory? I've tried del and gc.collect() with no
    success. Here is a code sample, parsing an XML file under Linux with
    Python 2.4 (same problem with Python 2.5 on Windows, tried with the
    first example):

    #Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
    #Using http://www.pixelbeat.org/scripts/ps_mem.py to get memory information
    import cElementTree as ElementTree #meminfo: 2.3 Mb private, 1.6 Mb shared
    import gc #no memory change

    et=ElementTree.parse('primary.xml') #meminfo: 34.6 Mb private, 1.6 Mb shared
    del et #no memory change
    gc.collect() #no memory change

    So how can I free the 32.3 Mb taken by ElementTree?

    The same problem occurs with a simple file.readlines():

    #Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
    import gc #no memory change
    f=open('primary.xml') #no memory change
    data=f.readlines() #meminfo: 12 Mb private, 1.4 Mb shared
    del data #meminfo: 11.5 Mb private, 1.4 Mb shared
    gc.collect() #no memory change

    But it works fine with file.read():

    #Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
    import gc #no memory change
    f=open('primary.xml') #no memory change
    data=f.read() #meminfo: 7.3 Mb private, 1.4 Mb shared
    del data #meminfo: 1.1 Mb private, 1.4 Mb shared
    gc.collect() #no memory change

    So as far as I can see, Python maintains a memory pool for lists. In
    my first example, if I reparse the XML file, the memory doesn't grow
    much (0.1 Mb, precisely), so I think I'm right about the memory pool.

    But is there a way to force Python to release this memory?

    Regards,
    FP


  2. Re: Python memory handling

    In <1180611604.247696.149060@h2g2000hsg.googlegroups.com>, frederic.pica
    wrote:

    > So as far as I can see, Python maintains a memory pool for lists.
    > [...]
    > But is there a way to force Python to release this memory?

    AFAIK not. But why is this important as long as the memory consumption
    doesn't grow constantly? The operating system's virtual memory
    management usually takes care that only actually used memory is in
    physical RAM.

    Ciao,
    Marc 'BlackJack' Rintsch

  3. Re: Python memory handling

    On 31 mai, 14:16, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
    > AFAIK not. But why is this important as long as the memory consumption
    > doesn't grow constantly? The virtual memory management of the operating
    > system usually takes care that only actually used memory is in physical
    > RAM.

    Because I'm an adept of "small is beautiful"; of course the OS will
    swap the unused memory out if needed. But if I daemonize this
    application, I will have a constant 40 Mb in use, not free for other
    applications. If another application needs this memory, the OS will
    have to swap and lose time for that application. And I'm not sure the
    system will swap this unused memory first; it could also swap another
    application first, AFAIK. And these 40 Mb are only for a 7 Mb XML
    file; what about parsing a big one, like 50 Mb?

    I would have preferred to have the choice of manually freeing this
    unused memory, or of setting the size of the memory pool manually.

    Regards,
    FP


  4. Re: Python memory handling

    Hello,

    frederic.pica@ wrote:
    > I've tried del and gc.collect() with no success.
    > [...]
    > The same problem here with a simple file.readlines()
    > [...]
    > But works great with file.read() :
    > [...]
    > But is there a way to force python to release this memory ?!

    This is from the 2.5 series release notes
    (http://www.python.org/download/relea...5.1/NEWS.txt):

    "[...]

    - Patch #1123430: Python's small-object allocator now returns an arena
    to the system ``free()`` when all memory within an arena becomes
    unused again. Prior to Python 2.5, arenas (256KB chunks of memory)
    were never freed. Some applications will see a drop in virtual memory
    size now, especially long-running applications that, from time to
    time, temporarily use a large number of small objects. Note that when
    Python returns an arena to the platform C's ``free()``, there's no
    guarantee that the platform C library will in turn return that memory
    to the operating system. The effect of the patch is to stop making
    that impossible, and in tests it appears to be effective at least on
    Microsoft C and gcc-based systems. Thanks to Evan Jones for hard work
    and patience.

    [...]"

    So with 2.4 under Linux (as you tested) you will indeed not always get
    the used memory back, particularly when lots of small objects are
    collected.

    The difference you see between f.read() and f.readlines() is
    therefore (I think) that the former reads the whole file in as one
    large string object (i.e. not a small object), while the latter
    returns a list of lines where each line is a separate Python object.

    I wonder how 2.5 would work out on Linux in this situation for you.

    Paul
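    [Editor's note: Paul's explanation can be illustrated without any
    file at all (a sketch of mine; the names and sizes are invented): one
    big string is a single allocation, while splitting it into lines
    produces many small objects, and it is those small objects that land
    in the 2.4 allocator's never-freed arenas:]

```python
# Illustrative sketch, not from the thread: the same data held as one
# large str (like file.read()) versus many small line objects (like
# file.readlines()). CPython 2.4's small-object arenas were never
# returned to the OS, so the many-small-objects form is the one that
# leaves memory pinned after del.
line = 'x' * 79 + '\n'
blob = line * 1000              # one large string: a single big allocation
lines = blob.splitlines(True)   # 1000 small strings: many arena allocations

one_big_object = 1
many_small_objects = len(lines)
```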

  5. Re: Python memory handling

    On 31 mai, 16:22, Paul Melis <p...@science.uva.nl> wrote:
    > So with 2.4 under linux (as you tested) you will indeed not always get
    > the used memory back, with respect to lots of small objects being
    > collected.
    > [...]
    > I wonder how 2.5 would work out on linux in this situation for you.

    Hello,

    I will try later with Python 2.5 under Linux, but as far as I can
    see, it's the same problem under my Windows Python 2.5. After reading
    this document:
    http://evanjones.ca/memoryallocator/python-memory.pdf

    I think it's because lists or dictionaries are used by the parser,
    and Python uses an internal memory pool (not pymalloc) for them...

    Regards,
    FP


  6. Re: Python memory handling

    If the memory usage is that important to you, you could break this
    out into two programs: one that starts the jobs when needed, and
    another that does the processing and then quits. As long as the
    Python startup time isn't an issue for you.

    On 31 May 2007 04:40:04 -0700, frederic.pica@
    <frederic.pica@> wrote:
    > I've some troubles getting my memory freed by python, how can I force
    > it to release the memory ?
    > [...]
    > But is there a way to force python to release this memory ?!


  7. Re: Python memory handling

    On 31 mai, 17:29, "Josh Bloom" <joshbl...@> wrote:
    > If the memory usage is that important to you, you could break this out
    > into 2 programs, one that starts the jobs when needed, the other that
    > does the processing and then quits.
    > As long as the python startup time isn't an issue for you.

    Yes, that's a solution, but I don't think it's a good way; I didn't
    want to use hacks to bypass a Python-specific problem. And the
    problem is everywhere: it affects every Python program that has to
    manage big files.

    I've tried xml.dom.minidom on a 66 Mb XML file => 675 Mb of memory
    that will never be freed, and that time I got many unreachable
    objects when running gc.collect(). Using the same file with
    cElementTree took 217 Mb, with no unreachable objects.

    For me this is not good behavior; letting the system swap this unused
    memory out instead of freeing it is not a good approach. I think a
    memory pool is a really good idea for performance reasons, but why is
    there no 'free block' limit?

    Python is a really good language that can do many things in a clear,
    easy and performant way, I think. It has always met all my needs. But
    I can't imagine there is no good solution to this problem: limiting
    the free block pool size or, better, letting the user specify this
    limit, and even better, letting the user free it completely (along
    with manually specifying the limit).

    Like:
    import pool
    pool.free()
    pool.limit(size in megabytes)

    Why not let the user choose that? Why not give the user more
    flexibility? I will try later under Linux with the latest stable
    Python.

    Regards,
    FP
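    [Editor's note: CPython exposes no pool.free() or pool.limit(), but
    on POSIX systems a rough stand-in for the requested limit (a sketch
    of mine, assuming Linux/Unix; the function name is invented) is to
    cap the whole process's address space with resource.setrlimit, so a
    runaway pool fails fast with MemoryError instead of growing:]

```python
# Sketch only: this is NOT the pool.limit() asked for above. It caps
# the entire address space (POSIX-only), a blunt approximation that
# turns unbounded growth into a hard failure.
import resource

def cap_address_space(megabytes):
    limit = megabytes * 1024 * 1024
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)   # the soft cap cannot exceed the hard cap
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return limit
```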


  8. Re: Python memory handling

    > Like:
    > import pool
    > pool.free()
    > pool.limit(size in megabytes)
    >
    > Why not letting the user choosing that, why not giving the user more
    > flexibility ?

    The idea that memory allocated to a process but not being used is a
    "cost" is really a fallacy, at least on modern virtual memory
    systems. It matters more for fully GCed languages, where the entire
    working set needs to be scanned, but the Python GC is only for
    breaking reference cycles and doesn't need to scan the entire memory
    space.

    There are some corner cases where it matters, and that's why it was
    addressed for 2.5, but in general it's not something that you need to
    worry about.

  9. Re: Python memory handling

    * frederic.pica (31 May 2007 06:15:18 -0700)
    > And I'm not sure that the system will swap first this
    > unused memory, it could also swap first another application... AFAIK.

    Definitely not; this is the principal function of virtual memory in
    every operating system.

  10. Re: Python memory handling

    * Chris Mellon (Thu, 31 May 2007 12:10:07 -0500)
    > The idea that memory allocated to a process but not being used is a
    > "cost" is really a fallacy, at least on modern virtual memory sytems.
    > [...]
    > There are some corner cases where it matters, and thats why it was
    > addressed for 2.5, but in general it's not something that you need to
    > worry about.

    If it's swapped to disk, then this is a big concern. If your Python
    app allocates 600 MB of RAM, does not touch 550 MB of it after one
    minute, and this unused memory gets put into the page file, then the
    operating system has to allocate and write 550 MB onto your hard
    disk. That's a big deal.

    Thorsten
