#1
Hello,

I have a program that uses increasingly more memory over time and
eventually crashes because of heap exhaustion. The program performs
many independent steps which, when finished, should leave no
references to any of the allocated objects (this is a web server).
There are few global structures that are populated only at the startup
of the program and they do not hold any references to memory created
in these steps. I am at a loss as to which part of the code allocates
the memory that cannot be reclaimed, and which long-living structures
reference this memory.

SBCL has a number of functions that appear to be useful in tracking
down the culprit, in particular sb-vm:list-allocated-objects and
sb-vm::list-referencing-objects. I wanted to try two different strategies
with these functions - creating "checkpoints" by doing a full gc and
calling list-allocated-objects, and then diffing the results to see
which objects allocated between the checkpoints could not be
deallocated; and building a graph of object references, collapsing
cycles into single vertices, and doing a topological sort on the
resulting DAG to figure out which objects hold most of the memory.
Unfortunately the two functions above are rather fragile - equality
primitives don't work on all the objects returned and result in
strange errors, memory faults occasionally occur, etc. I was unable
to get around these limitations to implement either of the above
strategies.

I am not sure how to proceed at this point. Are there alternative (but
more stable) functions on other implementations that I could use? Are
there other tools to debug this sort of thing? How would you go about
solving this problem?

Regards,
- Slava Akhmechet

#2
CoffeeMug <coffeemug@gmail.com> writes:

> Hello,
>
> I have a program that uses increasingly more memory over time and
> eventually crashes because of heap exhaustion. The program performs
> many independent steps which, when finished, should leave no
> references to any of the allocated objects (this is a web server).
> There are few global structures that are populated only at the startup
> of the program and they do not hold any references to memory created
> in these steps. I am at a loss as to which part of the code allocates
> the memory that cannot be reclaimed, and which long-living structures
> reference this memory.
>
> SBCL has a number of functions that appear to be useful in tracking
> down the culprit, in particular sb-vm:list-allocated-objects and
> sb-vm::list-referencing-objects. I wanted to try two different strategies
> with these functions - creating "checkpoints" by doing a full gc and
> calling list-allocated-objects, and then diffing the results to see
> which objects allocated between the checkpoints could not be
> deallocated; and building a graph of object references, collapsing
> cycles into single vertices, and doing a topological sort on the
> resulting DAG to figure out which objects hold most of the memory.
> Unfortunately the two functions above are rather fragile - equality
> primitives don't work on all the objects returned and result in
> strange errors, memory faults occasionally occur, etc. I was unable
> to get around these limitations to implement either of the above
> strategies.
>
> I am not sure how to proceed at this point. Are there alternative (but
> more stable) functions on other implementations that I could use? Are
> there other tools to debug this sort of thing? How would you go about
> solving this problem?

Keeping these lists in the image would prevent the garbage collector
from collecting these items... You'd have to dump them to files, and diff
the files.

Did you do a deep garbage collection? AFAIK, the SBCL garbage collector
only collects garbage from the youngest generation. (But I guess it
would automatically do a deep garbage collection on memory exhaustion.)

Also, note that on x86 it is conservative:
http://sbcl-internals.cliki.net/GENCGC
This could explain some leaking.

--
__Pascal Bourguignon__ http://www.informatimago.com/
"Indentation! -- I will show you how to indent when I indent your skull!"

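A minimal sketch of the file-based checkpoint suggested above, assuming SBCL. DUMP-CHECKPOINT is a hypothetical helper name, not part of SBCL; it only combines (sb-ext:gc :full t) with the standard ROOM function, whose output goes to *standard-output*. The file holds a textual per-type summary rather than the objects themselves, so nothing in it can keep heap objects alive between checkpoints.

```lisp
;; Sketch only: DUMP-CHECKPOINT is a hypothetical helper, not an SBCL function.
(defun dump-checkpoint (pathname)
  "Run a full GC, then write the ROOM summary to PATHNAME.
Only text reaches the file, so the checkpoint keeps no heap object alive."
  (sb-ext:gc :full t)                       ; collect every generation
  (with-open-file (*standard-output* pathname
                                     :direction :output
                                     :if-exists :supersede)
    (room t))                               ; verbose breakdown of heap usage
  pathname)

;; Usage (the file names are arbitrary examples):
;; (dump-checkpoint "/tmp/heap-1.txt")
;; ... serve some requests ...
;; (dump-checkpoint "/tmp/heap-2.txt")
```

The two files can then be compared with any external diff tool. This shows which kinds of objects grow, not which individual objects survive.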
#3
On Jun 12, 1:58 am, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> Keeping these lists in the image would prevent the garbage collector
> from collecting these items... You'd have to dump them to files, and diff
> the files.

I don't think this is a problem. I create a checkpoint by doing a full
gc, then grabbing a list of objects. I then run some more code, and do
another checkpoint in the same manner. If I do a diff now, it should
give me objects allocated between checkpoint one and checkpoint two
that could not be reclaimed (note that I do a gc before I generate the
list, so presumably objects in the list are already referenced by
someone else).

A bigger problem is allocating things on the heap while walking the
heap (calling list-allocated-objects does exactly that). I think this
is why these functions are so unstable.

> Did you do a deep garbage collection?

Yes.

> Also, note that on x86 it is conservative:
> http://sbcl-internals.cliki.net/GENCGC

I think this could explain minor "accidental" leaking, but not the
consistent large leaks I am seeing. I've also verified this behavior
on other implementations.

#4
On Thu, 12 Jun 2008 10:13:30 -0700 (PDT), CoffeeMug
<coffeemug@gmail.com> wrote:

>On Jun 12, 1:58 am, p...@informatimago.com (Pascal J. Bourguignon)
>wrote:
>> Keeping these lists in the image would prevent the garbage collector
>> from collecting these items... You'd have to dump them to files, and diff
>> the files.
>
>I don't think this is a problem. I create a checkpoint by doing a full
>gc, then grabbing a list of objects. I then run some more code, and do
>another checkpoint in the same manner. If I do a diff now, it should
>give me objects allocated between checkpoint one and checkpoint two
>that could not be reclaimed (note that I do a gc before I generate the
>list, so presumably objects in the list are already referenced by
>someone else).

Yes, but those references might no longer exist at the second GC, in
which case being in the list will prevent the objects from being
collected. You really need to somehow dump the objects to a file, free
the list and do another GC to start the next allocation round fresh.

I'm not sure how to get an accurate picture with a moving collector
unless you somehow tag the objects with a unique ID. With a non-moving
collector you can just dump the object addresses.

>A bigger problem is allocating things on the heap while walking the
>heap (calling list-allocated-objects does exactly that). I think this
>is why these functions are so unstable.

I don't use SBCL so this might be an unworkable suggestion, but if
there is a way to tie into and observe the GC heap walk you could dump
live objects one at a time as they are encountered.

>> Did you do a deep garbage collection?
>Yes.
>
>> Also, note that on x86 it is conservative:
>> http://sbcl-internals.cliki.net/GENCGC
>I think this could explain minor "accidental" leaking, but not the
>consistent large leaks I am seeing. I've also verified this behavior
>on other implementations.

George
--
for email reply remove "/" from address

#5
In article
<9c74e4b0-2a68-4950-96af-d589c019a6cd@e53g2000hsa.googlegroups.com>,
CoffeeMug <coffeemug@gmail.com> wrote:

> Hello,
>
> I have a program that uses increasingly more memory over time and
> eventually crashes because of heap exhaustion. The program performs
> many independent steps which, when finished, should leave no
> references to any of the allocated objects (this is a web server).

Nothing about being a web server guarantees that. A 'tuned' web server
has caches that are filled and used during runtime. Then there are
continuation-based applications that may store lots of stuff in the
continuation.

> There are few global structures that are populated only at the startup
> of the program and they do not hold any references to memory created
> in these steps. I am at a loss as to which part of the code allocates
> the memory that cannot be reclaimed, and which long-living structures
> reference this memory.
>
> SBCL has a number of functions that appear to be useful in tracking
> down the culprit, in particular sb-vm:list-allocated-objects and
> sb-vm::list-referencing-objects. I wanted to try two different strategies
> with these functions - creating "checkpoints" by doing a full gc and
> calling list-allocated-objects, and then diffing the results to see
> which objects allocated between the checkpoints could not be
> deallocated; and building a graph of object references, collapsing
> cycles into single vertices, and doing a topological sort on the
> resulting DAG to figure out which objects hold most of the memory.
> Unfortunately the two functions above are rather fragile - equality
> primitives don't work on all the objects returned and result in
> strange errors, memory faults occasionally occur, etc. I was unable
> to get around these limitations to implement either of the above
> strategies.
>
> I am not sure how to proceed at this point. Are there alternative (but
> more stable) functions on other implementations that I could use? Are
> there other tools to debug this sort of thing? How would you go about
> solving this problem?

I'm not a user of your software combination, so here are just some
general remarks. Not sure if they are helpful.

There are many places where the problem may occur, and one would try to
find the 'area' which might be responsible:

* is it cross platform (Lisp implementation) or not?

* does it happen with another web server?

* is it a problem of a particular version of the Lisp system, the web
  server or some libraries?

* the FFI interface used for the network calls might leak memory
  -> try to replay the requests without network code and see if it
  still happens

* the (conservative) GC might not reclaim all memory, or might not
  reclaim certain kinds of memory. I'm not sure I would want to use a
  conservative GC for a long-running server app without having a
  wizard around who is able to check for problems

* caches of the web server (for users, sessions, connections, buffers,
  pages, html snippets, header lines, ...)

* input/output buffers

* in-memory logs of the web server

* lisp system data structures (symbol table, ...)

* OS resources: the web server could use some OS resources (threads,
  connections, file handles, ...) without ever freeing them

* continuation-based servers can potentially have a problem. It
  depends on what the continuation keeps between requests and how long
  the data is kept. There could also be a leak where data of the
  previous request (say, assembled pages) is still around

* it could be in application code. Run it without network code, then
  without web server code invoked, and see if the application code is
  responsible. Watch out for global data structures that are filled
  with data (resources, logs, ...). Closures/continuations could also
  keep lots of data 'alive': one closure keeps the next closure and so
  on.

I would also check if runtime compilation happens and if that's a
problem.

> Regards,
> - Slava Akhmechet

--
http://lispm.dyndns.org/

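The "run it without network code, then without web server code" bisection above is easier to judge with a number attached. The following is a rough sketch, assuming SBCL: LIVE-BYTES and GROWTH-PER-CALL are hypothetical helper names, and the thunk stands for whichever layer is being tested in isolation. It reports how many bytes stay allocated per call once a full GC has run.

```lisp
;; Sketch only: LIVE-BYTES and GROWTH-PER-CALL are hypothetical helpers.
(defun live-bytes ()
  "Bytes in use in dynamic space right after a full collection."
  (sb-ext:gc :full t)
  (sb-kernel:dynamic-usage))

(defun growth-per-call (thunk &key (calls 1000))
  "Call THUNK CALLS times and return the average retained bytes per call."
  (funcall thunk)                 ; warm up caches and lazy initialization first
  (let ((before (live-bytes)))
    (loop repeat calls
          do (funcall thunk))
    (/ (- (live-bytes) before) calls 1.0)))

;; Usage (HANDLE-REQUEST and *RECORDED-REQUEST* are placeholders for your code):
;; (growth-per-call (lambda () (handle-request *recorded-request*)))
```

A layer that does not leak should hover around zero, give or take noise from the conservative collector. A layer that reports a steady positive figure is the one worth inspecting first.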
#6
CoffeeMug wrote:

> Hello,
>
> I have a program that uses increasingly more memory over time and
> eventually crashes because of heap exhaustion. The program performs
> many independent steps which, when finished, should leave no
> references to any of the allocated objects (this is a web server).
> There are few global structures that are populated only at the startup
> of the program and they do not hold any references to memory created
> in these steps. I am at a loss as to which part of the code allocates
> the memory that cannot be reclaimed, and which long-living structures
> reference this memory.
>
> SBCL has a number of functions that appear to be useful in tracking
> down the culprit, in particular sb-vm:list-allocated-objects and
> sb-vm::list-referencing-objects. I wanted to try two different strategies
> with these functions - creating "checkpoints" by doing a full gc and
> calling list-allocated-objects, and then diffing the results to see
> which objects allocated between the checkpoints could not be
> deallocated; and building a graph of object references, collapsing
> cycles into single vertices, and doing a topological sort on the
> resulting DAG to figure out which objects hold most of the memory.
> Unfortunately the two functions above are rather fragile - equality
> primitives don't work on all the objects returned and result in
> strange errors, memory faults occasionally occur, etc. I was unable
> to get around these limitations to implement either of the above
> strategies.
>
> I am not sure how to proceed at this point. Are there alternative (but
> more stable) functions on other implementations that I could use? Are
> there other tools to debug this sort of thing? How would you go about
> solving this problem?
>
> Regards,
> - Slava Akhmechet

What about adding finalization hooks to some of the data you expect to
be removed after a (sb-ext:gc :full t), and comparing before vs. after?

Store references to the objects in a weak hash table (see the :WEAKNESS
keyarg for make-hash-table) and see what's left after a full GC.

...I have no idea if this'll work.

--
Lars Rune Nøstdal
http://nostdal.org/

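The weak hash table idea could look roughly like this in SBCL. This is an untested-in-production sketch: *TRACKED*, TRACK and REPORT-SURVIVORS are hypothetical names, and the labels are assumed to be keywords or strings that never refer back to the tracked object. Because the table is weak on its keys, it does not itself keep the tracked objects alive.

```lisp
;; Sketch only: *TRACKED*, TRACK and REPORT-SURVIVORS are hypothetical names.
(defvar *tracked* (make-hash-table :test 'eq :weakness :key)
  "Weak on keys: an entry goes away once its key is otherwise unreachable.")

(defun track (object label)
  "Register OBJECT under LABEL without keeping it alive; returns OBJECT."
  (setf (gethash object *tracked*) label)
  object)

(defun report-survivors ()
  "Run a full GC, then return a list of (label . count) for tracked
objects that are still alive, largest count first."
  (sb-ext:gc :full t)
  (let ((counts (make-hash-table :test 'equal))
        (result '()))
    (maphash (lambda (object label)
               (declare (ignore object))
               (incf (gethash label counts 0)))
             *tracked*)
    (maphash (lambda (label n) (push (cons label n) result)) counts)
    (sort result #'> :key #'cdr)))

;; Usage: wrap suspicious allocations, e.g. (track (make-session) :session),
;; where MAKE-SESSION stands for your own constructor, then call
;; (report-survivors) after a batch of requests has finished.
```

Labels whose counts keep climbing from one report to the next point at the kind of object that something long-lived still references. With the conservative collector a few stragglers per label are to be expected.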
#7
On Sun, 15 Jun 2008 10:24:05 +0200, Lars Rune Nøstdal
<larsnostdal@gmail.com> wrote:

>CoffeeMug wrote:
>> Hello,
>>
>> I have a program that uses increasingly more memory over time and
>> eventually crashes because of heap exhaustion.
>
>What about adding finalization hooks to some of the data you expect to be
>removed after a (sb-ext:gc :full t) and see before vs. after?
>
>Store references to the objects in a weak hash (see the :WEAKNESS keyarg
>for make-hash-table) and see what's left after a full GC.
>
>..I have no idea if this'll work..

I don't know about SBCL, but in many GC implementations, it takes 2
collections to recycle objects that have finalizers.

George
--
for email reply remove "/" from address
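
For the finalizer variant, a hedged sketch using sb-ext:finalize follows. *COLLECTED*, NOTE-WHEN-COLLECTED and COLLECTED-COUNT are hypothetical names. Whether SBCL actually needs a second collection is not settled in this thread, so the sketch simply runs two. It also assumes that finalizers may run some time after the collection itself, so the count can lag.

```lisp
;; Sketch only: *COLLECTED*, NOTE-WHEN-COLLECTED and COLLECTED-COUNT
;; are hypothetical names.
(defvar *collected* 0
  "Number of finalizers that have run so far.")

(defun note-when-collected (object)
  "Attach a finalizer to OBJECT that bumps *COLLECTED*; returns OBJECT.
The closure must not refer to OBJECT, or OBJECT could never become garbage."
  (sb-ext:finalize object (lambda () (incf *collected*)))
  object)

(defun collected-count ()
  "Run two full collections, then return how many finalizers have fired.
Finalizers may run asynchronously, so treat the result as a lower bound."
  (sb-ext:gc :full t)
  (sb-ext:gc :full t)
  *collected*)
```

Comparing the number of objects registered during a batch of requests with (collected-count) afterwards gives a rough idea of how many were never reclaimed.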