Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Saturday, January 30, 2010

GSoC Idea: Implement a new GC

Blah Blah Blah Parrot needs a new GC. Blah Blah the current GC is lousy Blah Blah.

I've said it all before. At length. Ad nausea. Hell, I even attempted this same GSoC project myself two years ago.

The garbage collector is one of those systems that absolutely effects almost every aspect of Parrot's operation and performance. A bad GC means a slow, obnoxious VM with pauses and huge memory consumption. Of course, there is no one single "best" GC algorithm to use that provides optimal behavior to all classes of programs. This is why Parrot was designed to have a pluggable GC system where the most pertinent GC from among multiple options can be selected.

At least, that's the theory.

Parrot really only has one GC right now and it's a very simplistic Mark and Sweep collector with some less-than-desirable properties. bacek has been making good progress on porting the Boehm GC to Parrot, but that has it's drawbacks as well (though drawbacks are being mitigated). What we need is more people working on it, more people trying new things, and more eyes on the code.

So what kinds of GC could Parrot use? The projects page on the wiki describes three types that Parrot is specifically interested in:
  1. Generational
  2. Incremental
  3. Concurrent
These adjectives are not orthogonal, you could easily have an incremental-concurrent collector, or a generational-incremental collector, or even a generational-incremental-concurrect collector. But, these three things tend to have nice properties.

Generational collectors tend to have nice throughput because we subdivide the memory space and only take the time to mark and sweep a subset. Incremental collectors break the mark and sweep phases up into smaller increments which in turn decrease pause times. Concurrent collectors utilize multiprocessor systems to operate the GC in parallel with the program code and therefore experience no pauses and high throughput (but lousy performance on single-processor systems, or multiprocessor systems with many other running processes).

The aspiring GSoC student working on GC will benefit from two years of hard cleanup and refactoring work from myself and other contributors. Plus, there is strong community support to get a new GC up and running as quickly as possible to replace our current one. It's a project that will be tough no matter what, but where there are huge gains to be made for the Parrot community and a large amount of available support.

As I mentioned above, the GC system is supposed to be pluggable. So in reality you don't need to implement a great GC that will solve all our problems, you just need to implement any GC to prove that the system is, indeed, pluggable. If the GC you implement has some nice properties that's just a nice bonus.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.