Whiteknight's World: August 2009

Sunday, August 23, 2009

PMCs: Now without PMC_EXT or UnionVal

Last night I fixed the last of the test errors that I was seeing and merged the pmc_sans_unionval branch into trunk. This branch made a number of sweeping changes, but the most important two were removing the UnionVal structure from PObjs, and merging the PMC_EXT structure into the PMC structure. The bulk of this work was provided via patch from newcomer jessevdam, I mostly did some cleanups and post-facto changes for consistency and cleanliness. I think it was much more important to get this change merged then to have it be perfectly pretty.

Almost immediately after I merged the branch reports of test failures on various platforms started rolling in. kid51 was seeing consistent failures on PPC. mikehh and GeJ were seeing consistent failures on x86 Linux too. I ran through the tests another two times before I spotted an intermittent failure on x86_64 too. I hadn't seen them when I committed, but they did appear about 1/4 of the time when I ran coretest.

The failures that kid51 was seeing were different from those that GeJ reported. It turns out that I was seeing both these failures on my machine too, about half the time when any failure manifested. A little bit of debugging this morning turned up the culprits: The first was caused by a strange situation where a PMC metadata object wasn't being marked properly by the GC and getting prematurely collected. This error was easy to spot because of a helpful debugging message that had been inserted into the GC code long ago. With this error fixed and the GC structure sufficiently changed, I removed that debugging message.

The second failure was more difficult to nail down. When I took a backtrace, I saw that the failure was happening deep inside the PCC system function call argument processing functions. This system is currently a hideous web of evil code, but luckily it is what Allison is working to refactor right now. I would have thrown up my hands in despair at seeing that, and given up on any hope of a fix but for one thing: I had seen a backtrace like this before when I was doing some early work trying to replace Parrot_PCCINVOKE with the more modern Parrot_pcc_invoke_method family of functions. The error, in a nutshell, is a PCC bug: PMC** arguments passed to a function call in order to receive a return value need to be initialized to point to a valid PMC value before being passed. This despite the fact that they should only be treated as a storage location and overwritten without ever looking at the value originally pointed to. Despite the "should" in that last sentence, somewhere along the PCC pipeline the values are being passed off to the GC which tries dutifully to mark it, and things go down hill.

This fix, unfortunately, means turning this:


PMC *retval;
Parrot_mmd_multi_dispatch_from_c_args(interp, "subtract", "PPP->P", pmc, value, dest, &retval);

into this:


PMC *retval = PMCNULL;
Parrot_mmd_multi_dispatch_from_c_args(interp, "subtract", "PPP->P", pmc, value, dest, &retval);

One line changed and now the test appears to be working perfectly on every system where I can get a report. I sincerely hope Allison can get these issues resolved in the PCC refactors work, because I hate tracking these kinds of issues down. If I hadn't recognized that backtrace and the problem that caused it, I would still not have a fix for this bug.

With this, three of the branches I mentioned before the release have landed: the auto_attrs branch, the Parrot_Sub refactors branch, and now the pmc_sans_unionval branch. Bacek is getting damn close to finishing the Context PMC work too (now in the context_pmc3 branch) too. I don't know where chromatic and cotto are in the pluggable_runcore branch, and I don't know where Allison is in the pcc_arg_unify branch either. But, I still have very high hopes that they will all land soon and Parrot will be a better place.

I'm shooting through some code today making miscellaneous cleanups. There have also been some problems recently involving the fixed-size allocator (especially on Windows) so I am going to have to dig into that issue soon too. Once I get that working perfectly again, I want to run some benchmarks. I bet we're going to see some significant speed improvements over 1.5.0 when all the numbers come in.

Wednesday, August 19, 2009

Parrot Release Managers

A while back Coke said I could take over the job of herding release managers, and I gleefully took him up on the offer. A few days ago I sent out an email asking for people to fill in for the remaining releases this year. Specifically we're looking for some fresh blood, new people to give the release a shot. Every new person we get doing releases helps to increase our bus number ever higher, and a good bus number is good for us (and bad for buses).

Shortly after the announcement I got a few emails from interested people, including a few from people who have never done a release before. Signed up fo the rest of the year are: particle (1.6), dukeleto (1.7), bernhard (1.8) and gerd (1.9). After that the schedule is wide open.

I've managed a few releases at this point, and I can say with some confidence that it's really not a hard process at all. Actually, there are several steps that I think would be ripe for automation and could become even easier, if an intrepid releaser were interested in doing so. I won't opine about that too much here, just mentioning it off-hand.

Here are the basic steps for making a release.

Send out some messages on IRC and on the mailing list for people to update NEWS and PLATFORMS.
Update NEWS and PLATFORMS yourself, because everybody ignored your messages to the mailing list.
Update the version number in a variety of files, even though we have computer programs and programming languages capable of incrementing digits.
Come up with a name that's too clever by half. Don't tell anybody, that would ruin the surprise (Since our release schedule is so regular and predictable, we need to have some surprises!)
~~Waste 20 minutes of your life~~ Run fulltest. I like to spend this time watching TV, chatting on IRC, eating junk food, or looking up funny pictures of cats on the interwebs.
Make the release. This is the hard part: Type "make release VERSION=a.b.c".
Unpack the tarball into a new directory. Build it and ~~waste 20 more minutes of your life~~ run make fulltest again.
Upload the tarball to the FTP server
Send out a bunch of announcement emails telling people that we've made a release even though we could program Google Calendar to do this on the third tuesday of the month, every month, for the next thousand years. Tell everybody it's a big deal regardless of how dependable and boring our release cycle is.
Rake in the karma on IRC.

I am joking about several of these things, obviously. I make light precisely because it's not a big deal by design. We want Parrot releases to be quick, easy, and regular. Unextraordinary (ordinary?). It's easy to do, and we want lots of people to do it. It's easy, it's no big deal, there is no reason why you shouldn't be involved in it.

Branches Come Crashing In

The release wasn't barely out the door before some branches started merging into trunk. Just like I predicted, there were a few changes ripe and waiting to land as soon as the feature freeze was listed, and so they did.

First, NotFound's auto_attrs branch landed, bringing with it a change to almost every PMC type and a major simplification to the way PMC attribute structures are managed. Also, it is the first major user of the fixed-size structure allocator I developed last month, so it's a cool test of that. My hope is that once we've given it a good exercise we can start using the allocator in a variety of other places where we need small, fixed-size structure allocations and we are able to manage the lifetimes of those structures manually. This branch represents a big algorithmic improvement and with a few cleanups, improvements, and optimizations, this could usher in some other big changes in the near future.

Closely thereafter, bacek (who I swear is some sort of magical coding robot) landed his branch to cleanup the Sub PMC and remove the Parrot_Sub structure. This brought with it another huge changeset that swept most of the repo, but provided us with some major improvements to subroutine allocation and management (not to mention some much-needed code aesthetics). Very nice.

That success out of the way, I suggested he take a look at the context_pmc2 branch and see if he could work some similar magic on the Parrot_Context structure and the new Context PMC. Less then 24 hours later he had a gigantic change committed that not only cleaned up the structure but completed the conversion and got Parrot to build! There were some segfaults in miniparrot during initialization though (trying to create the first context before all the PMC types were initialized, and things like that) so it wasn't perfect, but was quite good progress. I just committed a fix to the initialization logic, and bacek claims that he'll be able to get the branch finished by the end of the day. Magical robot indeed!

I talked with Allison last night about the pmc_sans_unionval branch which I had gotten building and passing all tests (including codingstd tests) and was ready to merge in. However, by the time I got the thumbs up and was ready to merge, the above two branches had already landed and I could no longer merge that branch cleanly. Now, the branch segfaults on one particular test (t/pmc/complex.t) because of a bad pointer in a RetContinuation PMC, which I haven't been able to track down yet. As soon as I can get that last test to STFU, I'll merge that branch in as well.

I'm hearing good news about the pluggable runcores branch and the profiling stuff from chromatic and Cotto. Not to mention the long-awaited PCC refactors that Allison has been working on. With any luck we could see both these two branches landed into trunk soon, folloed by a flurry of activity from people wanting to use and improve these new toys.

Yesterday at the #parrotskech meeting, chromatic asked what the weekly development priority should be and I suggested, knowing how many branches were poised to land, that we should focus on basic, raw stability. With a few test failures being reported here and there it seems that might not have been such a bad suggestion. Plus, as I mentioned above, there are at least three branches that could plausibly land soon, which will just increase the short-term instability. Basic program stability is a very good thing to focus on (and it gives us a good excuse to write a bunch more tests for all the fun new things).

So things are moving quickly and I hope that the rest of this release cycle stays as eventful as the first two days were. I'm also looking forward to tomorrow's Rakudo release because I've heard good things about that too.

Tuesday, August 18, 2009

Parrot 1.5.0 "TEH PARROTZ!" Released!

I released Parrot 1.5.0 a few moments ago. Here's the release announcement:

On behalf of the Parrot team, I'm proud to announce Parrot 1.5.0 "TEH PARROTZ!." Parrot is a virtual machine aimed at running all dynamic languages.

Parrot 1.5.0 is available on Parrot's FTP site, or follow the download instructions. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Subversion on our source code repository to get the latest and best Parrot code.

Parrot 1.5.0 News:

- Core
  + Removed several deprecated functions and features
  + Removed bsr, jsr, branch_cs, and ret opcodes
  + Removed global stacks system
  + Changed OPS file format to include explicit preamble
  + Changed all "new 'Iterator'" instructions into 'iter' instructions
  + Removed Configure.pl options for specifying non-working GC cores
  + Removed unexecuting code as found by Coverity
  + Improvements to the Parrot Debugger
  + Added experimental fixed-size structure allocator to the GC
  + Added experimental lazy arena allocation to the GC
  + Removed the defunct PASM1 compiler object
  + Refactored hashes, keys, and iterators
  + Added "corevm" make target to build Parrot without all the supporting libraries
  + Removed Random PMC type and added in a "rand" dynop
  + Optimization and Improvements to the NCI thunk generator
  + New include file libpaths.pasm
- Compilers
  + Multiple .local with same name and different type is now an error on IMCC.
- Platforms
  + Improved support for detecting Fink and Macports
  + Updated search directories for libraries
- Documentation
  + "Parrot Developers Guide: PIR" released to publisher and available to purchase
  + Improved documentation about Parrot Debugger
  + Update PGE Documentation
- Miscellaneous
  + Added tests
  + Fixes to code, documentation, and standards

Thanks to all our contributors for making this possible, and our sponsors
for supporting this project. Our next release is 15 September 2009.

"TEH PARROTZ!" is a name that I've been wanting to use for a while. It pays nice homage to a popular internet meme and shows that we're having fun and that development isn't always SRS BIZNS!!

It can be a lot of fun, especially as more new people are getting involved. And with the release out of the way, we can get back to some serious development.

Sunday, August 16, 2009

Ongoing Branchwork

As I mentioned yesterday, I called for a freeze on big changes to trunk before the Parrot release. However, that doesn't mean that development ceases entirely, and some branches are still moving forward at a pretty nice clip. With the 1.5.0 release coming out tomorrow, we could see some of these things land tomorrow.

GC Refactors

A few days ago I was pretty stunned to see a huge patch from newcomer jessevdam that had some major effects on the GC. What he set out to do was remove the UnionVal structure, which was a large and obnoxious union of several data storage types, from the PMC structure. What he ended up doing was far more then that.

What jessevdam did was to remove the UnionVal structure from Parrot completely, fix up several other structures that used it (STRING, Buffer, etc), merged the PMC_EXT structure into the PMC structure, renamed several macros, added some documentation, and made a variety of other cleanups. The patch was huge, and not only that it was invasive, so I applied it (and some subsequent cleanups) to the pmc_sans_unionval branch for testing. Initial test results that I have been seeing are all clean, so if we can get some approval from Allison about it's methodology, I would like to get it merged to trunk shortly after the release.

Context PMC

I created a new branch this morning to tackle the first stage in the process to convert the Parrot_Context structure to a PMC. I had the realization that there are really two separate reasons for making this conversion: First we want to get Contexts to be properly garbage collected, so we can avoid several outstanding memory leaks. The second is purely a refactor, we want to properly encapsulate the Context functionality and cleanup the code that uses it.

So instead of replacing the Parrot_Context structure wholesale, I am taking a path of less resistance: Keeping that structure as-is, and creating a PMC type that acts as a simple wrapper for it. See, for example, a similar strategy used for the Sub PMC (Parrot_Sub struct) and the Continuation PMC (Parrot_Cont struct). This makes the structure garbage-collectible with a minimum amount of effort, but doesn't do anything to make the code cleaner, better encapsulated, or have better performance. I've started this messy process in the context_pmc2 branch, and am having some promising results so far.

After I get that branch building and passing tests, I plan to merge it in and then start another branch to work on performance, encapsulation, and other improvements. Of course, this second branch will probably have to wait for a few other branches to land first so it doesn't get in the way of other ongoing work.

On a related note, for cleaning up the Parrot_Sub stuff that I mentioned above, bacek has started the tt795_kill_parrot_sub_structure branch. That branch will serve as an archetype for the second round of Context cleanups, since I'm purposefully creating the same "situation" with the Context PMC that we are trying to resolve using the Sub PMC. I've heard from him that this branch is complete and ready for merging in right after 1.5.0.

PMC Attributes

NotFound's auto_attrs branch is getting pretty mature and I hope he is able to commit it to trunk shortly after the Tuesday release. We are definitely going to talk about this in the #parrotsketch meeting tomorrow. It's my hope that once this new PMC initialization mechanism is added to trunk, we will be able to start using the fixed-size memory allocator in other parts of Parrot as well for some major speed improvements.

PCC Refactors

Allison is still hard at work on the PCC refactors, having recently started the new pcc_arg_unify branch to get going on it again. If she can successfully unify the way arguments are passed to subroutines, and get rid of some of the older entry points to the PCC system, it will be a huge win for Parrot. Plus, as I've discussed at length before, those kinds of refactors will enable so many other stalled projects to leap ahead quickly. I have high hopes that if this branch lands soon that we will see some major improvements to other areas of Parrot, based on this work, being added as early as 1.6.0.

Pluggable Runcores

chromatic and Cotto have been working on a very cool project in the pluggable_runcore branch. The goal, in grave oversimplicity, is to treat runcores as dynamically-loadable libraries. Also, they're hard at work using the new loading mechanism to add a much-needed profiling core. I don't know what the ETA is for that branch, I don't know if it will make it in to 1.6.0 or not.

Conclusion

1.5.0 is going to be a great release because so much crufty old nonsense has been removed from it. And with all the cool new features waiting in branches to be merged next month, 1.6.0 has the potential to be even better.

Approaching the 1.5.0 Release

I'm dressing up as the release manager for Parrot this month, hoping to push 1.5.0 out the door without any major issues. This is going to be quite an impressive release for our community because 1.4.0 was a deprecation point, and 1.5.0 is going to be the first release where not all features present in 1.0.0 will still be available. And a lot of old "features" have been through the grinder this month.

Because of all the massive changes and removals of features, I sent an email to the list a few days ago asking for a freeze on big changes or branch merges starting today and lasting through Tuesday. I want to make damn certain that trunk is stable when it's time to push it out of the nest. To that end, I am asking everybody who is reading this blog post right now to get the latest version of Parrot trunk from SVN and submit a smolder report on your platform. These reports are very important, and will give us the information we need to be confident about Parrot's stability on Tuesday. It doesn't take much time to put together a smolder report, and it's amazingly helpful to us.

Because of the feature freeze there is active development going on in some branches. I expect some of these to be merged in to trunk shortly after the 1.5.0 release. I'll post some information about a few of these tomorrow.

Tuesday, August 11, 2009

Lazy GC And Fixed-Size Allocation

So I've mentioned in an off-hand way recently about some of the small projects I've been working on in the Parrot GC. When I do talk about GC it's either on one of the two topics "fix a bug" or "make drastic improvements", but these that I have been working on recently are neither of those: they are small, incremental improvements that will provide only limited improvements.

When we look at performance in GC, we can make a number of simplifying assumptions that help to decrease the complexity of our algorithm. And in GC, algorithmic simplicity is key: no matter what changes you make internally to the GC, the number of data objects being allocated and managed will remain constant (unless other parts of the system severely optimize their own memory usage, which is necessary too). So too is the amount of work required per object: we must visit it at least once to determine if it is alive and to mark it, and visit things again in order to sweep them. So there isn't a whole lot of performance that we are going to eek out of this basic procedure. However, there are four things that we can really do to improve GC performance*:

Improve the efficiency of Allocation/Deallocation. We need to decrease the algorithmic complexity of an allocation, because it's a very common operation. Deallocations are (in theory) exactly as common as Allocations.
Linearize Allocation/Deallocation. If we can allocate items from a nice, large, fresh stretch of memory we can avoid needing to search for appropriate-sized chunks, or needing to copy/compact existing data.
Decrease the number of items needing to be marked and swept. We can't control how much junk the program generates, but we can control how often we mark/sweep all of it. This is done through heuristics such as generational GC algorithms which choose to mark/sweep a subset of all objects during a GC run, instead of all objects every time. This can also be done in an incremental core where we sweep a portion on each run.
Decrease the frequency of GC mark/sweep runs. This basically means means we increase our memory footprint, or increase collector latency. More data available to our program means we need to do less searching for free memory.

Parrot uses (in theory anyway) two types of aggregate objects: PMCs and STRINGs. All other types of data items that get allocated are attached in one way or another to these two types. When we allocate a PMC, we also need to allocate and initialize it's Attributes structure, which is a separate memory object. Currently this is done through an additional malloc() call, although in the auto_attrs branch, it is done using the new fixed-size allocator instead, which should be less expensive then normal malloc calls (#1). By allocating these items from our own managed pools and using a lazy technique to do it, we can improve allocator linearization (#2). By realizing that these fixed-size data items are always considered part of the PMC, We can simplify the marking and sweeping algorithm to ignore them completely and treat them as just an extension of the single PMC object (#3). And finally, by using the lazy allocator and increasing the initial size of our pools, we can decrease the number of GC runs that need to occur during startup, maybe eliminating them entirely (#4).

When Parrot allocates a new swath of memory for it's fixed-size allocator, it immediately iterates over the entire block, breaking it up into individual objects and adding them each to the free list. The loops used for this are very tight and efficient in terms of the number of machine instructions needed to operate them. However, we lose a lot of performance because of processor cache faults. Essentially we need to move the entire block, often piecemeal, into the processor's cache for no other purpose then to prove that it's there. What we can do instead is allocate the block and keep a pointer to the beginning of it only. When we need to allocate a new object, we first see if the free list is empty, and if so we allocate the next object from the most recent memory block. We're still iterating over the entire block, but we're only examining objects when we need to allocate and use them, keeping the vast majority out of the processor cache until they are needed.

Allocating a memory arena in Parrot right now, whether we use any of the objects in the arena or not, is O(n) complexity because we need to iterate over the entire arena and add all items to the free list at once. Increasing the size of the memory arena therefore increases allocation time more or less linearly, which offsets the gains we would make from requiring fewer GC runs. By using the lazy allocator instead, allocating a new arena is always O(1) no matter what the size of the arena is, allowing us to increase the size of the startup pools without cost, and decreasing the number of GC runs during startup without penalty.

It's not all roses and unicorns though, having to check both the free list and the pointer to the newly allocated block for every allocation increases complexity slightly, but that should be offset by better efficiency in terms of processor caching, so I think it will be a net win.

chromatic estimates, and I think I agree with it in theory, that we could see about a 10-15% improvement in the runtime of our test suite from these kinds of changes, once we add the necessary features and then tune all the values and sizes. If we could, for instance, increase startup memory pool size to be large enough so that we don't need to run GC at all for Rakudo startup, they would see some significant gains in the performance of their test suite too. Of course, that might turn Parrot into a huge memory hog, so we need to tread lightly and carefully tune our pool sizes. Also, it wouldn't hurt if we could find ways for things like PGE and Rakudo to allocate fewer objects, but that's a different topic altogether.

[* When I talk about "GC performance", I'm talking about a combination of throughput and latency. We want more of the first and less of the second. Basically, we want the GC to cause less of a slowdown for the user, and be able to turn over junk quickly.]

Monday, August 10, 2009

Matrixy: Current Status

In a post yesterday I mentioned Matrixy, and today I wanted to give a quick overview about the state of that project.

History

Matrixy was an Octave/Matlab clone that I had started working on a while ago when I first got involved with Parrot. I had been working on it idly while still in school, but never made it very far because of my lack of knowledge at the time and my other commitments (like, school). I got in contact with a guy named Blair from the UK who was interested in working on the project, and together we set up a project page on Google Code to pursue it.

For a while things moved pretty quickly and we were able to implement many of the idiosyncratic features of that language reasonably well. Blair also did some great work writing bindings for CLAPACK and CBLAS, so Matrixy has reasonable linear algebra support (although many operations just haven't been implemented yet) through those libraries.

However, things eventually slowed down. Blair started working on some changes to function dispatch that would enable variable function parameter lists and variable function return lists. Because of his work in this area, I held off on some of my own dispatch-related work, and started focusing on Parrot proper. Long story short, Blair got stymied by some bugs in Parrot which also happened to be things that I was blocking on (PCC refactors being a good example), and development on Matrixy stopped completely. Yesterday I fixed a few small things, but that's hardly the same as the project being active.

Current Status

Despite the fact that Matrixy isn't being as actively developed now as it has been in the past, it still has and supports a number of important language features. Not all of the implementations are as good or as efficient as they should be, but they do exist. Here's a quick rundown:

Support for variables, scalars, and matrices, and most of the operators that act on those things. This includes automatically-growing matrices, autovivification, basic matrix operations, and functions that can tell whether a value is a matrix, a vector, or a scalar. 2-D matices only.
Support for basic string operations, string printing, string concatenation
Nearly-proper handling of semicolons to print statement values
Nearly-proper disambiguation between variables and functions using parenthesis
Proper runtime loading of library functions from files. So the function foo() will be automatically found and loaded from file foo.m
if/then/else, for loops, maybe some other control structures too that I can't remember.
Ranges and iterators using ..
Calls to system commands using !
handing of path search locations
Function handles and anonymous functions
Complex numbers
try/catch
Several library functions written in PIR and M.
Help functions to read documentation from M files

So that's a decent list of features that work or mostly-work, and that's nothing to thumb your nose at.

The Path Ahead

There are obviously more things that we need: cell arrays, structures, nargin/varargin/nargout/varargout, and others. We also need to find some kind of graphics package to tap into eventually that will enable us to implement some of the graphing functions. There are hundreds of other functions as well that need to be implemented and tested, more then I would be able to name here.

At the moment Matrixy only works with a development Parrot, it needs to be made to work with an installed Parrot (or both, if possible).

It needs to be better tested and several corner cases need to be stabilized.

And I really think that we need some significant internal refactors for a variety of systems and functions. I would love to even rewrite some of the PIR parts in Close, if I was able to do that.

I would like to separate it out into two separate (but related) projects: the Matrixy compiler itself, and the linear algebra libraries that could be reused by other languages to gain immediate, powerful, linear algebra support.

Overview

So that's a general overview of what Matrixy is, what it's current state is, and where I think development is headed. I don't think I will be focusing much of my effort on it in the near-term because of other tasks I want to work on for Parrot, and other things I have going on in real life. However, I will be dabbling on it here and there, and would like to get other developers interested in working on it too. No reason why it shouldn't be progressing just because I am not devoting a lot of time to it.

Sunday, August 9, 2009

Eclectic Sunday

My wife is away today for a bridal shower, so I've got much of the day to myself for hacking. Here's a quick rundown of some of the things I'm looking to work on today:

My mind was brought back to Matrixy this morning because of a post to the mailing list where another developer talked about creating his own Octave clone. I pointed him to Matrixy, and then did a quick update/build myself to verify that what we had there still works. A few changes to Parrot in the past few months broke some tests, so I fixed those up, and then I even managed to close one ticket as well. I'll write up another post this week about the current state of Matrixy, and what the next steps in that project need to be. I don't think I'm going to do much more work on it today.
GC, again. I put together a proof-of-concept lazy allocator for the new fixed-size allocator that I added last week, and it appears to work alright. Since the fixed-size allocator isn't actually used anywhere, I haven't been able to benchmark anything to say that it actually outperforms the eager allocator, but we can easily switch between the two using a preprocessor macro at compile time, so it's no big deal. I started trying to write up a similar lazy allocator for the primary PObj allocator, but that problem is significantly less trivial and I haven't been able to get it to work yet. I may play with that a little bit more today, and I will definitely post about it sometime this week.
The io_cleanups branch is getting old, critically so, and I want to get that branch tied down very very soon. The biggest thing that it adds is proper support for pipes, and those are mostly the brainchild of Infinoid. If I can tie down the last of the test failures today, I can close the branch without waiting for him (either merge into trunk if it's non-destructive, or take a good diff and start a new more up-to-date branch).
Tene has been very eager to get working on a select/poll functionality, which is something that I had been planning to implement for AIO but that doesn't strictly need to be done at that time. So, if he's eager and available, I may help him draft up a solution to that problem.

I doubt I'll get all of that done today, but it's nice to have a list of things to work on.

Wednesday, August 5, 2009

Contexts: Path Forward and Problems

I mentioned the ongoing work for the Contexts PMC stuff in a previous post, and wanted to elaborate on that a little more here. Currently, as I've mentioned before elsewhere, context structures in Parrot are just basic data structures that are manually managed using an idiosyncratic reference counting scheme. The goal is to turn Contexts into a normal PMC type so they can be garbage collected, because currently we leak a lot of memory through miscounted contexts.

Coke started a branch to do the conversion, and I quickly realized that we are going to need to do the work in baby steps to get it right. It's really multiple projects: Encapsulating the Context PMC behind a good interface, cleaning up all the associated logic, possibly separating out the context structure based on it's various purposes, and then converting everything over to be a garbage-collectable PMC. It's worth mentioning here that there is a faster way to do this, but faster does note equate to better, or correct in terms of proper design, so I don't bother to discuss it here.

Contexts serve a number of purposes in Parrot:

They manage the register sets, and provide access to the Parrot registers. A new context allocates storage space for the register types, does the pointer arithmetic to access them, marks their contents for garbage collection, and cleans them up when the context goes out of scope.
Contexts hold pointer items to a variety of other data items: the current namespace PMC. The current invocant PMC on a method, the current return continuation PMC, current argument and return signatures, the current lexpad, and the current HLL ID.
Contexts form a type of linked list (actually, more of an acyclic graph) that represents the call chain. Every context also contains a pointer to an enclosing context, such as is the case in a closure.
Contexts keep track of control flow by holding a reference to the current PC, and also hold fields of flags that determine which errors and warnings are enabled currently.
Contexts provide access to the current list of constants in the packfile.

You can see that these uses aren't exactly orthogonal to one another but at the same time it's not strictly necessary that all this information be stored in exactly one data structure either. For instance, we could separate out the organizational aspects (call chain, current PC, current errors/warnings, access to constants, pointers to namespace/lexpad/sub/invocant PMCs) into one PMC type and separate out the register management functions into a different PMC type. This would have the benefit of allowing unrelated activities to be stored in separate PMCs, and would allow all the necessary functions to be hidden behind a reasonable VTABLE interface without too much shoehorning. I'm not saying that this is what we will do or even what we should do, but all possibilities are worth thinking about at this juncture.

It's especially worthwhile to think about when you consider the severe limitations stipulated by the VTABLE interface, which was clearly designed with "data" PMCs (Integer, String, Float) in mind and not "process" types like this. Start divviing up the various VTABLE functions and you run out pretty quickly. We can use the get_integer_keyed_int, get_string_keyed_int, get_number_keyed_int, and get_pmc_keyed_int interfaces, and the set_* equivalents to access the register set. The invoke VTABLE could be used to directly activate the RetContinuation, preventing us from needing a fast accessor for that item. We can still have an accessor for it, but it doesn't need to be a fast or dedicated one. The push and pop vtables can be used to manage the context stack for the common case of simple subroutine calls and returns.

That is, unfortunately, where things start getting messy. We could conceivably use the get_*_keyed_str as delegate accessors for the LexPad PMCs, to access lexical variables by name without needing a fast accessor to retrieve the LexPad PMC itself (again, we can have an accessor, but it doesn't need to be a hugely fast one). This single type of accessor doesn't give us the flexibility to restrict a search for a lexical variable to either the current context or all :outer containing contexts, but it's a start.

And even then, how do we access the current Sub PMC, the current NameSpace, the current invocant object, the Outer context (for closures), the parameter and return signatures, etc? The pcc_rewiring branch may offer some answers here, by encapsulating the current invocant, current parameter and current return signatures, and maybe other things. That's fine and good in the short term, but the long-term plan is to merge the Context PMC and the CallSignature PMC together into a single object that represents a subroutine state (with functionality to call into and call out of that state as well). And then we're back in the same mess with more data then we have interfaces to access it all.

Of course, this is all speculative, and it's a few steps away from where we need to be focusing right now. This kind of thinking does help to inform our short-term decisions, but we cannot be mired in them. Instead of finding creative and perhaps obnoxious ways to shoehorn all this functionality into the limited VTABLE interface, maybe instead we need to think about creating a different, and more appropriate, abstraction layer for Contexts. In short, we need to create an API for them.

Creating the abstraction layer should be very straight-forward since we know the kinds of things that we need to access. Once we have it, we can start migrating parts of Parrot to use the new interface instead of poking into the context structure directly. And once all accesses use the new interface, we can change the mechanics behind that interface to anything we want (such as converting the Contexts to a PMC type).

I posted a quick file to the trac ticket a few days ago that demonstrates some ideas that this API could follow and also hopefully demonstrates how easy one would be to create. It's not complete and it's not pretty, but it shows how it could be done. I will mention one possible problem area being in the .ops files, where several macros (IREG, NREG, PREG, SREG) are used to access registers as l-values and r-values interchangably, which makes a function-based "don't poke into my structures directly" interface kind of troublesome. So in the foreseeable future we will probably have to allow some of these same kinds of accesses, although it really would be nice to fix the OPS compiler to avoid these kinds of issues entirely.

So the first step is to create some kind of API for contexts, and then start the long and arduous process of converting Parrot's core codebase to use that API instead. Hopefully by that time pcc_rewiring will have landed and we can start planning the next steps.

Monday, August 3, 2009

ParrotTheory: Garbage Collection

So I've talked a lot about garbage collection on my blog before, and I think now is the time to really go in depth about some of the theory behind them. Garbage Collection is a subset of a larger field of study on Memory Management. In fact, much of the functionality in the Parrot GC system is only concerned with memory management, not the actual detection and collection of garbage.

When we look at a non-garbage-collected program, memory is typically allocated from the operating system with malloc and variants, and is returned to the OS with free. This is fine for tiny little programs, but as complexity increases you can start experiencing memory leak problems, and you end up spending more and more and more effort allocating memory, checking if memory is unused, and returning it to the OS manually. In short, it's not a system that scales well.

Enter Garbage Collection. GC is a system in your program that automatically handles memory allocation and dealloction. It does this by detecting when allocated memory is not being used any more and frees it or recycles it automatically. There are two basic types of Garbage Collection: Cooperative and Non-Cooperative.

Cooperative GC

Cooperative GC, more commonly known as "reference counting" is a form of GC where the entire program needs to participate in the GC algorithm in order for it to work correctly. In a cooperative GC system, every time a pointer to an object is made, the reference count needs to be increased. Every time a pointer is deleted or redirected, the reference count needs to be decreased. When the reference count for a particular object reaches 0, nothing points to it and it can be automatically and immediately freed. Cooperative GC can be done more easily through the use of macros:


#define ADD_REFERENCE(obj, newref) do {\
 obj->refcount++; \
 newref = obj; \
} while(0);

#define REMOVE_REFERECE(obj)  do {\
 obj->refcount--; \
 if(obj->refcount = 0) \
 free_my_object(obj);\
} while(0);

So that's not such a huge problem and all the code fits into these two little macros. The problem is remembering to faithfully use these macros for every single pointer access throughout the entire program. Notice that, especially for concurrent systems this really needs to be every single access, because another thread could be decreasing the refcount at the same time that you are using it, and free it out from under you.

Non-Cooperative GC

The other form of GC, and the type that Parrot uses, is a non-cooperative GC. In a non-cooperative system, the GC module is completely separate, and the rest of the program does not even need to be aware of it's existence. In the Boehm-Demer-Weiser Collector, for example, the only change that needs to be made to the program is to replace calls to malloc/free with the allocation/deallocation functions, and everything else proceeds like normal with garbage collection happening invisibly in the background. The benefit to this system is that you write a single GC system, and then can write normal code throughout the rest of your program without ever having to remember to reference your data again.

Non-cooperative collectors typically work in two stages: mark and sweep. In the mark phase, the collector walks the stack and the heap looking for pointers to data. Every data item has an associated flag, and when we find a pointer to that item we mark the flag with a special "Alive" value. We then look through that data item for other pointers to follow and mark. The collector starts at a root set of data pointers that we know we can reach (current processor register contents, current stack contents, global variables, etc) and then trace through memory like a large graph. In the sweep phase, we iterate over data items in the heap and look for all data items which have not been marked. Anything that is not marked is not reachable, and can be freed.

I don't want to do a huge compare/contrast of the two GC types, they both have pros and cons. Perl5 uses Cooperative GC, while Parrot (and therefore Rakudo Perl 6) uses Non-Cooperative GC instead. The only real differences are where and when objsects are detected to be dead. For the rest of this post I'll just focus on non-cooperative systems because that's what Parrot uses.

Non-cooperative GC tends to run all at once, such as when a new allocation is requested from the program. The GC will look for some available memory and i none is found it will mark then sweep memory to try and free some up. The mark and sweep algorithm could be expensive and this will cause a system slow-down or even a complete pause while it is running. The basic mark and sweep algorithm is notorous for these pauses. Luckily a number of new algorithms have been developed over the years to help mitigate their effects.

Algorithms

In an Incremental GC, we break GC runs up into pieces. We do a subsection of the entire algorithm each time, with the intention that we still have pauses, but the pauses are each shorter and more spread out. Incremental GCs are typically implemented using a tri-color mark: Instead of a memory object being marked "alive" or "dead", we use three colors: "black", "grey", and "white". White objects are completely not marked ("dead"). Grey objects are marked, but their children are not marked. Children in this case are pointers in one memory object that point to other memory objects. Black objects are completely marked including all their children. By keeping track of the state of objects and the the states of their children we can pause at almost any time and resume by taking the list of all grey objects and continuing the mark from there. Incremental GC helps to spread the pauses out, but doesn't really do anything to fundamentally decrease the amount of time that's needed to pause over long execution runs.

In a Generational GC, we use the heuristic that most memory objects are generated, used, and abandoned very rapidly. Think about temporary variables in a tight loop, for instance. So we only perform GC mark and sweep on certain groups of memory objects, called generations. Memory objects are allocated into the "young" generation, which is processed by the GC very frequently. Objects that survive a sweep can be graduated into an older generation which is swept less frequently. This algorithmic change means less pause time for the GC (because not all memory is being marked and swept). However, dead objects in the older generations won't be found and freed as often, resulting in longer delays for some objects to be destroyed (think closure of a file handle when it's containing scope exits), and more memory footprint needed to hold these dead objects.

Concurrent GC makes use of system-level parallelism to perform aspects of the GC algorithm in separate threads, which negates pauses entirely. Actually this isn't entirely true, in a system where there are more running threads then there are hardware processors to execute them we still have pauses in the form of thread context switches. Concurrent GC has the benefit that we can really run any algorithm we want regardless of execution time or complexity, so long as it runs in another thread, without any additional performance penalty. In a concurrent system, however, there is typically a need for significant synchronization, plus the overhead inherent in executing separate threads (which is more expensive on some systems then in others).

A basic mark and sweep algorithm has another problem of memory fragmentation. We allocate memory linearly in memory, but start to deallocate it randomly, leaving large odd-shaped holes in memory between live data. The allocator runs quickly at first, but as fragmentation increases, it becomes more complicated to find suitable holes in memory for new allocations to be made from. To avoid this problem, we can use a special GC algorithm called a compacting or copying collector. A compacting collector actually moves data in memory to keep fragmentation down. If we think of memory as a long chain, we move objects to the front of the chain, forming a "dense prefix" where live data tends to be, and a sparsely-populated area at the end of the chain where there are few if any live data items and therefore the allocator can execute more quickly. We can then focus our mark and sweep efforts on the dense prefix because that's where all the data items are.

In a region-based memory system, we break memory up into chunks called "regions", and associate each region with an executing context. When the context exits (such as through a subroutine return) we can free the entire region at once. A freed region can then be used by the allocator to quickly allocate memory linearly.

These various algorithms aren't atomic, we can mix and match features from each to suit individual needs. For example, Java's new Garbage First collector uses a compacting region-based collector that focuses it's attentions on sparsely-populated regions to clear garbage out of the way of the allocator. This gives Java good linear allocation performance while also decreasing the number of long pauses the collector needs to run.

Overview

Garbage collection methods and algorithms are a trade off of several components: memory footprint, throughput, and latency. Memory footprint relates to how aggressive the GC is. If it is very aggressive and always frees dead objects quickly, it will use the least amount of memory. Throughput involves the overall speed of the GC in a long-running application. The GC should be able to process a lot of data quickly. Latency is the amount of pause time necessary for a GC run, we want to minimize this. The general rule of thumb is that you can maximize two of these requirements at the expense of the third.

If we are running a GUI application such as a videogame, we want to decrease latency because pauses for a GUI application are unacceptable. We may also want to maximize throughput because those kinds of applications churn through a lot of memory for all it's physics and graphics calculations. If we are running a background service process, we may want to keep memory footprint low, but may not care at all about latency. A web server application may want high throughput and low latency, but will be allowed to hog extra memory.

So that's garbage collection in a nutshell. I can always go more in-depth about any topics that people are more interested in.

Sunday, August 2, 2009

Fixed-Size Allocations For Parrot

Parrot has a dirty little secret. Actually, it's not much of a secret because I've mentioned it on my blog here before (although I can't find link right now). The problem is that Parrot, despite the fact that it has an integrated GC mechanism, still relies on malloc/free for most of it's allocations. The only things that aren't allocated this way are the "PObjs": PMC, and STRING structures mostly. However, just about every single PMC allocation involves a malloc anyway to allocate the structure of attribute storage used by that PMC. These structures are typically allocated manually using malloc in the PMC init VTABLE, and manually deallocated in the destroy VTABLE.

In TT #895 NotFound started working on a task that I've been thinking of for a while: Fix the GC so it could automatically allocate and deallocate these attributes structures, to help prevent the myriad of memory leak problems that have arisen from their misuse. Plus, we can get rid of a lot of unnecessary code if we do all the allocations in one place. His idea is to store the necessary size of the attributes structure in the VTABLE structure, and then use that information in the GC to automatically allocate and deallocate these structres. We would allocate the storage directly before the init VTABLE was called, and deallocate the storage directly after the destroy VTABLE was called. Plus, we can flag certain PMC types that use this mechanism, so the transition can be gradual and pain-free. Quite a good plan!

Now I read over that ticket and thought to myself "Self, this would be the perfect opportunity to remove malloc/free from the picture", and so it is! By having all the allocations and deallocations happen in one place, it's painfully easy to abstract that away behind a new GC interface and implement our own fixed-size structure allocator. Last night, this is exactly what I did and Parrot now has an (experimental) fixed-size allocator that's designed to replace many instances of malloc/free not only in the PMC management routines, but also in many other places throughout Parrot. The only requirements that my allocator has that malloc/free doesn't is that you must specify the object size when you free it, and you can't currently resize the buffer once you've allocated it (at least not easily). The new fixed-size allocator isn't currently used anywhere in Parrot, but it's available for testing and should be able to be used in many places on an experimental basis.

The new fixed-size allocator is very simple, and is based in part on our existing PObj allocation routines. We maintain an array of pointers to pools. Each pool contains a linked list of arenas. Each arena contains a large storage space devoted to holding objects of a fixed size. All items in the pool that are free are organized into a linked list called the "free list". When we need a new object of that size, we either take the item on the top of the free list, or else we allocate a new arena, add all the new items to the free list, and then take the item on the top of the free list. On deallocation, we take the deallocated item and simply push it back onto the top of the list.

Using the new allocator is very easy: To allocate a new attributes structure for a PMC, use the new Parrot_gc_allocate_pmc_attributes function. To free it again, use Parrot_gc_free_pmc_attributes instead. Likewise, to allocate a new arbitrary data structure, use Parrot_gc_allocate_fixed_size_storage and Parrot_gc_free_fixed_size_storage. If you're interested to see how Parrot's memory allocation works in general, these functions and the supporting functions that they call are very instructive.

Parrot's interpreter structure maintains an array of pointers to the fixed-sized pools, and the array is resized to support larger allocations. This system works well for small-sized allocations, but does not work well for large allocations. There are some things I think I can do to improve it's efficiency for larger-sized allocations as well, but in general I think it's good right now to limit this mechanism to small structures (as opposed to large buffers).

The GC doesn't do any kind of management for these items, it simply allocates and deallocates them in a very simple manner. It does not compact them, and does not free unused arenas back to the system like it probably should. Of course the primary GC core doesn't do this either with it's PObj storage spaces, so I don't feel so bad. When we finally get Parrot fixed up to support a compacting garbage collector for the PObj stuff, we should also be able to make the fixed-size storage use the same mechanism.

So we've got a new tool to play with in Parrot's memory management toolbox, and I hope that we like it and decide to keep it around for a while. There are plenty of ways to improve this new allocator, make it more efficient, and tune it to behave properly for Parrot's needs. However, I think we're going to see immediate improvements over malloc/free, and I'll start posting benchmark numbers as I get them.

Saturday, August 1, 2009

Parrot: 1.4 and 1.5

The 1.4 release of Parrot was a big deal because it was a major deprecation point. Items for which a deprecation notice had been added prior to 1.4 can now be removed. Some such "features" have been deprecated for months, and were sorely in need of being violently removed. Plus, unwanted features that exist in Parrot now cannot be removed without a proper deprecation notice until after 2.0 in January, which is quite a long way to wait to remove or improve some of the remaining ugly warts that are littered here and there.

In addition to the things that need to be removed, a few important things need to be added as well. The pcc_rewiring branch that Allison started back in days of yore is supposed to be a priority in the coming weeks. About 50% of the tickets assigned to me in trac are blocking on this work, and I've blogged about some projects that are dependent on it ad nausea. I sincerely hope that pcc_rewiring makes it into trunk before 1.5, for a large variety of reasons.

Another branch that I've been working on with Infinoid, io_cleanups, is running behind schedule and really needs to make it in before 1.5 or else it will just keep getting older, more stale, and more out of sync with the rapid development in trunk. chromatic and cotto are working on a cool new project to make runloops pluggable, and I hope that makes it into 1.5 as well (although I don't know the status of it right now). That would be a very cool addition to Parrot. Bacek has been continuing his work on keys and hashes, and has another branch to cleanup OrderedHash that I hope lands before 1.5 also.

I personally had hoped to start on the AIO project before 1.5, but I'm blocking that on the io_cleanups branch. There are two good reasons for this: First, I want to make sure we have all the necessary primitives in place (files, sockets, pipes) so that when we add AIO it can properly handle everything and will be generic enough to be expanded to new types in the future. Second, I don't want to be opening two branches of sweeping changes in the same subsystem. It's a recipe for disaster and the time we spent cleaning up the fallout could be used to just get io_cleanups working first.

A while back Coke started a branch to convert Contexts into a properly-encapsulated PMC type, and I jumped right in to help with the conversion. I didn't suspect the branch would succeed, it's a huge task and we had a lot of things we needed to learn about it. Luckily, I think we have learned a lot from this context_pmc branch, and we can now make a more intelligent attempt at the conversion. Here is a basic work list for how we can make this work:

Begin properly encapsulating Contexts behind a standard (if temporary) internal interface.
Figure out the kinds of ways Contexts are being used, and separate them out into different interfaces and possibly different storage structures
Begin converting Contexts into PMCs, only having to change things behind the abstraction barrier we've set up in #1 and #2.

I have more to write about the Contexts conversion, so I will save the rest for another post. However, I think it's reasonable that we could start on task #1 and get a few cleanups merged into trunks before 1.5. It's a big job and it makes sense to break it up into small pieces.

The PIR book finally hit the press this week, and I've already gotten three copies of it sitting here in my apartment. It's a very small little book, more of a "pocket guide" then anything, but it's still a very cool start and I am looking forward to expanding it and improving it in the coming months. I think Allison mentioned that she wanted to have a larger book out for the 2.0 release, and I'm antsy to get working on that.

Speaking of 1.5, I'm going to be playing the role of release manager for it. rgrjr was schedule to do the release, but he has become busy in real life, so I am going to fill in. It's really a testament to the Parrot project management that so many people are capable of doing a release, because having one person become busy doesn't harm us or slow us down at all. You don't realize how important the Bus Number is until you're working on a project with a very high one.

The next two weeks are going to be busy in preparation for 1.5, and I think it's going to turn out to be an awesome release with lots of changes and improvements.

Blog Closed

Pages