Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Saturday, January 30, 2010

GSoC Idea: Implement a new GC

Blah Blah Blah Parrot needs a new GC. Blah Blah the current GC is lousy Blah Blah.

I've said it all before. At length. Ad nausea. Hell, I even attempted this same GSoC project myself two years ago.

The garbage collector is one of those systems that absolutely effects almost every aspect of Parrot's operation and performance. A bad GC means a slow, obnoxious VM with pauses and huge memory consumption. Of course, there is no one single "best" GC algorithm to use that provides optimal behavior to all classes of programs. This is why Parrot was designed to have a pluggable GC system where the most pertinent GC from among multiple options can be selected.

At least, that's the theory.

Parrot really only has one GC right now and it's a very simplistic Mark and Sweep collector with some less-than-desirable properties. bacek has been making good progress on porting the Boehm GC to Parrot, but that has it's drawbacks as well (though drawbacks are being mitigated). What we need is more people working on it, more people trying new things, and more eyes on the code.

So what kinds of GC could Parrot use? The projects page on the wiki describes three types that Parrot is specifically interested in:
  1. Generational
  2. Incremental
  3. Concurrent
These adjectives are not orthogonal, you could easily have an incremental-concurrent collector, or a generational-incremental collector, or even a generational-incremental-concurrect collector. But, these three things tend to have nice properties.

Generational collectors tend to have nice throughput because we subdivide the memory space and only take the time to mark and sweep a subset. Incremental collectors break the mark and sweep phases up into smaller increments which in turn decrease pause times. Concurrent collectors utilize multiprocessor systems to operate the GC in parallel with the program code and therefore experience no pauses and high throughput (but lousy performance on single-processor systems, or multiprocessor systems with many other running processes).

The aspiring GSoC student working on GC will benefit from two years of hard cleanup and refactoring work from myself and other contributors. Plus, there is strong community support to get a new GC up and running as quickly as possible to replace our current one. It's a project that will be tough no matter what, but where there are huge gains to be made for the Parrot community and a large amount of available support.

As I mentioned above, the GC system is supposed to be pluggable. So in reality you don't need to implement a great GC that will solve all our problems, you just need to implement any GC to prove that the system is, indeed, pluggable. If the GC you implement has some nice properties that's just a nice bonus.

Friday, January 29, 2010

GSOC Idea: Strings and NFG

Unicode is a messy thing. You wouldn't think that something so trivial as representing all the text in all the languages in all the world, keeping track of the relationships between various related symbols and characters, and creating a clear delineation between languages and dialects would be such a hassle, but apparently it is. Pause for a second while I look for a decent way to signal my sarcasm.

Unicode text doesn't have a single format. In fact, the same string of text can be represented in a number of different normalization forms. These normalization forms can optimize for byte-width by combining character codes with modifiers like accents into a single symbol, or breaking these complex characters into a sequence of a character and modifiers. Current Unicode normalization forms are called NFC, NFKC, NFD, and NFKD.

The Parrot design documents speak of a new, special normalization form that can be used internally to Parrot to implement some of our internal string algorithms. We call this new normalization form "NFG" because it normalizes over individual graphemes. In NFG, each grapheme (which is a fancy word for a single symbol) corresponds to a single sequence of characters and composing modifers. In other normalization forms these sequences don't need to be unique.

NFG has some interesting properties that make it useful internally to Parrot as an intermediate form for string operations. Being able to convert strings from multiple encodings and charsets into a single unique character sequence could be a big help in a lot of ways. At the moment when two strings are combined together, Parrot tries to convert directly from the more restrictive encoding/charset to the less restrictive one. For N different encoding/charset combinations, we have potentially N2 such transformations. This is non-ideal.

With NFG we only need 2N transformations: One to convert each encoding/charset combo to and from NFG. Strings are converted to NFG, composed together, and then converted out into whatever format they are needed in. It's really a nice system on paper and I have high hopes that it would be a big benefit to Parrot in terms of scalability and performance.

Parrot needs two things to satisfy the letter of PDD28: A comprehensive string subsystem cleanup and refactor, and implementation of the new NFG normalization form. The later is probably more suitable for a GSoC project, while the former would be a good job for a prospective student to start looking at now to become familiarized with the system.

Thursday, January 28, 2010

GSoC Idea: Asynchronous IO System

I talked to Coke briefly yesterday evening about GSoC. I hadn't heard specifically that Parrot would be applying for GSoC again this year and even though I was hopeful I didn't want to presume. He didn't have a firm answer about whether Parrot was planning to apply as a mentoring orgnization, but he did say "I think we'd be crazy not to". That's good enough for me.

There are a few topics that I've covered more than others on this blog. One of them has been asynchronous IO. I've covered that topic at length, and was hoping to have the time to implement the system myself. Regrettably I don't think I will have time to do it before summer, which means it's ripe for prospective GSoC students to gander at.

Parrot has a basic, synchronous-blocking IO system right now. This is all well and good but let's all face facts: IO tends to be among the slowest operations that a program performs. IO represents a huge bottleneck, especially for certain types of IO-intensive programs. To alleviate this, Parrot's design documents describe an asynchronous system where IO operations are performed in the background while the program itself can continue performing non-IO operations at the same time.

I'm purposefully conflating ideas of "asynchronous" and "non-blocking" ideas, though strictly speaking there are differences between the two. What Parrot really needs is an "asynchronous non-blocking" IO system, which I will just call "AIO" for short.

In an AIO system, IO requests are dispatched to the operating system and program execution immediately resumes. The dispatch operation returns a request object. There are two ways to keep track of the IO request, if you need to keep track at all: First is to poll the request object to see if it's completed (or had an error). The second is to provide a callback with the request and the OS will execute the callback when the request has been satisfied (again, with error information if the request was not successful).

A current Parrot print operation looks like this:

print "Hello World\n"

An AIO implemention would add new forms like this:

request = print "Hello World\n", callback

This presents a lot of flexibility, and could potentially improve performance significantly for IO-intensive programs.

The IO system could use a few miscellaneous cleanups but is generally not in bad condition. A good implementation of AIO will be well-integrated with Parrot's event scheduler, and possibly with with the threading implementation as well. The scheduler is in good condition but the threading system is not. The aspiring AIO developer may find herself making fixes to any of these systems along the way. If AIO sounds like the job for you, you may want to start reading through the source code now so there are no surprises over the summer.

Wednesday, January 27, 2010

GSoC Summer 2010 Projects

I saw an email today from Leslie Hawthorn, the program director for Google's GSoC program. I have not heard official confirmation from anybody, but I can't imagine that Parrot won't apply to be a mentoring organization this year. We've had such good success in the past with it, and so many of the students (including myself!) have stayed on to become long-term productive community members.

Mentoring organizations need to apply by the beginning of the March. Students can begin submitting their applications by the end of March. Before that, assuming Parrot gets accepted as a mentoring organization, I think we should have a few ideas for projects prepared. Of course a student is free to come up with any proposal they want, but I think it's good to have some ideas floating around to inspire people beforehand. Parrot has a wiki page devoted to project ideas like this, though that page is a little threadbare right now. Any ideas I get will be cross-post to that page as well.

I'm going to try to post some good GSoC project ideas on this blog in the coming weeks. If you have a good idea for a project, let me know and I'll write about it. If you're a student and want more information about a particular idea, get in touch and I will do the best I can to steer you in the right direction. Let's hope that this summer turns out as good as previous ones have!

Monday, January 25, 2010

The Problem with PIR

I've not liked PIR for a long time now, but I've never quite been able to put my finger on the reason why until now. It's a reasonably nice language to use for low-level interaction with Parrot and it's certainly more friendly than most other assembly-level languages that I've used in the past. So what's my problem with PIR?

Let's spin the question around and first ask this: What is the purpose of PIR?

When you talk about an assembly language, you tend to talk about a human-readable language that has a 1:1 relationship with the underlying machine code. On an x86/Windows computer for example I can write MASM code, compile that into machine code and then disassemble that back into MASM code that would be nearly identical to the original input. I can use a simple table lookup to convert between assembly language mnemonics and binary machine code words. This allows, among other things, live disassembly of code in a debugger and, within reason, in-place code modification. OllyDbg is a great program in this regard and was one of my favorites back when I was doing more work in this area.

PIR is not an assembly language. It's close, but if you read my paragraph above and take that as a strict definition of what an "assembly language" is, PIR is not that. PIR does not have a 1:1 relationship with the underlying bytecode, and it's not as simple as a table lookup to convert between the two forms. It's not significantly more difficult, but it is not strictly so easy either.

Another benefit to assembly language is that they tend to be easy to parse. Ridiculously easy, in fact. Almost all assembly languages use this format for their instructions:

<label>: <mneumonic> <arg>, <arg>, <arg> ...

If all your statements take this form (where the label is typically optional and there can be a different number of args, depending on the particular mneumonic), parsing is painfully simple. And fast. Here's pseudocode for a basic assembler using statements of this form:

Statement * stmt;
while(! file.end_of_file()) {
char * line = file.readline();
byte[INSTR_LENGTH] machinecode;
stmt = parse(line);
machinecode[0] = mneumonic_to_machine_instr(stmt.mneumonic)
for (int i = 0; i < stmt.args.length; i++)
machinecode[i+1] = convert_to_machine_addr(stmt.args[i]);

This code example calls several simple, single-purpose helper functions which I won't discuss in detail here but I could explain if any of the readers were interested. The point I am trying to make is that we could make a reasonably fast, stable assembler for a PASM-like assembly language if we optimized for parse-ability. The total parser could be within 300 lines of code and be completely readable and straight-forward.

This assembler, of course, wouldn't provide hooks for things like optimizations but that's the point exactly. What comes out of this assembler would have an exact 1:1 relationship with what went into it. We could make a disassembler that had almost exactly the same program structure too, and do complete back-and-forth translations of the code losing nothing but variable and label names, comments and whitespace.

And herein lies an interesting dichotomy: assembly language is the lowest-level interface to the machine, but is almost completely disregarded in favor of coding in C or C++. Most programmers don't want to program assembly. Most programmers shouldn't. I can entertain arguments about that point in the comments or even address them in a different post. Assembly language can be a huge waste of time, a huge headache, and your bug rate will tend to be higher than if you used a higher-level language. If the computer that you are using right now has software which represents over a billion lines of source code total, I would estimate that less than one ten-thousandth of one percent of that code was written directly in assembly language. Your bootloader is probably entirely written in assembly language, and small snippets in your kernel and device drivers may have been as well, but that's it. Almost nobody else uses assembly for anything because it doesn't make sense to do it. For almost the same capabilities and almost none of the same headache (or, different but less severe headache) people use C or C++ instead for low-level applications.

I've digressed, but I think some readers are going to understand my point. PIR isn't really an assembly language; it's had too many additions and syntax "improvements" to aid in readability and usability for the developer and has lost the intimate relationship with the underlying bytecode representaton because of it. At the same time, it's not really not an assembly language either: It's not as high-level, comparatively, as C is on the hardware machine and doesn't really provide the same kinds of structures or paradigms as a normal structured programming language would do. It's nice for the programmer, but not nice enough. Plus, programmers don't want to be using an assembly language anyway. Why pretty up an assembly-level language for developers in the first place if developers don't want to use assembly-level languages anyway? Why not just give them a low-level structured programming language that presents structures and idioms that they will be more familiar with? Sure, PIR offers a large series of complex and limited macros to simulate high-level control stuctures (which, by the way, are error-prone on many levels and difficult to parse), but why not give developers a real language with those structures properly integrated into the language from the beginning?

So what's the alternative?

NQP seems like a decent language to jump in and serve as the "system language" of choice for most developers on Parrot. Because of it's use in PCT, NQP does enjoy a certain amount of "market share" among developers and that kind of installed base is hard to ignore. Also, it's well-documented. I'm a little bit mistrustful of NQP because it is tied very closely to the Perl 6 language and doesn't necessarily represent the needs of Parrot developers in an agnostic way. I don't have any particular complaints, just a general mistrust of relying on software that has a different motivation than what I need. Also, NQP isn't even hosted in the Parrot repo, though the newest versions of it are copied to the repo for releases. A good systems language, in my opinion, has no motivation besides effectively bridging the gap between the raw power of the machine and the friendly abstraction needed by the programmers. I have high hopes for Austin Hasting's Close language, though that project appears to be on hold for now while he works on other foundational things. We want something simple but structured, with access to the intimate details of the Parrot VM but with a layer of abstraction to help keep programmers from getting lost in the arcana and minutia.

What I would like to see on the road to 3.0 are these things:
  1. PIR should really be deprecated as the primary interface to Parrot. To program on Parrot you should either use PASM (typically as the raw text-based output of automated code generators), NQP (or another similar system language), or one of the HLLs. PIR does not satisfy a useful niche, it has been thrust upon people as the way to program on Parrot when it doesn't have any particular merits in this regard.
  2. PIR can continue to exist, but as a separate compiler and not a part of Parrot directly. PIRC should be setup as a stand-alone compiler tool that converts PIR->PBC. The Parrot executable should have the ability to load this as an HLL compiler, but Parrot should natively act on PASM and PBC only.
  3. PASM should be fixed to accurately represent the Parrot bytecode format in a 1:1 way. It's not far off from this, but a few fixes are needed. PASM should also not have things like macros or any kind of "syntax sugar", and should have limited assembler directives.
  4. PCT should be updated to directly output PBC or, barring that, to output PASM instead of PIR. Of course, once we have PASM output, it's a matter of a simple table lookup to convert that into PBC. If PASM is the primary human-readable interface to Parrot, and if we have simple, fast routines for converting PASM to PBC on the fly, more programs and utilities will output PBC directly. This, in turn, improves performance for everybody.
  5. Extend NQP or whatever systems language we want to push as the default to have access to all or almost all capabilities present in PASM. Possibly give the option to inline small PASM sequences easily.
There are lots of benefits to doing this, I think: Decreasing developer confusion, increasing performance, and gaining momentum to build languages that people actually want to use and have fitness for particular purposes, etc. I think it's a really good idea and I would like to hear feedback about it. I say we let the assembly languages (PASM) be assembly languages which are intended to be close to the machine, and that we give developers structured programming languages by default that people will actually be comfortable programming in. It seems win-win to me.

Sunday, January 24, 2010

Parrot: the 1.x months and beyond

Parrot 2.0 is out and the era of 1.x releases is behind us. 2.0 was envisaged as being "production ready", though I think we can all take a look at the final result and agree that this isn't the case. It was a motivation, no doubt, but the released software fell pretty far short of that goal.

But then again, maybe what I think of as being "production ready" is different from what other people think.

In contrast, Parrot 1.0 was described as being a "stable API for language developers", and in hindsight I think we can all also agree that this wasn't the reality of that release either. I'm not trying to be disparaging here, but I do want to take a critical look at these two releases and try to see what is going on with Parrot.

1.0 was stable in the sense that we had the deprecation policy and could not consciously pull the rug out from under our compiler devs without adequate warning. That said, we were making pretty significant changes to the API (defined loosely as the ops, PMCs, command-line options, C embedding API and C extending API) every opportunity we got. I have also said before that the API that we had stabilized at that time was hardly golden. There were plenty of warts, and the problem with that deprecation policy is that those warts are around to stay. We can't just fix a problem or cleanup a bad interface, we are stuck with it until the next deprecation point. But, that's only if we put in a deprecation notice sufficiently early. In some cases HLL developers, most notably Patrick Michaud from the Rakudo group, were urging us to make changes faster than the deprecation policy allowed because these warts were too taxing to work around for months on end.

In hindsight, I think we can better label 1.0 not as a stable API, but instead as a critical maturation point for the community. 1.0 was a coming of age. It was a time when the community got it's act together, put some policy together, outlined our priorities, and started making promises to people. Say whatever negative things you want about our deprecation policy (everybody knows I do), but at least we have a policy.

The magic 1.0 (and now "2.0") numbers are a little bit misleading because not a lot of people understand the way we do numbering. People think 1.0 means it is "complete", when any of the Parrot devs will tell you that this is not the case. We number and release according to the calendar, not according to the state of the repo. Similarly, we released 2.0 because the calendar said to, not because we had implemented any specific feature or reached any specific milestone. The tagline "production ready" was only a vague motivation and not the final result.

That said, what was the result of 2.0?

1.0 as we mentioned was a critical maturation point for the community. 2.0 then, I think, provides that stable API that we would have liked to have had 9 months ago. This is not to say that all our warts have washed away, but the API is in a much better condition now than it was when 1.0 came out the gates. Since the 1.0 release we have done a lot of cleanup, refactoring, and improving of various systems. It's worthwhile to mention that we haven't added a whole ton of new features during that time. It really has been 9 months of non-stop cleanup and because of that effort we are in a position that I would be happy to call "stable".

Now where are we headed? What will 3.0 look like in a year?

Coming up to 2.0 we've done much cleanup. Systems are improved, refactored, encapsulated. We've improved naming conventions and code style. All the groundwork has been laid to start adding some of the blockbuster new features that we'll need to really get Parrot accepted into the world.

When I think "production ready", I think of a few key concepts: Robustness, Scalability, Performance. Business simply aren't going to employ software that is buggy, cannot adapt to different sized tasks, and makes inefficient use of expensive hardware. We can call a piece of software "production ready" all day long and pat ourselves on the back over how great Parrot seems to us, but if we haven't satisfied these three principals nobody is going to use it. So, how do we get these properties? What do we need to do between now and 3.0 to truely make Parrot business-ready?

In terms of robustness I think we really need to focus on two things: Cleanup and documentation of our external API, and improving the comprehensiveness of our test suite.

For scalability, we absolutely need to rein in our memory usage and I think we also need to significantly improve our threading implementation.

For performance I could rattle off a laundry list of things we need to improve but I will limit my list to only a few:
  1. Improved Garbage Collector
  2. Add [pluggable] optimizations to PCT
  3. Enable PCT to output bytecode directly
These three things will improve performance for most applications, while improvements to other individual subsystems (such as MMD or IO) will improve performance for smaller classes of applications.

If we can get address the three priorities of Robustness, Scalability, and Performance, I think that 3.0 can truely be the production-ready release that we've been saying 2.0 was going to be. Because of all the wonderful cleanup work we've been doing in the past few months, I think the stage is set to really get to work on these things and make that goal happen.

Saturday, January 23, 2010

Neko VM

There has been some traffic on the Parrot-developers mailing list recently involving the Neko VM. In terms of capability and scope, it does appear that the Neko VM is very close in scope to Parrot: a virtual machine designed to host dynamic languages. There are plenty of similarities between the two projects but a few differences too which I will try to talk about below.

This discussion really got started when people saw some incorrect ideas expressed about Parrot in the Neko FAQ:

Targeting Parrot requires you to learn another language which is more complex that Neko itself, with different possibilties at different levels (low level PASM, medium level PIR, high level NQP).

It is also difficult to differenciate beetween the language and the standard library because of the numerous cores apis (PMC) whereas NekoVM has a single builtins core api which a single highlevel language with minimal syntax and core types.

Parrot is written in C while Neko compiler is written... in Neko. The language is fully bootstrapped right now. Also, Neko is lightweight and the Virtual Machine is only 68 KB on Linux and 40 KB on Windows, while still offering a very good speed.

It's obvious from this and other mentions of Parrot I have seen around TEH INTERWEBZ that the marketing wing of Parrot isn't doing a lot to dispel these kinds of common misconceptions. Let me do what I can here.

The Parrot VM runs natively on Parrot bytecode, with the most common human-readable form of that being PIR. I see people talk about PASM all the time and I think this is a huge mistake. Nobody uses PASM. Basically nobody should use PASM either. It's mostly useless, ugly, and limited. PASM exists only as a direct human-readable translation of raw bytecode in lieu of any good utility to convert bytecode to (unnuanced) PIR or to do PIR-level debugging. Seriously, forget PASM exists.

PIR is the more friendly, more familiar programming language that people should use to interact in a basic, low-level way with Parrot. PIR is to Parrot what HLA is to the x86 platform: it's basically an assembly language but with some improved syntax, macros, and other nice features. Most coders will be reasonably familiar with the syntax of PIR, though it will feel primitive and clunky.

The FAQ above says that Parrot is written in C, while Neko is written in Neko. This is a pretty big distortion of the truth, and is certainly not comparing apples to apples. The Neko compiler frontend is written in a boot-strapped version of Neko itself. This much is true. However, the Neko VM that executes the compiled bytecode is itself written in C. Likewise, Parrot's core is written in C and the PIR compiler is written in C (with flex/bison magic), but our premier compiler utility, PCT, is written in Parrot. There is a difference in scope here, but if you are writing software of any significance on Parrot your code is more likely to be compiled by PCT than by Parrot's native PIR compiler.

The second paragraph of the FAQ is problematic too. It talks about the differentiation between the language and the runtime, which doesn't make sense in the context of Parrot. Parrot VM is, at heart, only a runtime. The PIR language exists only as a way for people to interact with that runtime. None of the Parrot developers would call the PIR language a "feature" of the VM in the sense that the language has any value outside of the Parrot project. The Neko language is in a different league from PIR because it is much higher-level than PIR is. In a direct side-by-side comparison it looks to be more similar in terms of capability and scope to NQP. PIR is a human-friendly low-level interface to Parrot, and once some of our tools mature sufficiently it is reasonable to expect that PIR may disappear completely. That occurance is still a long time away but there is no reason to suspect that it cannot or will not happen.

As an aside, since the Neko language is designed to be easy to parse using an LL(1) compiler (which PCT's recursive descent parser is), it would probably be very easy to create a Neko compiler frontend for Parrot. The benefits in interoperability might be quite nice, because any language that has a compiler into Neko could then be made to run directly on Parrot. But, that's an aside that I won't talk about at length here.

NQP is explicitly designed as a language without a runtime. Parrot provides a number of default object types (PMCs) for basic use, but they can almost all be easily overridden with user-defined types in C or PIR as necessary. In a sense, the PMCs themselves represent the low-level data structures that user-defined objects share. But they're more dynamic and general than that.

I don't see a huge difference in terms of the design goals between PMCs and Neko's underlying object model. Parrot's version appears to be a little more heavy-weight and a little bit more powerful because of that, but I would need to look much more closely at the Neko model before I make any firm conclusions. Notice that "heavy-weight" can very much be a bad thing to some people, as much as "powerful" can be a good thing to others.

The final bit of the last paragraph in the FAQ is where some of the real differences start to emerge. I don't know the complete in-memory size of the Parrot binary is but I have to suspect it is bigger than 68K on Linux and 40K on Windows. If a small footprint is your driving criterion, maybe Neko is a better choice for your language than Parrot is.

Looking at the documentation Parrot really seems to be a more ambitious project with a larger scope than Neko. I'm not going to get on my soapbox and say that bigger is always better because it isn't. Parrot, by design, offers more features and capabilities than any one language would ever need. Sometimes developers need a smaller platform with a smaller footprint. Sometimes people need more power and capability. I think you would be hard pressed to try and run a standards-compliant Perl 6 implementation, along with decent Tcl and M implementations on Neko VM. At least, I can't imagine that you could have all these things without a significantly larger development effort than those compilers currently require on Parrot.

I'm not sure what all features Neko VM supports or intends to support, but I can tell you that Parrot has several great features and all intentions of adding some other major features in the future: GC, Exceptions, Asynchronous I/O, JIT, multithreading, and aggressive optimizations. We're developing a large ecosystem of langauges, libraries, and tools for use with Parrot. We also have a world-class compiler generator tool in PCT, which I think any other project would be hard pressed to rival.

I've got the Neko sources now and am going to build it and play with it a little bit. I'll report in when I have done more playing.

Friday, January 22, 2010

Parrot 2.0: Personal Retrospective

A while ago I put out a list of tasks that I wanted to see in Parrot 2.0. This wasn't necessarily a list of tasks I wanted to perform personally, more like a wishlist of tasks that I wanted to see in 2.0 and would have been willing and capable of doing myself if necessary. In other words, it was a list of projects I was personally interested in, even if I didn't have the time to complete them all myself.

Now that 2.0 is fresh out of the factory I think it's a good time to go back over this list and review the status of all the things I wanted to have been included.

GC Internals: At the time I wrote the list this task was already done. What I wanted to do was clean up the GC internals and properly encapsulate things so we could add new cores more easily. I had done that work back in May, and since then other work has happened to cleanup and improve the GC internals for clarity and encapsulation. In large part, I consider this task complete. Final Result: Done.

Incremental GC: This didn't happen, though in the buildup for the 2.0 release a lot of excitement and motivation has been generated to improve the GC. I suspect we will see a Generational GC first, but an incremental GC cannot be far behind (especially if interest in the RTEMS port continues to grow). Final Result: Not Done.

Generational GC: See my comments above. Interest is building and I think the community has at least a tenative agreement that the next GC core we work on will need to be a generational one. Downside to this is the likely need to add write barriers throughout the codebase which can be costly in terms of performance and code bloat. However, we should see performance improvements from it. Final Result: Not Done.

Boehm GC: Bacek put this together recently, but unfortunately the results were not as good as we expected. I did a brief analysis about why performance might have been a problem with the Boehm collector in Parrot, but without doing more benchmarking myself I really don't know what the problem is. Maybe 2010 will see some improvements to this core and better performance as a result. Final Result: Done.

Context PMCs: This was another project that Bacek put together, and the result was quite spectacular. It only lasted for a while, however, before the Context PMC got merged with the CallSignature PMC into the new CallContext. Some tests broke and performance isn't as good yet as I think it could be, but the hardest part is done now. Final Result: Done.

RetContinuation Unification: I had anticipated that RetContinuation would be merged into Context first, and then that PMC would have been merged with CallSignature. Things happened in the opposite order with CallSignature and Context being merged into CallContext instead. As of the 2.0 release RetContinuation is still a separate entity, though there are rumblings about changing that and cashing in on the performance win. Final Result: Not Done.

Subclassable Exception Handlers: I'm not really sure where this task ended up, though I assume the work on the Context PMC would have enabled it. At the very least (and I say this without doing any testing or exploring the codebase) the hardest part of the work is now done. I'll mark it off as a success. Final Result: Done.

PCC Refactoring: This was done and ended up being a herculean task. Also, the new (and cleaner!) code brought along a performance penalty that we've been scraping and clawing to mitigate. One thing that cannot be argued though is that the mechanism has been significantly improved and various optimizations and feature additions will be much easier now. Final Result: Done.

CallSignature Unification: My original intention for this task was that the Context+RetContinuation PMC would be merged into CallSignature. Things happened in the wrong order and CallSignature got merged into Context first. Final Result: Done.

Subclassable Everything: This task, which would allow all PMC types to be completely subclassable from PIR, was daunting and perhaps a little bit too ambitious. First stage required that most fields in existing PMCs were converted to one of the "big 4" types (INTVAL, FLOATVAL, STRING, PMC). Second stage required that we come up with a sane way of working with opaque pointers and data items from PIR, which we don't have yet. Final Result: Not Done.

:invocant: To my knowledge this one hasn't been added yet. With the new PCC system in place it should be trivial to add, but i haven't seen it done yet. Final Result: Not Done.

Morphable Objects: I can't even really remember what I wanted out of this task, but in hindsight the idea of morphing objects between types doesn't sound like something I would really want to stress. Final Result: Not Done.

Override VTABLE Invoke: I believe this one was added shortly after the PCC refactors landed, though I haven't played with the new implementation myself. I'll mark it as being completed for now. Final Result: Done.

IO Cleanups: The IO system is decent but does need some work still. I was hoping we would see a proper pipes implementation, and then once we have all the major fundamentals I was hoping we could get some of the various codepaths unified and simplified. This one, unfortunately, did not happen the way I wanted. Final Result: Not Done.

PIR IO Objects: The goal was to properly subclass IO object types from PIR. This would require both the aforementioned "IO Cleanups" with the pipes implementation, and maybe some variation of "Subclassable Everything" so we could manipulate the low-level buffers from PIR. Sadly, this did not get completed. Final Result: Not Done.

Asynch IO: I got derailed on this project, which I put quite a lot of planning effort into, when the IO Cleanups project got stalled. I still have all the best intentions of getting this feature added, but sadly it did not make it into 2.0. Final Result: Not Done.

IO Unification: This was dependent on the Asynch IO task. The goal was that we could merge the code paths between AIO and regular IO, implementing some blocking primitives in terms of their non-blocking counterparts. Without AIO, this was impossible. Final Result: Not Done.

Concurrency Cleanups: The concurrency system is a mess and I was hoping to change that. Plus, AIO needed improvements in this area to allow us to properly integrate things the way they should be. I never took the time to tackle this one and since it was deemed lower priority I don't think anybody else did either. Final Result: Not Done.

Update PDDs: I haven't spent hardly any time on documentation recently, and the docs didn't get improved to the standard that I was envisioning. Other people did work on these but I am not sure if it's the kind of comprehensive overview that I was hoping for. Final Result: Not Done.

Parrot Developer Guide: The first published book was a PIR users guide. I was hoping to at least start second book, a more in-depth guide for internals developers. This never got off the ground. Final Result: Not Done.

From among 20 tasks that I was personally interested in seeing included in the 2.0 release, 7 of them were completed. That's actually not a bad number considering the work we did accomplish as a development team, the pathetic level that I have been contributing at for the previous few months, and the enormous scope of some of these tasks.

Sometime soon I will try to post a wishlist for the upcoming supported releases: 2.3, 2.6, 2.9, and 3.0.

Wednesday, January 20, 2010

Parrot 2.0 is Out

Behold! Parrot 2.0 has been released by chromatic. 2.0 is going to be a supported release that most other projects and end users should be able to target confidently for the next several months.

Keep in mind that Parrot uses time-based and not feature-based releases. So 2.0 doesn't strictly satisfy any pre-set group of specifications. It does represent a stable and well-tested version of the software however. The number of feature addititions or performance enhancements from 1.9.0 is actually relatively small, but we have a lot of confidence in this release because of all the testing that has been done to it.

I'll have plenty more things to say about 2.0 in the coming days.

Saturday, January 16, 2010

Parrot Test Matrix

In anticipation of Parrot's 2.0 release this thursday I've been going crazy trying to run tests on as many platforms as possible. Since I've started using VirtualBox I've been able to dramatically increase the number of platforms where I can do testing. Here is a matrix of all the various platforms where I've done tests, and what the results of each are.

x64 Ubuntu 9.04
Ubuntu 9.04 is my primary operating system that I use on my personal laptop. Due to problems with the upgrade I will not be switching it to 9.10 any time soon. I used to have a second partition with 64-bit Windows Vista, but after my last reinstall I wiped that partition out. Any Linux variant is going to have decent support for Parrot. Many of Parrot's developers and current end-users use Linux, so we are typically able to keep it running smoothly there.

GCC 4.3.3: Nothing to see here, GCC builds always just work. This is probably because a majority of Parrot developers and testers are working on some combination of Linux/GCC. Test failures on GCC, when they appear at all, tend to be short-lived and quickly fixed. My most recent test runs have shown no errors here. Result: No Failures

perl --cc=gcc --link=gcc --ld=gcc

G++ 4.3.3: The G++ builds almost always perform as well as vanilla GCC builds do. The primary difference to worry about are the few new keywords that C++ defines but C89 does not. Variables called "new" are a great example of something that breaks the G++ build. Assuming the software builds with G++, it typically passes tests just as well as the normal GCC build does. My most recent test runs have shown no test failures here. Result: No Failures

perl --cc=g++ --cxx=g++ --link=g++ --ld=g++

Intel C++ 11.1: Intel offers it's C++ compiler, ICC, free for non-commercial use on Linux. It's a bit of a pain to install and it isn't open-source, but it's a good compiler nonetheless. By default ICC outputs many helpful warning messages about troublesome code that GCC ignores. Also, ICC has some extremely powerful and aggressive optimizations that can make Parrot noticeably faster than a GCC build in some benchmarks. Some test failures do show up in the ICC build, but all of them that I have seen involve incorrect handling of "negative zero". I've posted a question to the list about these errors and didn't see any definitive replies that would have helped me fix these tests. So long as the software you are writing does not depend on proper handling of signed zero, this shouldn't be an issue. Result: 5 Test Failures

perl --cc=icc --link=icc --ld=icc

Clang 1.1: Clang isn't packaged on Ubuntu, so it isn't as easy to install as GCC or G++. ICC comes as a binary with an automated installer. Clang, on the other hand needs to be built from source. Luckily this build, though lengthy, tends to occur without any problems or magic incantations. Parrot built with Clang performs very well and typically doesn't experience any additional failures from the GCC build. Also, like ICC, Clang outputs a number of warning and diagnostics messages by default that can be very helpful. Result: No Failures

perl --cc=clang --link=clang --ld=clang

x86 Ubuntu 9.10
I run Ubuntu 9.10 in a virtual machine because there were so many problems running it natively on my personal laptop. However, since my processor doesn't support hardware-level virtualization I cannot virtualize a 64-bit version of it. Ubuntu 9.10 uses more recent versions of GCC and other build utilities by default, although it's not the cutting edge.

GCC 4.4.1: Again, this is basically Parrot's native environment. As with the GCC build on x64, there are plenty of developers using this platform, plenty of end-users using it, and very few bugs show up here. Result: No Failures

G++ 4.4.1: Again, G++ performs very similarly to GCC and so long as the build isn't broken because of misuse of C++ keywords, there tend to be no extra test failures. Result: No Failures
x86 OpenSolaris 2009.06

Intel C++ 11.1: Similar to my notes above: Install is a pain and is not open source. However, the build is good and optimizations are top notch. ICC on x86 has the same negative zero failures that the build on x86-64 has. For most applications, these failures won't pose any problems for end users.Result: 5 Test Failures

Clang 1.1: Clang always performs similarly to GCC in my experience, and testing produces no challenges to that rule. No problems here. Result: No Failures

x86 OpenSolaris 2009.06:
Using VirtualBox has allowed me to test out some new operating systems that I would never have gambled on otherwise. OpenSolaris is an interesting platform that is very Linux-like in most regards. It has Gnome and Bash, so any Linux user will feel mostly at home with OpenSolaris. There are a few places where the OS feels a little bit more fragile to me. For example, one installation of it started having Xorg problems after I made a change in one of the network adaptors, and I haven't been able to use it since. Intel does not appear to provide a free version of ICC for OpenSolaris, though I haven't tried yet to use the Linux installer. I tried and failed several times to build Clang on OpenSolaris.

GCC 3.4.5: Part of me is amazed that OpenSolaris uses such an old version of GCC by default. If you're trying to keep your platform stable and rock-solid it can be easy to fall into the trap of never upgrading (or upgrading very slowly) the components that "just work". Also, I absolutely refuse to build GCC myself, especially on systems that already have it available (or have a package downloadable). Even though GCC 3.4.5 is a stable and dependable release, it still doesn't produce a properly-performing Parrot binary. There are several test failures, espcially involving mathematics functions asin, acos, atan, ln, log10, and log2. I haven't been able to dig too deep into any of these, but it is appearing like some of the libc functions of the same names are just returning incorrect values. Also, there are some very troubling failures in t/compilers/complete_workflow.t that I haven't looked at in depth yet. Result: 15 Test Failures

G++ 3.4.5: Same as GCC. Result: 15 Test Failures

x86 Windows XP
On a windows platform there are basically two routes to build Parrot: The first is to use ActivePerl and MSVC, the second is to use Strawberry Perl and the included MinGW compiler. Mixing and matching causes problems. This is partly because Parrot still derives too many of it's configuration settings from the Perl binary installation, and on Windows there are too many differences to worry about. Since we can't mix and match, to test multiple compilers we need to have multiple Perl installations, which can get really ugly if we want to have at least one Perl install in your PATH. On my current Windows system I have not taken the effort to install both toolchains in parallel and properly sandbox them. So, for the time being, I am only using ActivePerl/MSVC. The build with Strawberry Perl and MinGW tends to perform very similarly to the GCC build on Linux, so it isn't as interesting to me. Intel does not provide a free compiler for Windows platforms, even for non-commercial uses. At least, I haven't found such a download. I have not tried to build Clang on Windows.

Microsoft Visual C++ 15.00: The build here is always a little sketchy, there are some tests that are marked TODO but always seem to pass anyway. There are a few other tests that fail intermittently. When tests on this platform fail they tend to remain failing because we have so few developers working on Windows. Sometimes we may not find out about and work to fix errors on Windows until moments before a release. Also the build is a little slower, Microsoft's nmake utility doesn't appear to support (or doesn't support well) parallel builds. Despite the flakeyness of this system, at the moment there are no failures here. Result: No Failures

x86 Ubuntu 8.04
This is a slightly older platform, and I would have ignored it entirely if I hadn't heard rumor of problems here. What's so interesting about Ubuntu 8.04 is that it's a long-term support release so it's very possible that Parrot end users could be using this system for some time to come. The GCC and G++ builds on this system, despite the rumors I had heard, go off without a hitch. It's interesting that, even though this OS is more than a year old, it uses a more recent version of GCC than OpenSolaris does. I haven't bothered to try installing ICC or Clang here, yet.

GCC 4.2.4: Build works without problems and all tests pass. Result: No Failures

G++ 4.2.4:
Surprise, Surprise: no difference from the GCC build. Result: No Failures

There are a handful of other systems I have been looking into, but don't have any recent test results to share. Fedora 12 uses a more recent version of GCC for instance, but VirtualBox 3.1.2 isn't compatible with the version of Xorg that's on Fedora 12, so I haven't used Fedora since I upgraded VirtualBox. I haven't been able to get any BSD variants working in VirtualBox either, even though the documentation seems to suggest OpenBSD and maybe FreeBSD should work. I no longer have access to a 64-bit Windows environment either. Apparently it's not even legal to run any Apple OS's without Apple hardware, so I haven't attempted to use any of them.

Monday, January 11, 2010

BookDesigner Under Development

I had been putting it off recently because it was "good enough" and I had other items on my todo list with higher precidence. However recently I've spent a few hours working on my MediaWiki Book Designer extension, and I'm starting to make some real progress with it.

As a bit of history, I started working on the designer when I was an active Wikibookian. It was born from my experience helping new users to get started writing books at Wikibooks, which ended up being unnecessarily complicated. The MediaWiki software is primarily developed and targeted for use with Wikipedia, not Wikibooks after all.

A graphical book outlining tool was one of the major deliverables requested from my contract with the Wittie group, and I was already mostly finished working on the first version of it on Wikibooks. As part of my work on the Wittie project, I slimmed the designer tool down to an easy-to-use subset of features, made a few improvements to the UI, and even wrote up a proper PHP backend to it that I didn't have access to do at Wikibooks.

Not too long ago I pushed my work to Github, where it sat without much fanfare.

Late last week I got a message on github from the first confirmed independent user of my extension. Also this week begins the spring semester and the Wittie group have been asking for a few improvements to the extension so they can begin creating new books for the new semester. These things together have spurred me on to begin a new round of hacking. Specifically I'm working on UI improvements, decreasing memory consumption, i18n compliance, and a series of other small fixes and feature additions.

I've also been spurred on to setup a proper testing environment on my personal laptop. Luckily this is quite easy in Ubuntu: install the mediawiki package, enable mediawiki in the Apache config file and restart Apache, setup the wiki, checkout the BookDesigner gadget into the extensions folder, and add a single require_once() directive in the wiki PHP code. Probably took me less than 5 minutes total.

It's too bad I'm not an active Wikibookian anymore, this tool would really be making my work there much easier. When it gets a little bit more mature I think it will be a real boon for wiki-based book writers. Still plenty of work to do before we reach that point, however.

Monday, January 4, 2010

Boehm in Parrot

It's a project I've talked about before, but not one that I took the time to actually work on: Getting the Boehm-Demers-Weiser GC ("Boehm GC" for short) in Parrot. Recently, prolific Parrot hacker Bacek did just that, creating a new branch to add Boehm as a new GC core in Parrot. It hasn't been merged into trunk just yet, pending some changes to the configuration system.

On one hand it's exciting--very exciting--to have a real production-quality GC core in Parrot. On the other hand, performance is horrible with the Boehm collector. Well, maybe not "horrible", but worse than we have now. Despite the high quality of the Boehm collector and the hard work of Bacek, the new core may prove to be little more than a carnival oddity because of poor performance. The big question to ask first is "Why does Boehm perform so slowly in Parrot?", perhaps followed by "...and is there anything we can do to speed it up?". I won't really tackle the second question here, it's far too big for a single blog post.

I've given a brief overview of GC concepts in the past, but I didn't really cover all aspects of GCs. There isn't enough room in a single blog post to do it (although I am hearing very good reviews about a new book about Garbage Collection that might be able to cover things in more detail). One aspect of GCs that I did not mention in my previous post is conservativeness.

Memory gets broken into regions that we will call "spaces". Spaces can be used for all sorts of things and most programmers will already be familiar with them: stack space, heap space, program space, etc. We know that in our running program all PMCs are allocated in heap space, and are referenced from both heap space (pointers contained in PMCs to other PMCs) and in stack space (PMCs being actively used by C code. If we need to find and follow all pointers to PMCs, as we do in the GC mark phase, we can accomplish this by traversing the stack and heap spaces linearly and following every value that looks like a pointer to heap space.

The problem with a linear memory traversal is that you do it blind: you do not know whether any given value is a valid pointer or not. You can only assume that any value you encounter which--when treated as a pointer--points into heap space, you just assume that it's a pointer. This behavior creates lots of false positives, because values that are not pointers are treated like pointers and followed like pointers. We call this behavior conservative. A Conservative GC, like the Boehm collector, is completely disconnected from the operations of the program, and so must blindly traverse the heap and stack space following all pointer-esque values, whether they are actually pointers or not.

The alternative type is a precise GC, which knows where pointers are beforehand and never has false positives. A precise GC can avoid scanning large regions of memory that are guaranteed to not contain any pointers. It can also avoid scanning numbers that happen to look like pointers into heap space, because it knows which fields are numbers and which are really pointers. Precise GCs are almost always faster than their conservative counterparts, though they require significantly more integration into the rest of the program. This creates both an encapsulation problem and requires ongoing maintenance of the GC's algorithms as program data structure formats change. The GC has access to more information, but the program now has the responsibility to ensure all that information is consistently used and remains accurate.

Parrot's current GC is precise. At least, it attempts to be. Each PMC type defines a "VTABLE_mark()" routine, which can be used to mark the PMC and all the pointers that it contains. Each PMC type knows exactly where it's pointers are, so it can focus it's efforts only marking things that actually need marking. Where Parrot's GC fails to be precise is in the stack-walking code. I've complained about that before and won't really rehash it here. Suffice it to say that if Parrot got cleaned up to avoid unanchored PMCs on the stack and therefore to not need to stackwalk for a GC run, it could provide a nice performance improvement.

But, I digress. The Boehm collector is a stand-alone module that obviously has no intimate knowledge of the internal workings of Parrot. Boehm doesn't know some of the intentional simplifications that Parrot uses to keep track of it's PMCs and to keep them efficient for use in the GC. Let's look at some C code:

int * x = (int *)malloc(sizeof(int) * 32);
int * y = &(x[16]);
int * z = &(x[32]);

In this code snippet we see our aggregate object x, and a pointer y which points to a memory location inside x. When Boehm does a stackwalk here, it sees the value y, determines that it's a pointer into heap space, and then must trace backwards from there to find the start of the aggregate object x. Boehm must then compare the size of x to the distance from there to y to ensure that the pointer y is actually located inside x. Next we find the value of z, which points outside the allocated buffer of x. I don't know why you would intentionally write the code above, but that's hardly the point. Boehm sees that z points into heap space, traces back until it finds x, determines the size of x and then sees that z points outside of it. That's quite a lot of work to mark pointers, but it's necessary when you consider that normal arbitrary C code can create pointers to normal arbitrary locations inside a structure or array. Boehm needs to be very conservative in order to be usable by a wide array of software, much of which won't have the same design for memory objects or access that Parrot has.

So back to the question at hand: Why is Boehm so slow on Parrot? The answer is that the Boehm GC is a high-quality tool that needs to be general enough for adaptation to a wide array of programs. It sacrifices intimate knowledge of Parrot's internals (which it can use to focus and optimize it's behavior) in exchange for improved generality. On the other hand Parrot's current GC, while naive, does have this intimate knowledge that it can use to improve performance. If we know where all the pointers are ahead of time, we can avoid all the effort of having to search for them.