Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Wednesday, September 30, 2009

Future Flow Control and PCC Hackathon

Contexts are PMCs now. They aren't really usable from PIR yet in a real sense, but the first steps have been taken. Continuations should be more introspectable from PIR. ExceptionHandlers are in line for a major overhaul, and the PCC refactors promise to (eventually) redo the way we pass arguments to and fro. Things are changing in the world of Parrot control flow, and the user is going to be the real winner from all of it.

So the question comes up: What will Parrot's next generation control flow look like? What kinds of exotic control flow will be be able to support? What kinds should be aim to support? What kinds of cool new paradigms could we invent? We have Subroutines, Methods, Multimethods, Coroutines, Continuations, ExceptionHandlers and Closures. What can we get from combining these? What benefits can people see by being able to introspect Contexts and CallSignatures from running PIR code?

An ExceptionHandler Coroutine is one that I've been particularly interested in:

.sub 'WhenBadThingsHappen' :handles('foo')
say "One exception!"
.yield()
say "Two exceptions!"
.yield()
say "Too many exceptions!"
exit 0
.end

Syntax of course will vary, but you can see what I am planning there.

ExceptionHandler subroutines are going to need at least two possible resolutions: Either they properly handle the exception and resume normal control flow (by jumping to a preset resume Continuation or Sub), or they rethrow the exception. A rethrown exception moves to the next handler, and when there are no more handlers to catch it we exit the runloop and rethrow in the previous runloop (or exit with an unhandled exception if there are no more runloops).

In current ExceptionHandlers, we can use ".get_results" to get access to the thrown exception object only. What if ExceptionHandlers were instead curried Continuations that we could use to pass all sorts of arbitrary arguments to it?

$P0 = new ['ExceptionHandler']
push $P0, arg1
push $P0, arg2
push_eh $P0

We have a newclosure opcode which clones a sub and captures it's lexical scope. What if we had a newhandler opcode which cloned an existing sub, captured it's lexical scope, and marked it as a handler? That would allow us to use any arbitrary sub as an exception handler, treated as a lexically enclosed scope of the function where it is created?

With a little bit of work Contexts are going to be usable and introspectable from PIR code. Bacek has already done some work making data from Contexts visible to PIR code, which is a step in the right direction. What kinds of things will we be able to do with these, and what new things do we want? Manually manage registers? Iterate over registers like an array? Call another function but force it to use the same context (preinitialized registers)? Direct manipulation of control flow ("Jump up 3 call frames immediately")? Easy recursion of anonymous functions? Runtime creation and manipulation of LexPad and LexInfo? Change the current invocant object? Direct manipulation of the list of currently registered exception handlers? Redirect the call chain so we "return" to somewhere besides where we were called from?

That last idea is actually kind of fun, because you can start to think about Subs as being an intermediate step in a jump. "Go here, but do X on the way". I call a Sub, but the call gets intercepted by one of these filter functions and the parameter list is manipulated in a way that is invisible to both the caller and the callee. We can also start to think about PIR-based argument passing and processing. Need argument handling that Parrot doesn't provide natively? Write a small PIR argument preprocessor function and insert it invisibly into every function call!

This Saturday, 2 October, we're going to host a little PCC-related hackathon. A bunch of the Parrot hackers will be around, working on the PCC refactors branch, fixing PCC bugs, and hopefully implementing some long-awaited features. If you're a committer and have some time to hack on Saturday, we can use all the manpower we can get: Debugging and fixing test failures, profiling and optimizing codepaths, triaging and deleting old dead code, writing new tests to prevent regression, testing HLLs, and implementing new features. This will also be a great time for newcomers to get involved with Parrot too: Help out with a concerted development project, meet the developers, start getting your hands on the code, and start learning about one of Parrot's core mechanisms. A good time will be had by all!

Tuesday, September 29, 2009

First Project Challenge Reminder

Ten days ago I posted a challenge for my readers. Here's a quick recap of the two challenge options:
  1. Create a proof-of-concept NCI frame builder using a "real" JIT engine. You can pick anything you want (libJIT, LLVM, GNU Lighting, dynASM, nanoJIT, etc). It must be able to take a function pointer, a string representing the call signature, and proper arguments and no other unnecessary state information or metadata. The frame builder should return the compiled thunk from the JIT engine that can be used to call the target function.
  2. Create a proof-of-concept NCI function caller using pure assembly or C with inline assembly. Your assembly routine should take a signature string, a function pointer, and a list of arguments, and a pointer to a memory location to hold a return value (if any).
This morning, I've received my first submission to challenge #1: an NCI frame builder using libJIT. It's quite impressive, and I'll post more information about the solution and the submitter later, with permission.

I'm going to try to set up an online store, like spreadshirt.com or cafepress.com or something to sell Parrot T-Shirts (all profits directed to the Parrot Foundation, of course). Once I get that set up, I'm going to sweeten the pot: All good unique submissions will receive a free Parrot-branded T-Shirt. You'll have to be willing to give me your shirt size and mailing address, of course. I'll post updates as I get things set up.

Submissions are unique if they solve different problems in different ways. So #1 with libJIT is already done (assuming it passes all my tests). Interested entrants could try #1 with LLVM or GNU Lightning, or any other JIT engine besides libJIT. Entrants could also be attempting #2 on various platforms (x86, x86_64, PPC, SPARC, SPARC64, etc). If I receive two or more submissions of the same type, the better one wins. Preference will probably be given to a patch that applies against Parrot trunk to any other type of solution, but integration with Parrot was never a requirement (just a nice bonus).

I'll make the October release as the deadline. So if you are interested, you have until 20 October to get one in.

Sunday, September 27, 2009

The Architect Model

I first became familiar with the idea of a "Software Architect" after reading Frederick P. Brooks' seminal classic, The Mythical Man Month. I read that book, on recommendation from one of my professors, pretty early in my programming career and it has done quite a lot to direct what I currently am as a programmer. To anybody who hasn't read it yet, I highly recommend it.

The idea behind having a single architect in a system is to maintain integrity and unity of vision, and prevent the dreaded "design by committee" that plagues so many projects. A project, in order to succeed in a sea of thousands of other projects, must do one thing and do it well. There must be name recognition, clear association between product and purpose: If you are looking for a good robust webserver, you think Apache or one of it's relations. If you want a good C# IDE, you think about Visual Studio. If you want a solid dynamic language-agnostic virtual machine, you think Parrot (at least, I hope you do!). These associations pop up because these projects do one thing and do it well, and when you have that one problem you know exactly where to find a solution for it.

Parrot could readily be extended to do more then it currently does: it could be more targeted towards static programming languages like C, C++, Fortran, and Java. Maybe it could support both stack- and register-based operations for easier translation from other virtual machines that are stack-based. Maybe it could natively interpreter CLI and Java bytecode formats, as well as PBC. Maybe we could create a huge general-purpose repository of runtime libraries for all of our users to tap into. Parrot could be made to do a lot of things that it doesn't currently do and isn't intended to do. But adding these things would water down the vision and remove the kinds of opportunities that a focused specialization bring. Do one thing and do it well. Very well. There's no benefit to doing a bunch of things poorly.

Parrot has a chief architect, Allison Randal. In the general sense, her job is to maintain that unified vision, to see things from a much higher level then the code monkeys like me see from our place in the trenches. This is not to say that myself or the many other committers are not empowered to make decisions in implementing the high-level designs, or that we are not empowered to make those designs ourselves. In fact, from what I have seen about Parrot there is no real point in the process where Allison must be involved. I could design a new system from scratch, implement that system, and commit my changes to trunk. Assuming my work is sane and nobody complains about it, that would just be the end.

The real benefit to having a dedicated architect like Allison, or a knowledgeable senior programmer like chromatic, is that we have a great sounding board for ideas. It's not just about getting my code committed to Parrot, it's about having the right code committed. My solution may indeed work for the cases I've considered, but will it be the right thing for Parrot in the future? Will it have to be ripped out or refactored later when new use cases are found? Will people years from now be looking at it and saying "That Whiteknight guy sure wrote some lousy code here!"? Will it be what our end-users need it to be?

I can find a solution, think about a few ideas and--before wasting any effort on a potential implementation--I can run them past Allison to pick the one that is best for the project in the long run. On more then one occasion I've presented her with two decent solutions to a problem and she's countered with a third option that I had never even considered, which is far better. Each time we have a discussion like this I can be more confident that the solution I implement will be a good one, I grow and improve as a developer, and I learn a little bit more about the overall design vision for Parrot. Next time around my work will be more informed.

I've known a lot of programmers, at school and at work, that were very good at what they did but who did not share there knowledge or their expertise. There are strong feelings of ownership and secrecy. Strong undercurrents of competition, including hostile competition, between team members. It's not until you join a project like Parrot that doesn't have these things that you realize how much of a huge waste and mistake they are. The best programmers aren't those who toil in solitude and produce code that "just works". The best programmers are the ones who share their skills and learn freely from the skills of others, and who actively try to improve themselves and their team.

We run into situations on occasion where it starts to feel like "us versus her", the lay-developers versus Architect Allison. I worry about these situations, and I try to avoid or diffuse them where possible. We always want what Allison wants: a better Parrot. I've found in my experience that she usually has the best understanding of what that means. It's better to work with her then it is to work against her, even if both paths lead to committed code and fixed problems.

Three days ago we were talking on IRC about the ongoing PCC refactors, and talk turned to how long we were going to have to wait for Allison to finish them. They have been a long time coming, and other developers are starting to become blocked waiting for these particular improvements to land. This combined with a sense that Allison had ownership of the branch where changes were happening and that she alone really knew what needed to be done to resolve things, created a feeling of helplessness. If you read through that IRC log you'll see what I mean: several very talented, capable developers are taking a very passive wait-and-see attitude towards these necessary refactors, when the team of them combined could likely have all outstanding test failures resolved in days, if not hours. We have an extremely talented and capable team working on Parrot, I really can't stress that point enough.

I decided that we shouldn't be waiting for Allison, we should be helping her. She had a really great design, what was needed were a set of hands to jump in and make it real:

I don't think that only allison can work on it. If we put together a little strike team, I think a group of us could fix it up pretty quickly. ... I say that's the new plan: create a copy of the branch and everybody who is blocking on it can get their coding asses in there to fix it

Allison joined the discussion shortly thereafter and, in characteristic fashion, got things moving:

so, here's the plan, anybody who's interested have a look at the code in the next few days. While I'm on the plane, I'll write up a high-level description of how the new PCC is *supposed* to work. (understanding that seems to be the biggest blocker for people getting involved). I'll prepare a list of tasks that need to be done, so they can be distributed.

On IRC for less then 5 minutes and we have a plan to get the problem solved: Combine Allison's design and know-how with our development team's talent and tuits, and everybody wins. This morning, Allison delivered on her promise. Work is already beginning in earnest following Allison's post.

I like the idea of an architect a lot more then I like the idea of a "Benevolent Dictator for Life". Being too reliant on one person in perpetuity always strikes me as a very fragile way to setup an organization. In contrast, there's no reason to suspect that Parrot would flounder if Allison left. Quite the contrary Parrot has been through several architects and has continued development strongly. There has also been some talk around about setting up other authority positions like sub-architects, lead developers, etc. I like that idea too. A rich meritocracy with people who are known to produce good code and good answers is a good thing. I don't have any particular opinions on which "offices" Parrot should have or who should fill them, but that's just me.

So that's my take on Parrot and the Architect. It's generally a good system and I like it, although we all need to keep in mind that "us with the architect" is better then "us against the architect".

Thursday, September 24, 2009

Version Control Systems

After a discussion earlier this week, I created a poll on parrot.org about version control systems. The question is simple: Which VCS would you prefer Parrot to use? Of course, it's just a non-binding strawpoll, but if there is overwhelming support for one option over the others it might inform and even motivate such a decision later.

My personal opinion is that I would like to migrate Parrot from SVN to Git.

Here's some background: I've never really used Git and don't know a lot about using it. Further, all my distributed development work, on Parrot and on other smaller projects has always used SVN. A few months ago I won a major victory at work when I convinced my boss to use SVN for all our in-house development. I setup and actively maintain the SVN server at work, teach my coworkers to use it and use it better, and am happy with the results overall.

pmichaud said it best the other day when I asked about it on IRC:
When rakudo needed to switch repositories earlier in the year, I put out a call to see whether people thought we should stick with svn versus moving to git. I got a lot of feedback. However, the clinching argument is that *nobody* argued in favor of SVN based on it's merits of being easier to use or having superior technology. The only reason that anybody had with staying with SVN was because that was what we had been using previously. So if it comes down to "influential arguments" as to making one choice versus another, I'd like to see informed reasons for staying with SVN. We clearly have a lot of Parrot developers who can give informed reasons for moving to Git
In the back of my head I am reminded of those grizzled old programmers who don't understand why anybody would ever write anything in a language besides Fortran or Cobol. I remember an argument I read once, back in the days when I was still young and stupid enough to participate in programming language flame wars on anonymous internet message boards, that said: "Perl is written in C, so anything Perl can do C can do but faster". Sure it's faster if you don't count all the extra development and debugging time, if you take the time to properly refactor and optimize your algorithms, and if you aren't worried about problems like leaking memory or security issues. But I digress. C, Fortran, Cobol, and Perl all have their uses (C is great for writing infrastructure like Parrot, Perl is great for most other things, and Fortran and Cobol are great for being rewritten in C or Perl).

SVN is great for some things. It's centralized, and although that brings a certain amount of rigidity it also brings some assurances and simplifications. Sure you can use Git in a way similar to SVN, but you don't need to and that option can be scary for some people. Sometimes you want one repository, strict revision numbering, and an unambiguous current state to the code. Sometimes you want to know that all your code is safe on a redundant SVN server, and people don't have "checked in" changes on their local machines that aren't being included in the nightly tape backups. There are plenty of good reasons for workplaces to use SVN, and I think it's a great choice for many of those situations. It's a particular solution to a particular problem, and it's perfect for some situations.

At work we moved to SVN because I could make arguments, impassioned and influential, that SVN was better then our previous system (externally-hosted VSS). And my arguments were good and true. The migration was not completely pain-free, there are some issues that our team is still working out months after the fact. For instance, the mechanics of branching, tagging, and merging is still foreign to some people who have never had to deal with them before. Of course, having problems using powerful new tools effectively is far better in my opinion then not having those tools at all. We went from a system with absolutely zero branches and tags to having dozens of each. We started with a culture having almost zero code sharing or collaborations between developers, and are gradually learning to embrace these ideas. SVN is the learning tool that makes these new ideas possible.

Git is a little more complicated, if only at the conceptual level. When I first started trying to learn Git I was bombarded by all sorts of terminology, long pages about philosophy, and page after page of very pretty charts and graphs. It was overwhelming and I wrote it off as being more hassle then it was worth. However, since my initial forays into the topic I've learned a lot more about it and have become much more comfortable with the ideas. I find use cases almost every single day where I have problems in SVN that could be easily and elegantly solved if I were using Git. Sometimes I want to make a branch where I can just make a few test commits. If those little "commits" turn into a huge SVN patch, it becomes much less easy to review and apply.

In SVN I follow a few personal rules that I've picked up gradually: Don't make multiple simultaneous branches that touch the same subsystem. Don't rename files in a branch. In fact, try to do as little file renaming as possible. Don't sync up branches with trunk too frequently. Don't let branches live too long without merging. Keep track of all your branching/merging revisions. Very very carefully examine merge diffs, especially if there are any conflicts at all, to make sure things are resolving in a sane way. When making branches, I have a script I follow and I don't go off the script. If too many problems pop up, I'll create a second branch or even abandon branches entirely and use a flat diff instead. It's like any tool, you have to know when to use it and how to use it properly when you do.

So why do I like Git? First off, it's flashy and new. As Austin Hastings mentioned on IRC:

And FWIW, switching to a new VC system is always a fashion show. You should expect mad enthusiasm for whatever new thing comes along after git, too.
And he's right, to a point. New tools are good if they fix the problems present in old tools, add new features, and make life easier. Fewer problems and more features always make people excited, and for good reason. Nobody is going to claim that SVN is the pinnacle of VCS technology. Git is an improvement over SVN, and there will be systems in the future that improve upon Git as well. It's the same exact way that we want programming language developers to move to Parrot instead of brewing their own interpreters from the ground-up: Parrot represents a better system that will be easier to use then the old method. We have things like PCT that makes compiler designer far easier then it would be even on other similarly-capable virtual machines. It's the same as how people switched to C++ 20 years ago, switched to Java 10 years ago, and switched to C# 2 years ago. You're never done learning, and new tools will always come along that make developers more productive.

People worry about the learning curve for Git, or the time it's going to take to get our developers working with the new tool. Of course, you rarely ever hear people complain about the learning curve for SVN: It's not just the time it takes to learn the simple commands like commit, checkout and update, but the time it takes to learn about all the nuances and best practices. SVN makes it exceptionally easy in some cases to dig yourself into a hole that is difficult to climb out of. From what I have seen of Git, this same thing is not true. The easy cases stay easy, and the hardest cases become possible.

Another great quote from pmichaud on IRC illustrates this point:

I think we have a significant number of parrot devs who have said that they find git merging much easier than svn. I don't know why we need more examples than that. Either that or we simply believe our developers have no clue about what they're talking about. afaik, none of the people who say "git merging is easier" are doing so based on speculation. It's all from hard experience in doing merges in both svn and git.
I won't even claim that branching and merging is easier in Git. I've never done it. However, I have heard lots of other people say that and I am very optimistic that this is the case. If it makes the hard case less hard, that's really all I need to hear.

In summary, here are my arguments for moving to Git: It represents an improvement in a critical development tool that will increase productivity for our developers. Upgrading tools and methodologies is a natural and healthy thing for projects to do. It keeps best practices current, keeps developers productive and happy, and keeps everything moving forward. Git is not the be-all end-all of version control, but being in a regular habit of upgrading and improving our toolset is a good thing for Parrot or any software development team. SVN is a great tool in certain cases, and has lots of merits. Supporting a large distributed development team working in large numbers of branches like we have been doing recently in Parrot is not one of those cases though.

Note: Parrot foundation members should be able to vote in the straw poll on the website. If you are a member but don't have the proper permissions on the website, talk to myself or one of the other admins to make sure you get properly flagged so you can participate. Keep in mind that this is only an informal straw poll and that the results of it are not binding in any way. It's just being used as a tool to measure the general opinions of the development community.

Saturday, September 19, 2009

JIT: First Project Challenge

Darbelo merged in a branch that finally gets rid of the old JIT and PIC systems. However one vestige of the JIT system still remains: the NCI frame builder. The NCI frame builder is used to create a call frame for interacting with an arbitrary library function. As I mentioned in my earlier post on Stack Frames, to call a function that follows the standard conventions, we push the arguments onto the system stack, use the call opcode to jump to the subroutine, and then when control flow returns we extract the return value from the appropriate register.

The problem is that the C programming language doesn't offer any facilities for manually constructing a call frame. In pure C, you cannot push an arbitrary value onto the stack, perform a non-local jump to a subroutine label, specify which register a value is stored in, etc. You can accomplish some of this with inlined assembly code, but if you're playing with the stack you run the risk of corrupting your program and crashing your computer. Keep in mind that local variables to a function are typically accessed as an offset from the esp register (on x86 anyway), so pushing things onto the stack will change the value of that register, which in turn will make constant offsets from that location point to different values.

When we are talking about calling arbitrary library functions from Parrot, there are three approaches:
  1. Generate a list of function calling "thunks" from a predefined list of function signatures. You end up with hundreds of such thunks but still can't interact with some functions without adding a signature and recompiling Parrot. Also, all those hundreds of functions need to be loaded into memory when Parrot starts, creating a larger memory footprint and slower startup times.
  2. Create a function in pure assembly that takes a signature string and manually constructs a call frame. In assembly we can interact with the system stack and processor registers directly. There are two problems with this: Portability, because we would need to write a new assembly routine for each combination of platform and assembler. Also, we couldn't cache the results of the generated call frame, every call to the library function would have to manually build the frame again.
  3. Use a JIT solution. The current frame builder uses Parrot's old JIT framework to build the frame code in a block of memory, but only works on x86. A "real" solution here would use a JIT package like LLVM or libJIT to generate these frames. This has the benefit that we can cache the generated thunks. We generate them lazily when we need them and cache them for when we need them again.
Option #3 is probably what we are going to end up with, although being able to sanely fall back to Option#2 when a JIT library isn't available might be a good idea, especially for those rare platforms that aren't supported by our JIT engine.

This in mind, I have two challenges for readers who have some spare time:
  1. Create a proof-of-concept NCI frame builder using a "real" JIT engine. You can pick anything you want (libJIT, LLVM, GNU Lighting, dynASM, nanoJIT, etc). It must be able to take a function pointer, a string representing the call signature, and proper arguments and no other unnecessary state information or metadata. The frame builder should return the compiled thunk from the JIT engine that can be used to call the target function.
  2. Create a proof-of-concept NCI function caller using pure assembly or C with inline assembly. Your assembly routine should take a signature string, a function pointer, and a list of arguments, and a pointer to a memory location to hold a return value (if any).
Winning solutions don't need to be integrated into Parrot or use Parrot-style signature strings. Solutions don't need to be large or involved either, just a simple demonstration of the mechanisms needed to call arbitrary functions on your target platform. Bonus points for providing a solution that is related to what Parrot needs, and definite bonus points for targetting an uncommon platform (x86+GCC for instance is very common).

There is no real prize for "winning", but I'll post good submissions here to my blog, publically talk about how awesome you are, and maybe try to get your solution added to Parrot in some way and in some form.

Wednesday, September 16, 2009

ParrotTheory: The C System Stack

I talked a lot about control flow on the C system stack in my post about the inferior runloops problem in Parrot. Since not everybody is familiar with all the mechanics of the platform ABI, I figured I would write up a theory post about how all that works.

C and Assembly Code

The compilation process for a language like C is very straight-forward: Write C code, invoke compiler, convert code to machine code, execute it. Ultimately for a program written in C code or any language at all for that matter to execute, it must be converted to a stream of machine code instructions. The hardware simply doesn't understand anything else. Any C code can be translated, by hand if necessary into an equivalent sequence of assembly instructions, which in turn can be translated (again, you can do this by hand) into machine code. Assembly instructions and machine code tend to have a one-to-one relationship, but C code has a one-to-many relationship with assembly. It's these variants that an optimizer searches for, finding faster ways to perform the same operations. It is also possible to convert a listing of assembly code into equivalent C code. This is called "decompilation", and is an important technique in the field of reverse engineering.

C code and assembly can be converted back and forth, and are generally equivalent. There are some operations in assembly that cannot be strictly converted to C code, but that's a subset we are able to ignore for now.

Function Calls

Look now at a piece of C code that we should all be familiar with, a function call:

x = Foo(a, b, c)

This line of code translates, more or less, into this sequence of x86 assembly:

mov eax, 'c'
push eax
mov eax, 'b'
push eax
mov eax 'a'
push eax
call _Foo
mov 'x', eax
sub esp, 12

I'm obviously simplifying in some places. This is probably not what your compiler would actually generate, but is not entirely dissimilar to what you would get if variables a, b, c, and x were all declared volatile, and if you didn't optimize. To make a function call, the arguments are all pushed onto the system stack (usually in left-to-right order) and the function is called. The 'call' operation does two things: it pushes a return address onto the stack and then jumps to the function Foo. When the function returns, the return value is stored in the eax register (except floating-point values, which are different and beyond the scope of this post). The important point to take away from this is that the arguments and the function return value are all on the system stack in a very particular order when the function call was made. Notice also that we need to "clean" the stack after the function call: We move the stack pointer back to where it was before pushing the three arguments. The return address is popped automatically in the ret command that we will see next.

As a general rule, the calling function should have absolutely no knowledge or interaction with the callee's stack frame or anything below it. Likewise, the callee should have no knowledge of the caller's stack frame or anything above it.

Now, let's look at the guts of the function Foo. Here's some C code to implement it:

int Foo(int a, int b, int c)
{
return a + b + c;
}

Converting to assembly, this is:

_Foo:
mov eax, 0
mov ebx, [esp+4]
add eax, ebx
mov ebx, [esp+8]
add eax, ebx
mov ebx, [esp+12]
add eax, ebx
ret

Again, not what you are likely to actually see as the output from a good compiler. I've intentionally omitted some details for simplicity. The values a, b and c are stored on the stack, so internally to the function we read them off the stack using offsets of the esp stack register. We cannot pop the values here because popping would move the return address (it's on the top of the stack, remember).

The Stack

This is all well and good for the simple example, but what if we want to have local variables in the function? And more importantly, how do we make local variables in a way that we can recurse properly? To do this, we need to create a stack frame. But before we talk about frames, let's talk about the stack.

The stack is just a contiguous region of memory that is pointed to by the stack register esp. The stack grows "down", so it starts at a high value in memory and each additional item pushed goes to the next lowest place on the stack. The "push eax" instruction does this, essentially:

mov [esp], eax
sub esp, 4

And the "pop eax" instruction reverses this process:

add esp, 4
mov eax, [esp]

The biggest difference is that the push and pop instructions execute much more quickly then the add/mov combination does otherwise.

Stack Frames

With this knowledge of the stack, we can now talk about local variables in a function. Let's rewrite Foo to have some local temporary variables:

int Foo(int a, int b, int c)
{
int x, y;
x = a + b;
y = x + c;
return y;
}

Converting to assembly now, we create space on the stack for x and y:

_Foo:
push ebp ; Save the old value of ebp
mov ebp, esp ; ebp points to the stack
sub esp, 8 ; make space on the stack for x and y
...
mov esp, ebp
pop ebp
ret

The code I've shown you here is nearly identical for every function created in C. These are the standard entry and exit sequences for x86 machines. The old value of ebp is pushed onto the stack to save it. This is so we can recurse easily without having to worry about the caller's stack. ebp is then set to point to esp, the "top" of the stack. Then we decrease esp, creating space on the stack to store local variables without having to manually "push" space for each one individually. From this point forward we can address all function arguments as offsets from ebp, and all function variables as offsets from esp. This moving stack frame allows us to recurse the function with no limitations besides stack space. Inside the function this is where the variables are stored. Remember that in x86 assembly, the [] brackets are pointer dereferences like "prefix:*" in C.
  • old ebp = [ebp]
  • return value = [ebp + 4]
  • a = [ebp + 8]
  • b = [ebp + 12]
  • c = [ebp + 16]
  • x = [esp + 0]
  • y = [esp + 4]
When we return, we move esp back to where ebp is, pop the stored value of ebp off the stack (for the calling function), and then pop the return address off the stack and jump to it. In the calling function we "add esp 12" to clean up the arguments from the stack and continue execution from there.

It's worth mentioning that if we create storage on the stack, such as a string or array, and write past the bounds of that array, we can accidentally overwrite things like the stored value of ebp or even the return address. A stack overflow attack does this exact thing to write a specific value to the return address, which causes execution to "return" to somebody's malicious code somewhere. The C standard library string.h is notorious for allowing these kinds of things to happen.

When I talk about having functions "on the stack", what I mean is that we have stack frames for those functions on the stack above where we are currently executing.

Backtraces

If we look at a backtrace in a debugger like GDB, we see a listing of functions representing a call sequence. This sequence can be found by tracing the stack: starting at the current function and iterating over the stack. GDB can find the return address at [ebp + 4], follow that to the caller function, and repeat the process until we reach main(). Combine finding addresses with finding metadata about the function name (in terms of debugging symbols) and we can produce a proper backtrace. Commands in GDB follow a lot of stack terminology: "frame" shows us the current frame, "up" moves us up to the previous stack frame and "down" moves us down to the next stack frame.

Continuation Passing Style

Parrot doesn't use stacks for it's control flow. Instead, it uses continuations. It's beyond the scope of this post to really talk about the differences and nuances of each. Instead of being stored on the system stack, continuation objects form a linked list (actually, more of a tree or graph) in the heap. By doing this, continuations aren't automatically cleaned up when they are invoked, and we can reuse them for things like loops, coroutines, closures, and all sorts of control flow optimizations and other fun things.

Conclusion

That's a quick overview of the C system stack. For more information, I've contributed to a book on Wikibooks called x86 Disassembly that talks more about the translation from assembly language back into C code. It covers function calls and stack frames in far more detail then I could in a single blog post. Of course, I would be happy to post more if people still have questions about it all.

In Parrot, it's difficult to think of control flow as a unified thing. We either think of PIR control flow or C control flow, both of which are very different from the other! We can, if we need to, convert PIR down to C code, and from there convert to assembly or machine code too. It's all equivalent and interrelated,

Tuesday, September 15, 2009

Y Combinator in Pure PIR

Tonight I took the time to do something I've wanted to do for a while: Write a Y-combinator routine in pure PIR code. I was inspired by Aristotle's excellent blog post about the topic, and the ensuing comments from his readers. I'll direct people to him for more information about the function and it's derivation.

So, without further adieu, here is the Y combinator, defined in terms of the U combinator:

# U-Combinator
.sub 'U'
.param pmc f
.lex '$f', f
.const 'Sub' U_inner = '_U_inner'
$P0 = newclosure U_inner
.return($P0)
.end

.sub '_U_inner' :outer('U')
.param pmc args :slurpy
$P0 = find_lex '$f'
$P1 = $P0($P0, args :flat)
.return($P1)
.end

# Y-Combinator, defined in terms of the U-Combinator
.sub 'Y'
.param pmc f
.lex '$f', f
.const 'Sub' Y_inner_1 = '_Y_inner_1'
$P0 = 'U'(Y_inner_1)
$P1 = $P0()
.return($P1)
.end

.sub '_Y_inner_1' :outer('Y')
.param pmc h
.lex '$h', h
.const 'Sub' Y_inner_2 = '_Y_inner_2'
$P0 = newclosure Y_inner_2
.return($P0)
.end

.sub '_Y_inner_2' :outer('_Y_inner_1')
.param pmc args :slurpy
.local pmc f
.local pmc h
f = find_lex '$f'
h = find_lex '$h'
$P0 = 'U'(h)
$P1 = $P0()
$P2 = f($P1)
$P3 = $P2(args)
.return($P3)
.end

And here is a little driver program that uses it to calculate the factorial of 10:

.sub 'main' :main
.const 'Sub' wrapper = 'fact_wrapper'
.local pmc x
x = box 10
$P0 = 'Y'(wrapper)
$P1 = $P0(x)
print "Answer: "
say $P1
.end

.sub 'fact_wrapper'
.param pmc f
.lex '$f', f
.const 'Sub' fact = 'factorial'
$P0 = newclosure fact
.return($P0)
.end

.sub 'factorial' :outer('fact_wrapper')
.param pmc whatev
.local pmc n
n = shift whatev
print "Calculating factorial of "
say n
if n >= 2 goto n_is_large
.return(1)

n_is_large:
.local pmc f
f = find_lex '$f'
.local pmc n_minus_one
n_minus_one = n - 1
$P0 = f(n_minus_one)
$P1 = $P0 * n
.return($P1)
.end


It was quite a fun little exercise, and a great workout for the newclosure opcode, which I'm not sure is well-tested elsewhere. I may add this, or something like it, to the test suite for Parrot to run.

Monday, September 14, 2009

Exception Handlers Done Right

JIT has been all the rage lately, but it isn't the only area of Parrot that needs work. Not by a long shot. One system that is slowly become a pain point is the exceptions system. This is due in no small part to the inferior runloops problem.

Inferior runloops is a problem which I'm not even certain I understand in it's entirety. I do know about certain specific facets of it, and I do understand the mechanism that causes the problem. Let's look at one specific control flow:

  1. We start Parrot. This creates the interpreter and the dynamic context. We call the runloop function to begin executing code.
  2. We create a new class Foo and an object Bar of type Foo. Foo has several VTABLE overrides defined to implement custom behavior.
  3. We perform almost any operation on Foo. Since all operations on PMCs are defined through VTABLES, we are likely to be calling a VTABLE to implement that operation.
  4. The op is a C function, which calls the Object VTABLE (another C function). This searches for an override, finds it, and executes it.
  5. Executing the VTABLE override cannot be done in the current runloop. This is so for several reasons: We're several functions down in the C stack and would need to jump back to it, we very likely require a return value for additional processing somewhere between the runloop and the call to the override, etc. So, we call a new runloop function which is created further down on the C system stack.
  6. The override throws an exception, but doesn't itself define any handlers. The scheduler does find an exception that has been defined previously in the outer runloop, and invokes that.
  7. Exception handlers are just labels, so we handle the exception and continue execution like normal. Except we are now in the inner-runloop further down the C stack, but are executing code that had been being executed by the outer runloop.
  8. The program executes like normal and reaches the end. The inner runloop returns, the VTABLE returns, the op returns, and we end up in the outer runloop again
  9. The outer runloop continues execution, but the program has already ended. The context is completely inconsistent. Bad things happen
The big problem here is that we are essentially executing a single stream of instructions in two separate functions on the C stack, functions which aren't sharing data about current status between each other. This is exacerbated by the fact that exception handlers are just labels in existing functions, and there is no notion of when a handler has "completed"; control flow continues from that label like normal until the end of the program.

If exception handlers were separate subroutines instead of simply being labels, they would have a finite ending point, and control flow would not just "continue" past the point when the exception was handled. When the handler exits, control flow can move to anywhere it needs to move which will likely be some kind of resume continuation, or the next exception handler if the exception was rethrown.

Another possible solution, which is also worth thinking about here, is that if we could avoid recursing runloops, we could avoid this issue entirely. The problem really stems from recursion, and no recursion means no problem. This is the kind of thing that we could achieve with Lorito, eventually. But, that's another topic for another post.

This problem has been a source of issues for Rakudo recently. At least, I am mostly certain that this is the cause. A hallmark sign that this is the root cause of a problem is that we start getting weird segfaults after the program appears to have run to completion. This is because the program has run to completion in the inner runloop, but then execution continues in the outer runloop which is now inconsistent and causes segfaults. The program appears to run to completion, throws a segfault at the end, and the back trace seems to show more things happening in the main runloop after the program has ended.

I sent a quick email to Allison the other day where I outlined a handful of potential solutions--ranging from quick and dirty to more comprehensive--and asking for an opinion of which direction we should be moving in to get this resolved once and for all. There was some back-and-forth between her and chromatic, who have apparently put some significant thought into it already. The way forward, I think, is to convert all exception handlers to be subroutines instead of labels (the latter will be deprecated, probably following 2.0), which should resolve this problem once and for all. It will be a significant effort to get all our HLLs to use the new exception handler type, and we have some issues of syntax to sort out, but it's going to be worthwhile in the long run to do it.

NotFound posted a short-term fix that should get everybody working through the release: Add an explicit "exit 0" command to the end of your main functions. This will force Parrot to exit when the inner runloop reaches the end of the program, instead of unwinding the stack and trying to continue execution in the outer runloop.

I'll definitely post more information about all this as work progresses.

Sunday, September 13, 2009

First Steps on JIT Overhaul

Allison's email was the start of something big, and I have been absolutely excited by the prospect of a proper JIT system since she sent it. It's going to be a lot of work and will have a dramatic and overarching impact on our entire codebase, but it will be well worth it. There are a few things I especially like about this plan: There are a lot of options so we can test what does and does not work as we go. We can break it up into stages so huge changes aren't happening in disruptive chunks. We can get started immediately. We can parallelize several tasks so many people can be working towards the ultimate goal in many ways.

Mostly what I am excited about is that the rest of the developer team has been so excited about it as well. bacek has started moving through the ticket queue closing tickets relating to our current JIT system that won't ever be fixed now. moritz has been talking to developers on #llvm. People have started adding info to the wiki and adding their names to the list of tasks, and giving lots of good feedback as they go!

In the short term, possibly immediately following the 1.6.0 release, the parrot command line interface is going to be tweaked a little bit to hide the current JIT system. The command line argument --runcore=jit is going to be redirected so that it invokes the fastcore instead of the JIT core. What this does is removes the unworking underlying system but keeps the user interface the same. In the end, --runcore=JIT will not only continue working for users that already have it but will actually start working on platforms that have never had it. We're lying because it's not really executing a JIT engine underneith, but the deprecation policy doesn't say anywhere that we can't lie a little bit.

With the commandline papered over, we can get started on the work of removing the guts of the JIT system. This is going to be both interesting and fun work, because we're going to be deleting lots of evil code, but we're also going to need to preserve some functions for the NCI thunk generation. I think [hope] that once we start cleaning things out that we will be able to radically simplify and improve the NCI thunk generator.

LLVM seems like a really cool JIT solution for us to use, and it actually offers a lot more functionality then we will need (or even want). One big benefit is that LLVM operates on a low-level platform-independent intermediate representation (IR), which should be easy to translate to from Parrot ops. Some of our ops, especially the arithmetic ones, will have a very direct translation indeed. We're probably going to want to abstract away the details and create macros or even an improved syntax parser to take away the rough edges so we aren't programming in assembly, but that's something we can worry about later.

We're going to need the configuration system to detect LLVM. Nothing more to say about this one, we've got a great configuration system and this should be no issue.

These other steps out of the way, we are going to need to modify parrot to build the subroutines from LLVM IR fragments, then use LLVM to compile those subs into machine code at runtime and execute them. This step is the biggest one where everything comes together, but luckily we don't need to worry about all the details just yet.

Wednesday, September 9, 2009

The JIT Plan

A big thing has happened today with JIT: Allison has taken a look at all the information and laid out a plan for replacing Parrot's ailing JIT system. Since she said it better then I ever have, I'll just post her email here for your viewing enjoyment:

The current JIT will be disabled, with Parrot set to run the default 'fast' core when the 'jit' core is selected. This change can occur immediately following the 1.6 release. Between 1.6 and 1.7, code supporting the current JIT can be removed. A deprecation notice will go in before 2.0, eliminating the 'jit' runcore option. (There will be other ways of enabling and disabling JIT functionality later, but it won't be a runcore. For one thing, enabling the JIT is likely to require special compilation, see later comments.)

The basic requirement for the replacement JIT is the ability to generate machine code on the fly at runtime for a code object (Sub, Method, NCI, etc), so that any subsequent executions of that code object run the machine code directly. It must support all of Parrot's target platforms.

LLVM provides a set of tools that look promising for allowing us to quickly develop a JIT, taking advantage of the platforms they already support. (Note that LLVM currently has limited support for Windows, so it's not quite there on *all* our target platforms.) The plan is to develop a rapid prototype of an LLVM-based JIT, to see if it might work as the primary Parrot JIT, or perhaps as one of several JIT strategies.

The prototype will implement a subset of the core Parrot ops in a lightweight language ("Lorito", an extension of what we've already been talking about for that language). For the sake of the prototype, the Lorito compiler will output LLVM's Low-level Intermediate Representation (LIR), but the language will be developed so that it's general enough that it could be compiled to various different output formats (including C).

The old JIT compiled each op down to a separate chunk of machine code. The new JIT would potentially be able to JIT at the level of a single opcode (since the definitions are there), but the greater speed gains will come from JITing entire subs (where the opcode definitions are building blocks for larger units to compile, rather than compiled individually), so that's where the prototype will focus.

The JIT is not a runcore (it's a dynamic compilation strategy), but it also doesn't necessarily have to be compatible with every runcore variant we've developed.

Using LLVM for the JIT (or any external JIT library) will add an optional dependency. A Parrot built without LLVM will simply disable the JIT features.

I'll post more indepth information about all this in the days ahead. Also, make sure to keep up with the planning page on the wiki, where things are starting to come together.

Refactoring Parrot Startup

jrtayloriv earned his commit-bit this week, no doubt on the back of his wonderful work with the patch that lead to the kill_parrot_cont branch (which merged a few days ago). First thing that he did was dive into the new gc-refactor branch which is doing something that I've wanted for a while: Making GC pluggable. He's doing a lot of great work already, and I look forward to following along as he makes more progress. I'll talk more about this branch and some other GC ideas that have been floating around in another post.

As part of his work, jrtayloriv wants to add the ability to specify the GC core on the commandline, in the same way that you can currently specify the runcore. However, while looking into this, he noticed a problem: All command line argument processing happens inside IMCC, which is called after the interpreter is initialized. Since the GC is initialized when the interpreter is, we've essentially created the GC core before we've read the command-line options to set it. This is bad.

I've mentioned before that the current startup situation is really a mess. We create an interpreter, call into IMCC, and all program loading and execution happens from there. This is backwards and lousy. We shouldn't be delegating all operation to IMCC, especially when it's on the chopping block in the long term. At the very least we are going to eventually replace IMCC with PIRC. A far better solution would involve a pluggable parser situation, but that's far beyond the scope of this blog post. Let's instead talk about things that need to change in the short term.

cotto posted a patch a few days ago to add a function to the embedding interface that would parse through the program's argv array and convert it into a ResizableStringArray. I can't find the link right now, but essentially he iterated over the argv array and pushed all the strings onto a ResizableStringArray. This would be ideal, because Parrot can work very easily with these PMC objects. It would also open up a lot of cool opportunities to access these options from PIR. However, this raises the immediate problem that we can't allocate a PMC from the GC that contains options for the GC, until the GC is initialized. We could maybe create the PMC on the stack, but then we would need to create all the STRING objects that go into that array there too, and that's a huge problem. We don't want to be allocating all these things using malloc, for certain.

Let's ignore this particular chicken-and-egg problem for a little while. Instead let's take a look at what Parrot should be doing. On program startup we parse through the incoming arguments, separating out the args that need to go to IMCC, the args that go to the interpreter initialization, and the args that need to go to the executing PIR/PBC program. This needs to happen first, before we create anything. First thing we create after the args are sorted out is the interpreter.

The next steps are more open to interpretation. What I would like, ideally, would be to not call IMCC immediately, but instead enter into the runloop immediately and call the PIR/PASM compreg from there. Notice from some of my descriptions below that we will want to expand the capabilities of the PIR/PASM compreg to handle what we want to add. IMCC, instead of executing the subs it compiles directly, returns a reference to them back to the runloop which invokes that and continues execution from there. No recursion, no nonsense. We do run into a problem when dealing with special subs, like :postcomp, :immediate, :load and other types; but these aren't insurmountable. What we can do to get past this is to integrate the scheduler to the process. The IMCC compreg compiles down all it's subroutines, adding any :immediate, :load or :init subs, depending on the command-line options, to the scheduler. We're obviously going to run into problems trying to do this with :immediate subs, but I don't think those problems are insurmountable either.

Since we enter into the runloop immediately, we need a stub program to run that will get the ball rolling. Here is what I think it could be:

.sub _internal_main
.param string type
.param string filename
.param pmc args :slurpy
$P0 = compreg type
$P1 = $P0.'compile_main'(filename)
execute_immediate_subs $P0
$P1()
.end

This little bootstrapped entry point function does three things: it finds the appropriate compreg to use to execute the program, it compiles down the file which adds some subs to the scheduler and returns :main, it executes all the :immediate, :init, and/or :load subs (using a hypothetical new opcode), and then it invokes the :main function to get the program started. Initially we would be able to support PASM, PIR, and PBC compregs (the last of which would need to be written for this purpose), but eventually we could be including other languages as well and calling them directly. It's worthwhile at this juncture, although not something I want to talk about in depth yet, to consider that Parrot may be treating a higher-level language then PIR or PASM as it's default native language. Another post for another time.

A second option, which I like slightly less but is probably easier to do, is to call into IMCC before entering the runloop and having IMCC return the starting sub to invoke which is then passed to the runloop. I don't have a strong aversion to this idea, but there are a few reasons why I would prefer to get away from it (and I'll be happy to share those with anybody who is interested).

So these are some of my long-term ideas about how Parrot startup should happen. There are obviously some issues to work out here, and I would love to hear some feedback about these things. I don't know when I would have time to work on this, but I would love to try and squeeze it in before or shortly after 2.0.

Saturday, September 5, 2009

ParrotTheory: JIT

Lots of talk about JIT recently, so it might be a good idea to talk about what JIT is, what it does, and why it's important. JIT, which stands for "just in time", is a compilation strategy typically used by interpreted languages and virtual machines. It can be used to provide major performance increases for some execution environments, and can provide other benefits as well.

Compare and Contrast

Let's take a look at how statically compiled programs work. We write source code in our favorite language. We call the compiler which converts our code, through a series of transformations and optimizations, into a target language. For the purposes of this example, the "target language" is native machine code. Finally, we execute the native machine code. Everybody knows this pattern, it is nearly as old as computing itself. Benefits: Great performance of code at runtime. Drawbacks: long compilation time, compilation produces platform-specific machine code. Code must be recompiled after changes.

Next consider the very far end of the spectrum: interpreted code. We write code in our favorite language and pass it to an interpreter program. The interpreter program reads the instructions in the code and performs the necessary actions, without converting anything to machine code. However, it may convert the code first to a platform-agnostic bytecode format and then execute that. We save time up front by not compiling all the way down to machine code, but we lose performance at runtime. Our program isn't moving data around, it's telling the interpreter to move the data around for us. That extra "telling the interpreter" step can be relatively expensive. Benefits: No compilation step. Fast turn around between code edits and code execution. Cross-platform, works anywhere we have an interpreter. Drawbacks: Slow runtime performance.

Now look at a middle ground: JIT. We write code in our favorite language and pass it to the interpreter. The interpreter reads the instructions and converts the code to some kind of intermediate representation. As it executes the code, it converts some (or all) of it into platform-specific machine code. Code that has already been converted can be cached for later use for additional speed improvements. Benefits: Fast turn around between code edits and code execution. Cross-platform, works anywhere we have a JIT-enabled interpreter. Fast execution speed of JIT'd code. JIT'ing bytecode to machine code is faster then compiling source code to machine code (we've already done significant work in the compilation to bytecode, which can be cached to disk and loaded later). Drawbacks: additional startup overhead to JIT code at runtime. It might not be beneficial to JIT all code, so some bytecode will be interpreted and we will need to switch contexts to execute it.

These are all some lousy generalizations, but they give an idea of the differences between the different approaches. JIT gives us the portability benefits of the interpreted languages, but gives us a performance boost that is closer to that of statically compiled programs. Keep in mind that the baggage of some languages (runtime libraries, semantics) may make some languages faster or slower to execute then others by nature. Dynamic languages, for example, will tend to have some performance penalties because of things like dynamic dispatch. Combine that with the fact that many dynamic languages are interpreted instead of statically compiled, and you can start to see a relatively large performance gap that becomes misattributed to a fundamental fault with dynamic languages.

JIT Performance Metrics


When talking about JIT there are a few performance metrics that need to be taken under consideration: Speed of code generation and speed of the executed code. It stands to reason that the more time we spend optimizing, the faster the generated code will be. Alternatively, the faster we produce code, the less optimized it will be. It does tend to be a bit of a tradeoff, and the onus is usually placed on the user to determine where in the spectrum they want program performance to be. Short-lived programs will waste more time optimizing then they would executing to completion. Whereas programs that will run for a long time will benefit by taking a little more time up front to optimize.

Trace-Based JIT

While the user is typically tasked with selecting how agressively they want to optimize, there are ways for the computer to make that determination automatically. Once great example of this is in Trace-Based JIT. In a Trace-Based JIT (TBJIT) system, the interpreter begins interpreting code immediately without invoking the JIT system. As it executes, it records individual control flow paths called "traces". When it determines that a particular trace is "hot", or commonly executed, it passes that trace to the JIT for optimization and conversion down into machine code. Any time thereafter that we need to execute the hot trace, we call the cached, optimized machine code version instead.

Trace-based JIT is a nice middle ground because the computer determines which code would benefit the most from JIT, and only spends the overhead compiling code when it knows it will get a good return on investment for it. Only compile the code when the time it takes to compile will be outweighed by the long term performance improvement.

It is my sincere hope that Parrot can integrate a trace-based JIT engine eventually, because it would be a big boon for our overall performance.

Ahead Of Time (AOT) Compilers


Another variation on the theme are called Ahead-Of-Time compilers (AOT). AOT is like JIT, where we compile an interpreters LIR into machine code. The difference being that we do this before starting execution. JIT brings an overhead performance penalty because we are doing the machine code translation at runtime. In AOT, we do the compilation in a separate step entirely, so we can load in pure machine code at runtime with no compilation overhead.

Ideally in Parrot we are going to want to have some sort of AOT option, so users can compile code down to binary executables ahead of time to be executed.

Conclusion: Parrot's Situation

JIT is a very interesting subject, and can provide some significant benefits for Parrot. We gain two potential speedups from JIT (and additional speedups from AOT as well, if we have it): First, we reduce op dispatch cost by converting our loop and dispatch mechanisms with a string of raw machine code. Second, we gain the ability to perform optimization stages on the code as we generate it. However, these two things might be limited in effectiveness by Parrot's current architecture.

In Parrot, some of our ops are relatively large and implement some complex algorithms. The more complicated ops are, more often control flow is located inside them and the less often control flow is involved in transitioning between them. If a relatively small amount of our time is spent on op dispatch, it's not going to provide us with much benefit to optimize dispatch. In other words, if we spend 5% of our time doing dispatch, the most we can possibly improve performance with a new dispatch mechanism would be 5%.

Of course by that same token, if the ops are complicated and opaque to the JIT engine, it won't be able to get information about the internal structure and perform any meaningful optimizations on it. This is why a solution based on a simple, atomic op set like Lorito is so attractive: Because we gain the ability to really start optimizing all the way down to the metal.

Friday, September 4, 2009

JIT Redesign

The time has come. Parrot's JIT system is now in my crosshairs and phasers are set to kill.

bacek has spent a lot of his time in the past two weeks working on the Context PMC stuff. The last thing he was blocking on was a failure in the JIT system that only appeared when "make testj" was run. I, of course, couldn't even observe this failure because Parrot doesn't have JIT support on x86_64. The root problem for him was that the JIT system has a lot of hardcoded knowledge about the shape of some of Parrot's core data structures, and any changes to those structures breaks what JIT thinks it knows. In response to this problem, and some frustrations I have been building up over the past few months, I sent a mail to the list suggesting we deprecate the current JIT system and schedule it for removal in 2.0. Response there was mostly positive, but the ball really got moving when I mentioned the issue in this week's #parrotsketch meeting.

Why do I want to deprecate JIT? Isn't JIT the new hotness for virtual machines and interpreters? Isn't it the savior of performance for dynamic languages? The short answer is that I want to deprecate our current JIT implementation and replace it with something even better.

What JIT does is this: It takes information about the instructions to execute from the parser. This is going to be the compiler's Low-level Intermediate Representation (LIR). Parrot's LIR is the compiled bytecode that we are supposed to execute. JIT takes this information and converts it to machine code on the fly, so that it can be executed directly by the processor without need to dispatch to op routines. In theory what this can do is significantly reduce the overhead of opcode dispatch, by making control flow more "natural" to the machine, which in turn helps with branch prediction and caching. If you combine this with the ability of some JIT engines to optimize the output machine code agressively, you can end up with some major performance savings with a good JIT. I'll discuss the workings of JIT in more detail later this weekend.

In short, Parrot wants JIT. Parrot needs JIT. We simply aren't going to be a viable alternative to other VMs without it in the long term. Especially not when you start looking at some of the amazing performance improvements VMs have been making recently directly because of their JITs.

Parrot's current JIT just doesn't fit the bill though. It suffers from a number of terrible problems, which I enumerated to the list also:
  1. It doesn't really provide much performance improvement for most programs. It also doesn't have much opportunity to perform any optimizations at the machine-code level.
  2. It is too closely tied to Parrot's core, breaking all sorts of encapsulation barriers. As I said earlier, the JIT system needs to mirror certain algorithms and data structures used in Parrot core, which means a change made in the one needs to be faithfully preserved in the other. When we can't do that because none of our current developers know the system well enough we end up with a huge decrease in development momentum. Of course an argument could be made here that more people need to learn the JIT system.
  3. Parrot's JIT system is very platform specific. There simply is no implementation on most systems (only x86 and PPC have it), and there is no easy way to share code between platforms to give new platforms a head start. If I want to write a JIT for a different system (such as amd64 for example), I basically need to start completely from scratch.
  4. It is an absolute mess, and a maintainability nightmare. Nobody really knows what all it does or how it all works. It's also not documented well enough to bring any new developers up to speed on it. Top this all off with the overly complicated way that it is written. We simply can't make any improvements on the code, not without a herculean effort that honestly isn't worth the time.
  5. It is nowhere near some of the other existing JIT engines in terms of usability, quality of generated code or capabilities. libJIT, nanoJIT and LLVM have entire dedicated teams working on them, we aren't ever going to match what they have without a similar expenditure of time and effort.
So I made the request that we deprecate the system. But not just that, I suggested that we start serious planning for a replacement. And I mean serious. I've been beating the pavement for the last two days, tracking down developers and asking questions. Some of the fruits of my labor have already been added to the wiki. Here are some ideas that I have about how our future JIT system could work (I'll explain this all in more detail later). Notice that this is not an order of tasks, just a logical outline of the system I am envisioning:
  1. We rewrite Parrot's PASM ops in terms of a new LIR. This could be Lorito (Previously L1) if we want it to be backend-neutral, or it could be something that's specific to the particular JIT engine we use (like LLVM ops, for example).
  2. We need to be able to convert the LIR to C for direct execution, and to a JIT definition for indirect execution. There are a number of ways we could do this, depending on the skills and availability of our development team.
  3. The configure system will determine which JIT backends are available (if any), and generate the necessary code to support them during the build.
  4. We only need a minimal API that the Core of Parrot will interact with: JIT the incoming PBC, call into JIT'd code, output an executable and maybe a select few other operations. Much of the code can be generated at build time.
This kind of system gives us pluggability in the sense that we can support multiple JIT engines, even though we can only plug them during the build. It also means we don't have to define a huge comprehensive API to satisfy all JIT engines, we only need a minimal API and can generate the rest of the code specifically for individual JITs.

Obviously this is all very preliminary, and we will refine the design as we move forward and gather more information about all our options. I'm sure I'll be posting updates here as new things start happening.

Thursday, September 3, 2009

Fixed-Size Allocator: Now With Extra Working!

A few days ago I toggled the flag to disable the fixed-size allocator. I was receiving too many bug reports that I wasn't able to debug adequately, and I heard that it wasn't working at all on Win32. So we shut it down in hopes that we could find a fix eventually. Well, today I found and committed the fix.

NotFound mentioned out of the blue today that the fixed-size allocator was working on Linux, both 32- and 64-bit flavors. This was news to me, last I had heard it was malfunctioning on those platforms. However this news was quickly confirmed by darbelo too, and then I did it myself. Worked perfectly. So I crossed my fingers and tried on Win32 as well: Nothing. Segfault during miniparrot and the build stopped. I don't have GDB on my windows machine (it's actually my work computer, so it's use is limited). I do have Visual Studio, but I've yet to find a real way to debug a program in VS that I didn't build in VS, especially if it's a native-code executable. In lieu of real debugging tools, I used the "printf" method to try and narrow down where and when the segfault was happening. Here's the function that was causing problems, can you spot the error?

PARROT_CANNOT_RETURN_NULL
PMC_Attribute_Pool *
Parrot_gc_get_attribute_pool(PARROT_INTERP, size_t attrib_size)
{
ASSERT_ARGS(Parrot_gc_get_attribute_pool)
Arenas * const arenas = interp->arena_base;
PMC_Attribute_Pool ** pools = arenas->attrib_pools;
const size_t size = (attrib_size < sizeof (void *))?(sizeof (void *)):(attrib_size);
const size_t idx = size - sizeof (void *);

if (pools == NULL) {
const size_t total_length = idx + GC_ATTRIB_POOLS_HEADROOM;
const size_t total_size = (total_length + 1) * sizeof (void *);
/* Allocate more then we strictly need, hoping that we can reduce the
number of resizes. 8 is just an arbitrary number */
pools = (PMC_Attribute_Pool **)mem_internal_allocate(total_size);
memset(pools, 0, total_size);
arenas->attrib_pools = pools;
arenas->num_attribs = total_length;
}
if (arenas->num_attribs <= idx) {
const size_t total_length = idx + GC_ATTRIB_POOLS_HEADROOM;
const size_t total_size = total_length * sizeof (void *);
const size_t current_size = arenas->num_attribs;
const size_t diff = total_length - current_size;

pools = (PMC_Attribute_Pool **)mem_internal_realloc(pools, total_size);
memset(pools + current_size, 0, diff * sizeof (void *));
arenas->attrib_pools = pools;
arenas->num_attribs = total_length;
}
if (pools[idx] == NULL)
pools[idx] = Parrot_gc_create_attrib_pool(interp, size);
return pools[idx];
}

We keep an array of pool structures ("pools"), where each index into that array represents the size of the object managed by that pool. The smallest object we can allocate is a single pointer, because we link objects together into a linked list, so we need at least enough space for one pointer to handle that. We also want indexing for the smallest items (sizeof(void*)) to start at slot zero in the pools array to save space. To get this effect, we first calculate the minimum necessary size of the object, and then we subtract the sizeof(void*) from that to give us the index into the pools array. This part works just swell.

We then have two large logic blocks. The first determines whether we have allocated any pools at all, and if not first allocates space for the pools array. The second block determines if we have too few slots allocated, in which case the array is resized to accommodate our incoming request. We then check to see if the current slot is null, and allocate a new pool in that slot if it is. Finally we return the pool in the given slot.

The problem, I soon found out, was that for some specific sizes we were returning bad pointers. See the problem yet? Let's zoom in a little:

const size_t total_length = idx + GC_ATTRIB_POOLS_HEADROOM;
const size_t total_size = total_length * sizeof (void *);
/* Allocate more then we strictly need, hoping that we can reduce the
number of resizes. 8 is just an arbitrary number */
pools = (PMC_Attribute_Pool **)mem_internal_allocate(total_size);
memset(pools, 0, total_size);
arenas->num_attribs = total_length;

This is part of the logic for allocating a fresh pools array. GC_ATTRIB_POOLS_HEADROOM is a buffering factor I added. It's #define'd to be 8, which means that when we do allocate we make space for 8 sizes more then we currently need. This prevents us from needing to resize the array if we need to allocate an object that is only a little bit bigger (and we know that we will almost always need to allocate something that's just a little bigger).

Taking the index (idx) of the slot, we calculate the size of the array, allocate it, then initialize it all to zero. Except we don't. In the first run through if idx is 0 then total_length is 8 and total_size is 64. This means we're allocating enough space for 8 pointers, not one pointer plus a headroom of 8 additional pointers. Because the idx is zero-based, my math here was off by one. To see it better, consider if GC_ATTRIB_POOLS_HEADROOM was omitted entirely. Then idx would be 0, total_length would be 0, and total_size would be 0. Try passing that to malloc and see how well it works for you.

So I changed the calculation for total_size to:

const size_t total_size = (total_length + 1) * sizeof (void *);

And Parrot magically started building on Win32. A few more small cleanups and I committed r40962.

I still need to do some benchmarking though, I'll try to do that tonight. I hope that this will bring some measurable performance improvements to Parrot.

Context PMC Follow Up


Quick followup this morning: bacek has merged the context_pmc3 branch into trunk just now in r40958. Testing would be much appreciated!

Wednesday, September 2, 2009

Refactoring Continuations and Contexts

I've written, on average, about 1.5 posts per day in the last two weeks, but haven't managed to post any of them. Sorry about the delay!

When 1.5.0 was released, the Sub and Continuation PMCs were just garbage collectible wrappers around the C structs Parrot_sub and Parrot_cont respectively. My initial attempt at a Context PMC would have followed this same mold, applying a thin garbage-collectible wrapper around the Parrot_context structure. This was certainly not ideal for a number of reasons (multiple allocations, extra layers of pointer dereferencing, lots of extra pointer-checking code to make sure everything is where we think it is, etc), but it would have helped to plug the memory leaks introduced by the reference-counting context system as-is.

Shortly after 1.5.0 was released, however, bacek's branch to clean up the Sub PMC landed, and got rid of the Parrot_sub structure entirely (converting all it's fields into attributes for the PMC). He quickly started work on the Context PMC next, throwing out what little work I had done and moved towards a comprehensive Context refactor. He wasn't just turning the Context PMC into a thin wrapper, he was converting Parrot to have a proper Context PMC wholesale. It was what I was planning to do, though I would have spread it out over several steps and several branches.

His work created a little bit of a stir because it ran afoul of the deprecation policy: Some users were poking directly into the guts of the Parrot_context structure, and his work would have broken that. Nobody said "no", but there was definitely some stalling and arguing about the finer points of the deprecation policy. Plus, he had a weird failure in JIT (Which I will discuss in depth with my next post), so he couldn't merge immediately anyway.

A few days ago I saw a very cool diff from newcomer jrtayloriv to perform the conversion for the Continuation PMC and the Parrot_cont structure. The diff wasn't completely ready to apply to trunk yet, but it was an amazing start and came out of the blue absolutely unexpected. What his patch did was merge the fields of the Parrot_cont structure into the attributes of the Continuation PMC, and replace references to "Parrot_cont" with "Parrot_Continuation_attributes". It's not a huge change to be sure, and it doesn't do anything to address the poor encapsulation of the Continuation PMC, but it's a great first step in a comprehensive cleanup and refactor of that system. I created the kill_parrot_cont branch to test that patch, and after a few more patches from jrtayloriv and help from some other commiters it matured nicely.

Tonight, both branches became 100% ready to merge. chromatic fixed the JIT failure in bacek's branch, and also "clarified" the deprecation policy to smooth the way for these changes. It helped that many of Parrot's users, especially Rakudo, were strongly in favor of a merge sooner rather then later. jrtayloriv got all the rest of his tests passing and cleaned up things that needed it, and decided to get the branch merged before making too many more changes. I sent an email to the list about these two, and am waiting for some feedback. Hopefully we can get both branches reviewed and approved by the end of this week, which gives us about a week and a half to clean up the wreckage and get trunk in shape for the 1.6.0.

1.6.0 is certainly going to be an eventful release. A lot of good branches have merged already and more are on the way. On top of that, we're seeing some great interest from newcomers that is translating to real code being applied. Hopefully we'll have a few new committers in the next coming weeks to help us with all these ambitious changes that we are making.