Thursday, December 10, 2009

Matrixy Progress

I've been doing a lot of work on Matrixy lately. I've found that I can get a lot done on the project in small bits, which is great when I want to touch it during a lunch break or between baby maintenance. I've certainly been able to work more on Matrixy this week than I have been able to write blog posts, for instance. In recent days I've:
  1. Done major cleanup and expansion of the test suite
  2. Added a bunch of builtins, including some new Parrot-primitive functions that will allow me to write more functions in M and possibly to migrate some PIR-based builtins to M.
  3. Refactored and cleaned up dispatch
  4. Added Cell Array support
  5. Added proper nargin/varargin support
  6. Created a new branch to begin adding proper nargout/varargout support
It's the last item on the list that's been giving me a bunch of trouble recently, and the inherent difficulty of the task is probably the reason why I haven't gotten it working before now. The work I have been doing so far is mostly hackery, trying to add nargout and varargout without having to rewrite the entire grammar and dispatching mechanism. Of course, in the long run I am going to have to rewrite these things, but I'm just not ready to do that yet. I would rather have a proof of concept and some passing tests than nothing.

The problem with dispatch, or anything in M, is that everything is ambiguous until runtime. The same syntax, "X(1)", could refer to the first element of the matrix X, the first element of the cell array X, or a call to the function X with the argument "1". This is further complicated by the fact that variables shadow functions of the same name, but do not overwrite them completely. If we have a variable and a function both named X, we can still access the function version using the feval builtin. In M we can also call functions without using parentheses at all, so it isn't only postfix parentheses that create ambiguity: almost every single identifier lookup requires a runtime check.
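The lookup order described above can be sketched in a few lines of Python. This is only a toy model and not Matrixy's actual code: the `variables` and `functions` tables and the `dispatch` helper are hypothetical names made up for illustration, and cell arrays are left out.

```python
# Toy model of resolving "X(1)" at runtime. All names here are hypothetical.
variables = {}   # variables visible in the current scope
functions = {}   # functions known to the runtime


def dispatch(name, *args):
    """Resolve name(args) at runtime: a variable shadows a function."""
    if name in variables:
        value = variables[name]
        if not args:
            return value
        # M indices are 1-based, Python lists are 0-based.
        return value[args[0] - 1]
    if name in functions:
        return functions[name](*args)
    raise NameError(f"'{name}' undefined")


def feval(name, *args):
    """Skip the variable table and always call the function of that name."""
    return functions[name](*args)


functions["X"] = lambda n: n * 100
called = dispatch("X", 1)        # no variable yet, so this is a function call
variables["X"] = [7, 8, 9]
indexed = dispatch("X", 1)       # the variable now shadows the function
still_callable = feval("X", 1)   # the function is still reachable
```

The point of the sketch is that the same call site gives a different kind of answer depending on what happens to be defined when it runs, which is why the check cannot be done at parse time.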

I've talked about all these syntax issues before, and won't dwell on them now. There are also semantic issues that need attention. Let's look at the case of nargout for instance.

function [x, y, z] = getcoords()
...
endfunction

[x, y, z, w] = getcoords()

In the code snippet above, the "getcoords" function is called with 4 output arguments, but the definition of that function only provides for three. If "getcoords" doesn't explicitly check the number of outputs expected and throw an error, this assignment will proceed without a problem. x, y, and z will get the expected values in the caller context, and the w variable will simply be left undefined.

So what we have is really a fundamental disconnect between caller and callee. The callee can see how it was called by checking the nargin and nargout variables, and can choose to error if those numbers do not match what it wants. A function can return a different number of values than the caller expects, too. So if I just did a call to:

getcoords();
disp(ans);

nargout here would be 0, but the function could still return 3 values which would be stored in the global default variable "ans". Yesterday I started a refactor to make this possible by breaking assignments up into two parts: the callee returns an arbitrary array of values, and the caller has to explicitly unpack those values. It's gotten me through a number of important test cases, although it is obviously not a great or pretty solution.
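As a rough illustration of that two-part split, here is a small Python sketch. It is not the Matrixy implementation: `call_and_unpack` and the stand-in `getcoords` are hypothetical, and what lands in "ans" is an assumption (this sketch keeps only the first returned value, which is what MATLAB and Octave do).

```python
def getcoords(nargout):
    """Hypothetical callee: sees nargout but always returns three values."""
    return [1.0, 2.0, 3.0]


def call_and_unpack(func, targets, scope):
    """Caller side: pass nargout in, then unpack whatever list comes back."""
    nargout = len(targets)
    results = func(nargout)
    if not targets:
        # No L-values at all: fall back to the default variable "ans".
        if results:
            scope["ans"] = results[0]
        return
    # zip stops at the shorter list, so surplus targets stay undefined
    # and surplus return values are silently dropped.
    for name, value in zip(targets, results):
        scope[name] = value


scope = {}
call_and_unpack(getcoords, ["x", "y", "z", "w"], scope)  # [x, y, z, w] = getcoords()
bare = {}
call_and_unpack(getcoords, [], bare)                     # getcoords();
```

After the first call, `scope` holds x, y and z but has no entry for w. After the second, `bare` holds only "ans".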

[a, b, c];

is an R-value, and the generated matrix is stored in the default variable "ans". However,

[a, b, c] = foo()

is obviously an L-value, and I need to be doing some bookkeeping to keep track of the number of arguments so I can populate nargout in the call to foo (if foo is a function call, of course). So I create a global variable to store the L-values in the assignment, so that when I generate the actual assignment call I have access to that number. One problem I ran into yesterday, though, is that when a rule fails and we have to backtrack, we end up with these global variables in an inconsistent state. So the call:

foo();

doesn't have any L-values, and when I parse the function call I can't expect the global variable to exist. Likewise, when I parse:

[a, b, c];

I need to keep count of the L-values, even though this isn't an assignment. So yesterday I ran into the problem:

[a, b, c];
foo(); # Thinks nargout = 3

Fun, eh?
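The stale-count problem can be reproduced in miniature. The Python sketch below is a heavy simplification and not the real grammar: each statement is reduced to a hypothetical tuple of (bracket-list names, whether it is an assignment, callee name), and the "scoped" version shows one possible way out, not necessarily the fix I'll end up with.

```python
def nargout_naive(statements):
    """Buggy: the count set by a bracket list leaks into later statements."""
    lvalue_count = 0   # stands in for the global variable, never reset
    seen = []
    for targets, is_assignment, callee in statements:
        if targets:
            # Counted before we know whether this is really an assignment.
            lvalue_count = len(targets)
        if callee:
            seen.append((callee, lvalue_count))
    return seen


def nargout_scoped(statements):
    """Count is local to one statement and only used for real assignments."""
    seen = []
    for targets, is_assignment, callee in statements:
        count = len(targets) if is_assignment else 0
        if callee:
            seen.append((callee, count))
    return seen


program = [
    (["a", "b", "c"], False, None),    # [a, b, c];
    ([], False, "foo"),                # foo();
    (["a", "b", "c"], True, "foo"),    # [a, b, c] = foo()
]

naive = nargout_naive(program)
scoped = nargout_scoped(program)
```

In the naive version the bare call to foo sees a nargout of 3 left over from the previous statement. In the scoped version it sees 0, and only the real assignment sees 3.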

I'm not even entirely certain how I'm going to do all this right. Do I create a custom CallSignature subclass, and handle argument passing myself? This has the nice benefit that I can almost always treat "x(1)" as a function call, whether it's an actual function or an internal indexing function. The more I can abstract away the differences, the better. The "almost" in the previous sentence of course refers to "x(1) =" L-values, which would need to be indexed a little differently from a normal function call. And since I need to be manipulating indices before passing them to the PMC, I need to be calling a function to handle indexed assignments anyway.
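To make the read/write split concrete, here is a hedged Python sketch of the two paths. The helper names are hypothetical, a plain list stands in for the matrix PMC, and the index handling is an assumption: the sketch shifts 1-based indices to 0-based and zero-fills when an assignment lands past the end, which is how M behaves but is only a guess at what the real helper would need to do.

```python
def index_get(values, i):
    """Read access for x(i), callable like any other function."""
    if i < 1 or i > len(values):
        raise IndexError("index out of bounds")
    return values[i - 1]              # assumed 1-based to 0-based shift


def index_set(values, i, new_value):
    """Write access for 'x(i) = new_value'; zero-fills if i is past the end."""
    if i < 1:
        raise IndexError("index must be a positive integer")
    grown = values + [0] * (i - len(values))   # no-op when i is in range
    grown[i - 1] = new_value
    return grown


x = [1, 2, 3]
x = index_set(x, 5, 9)        # x(5) = 9 grows the vector
last = index_get(x, 5)        # x(5)
replaced = index_set([1, 2, 3], 1, 7)
```

Reads can go through the same call path as ordinary functions, while writes need their own entry point because they may have to reshape the target before storing anything.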

It's all going to be a little tricky to get past this roadblock and to do it in a way that I find acceptable. However, Matrixy has good momentum right now and has a lot of great features already, so I'm hoping I don't get mired down for too long.
