Tuesday, June 30, 2009

LIRAsm - Current Status, and the last post

The last two weeks of work on the LIR assembler were some of the most challenging of the project so far. Multiple-fragment support in LIRasm was implemented almost completely and landed in the tracemonkey repo. I got to read the current source code, understand how it works, modify it to support multiple fragments, battle segfaults and memory leaks, and go through a whole lot of implementation and re-implementation before finally arriving at an acceptable version. It would be fair to say that the hardest things bear the sweetest fruit in the end. I am really happy that I finished this part of the project.
Undoubtedly, I got great support from my project mentor, jorendorff, humph, and graydon. Without their support, guidance and everlasting patience, I would never have been able to come this far. Although the assembler is far from finished, it has been a great journey, with its ups and downs, and we have managed to pull through. I really respect this community and the people I have been working with, and I just can't see my engagement with it ending.
In the process, I fear I have disappointed people and committed some stupid mistakes. In particular, I think the last post on my blog sent the wrong message to many people, and I would like to clarify whatever I believe was uncalled for on my part. I think I sounded disappointed and possibly hateful (I am not finding the right word here) in the overall tone of the post. It represented my general feeling at that time, since I had worked hard on the project and most of my work was not going to be used in the new world. I think it was just a manifestation of the feelings associated with the unfortunate event of code duplication. Everyone loves their work.

Anyway, I am very grateful to Mozilla Education for providing me with this opportunity to work on a real project like this. I think it's a great gesture, and I will definitely publicise it in my college so that more juniors can benefit from it.

Monday, June 15, 2009

[LIR Compiler/Assembler?] Current Status

We hit a roadblock last week when it came to our notice that another version of the LIR assembler (https://bugzilla.mozilla.org/show_bug.cgi?id=484142) was being developed by Graydon Hoare (graydon), using only hand-coded C/C++. The whole of last week went into deciding what should be done with the two working versions, both seemingly developed to about the same extent.
The major problem was in merging the two forks - we were using a parser-generator-based approach, and graydon was using hand-coded C++. It is far from trivial to merge the two, and hence one version had to go. We spent a day going over the pros and cons of each approach. The summary of our discussion follows -
Graydon's argument was that parser generators lead to 'mucky' .y files, add a tool dependency, and are difficult to land.
Our case was that the .y files lead to much cleaner, more neatly organised code, separating the logic from the implementation to a certain extent. The approach needed significantly fewer lines of code, and would also be easy to extend or modify in the future if additional syntax requirements appeared.
As things turned out, we were deadlocked, and the decision to choose one particular fork lay upon me.
I went with graydon's version.

graydon's version was actually going to be landed in the tracemonkey repo, and this was one of the reasons I chose it. The remainder of the week was spent making lirasm (as our new LIR assembler is now called) suitable for the source tree, and then landing it in the repository.
During this time, I read up on Mercurial Queues (MQ), and learnt about the Bugzilla way of doing things (using patches, filing bugs) compared to the BitBucket approach we used earlier.
Things should probably get back on track from today, and we have a bug to work on (https://bugzilla.mozilla.org/show_bug.cgi?id=497991).

The main setback of the whole duplication situation was time. When it occurred, things were going very - I mean very - smoothly, and it really seemed that most of the work for the LIR Compiler would be finished in two weeks' time. I will get a better idea today of how much time this is going to take now.

Tuesday, May 26, 2009

How to use Loads and Stores in LIR

Well, if you read the Wiki on LIR, and if you aren't previously familiar with LIR, then you might not have gotten the hang of pointers.

Pointers - are like integers in that they hold a value (an address) and can be used anywhere an integer can (LIR has only one type).
What is different is that you cannot create a pointer (that is, get a usable memory address, unless you are extremely lucky!) by yourself.

That's where the alloc instruction comes in.
Alloc - The wiki will give you the more technical information. Alloc is basically like calloc() in C - it allocates a specified amount of space for your later use, and returns a pointer to it. Hence the instruction format is:
p = alloc size
Illustration

I will illustrate this with an example straight from jorendorff's desk -
addr = alloc 4;
zero = int 0;
st addr[0] = zero ... (that's roughly how it's intended to be used)
That stores the value 0 at the memory address specified by addr pointer. Now you can load that value into a variable, store another value, etc.


Going one step further, if you allocate 8 bytes of memory
addr = alloc 8;
zero = int 0;
one = int 1;
st addr[0] = zero;
st addr[4] = one;
Note the offset 4 in the second store. The offset is the number of bytes from the base, as expected. Now you can load the two values into separate variables, or mess up stuff by specifying an offset not divisible by 4!
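
To make the offset arithmetic concrete, here is a minimal C++ sketch that models what the LIR above does to memory. It is not nanojit code: Region, store32, load32 and two_word_example are hypothetical names invented for this illustration, values are assumed to be 4 bytes wide, and the region is zero-filled only because alloc is compared to calloc() above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of the block that `alloc size` hands back:
// `size` bytes, addressed by a byte offset from the base.
struct Region {
    std::vector<std::uint8_t> bytes;
    explicit Region(std::size_t size) : bytes(size, 0) {}
};

// Models `st addr[offset] = value` for a 4-byte value.
void store32(Region &r, std::size_t offset, std::int32_t value) {
    assert(offset + sizeof value <= r.bytes.size());
    std::memcpy(r.bytes.data() + offset, &value, sizeof value);
}

// Models a 4-byte load from addr[offset].
std::int32_t load32(const Region &r, std::size_t offset) {
    assert(offset + sizeof(std::int32_t) <= r.bytes.size());
    std::int32_t value;
    std::memcpy(&value, r.bytes.data() + offset, sizeof value);
    return value;
}

// Mirrors the 8-byte example: zero at offset 0, one at offset 4.
Region two_word_example() {
    Region addr(8);
    store32(addr, 0, 0);
    store32(addr, 4, 1);
    return addr;
}
```

In this model, a store at an offset such as 2 overlaps both words, which is the "mess up stuff" case: a later load from offset 0 or offset 4 no longer gives back what was stored there.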

Having implemented basic loads and stores today, and consolidated my code which looks much more readable now, I am hoping to get information on:
1. Jumps and labels, and how to handle those (are labels solely the responsibility of the parser?).
2. Subroutine calls.
3. Guards.

I will go about them in the above order.

Sunday, May 24, 2009

[LIR Compiler]Progress and Issues

I have added support for a number of instructions (what is left is mostly floating-point instructions, guards, and 64-bit instructions). The project feels pretty robust currently.
I need to add code for reporting errors in a more informative manner (currently, I use the trivial "syntax error at line number ...." format). I will need to look into YYLOC and the related token-location parameters.
Also, I need information on how and what to display as output. Currently, I am using the program here as the base program, and modifying it to suit my needs.
Also, I am not aware of how loads and stores work - for instance, what addresses are we providing to the compiler? If we store an integer using store, where is it actually getting stored? Are we directly addressing memory words, or is it stored in some LIns * pointer like other data is?

Wednesday, April 22, 2009

[LIR Compiler]Bison Works, Finally

I finally got Bison to work for the first time for LIR Compiler.
The error was in the makefile itself - a rule that, though incorrect, had escaped my notice because it worked for the earlier version using only flex.
What I was doing to build main.o was -
$(CC) $(CFLAGS) $^ -o $@
where the dependencies were a .cc file and a library.
By a strange coincidence, this makefile worked fine with my earlier attempts using only flex. With bison included as well, things got a little messier, since I had to add the token header tok.h to the dependencies for main.cc. The above line broke down in that case, presumably because $^ expands to every dependency, so tok.h was handed to the compiler as an input too.
The remedy was to replace this line with what should have been there in the first place -
$(CC) $(CFLAGS) -c main.cc -o main.o

and then, include the other libraries in the linking phase when all the .o files were linked.
The makefile and compilation process is starting to make more sense now, although I still don't know what lies inside the .a files.

Once the build is working fine, what remains is the task of building the parser, which is the fun part. So I should be making good progress with the project in the coming days.

Friday, April 17, 2009

Updates

I have been trying to get Bison to work along with flex using the basic 4-line snippet that was used initially, but I haven't had much success. I haven't been able to devote much time to the project: with the end of the semester near, all the curriculum projects are in a rush, and a lot of practical vivas and presentations are due. I will try to find some time to write some code this weekend, but the coming month is going to be very hectic.

Saturday, April 11, 2009

Updates - LIR Compiler

I am currently reading up on Bison. I have today and tomorrow off, so I can get a lot of reading done in these two days, and hopefully I will then have sufficient skills to write the parser.
I am really enjoying reading about Bison - the way it parses is very natural yet error free.

Monday, April 6, 2009

Name Directory - Beta Implementation

So after about 4-5 hours of slogging it out to implement the Name Directory I mentioned in my previous post, I have a working implementation. It is a not-so-beautiful piece of code, but it works!
Take this LIR snippet for instance -
start
two = int 2
twoPlusTwo = add two, two
three = int 3
threePlusThree = add three, three
five = int 5
fpf = add five, five
threePlusFive = add three, five
ret threePlusFive
It uses three named immediates, two, three, and five, along with other values calculated from these immediates. From the parser you can now reference any value (which should have been the case in the first place) from anywhere in the snippet, and use it. For example, here we use the immediates three and five in threePlusFive, and return that value. The earlier problem of only the most recent result staying in the result variable has also been removed. The parser looks up the named variable in the directory and, if it is found, returns it.

I will illustrate this with another example. Take this LIR for instance -
start
two = int 2
twoPlusTwo = add two, two
three = int 3
threePlusThree = add three, three
five = int 5
fpf = add five, five
threePlusFive = add three, five
temp = add threePlusFive, three
ret temp
As it should, the temp variable returns 11.
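
The lookup behaviour described above can be sketched as a small stand-alone C++ evaluator. This is not the project's flex-based code: evaluate is a hypothetical helper that handles only start, int, add and ret, and it computes plain integers instead of emitting LIR through nanojit.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical evaluator: runs a LIR-like snippet, keeping every named
// value in a directory so later lines can refer back to any of them.
int evaluate(const std::string &source) {
    std::map<std::string, int> directory;   // name -> computed value
    std::istringstream lines(source);
    std::string line;

    // Look a name up (dropping a trailing comma such as in "two,"),
    // failing loudly if the name was never defined.
    auto lookup = [&](std::string name) -> int {
        if (!name.empty() && name.back() == ',') name.pop_back();
        auto it = directory.find(name);
        if (it == directory.end())
            throw std::runtime_error("undefined name: " + name);
        return it->second;
    };

    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string first;
        if (!(words >> first) || first == "start") continue;

        if (first == "ret") {               // ret <name>
            std::string name;
            words >> name;
            return lookup(name);
        }

        std::string equals, op;             // <name> = <op> <operands>
        words >> equals >> op;
        if (equals != "=")
            throw std::runtime_error("expected '=': " + line);

        if (op == "int") {
            int value;
            if (!(words >> value))
                throw std::runtime_error("bad integer: " + line);
            directory[first] = value;
        } else if (op == "add") {
            std::string lhs, rhs;
            words >> lhs >> rhs;
            directory[first] = lookup(lhs) + lookup(rhs);
        } else {
            throw std::runtime_error("unsupported instruction: " + op);
        }
    }
    throw std::runtime_error("snippet has no ret");
}
```

Fed the second snippet above, evaluate works through threePlusFive = 8 and temp = 8 + 3, and hands back 11 at the ret line.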

The code snippet can be found here.
Would surely love to hear on this.

Sunday, April 5, 2009

Name Directory

I am currently trying to implement a name directory - basically, each expression in JIT is named. It is customary to implement a name(label) array, linked with the expression value, so that whenever a label is referenced, it can be searched within that array, and if found, the resulting expression can be used in the context of the call.
I am basically trying to get this LIR snippet to work -
start
two = int 2
twoPlusTwo = add two, two
three = int 3
threePlusThree = add three, three

ret twoPlusTwo
The current version of the parser will return the output of the latest computation only, since it uses only one variable. What is needed is a name directory. An instruction like two = int 2, or in general any line starting with a match for
[a-zA-Z]+" = ", is an expression of the assignment form. On seeing this, the parser should create an entry in the directory with this parameter name if it doesn't already exist, and then write the corresponding expression value into the parameter value array. If the entry already exists, its value should be overwritten with the new one.
Hence, searching must be implemented so as to figure out whether a given name exists or not.
As for the return statement, it must return a variable name. Hence, searching must be employed there as well: the given parameter name is looked up and, if found, the desired value is returned via the LIR_ret command; otherwise, an error occurs.
This is the basic outline I will try to implement currently.
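
A minimal sketch of that outline in C++, assuming the two parallel arrays described above and a simple linear search. NameDirectory, define and find are hypothetical names, and plain int values stand in for whatever the real parser would store per name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class NameDirectory {
public:
    // Handle `name = <expression>`: create the entry if it is new,
    // otherwise overwrite the stored value.
    void define(const std::string &name, int value) {
        std::size_t i = indexOf(name);
        if (i == names_.size()) {
            names_.push_back(name);
            values_.push_back(value);
        } else {
            values_[i] = value;
        }
    }

    // Handle a reference such as `ret name`: returns false when the
    // name is unknown so the caller can report an error.
    bool find(const std::string &name, int &out) const {
        std::size_t i = indexOf(name);
        if (i == names_.size()) return false;
        out = values_[i];
        return true;
    }

private:
    // Linear search; returns names_.size() when the name is absent.
    std::size_t indexOf(const std::string &name) const {
        assert(names_.size() == values_.size());
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == name) return i;
        return names_.size();
    }

    std::vector<std::string> names_;   // parameter names
    std::vector<int> values_;          // parameter values, same index
};
```

A linear search is fine for snippets of this size; a hash map would be the obvious replacement if the directory ever grew large.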

Saturday, March 28, 2009

LIR Compiler - Goal 1 reached

I got the lex file to compile and work for the small program to add two integers. I must say the flex script isn't really flexible at the moment, and really, it is the hardcore 0th level at which you can get Goal 1 done, but I can say that Goal 1 has been done. Now, I will try to build upon the code to improve the lexical analysis, and hopefully will get a good working parser in the coming weeks. Comments on the work are welcome, especially from those who are following the project.

The lex source code is posted here

Updates

Alright, I have been a little lax on posting updates regarding the project partly because I have not been able to get much done, and I had exams.
In my last post, I had mentioned that I had made the temporary source code ready which would take two numbers and add them, and output the result. In the past week, I wrote down the flex script to convert the little LIR instructions into their corresponding statements in the source. But so far, I haven't gotten much success with flex. The .c file generated by flex doesn't seem to compile, giving an error:
Undefined reference to yywrap().
I am trying to read up on flex more, so that I can get some insight on what I am doing wrong. I will try to post on their mailing lists as well, since I couldn't find any IRC channels or anything.
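
One common cause of this link error: by default a flex-generated scanner calls yywrap() when it reaches the end of its input, and expects either the flex support library (linked with -lfl) or the program itself to provide it. Assuming that is what is happening here, a minimal definition looks like the sketch below; adding %option noyywrap to the .l file is the other usual way out.

```cpp
// Called by the flex-generated scanner at end of input.
// Returning 1 means "no more input files, stop scanning".
extern "C" int yywrap(void) { return 1; }
```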

I have also posted information about Guards on https://developer.mozilla.org/En/Nanojit. There is not much information yet, and I am trying to get the tracer output so as to illustrate guards (and where they occur) in a more transparent manner.

Sunday, March 1, 2009

Updates for LIR Compiler

Currently, for the LIR Compiler project, I need to make a parser for LIR, which can then generate the required C code to then compile LIR.
Part of this task is also to see how nanojit currently uses LIR. It is pretty simple once you refer to the code snippet provided here - https://developer.mozilla.org/En/Nanojit

The interesting part is lines 38-44:
// Write a few LIR instructions to the buffer: add the first parameter
// to the constant 2.
writer.ins0(LIR_start);
LIns *two = writer.insImm(2);
LIns *firstParam = writer.insParam(0, 0);
LIns *result = writer.ins2(LIR_add, firstParam, two);
writer.ins1(LIR_ret, result);
Basically, what the code provided above is doing is feeding raw LIR into the LIR Buffer, using the LIRWriter's writer object. From an operational point of view, it is creating a function, which takes an integer input, and adds it to two, and outputs the result. The function is created here on lines 57-69:
// Compile the fragment.
compile(fragmento->assm(), f);
if (fragmento->assm()->error() != None) {
    fprintf(stderr, "error compiling fragment\n");
    return 1;
}
printf("Compilation successful.\n");

// Call the compiled function.
typedef JS_FASTCALL int32_t (*AddTwoFn)(int32_t);
AddTwoFn fn = reinterpret_cast<AddTwoFn>(f->code());
printf("2 + 5 = %d\n", fn(5));
return 0;

The upper half of this snippet is where the raw LIR is first converted into machine code (basically, where compile(fragmento->assm(), f); is called).

Then a pointer to a function is declared; the function takes an int as input and returns the sum of that parameter and two:
typedef JS_FASTCALL int32_t (*AddTwoFn)(int32_t);

Then printf is hardcoded to call it with the parameter 5, and once linked with the nanojit library, the program will display
2 + 5 = 7
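
The cast-and-call step can be tried without nanojit. In the sketch below, an ordinary C++ function stands in for the generated machine code, and fake_code() is a hypothetical stand-in for f->code(). JS_FASTCALL is left out because it is a calling-convention macro from the SpiderMonkey headers.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the typedef in the snippet, minus JS_FASTCALL.
typedef std::int32_t (*AddTwoFn)(std::int32_t);

// Stand-in for the machine code nanojit would have generated.
static std::int32_t add_two(std::int32_t x) { return x + 2; }

// Hypothetical stand-in for f->code(): hands back an untyped address.
// Casting between function pointers and void * is only conditionally
// supported by the C++ standard, but works on mainstream platforms.
void *fake_code() { return reinterpret_cast<void *>(&add_two); }

// Give the address its function type back, then call through it.
std::int32_t call_compiled(std::int32_t arg) {
    AddTwoFn fn = reinterpret_cast<AddTwoFn>(fake_code());
    assert(fn != nullptr);
    return fn(arg);
}
```

The typedef is what tells the compiler how to pass the argument and read the result; the address on its own carries no such information.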
Now, what I need to do is generate output for this:

start
two = int 2
twoPlusTwo = add two, two
ret twoPlusTwo


This adds two and two in the most hardcoded way possible. The conversion from LIR to a program like one shown above is the task of the parser.
What the parser needs to generate for the above code is this (in raw form, changes only):

writer.ins0(LIR_start);
LIns *two = writer.insImm(2);
LIns *firstParam = writer.insImm(2);
LIns *result = writer.ins2(LIR_add, firstParam, two);
writer.ins1(LIR_ret, result);

and
// Call the compiled function.
typedef JS_FASTCALL int32_t (*AddTwoFn)();
AddTwoFn fn = reinterpret_cast<AddTwoFn>(f->code());
printf("2 + 2 = %d\n", fn());
return 0;

Two minor changes, and we have a working end product for the parser to produce, completing Goal 0.
Regarding the parser, I am planning to use Flex, as suggested by jorendorff.
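
A rough sketch of that translation step, written as plain C++ instead of a flex scanner. translate_line and strip_comma are hypothetical helpers that cover only the four instruction forms used in this post; the writer.* calls they produce are the ones from the snippet above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Strip one trailing comma, so "two," becomes "two".
static std::string strip_comma(std::string word) {
    if (!word.empty() && word.back() == ',') word.pop_back();
    return word;
}

// Hypothetical helper: turn one line of LIR text into the C++ statement
// that feeds the same instruction to the writer. Returns an empty
// string for lines it does not understand.
std::string translate_line(const std::string &line) {
    std::istringstream words(line);
    std::string first;
    if (!(words >> first)) return "";

    if (first == "start") return "writer.ins0(LIR_start);";

    if (first == "ret") {                   // ret <name>
        std::string name;
        if (!(words >> name)) return "";
        return "writer.ins1(LIR_ret, " + name + ");";
    }

    std::string equals, op;                 // <name> = <op> <operands>
    if (!(words >> equals >> op) || equals != "=") return "";

    if (op == "int") {
        std::string literal;
        if (!(words >> literal)) return "";
        return "LIns *" + first + " = writer.insImm(" + literal + ");";
    }
    if (op == "add") {
        std::string lhs, rhs;
        if (!(words >> lhs >> rhs)) return "";
        return "LIns *" + first + " = writer.ins2(LIR_add, " +
               strip_comma(lhs) + ", " + strip_comma(rhs) + ");";
    }
    return "";
}
```

Unlike the hand-written version above, this sketch reuses the LIR names as the C++ variable names, so twoPlusTwo = add two, two becomes LIns *twoPlusTwo = writer.ins2(LIR_add, two, two);.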

Wednesday, February 18, 2009

Goal 0 Completed

Project reference: http://wiki.mozilla.org/LIR_compiler

Goal 0 was completed on Monday - I successfully linked libjs_static.a with a custom-written program. Earlier, I was trying to build libjs_static.a from scratch all over again, since I thought that was the basic purpose of the makefile. As it turns out, that library is not that easy to build (so I realised, the hard way), and I figured that for my first makefile I wouldn't have been given such a difficult task to complete on my own! Putting two and two together and finally making four, I wrote a new makefile that links my program with libjs_static.a, which in turn is built using the default tools already provided with the source tree.

My project directory will be /js/src/build-debug/lirc. The makefile and custom program reside there itself. The makefile used is:
#First makefile!
CC=g++
CPPFLAGS=-DDEBUG -g3 -include ../mozilla-config.h -I../../nanojit -I ../dist/include/js -I../
all: jittest
jittest: jittest.cpp ../libjs_static.a
	$(CC) $(CPPFLAGS) -o $@ $^
clean:
	rm *.o
	rm jittest
TODO: embed VPATH in makefile itself.

Now, task 0 completed, I have to build a parser for raw LIR instructions, and call the corresponding jit methods to convert the instruction into machine code. More on that later.

Thursday, February 12, 2009

Goal 0: step 0!

My project is to develop a standalone compiler for LIR that converts LIR instructions into the corresponding binary code using the nanojit library (https://wiki.mozilla.org/LIR_compiler).

So far, I have made progress on Goal 0 of the project. Goal 0 was:
Write, compile and link a small C++ program with the Nanojit library.
The main aim is to get a program to link with the Nanojit library. The program I use just includes LIR.h and returns 0 from main.

So far, a working makefile has been developed, although the linking part is not yet complete. Again, I ran into some trouble with the usage of my easy build, and the makefile reported some files missing, namely jsautocfg.h and a few others.

jorendorff pointed me to a few links yesterday to correct this problem, and I am going to try to get a working build during the weekend.