Skip to content

iekilinc/nit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

93 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Inspirations

Stages

When ./cmd/x86/main.go is invoked, input nit source code goes through 5 stages to produce an executable. Usage of this command:

  -asm
        only output assembly; do not output object file or executable
  -asm-path string
        (optional) path to the output assembly file. defaults to [base-path].asm
  -file string
        path to a file containing nit source code
  -obj-path string
        (optional) path to the output file. defaults to [base-path].o
  -out string
        (optional) path to the output executable. defaults to "[base-path].exe"
  -run
        run the generated executable

base-path refers to the name of the .nit file with its extension stripped. However, if the input file does not end in .nit, base-path refers to the name of this file itself.

1. Lexing

The lex package's Lexer takes in or reads nit source code and breaks it up into tokens. For example,

fun sum(a, b int) int {
    let c = 0 // comment are ignored
    return a + b + c
}

Gets translated into

type Token struct {
	Kind    Kind
	Literal string
    // ... (omitted for simplicity)
}

{Fun    "fun"} {Ident  "sum"} {LParen  "("}
{Ident  "a"}   {Comma  ","}    {Ident   "b"}  {Ident "int"}
{RParen ")"}   {LCurly "{"}    {Newline "\n"}

{Let "let"} {Ident "c"} {Assign "="} {Number "0"} {Newline "\n"}

{Return "return"} {Ident   "a"}  {Plus "+"} {Ident "b"} {Plus "+"}
{Ident  "c"}      {Newline "\n"}

{RCurly "}"} {Eof ""}

When the lex package is used by itself, a *Lexer can be created and used in the following way:

// Create a *Lexer
func New(input string, filename string) *Lexer
func NewFromReader(rd io.Reader, filename string) (*Lexer, error)
func NewFromFilename(name string) (*Lexer, error)

// Eg, iterate through tokens and print each

import (
	"log"
	"nit/lex"
)

func main() {
	lexer, err := lex.NewFromFilename("./examples/ret2.nit")
	if err != nil {
		log.Fatalln(err)
	}

	numTokens := 0
	for {
		token := lexer.EatToken()

		log.Println(token)

		numTokens += 1
		if token.Kind == lex.Eof {
			// Reached end of file
			break
		}
	}
	log.Printf("\ngot %d token(s)\n", numTokens)
}

2. Parsing: AST (Abstract Syntax Tree) generation

The ast package's Parser uses lex.Lexer to step through each token and produce a recursive tree structure for later stages to be able to easily step through the code.

The parser would produce roughly the following (simplified for demonstration).

&Fun{
    Name:    &Ident{"sum"},
    Params:  []FunParam{
        {Name: &Ident{"a"}, Type: &Ident{"int"}},
        {Name: &Ident{"b"}, Type: &Ident{"int"}}},
    RetType: &Ident{"int"},
    Body: &Block{
        Stmts: []Stmt{
            &Let{
                Name:  &Ident{"c"},
                Value: &Int{0}},
            &Return{
                Vals: []Expr{
                    &BinaryExpr{
                        Left:     &Ident{"a"},
                        Operator: lex.Token{Kind: lex.Plus},
                        Right:    &BinaryExpr{
                            Left:     &Ident{"b"},
                            Operator: lex.Token{Kind: lex.Plus},
                            Right:    &Ident{"c"}}}}}}}}

When the ast backage is used by itself, a *Parser can be created and user in the following way:

// Create a *Parser
func NewParser(input string, filename string) *Parser {
func NewParserFromFilename(filename string) (*Parser, error) {
func NewParserFromReader(rd io.Reader, filename string) (*Parser, error)

// Eg, print the AST

import (
	"log"
	"nit/ast"
)

func main() {
	parser, err := ast.NewParserFromFilename("./examples/ret2.nit")
	if err != nil {
		log.Fatalln(err)
	}

	program, err := parser.ParseProgram()
	if err != nil {
		log.Fatalln(err)
	}

	log.Println(program)

	log.Printf("\ngot %d top-level statement(s)\n", len(program.Stmts))
}

3. Semantic analysis

The seman (semantic **an**alysis) package walks the raw, untyped AST generated by ast.Parser validates that the language's semantics are adhered to. That is, it type-checks the AST. It makes sure that, among other things,

  • sum is not a name that is already taken (for instance, by another global function or variable)
  • sums parameters a and b actually have a specified type
  • c is not a taken name, and 0 is an expression that produces a type (as opposed to, for example, a call to a function that returns nothing)
  • a + b is a valid operation since both sides are of the same type, and ints support the binary operation +
  • c is in scope and its value can likewise be added to the result of a + b
  • sum returns a single int as it declared it would to so

4. Assembly generation

The asmgen package generates x86_64 assembly which invokes Linux system calls (to achieve things like exiting the program). The generated assembly follows the design of a simple stack machine. The generated x86_64 assembly would look something like this:

global sum
sum:
    ; set up stack frame
    ; redundant push in nit's calling convention? probably
    push rbp
    mov rbp, rsp

    ; allocate local variables
    ; in this case, the only variable is c and it is of type int (size 8)
    sub rsp, 8

    push 0             ; stack = [0]
    pop rax            ; rax = 0; stack = []
    mov [rbp - 8], rax ; c = 0

    ; function parameters are pushed to the stack just before the function is
    ; called, so we access them using the base pointer + an offset, skipping
    ; over the return address.
    ;  - rbp + 8 (upwards) contains the return address
    ;  - rbp + 16 contains b
    ;  - rbp + 24 contains a

    mov rax, [rbp + 24] ; rax = a
    push rax            ; stack = [a]

    mov rax, [rbp + 16] ; rax = b
    push rax            ; stack = [a, b]

    pop rcx      ; rcx = b; stack = [a]
    pop rax      ; rax = a; stack = []
    add rax, rcx ; rax += rcx
    push rax     ; stack = [a+b]

    mov rax, [rbp - 8] ; rax = c
    push rax           ; stack = [a+b, c]

    pop rcx      ; rcx = c; stack = [a+b]
    pop rax      ; rax = a+b; stack = []
    add rax, rcx ; rax += rcx
    push rax     ; stack = [a+b+c]

    ; the caller grows its stack to accomodate the function's return values,
    ; and the callee places the values in those slots
    pop rax             ; rax = a+b+c, stack = []
    mov [rbp + 32], rax ; return_slot = a+b+c

    ; deallocate stack frame and return
    mov rsp, rbp
    pop rbp
    ret

nit has a simple calling convention:

  • The caller grows its stack to make space for the return values. The callee places the return values in these slots using the base pointer.
  • The caller pushes its arguments to the stack. The callee accesses these arguments using the base pointer.

The order of both the arguments and the return values is as declared in the source code: left-to-right in nit: top-to-bottom in the stack.

This step generates an .asm file (if its io.Writer was indeed a handle to a file).

5. Binary machine code generation

The x86 package

  1. Invokes the host's nasm to assemble the assembly file into an ELF64 .o object file.
  2. Then, it passes the .o file to the host's gcc to link the object file via ld and generate the final executable, ready to be run.
  3. In addition, if a main function was declared, an executable file containing main as its entry point will be produces, ready to be run on an x64_64 Linux environment.

I don't like that intermediate files have to be created for the .asm and .o files before the executable is able to be generated; however, I was not able to find library forms of gcc or nasm for Go. I am considering switching to another assembler or writing C bindings myself in order to prevent generating intermediate/temporary files, since disk read/writes are much slower than memory manipulation.

About

A toy language that compiles to x86_64.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages