
Implementation

Daniel Șuteu edited this page May 28, 2014 · 14 revisions

This page describes how the Sidef language is implemented: how the code is parsed and executed, along with other internal details of the language.

  • The most elementary data types of Sidef are:

  • bool

  • nil

  • string

  • number

  • char

  • byte

  • regexp

  • file

  • directory

  • pipe

  • From these types, more complex types can be created, like:

  • array

  • pair

  • chars

  • bytes

  • hash

  • block

  • file handle

  • file stat

  • directory handle

  • pipe handle

Parsing of data-type objects

The elementary types are created at compile-time, while the complex types come into existence only at run-time. In this section we are going to look at how data types are parsed and stored in the data structure. Let's consider the following statement:

'this is a string'.say;
  • The main phases of the parser are:
  1. object (+indices)?
  2. (method (+arguments)? (+indices)?)(s?)

This means that, in the first phase, the parser expects an object followed by zero or more array-like indices ([]). In the second phase, it expects zero or more methods for the self object. Each method can have arguments and/or array-like indices for the returned object.

The object-phase parses all the data types of the language, as well as other kinds of objects, which we will talk about in the next section.

  • Here is an example of some data types:
'abc';               # string
false;               # bool
1234.56;             # number
[1,2,3,4];           # array
/^[a-z_]+/;          # regex

...and many others.

The Perl code for the String object looks like this:

package Sidef::Types::String::String {
    sub new {
        my (undef, $str) = @_;
        bless \$str;
    }

    # ... more methods here ...

    sub say {
        my ($self) = @_;
        print "$$self\n";
    }
}

The parser starts from the beginning of the provided code and looks for an object.

'this is a string'.say;
^

When ' is encountered, it already knows that this must be a string, so it takes its content:

'this is a string'.say;
 ^--------------^

and returns a new Perl object that wraps the string:

Sidef::Types::String::String->new('this is a string');

The position is advanced by the length of the parsed code, and the object is then added to the data structure as:

{
    self => Sidef::Types::String::String->new('this is a string')
}
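The step above can be sketched as follows. This is only an illustration, assuming a regex-based scanner that tracks its position with pos() and the \G anchor. The helper parse_string_object and the stand-in class My::String are hypothetical names, escape sequences are ignored, and the real Sidef parser is more involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Sidef::Types::String::String.
package My::String {
    sub new {
        my (undef, $str) = @_;
        return bless \$str, __PACKAGE__;
    }
}

# Try to parse a single-quoted string at the current position of $$code_ref.
# Returns a node { self => <object> } on success, or nothing on failure.
sub parse_string_object {
    my ($code_ref) = @_;

    # \G anchors the match at pos(); with /gc, pos() is advanced on success
    # and left unchanged on failure.
    if ($$code_ref =~ /\G'([^']*)'/gc) {
        return {self => My::String->new($1)};
    }
    return;
}

my $code = q{'this is a string'.say;};
my $node = parse_string_object(\$code);

print ${$node->{self}}, "\n";    # the content between the quotes
print pos($code), "\n";          # position just after the closing quote
```

Keeping the position inside the scalar itself (via pos()) means the next phase can simply continue matching with \G from where this one stopped.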

Now, the parser enters the second phase, where it expects zero or more methods for the self object:

'this is a string'.say;
                  ^
  • In Sidef we have three method separators: ., -> and whitespace, each with a special meaning:
  • dot (.) appends the method to the object that precedes the dot
  • arrow (->) appends the method to the object returned by the whole left-side expression
  • whitespace appends the method on the left-side object (almost like .)

In our code, we have ., which tells the parser to add the method to the object that precedes the dot.

'this is a string'.say;
                   ^-^

so it adds the method to the data structure as:

{
    call => [{method => 'say'}],
    self => Sidef::Types::String::String->new('this is a string')
}
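A minimal sketch of this method-phase, handling only the dot separator (-> and whitespace are left out). The helper name parse_methods and the identifier pattern are assumptions made for the example, not taken from the Sidef source.

```perl
use strict;
use warnings;

# Parse zero or more ".method" calls starting at pos() of $$code_ref and
# append them to $node->{call}. Only the dot separator is handled here.
sub parse_methods {
    my ($code_ref, $node) = @_;

    # The identifier pattern is an assumption made for this sketch.
    while ($$code_ref =~ /\G\.([A-Za-z_]\w*)/gc) {
        push @{$node->{call}}, {method => $1};
    }
    return $node;
}

my $code = q{'this is a string'.say;};
pos($code) = 18;    # pretend the object-phase already consumed the string

# A plain Perl string stands in for the parsed String object.
my $node = parse_methods(\$code, {self => 'this is a string'});

print join(', ', map { $_->{method} } @{$node->{call}}), "\n";
```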

When the parser encounters ;, it knows that this is the end of an expression, so it starts over from the object-phase, expecting an object. This time it encounters the end of the code. Before returning the data structure, it performs some checks (and, optionally, some optimizations), then it returns:

my $structure = {
    main => [
        {
            call => [{method => 'say'}],
            self => Sidef::Types::String::String->new('this is a string'),
        }
    ]
};
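Putting the two phases together, a toy parser that produces this kind of structure might look like the sketch below. It assumes that only single-quoted strings, dot-separated methods and ; exist. The function parse and the class My::String are hypothetical, and the checks and optimizations mentioned above are not modelled.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Sidef::Types::String::String.
package My::String {
    sub new { my (undef, $str) = @_; return bless \$str, __PACKAGE__ }
}

sub parse {
    my ($code) = @_;
    my @statements;

    pos($code) = 0;
    while (1) {
        $code =~ /\G\s+/gc;                     # skip whitespace
        last if pos($code) >= length($code);    # end of code

        # Object-phase: only single-quoted strings in this sketch.
        $code =~ /\G'([^']*)'/gc
          or die "expected an object at position " . pos($code) . "\n";
        my %node = (self => My::String->new($1));

        # Method-phase: zero or more ".method" calls.
        while ($code =~ /\G\.([A-Za-z_]\w*)/gc) {
            push @{$node{call}}, {method => $1};
        }

        # End of expression.
        $code =~ /\G\s*;/gc
          or die "expected ';' at position " . pos($code) . "\n";

        push @statements, \%node;
    }
    return {main => \@statements};
}

my $structure = parse(q{'this is a string'.say; 'abc'.say;});
print scalar(@{$structure->{main}}), " statements parsed\n";
```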

Next, the data structure is sent to the executor, which traverses it step by step and executes the code by calling the object methods. It starts in the main namespace and iterates over the statements. If a statement has a self object, the executor checks whether it also has a call key, then iterates over the methods in that array, calling each method on the object returned by the previous method.

A basic implementation of the executor looks like this:

foreach my $statement (@{$structure->{main}}) {
    if (exists $statement->{self}) {
        my $self = $statement->{self};
        if (exists $statement->{call}) {
            foreach my $call (@{$statement->{call}}) {
                my $method = $call->{method};
                $self = $self->$method;
            }
        }
    }
}
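The loop above calls every method without arguments. A possible extension, sketched here, passes along the optional arg list that the parser attaches to a call (the arg key appears in the last section of this page). It assumes the arguments are already-built objects. The class My::String with its concat and upper methods, and the function execute, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; each method returns an object so calls can chain.
package My::String {
    sub new    { my (undef, $str) = @_; return bless \$str, __PACKAGE__ }
    sub upper  { my ($self) = @_; return My::String->new(uc $$self) }
    sub concat { my ($self, $other) = @_; return My::String->new($$self . $$other) }
}

sub execute {
    my ($structure) = @_;
    my @results;

    foreach my $statement (@{$structure->{main}}) {
        next unless exists $statement->{self};
        my $self = $statement->{self};

        foreach my $call (@{$statement->{call} // []}) {
            my $method = $call->{method};
            my @args   = @{$call->{arg} // []};    # arguments are optional
            $self = $self->$method(@args);         # the result becomes the new self
        }
        push @results, $self;
    }
    return @results;
}

my $structure = {
    main => [
        {
            self => My::String->new('abc'),
            call => [
                {method => 'concat', arg => [My::String->new('def')]},
                {method => 'upper'},
            ],
        }
    ]
};

my ($result) = execute($structure);
print "$$result\n";
```

Collecting the final object of each statement is a choice made for this sketch, so that the outcome can be inspected; the loop shown above simply discards it.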

Parsing of abstract objects

The object-phase also handles some other types of objects, called abstract objects. Some examples are if, while, break and return.

To parse an abstract object, the parser does not consume its input. It only looks ahead for a special keyword (e.g. while) and returns the corresponding Perl object.

For example, if the return keyword is encountered:

return('abc');
^

an object will be returned:

Sidef::Types::Block::Return->new;

The data-structure after the object-phase will look like this:

{
    self => Sidef::Types::Block::Return->new
}
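One way to recognise a keyword without consuming it is sketched below, assuming the same pos()-based scanner as before. The keyword table, the helper parse_abstract_object and the stand-in classes My::Return and My::While are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the abstract-object classes.
package My::Return { sub new { return bless {}, shift } }
package My::While  { sub new { return bless {}, shift } }

# Keyword => class table; only two keywords are listed in this sketch.
my %abstract = (
    'return' => 'My::Return',
    'while'  => 'My::While',
);

sub parse_abstract_object {
    my ($code_ref) = @_;

    foreach my $keyword (sort keys %abstract) {
        # No /g modifier here: \G still anchors at pos(), but pos() itself
        # is not advanced, so the keyword stays in the input.
        if ($$code_ref =~ /\G\Q$keyword\E\b/) {
            return {self => $abstract{$keyword}->new};
        }
    }
    return;
}

my $code = q{return('abc');};
pos($code) = 0;
my $node = parse_abstract_object(\$code);

print ref($node->{self}), "\n";
print pos($code), "\n";    # still at the start of the keyword
```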

After the object-phase, the parser enters the method-phase. Because we are still at the same position, the keyword is read again, so the method has the same name as the self object.

return('abc');
^----^

Structure after the method-phase:

{
    call => [{method => 'return'}],
    self => Sidef::Types::Block::Return->new
}

The method-phase includes an optional sub-phase, called the argument-phase:

return('abc');
      ^

The argument-phase expects one or more objects, so it enters the object-phase, which parses the string and returns it to the method-phase:

Sidef::Types::String::String->new('abc')

...and added as argument for the return method:

{
    call => [{method => 'return', arg => [Sidef::Types::String::String->new('abc')]}],
    self => Sidef::Types::Block::Return->new
}
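The three steps for return('abc') can be sketched in one function. It assumes that arguments are single-quoted strings separated by commas. The helper parse_return and the classes My::Return and My::String are hypothetical stand-ins, not the actual Sidef code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Sidef classes.
package My::Return { sub new { return bless {}, shift } }
package My::String { sub new { my (undef, $str) = @_; return bless \$str, __PACKAGE__ } }

sub parse_return {
    my ($code_ref) = @_;

    # Object-phase: recognise the keyword without consuming it (no /g).
    return unless $$code_ref =~ /\Greturn\b/;
    my %node = (self => My::Return->new);

    # Method-phase: the same keyword is now consumed as the method name.
    $$code_ref =~ /\G(return)\b/gc or return;
    my %call = (method => $1);

    # Argument-phase: optional parenthesised list of single-quoted strings.
    if ($$code_ref =~ /\G\(/gc) {
        while ($$code_ref =~ /\G\s*'([^']*)'\s*,?/gc) {
            push @{$call{arg}}, My::String->new($1);
        }
        $$code_ref =~ /\G\s*\)/gc or die "expected ')'\n";
    }

    $node{call} = [\%call];
    return \%node;
}

my $code = q{return('abc');};
pos($code) = 0;
my $node = parse_return(\$code);
my $call = $node->{call}[0];

print $call->{method}, "\n";
print ${$call->{arg}[0]}, "\n";
```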
