
TinyC-like Compiler

Overview


  • TinyC basics
  • frontend
  • backend

TinyC basics


tokens in tinyC

According to their lexical characteristics, the tokens in tinyC are divided into the following three categories.

  1. single-char operators (15 kinds)

    + * - / % = , ; ! < > ( ) { }
  2. double-char operators (6 kinds) and keywords (10 kinds)

    <= >= == != && ||
    void int while if else return break continue print readint
  3. integer constants, string constants and identifiers (variable and function names) (3 kinds)

numbering principle

  • single-char operators: the token number is the value of the character itself (e.g. '+' is 43).
  • all other tokens: numbered consecutively, starting from 256.
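Here is a minimal sketch of this numbering as a C enum. It assumes a hand-written scanner. Only T_Int, T_Print and T_Identifier come from this post. The other token names, the keyword table and the helpers word_token and op_token are hypothetical. If the tokens are declared with bison's %token instead, bison normally assigns the numbers itself.

```c
#include <string.h>

/* Hypothetical token numbers for a hand-written tinyC scanner.
 * Single-char operators need no entry: their token number is the
 * character value itself. Every other token starts at 256, above any
 * unsigned char value, so it cannot collide with a single-char operator.
 * Only T_Int, T_Print and T_Identifier appear in the post. */
enum {
    /* double-char operators (6 kinds): <= >= == != && || */
    T_Le = 256, T_Ge, T_Eq, T_Ne, T_And, T_Or,
    /* keywords (10 kinds) */
    T_Void, T_Int, T_While, T_If, T_Else, T_Return,
    T_Break, T_Continue, T_Print, T_ReadInt,
    /* integer constant, string constant, identifier (3 kinds) */
    T_IntConstant, T_StringConstant, T_Identifier
};

/* Keyword spellings, in the same order as T_Void .. T_ReadInt. */
static const char *const keywords[] = {
    "void", "int", "while", "if", "else", "return",
    "break", "continue", "print", "readint"
};

/* Maps a word to its keyword token, or to T_Identifier otherwise. */
static int word_token(const char *word) {
    for (int i = 0; i < (int)(sizeof keywords / sizeof keywords[0]); i++)
        if (strcmp(word, keywords[i]) == 0)
            return T_Void + i;
    return T_Identifier;
}

/* Token number of a single-char operator: the character's own value. */
static int op_token(char c) {
    return (unsigned char)c;
}
```

For example, word_token("while") yields T_While (264) and word_token("whilex") falls back to T_Identifier. op_token('+') is simply 43.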

notes

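  • When writing the rule for single-char operators, pay attention to the character '-' in the regular expression. Inside a character class, an unescaped '-' denotes a range, so +-* would mean "from '+' to '*'". That is a reversed range, and flex rejects it.

    (wrong) : OPERATOR    [+-*/%=,;!<>(){}]
    (right) : OPERATOR    [+\-*/%=,;!<>(){}]    <-- '-' must be escaped in the regular expression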
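The C sketch below shows why the unescaped version fails. It is a simplified check of the ranges in a character class. class_ranges_ok is a hypothetical helper, not flex's actual parser. It understands only literal characters, single-character backslash escapes and a-b ranges. A range is reversed when its start character is greater than its end character, and '+' (43) comes after '*' (42).

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified check of a flex-style character class body (the text
 * between '[' and ']'). Handles literal chars, "\x" escapes and "a-b"
 * ranges only. Returns false if any range is reversed (start > end). */
static bool class_ranges_ok(const char *body) {
    size_t i = 0;
    while (body[i] != '\0') {
        unsigned char lo;
        if (body[i] == '\\' && body[i + 1] != '\0') {  /* escaped: literal char */
            lo = (unsigned char)body[i + 1];
            i += 2;
        } else {
            lo = (unsigned char)body[i];
            i += 1;
        }
        /* An unescaped '-' followed by another char forms the range lo-hi. */
        if (body[i] == '-' && body[i + 1] != '\0') {
            unsigned char hi;
            if (body[i + 1] == '\\' && body[i + 2] != '\0') {
                hi = (unsigned char)body[i + 2];
                i += 3;
            } else {
                hi = (unsigned char)body[i + 1];
                i += 2;
            }
            if (lo > hi)
                return false;  /* e.g. "+-*": '+' (43) > '*' (42) */
        }
    }
    return true;
}
```

An accidental range that happens to be in order is worse, because it passes silently. In [+*-/], the range *-/ covers * + , - . /, so '.' would also be accepted as an operator.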

Frontend

File structure of the tools


flex

%{
Declarations
%}
Definitions
%%
Rules
%%
User subroutines

bison

%{
Declarations
%}
Definitions
%%
Productions
%%
User subroutines

Details


scanner.l

  1. While implementing scanner.l, I found that the order of the rules matters: different orders give different results, and some give wrong ones. For example, the first order is:

    "int"           { return T_Int; }
    "print"         { return T_Print; }
    ...
    {IDENTIFIER}    { _DUPTEXT; return T_Identifier; }

    The second order is:

    {IDENTIFIER}    { _DUPTEXT; return T_Identifier; }
    ...
    "int"           { return T_Int; }
    "print"         { return T_Print; }

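    The second order is wrong: the scanner no longer recognizes int as a keyword and returns it as an identifier instead. When two rules match the same number of characters, flex uses the one that appears first, so the keyword rules can never fire (flex warns that they cannot be matched). Therefore, put all keyword rules before the {IDENTIFIER} rule!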
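flex resolves conflicts between rules in two steps. First, it takes the rule that matches the most characters. Second, among rules that tie on length, it picks the one listed first in the file. The C sketch below mimics only that selection policy for the three rules above. The token values and the helpers match_len and scan_one are hypothetical, not flex internals. The order decides how "int" is classified. "integer" is an identifier in either order, because the identifier rule matches more characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative token values for this sketch only. */
enum { T_Int = 256, T_Print, T_Identifier };

typedef struct {
    const char *literal;  /* fixed text to match, or NULL for the identifier pattern */
    int token;
} Rule;

/* Length of this rule's match at the start of s (0 = no match). */
static size_t match_len(const Rule *r, const char *s) {
    if (r->literal != NULL) {
        size_t n = strlen(r->literal);
        return strncmp(s, r->literal, n) == 0 ? n : 0;
    }
    /* Identifier: (letter|_)(letter|_|digit)* */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    size_t n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Longest match wins; on a tie the earlier rule wins, because a later
 * rule only replaces the current best if it is strictly longer. */
static int scan_one(const Rule *rules, size_t nrules, const char *s) {
    size_t best_len = 0;
    int best_tok = -1;
    for (size_t i = 0; i < nrules; i++) {
        size_t len = match_len(&rules[i], s);
        if (len > best_len) {
            best_len = len;
            best_tok = rules[i].token;
        }
    }
    return best_tok;
}

/* First order from the post: keywords before {IDENTIFIER}. */
static const Rule keywords_first[] = {
    {"int", T_Int}, {"print", T_Print}, {NULL, T_Identifier}
};
/* Second order from the post: {IDENTIFIER} before keywords. */
static const Rule ident_first[] = {
    {NULL, T_Identifier}, {"int", T_Int}, {"print", T_Print}
};
```

With keywords_first, scan_one(keywords_first, 3, "int x;") returns T_Int. With ident_first, the same input returns T_Identifier, which is the bug described above.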

Backend

NASM

  • NASM (the Netwide Assembler), an assembler for the x86 architecture

Reference