ManTa 3

TRANSLATIONAL SEMANTICS

A very general translation scheme has been conceived to generate source code in several programming languages from ManTa ADTs.  In this section this scheme is presented and as an example the Ocaml Code Generator is explained.
 
 
 6.1  Translation Scheme

  1. Define the syntax of the programming language
  2. Define an AST of the programming language in ManTa
  3. Write a pretty printer for the language, that receives AST and generate textual representations (that is a String)
  4. Write representation of some ADT in the language to achieve efficency (this is not necessary but is highly recommended specially for ADTs NAT, BOOL, Char and String)
  5. Write in ManTa conversion routines from the ManTa AST to the language AST.  The main routine must receive an AbsType (see ManTa AST) and produce an AST of the program in the target language
  6. Generate code and integrate everything in an executable.
 6.2  Ocaml Syntax

This syntax is inspired in [Leroy96] and [Leroy98].   We have omitted:

  • Object oriented features
  • Modules system
  • Streams
  • Ranges
  • Assertion checking
  • Lazy Evaluation
  •  ref Pointers
  • A compilation unit in is an interface (or module signature) and an implementation; the syntax of an OCaml implementation is:
    implCaml  ::= {implPhraseCaml ;;}
       implPhraseCaml  ::= exprCaml
                      | valueDefCaml
                      | typeDefCaml
                      | exceptionDefCaml
                      | directiveCaml
    A module implementation is a sequence of phrases each one endig wit double semicolon (;;). A phrase is an expression, a value definition, a type or exception definition or a directive.  In run time they are evaluated in the order they appear (the expressions are also evaluated)

    Definition of types and exceptions introduce constructors of type, variants and registers.  The scope are the phrases that are after them (they have no effect in run time)

    Directives modify the behavior of compiler in the phrases after it. They have no effect at runtime nor in other modules.  The module name must begin with uppercase.

    directiveCaml  ::=
               open module_name

    A valueDefCaml is   let [rec] let-binding {andlet-binding} and allows associate names to function bodies.  Its scope begins after the definition and ends with the module.  The name of the function must begin with a lowercase letter.

    The type definitions must have the following syntax:
     
    typeDefCaml
          type typeBodyCaml {and typeDefCaml}
    Mutually recursive types can be defined with  and
    typeBodyCaml
          [typeParamsCaml] ident [typeEquationCaml] [typeRepCaml] {constraintTypeCaml
    The name of a type must begin with lowercase
    typeEquationCaml
          = typeExpr
    The defined type is equivalent to a type expression. They can be interchanged.
    The other styles of definition create incompatible types.
    type id=string;;
    typeRepCaml
        =constrDeclCaml {| constrDeclCaml
     |  ={ labelDeclCaml {; labelDeclCaml} } 
    The user can define variants (direct sum of several types) or registers with labels
    type num=Integer of int | Real of float;;
    type reg={name:String;mutable salary:int};;
    typeParamsCaml
          ' ident
       |  ( ' ident {, ' ident} ) 
    A type can be parametric:
    type 'a myList=NIL | CONS of 'a*('a myList);;
    constrDeclCaml
          ident 
       |  ident of typeExprCaml
    A constructor of a variant type can have arguments (a function).  The name of any constructor must begin with lowercase.
    labelDeclCaml
          ident : typeExprCaml 
       |  mutable ident : typeExprCaml 
    A label of a register can be modifiable (imperative style) by using the keyword mutable.
    constraintTypeCaml:
          constraint 'ident =typeExprCaml
    It allows the user to set constraints on the type of parameters.
    typeExprCaml:
         ' ident 
      |  ( typeExprCaml )
      |  typeExprCaml ->typeExprCaml
      |  typeExprCaml {*typeExprCaml}+ 
      |  ident
      |  typeExprCaml ident
      |  (typeExprCaml{,typeExprCaml})ident
      |  typeExprCaml as 'ident
    A type expression represents a type and can be: A parameter ('a), a function from one to type in other (int -> float), a cartesian product (int*string), an existent type (int) an existent parametric type (int list) or it can be an association with an identifier (the first letter of identifier must be lowercase).

    By defining an exception new constructor are added to the built-in type exn.  The syntax is:

    exceptionDefCaml:
          exceptionconstrDeclCaml
    Expressions have the following syntax:
     
    exprCaml:
          value-path An identifier associated to a value  (it can have the module path) e.g. var1
       |  constantCaml e.g. 5
       |  ( exprCaml) e.g. (3+6)
       |  begin exprCaml end Used to make blocks
    e.g. begin print_int 1;print_string "%" end
       |  ( exprCaml:typeExprCaml) e.g. 5:int
       |  exprCaml ,exprCaml {, exprCaml} Cartesian product e.g. (4,3.14,"w")
       |  ident exprCaml Constructor with arguments argumentos 
    e.g CONS(2,NIL)
       |  exprCaml ::exprCaml List constructor, the first expression is the head and the second the tail  e.g. 3::4::[]
       |  [ exprCaml {; exprCaml} ] List, the elements are separated with ; e.g [3;4]
       |  [| exprCaml {; exprCaml} |] Vector e.g. [|1;2;3|]
       |  { ident=exprCaml {;ident =exprCaml} } Register, the orden is not important
    e.g. {nombre="J";salario=2}
       |  exprCaml exprCaml Functional application, the first expression must be a function
    e.g. f  6
       |  prefix-symbol exprCaml Prefix and Infix symbols are treated as function, they can be redefined.  The predefined prefix operator are integer and float negation:: -  -.
       |  exprCaml infix-op exprCaml The infix operator are presented later.
       |  exprCaml . ident Evaluates the field label of register expr e.g. r.nombre
       |  exprCaml . ident <- exprCaml Modify a mutable field in a register e.g. r.salario<-5
       |  exprCaml .(exprCaml) Evaluates in a vector's element, the first expression is the vector and the second the index. e.g. v.(i)
       |  exprCaml .(exprCaml)<-exprCaml Modifies a vector position e.g. v.(i)<-8
       |  exprCaml .[exprCaml] Evaluates in a string position 
       |  exprCaml .[exprCaml]<-exprCaml Modifies a character in a string
       |  if exprCaml then exprCaml
          [else exprCaml]
    Conditional, the first expression must be of type bool and the remaining must have the same type.
    e.g if a=b then 0 else 1
       |  while exprCaml doexprCaml done Evaluates the second expression while the first evaluates to true.  Finally it always evaluate to (). 
    e.g. while true do print_string "etern" done
       |  for ident=exprCaml  (to | downto)
          exprCaml do exprCamldone
    Evaluates the third expression counting  ident from the first to the second one 
    e.g. for k=1 to 10 do (print_int k) done
       |  exprCaml ; exprCaml Secuence, evaluates the first expression, then the second and returns the value of the second. e.g. print_int a;a
       |  match exprcaml withpatternMatchCaml Evaluates the first expression and compare it with the patterns, the first that unify determine the returned value.
       |  function patternMatchCaml Define a one argument function
       |  fun multipleMatchCaml Define a function with one or more arguments
       |  try exprCaml withpatternMatchCaml Evaluates the first expression, if an exception ocurrs y try to match with a pattern to treat it.
       |  let [rec] letBindingCaml
         {andletBindingCaml} inexprCaml
    It associates names to values  that can be used in the las expression.

     
    constantCaml:
          integer-literal 
       |  float-literal
       |  char-literal 
       |  string-literal
       |  cconstr 
    Integer range from -2^30   to 2^30  -1. 
    Float values are implementation dependent
    8 bit characters
    Strings are character sequences, they can have at most 2^16-1 characters
    The predefined constant constructors are: true, false, [],  ()

    Some non terminal symbols used before are now presented:

    patternCaml:
          value-name
       |  _
       |  constant
       |  patternCaml asvalue-name
       |  ( patternCaml )
       |  ( patternCaml : typexpr )
       |  patternCaml | patternCaml
       |  ident patternCaml
       |  patternCaml {,patternCaml}
       |  { ident = patternCaml {;ident = patternCaml} }
       |  [ patternCaml {;patternCaml} ]
       |  patternCaml :: patternCaml

    patternMatchCaml:
        patternCaml [when exprCaml]->exprCaml
       {| patternCaml [when exprCaml] ->exprCaml}

    multipleMatchCaml:
          {patternCaml}+ [when exprCaml]->exprCaml

    letBindingCaml:
          patternCaml = exprCaml
       |  value-name {patternCaml}+ [: typeExprCaml] = exprCaml

    infix-op:
          infix-symbol
       |  * | = | or | &

    Now the operators and constructions are presented in accordance with their precedence.
     
    Constructor / operator
    Meaning
    símbolos prefijos,   ! ! is a deferrencing operator (ref objects)
    . .( .[
    Function application
    Constructor application
    - -. (prefix)
    ** Power
    *.  *  /  /.  % mod Product, Division, Residue
    +.  +  -.  - Addition, substraction
    :: List constructors
    @   ^ List and string concatenation
    = <> ==  !=  <  <=  >  >= Structural equality, structural difference, physical equality, physical difference, less than, less or equal, greather, greather or equal
    land lor lxor lsl lsr asr logical ande, logical or, xor, logicl shift to left, logical shift to right, aritmetical shift to right 
    not
    & &&
    or ||
    , Tuple constructor.  A tuple can have up to 2^14 - 1 components
    <- := Label and references assignation
    if
    ; Secuence
    let match fun function try

    An interface or module specification is a sequence of specifications, an specification is the signature of a function or constant, a type definition, an exception definition or an open directive.:

    interCaml:
       {specificationCaml [;;]}

    specificationCaml:
          val value-name:typeExprCaml
       |  typeDefCaml
       |  exceptionDefCaml
       |  directiveCaml

     6.3  Ocaml AST

    Almost every non terminal symbol of the shown grammar can be a Data Structure.

     6.4 Pretty Printer

    Every type of the AST has a function to print a string with the concrete syntax,
    The name of those functions is StringOfT where T is the ADT (e.g. StringOfImplCaml).
     
     6.5 Representations

    Some ManTa ADT have a natural representation in OCaml:
    NAT can be represented with the OCaml type int, BOOL with boolean, Char with char, String with string and List with the list type.  These representations have been written in the files NATExprCaml, BOOLExprCaml, CharExprCaml, StringExprCaml and ListExprCaml.

    In addition to simple representations for the generators of these types, the constant terms (composed only of generators) of types NAT, BOOL, Char and String are converted to constants in Caml (e.g SUC(SUC(SUC(ZERO))) is translated to 3, charOfNAT(65) is translated to 'A')

    The mentioned representations are used along with routines to identify in a ManTa AST the functions that have representation.   These routines are in module RepCaml
     
     6.6 Conversion Routines

    The generators of the ADT are converted to constructors of a variant type.  e.g. The generators of NAT are ZERO:->NATand SUC:NAT->NAT, hence the type in OCaml is:
      type nat=ZERO | SUC of nat ;;
    If an ADT is generic the translated ADT should be parametric.
    For example the following ADT represents a generic binary tree, it can be is empty or it can have a left son, a root and a right son.
      ADT BTtree[X]
        INITIALS
          empty:->BTree
        CONSTRUCTORS
          consBTree:BTree*X*Btree->BTree
    The translation to OCaml produces the following type:
      type 'x btree=Empty | ConsBTree of ('x btree)*'x*('x btree) ;;

    The selectors of an ADT are translated to functions, for example the function to add two natural numbers defined in the ADT NAT
      ADD(ZERO,y)=y
      ADD(SUC(x),y)=SUC(ADD(x,y))
    will be translated to
      let rec add=function
          (ZERO,y) -> y
        | (SUC(x),y) -> SUC(add(x,y))
      ;;

    Hence the left side of every axiom is translated to a pattern match case, and the right side is translated to a Caml Expression.

    A complete ADT must be translated to an interface and an implementation:

    An interface is composed of:
    1. Some open directives to open modules required in the signature of selectors
    2. If the types is not an extension, a type declaration (not definition)
    3. One val for each selector to declare it's type
    An implementation is composed of:
    1. Some open directives to open other modules used
    2. One type definition to define the translations of the generators
    3. One let  or let rec for each selector
    In this way it's possible to translate mutually recursive functions, and mutually recursive types.

    ManTa follows naming conventions taken from Java and Z: types begin with uppercase, and functions with lowercase, including generators for types.   However OCaml has different conventios: Types begin with lowercase, constructors of variante types must begin with uppercase, type parameters, functions and variables begin with lowercase, module's names must begin with uppercase.   The names of the ADTs, functions and generators are altered to follow those conventions.

    The file M2Caml contains the translation routines, this implementation has the following restrictions:

    The main functions of this file are:
     6.7 Translation Program

    All the shown files must be generated in a programming language and then the following routines mut be written in that language:

    Part of the code in Caml showing the main actions done by the translator is presented below:
      let symt=loadSymbols dirADT
      in let adt=loadADT adtName symt
      in let rADT=usedADT adt symt
      in let ocprg=absType_ImplCaml (adt,(lStringOfStringList rADT))
         and ocint=absType_InterCaml adt
      in
      begin
        writeProg outInt (InterCaml.stringOfInterCaml ocint);
        writeProg outImp (ImplCaml.stringOfImplCaml ocprg);
      end

    Denotational Semantics