Commit ecf8fab8 authored by Dario Pinto

add 2 articles + 4 assets + update

parent 832ac67f
title=The Generic Syntax Extension
authors=Çagdas Bozman
OCaml 4.01, with its new constructor disambiguation feature, enables a nice trick: a simple and generic syntax extension that lets you define your own syntax without having to write complicated parsetree transformers. We propose an implementation in the form of a ppx rewriter.
It performs only one simple transformation: it replaces each string prefixed by an operator starting with ! by a series of constructor applications.
For instance:
!! "hello 3"
is rewritten to
!! (Start (H (E (L (L (O (Space (N3 (End))))))))
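To make the rewriting rule concrete, here is a minimal sketch that renders, as text, the constructor chain for a given string. Both `constructor_name` and `expand` are hypothetical helpers, and the character-to-constructor mapping is an assumption inferred from the examples in this article; the real ppx rewriter transforms the parsetree, not strings.

```ocaml
(* Hypothetical mapping from one character to a constructor name,
   inferred from the examples of this article. *)
let constructor_name (c : char) : string =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' -> String.make 1 (Char.uppercase_ascii c)
  | '0' .. '9' -> "N" ^ String.make 1 c
  | ' ' -> "Space"
  | '_' -> "Underscore"
  | '%' -> "Percent"
  | '(' -> "Lparen"
  | ')' -> "Rparen"
  | _ -> invalid_arg (Printf.sprintf "constructor_name: unsupported %C" c)

(* Render the nested constructor applications for a whole string:
   one constructor per character, closed by End and wrapped in Start. *)
let expand (s : string) : string =
  let rec go i =
    if i = String.length s then "End"
    else Printf.sprintf "(%s %s)" (constructor_name s.[i]) (go (i + 1))
  in
  "Start " ^ go 0
```

With this sketch, `expand "hello 3"` yields the same chain as above, up to redundant parentheses around `End`.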
How is that generic? Here are a few examples.
#### Base 3 Numbers
For instance, if you want to write arbitrarily large numbers in base 3, let's define a syntax for them. We start by declaring some types.
type start = Start of p
and p =
| N0 of stop
| N1 of q
| N2 of q
and q =
| N0 of q
| N1 of q
| N2 of q
| Underscore of q
| End
and stop = End
This type only allows strings matching the regexp 0 | (1|2)(0|1|2|_)*. Notice that some constructors, like N0, appear in several types. This is not a problem, since constructor disambiguation will choose the right one at the right place. Let's now define a few functions to use it:
open Num
(* The annotation makes N0, N1 and N2 resolve to the constructors of p;
   without it, the most recent definitions (those of q) would be chosen. *)
let rec convert_p : p -> num = function
  | N0 End -> Int 0
  | N1 t -> convert_q (Int 1) t
  | N2 t -> convert_q (Int 2) t
and convert_q acc = function
  | N0 t -> convert_q (acc */ Int 3) t
  | N1 t -> convert_q (Int 1 +/ acc */ Int 3) t
  | N2 t -> convert_q (Int 2 +/ acc */ Int 3) t
  | Underscore t -> convert_q acc t
  | End -> acc
let convert (Start p) = convert_p p
# val convert : start -> Num.num = <fun>
And we can now try it:
let n1 = convert (Start (N0 End))
# val n1 : Num.num = <num 0>
let n2 = convert (Start (N1 (Underscore (N0 End))))
# val n2 : Num.num = <num 3>
let n3 = convert (Start (N1 (N2 (N0 End))))
# val n3 : Num.num = <num 15>
And the generic syntax extension allows us to write:
let ( !! ) = convert
let n4 = !! "120_121_000"
# val n4 : Num.num = <num 11367>
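The value of `n4` can be checked without the syntax extension by writing the constructor chain by hand. The sketch below is a variant of the code above, not the original: it assumes native `int` is enough for small examples (so it does not need the `Num` library), and it annotates the functions so that the shared constructor names resolve to the intended types.

```ocaml
type start = Start of p
and p = N0 of stop | N1 of q | N2 of q
and q = N0 of q | N1 of q | N2 of q | Underscore of q | End
and stop = End

(* Digits after the first one: accumulate in base 3, skip underscores. *)
let rec convert_q (acc : int) : q -> int = function
  | N0 t -> convert_q (acc * 3) t
  | N1 t -> convert_q (acc * 3 + 1) t
  | N2 t -> convert_q (acc * 3 + 2) t
  | Underscore t -> convert_q acc t
  | End -> acc

(* First digit: either a lone 0, or a non-zero digit followed by more. *)
let convert_p : p -> int = function
  | N0 End -> 0
  | N1 t -> convert_q 1 t
  | N2 t -> convert_q 2 t

let convert (Start p) = convert_p p

(* Hand-expanded form of the string 120_121_000 *)
let n4 : int =
  convert
    (Start (N1 (N2 (N0 (Underscore (N1 (N2 (N1 (Underscore (N0 (N0 (N0 End))))))))))))
```

Here `n4` evaluates to 11367, matching the result shown above.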
#### Specialised Format Strings
We can implement specialised format strings for a particular usage. Here, for brevity, we restrict ourselves to a very small subset of the classical format: the characters %, i, c and space.
Let's define the constructors.
type 'a start = Start of 'a a
and 'a a =
| Percent : 'a f -> 'a a
| I : 'a a -> 'a a
| C : 'a a -> 'a a
| Space : 'a a -> 'a a
| End : unit a
and 'a f =
| I : 'a a -> (int -> 'a) f
| C : 'a a -> (char -> 'a) f
| Percent : 'a a -> 'a f
Let's look at the inferred type for some examples:
let (!*) x = x
let v = !* "%i %c";;
# val v : (int -> char -> unit) start = Start (Percent (I (Space (Percent (C End)))))
let v = !* "ici";;
# val v : unit start = Start (I (C (I End)))
These are indeed the types we would like for such format strings. To use them, we can define a simple printer:
let rec print (Start cons) =
  main cons
and main : type t. t a -> t = function
  | I r ->
    print_string "i";
    main r
  | C r ->
    print_string "c";
    main r
  | Space r ->
    print_string " ";
    main r
  | End -> ()
  | Percent f ->
    format f
and format : type t. t f -> t = function
  | I r ->
    fun i ->
      print_int i;
      main r
  | C r ->
    fun c ->
      print_char c;
      main r
  | Percent r ->
    print_string "%";
    main r
let (!!) cons = print cons
And voilà!
let s = !! "%i %c" 1 'c';;
# 1 c
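The same idea can be tried without the syntax extension, and its output inspected, by writing into a `Buffer.t` instead of standard output. This is a sketch under assumptions: `bprint`, `fmt` and `demo` are hypothetical names, the buffer argument is an addition to the printer above, and the format value is the hand-expanded form of `"%i %c"`.

```ocaml
type 'a start = Start of 'a a
and 'a a =
  | Percent : 'a f -> 'a a
  | I : 'a a -> 'a a
  | C : 'a a -> 'a a
  | Space : 'a a -> 'a a
  | End : unit a
and 'a f =
  | I : 'a a -> (int -> 'a) f
  | C : 'a a -> (char -> 'a) f
  | Percent : 'a a -> 'a f

(* Same structure as the printer above, but output goes to a buffer. *)
let rec main : type t. Buffer.t -> t a -> t = fun buf -> function
  | I r -> Buffer.add_char buf 'i'; main buf r
  | C r -> Buffer.add_char buf 'c'; main buf r
  | Space r -> Buffer.add_char buf ' '; main buf r
  | End -> ()
  | Percent f -> format buf f

and format : type t. Buffer.t -> t f -> t = fun buf -> function
  | I r -> fun i -> Buffer.add_string buf (string_of_int i); main buf r
  | C r -> fun c -> Buffer.add_char buf c; main buf r
  | Percent r -> Buffer.add_char buf '%'; main buf r

let bprint buf (Start cons) = main buf cons

(* Hand-expanded form of the format string made of %i, a space and %c. *)
let fmt : (int -> char -> unit) start =
  Start (Percent (I (Space (Percent (C End)))))

let demo () : string =
  let buf = Buffer.create 16 in
  bprint buf fmt 1 'c';
  Buffer.contents buf
```

Calling `demo ()` returns the string `1 c`, the same text the printer above writes to standard output.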
### How generic is it really?
It may not look like it, but we can encode almost any syntax we might want this way. For instance, we can encode any regular language. To explain how to transform a regular language into a type definition, we will use the language a(a|ε)b as an example:
type start = Start of a
and a =
  | A of a'
and a' =
  | A of b
  | B of stop
and b = B of stop
and stop = End
We can try a few things on it:
let v = Start (A (A (B End)))
# val v : start = Start (A (A (B End)))
let v = Start (A (B End))
# val v : start = Start (A (B End))
let v = Start (B End)
# Characters 15-16:
# let v = Start (B End);;
# ^
# Error: The variant type a has no constructor B
let v = Start (A (A (A (B End))))
# Characters 21-22:
# let v = Start (A (A (A (B End))));;
# ^
# Error: The variant type b has no constructor A
Assume the language is given as an automaton that:
- has 4 states: a, a', b and stop
- has initial state a
- has final state stop
- has transitions: a - A -> a', a' - A -> b, a' - B -> stop, b - B -> stop
Let's write {c} for the constructor corresponding to the character c and [q] for the type corresponding to a state q of the automaton.
- For each state q we have a type declaration [q]
- For each letter a of the alphabet we have a constructor {a}
- For each transition p - l -> q we have a constructor {l} with parameter [q] in type [p]:
type [p] = {l} of [q]
- The End constructor, without any parameter, must be present in the type of every final state
- The initial state e is declared by
type start = Start of [e]
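The rules above can be sketched as a small generator that prints the type declaration for an automaton. The `automaton` record and the functions `clause` and `declaration` are hypothetical names introduced for illustration; the sketch assumes every state has at least one outgoing transition or is final, since OCaml has no syntax for a variant without constructors.

```ocaml
(* Hypothetical description of a finite automaton. *)
type automaton = {
  states : string list;  (* in declaration order *)
  initial : string;
  finals : string list;
  transitions : (string * char * string) list;  (* (p, l, q) means p - l -> q *)
}

(* The clause for state p: one constructor per outgoing transition,
   plus End when p is a final state. *)
let clause (aut : automaton) (p : string) : string =
  let of_transition (src, letter, dst) =
    if src = p then
      Some (Printf.sprintf "%c of %s" (Char.uppercase_ascii letter) dst)
    else None
  in
  let ctors = List.filter_map of_transition aut.transitions in
  let ctors = if List.mem p aut.finals then ctors @ [ "End" ] else ctors in
  Printf.sprintf "%s = %s" p (String.concat " | " ctors)

(* The whole declaration: the start type, then one clause per state. *)
let declaration (aut : automaton) : string =
  let clauses = List.map (clause aut) aut.states in
  Printf.sprintf "type start = Start of %s\nand %s" aut.initial
    (String.concat "\nand " clauses)

(* The automaton of the example language. *)
let example : automaton = {
  states = [ "a"; "a'"; "b"; "stop" ];
  initial = "a";
  finals = [ "stop" ];
  transitions =
    [ ("a", 'a', "a'"); ("a'", 'a', "b"); ("a'", 'b', "stop"); ("b", 'b', "stop") ];
}
```

On `example`, `declaration` produces the same five-clause definition as the hand-written one above, on one line per type.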
### Yet more generic
In fact, we can also encode deterministic context-free languages (DCFL). To do that, we encode pushdown automata. Here we only give a small example: the language of well-parenthesized words.
type empty
type 'a r = Dummy
type _ q =
  | End : empty q
  | Rparen : 'a q -> 'a r q
  | Lparen : 'a r q -> 'a q
type start = Start of empty q
let ( !! ) x = x
let m = !! ""
let m = !! "()"
let m = !! "((())())()"
To encode the stack, we use the type parameters: Lparen pushes an r onto the stack, Rparen consumes it, and End checks that the stack is indeed empty.
A few more tricks are needed to encode tests on the top value of the stack, and the grammar must be converted to Greibach normal form to allow this encoding.
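To experiment with the stack encoding without the syntax extension, the constructor chain can be written by hand and converted back to a string. This is a minimal sketch: `to_string`, `string_of_start` and `balanced` are hypothetical names, and the types are repeated so that the block is self-contained.

```ocaml
type empty
type 'a r = Dummy
type _ q =
  | End : empty q
  | Rparen : 'a q -> 'a r q
  | Lparen : 'a r q -> 'a q
type start = Start of empty q

(* Read the word back, whatever the current stack s is. *)
let rec to_string : type s. s q -> string = function
  | End -> ""
  | Lparen rest -> "(" ^ to_string rest
  | Rparen rest -> ")" ^ to_string rest

let string_of_start (Start w) = to_string w

(* Two nested pairs followed by one pair, expanded by hand. *)
let balanced : start =
  Start (Lparen (Lparen (Rparen (Rparen (Lparen (Rparen End))))))

(* An unbalanced word is rejected by the type checker, because End
   requires an empty stack:
   let unbalanced : start = Start (Lparen End) *)
```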
### We can go even further
#### a^n b^n c^n
In fact, we do not need to restrict ourselves to DCFL: we can for instance encode the language a^n.b^n.c^n, which is not context-free:
type zero
type 'a s = Succ
type (_,_) p =
| End : (zero,zero) p
| A : ('b s, 'c s) p -> ('b, 'c) p
| B : ('b, 'c s) q -> ('b s, 'c s) p
and (_,_) q =
| B : ('b, 'c) q -> ('b s, 'c) q
| C : 'c r -> (zero, 'c s) q
and _ r =
| End : zero r
| C : 'c r -> 'c s r
type start = Start of (zero,zero) p
let v = Start (A (B (C End)))
let v = Start (A (A (B (B (C (C End))))))
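As a usage sketch, a function over these values can recover n by counting the leading A constructors. The names `count_a`, `n_of` and `two` are hypothetical, and the types are repeated so that the block stands alone.

```ocaml
type zero
type 'a s = Succ

type (_, _) p =
  | End : (zero, zero) p
  | A : ('b s, 'c s) p -> ('b, 'c) p
  | B : ('b, 'c s) q -> ('b s, 'c s) p
and (_, _) q =
  | B : ('b, 'c) q -> ('b s, 'c) q
  | C : 'c r -> (zero, 'c s) q
and _ r =
  | End : zero r
  | C : 'c r -> 'c s r

type start = Start of (zero, zero) p

(* Count the leading A constructors; the types guarantee that the
   same number of B and C constructors follows. *)
let rec count_a : type b c. (b, c) p -> int = function
  | End -> 0
  | A rest -> 1 + count_a rest
  | B _ -> 0

let n_of (Start w) = count_a w

let two : start = Start (A (A (B (B (C (C End))))))
```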
#### Non recursive languages
We can also encode solutions of Post Correspondence Problems (PCP), which are not recursive languages:
Suppose we have two alphabets A = { X, Y, Z } and O = { a, b }, and two morphisms m1 and m2 from A to O* defined as
- m1(X) = a, m1(Y) = ab, m1(Z) = bba
- m2(X) = baa, m2(Y) = aa, m2(Z) = bb
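Solutions of this instance of PCP are words whose images by m1 and m2 are equal. For instance, ZYZX is a solution: both images are bbaabbbaa. The language of solutions can be represented by the type declaration below. Each constructor appends its two images to the type parameters of its argument, so a word is read from the innermost constructor outwards: ZYZX is written X (Z (Y (Z End))).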
type empty
type 'a a = Dummy
type 'a b = Dummy
type (_,_) z =
| X : ('t1, 't2) s -> ('t1 a, 't2 b a a) z
| Y : ('t1, 't2) s -> ('t1 a b, 't2 a a) z
| Z : ('t1, 't2) s -> ('t1 b b a, 't2 b b) z
and (_,_) s =
| End : (empty,empty) s
| X : ('t1, 't2) s -> ('t1 a, 't2 b a a) s
| Y : ('t1, 't2) s -> ('t1 a b, 't2 a a) s
| Z : ('t1, 't2) s -> ('t1 b b a, 't2 b b) s
type start = Start : ('a, 'a) z -> start
let v = X (Z (Y (Z End)))
let r = Start (X (Z (Y (Z End))))
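The instance above can also be checked at the value level, which is a convenient way to test a candidate word before writing it as a constructor chain. In this sketch, `m1` and `m2` follow the definitions given above, while `image`, `is_solution` and `constructor_order` are hypothetical helpers.

```ocaml
(* The two morphisms of the instance, on single letters. *)
let m1 = function
  | 'X' -> "a" | 'Y' -> "ab" | 'Z' -> "bba"
  | c -> invalid_arg (Printf.sprintf "m1: unexpected letter %C" c)

let m2 = function
  | 'X' -> "baa" | 'Y' -> "aa" | 'Z' -> "bb"
  | c -> invalid_arg (Printf.sprintf "m2: unexpected letter %C" c)

(* Extend a morphism from letters to words by concatenation. *)
let image (m : char -> string) (w : string) : string =
  String.concat "" (List.map m (List.of_seq (String.to_seq w)))

(* A solution is a non-empty word with equal images. *)
let is_solution (w : string) : bool =
  w <> "" && String.equal (image m1 w) (image m2 w)

(* Order in which the letters appear in the constructor chain:
   the first letter of the word is the innermost constructor. *)
let constructor_order (w : string) : string =
  let n = String.length w in
  String.init n (fun i -> w.[n - 1 - i])
```

For the solution ZYZX, `constructor_order` gives XZYZ, which is the order used in `X (Z (Y (Z End)))` above.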
### Open question
Can every context-free language (not only the deterministic ones) be represented like that? Notice that the classical example of palindromes can be represented (proof left to the reader).
### Conclusion
So we have a nice extension available that allows you to define a new syntax by merely declaring a type. The code is available on github. We are waiting for the nice syntaxes you will invent!
PS: There may remain a small problem... If you inadvertently mistype something, instead of a syntax error you may find some quite complicated type errors attacking you like a piranha.
title=OCamlPro Highlights: April 2014
authors=Çagdas Bozman
tags=highlights,namespaces,weather,source code
Here is a short report on some of our activities in April 2014, and a short analysis of OCaml evolution since its first release.
### OPAM Improvements
We're still working on release 1.2. It was decided to include quite a few new features in this release, which delayed it a little bit since we want to be sure to get it right. It's now getting stabilized, documented and tested. One of the biggest improvements concerns the development workflow and the use of pinned packages, which is a powerful and complex feature that could also get a bit confusing. We are grateful for the large amount of feedback from the community that helped in its design. The basic idea is to use OPAM metadata from within the source packages, because it's most useful while developing and helps get the packaging right. It was possible before, but a little bit awkward: you now only need to provide an `opam` file or directory at the root of your project, and when the package is pinned to either a local path or a version-controlled repository, opam will pick it up and use it. It will then be synchronized on any subsequent `opam update`. You can even do this if there is no corresponding package in the repository: OPAM will create it and store it in its internal repository for you. And in case this metadata is getting in the way, or you just want a quick local fix, you can always do `opam pin edit <package>` to locally change the metadata used by opam.
During this month, we've also been improving performance by a large amount in several areas, because delays could become noticeable for people using it on, e.g., Raspberry Pis. There is an important clarification on the handling of optional dependencies, and we worked hard on making the build of OPAM as painless as possible in every possible setting.
### OPAM Weather Service
Last month, we presented an online service for OPAM, to provide advanced CUDF solvers to every OPAM user. The service is provided by IRILL, and based on the tools they implemented to manipulate CUDF files (some of them are also used directly in OPAM).
This month, we are happy to introduce a new service that we helped them put online: the OPAM Weather Service, an instantiation for OPAM of a service they also provide for Debian. It shows the evolution of the installability of all packages in the official OPAM repository, for three stable versions of OCaml (3.12.1, 4.00.1 and 4.01.0). It should help maintainers track dependency problems with their packages, when old packages are removed or new conflicting dependencies are introduced.
### An Internship on OCaml Namespaces
This month, we welcomed Pierrick Couderc for an internship in our lab. He is going to work on adding namespaces to OCaml. His goal is to design a kind of namespace that extends the current module mechanism in a consistent but powerful way. One challenge of his job will be to make these namespaces also extend our big functors, to provide functors at the namespace level.
Pierrick is not a complete newcomer in our team: last year, he already worked for us with David Maison (now working at TrustInSoft) on an online service to edit and compile OCaml code for students.
### The Evolution of OCaml Sources
This month, there was also a lot of activity for the Core team, as we are getting close to the feature freeze for OCaml 4.02. We took this opportunity to have a look at the evolution of OCaml sources since the first release of OCaml 1.00, in 1996.
Our first graph plots the size of uncompressed OCaml sources in bytes, from the first release to the current trunk:
The graph shows four interesting events:
- in 2002-2003, between 3.02 and 3.06, an increase of 4 MB
- in 2007, between 3.09.3 and 3.10.0, an increase of again 4 MB
- in 2013, between 4.00.1 and 4.01.0, an increase of 2 MB
- in 2014, between 4.01.0 and 4.02.0, a decrease of 6 MB
Our second graph plots the number of files per kind (OCaml sources, OCaml interfaces, C sources and C headers):
We can now check the files that were added and removed at the four events that we noticed on the first graph:
- the first event corresponds to the addition of 174 files for `camlp4` in 3.04, and then 70 files for `ocamldoc` in 3.06. Also, `labltk` increased a lot, with many new examples;
- the second event corresponds to the addition of 225 files for `ocamlbuild` in 3.10.0, and the replacement of `camlp4` (renamed to `camlp5`) by a new implementation;
- the third event corresponds to... a change in the size of `boot/myocamlbuild.boot`, the bytecode file used by `ocamlbuild` to bootstrap itself!
- finally, the upcoming event corresponds to the removal of `camlp4` and `labltk` from 4.02, i.e. about 300 files for each of them.
Our third graph shows the number of lines per kind of file, again:
This graph does not show us much more than what we have seen by number of files, but what might be interesting is to compute the ratio, i.e. the number of lines per file, for each kind of file:
There is a general trend towards more lines per file, from about 200 lines per OCaml source file in 1996 to about 330 lines in 2014. This ratio increased considerably for release 3.04, because `camlp4` used to generate a huge bootstrap file of its own pre-preprocessed OCaml sources. More interestingly, the ratio didn't decrease in 2014, when `camlp4` was removed from the distribution! Interface files also grew bigger, but most of the increase was in 3.06, when `ocamldoc` was added to the distribution and an effort was made to document `mli` files.
@@ -8,8 +8,6 @@ let old_to_new =
; ("/fr/recrutement-ocamlpro/", "/jobs")
; ( ""
, "/" )
; ("", "/")
; ("", "/")
; ("", "/")
; ( ""
, "/" )
@@ -162,10 +160,14 @@ let old_to_new =
, "/blog/2014_02_05_ocamlpro_highlights_dec_2013_jan_2014" )
; ( "/2014/03/05/ocamlpro-highlights-feb-2014/"
, "/blog/2014_03_05_ocamlpro_highlights_feb_2014" )
; ( "/2014/04/01/the-generic-syntax-extension/"
, "/blog/2014_04_01_the_generic_syntax_extension" )
; ( "/2015/04/13/yes-ocp-memprof-scanf/"
, "/blog/2015_04_13_yes_ocp_memprof_scanf" )
; ( "/2015/05/07/opam-1-2-2-released/"
, "/blog/2015_05_07_opam_1.2.2_released" )
; ( "/2014/05/20/ocamlpro-highlights-april-2014/"
, "/blog/2014_05_20_ocamlpro_highlights_april_2014" )
; ("/2016/04/01/asm-ocaml/", "/blog/2016_04_01_asm_ocaml")
; ( "/2017/02/09/opam-2-0-beta-is-out/"
, "/blog/2017_02_09_opam_2.0_beta_is_out" )