Commit a56a3472 authored by Dario Pinto's avatar Dario Pinto

Merge branch 'master' into 'master'

rm dupe article, add one + 5 assets + update

See merge request OCamlPro/www!56
parents a914b2e7 3dcc0138
title=Better Inlining: Progress Report
As announced [some time ago](optimisations-you-shouldnt-do), I am working on a new intermediate language within the OCaml compiler to improve its inlining strategy. After some time of bug squashing, I prepared a testable version of the patchset, available either on [Github]( (branch `flambda_experiments`), or through OPAM, in the following repository:
opam repo add inlining
opam switch flambda
opam install inlining-benchs
The series of patches is not ready for benchmarking against real applications, as no cross module information is propagated yet (this is more practical for development because it simplifies debugging a lot), but it already works quite well on single-file code. Some very simple benchmark examples are available in the `inlining-benchs` package.
The series of patches implements a set of 'reasonable' compilation passes that do not try anything too complicated but, combined, generate quite efficient code.
## Current Status
As said in the previous post, I decided to design a new intermediate language to implement better inlining heuristics in the compiler. This intermediate language, called `flambda`, lies between the `lambda` code and the `Clambda` code. It has an explicit representation of closures, making them easier to manipulate, and modules do not appear in it anymore (they have already been compiled to static structures).
I then started to implement new inlining heuristics as functions from the `lambda` code to the `flambda` code. The following features are already present:
* intra-function value analysis
* variable rebinding
* dead code elimination (which needs purity analysis)
* known match / if branch elimination
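A tiny hand-written illustration (not taken from the patchset) of what the last two items enable:

```ocaml
(* [b] is known to be [true] by value analysis, so the [if] can be
   reduced to its first branch; the [failwith] branch is dead code
   and can be eliminated. *)
let f x =
  let b = true in
  if b then x + 1
  else failwith "unreachable"

let () = assert (f 1 = 2)
```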
In more detail, the chosen strategy is divided into two passes, which can be described by the following pseudo-code:
(* first pass *)
if function is at toplevel
then if applied to at least one constant OR small enough
     then inline
else if applied to at least one constant AND small enough
     then inline

(* second pass *)
if function is small enough
AND does not contain local function declarations
then inline
The first pass eliminates most functor applications and functions of the kind:
let iter f x =
let rec aux x = ... f ... in
aux x
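As a sketch (the function name and body are invented), here is a complete instance of that shape; once such a wrapper is inlined at a call site where `f` is known, the indirect call through `f` can become a direct one:

```ocaml
(* A wrapper whose real work happens in a local recursive auxiliary:
   exactly the shape the first pass is meant to eliminate. *)
let iter_n f x =
  let rec aux x = if x > 0 then (f x; aux (x - 1)) in
  aux x

let sum = ref 0
let () = iter_n (fun i -> sum := !sum + i) 3
(* After inlining [iter_n] here, the indirect calls to [f] can become
   direct calls to the known closure. *)
```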
The second pass eliminates the same kind of functions as OCaml 4.01, but since it runs after the first pass, it can also inline functions revealed by inlining functors.
## Benchmarks
I ran a few benchmarks to ensure that there were no obvious miscompilations (there were some, but they are now fixed). On benchmarks that were written too carefully there was not much gain, but I got interesting results on some examples: those illustrate the improvements quite well, and can be found at `$(opam config var lib)/inlining-benchs` (binaries at `$(opam config var bin)/bench-*`).
### The Knuth-Bendix Benchmark (single-file)
Performance gains against OCaml 4.01 are around 20%. The main difference is that exceptions are compiled to constants, hence not allocated when raised. In that particular example, this halves the allocations.
In general, constant exceptions can be compiled to constants when predefined (`Not_found`, `Failure`, ...). User-defined ones cannot yet: to improve this, a few things need to be changed in `` to annotate values created by exceptions.
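For instance (a minimal sketch, not benchmark code), a predefined constant exception raised and caught on a hot path no longer needs a fresh allocation per raise:

```ocaml
(* [Not_found] carries no argument, so it can be compiled to a static
   constant: raising it repeatedly allocates nothing. *)
let find_or_default tbl key default =
  try Hashtbl.find tbl key with Not_found -> default

let () =
  let tbl = Hashtbl.create 16 in
  Hashtbl.add tbl "a" 1;
  assert (find_or_default tbl "a" 0 = 1);
  assert (find_or_default tbl "b" 0 = 0)
```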
### The Noiz Benchmark
Performance gains are around 30% against OCaml 4.01. This code uses a lot of higher order functions of the kind:
let map_triple f (a,b,c) = (f a, f b, f c)
OCaml 4.01 can inline `map_triple` itself but then cannot inline the parameter `f`. Moreover, when writing:
let (x,y,z) = map_triple f (1,2,3)
the tuples are not really used, and after inlining their allocations can be eliminated (thanks to rebinding and dead code elimination).
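Conceptually (an illustrative sketch, not actual compiler output), the optimized call reduces to direct bindings:

```ocaml
let map_triple f (a, b, c) = (f a, f b, f c)

(* Source form: at face value this allocates two tuples. *)
let (x, y, z) = map_triple (fun n -> n + 1) (1, 2, 3)

(* After inlining, variable rebinding and dead code elimination, the
   compiler can see through both tuples, leaving the equivalent of:
     let x = 2 and y = 3 and z = 4 *)
let () = assert ((x, y, z) = (2, 3, 4))
```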
### The Set Example
Performance gains are around 20% compared to OCaml 4.01. This example shows how inlining can help defunctorization: when inlining the `Set` functor, the provided comparison function can be inlined in `Set.add`, allowing direct calls everywhere.
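As a minimal sketch of the benchmark's shape (module names invented), this is the pattern where inlining the functor application exposes the comparison function:

```ocaml
(* Applying [Set.Make] to a concrete comparison: once the functor
   application is inlined, the call to [compare] inside [IntSet.add]
   can become a direct call instead of an indirect one. *)
module IntSet = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

let s =
  List.fold_left (fun acc x -> IntSet.add x acc) IntSet.empty [3; 1; 2; 3]

let () = assert (IntSet.cardinal s = 3)
```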
## Known Bugs
### Recursive Values
A problem may arise in a rare case of recursive values where a field access can be considered to be a constant. Something that would look like (if it were allowed):
type 'a v = { v : 'a }
let rec a = { v = b }
and b = (a.v, a.v)
I have a few solutions, but I am not sure yet which one is best. This probably won't appear in any normal test. The bug manifests as a segmentation fault (`cmmgen` fails to compile that recursive value reasonably).
### Pattern-Matching
The new passes assume that every identifier is declared only once in a given module, but this assumption can be broken in some rare pattern-matching cases. I will have to dig through `` to add a substitution in these cases (the only non-hand-built occurrence I found is in `ocamlnet`).
## Known Mis-compilations
* Since there is no cross-module information at the moment, calls to functions from other modules are always slow.
* In some rare cases, there could be functions with more values in their closure, thus resulting in more allocations.
## What's next?
I would now like to add back cross-module information, and after a bit of cleanup the first series of patches should be ready to propose upstream.
title=Cumulus and ocp-memprof, a love story
authors=Çagdas Bozman
In this blog post, we went on the hunt of memory leaks in Cumulus by using [our memory profiler: ocp-memprof]( Cumulus is a feed aggregator based on [Eliom](, a framework for programming web sites and client/server web applications, part of the [Ocsigen Project](
### First, run and get the memory snapshots
To test and run the server, we use `ocp-memprof` to start the process:
$ ocp-memprof -exec ocsigenserver.opt -c ocsigenserver.opt.conf -v
There are several ways to obtain snapshots:
- automatically after each GC: there is nothing to do, this is the default behavior
- manually:
- by sending a SIGUSR1 signal (the default signal can be changed with the `--signal SIG` option);
- by editing the source code and using the dump function in the `Headump` module:
(* the string argument stands for the name of the dump *)
val dump : string -> unit
Here, we use the default behavior and get a snapshot after every GC.
### The Memory Evolution Graph
After running the server for a long time, the server process shows an unusually high consumption of memory. `ocp-memprof` automatically generates some statistics on the application memory usage. Below, we show the graph of memory consumption. On the x-axis, you can see the number of GCs, and on the y-axis, the memory size in bytes used by the most popular types in memory.
![cumulus evolution with leak](/blog/assets/img/graph_cumulus_evolution_with_leak.png)
Eliom expert users would quickly identify that most of the memory is used by XML nodes and attributes, together with strings and closures.
Unfortunately, it is not that easy to know which parts of the Cumulus source code cause the allocations of these XML trees. These trees are indeed abstract types allocated by functions exported by the Eliom modules, so most of the allocation points are located in the Eliom source code.
In general, it is hard to locate values of abstract types using allocation points alone. It can be useful to browse the memory graph, which can be completely reconstructed from the snapshot, to identify all paths between the globals and the blocks representing XML nodes.
### From roots to leaking nodes
The approach that we chose to identify the leak is to take a look at the pointer graph of our application in order to identify the roots retaining a significant portion of the memory. Above, we can observe the table of the retained size, for all roots of the application. What we can tell quickly is that **92.2%** of our memory is retained by values with finalizers.
Below, looking at them more closely, we can state that there is a significant amount of values of type:
'a Eliom_comet_base.channel_data Lwt_stream.t -> unit
Probably, these finalizers are never called in order to free their associated values. The leak is not trivial to track down and fix. However, a quick fix is possible in the case of Cumulus.
### Identifying the source code and patching it
After further investigation into the source code of Cumulus, we found the only location where such values are allocated:
(* $ROOT/cumulus/src/base/ *)
let (event, call_event) =
  let (private_event, call_event) = React.E.create () in
  let event = Eliom_react.Down.of_react private_event in
  (event, call_event)
The function `of_react` takes an optional argument `~scope` to specify the way that `Eliom_comet.Channel.create` has to use the communication channel.
By changing the default scope to another one provided by the Eliom module, we now have a single channel, and every client uses this channel to communicate with the server (the default behavior created one channel per client).
(* $ROOT/cumulus/src/base/ *)
let (event, call_event) =
  let (private_event, call_event) = React.E.create () in
  let event =
    Eliom_react.Down.of_react
      ~scope:Eliom_common.site_scope private_event
  in
  (event, call_event)
### Checking the fix
After patching the source code, we recompile our application and re-execute the process as before. Below, we can observe the new pointer graph. By changing the default value of `scope`, the size retained by finalizers drops from **92.2% to 0%** !
The new evolution graph below shows that the memory usage drops from **45MB (still growing quickly) for a few hundred connections to 5.2MB** for thousands of connections.
### Conclusion
As a reminder, a finalisation function is a function that will be called with the (heap-allocated) value to which it is associated when that value becomes unreachable.
The GC calls finalisation functions in order to deallocate their associated values. You need to pay special attention when writing such finalisation functions, since anything reachable from the closure of a finalisation function is considered reachable. You also need to be careful not to make the value, that you want to free, become reachable again.
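To make this concrete, here is a small self-contained example (not Cumulus code; it relies on the GC actually collecting the value, which is the usual but not guaranteed behaviour):

```ocaml
(* Attach a finalisation function to a heap-allocated value; it runs
   only once the value is unreachable and a major GC collects it. *)
let finalised = ref false

let () =
  let v = ref 42 in
  Gc.finalise (fun _ -> finalised := true) v;
  assert (!v = 42)
  (* [v] goes out of scope here and becomes unreachable. *)

let () =
  (* Two full majors to make sure the finaliser has been run. *)
  Gc.full_major ();
  Gc.full_major ();
  assert !finalised
```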
This example is online in our gallery of examples if you want to see and explore the graphs ([with the leak]( and [without the leak](
Do not hesitate to use `ocp-memprof` on your applications. Of course, all feedback and suggestions on using `ocp-memprof` are welcome; just send us a mail!
More information:
- Homepage: [](
- Usage: [](
- Support: [](
- Gallery of examples: [](
- Commercial: [](
@@ -8,8 +8,6 @@ let old_to_new =
; ("/fr/recrutement-ocamlpro/", "/jobs")
; ( ""
, "/" )
; ( ""
, "/" )
; ( ""
, "/" )
; ( ""
@@ -159,6 +157,8 @@ let old_to_new =
, "/blog/2014_04_01_the_generic_syntax_extension" )
; ( "/2015/01/29/private-release-of-alt-ergo-1-00/"
, "/blog/2015_01_29_private_release_of_alt_ergo_1_00" )
; ( "/2015/03/04/cumulus-and-ocp-memprof-a-love-story/"
, "/blog/2015_03_04_cumulus_and_ocp_memprof_a_love_story" )
; ( "/2015/04/13/yes-ocp-memprof-scanf/"
, "/blog/2015_04_13_yes_ocp_memprof_scanf" )
; ( "/2015/05/07/opam-1-2-2-released/"