Commit 9c911c12 authored by Thomas Blanc

Thomas Blanc's articles (internal links broken)

parent fe5a72a2
title=An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy
authors=Thomas Blanc
tags=ocaml, highlights, GC
[![An in-depth Look at OCaml’s new "Best-fit" Garbage Collector Strategy](/assets/img/Ocaml-search-300x300-1.png)](
The Garbage Collector is probably OCaml’s greatest unsung hero. Its pragmatic approach allows us to allocate without much fear of efficiency loss. In a way, the fact that most OCaml hackers know little about it is a good sign: you want a runtime to gracefully do its job without having to mind it all the time.
But as OCaml 4.10.0 has now hit the shelves, a very exciting feature is [in the changelog](
> - #8809, #9292: Add a best-fit allocator for the major heap; still
>   experimental, it should be much better than current allocation
>   policies (first-fit and next-fit) for programs with large heaps,
>   reducing both GC cost and memory usage.
>   This new best-fit is not (yet) the default; set it explicitly with
>   OCAMLRUNPARAM="a=2" (or Gc.set from the program). You may also want
>   to increase the `space_overhead` parameter of the GC (a percentage,
>   80 by default), for example OCAMLRUNPARAM="o=85", for optimal
>   speed.
>   (Damien Doligez, review by Stephen Dolan, Jacques-Henri Jourdan,
>   Xavier Leroy, Leo White)
At OCamlPro, some of the tools that we develop, such as the package manager [opam](, the [Alt-Ergo]( SMT solver or the Flambda optimizer, can be quite demanding in memory usage, so we were curious to better understand the properties of this new allocator.
## Minor heap and Major heap: the GC in a nutshell
Not all values are allocated equal. Some will only be useful for the span of local calculations, some will last as long as the program lives. To handle those two kinds of values, the runtime uses a *Generational Garbage Collector* with two spaces:
* The minor heap uses the [Stop-and-copy]( principle. It is fast but has to stop the computation to perform a full iteration.
* The major heap uses the [Mark-and-sweep]( principle. It has the perk of being incremental and behaves better for long-lived data.
Allocation in the minor heap is straightforward and efficient: values are stored sequentially, and when the heap runs out of space, it is emptied, with surviving values moved to the major heap and dead values simply forgotten for free. The major heap is trickier, since random allocations and deallocations eventually leave free holes scattered between live blocks. This is called [fragmentation](, and this means that you’re using more memory than necessary. Thankfully, the GC has two strategies to counter that problem:
* Compaction: a heavyweight reallocation of everything that will remove those holes in our heap. OCaml’s compactor is cleverly written to work in constant space, and would be worth its own specific article!
* Free-list Allocation: allocating the newly coming data in the holes (the free-list) in memory, de-scattering it in the process.
Of course, asking the GC to be smarter about how it allocates data makes the GC slower. Coding a good GC is a subtle art: you need to have something smart enough to avoid fragmentation but simple enough to run as fast as possible.
## Where and how to allocate: the 3 strategies
OCaml used to offer two free-list allocation strategies: *next-fit*, the default, and *first-fit*. Version 4.10 of OCaml introduces the new *best-fit* strategy. Let’s compare them:
### Next-fit, the original and remaining champion
OCaml’s original (and default) “next-fit” allocating strategy is pretty simple:
1. Keep a (circular) list of every hole in memory ordered by increasing addresses;
1. Have a pointer on an element of that list;
1. When an allocation is needed, if the currently pointed-at hole is big enough, allocate in it;
1. Otherwise, try the next hole and so-on.
This strategy is extremely efficient, but a big hole might be fragmented with very small data while small holes stay unused. In some cases, the GC would trigger costly compactions that would have been avoidable.
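To make the steps above concrete, here is a minimal sketch of next-fit over a toy model of the free list. The `hole` record, the array standing in for the circular list and the `next_fit` function are illustrative assumptions of ours, not the runtime's actual data structures (the real allocator is part of the C runtime).

```ocaml
(* Toy model of a free-list hole: start address and size, both in words.
   These names are illustrative; the real runtime is written in C. *)
type hole = { mutable addr : int; mutable size : int }

(* Next-fit: scan from the hole under [cursor], wrapping around, and take the
   first hole that is big enough. The cursor stays on that hole, so the next
   request starts where this one ended. Returns the allocated address. *)
let next_fit (holes : hole array) (cursor : int ref) (wanted : int) : int option =
  let n = Array.length holes in
  let rec scan tried i =
    if tried = n then None (* went all the way around: no hole fits *)
    else
      let h = holes.(i) in
      if h.size >= wanted then begin
        let allocated = h.addr in
        h.addr <- h.addr + wanted;
        h.size <- h.size - wanted;
        cursor := i;
        Some allocated
      end
      else scan (tried + 1) ((i + 1) mod n)
  in
  scan 0 !cursor
```

With holes of 10, 2 and 50 words, a 20-word request moves the cursor to the 50-word hole, and the small requests that follow keep nibbling at that big hole while the 2-word hole stays unused. That is the fragmentation pattern described above.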
### First-fit, the unsuccessful contender
To counteract that problem, the “first-fit” strategy was implemented in 2008 (OCaml 3.11.0):
* Same idea as next-fit, but with an extra allocation table.
* Put the pointer back at the beginning of the list for each allocation.
* Use the allocation table to skip some parts of the list.
Unfortunately, that strategy is slower than the previous one. This is an example of how making the GC smarter ends up making it slower. It does, however, reduce fragmentation. It was still useful to have this strategy at hand for the case where compaction would be too costly (on a 100GB heap, for instance). An application that requires low latency might want to disable compaction and use that strategy.
### Best-fit: a new challenger enters!
This leads us to the brand new “best-fit” strategy. This strategy is actually composite and will have different behaviors depending on the size of the data you’re trying to allocate.
* On small data (up to 32 words), [segregated free lists]( will allow allocation in (mostly) constant time.
* On big data, a general best-fit allocator based on [splay trees](
This allows for the best of both worlds, as you can easily allocate your numerous small blocks in the small holes in your memory while you take a bit more time to select a good place for your big arrays.
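The two-tier idea can be sketched in a few lines. This is a simplified model under stated assumptions: one list per exact small size stands in for the segregated free lists, and a linear scan over a plain list replaces the splay tree, so only the selection rule (smallest block that fits) matches the description. All names (`heap`, `free`, `alloc`, `best_large`) are hypothetical.

```ocaml
(* Sizes are in words. The 32-word limit comes from the text above; everything
   else (names, lists instead of a splay tree) is a simplification. *)
let small_limit = 32

type heap = {
  small : int list array;           (* small.(n): addresses of free blocks of exactly n words *)
  mutable large : (int * int) list; (* (size, address) of free blocks above the limit *)
}

let create () = { small = Array.make (small_limit + 1) []; large = [] }

(* Return a free block to the structure that matches its size. *)
let free heap ~addr ~size =
  if size <= small_limit then heap.small.(size) <- addr :: heap.small.(size)
  else heap.large <- (size, addr) :: heap.large

(* Among the large blocks, pick the smallest one that is big enough. *)
let best_large heap wanted =
  List.fold_left
    (fun best (size, addr) ->
      if size < wanted then best
      else
        match best with
        | Some (best_size, _) when best_size <= size -> best
        | _ -> Some (size, addr))
    None heap.large

let alloc heap wanted =
  if wanted <= small_limit then begin
    (* Small request: pop from the list for exactly this size, in constant time. *)
    match heap.small.(wanted) with
    | addr :: rest ->
        heap.small.(wanted) <- rest;
        Some addr
    | [] -> None (* a real allocator needs a fallback here, e.g. splitting a bigger block *)
  end
  else
    match best_large heap wanted with
    | None -> None
    | Some (size, addr) ->
        heap.large <- List.filter (fun block -> block <> (size, addr)) heap.large;
        (* Give the unused tail of the block back to the free lists. *)
        if size > wanted then free heap ~addr:(addr + wanted) ~size:(size - wanted);
        Some addr
```

Note how a 36-word request served from a 40-word block leaves a 4-word remainder that lands in the small lists, ready for the next 4-word allocation instead of being lost as an unusable hole.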
How will best-fit fare? Let’s find out!
## Try it!
First, let us remind you that this is still an experimental feature, which from the OCaml development team means “We’ve tested it thoroughly on different systems, but only for months and not on a scale as large as the whole OCaml ecosystem”.
That being said, we’d advise against using it in production code yet.
### Why you should try it
Benchmarking this new strategy could be beneficial for you and the language at large: the dev team is hoping for feedback, and the more quality feedback **you** give, the better the future GC will be tuned for your needs.
In 2008, the first-fit strategy was released with the hope of improving memory usage by reducing fragmentation. However, the lack of feedback meant that the developers were not aware that it didn’t meet the users’ needs. If more feedback had been given, it’s possible that work on improving the strategy or on better strategies would have happened sooner.
### Choosing the allocator strategy
Now, there are two ways to control the GC behavior: through the code or through environment variables.
#### First method: Adding instructions in your code
This method should be used by those of us who have code that already does some GC fine-tuning. As early as possible in your program, you want to execute the following lines:

    let () =
      Gc.set
        { (Gc.get ()) with
          Gc.allocation_policy = 2; (* Use the best-fit strategy *)
          space_overhead = 100; (* Let the major GC work a bit less since it's more efficient *)
        }

You might also want to add `verbose = 0x400;` or `verbose = 0x404;` in order to get some GC debug information. See [here]( for more details on how to use the `Gc` module.
Of course, you’ll need to recompile your code, and this will apply only after the runtime has initialized itself, triggering a compaction in the process. Also, since you might want to easily switch between different allocation policies and overhead specifications, we suggest you use the second method.
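If you do go the in-code route, it can help to build the settings in one place and print what the runtime reports afterwards. A minimal sketch: the `tuned` helper is a hypothetical name of ours, while `Gc.get`, `Gc.set` and the record fields come from the standard `Gc` module.

```ocaml
(* Build a Gc.control record from the current settings, changing only the
   allocation policy and the space overhead. *)
let tuned ~policy ~overhead =
  { (Gc.get ()) with Gc.allocation_policy = policy; space_overhead = overhead }

let () =
  Gc.set (tuned ~policy:2 ~overhead:100);
  (* Read the settings back to see what the runtime actually reports. *)
  let current = Gc.get () in
  Printf.printf "allocation_policy=%d space_overhead=%d\n"
    current.Gc.allocation_policy current.Gc.space_overhead
```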
#### Second method: setting `$OCAMLRUNPARAM`
At OCamlPro, we develop and maintain a program that any OCaml developer should want to run smoothly. It’s called [Opam](, maybe you’ve heard of it? Though most commands take a few seconds, some [administrative-heavy]( commands can be a strain on our computer. In other words: those are perfect for a benchmark.
Here’s what we did to benchmark Opam:

    $ opam update
    $ opam switch create 4.10.0
    $ opam install opam-devel # or build your own code
    $ export OCAMLRUNPARAM='b=1,a=2,o=100,v=0x404'
    $ cd my/local/opam-repository
    $ perf stat ~/.opam/4.10.0/lib/opam-devel/opam admin check --installability # requires permission to run perf; `time` can do the trick instead

If you want to compile and run your own benchmarks, here are a few details on `OCAMLRUNPARAM`:
* `b=1` means “print the backtrace in case of uncaught exception”
* `a=2` means “use best-fit” (default is `0`, next-fit; first-fit is `1`)
* `o=100` means “do less work” (default is `80`, lower means more work)
* `v=0x404` means “have the GC be verbose” (`0x400` is “print statistics at exit”, `0x4` is “print when changing heap size”)
See the [manual]( for more details on `OCAMLRUNPARAM`.
You might want to compare how your code fares on all three different GC strategies (and fiddle a bit with the overhead to find your best configuration).
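If you have no benchmark at hand, a synthetic one is easy to write. The sketch below is a hypothetical workload of ours (the `churn` function, its sizes and its round counts are arbitrary choices, not what we ran on opam): it keeps a window of arrays alive and overwrites random slots, then prints a few fields from `Gc.quick_stat`.

```ocaml
(* Hypothetical workload: keep [window] arrays alive and overwrite random slots,
   so that blocks of varied sizes die at random and leave holes in the major heap.
   Returns the total number of elements still alive at the end. *)
let churn ~rounds ~window =
  let live = Array.make window [||] in
  Random.init 42; (* fixed seed, so every run allocates the same sequence *)
  for _i = 1 to rounds do
    live.(Random.int window) <- Array.make (1 + Random.int 1000) 0
  done;
  Array.fold_left (fun total block -> total + Array.length block) 0 live

let () =
  let start = Sys.time () in
  let kept = churn ~rounds:100_000 ~window:5_000 in
  let stat = Gc.quick_stat () in
  Printf.printf "policy=%d overhead=%d kept=%d\n"
    (Gc.get ()).Gc.allocation_policy (Gc.get ()).Gc.space_overhead kept;
  Printf.printf "cpu=%.2fs top_heap_words=%d compactions=%d\n"
    (Sys.time () -. start) stat.Gc.top_heap_words stat.Gc.compactions
```

Run the same binary with `a=0`, `a=1` and `a=2` in `OCAMLRUNPARAM`, varying `o` as well, and compare the reported CPU time and peak heap. A synthetic loop like this is only a rough indicator; your real workload is what matters.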
## Our results on opam
Our contribution in this article is to benchmark `opam` with the different allocation strategies:
<figure><table>
<thead>
<tr><td>Strategy:</td><td>Next-fit</td><td>First-fit</td><td colspan="3" scope="colgroup">Best-fit</td></tr>
<tr><td>Overhead:</td><td>80</td><td>80</td><td>80</td><td>100</td><td>120</td></tr>
</thead>
<tbody>
<tr><td>Cycles used (Gcycle)</td><td>2,040</td><td>3,808</td><td>3,372</td><td>2,851</td><td>2,428</td></tr>
<tr><td>Maximum heap size (kb)</td><td>793,148</td><td>793,148</td><td>689,692</td><td>689,692</td><td>793,148</td></tr>
<tr><td>User time (s)</td><td>674</td><td>1,350</td><td>1,217</td><td>1,016</td><td>791</td></tr>
</tbody>
</table></figure>
A quick word on these results. Most of `opam`’s calculations are done by [dose]( and rely heavily on small interconnected blocks. We don’t really have big chunks of data to allocate, so this workload falls squarely into the best-case scenario of the next-fit strategy, and best-fit cannot show the gains you might see elsewhere. As a matter of fact, we didn’t see a single GC compaction with any strategy. However, best-fit still allows for a lower memory footprint: at overheads of 80 and 100, the maximum heap size drops from 793,148 to 689,692 kb, roughly 13% less.
## Conclusions
If your software is highly reliant on memory usage, you should definitely try the new Best-fit strategy and stay tuned on its future development. If your software requires good performance, knowing whether it performs better with Best-fit (and giving feedback on that) might help you in the long run.
The different strategies are:
* Next-fit: generally good and fast, but has very bad worst cases with big heaps.
* First-fit: mainly useful for very big heaps that must avoid compaction as much as possible.
* Best-fit: almost the best of both worlds, with a small performance hit for programs that fit well with next-fit.
Remember that whatever works best for you, it’s still better than having to `malloc` and `free` by hand. Happy allocating!
> About OCamlPro
> OCamlPro is an R&D lab founded in 2011, with the mission to help industrial users harness the state-of-the-art OCaml programming language.
> We design, create and implement custom ad-hoc software for our clients in state-of-the-art languages (OCaml, Rust…). We also have a long experience in developing and maintaining open-source tooling for OCaml, such as Opam and ocp-indent, and we contribute to the core-development of OCaml, notably with our work on the Flambda optimizer branch. Another area of expertise is that of Formal Methods, with tools such as our SMT Solver Alt-Ergo (check our [Alt-Ergo Users’ Club]( We also provide vocational trainings in OCaml and Rust, and we can build courses on formal methods on-demand. Do not hesitate to reach out by email: [](
title=A look back on OCaml since 2011
authors=Thomas Blanc
tags=ocaml, highlights, cheat-sheets
[![A look back on OCaml since 2011](assets/img/ocaml-2011-e1600870731841.jpeg)](
As you already know if you’ve read [our last blogpost](, we have updated our OCaml cheat sheets starting with the language and stdlib ones. We know some of you have students to initiate in September and we wanted these sheets to be ready for the start of the school year! We’re working on more sheets for OCaml tools like opam or Dune and important libraries such as ~~Obj~~ Lwt or Core. Keep an eye on our blog or the [repo on GitHub]( to follow all the updates.
Going through the documentation was a journey to the past: we have looked back on 8 years of evolution of the OCaml language and library. New feature after new feature, OCaml has seen many changes. Needless to say, upgrading our cheat sheets to OCaml 4.08.1 was a trip down memory lane. We wanted to share our throwback experience with you!
## 2011
Fabrice Le Fessant first published our cheat sheets in 2011, the year OCamlPro was created! At the time, OCaml was in its 3.12 version and just [got its current name]( agreed upon. [First-class modules]( were the new big thing, Camlp4 and Camlp5 were battling for the control of the syntax extension world and Godi and Oasis were the packaging rage.
## 2012
Right after 3.12 came the switch to OCaml 4.00 which brought a major change: [GADTs]( (generalized algebraic data types). Most of OCaml’s developers don’t use their almighty typing power, but the possibilities they provide are really helpful in some cases, most notably the format overhaul. They’re also a fun way to troll a beginner asking how to circumvent the typing system on Stack Overflow. Since most of us might lose track of their exact syntax, GADTs deserve their place in the updated sheet (if you happen to be OCamlPro’s CTO, *of course* the writer of this blogpost remembers how to use GADTs at all times).
On the standard library side, the big change was the switch of `Hashtbl` to Murmur 3 and the support for seeded randomization[.](
## 2013
With OCaml 4.01 came [constructor disambiguation](, but there isn’t really a way to add this to the sheet. This feature allows you to avoid misguided usage of polymorphic variants, but that’s a matter of personal taste (there’s a well-known rule that if you refresh the comments section enough times, someone —usually called Daniel— will appear to explain polymorphic variants’ superiority to you). `-ppx` rewriters were introduced in this version as well.
The standard library got a few new functions, notably `Printexc.get_callstack` for stack inspection, the optimized application operators `|>` and `@@`, and `Format.asprintf`.
## 2014
*Gabriel Scherer, on the Caml-list, end of January:*
> TL;DR: During the six next months, we will follow pull requests (PR) posted on the github mirror of the OCaml distribution, as an alternative to the mantis bugtracker. This experiment hopes to attract more people to participate in the extremely helpful and surprisingly rewarding activity of patch reviews.
Can you guess which change to the cheat-sheets came with 4.02? It’s a universally-loved language feature added in 2014. Still don’t know? It is *exceptional*! Got it?
Drum roll… it is the `match with exception` [construction](! It made our code simpler, clearer and in some cases more efficient. A message to people who want to improve the language: please aim for that.
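For readers who have not met the construct, here is a small illustration. The `find_or_default` function is an example of ours, not taken from the cheat sheet.

```ocaml
(* The [exception] case handles an exception raised by the scrutinee
   (Hashtbl.find) as just another branch of the match, with no try ... with. *)
let find_or_default table key default =
  match Hashtbl.find table key with
  | value -> value
  | exception Not_found -> default
```

The value branch and the exception branch sit side by side, which is what makes the code simpler to read than a `try ... with` wrapped around the lookup.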
This version also added the `{quoted|foo|quoted}` [syntax]( (which broke comments), generative functors, attributes and [extension nodes](, extensible data types, module aliases and, of course, immutable strings (which was optional at the time). Immutable strings is the one feature that prompted us to *remove* a line from the cheat sheets. More space is good. Camlp4 and LablTk moved out of the distribution.
As a consequence of immutable strings, `Bytes` and `BytesLabels` were added to the library. To the great pleasure of optimization addicts, `raise_notrace` popped up. Under the hood, the `format` type was re-implemented using GADTs.
## 2015
This release was so big that 4.02.2 feels like a release in itself, with the addition of `nonrec` and `#...` operators.
The standard library was spared by this bug-fix themed release. Note that this is the last comparatively slow year of OCaml as the transition to GitHub would soon make features multiply, as hindsight teaches us.
## 2016
Speaking of a major release, we’re up to OCaml 4.03! It introduced [inline records](, a GADT exhaustiveness check on steroids (with `-> .` to denote unreachability) and standard attributes like `warning`, `inlined`, `unboxed` or `immediate`. Colors appeared in the compiler and last but not least, it was the dawn of a new option called [Flambda](
The library saw a lot of useful new functions coming in: lots of new iterators for `Array`, an `equal` function in most basic type modules, `Uchar`, the `*_ascii` alternatives and, of course, `Ephemeron`.
4.04 was much more restrained, but it was the second release in a single year. Local opening of modules with the `M.{}` syntax was added, along with the `let exception ... in` construct. `String.split_on_char` was notably added to the stdlib, which means we don’t have to rewrite it anymore.
## 2017
We now get to 4.05… which did not change the language. Not that the development team wasn’t busy, OCaml just got better without any change to the syntax.
On the library side, however, much happened, with the addition of `*_opt` functions pretty much everywhere. If you’re using the OCaml compiler from [Debian](, this is where you might think the story ends. You’d be wrong…
…because 4.06 added a lot! My own favorite feature from this release has to be user-defined [indexing operators]( This is also when `safe-string` became the default, giving worthwhile work to every late maintainer in the community. This release also added one awesome function in the standard library: `Map.update`.
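Both 4.06 additions mentioned above fit in one short, hypothetical example: a word counter where `.%{}` is an indexing operator we define ourselves and `bump` relies on `Map.update`. The module and function names are ours.

```ocaml
module Counts = Map.Make (String)

(* A user-defined indexing operator: counts.%{word} reads a count,
   defaulting to 0 for unknown words. *)
let ( .%{} ) counts word =
  match Counts.find_opt word counts with
  | Some n -> n
  | None -> 0

(* Map.update receives the current binding as an option and returns the new one. *)
let bump word counts =
  Counts.update word
    (function None -> Some 1 | Some n -> Some (n + 1))
    counts

let counts = List.fold_right bump [ "gc"; "heap"; "gc" ] Counts.empty
```

After this, `counts.%{"gc"}` reads like ordinary indexing, even though `counts` is an immutable map.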
## 2018
4.07 was aimed towards solidifying the language. It added empty variants and type-based selection of GADT constructors to the mix.
On the library side, one old and two new modules were added, with the integration of `Bigarray`, `Seq` and `Float`.
## 2019
And here we are with 4.08, in the present day! We can now put exceptions under or-patterns, which is the only language change from this release we propagated to the sheet. Time will tell if we need to add custom [binding operators]( or `[@@alert]`. `Pervasives` is now deprecated in favor of `Stdlib` and new modules are popping up (`Int`, `Bool`, `Fun`, `Result`… did we miss one?) while `Sort` made its final deprecation warning.
We did not add 4.09 to this journey to the past, as this release is still solidly in the *now* at the time of this blogpost. Rest assured, we will see many more awesome features in OCaml in the future! In the meantime, we are working on updating more cheat sheets: stay tuned!
title=The Opam 2.0 cheatsheet, with a new theme!
authors=Thomas Blanc
tags=opam, documentation, cheat-sheets
[![The Opam 2.0 cheatsheet, with a new theme!](opam-banniere-e1600868011587.png)](
[Earlier](, we dusted off our Language and Stdlib cheatsheets, for teachers and students. With more time, we managed to design an Opam 2.0 cheat-sheet we are proud of. It is organized into two pages:
* The everyday average Opam use:
  * Installation, Configuration, Switches, Allowed URL formats, Packages, Exploring, Package pinning, Working with local pins, Sharing a dev setup, Configuring remotes.
* Peculiar advanced use cases (opam-managed project, publishing, repository maintenance, etc.):
  * Package definition files, Some optional fields, Expressions, External dependencies, Publishing, Repository administration.
Moreover, with the help of listings, we tried the use of colors for better readability. And we left some blank space for your own peculiar commands. Two versions are available (PDF):
* The Opam cheatsheet in [black & white](
* The Opam cheatsheet in [colour](
In any case do not hesitate to send us your suggestions on [github](
* Louis and Raja, the lead Opam developers, designed this cheatsheet so as to shed light on some important features (some I even discovered even though I speak daily with them!). If a command *you* find useful is not mentioned, let us know and we’ll add it. Feel free to ask for clarification and/or expansion of the manual!
Happy hacking!
> Note: If you come to one of our [training sessions](, you’ll get a free cheatsheet! Isn’t that a bargain?
title=Updated Cheat Sheets: OCaml Language and OCaml Standard Library
authors=Thomas Blanc
tags=ocaml, documentation, cheat-sheets
In 2011, we shared several cheat sheets for OCaml. Cheat sheets are helpful to refer to, as an overview of the documentation when you are programming, especially when you’re starting in a new language. They are meant to be printed and pinned on your wall, or to be kept handy on a spare screen. We hope they will help you out when your rubber duck is rubbish at debugging your code!
Since we first shared them, OCaml and its related tools have evolved. We decided to refresh them and started with the two most-used cheat sheets—our own contribution to the start of the school year!
Download the revised version:
- [OCaml Language (lang)]( (PDF)
- [OCaml Standard Library (stdlib)]( (PDF)
You can also find [the sources on GitHub]( We welcome contributions, feel free to send patches if you see room for improvement! We’re working on other cheat sheets: keep an eye on our blog to see updates and brand new cheat sheets.
While we were updating them, we realized how much OCaml had evolved in the last eight years. We’ll tell you everything about our trip down memory lane very soon in another blogpost!