Commit 12ed34e5 authored by Dario Pinto's avatar Dario Pinto
Browse files

add new article + 4 assets + update urls.ml

parent a56a3472
title=Reduced Memory Allocations with ocp-memprof
authors=Çagdas Bozman
date=2015-05-18
category=Tooling
tags=memory,memprof,optimization,profiler
In this blog post, we explain how `ocp-memprof` helped us identify a piece of code in Alt-Ergo that needed to be improved. Simply put, a function that merges two maps was performing a lot of unnecessary allocations, negatively impacting the garbage collector's activity. A simple patch allowed us to prevent these allocations, and thus speed up Alt-Ergo's execution.
### The Story
Il all started with a challenging example coming from an industrial user of [Alt-Ergo](https://alt-ergo.ocamlpro.com/), our SMT solver. It was proven by Alt-Ergo in approximately 70 seconds. This seemed abnormnally long and needed to be investigated. Unfortunately, all our tests with different options (number of triggers, case-split analysis, …) and different plugins (satML plugin, profiling plugin, fm-simplex plugin) of Alt-Ergo failed to improve the resolution time. We then profiled an execution using `ocp-memprof` to understand the memory behavior of this example.
### Profiling an Execution with `ocp-memprof`
As usual, profiling an OCaml application with `ocp-memprof` is very simple (see the [user manual](https://memprof.typerex.org/free-version.php?action=documentation) for more details). We just compiled Alt-Ergo in the OPAM switch for `ocp-memprof` (version `4.01.0+ocp1`) and executed the following command:
```shell-session
$ ocp-memprof -exec ./ae-4.01.0+ocp1-public-without-patch pb-many-GCs.why
```
The execution above triggers 612 garbage collections in about 114 seconds. The analysis of the generated dumps produces the evolution graph below. We notice on the graph that:
- we have approximately 10 MB of hash-tables allocated since the beginning of the execution, which is expected;
- the second most allocated data in the major heap are maps, and they keep growing during the execution of Alt-Ergo.
We are not able to precisely identify the allocation origins of the maps in this graph (maps are generic structures that are intensively used in Alt-Ergo). To investigate further, we wanted to know if some global value was abnormally retaining a lot of memory, or if some (non recursive-terminal) iterator was causing some trouble when applied on huge data structures. For that, we extended the analysis with the `--per-root` option to focus on the memory graph of the last dump. This is done by executing the following command, where 4242 is the PID of the process launched by `ocp-memprof --exec` in the previous command:
```shell-session
$ ocp-memprof -load 4242 -per-root 611
```
![](/blog/assets/img/graph_before_mini.png)
![](/blog/assets/img/screenshot_per_root_before_mini.png)
The per-root graph (above, on the right) gives more interesting information. When expanding the `stack` node and sorting the sixth column in decreasing order, we notice that:
- a bunch of these maps are still in the stack: the item `Map_at_192_offset_1` in the first column means that most of the memory is retained by the `fold` function, at line 192 of the `Map` module (resolution of stack frames is only available in the commercial version of `ocp-memprof`);
- the "kind" column corresponding to `Map_at_192_offset_1` gives better information. It provides the signature of the function passed to fold. This information is already provided by [the online version](https://memprof.typerex.org/).
```cpp
Uf.Make(Uf.??Make.X).LX.t ->;
Explanation.t ->;
Explanation.t Map.Make(Uf.Make(Uf.??Make.X).LX).t ->;
Explanation.t Map.Make(Uf.Make(Uf.??Make.X).LX).t
```
This information allows us to see the precise origin of the allocation: the map of module `LX` used in [uf.ml](https://github.com/OCamlPro/alt-ergo/blob/master/src/theories/uf.ml). Lucky us, there is only one `fold` function of `LX`'s maps in the `Uf` module with the same type.
### Optimizing the Code
Thanks to the information provided by the `--per-root` option, we identified the code responsible for this behavior:
```ocaml
(*** function extracted from module uf.ml ***)
module MapL = Map.Make(LX)
let update_neqs r1 r2 dep env =
let merge_disjoint_maps l1 ex1 mapl =
try
let ex2 = MapL.find l1 mapl in
let ex = Ex.union (Ex.union ex1 ex2) dep in
raise (Inconsistent (ex, cl_extract env))
with Not_found ->;
MapL.add l1 (Ex.union ex1 dep) mapl
in
let nq_r1 = lookup_for_neqs env r1 in
let nq_r2 = lookup_for_neqs env r2 in
let mapl = MapL.fold merge_disjoint_maps nq_r1 nq_r2 in
MapX.add r2 mapl (MapX.add r1 mapl env.neqs)
```
Roughly speaking, the function above retrieves two maps `nq_r1` and `nq_r2` from `env`, and folds on the first one while providing the second map as an accumulator. The local function `merge_disjoint_maps` (passed to fold) raises `Exception.Inconsistent` if the original maps were not disjoint. Otherwise, it adds the current binding (after updating the corresponding value) to the accumulator. Finally, the result `mapl` of the fold is used to update the values of `r1` and `r2` in `env.neqs`.
After further debugging, we observed that one of the maps (`nq_r1` and `nq_r2`) is always empty in our situation. A straightforward fix consists in testing whether one of these two maps is empty. If it is the case, we simply return the other map. Here is the corresponding code:
```ocaml
(*** first patch: testing if one of the maps is empty ***)
let mapl =
if MapL.is_empty nq_r1 then nq_r2
else
if MapL.is_empty nq_r2 then nq_r1
else MapL.fold_merge merge_disjoint_maps nq_r1 nq_r2
```
Of course, a more generic solution should not just test for emptiness, but should fold on the smallest map. In the second patch below, we used a slightly modified version of OCaml's maps that exports the `height` function (implemented in constant time). This way, we always fold on the smallest map while providing the biggest one as an accumulator.
```ocaml
(*** second (better) patch : folding on the smallest map ***)
let small, big =
if MapL.height nq_r1 > MapL.height nq_r2 then nq_r1, nq_r2
else nq_r2, nq_r1
in
let mapl = MapL.fold merge_disjoint_maps small big in
```
### Checking the Efficiency of our Patch
Curious to see the result of the patch ? We regenerate the evolution and memory graphs of the patched code (fix 1), and we noticed:
- a better resolution time: from 69 seconds to 16 seconds;
- less garbage collection : from 53,000 minor collections to 19,000;
- a smaller memory footprint : from 26 MB to 24 MB;
![](/blog/assets/img/graph_after_mini.png)
![](/blog/assets/img/screenshot_per_root_after_mini.png)
### Conclusion
We show in this post that `ocp-memprof` can also be used to optimize your code, not only by decreasing memory usage, but by improving the speed of your application. The interactive graphs are online in our gallery of examples if you want to see and explore them ([without the patch](https://memprof.typerex.org/users/5a198a7f26b9b9d6f402276e16818a66/2015-05-15_15-32-21_48c9e783500e896444f998eb001fff4c_4242/) and [with the patch](https://memprof.typerex.org/users/5a198a7f26b9b9d6f402276e16818a66/2015-05-15_16-13-22_4174baa4b9b5d8845653e04307b010a9_4530/)).
<table class="tableau2">
<tbody>
<tr>
<td></td>
<th>AE</th>
<th>AE + patch</th>
<th>Remarks</th>
</tr>
</tbody>
<tbody>
<tr>
<th>4.01.0</th>
<td>69.1 secs</td>
<td>16.4 secs</td>
<td>substantial improvement on the example</td>
</tr>
<tr>
<th>4.01.0+ocp1</th>
<td>76.3 secs</td>
<td>17.1 secs</td>
<td>when using the patched version of Alt-Ergo</td>
</tr>
<tr>
<th>dumps generation</th>
<td>114.3 secs (+49%)</td>
<td>17.6 secs (+2.8%)</td>
<td>(important) overhead when dumping<br />
memory snapshots</td>
</tr>
<tr>
<th># dumps (major collections)</th>
<td>612 GCs</td>
<td>31 GCs</td>
<td>impressive GC activity without the patch</td>
</tr>
<tr>
<th>dumps analysis<br />
(online ocp-memprof)</th>
<td>759 secs</td>
<td>24.3 secs</td>
<td></td>
</tr>
<tr>
<th>dumps analysis<br />
(commercial ocp-memprof)</th>
<td>153 secs</td>
<td>3.7 secs</td>
<td>analysis with commercial ocp-memprof is<br />
**~ x5 faster** than public version (above)</td>
</tr>
<tr>
<th>AE memory footprint</th>
<td>26 MB</td>
<td>24 MB</td>
<td>memory consumption is comparable</td>
</tr>
<tr>
<th>minor collections</th>
<td>53K</td>
<td>19K</td>
<td>fewer minor GCs thanks to the patch</td>
</tr>
</tbody>
</table>
Do not hesitate to use `ocp-memprof` on your applications. Of course, all feedback and suggestions are welcome, just [email](mailto:contact@ocamlpro.com) us !
More information:
- Homepage: [https://memprof.typerex.org/](https://memprof.typerex.org/)
- Gallery of examples: [https://memprof.typerex.org/gallery.php](https://memprof.typerex.org/gallery.php)
- Free Version: [https://memprof.typerex.org/free-version.php](https://memprof.typerex.org/free-version.php)
- Commercial Version: [https://memprof.typerex.org/commercial-version.php](https://memprof.typerex.org/commercial-version.php)
- Report a Bug: [https://memprof.typerex.org/report-a-bug.php](https://memprof.typerex.org/report-a-bug.php)
......@@ -8,8 +8,6 @@ let old_to_new =
; ("/fr/recrutement-ocamlpro/", "/jobs")
; ( "https://www.ocamlpro.com/pre-inscription-a-une-session-de-formation-inter-entreprises/"
, "/" )
; ( "https://www.ocamlpro.com/2015/05/18/reduced-memory-allocations-with-ocp-memprof/"
, "/" )
; ( "https://www.ocamlpro.com/2016/11/21/release-of-alt-ergo-1-30-with-experimental-support-for-models-generation/"
, "/" )
; ("https://www.ocamlpro.com/2018/01/15/2017-at-ocamlpro-2/", "/")
......@@ -169,6 +167,8 @@ let old_to_new =
, "/blog/2014_07_15_try_alt_ergo_in_your_browser" )
; ( "/2014/07/16/ocamlpro-highlights-may-june-2014/"
, "/blog/2014_07_16_ocamlpro_highlights_may_june_2014" )
; ( "/2015/05/18/reduced-memory-allocations-with-ocp-memprof/"
, "/blog/2015_05_18_reduced_memory_allocations_with_ocp_memprof." )
; ("/2016/04/01/asm-ocaml/", "/blog/2016_04_01_asm_ocaml")
; ( "/2017/02/09/opam-2-0-beta-is-out/"
, "/blog/2017_02_09_opam_2.0_beta_is_out" )
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment