# Sundials/ML 2.7.0p0

Sundials is a collection of six numeric solvers: CVODE, CVODES, IDA, IDAS, ARKODE, and KINSOL. It is written by Carol S. Woodward, Daniel R. Reynolds, Alan C. Hindmarsh, and Lawrence E. Banks at the Center for Applied Scientific Computing, Lawrence Livermore National Laboratory with significant contributions from Radu Serban, and contributions from Peter N. Brown, Scott Cohen, Aaron Collier, Keith E. Grant, Steven L. Lee, Cosmin Petra, Dan Shumaker, and Allan G. Taylor.

This OCaml interface was written by Timothy Bourke (Inria/ENS), Jun Inoue (AIST), and Marc Pouzet (UPMC/ENS/Inria). It provides an OCaml interface to Sundials version 2.7.0. The only features missing are the Hypre ParVector and PETSC nvector modules (contact us if you need these).

The source code is available under a New BSD license at git@github.com:inria-parkas/sundialsml.git. Feedback, bug reports, and pull requests are welcome. Support requests should be made to the (public) OCaml mailing list.

We presented Sundials/ML at the 2016 The OCaml Users and Developers Workshop. A brief technical summary is included in the proceedings of the 2016 ACM Workshop on ML.

## Overview

The structure of this interface mostly follows that of the original library, both for ease of reading the official documentation and for converting existing source code, but several changes have been made for programming convenience and to increase safety, namely:

• solver sessions are mostly configured via algebraic data types rather than multiple function calls;
• errors are signalled by exceptions, not return codes (also from user-supplied callback routines);
• user data is shared between callback routines via closures (partial applications of functions);
• vectors are checked for compatibility (using a combination of static and dynamic checks); and
• explicit free commands are not necessary since OCaml is a garbage-collected language.
Functions have been renamed according to a regular scheme. Leading module identifiers are replaced by module paths, words beginning with an uppercase letter are separated by underscores and put into lowercase. For instance, IdaSetErrHandlerFn, becomes Ida.set_err_handler_fn, and CVSpilsSetJacTimesVecFn becomes Cvode.Spils.set_jac_times_vec_fn.

Constants are replaced by variant types in most cases. They are renamed by conversion to CamlCase and the removal of underscores. For instance, PREC_NONE becomes Spils.PrecNone. Exception names are sometimes renamed for consistency and to make them more self explanatory. For instance, the return codes CV_FIRST_RHSFUNC_ERR and IDA_FIRST_RES_FAIL become, respectively, the exceptions Cvode.FirstRhsFuncFailure and Ida.FirstResFuncFailure, and CV_BAD_IS becomes Cvodes.Sensitivity.BadSensIdentifier.

Rather than try to duplicate or replace the comprehensive Sundials user manuals, this documentation provides brief summaries, adapted from the manual, with hyperlinks back to the original text whenever possible (the original HTML documentation is not currently being updated).

### Nvectors

Sundials defines an abstract interface for vectors and provides default serial, parallel, Pthreads, and OpenMP instantiations. The OCaml interface defines likewise a generic Nvector.t type whose type arguments indicate the underlying data and kind—the latter may be Nvector_serial.kind, Nvector_parallel.kind, Nvector_pthreads.kind, Nvector_openmp.kind, or Nvector_custom.kind. The Nvector.unwrap function gives direct access to the underlying data.

The interface to serial nvectors, Nvector_serial, is based on Bigarrays. These arrays are manipulated directly, i.e., with no additional overhead, within the solver by the original low-level serial nvector operations (see Nvector.NVECTOR_OPS). The same low-level operations can be called from OCaml (Nvector_serial.Ops), as can equivalent reimplementations in OCaml on the underlying data (Nvector_serial.DataOps).

The interface to parallel nvectors, Nvector_parallel, is based on Bigarrays and the OCamlMPI library. Parallel nvectors are only available when Sundials/ML is configured to use MPI, as described below.

The underlying operations of Pthreads nvectors, Nvector_pthreads, and OpenMP nvectors, Nvector_openmp, are implemented using multiple threads. These nvectors can be used anywhere that Serial nvectors can (except for the operations in Nvector_serial.Ops).

Besides these four standard implementations, it is also possible to define new nvector implementations through Nvector_custom by providing low-level operations on an underlying datatype. A demonstration of this feature on float arrays is provided in Nvector_array. Custom nvectors suffer two disadvantages compared to the standard nvector implementations. First, each low-level operation incurs the cost of a callback into OCaml. Second, of all the provided linear solvers, only Cvode.Diag can be used; although custom solvers can be implemented via Cvode.Alternate, Ida.Alternate, and Kinsol.Alternate.

### Linear Solvers

Nonlinear algebraic systems occur optionally in the solution of ODE initial value problems with Cvode, invariably when solving DAE initial value problems with Ida, and when solving implicit problems or problems involving a mass matrix with Arkode, and directly in the problems treated by Kinsol. Such systems are solved using some form of Newton iteration which in turn requires the solution of linear equations.

Sundials provides six options for the solution of linear equations:

• The diagonal approximation of Jacobians by difference equations (only for Cvode);
• Direct Linear Solvers (DLS) requiring user-supplied callback functions that explicitly compute a Jacobian;
• Scaled Preconditioned Iterative Linear Solvers (SPILS) requiring user-supplied callback functions to setup and solve linear preconditioning systems;
• The SuperLU_MT sparse-direct linear solver requiring a user-supplied callback function that computes a sparse Jacobian;
• The KLU sparse-direct linear solver also requiring a user-supplied callback function that computes a sparse Jacobian;
• Alternate linear solvers providing hooks for implementing new linear solver modules (in OCaml).

The DLS routines are only available to sessions that use serial, Pthreads, or OpenMP nvectors. Callback functions can either update the Jacobian matrix as a Dls.DenseMatrix.t or a Dls.BandMatrix.t. Access to the underlying solver routines on bigarrays is provided via Dls.ArrayDenseMatrix and Dls.ArrayBandMatrix.

The SPILS routines include the Scaled Preconditioned GMRES (SPGMR), Scaled Preconditioned Bi-CGStab (SPBCG), Scaled Preconditioned TFQMR (SPTFQMR), Scaled Preconditioned Flexible GMRES (SPFMGR), and Preconditioned Conjugate Gradient (PCG) methods. Additionally, Cvode provides banded preconditioners for sessions that use serial, Pthreads, or OpenMP nvectors. Access to the underlying solver routines on bigarrays is provided via the submodules of Spils. Parallel Band-Block-Diagonal (BBD) preconditioners are available to sessions that use parallel nvectors—see Cvode_bbd, Cvodes_bbd, Ida_bbd, Idas_bbd, and Kinsol_bbd.

The SuperLU_MT and KLU routines are only available to sessions that use serial, Pthreads, or OpenMP nvectors. Callback functions update the Jacobian matrix as an Sls.SparseMatrix.t.

Each solver module has a distinct linear solver type, e.g., Cvode.linear_solver or Cvodes.Adjoint.linear_solver. As for nvectors, these types are parameterised by data and kind arguments. Values in these types are constructed by passing parameters to functions like Cvode.Dls.band or Ida.Spils.spgmr. They are then passed to the appropriate init or reinit functions to configure sessions.

## API Reference

 Sundials Generic definitions, arrays, and utility functions.

 Dls Direct Linear Solvers. Spils Scaled Preconditioned Iterative Linear Solvers routines. Sls Sparse Linear Solvers.

 Nvector Generic nvector types and operations. Nvector_serial The standard serial nvectors of Sundials. Nvector_parallel The standard parallel nvectors of Sundials (requires MPI). Nvector_pthreads The Pthreads nvectors of Sundials (requires pthreads). Nvector_openmp The OpenMP nvectors of Sundials (requires OpenMP). Nvector_custom An interface for creating custom nvectors in OCaml. Nvector_array A custom nvector based on float arrays.

 Cvode Variable-step solution of ODE initial value problems with zero-crossing detection. Cvode_bbd Parallel band-block-diagonal preconditioners for CVODE (requires MPI). Cvodes Sensitivity analysis (forward and adjoint) and quadrature equations for CVODE. Cvodes_bbd Parallel band-block-diagonal preconditioners for CVODES (requires MPI).

 Ida Variable-step solution of DAE initial value problems with zero-crossing detection. Ida_bbd Parallel band-block-diagonal preconditioners for IDA (requires MPI). Idas Sensitivity analysis (forward and adjoint) and quadrature equations for IDA. Idas_bbd Parallel band-block-diagonal preconditioners for IDAS (requires MPI).

 Arkode Adaptive-step time integration for stiff, nonstiff, and multi-rate systems of ODE initial value problems with zero-crossing detection. Arkode_bbd Parallel band-block-diagonal preconditioners for ARKODE (requires MPI).

 Kinsol Solves nonlinear systems using Newton-Krylov techniques. Kinsol_bbd Parallel band-block-diagonal preconditioners for KINSOL (requires MPI).

## Installation

The dependencies of Sundials/ML are

• OCaml 3.12.1 or greater,
• Sundials 2.7.0 (compiles with 2.5.0 onwards with less features),
• Optionally: OCamlMPI 1.01.
Sundials must be compiled with 64-bit floats (the default: --with-precision=double) and the C compiler must provide 32-bit ints.

Building the extra features of Sundials requires the installation of dependencies and the right cmake incantation. In particular, it can be tricky to get the optional SuperLU/MT library to work. If Sundials cmake claims that SuperLU_MT is not functional, this patch may help.

For Debian-based systems, we found the following worked:

1. apt-get install cmake liblapack-dev libopenmpi-dev openmpi-bin libsuitesparse-dev
2. Optionally download and build SuperLU/MT as described above, since the libsuperlu-dev package does not provide the shared library file.
3. mkdir build; cd build
4. cmake -Wno-dev ../sundials-2.7.0 \
-DCMAKE_BUILD_TYPE=Release \
-DOPENMP_ENABLE=1 \
-DMPI_ENABLE=1 \
-DLAPACK_ENABLE=1 -DLAPACK_LIBRARIES='-llapack -lblas' \
-DKLU_ENABLE=1 -DKLU_LIBRARY_DIR=/usr/lib/x86_64-linux-gnu \
-DKLU_INCLUDE_DIR=/usr/include/suitesparse
-DSUPERLUMT_ENABLE=1 \
-DSUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib \
-DSUPERLUMT_INCLUDE_DIR=<full-path-to>/SuperLU_MT_3.0/SRC \
-DSUPERLUMT_LIBRARIES=-lblas
5. Configure Sundials/ML with
./configure SUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib
Note that SUPERLUMT_LIBRARY_DIR must be registered with ld:
export LD_LIBRARY_PATH=<full-path-to>/SuperLU_MT_3.0/lib:$LD_LIBRARY_PATH For OS X, we found the following worked: 1. Optionally build and install suite-space from the sources, since the brew package does not include dynamic libraries. 2. Optionally download and build SuperLU/MT as described above. 3. mkdir build; cd build 4. For OpenMP, the gcc compiler is required: cmake -Wno-dev ../sundials-2.7.0 \ -DCMAKE_BUILD_TYPE=Release \ -DCMAKE_C_COMPILER=/usr/local/bin/gcc-5 \ -DOPENMP_ENABLE=1 \ -DPTHREAD_ENABLE=1 \ -DMPI_ENABLE=1 \ -DLAPACK_ENABLE=1 \ -DKLU_ENABLE=1 -DKLU_LIBRARY_DIR=/usr/local/lib -DKLU_INCLUDE_DIR=/usr/local/include adding, if necessary, -DSUPERLUMT_ENABLE=1 \ -DSUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib \ -DSUPERLUMT_INCLUDE_DIR=<full-path-to>/SuperLU_MT_3.0/SRC 5. Configure Sundials/ML with ./configure SUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib The configure script detects and automatically enables optional features. Lapack solvers, like Cvode.Dls.lapack_dense, are enabled if Sundials was built with lapack support (see also Sundials.lapack_enabled). The KLU and SuperLU_MT solvers, and the Pthreads and OpenMP nvectors are only enabled if Sundials was built with them. Parallel nvectors and Band-Block-Diagonal (BBD) solvers are only enabled if Sundials was built with them and OCamlMPI is available. Under older versions of Sundials (2.5.0), the comparison of examples will only function correctly if the Sundials sources are patched and configured with --enable-examples and --enable-shared. ### Opam The easiest way to use the library is to install Sundials using a package manager and the library itself using Opam. Note, however that the Debian/Ubuntu and Brew packages are out-of-date, they install Sundials 2.5.0 (at the time of writing). Sundials/ML will function correctly but with less features. 1. Download and manually install Sundials, or use a package manager: • Debian/Ubuntu (version 2.5.0 without parallelism): apt-get install libsundials-serial-dev • Fedora: dnf install lapack-devel sundials-devel sundials-threads-devel sundials-openmpi-devel • Mac OS X (version 2.7.0 without parallelism): brew install sundials / port install sundials 2. Optionally run opam install mpi. 3. Run opam install sundialsml. ### From source Building from source is a three step process: 1. Download and manually install Sundials, or use a package manager: • Debian/Ubuntu (without parallelism): apt-get install libsundials-serial-dev • Mac OS X: brew install sundials / port install sundials 2. Run configure to find and check dependencies. 3. Run make install or make install-findlib to build and install the library. The choices made by the configure script can be influenced by arguments (like --prefix=...) and variables (like OCAMLROOT=...). Type configure --help for detailed information. OCaml reimplementations of the standard Sundials examples are provided in the examples/ subdirectory. The library's behaviour can be tested via make tests.opt.log which runs the OCaml versions and compares their outputs against those of the original C versions: they should be identical. The library's performance can be analyzed via make perf.opt.log which produces the graph explained below. ## Running programs ### Compiling and linking Programs are compiled by specifying where Sundials/ML is installed, e.g., • -I +sundialsml, • or -I opam config var lib/sundialsml, • or ocamlfind ... -package sundialsml, and including bigarray.cma and sundials.cma, for example: ocamlc -o myprog.byte -I +sundialsml bigarray.cma sundials.cma myprog.ml or the .cmxa versions: ocamlopt -o myprog.opt -I +sundialsml bigarray.cmxa sundials.cmxa myprog.ml The sundials.cma/.cmxa files link against the libraries libsundials_cvodes and libsundials_idas. The code in these libraries should give the same results as that in those without sensitivity analysis (except for the functions Cvode.get_work_space and Ida.get_work_space), even though they are compiled from distinct source files. The sundials_no_sens.cma/cmxa files, on the other hand, link against the libraries libsundials_cvode and libsundials_ida and thus do not include the functionality in Cvodes or Idas. Both sets of files link against libsundials_kinsol, libsundials_arkode, and libsundials_nvecserial. The parallel features—in the Nvector_parallel, Cvode_bbd, Cvodes_bbd, Ida_bbd, Idas_bbd, Kinsol_bbd, and Arkode_bbd modules—require the additional inclusions of mpi.cma and sundials_mpi.cma. So, for example: ocamlc -o myprog.byte -I +sundialsml bigarray.cma mpi.cma sundials.cma \ sundials_mpi.cma myprog.ml or with the .cmxa versions: ocamlopt -o myprog.opt -I +sundialsml bigarray.cmxa mpi.cmxa sundials.cmxa \ sundials_mpi.cmxa myprog.ml The sundials_mpi.cm(x)a files link against the libsundials_nvecparallel library. The Nvector_openmp and Nvector_pthreads modules require the additional inclusion, respectively, of sundials_openmp.cm(x)a and sundials_pthreads.cm(x)a. Under ocamlfind, the parallel, OpenMP, and Pthreads features are selected via subpackages, and the use of the libraries without sensitivity analysis via a predicate. For example, for everything: ocamlfind ocamlopt -package sundialsml.mpi,sundials.pthreads,sundials.openmp \ -linkpkg -o mysim.opt mysim.ml The available packages and the features they select are: • sundialsml: basic features; add -predicates no_sens to disable sensitivity, • sundialsml.mpi: additionally include MPI-based parallel nvectors, • sundialsml.openmp: additionally include OpenMP nvectors, • sundialsml.pthreads: additionally include Pthreads nvectors. ### From the toplevel Sundials/ML can also be used from the OCaml interactive loop, either by an invocation like: ocaml bigarray.cma -I +sundialsml sundials.cma sundials_openmp.cma or through ocamlfind, for example: #use "topfind";; #predicates "no_sens";; (* optional—excludes sensitivity code *) #require "sundialsml";; let f t y yd = yd.{0} <- 1.;; let g t y gout = gout.{0} <- y.{0};; let y = Sundials.RealArray.of_array [| -1.0 |];; let yvec = Nvector_serial.wrap y;; let s = Cvode.(init Adams Functional default_tolerances f ~roots:(1, g) 0. yvec);; Cvode.set_stop_time s 2.;; (* repeat the commands below to advance the simulation until t = 2.0 *) let (t', result) = Cvode.solve_normal s 2. yvec;; Format.printf "%e: %a\n" t' Sundials.RealArray.pp y;; Using MPI from a toplevel is best done with ocamlfind by first creating a custom toplevel: ocamlfind ocamlmktop -o partop -package sundialsml.mpi,findlib -linkpkg and then launching it in multiple terminals: mpirun -np 2 xterm -e ./partop Here, 2 is the number of processes, xterm is the terminal program, and -e ./partop has each xterm execute ./partop. As a simple test, paste the following into all terminals: #use "topfind";; #require "sundialsml.mpi";; let comm = Mpi.comm_world let n = Mpi.comm_size comm let my_id = Mpi.comm_rank comm let pv = Nvector_parallel.make 1 n comm (float_of_int (my_id + 1));; Printf.printf "%d: local=%f.\n" my_id (Nvector_parallel.local_array pv).{0};; Printf.printf "Sum of abs. = %f\n" (Nvector_parallel.Ops.n_vl1norm pv);;  ### Solutions to common problems 1. The message Fatal error: cannot load shared library dllmlsundials can usually be fixed by updating LD_LIBRARY_PATH, for example, export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
Otherwise you may have compiled Sundials without --enable-shared.
2. The configuration warning
Couldn't determine C compiler flag for OpenMP.
can usually be eliminated by specifying a compiler that supports OpenMP, for example,
CC=gcc-5 ./configure

## Performance

An interface like Sundials/ML inevitably adds overhead: there is extra code to execute at each call. But, how significant is this cost? And, more broadly, how does the performance of OCaml compare to that of C for programs that use numeric solvers?

These questions are not easy to answer. As a first attempt, we took the examples in C from the Sundials distribution, reimplemented them in OCaml and compared the execution times. The bars in the graph below show the ratios of the execution times of the OCaml code to the C code, i.e., a value of 2 on the left axis means that OCaml is twice as slow. The black dots indicate, against the right axis, the execution time of the C code.

The graph shows that the OCaml examles are only sometimes more than 50% slower than the original ones. The *_custom example (light blue) uses custom nvectors with low-level operations in OCaml and the *_alt examples (darker blue) use linear solvers implemented in OCaml.

This conclusion seems reasonable as a first approximation, but several details of the analysis process and individual results show that the real situation is less clear-cut. For one, the running times of most of the examples are so short that accurate profiling is not possible, i.e., time and gprof simply show 0 seconds. The figures in the graph were obtained by modifying the examples to repeatedly execute their main functions; we trigger a Gc.compact after each execution since the C program must call malloc and free. The number of repetitions varies per example since otherwise the slower examples take too long. The timings indicated by the dots and the axis at right are calculated by dividing the wall-clock time of each C version by the number of repetitions. All but six of the serial examples (red) run so fast that comparisons are made based on tens, or usually hundreds of repetitions and in some cases this amplifies factors other than the interface overhead. The slowest example, for instance, kin--ser--kinRoberts_fp is iterated more than 185 000 times to achieve a significant wall clock time. This means creating and destroying many more data structures than usual for such code. Running the example without explicit compacts and with a larger minor heap (OCAMLRUNPARAM=s=128M) decreases the OCaml/C ratio to 2.6.

The parallel examples (lighter red) all have relatively long run times and results are obtained without iterating. Their OCaml/C ratios are almost all close to 1—the interface and other overheads are small compared to the dominating costs of parallelization and communication.

We were able to make our OCaml versions much faster (up to 4 times) by:

• Adding explicit type annotations to all vector arguments. For instance, rather than declare a callback with
 let f t y yd = ... ,
it is better to use
 let f t (y : Sundials.RealArray.t) (yd : Sundials.RealArray.t) = ... ,
or more concisely
 let f : Sundials.RealArray.t Cvode.rhsfn = fun t y yd -> ...
since then the compiler need not generate polymorphic code and can optimize for the bigarray layout.
• Avoid functions like Bigarray.Array1.sub and Bigarray.Array2.slice_left. These functions allocate new bigarrays on the major heap, which increases the frequency of major GCs. They can often be avoided by calculating and passing indices more explicitly. When part of an array must be passed to another function, it can be faster, depending on the size, to copy into and out of a statically-allocated temporary array.
• Sequences of Sundials.RealArray2.get and Sundials.RealArray2.set operations are usually better replaced by Sundials.RealArray2.unwrap (projection from a tuple) and direct accesses to the underlying array.
• Write numeric expressions and loops according to the advice in [2] to avoid float ‘boxing’.

In summary, OCaml code using the Sundials solvers should almost never be more than 50% slower than the equivalent code written in C, provided the guidelines above are followed, and it should usually not be more than 30% slower. It is usually, however, faster to write and debug OCaml code thanks to automatic memory management, bounds checking on arrays, strong static type checking, higher-order functions, etcetera. Moreover, the Sundials/ML library offers a good comprise for programs combining symbolic manipulation and numeric calculation.

The graph above can be generated from the Sundials source by running

cd examples; make perf.opt.log GC_AT_END=1 PERF_DATA_POINTS=10

Be sure to build Sundials with -DCMAKE_BUILD_TYPE=Release otherwise the underlying library is unoptimized.

## References

1. A. C. Hindmarsh, P. N. Brown, K. E. Grant, S. L. Lee, R. Serban, D. E. Shumaker, and C. S. Woodward, “ SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers,” ACM Transactions on Mathematical Software, 31(3), pp. 363-396, 2005.
2. X. Leroy, “Old Objective Caml site: Writing efficient numerical code in Objective Caml,” July 2002.

## Acknowledgements

We gratefully acknowledge the support of the ITEA 3 project 11004 MODRIO (Model driven physical systems operation), Inria, and the Departement d'Informatique de l'ENS.

This library benefits greatly from the OCaml Bigarray and MPI libraries, and from OCaml's optimized floating-point representations and compilation.

This documentation uses J. Protzenko's CSS stylesheet, and MathJax for rendering mathematics.

We are grateful for direct contributions to this library from