Sundials/ML 2.7.0p0

Sundials is a collection of six numeric solvers: CVODE, CVODES, IDA, IDAS, ARKODE, and KINSOL. It is written by Carol S. Woodward, Daniel R. Reynolds, Alan C. Hindmarsh, and Lawrence E. Banks at the Center for Applied Scientific Computing, Lawrence Livermore National Laboratory with significant contributions from Radu Serban, and contributions from Peter N. Brown, Scott Cohen, Aaron Collier, Keith E. Grant, Steven L. Lee, Cosmin Petra, Dan Shumaker, and Allan G. Taylor.

This OCaml interface was written by Timothy Bourke (Inria/ENS), Jun Inoue (AIST), and Marc Pouzet (UPMC/ENS/Inria). It provides an OCaml interface to Sundials version 2.7.0. The only features missing are the Hypre ParVector and PETSc nvector modules (contact us if you need these).

The source code is available under a New BSD license at git@github.com:inria-parkas/sundialsml.git. Feedback, bug reports, and pull requests are welcome. Support requests should be made to the (public) OCaml mailing list.

We presented Sundials/ML at the 2016 OCaml Users and Developers Workshop. A brief technical summary is included in the proceedings of the 2016 ACM Workshop on ML.

Overview

The structure of this interface mostly follows that of the original library, both for ease of reading the official documentation and for converting existing source code, but several changes have been made for programming convenience and to increase safety, namely:

Functions have been renamed according to a regular scheme: leading module identifiers are replaced by module paths, and words beginning with an uppercase letter are separated by underscores and put into lowercase. For instance, IDASetErrHandlerFn becomes Ida.set_err_handler_fn, and CVSpilsSetJacTimesVecFn becomes Cvode.Spils.set_jac_times_vec_fn.

Constants are replaced by variant types in most cases. They are renamed by conversion to CamelCase and the removal of underscores. For instance, PREC_NONE becomes Spils.PrecNone. Exception names are sometimes renamed for consistency and to make them more self-explanatory. For instance, the return codes CV_FIRST_RHSFUNC_ERR and IDA_FIRST_RES_FAIL become, respectively, the exceptions Cvode.FirstRhsFuncFailure and Ida.FirstResFuncFailure, and CV_BAD_IS becomes Cvodes.Sensitivity.BadSensIdentifier.
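
For illustration, a hypothetical helper (not part of the library) shows how such an exception is handled: error codes surface as OCaml exceptions rather than integer return flags. The exception and Cvode.solve_normal are named elsewhere on this page; the helper itself is only a sketch.

(* Hypothetical helper: catch a failed first right-hand-side call
   with try ... with instead of testing an integer return code. *)
let try_step session t y =
  try Some (Cvode.solve_normal session t y)
  with Cvode.FirstRhsFuncFailure -> None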

Rather than try to duplicate or replace the comprehensive Sundials user manuals, this documentation provides brief summaries, adapted from the manual, with hyperlinks back to the original text whenever possible (the original HTML documentation is not currently being updated).

Nvectors

Sundials defines an abstract interface for vectors and provides default serial, parallel, Pthreads, and OpenMP instantiations. The OCaml interface likewise defines a generic Nvector.t type whose type arguments indicate the underlying data and kind; the latter may be Nvector_serial.kind, Nvector_parallel.kind, Nvector_pthreads.kind, Nvector_openmp.kind, or Nvector_custom.kind. The Nvector.unwrap function gives direct access to the underlying data.

The interface to serial nvectors, Nvector_serial, is based on Bigarrays. These arrays are manipulated directly, i.e., with no additional overhead, within the solver by the original low-level serial nvector operations (see Nvector.NVECTOR_OPS). The same low-level operations can be called from OCaml (Nvector_serial.Ops), as can equivalent reimplementations in OCaml on the underlying data (Nvector_serial.DataOps).
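
For instance, a minimal sketch using only the functions named above: wrapping and unwrapping involve no copying, so updates are visible on both sides.

(* Wrap a bigarray of floats as a serial nvector; unwrap returns the
   very same bigarray. *)
let a  = Sundials.RealArray.of_array [| 1.0; 2.0; 3.0 |]
let nv = Nvector_serial.wrap a
let b  = Nvector.unwrap nv
let () = b.{0} <- 10.0      (* a.{0} is now also 10.0 *)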

The interface to parallel nvectors, Nvector_parallel, is based on Bigarrays and the OCamlMPI library. Parallel nvectors are only available when Sundials/ML is configured to use MPI, as described below.

The underlying operations of Pthreads nvectors, Nvector_pthreads, and OpenMP nvectors, Nvector_openmp, are implemented using multiple threads. These nvectors can be used anywhere that serial nvectors can (except for the operations in Nvector_serial.Ops).

Besides these four standard implementations, it is also possible to define new nvector implementations through Nvector_custom by providing low-level operations on an underlying datatype. A demonstration of this feature on float arrays is provided in Nvector_array. Custom nvectors suffer two disadvantages compared to the standard nvector implementations. First, each low-level operation incurs the cost of a callback into OCaml. Second, of all the provided linear solvers, only Cvode.Diag can be used; although custom solvers can be implemented via Cvode.Alternate, Ida.Alternate, and Kinsol.Alternate.

Linear Solvers

Nonlinear algebraic systems occur optionally in the solution of ODE initial value problems with Cvode, invariably when solving DAE initial value problems with Ida, and when solving implicit problems or problems involving a mass matrix with Arkode, and directly in the problems treated by Kinsol. Such systems are solved using some form of Newton iteration which in turn requires the solution of linear equations.

Sundials provides six options for the solution of linear equations:

The DLS routines are only available to sessions that use serial, Pthreads, or OpenMP nvectors. Callback functions update the Jacobian matrix as either a Dls.DenseMatrix.t or a Dls.BandMatrix.t. Access to the underlying solver routines on bigarrays is provided via Dls.ArrayDenseMatrix and Dls.ArrayBandMatrix.

The SPILS routines include the Scaled Preconditioned GMRES (SPGMR), Scaled Preconditioned Bi-CGStab (SPBCG), Scaled Preconditioned TFQMR (SPTFQMR), Scaled Preconditioned Flexible GMRES (SPFGMR), and Preconditioned Conjugate Gradient (PCG) methods. Additionally, Cvode provides banded preconditioners for sessions that use serial, Pthreads, or OpenMP nvectors. Access to the underlying solver routines on bigarrays is provided via the submodules of Spils. Parallel Band-Block-Diagonal (BBD) preconditioners are available to sessions that use parallel nvectors; see Cvode_bbd, Cvodes_bbd, Ida_bbd, Idas_bbd, and Kinsol_bbd.

The SuperLU_MT and KLU routines are only available to sessions that use serial, Pthreads, or OpenMP nvectors. Callback functions update the Jacobian matrix as an Sls.SparseMatrix.t.

Each solver module has a distinct linear solver type, e.g., Cvode.linear_solver or Cvodes.Adjoint.linear_solver. As for nvectors, these types are parameterised by data and kind arguments. Values in these types are constructed by passing parameters to functions like Cvode.Dls.band or Ida.Spils.spgmr. They are then passed to the appropriate init or reinit functions to configure sessions.
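
For example, the following sketch configures a CVODE session to use Newton iteration with a dense direct solver. It assumes that Cvode.Dls.dense takes an optional Jacobian callback and a unit argument; the exact arguments of the solver constructors may vary slightly between releases.

(* Sketch only: construct a linear solver value and pass it to Cvode.init.
   The arguments of Dls.dense are assumed and may differ slightly. *)
let f _t y yd = yd.{0} <- -. y.{0}         (* dy/dt = -y *)
let yvec = Nvector_serial.wrap (Sundials.RealArray.of_array [| 1.0 |])
let s = Cvode.(init BDF (Newton (Dls.dense ()))
                 default_tolerances f 0.0 yvec)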

API Reference


Sundials
Generic definitions, arrays, and utility functions.

Dls
Direct Linear Solvers.
Spils
Scaled Preconditioned Iterative Linear Solvers routines.
Sls
Sparse Linear Solvers.

Nvector
Generic nvector types and operations.
Nvector_serial
The standard serial nvectors of Sundials.
Nvector_parallel
The standard parallel nvectors of Sundials (requires MPI).
Nvector_pthreads
The Pthreads nvectors of Sundials (requires pthreads).
Nvector_openmp
The OpenMP nvectors of Sundials (requires OpenMP).
Nvector_custom
An interface for creating custom nvectors in OCaml.
Nvector_array
A custom nvector based on float arrays.

Cvode
Variable-step solution of ODE initial value problems with zero-crossing detection.
Cvode_bbd
Parallel band-block-diagonal preconditioners for CVODE (requires MPI).
Cvodes
Sensitivity analysis (forward and adjoint) and quadrature equations for CVODE.
Cvodes_bbd
Parallel band-block-diagonal preconditioners for CVODES (requires MPI).

Ida
Variable-step solution of DAE initial value problems with zero-crossing detection.
Ida_bbd
Parallel band-block-diagonal preconditioners for IDA (requires MPI).
Idas
Sensitivity analysis (forward and adjoint) and quadrature equations for IDA.
Idas_bbd
Parallel band-block-diagonal preconditioners for IDAS (requires MPI).

Arkode
Adaptive-step time integration for stiff, nonstiff, and multi-rate systems of ODE initial value problems with zero-crossing detection.
Arkode_bbd
Parallel band-block-diagonal preconditioners for ARKODE (requires MPI).

Kinsol
Solves nonlinear systems using Newton-Krylov techniques.
Kinsol_bbd
Parallel band-block-diagonal preconditioners for KINSOL (requires MPI).

Installation

The dependencies of Sundials/ML are OCaml, the Sundials library itself (this release targets version 2.7.0; older releases work with fewer features), and, optionally, the OCamlMPI library for the parallel features.

Sundials must be compiled with 64-bit floats (the default: --with-precision=double) and the C compiler must provide 32-bit ints.

Building the extra features of Sundials requires the installation of dependencies and the right cmake incantation. In particular, it can be tricky to get the optional SuperLU/MT library to work. If Sundials cmake claims that SuperLU_MT is not functional, this patch may help.

For Debian-based systems, we found the following worked:

  1. apt-get install cmake liblapack-dev libopenmpi-dev openmpi-bin libsuitesparse-dev
  2. Optionally download and build SuperLU/MT as described above, since the libsuperlu-dev package does not provide the shared library file.
  3. mkdir build; cd build
  4. cmake -Wno-dev ../sundials-2.7.0 \
    -DCMAKE_BUILD_TYPE=Release \
    -DOPENMP_ENABLE=1 \
    -DPTHREAD_ENABLE=1 \
    -DMPI_ENABLE=1 \
    -DLAPACK_ENABLE=1 -DLAPACK_LIBRARIES='-llapack -lblas' \
    -DKLU_ENABLE=1 -DKLU_LIBRARY_DIR=/usr/lib/x86_64-linux-gnu \
    -DKLU_INCLUDE_DIR=/usr/include/suitesparse
    adding, if necessary,
    -DSUPERLUMT_ENABLE=1 \
    -DSUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib \
    -DSUPERLUMT_INCLUDE_DIR=<full-path-to>/SuperLU_MT_3.0/SRC \
    -DSUPERLUMT_LIBRARIES=-lblas
  5. Configure Sundials/ML with
    ./configure SUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib
    Note that SUPERLUMT_LIBRARY_DIR must be registered with ld:
    export LD_LIBRARY_PATH=<full-path-to>/SuperLU_MT_3.0/lib:$LD_LIBRARY_PATH

For OS X, we found the following worked:

  1. Optionally build and install SuiteSparse from the sources, since the brew package does not include dynamic libraries.
  2. Optionally download and build SuperLU/MT as described above.
  3. mkdir build; cd build
  4. For OpenMP, the gcc compiler is required:
    cmake -Wno-dev ../sundials-2.7.0 \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=/usr/local/bin/gcc-5 \
    -DOPENMP_ENABLE=1 \
    -DPTHREAD_ENABLE=1 \
    -DMPI_ENABLE=1 \
    -DLAPACK_ENABLE=1 \
    -DKLU_ENABLE=1 -DKLU_LIBRARY_DIR=/usr/local/lib -DKLU_INCLUDE_DIR=/usr/local/include
    adding, if necessary,
    -DSUPERLUMT_ENABLE=1 \
    -DSUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib \
    -DSUPERLUMT_INCLUDE_DIR=<full-path-to>/SuperLU_MT_3.0/SRC
  5. Configure Sundials/ML with
    ./configure SUPERLUMT_LIBRARY_DIR=<full-path-to>/SuperLU_MT_3.0/lib

The configure script detects and automatically enables optional features. Lapack solvers, like Cvode.Dls.lapack_dense, are enabled if Sundials was built with Lapack support (see also Sundials.lapack_enabled). The KLU and SuperLU_MT solvers, and the Pthreads and OpenMP nvectors, are only enabled if Sundials was built with them. Parallel nvectors and Band-Block-Diagonal (BBD) solvers are only enabled if Sundials was built with them and OCamlMPI is available.
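
For example, a program can check at run time which features were detected. This is a minimal sketch: Sundials.lapack_enabled is mentioned above, and we assume here that it is a simple boolean value (it may be exposed slightly differently in your version).

(* Minimal sketch: report whether the Lapack-based solvers are available,
   assuming Sundials.lapack_enabled is a boolean value. *)
let () =
  if Sundials.lapack_enabled
  then print_endline "Lapack linear solvers are available"
  else print_endline "Lapack linear solvers are not available"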

Under older versions of Sundials (2.5.0), the comparison of examples will only function correctly if the Sundials sources are patched and configured with --enable-examples and --enable-shared.

Opam

The easiest way to use the library is to install Sundials with a package manager and the library itself with Opam. Note, however, that distribution packages may be out of date; at the time of writing, the Debian/Ubuntu packages install Sundials 2.5.0. Sundials/ML will still function correctly, but with fewer features.

  1. Download and manually install Sundials, or use a package manager:
    • Debian/Ubuntu (version 2.5.0 without parallelism): apt-get install libsundials-serial-dev
    • Fedora: dnf install lapack-devel sundials-devel sundials-threads-devel sundials-openmpi-devel
    • Mac OS X (version 2.7.0 without parallelism): brew install sundials / port install sundials
  2. Optionally run opam install mpi.
  3. Run opam install sundialsml.

From source

Building from source is a three step process:

  1. Download and manually install Sundials, or use a package manager:
    • Debian/Ubuntu (without parallelism): apt-get install libsundials-serial-dev
    • Mac OS X: brew install sundials / port install sundials
  2. Run configure to find and check dependencies.
  3. Run make install or make install-findlib to build and install the library.

The choices made by the configure script can be influenced by arguments (like --prefix=...) and variables (like OCAMLROOT=...). Type configure --help for detailed information.

OCaml reimplementations of the standard Sundials examples are provided in the examples/ subdirectory. The library's behaviour can be tested via make tests.opt.log which runs the OCaml versions and compares their outputs against those of the original C versions: they should be identical. The library's performance can be analyzed via make perf.opt.log which produces the graph explained below.

Running programs

Compiling and linking

Programs are compiled by specifying where Sundials/ML is installed, e.g., -I +sundialsml, and by including bigarray.cma and sundials.cma, for example:
ocamlc -o myprog.byte -I +sundialsml bigarray.cma sundials.cma myprog.ml
or the .cmxa versions:
ocamlopt -o myprog.opt -I +sundialsml bigarray.cmxa sundials.cmxa myprog.ml

The sundials.cma/.cmxa files link against the libraries libsundials_cvodes and libsundials_idas. The code in these libraries should give the same results as the versions without sensitivity analysis (except for the functions Cvode.get_work_space and Ida.get_work_space), even though it is compiled from distinct source files. The sundials_no_sens.cma/.cmxa files, on the other hand, link against the libraries libsundials_cvode and libsundials_ida and thus do not include the functionality in Cvodes or Idas. Both sets of files link against libsundials_kinsol, libsundials_arkode, and libsundials_nvecserial.

The parallel features—in the Nvector_parallel, Cvode_bbd, Cvodes_bbd, Ida_bbd, Idas_bbd, Kinsol_bbd, and Arkode_bbd modules—require the additional inclusions of mpi.cma and sundials_mpi.cma. So, for example:

ocamlc -o myprog.byte -I +sundialsml bigarray.cma mpi.cma sundials.cma \
					sundials_mpi.cma myprog.ml
or with the .cmxa versions:
ocamlopt -o myprog.opt -I +sundialsml bigarray.cmxa mpi.cmxa sundials.cmxa \
					 sundials_mpi.cmxa myprog.ml
The sundials_mpi.cm(x)a files link against the libsundials_nvecparallel library.

The Nvector_openmp and Nvector_pthreads modules require the additional inclusion, respectively, of sundials_openmp.cm(x)a and sundials_pthreads.cm(x)a.

Under ocamlfind, the parallel, OpenMP, and Pthreads features are selected via subpackages, and the use of the libraries without sensitivity analysis via a predicate. For example, for everything:

ocamlfind ocamlopt -package sundialsml.mpi,sundialsml.pthreads,sundialsml.openmp \
				-linkpkg -o mysim.opt mysim.ml
The available packages and the features they select are:
    • sundialsml: the core library and serial nvectors (add the no_sens predicate to link without sensitivity analysis);
    • sundialsml.mpi: parallel nvectors and the Band-Block-Diagonal (BBD) modules;
    • sundialsml.pthreads: the Nvector_pthreads module;
    • sundialsml.openmp: the Nvector_openmp module.

From the toplevel

Sundials/ML can also be used from the OCaml interactive loop, either by an invocation like:

ocaml bigarray.cma -I +sundialsml sundials.cma sundials_openmp.cma
or through ocamlfind, for example:
#use "topfind";;
#predicates "no_sens";;          (* optional—excludes sensitivity code *)
#require "sundialsml";;

let f t y yd = yd.{0} <- 1.;;
let g t y gout = gout.{0} <- y.{0};;
let y = Sundials.RealArray.of_array [| -1.0 |];;
let yvec = Nvector_serial.wrap y;;
let s =
  Cvode.(init Adams Functional default_tolerances f ~roots:(1, g) 0. yvec);;
Cvode.set_stop_time s 2.;;

(* repeat the commands below to advance the simulation until t = 2.0 *)
let (t', result) = Cvode.solve_normal s 2. yvec;;
Format.printf "%e: %a\n" t' Sundials.RealArray.pp y;;
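
The stepping can also be written as a loop. This is a minimal sketch that reuses s, yvec, and y from above and advances in increments of 0.5 until the stop time t = 2.0 is reached.

(* Hypothetical stepping loop, equivalent to repeating the two commands
   above: keep advancing and printing until the stop time is reached. *)
let () =
  let t = ref 0.0 in
  while !t < 2.0 do
    let (t', _) = Cvode.solve_normal s (!t +. 0.5) yvec in
    t := t';
    Format.printf "%e: %a\n" t' Sundials.RealArray.pp y
  done;;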

Using MPI from a toplevel is best done with ocamlfind by first creating a custom toplevel:

ocamlfind ocamlmktop -o partop -package sundialsml.mpi,findlib -linkpkg

and then launching it in multiple terminals:

mpirun -np 2 xterm -e ./partop
Here, 2 is the number of processes, xterm is the terminal program, and -e ./partop has each xterm execute ./partop. As a simple test, paste the following into all terminals:
#use "topfind";;
#require "sundialsml.mpi";;

let comm = Mpi.comm_world
let n = Mpi.comm_size comm
let my_id = Mpi.comm_rank comm
let pv = Nvector_parallel.make 1 n comm (float_of_int (my_id + 1));;

Printf.printf "%d: local=%f.\n" my_id (Nvector_parallel.local_array pv).{0};;
Printf.printf "Sum of abs. = %f\n" (Nvector_parallel.Ops.n_vl1norm pv);;

Solutions to common problems

  1. The message
    Fatal error: cannot load shared library dllmlsundials
    can usually be fixed by updating LD_LIBRARY_PATH, for example,
    export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
    Otherwise you may have compiled Sundials without --enable-shared.
  2. The configuration warning
    Couldn't determine C compiler flag for OpenMP.
    can usually be eliminated by specifying a compiler that supports OpenMP, for example,
    CC=gcc-5 ./configure

Performance

An interface like Sundials/ML inevitably adds overhead: there is extra code to execute at each call. But, how significant is this cost? And, more broadly, how does the performance of OCaml compare to that of C for programs that use numeric solvers?

These questions are not easy to answer. As a first attempt, we took the examples in C from the Sundials distribution, reimplemented them in OCaml and compared the execution times. The bars in the graph below show the ratios of the execution times of the OCaml code to the C code, i.e., a value of 2 on the left axis means that OCaml is twice as slow. The black dots indicate, against the right axis, the execution time of the C code.

The graph shows that the OCaml examples are only occasionally more than 50% slower than the original ones. The *_custom example (light blue) uses custom nvectors with low-level operations in OCaml, and the *_alt examples (darker blue) use linear solvers implemented in OCaml.

This conclusion seems reasonable as a first approximation, but several details of the analysis process and of individual results show that the real situation is less clear-cut. For one, the running times of most of the examples are so short that accurate profiling is not possible, i.e., time and gprof simply show 0 seconds. The figures in the graph were obtained by modifying the examples to execute their main functions repeatedly; we trigger a Gc.compact after each execution since the C program must call malloc and free. The number of repetitions varies per example, since otherwise the slower examples take too long. The timings indicated by the dots and the axis at right are calculated by dividing the wall-clock time of each C version by its number of repetitions. All but six of the serial examples (red) run so fast that comparisons are made over tens, or usually hundreds, of repetitions, which in some cases amplifies factors other than the interface overhead. The example with the worst ratio, kin--ser--kinRoberts_fp, for instance, is iterated more than 185 000 times to achieve a significant wall-clock time, which means creating and destroying many more data structures than is usual for such code. Running this example without explicit compactions and with a larger minor heap (OCAMLRUNPARAM=s=128M) decreases the OCaml/C ratio to 2.6.

The parallel examples (lighter red) all have relatively long run times and results are obtained without iterating. Their OCaml/C ratios are almost all close to 1—the interface and other overheads are small compared to the dominating costs of parallelization and communication.

We were able to make our OCaml versions much faster (up to 4 times) by:

In summary, OCaml code using the Sundials solvers should almost never be more than 50% slower than the equivalent code written in C, provided the guidelines above are followed, and it should usually be no more than 30% slower. It is, however, usually faster to write and debug OCaml code thanks to automatic memory management, bounds checking on arrays, strong static type checking, higher-order functions, etcetera. Moreover, the Sundials/ML library offers a good compromise for programs combining symbolic manipulation and numeric calculation.

The graph above can be generated from the Sundials source by running

cd examples; make perf.opt.log GC_AT_END=1 PERF_DATA_POINTS=10
Be sure to build Sundials with -DCMAKE_BUILD_TYPE=Release, otherwise the underlying library is not optimized.

References

  1. A. C. Hindmarsh, P. N. Brown, K. E. Grant, S. L. Lee, R. Serban, D. E. Shumaker, and C. S. Woodward, “SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers,” ACM Transactions on Mathematical Software, 31(3), pp. 363-396, 2005.
  2. X. Leroy, “Old Objective Caml site: Writing efficient numerical code in Objective Caml,” July 2002.

Acknowledgements

We gratefully acknowledge the support of the ITEA 3 project 11004 MODRIO (Model driven physical systems operation), Inria, and the Departement d'Informatique de l'ENS.

This library benefits greatly from the OCaml Bigarray and MPI libraries, and from OCaml's optimized floating-point representations and compilation.

This documentation uses J. Protzenko's CSS stylesheet, and MathJax for rendering mathematics.

We are grateful for direct contributions to this library from
