I was looking for code that could handle an optimization problem with a million dimensions. What I know is that my function has a single peak, I can get a fairly good starting point, and there are correlations between the parameters. Given these conditions, my question was: what software is publicly available for solving such a problem numerically? A huge list of open-source minimization software can be found on Wikipedia. The Eigen library provides a set of non-linear optimization methods ported from MINPACK; its starting point was cminpack, the C version of MINPACK. Then there are LBFGS++ and HLBFGS. There are routines in Python and R for this as well, but my code is in C++. Along the way I came across the M1QN3 algorithm and was very impressed by its performance.
MINPACK
MINPACK is a library of FORTRAN subroutines for solving systems of nonlinear equations and for the least-squares minimization of the residuals of a set of linear or nonlinear equations. One of its best-known subroutines is lmder.
subroutine lmder(fcn,m,n,x,fvec,fjac,ldfjac,ftol,xtol,gtol,
* maxfev,diag,mode,factor,nprint,info,nfev,njev,
* ipvt,qtf,wa1,wa2,wa3,wa4)
A C/C++ rewrite of MINPACK, cminpack, is available on GitHub. It is portable and easy to use. Here is an example from cminpack. Note the definition of __minpack_real__
and the use of the macro __minpack_func__().
/* driver for lmder example. */
#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <minpack.h>
#define real __minpack_real__
void fcn(const int *m, const int *n, const real *x, real *fvec, real *fjac,
const int *ldfjac, int *iflag);
int main()
{
int i, j, m, n, ldfjac, maxfev, mode, nprint, info, nfev, njev;
int ipvt[3];
real ftol, xtol, gtol, factor, fnorm;
real x[3], fvec[15], fjac[15*3], diag[3], qtf[3],
wa1[3], wa2[3], wa3[3], wa4[15];
int one=1;
real covfac;
m = 15;
n = 3;
/* the following starting values provide a rough fit. */
x[1-1] = 1.;
x[2-1] = 1.;
x[3-1] = 1.;
ldfjac = 15;
/* set ftol and xtol to the square root of the machine precision */
/* and gtol to zero. unless high precision solutions are */
/* required, these are the recommended settings. */
ftol = sqrt(__minpack_func__(dpmpar)(&one));
xtol = sqrt(__minpack_func__(dpmpar)(&one));
gtol = 0.;
maxfev = 400;
mode = 1;
factor = 1.e2;
nprint = 0;
__minpack_func__(lmder)(&fcn, &m, &n, x, fvec, fjac, &ldfjac, &ftol, &xtol, &gtol,
&maxfev, diag, &mode, &factor, &nprint, &info, &nfev, &njev,
ipvt, qtf, wa1, wa2, wa3, wa4);
fnorm = __minpack_func__(enorm)(&m, fvec);
printf(" final l2 norm of the residuals%15.7g\n\n", (double)fnorm);
printf(" number of function evaluations%10i\n\n", nfev);
printf(" number of Jacobian evaluations%10i\n\n", njev);
printf(" exit parameter %10i\n\n", info);
printf(" final approximate solution\n");
for (j=1; j<=n; j++) {
printf("%s%15.7g", j%3==1?"\n ":"", (double)x[j-1]);
}
printf("\n");
ftol = __minpack_func__(dpmpar)(&one);
covfac = fnorm*fnorm/(m-n);
__minpack_func__(covar)(&n, fjac, &ldfjac, ipvt, &ftol, wa1);
printf(" covariance\n");
for (i=1; i<=n; i++) {
for (j=1; j<=n; j++) {
printf("%s%15.7g", j%3==1?"\n ":"", (double)fjac[(i-1)*ldfjac+j-1]*covfac);
}
}
printf("\n");
return 0;
}
void fcn(const int *m, const int *n, const real *x, real *fvec, real *fjac,
const int *ldfjac, int *iflag)
{
/* subroutine fcn for lmder example. */
int i;
real tmp1, tmp2, tmp3, tmp4;
real y[15]={1.4e-1, 1.8e-1, 2.2e-1, 2.5e-1, 2.9e-1, 3.2e-1, 3.5e-1,
3.9e-1, 3.7e-1, 5.8e-1, 7.3e-1, 9.6e-1, 1.34, 2.1, 4.39};
assert(*m == 15 && *n == 3);
if (*iflag == 0) {
/* insert print statements here when nprint is positive. */
/* if the nprint parameter to lmder is positive, the function is
called every nprint iterations with iflag=0, so that the
function may perform special operations, such as printing
residuals. */
return;
}
if (*iflag != 2) {
/* compute residuals */
for (i=1; i <= 15; i++)
{
tmp1 = i;
tmp2 = 16 - i;
tmp3 = tmp1;
if (i > 8) {
tmp3 = tmp2;
}
fvec[i-1] = y[i-1] - (x[1-1] + tmp1/(x[2-1]*tmp2 + x[3-1]*tmp3));
}
} else {
/* compute Jacobian */
for (i=1; i<=15; i++) {
tmp1 = i;
tmp2 = 16 - i;
tmp3 = tmp1;
if (i > 8) {
tmp3 = tmp2;
}
tmp4 = (x[2-1]*tmp2 + x[3-1]*tmp3); tmp4 = tmp4*tmp4;
fjac[i-1 + *ldfjac*(1-1)] = -1.;
fjac[i-1 + *ldfjac*(2-1)] = tmp1*tmp2/tmp4;
fjac[i-1 + *ldfjac*(3-1)] = tmp1*tmp3/tmp4;
};
}
return;
}
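MINPACK also ships a simplified driver, lmder1, which picks sensible defaults for the tolerances and bundles the work arrays, so the call site is much shorter. Reusing the fcn above, a call through cminpack looks roughly like the sketch below; the argument order follows the original FORTRAN lmder1, but check minpack.h for the exact prototype in your cminpack version.
/* Hedged sketch: the simplified lmder1 driver, reusing fcn from the
   example above. The argument list mirrors the original FORTRAN lmder1;
   verify the prototype in minpack.h for your cminpack version. */
int run_lmder1(void)
{
    int m = 15, n = 3, ldfjac = 15, info, lwa = 5*3 + 15;
    int ipvt[3];
    int one = 1;
    real tol, fnorm;
    real x[3] = {1., 1., 1.};                    /* same rough starting point */
    real fvec[15], fjac[15*3], wa[5*3 + 15];
    tol = sqrt(__minpack_func__(dpmpar)(&one));  /* sqrt of machine precision */
    __minpack_func__(lmder1)(&fcn, &m, &n, x, fvec, fjac, &ldfjac, &tol,
                             &info, ipvt, wa, &lwa);
    fnorm = __minpack_func__(enorm)(&m, fvec);
    printf(" lmder1: info =%3i, final residual norm =%15.7g\n", info, (double)fnorm);
    return info;
}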
Eigen library
Eigen has its own port of MINPACK. However, it lives in the unsupported modules, and the documentation can be found here. Here is a snippet from one of the examples provided by Eigen.
struct lmder_functor : Functor<double>
{
lmder_functor(void): Functor<double>(3,15) {}
int operator()(const VectorXd &x, VectorXd &fvec) const
{
double tmp1, tmp2, tmp3;
static const double y[15] = {1.4e-1, 1.8e-1, 2.2e-1, 2.5e-1, 2.9e-1, 3.2e-1, 3.5e-1,
3.9e-1, 3.7e-1, 5.8e-1, 7.3e-1, 9.6e-1, 1.34, 2.1, 4.39};
for (int i = 0; i < values(); i++)
{
tmp1 = i+1;
tmp2 = 16 - i - 1;
tmp3 = (i>=8)? tmp2 : tmp1;
fvec[i] = y[i] - (x[0] + tmp1/(x[1]*tmp2 + x[2]*tmp3));
}
return 0;
}
int df(const VectorXd &x, MatrixXd &fjac) const
{
double tmp1, tmp2, tmp3, tmp4;
for (int i = 0; i < values(); i++)
{
tmp1 = i+1;
tmp2 = 16 - i - 1;
tmp3 = (i>=8)? tmp2 : tmp1;
tmp4 = (x[1]*tmp2 + x[2]*tmp3); tmp4 = tmp4*tmp4;
fjac(i,0) = -1;
fjac(i,1) = tmp1*tmp2/tmp4;
fjac(i,2) = tmp1*tmp3/tmp4;
}
return 0;
}
};
void testLmder1()
{
int n=3, info;
VectorXd x;
/* the following starting values provide a rough fit. */
x.setConstant(n, 1.);
// do the computation
lmder_functor functor;
LevenbergMarquardt<lmder_functor> lm(functor);
info = lm.lmder1(x);
// check return value
VERIFY_IS_EQUAL(info, 1);
VERIFY_IS_EQUAL(lm.nfev, 6);
VERIFY_IS_EQUAL(lm.njev, 5);
// check norm
VERIFY_IS_APPROX(lm.fvec.blueNorm(), 0.09063596);
// check x
VectorXd x_ref(n);
x_ref << 0.08241058, 1.133037, 2.343695;
VERIFY_IS_APPROX(x, x_ref);
}
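The Functor<double> base that the snippet inherits from is not part of the unsupported header itself; it is a small helper defined in Eigen's test suite. If you want to compile the example standalone (together with #include <Eigen/Dense> and #include <unsupported/Eigen/NonLinearOptimization>), a minimal version of that helper looks roughly like this; it is a reconstruction following Eigen's test code, not a copy of the Eigen source.
#include <Eigen/Dense>
#include <unsupported/Eigen/NonLinearOptimization>
// Minimal sketch of the generic functor helper used by Eigen's
// NonLinearOptimization tests: it records the number of inputs (unknowns)
// and values (residuals) and exposes the typedefs LevenbergMarquardt expects.
template<typename _Scalar, int NX = Eigen::Dynamic, int NY = Eigen::Dynamic>
struct Functor
{
  typedef _Scalar Scalar;
  enum { InputsAtCompileTime = NX, ValuesAtCompileTime = NY };
  typedef Eigen::Matrix<Scalar, InputsAtCompileTime, 1> InputType;
  typedef Eigen::Matrix<Scalar, ValuesAtCompileTime, 1> ValueType;
  typedef Eigen::Matrix<Scalar, ValuesAtCompileTime, InputsAtCompileTime> JacobianType;

  int m_inputs, m_values;

  Functor() : m_inputs(InputsAtCompileTime), m_values(ValuesAtCompileTime) {}
  Functor(int inputs, int values) : m_inputs(inputs), m_values(values) {}

  int inputs() const { return m_inputs; }
  int values() const { return m_values; }
};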
HLBFGS
HLBFGS is a hybrid L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno) optimization framework which unifies the L-BFGS method, the preconditioned L-BFGS method and the preconditioned conjugate-gradient method. With different options, HLBFGS can also behave like gradient descent, Newton's method, or the conjugate-gradient method. According to its author, the routine was originally written for the fast computation of large-scale Centroidal Voronoi diagrams. This is one of my favorite libraries; it offers a big set of algorithms, for example gradient descent, conjugate gradient, L-BFGS and M1QN3. I use it in my own code and find it very stable. Here is an example that shows how easy the API is to use.
#ifdef _DEBUG
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
#endif
#include "HLBFGS.h"
#include "Lite_Sparse_Matrix.h"
#include <iostream>
#include <vector>
Lite_Sparse_Matrix* m_sparse_matrix = 0;
//////////////////////////////////////////////////////////////////////////
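// evalfunc evaluates the extended Rosenbrock function,
//   f(x) = sum over even i of (1 - x_i)^2 + 100*(x_{i+1} - x_i*x_i)^2,
// and writes its analytic gradient into g.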
void evalfunc(int N, double* x, double *prev_x, double* f, double* g)
{
*f = 0;
for (int i = 0; i < N; i+=2)
{
double T1 = 1 - x[i];
double T2 = 10*(x[i+1]-x[i]*x[i]);
*f += T1*T1+T2*T2;
g[i+1] = 20*T2;
g[i] = -2*(x[i]*g[i+1]+T1);
}
}
//////////////////////////////////////////////////////////////////////////
void newiteration(int iter, int call_iter, double *x, double* f, double *g, double* gnorm)
{
std::cout << iter <<": " << call_iter <<" " << *f <<" " << *gnorm << std::endl;
}
//////////////////////////////////////////////////////////////////////////
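// evalfunc_h evaluates the same objective and gradient, and additionally
// assembles the analytic Hessian in sparse symmetric storage:
//   d2f/dx_i^2        = 2 + 1200*x_i^2 - 400*x_{i+1},
//   d2f/dx_{i+1}^2    = 200,
//   d2f/dx_i dx_{i+1} = -400*x_i.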
void evalfunc_h(int N, double *x, double *prev_x, double *f, double *g, HESSIAN_MATRIX& hessian)
{
//the following code is not optimal if the pattern of hessian matrix is fixed.
if (m_sparse_matrix)
{
delete m_sparse_matrix;
}
m_sparse_matrix = new Lite_Sparse_Matrix(N, N, SYM_LOWER, CCS, FORTRAN_TYPE, true);
m_sparse_matrix->begin_fill_entry();
static bool first = true;
double *diag = m_sparse_matrix->get_diag();
if (first)
{
// you need to update f and g
*f = 0;
double tmp;
for (int i = 0; i < N; i+=2)
{
tmp = x[i]*x[i];
double T1 = 1 - x[i];
double T2 = 10*(x[i+1]-tmp);
*f += T1*T1+T2*T2;
g[i+1] = 20*T2;
g[i] = -2*(x[i]*g[i+1]+T1);
diag[i] = 2+1200*tmp-400*x[i+1];
diag[i+1] = 200;
m_sparse_matrix->fill_entry(i, i+1, -400*x[i]);
}
}
else
{
for (int i = 0; i < N; i+=2)
{
diag[i] = 2+1200*x[i]*x[i]-400*x[i+1];
diag[i+1] = 200;
m_sparse_matrix->fill_entry(i, i+1, -400*x[i]);
}
}
m_sparse_matrix->end_fill_entry();
hessian.set_diag(m_sparse_matrix->get_diag());
hessian.set_values(m_sparse_matrix->get_values());
hessian.set_rowind(m_sparse_matrix->get_rowind());
hessian.set_colptr(m_sparse_matrix->get_colptr());
hessian.set_nonzeros(m_sparse_matrix->get_nonzero());
first = false;
}
//////////////////////////////////////////////////////////////////////////
void Optimize_by_HLBFGS(int N, double *init_x, int num_iter, int M, int T, bool with_hessian)
{
double parameter[20];
int info[20];
//initialize
INIT_HLBFGS(parameter, info);
info[4] = num_iter;
info[6] = T;
info[7] = with_hessian?1:0;
info[10] = 0;
info[11] = 1;
if (with_hessian)
{
HLBFGS(N, M, init_x, evalfunc, evalfunc_h, HLBFGS_UPDATE_Hessian, newiteration, parameter, info);
}
else
{
HLBFGS(N, M, init_x, evalfunc, 0, HLBFGS_UPDATE_Hessian, newiteration, parameter, info);
}
}
//////////////////////////////////////////////////////////////////////////
int main()
{
#ifdef _DEBUG
_CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
#endif
std::cout.precision(16);
std::cout << std::scientific;
int N = 1000;
std::vector<double> x(N);
for (int i = 0; i < N/2; i++)
{
x[2*i] = -1.2;
x[2*i+1] = 1.0;
}
int M = 7;
int T = 0;
//use Hessian
// if M = 0, T = 0, it is Newton
//Optimize_by_HLBFGS(N, &x[0], 1000, M, T, true);
//without Hessian
Optimize_by_HLBFGS(N, &x[0], 1000, M, T, false); // it is LBFGS(M) actually, T is not used
if (m_sparse_matrix)
delete m_sparse_matrix;
}
M1QN3
While using HLBFGS I noticed the excellent performance of the M1QN3 algorithm, so I started looking further and came across the original Fortran version. It is part of MODULOPT, a library for solving optimization problems and testing optimization software. The Fortran API for M1QN3 is shown below.
subroutine m1qn3 (simul, prosca, ctonb, ctcab, n, x, f, g, &
dxmin, df1, epsg, normtype, impres, io, &
imode, omode, niter, nsim, iz, dz, ndz, &
reverse, indic, izs, rzs, dzs)
Here is what they say about this software on its website:
The routine M1QN3 has been designed to minimize functions depending on a very large number of variables (several hundred million is sometimes possible), not subject to constraints. It implements a limited memory quasi-Newton technique (the L-BFGS method of J. Nocedal) with a dynamically updated scalar or diagonal preconditioner. It uses line-search to enforce global convergence; more precisely, the step-size is determined by the Fletcher-Lemaréchal algorithm and realizes the Wolfe conditions.
I found this to be true: I tried it on a problem with 1e6 dimensions and it worked. My problem was unimodal with minor correlations between the parameters, and analytical gradients were available as well.
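For reference, the Wolfe conditions mentioned in the quote above are easy to state in code. The generic sketch below is only illustrative and is not taken from the M1QN3 source; it checks the conditions for a trial step t along a descent direction d, with the commonly used constants c1 = 1e-4 and c2 = 0.9.
#include <vector>
#include <numeric>
// Generic sketch of the (weak) Wolfe conditions for a line search.
// f0/g0 are the objective value and gradient at x, ft/gt are the value and
// gradient at x + t*d, d is the descent direction, and 0 < c1 < c2 < 1.
bool wolfe_conditions_hold(double f0, const std::vector<double>& g0,
                           double ft, const std::vector<double>& gt,
                           const std::vector<double>& d, double t,
                           double c1 = 1e-4, double c2 = 0.9)
{
    double slope0 = std::inner_product(g0.begin(), g0.end(), d.begin(), 0.0);
    double slopet = std::inner_product(gt.begin(), gt.end(), d.begin(), 0.0);
    bool sufficient_decrease = ft <= f0 + c1 * t * slope0;  // Armijo condition
    bool curvature           = slopet >= c2 * slope0;       // curvature condition
    return sufficient_decrease && curvature;
}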