Getting started with Non-Linear Least-Squares Fitting — Non-Linear Least-Squares Minimization and Curve-Fitting for Python (2024)

The lmfit package provides simple tools to help you build complex fitting models for non-linear least-squares problems and apply these models to real data. This section gives an overview of the concepts and describes how to set up and perform simple fits. Some basic knowledge of Python, NumPy, and modeling data is assumed – this is not a tutorial on why or how to perform a minimization or fit data, but is rather aimed at explaining how to use lmfit to do these things.

In order to do a non-linear least-squares fit of a model to data or for any other optimization problem, the main task is to write an objective function that takes the values of the fitting variables and calculates either a scalar value to be minimized or an array of values that are to be minimized, typically in the least-squares sense. For many data fitting processes, the latter approach is used, and the objective function should return an array of (data-model), perhaps scaled by some weighting factor such as the inverse of the uncertainty in the data. For such a problem, the chi-square (\(\chi^2\)) statistic is often defined as:

\[\chi^2 = \sum_i^{N} \frac{[y^{\rm meas}_i - y_i^{\rm model}({\bf{v}})]^2}{\epsilon_i^2}\]
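The sum above can be evaluated directly with NumPy; a minimal sketch, using made-up measured values, model values, and uncertainties (not data from this document):

```python
import numpy as np

# made-up measured data, model values, and per-point uncertainties
y_meas = np.array([1.0, 2.1, 2.9])
y_model = np.array([1.1, 2.0, 3.0])
eps = np.array([0.1, 0.1, 0.1])

# chi-square: sum of squared, uncertainty-weighted residuals
chisqr = np.sum(((y_meas - y_model) / eps)**2)
```

Here each residual is one uncertainty away from zero, so the three terms contribute 1 each and chisqr is 3.0.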

where \(y_i^{\rm meas}\) is the set of measured data, \(y_i^{\rm model}({\bf{v}})\) is the model calculation, \({\bf{v}}\) is the set of variables in the model to be optimized in the fit, and \(\epsilon_i\) is the estimated uncertainty in the data.

In a traditional non-linear fit, one writes an objective function that takes the variable values and calculates the residual array \(y^{\rm meas}_i - y_i^{\rm model}({\bf{v}})\), or the residual array scaled by the data uncertainties, \([y^{\rm meas}_i - y_i^{\rm model}({\bf{v}})]/{\epsilon_i}\), or some other weighting factor.

As a simple concrete example, one might want to model data with a decaying sine wave, and so write an objective function like this:

```python
from numpy import exp, sin


def residual(variables, x, data, uncertainty):
    """Model a decaying sine wave and subtract data."""
    amp = variables[0]
    phaseshift = variables[1]
    freq = variables[2]
    decay = variables[3]
    model = amp * sin(x*freq + phaseshift) * exp(-x*x*decay)
    return (data-model) / uncertainty
```

To perform the minimization with scipy.optimize, one would do this:

```python
from numpy import linspace, random
from scipy.optimize import leastsq

# generate synthetic data with noise
x = linspace(0, 100)
noise = random.normal(size=x.size, scale=0.2)
data = 7.5 * sin(x*0.22 + 2.5) * exp(-x*x*0.01) + noise

# generate experimental uncertainties
uncertainty = abs(0.16 + random.normal(size=x.size, scale=0.05))

variables = [10.0, 0.2, 3.0, 0.007]
out = leastsq(residual, variables, args=(x, data, uncertainty))
```
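Note that leastsq returns a tuple of the best-fit variable array and an integer status flag. A minimal self-contained sketch of unpacking that result (using a simpler straight-line model than the example above, so the recovered values are easy to check):

```python
import numpy as np
from scipy.optimize import leastsq


def residual(variables, x, data):
    # residual array for a straight-line model: data - (slope*x + intercept)
    slope, intercept = variables
    return data - (slope*x + intercept)


x = np.linspace(0, 10, 50)
data = 2.0*x + 1.0  # noise-free synthetic data, for clarity

best, ier = leastsq(residual, [1.0, 0.0], args=(x, data))
# best holds the fitted [slope, intercept]; ier is an integer status flag,
# with values 1-4 indicating that a solution was found
```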

Though it is wonderful to be able to use Python for such optimization problems, and the SciPy library is robust and easy to use, the approach here is not terribly different from how one would do the same fit in C or Fortran. There are several practical challenges to using this approach, including:

  1. The user has to keep track of the order of the variables and their meaning – variables[0] is the amplitude, variables[2] is the frequency, and so on, although there is no intrinsic meaning to this order.

  2. If the user wants to fix a particular variable (not vary it in the fit), the residual function has to be altered to have fewer variables, and have the corresponding constant value passed in some other way. While reasonable for simple cases, this quickly becomes a significant amount of work for more complex models, and greatly complicates modeling for people not intimately familiar with the details of the fitting code.

  3. There is no simple, robust way to put bounds on values for the variables, or to enforce mathematical relationships between the variables. While some optimization methods in SciPy do provide bounds, they require bounds to be set for all variables with separate arrays that are in the same arbitrary order as the variable values. Again, this is acceptable for small or one-off cases, but becomes painful if the fitting model needs to change.

  4. In some cases, constraints can be placed on Parameter values, but this is a pretty opaque and complex process.

While these shortcomings can be worked around with some work, they are all essentially due to the use of arrays or lists to hold the variables. This closely matches the implementation of the underlying Fortran code, but does not fit very well with Python’s rich selection of objects and data structures. The key concept in lmfit is to define and use Parameter objects instead of plain floating point numbers as the variables for the fit. Using Parameter objects (or the closely related Parameters – a dictionary of Parameter objects) allows one to do the following:

  1. forget about the order of variables and refer to Parameters by meaningful names.

  2. place bounds on Parameters as attributes, without worrying about preserving the order of arrays for variables and boundaries, and without relying on the solver to support bounds itself.

  3. fix Parameters, without having to rewrite the objective function.

  4. place algebraic constraints on Parameters.

To illustrate the value of this approach, we can rewrite the above example for the decaying sine wave as:

```python
from numpy import exp, sin

from lmfit import minimize, Parameters


def residual(params, x, data, uncertainty):
    amp = params['amp']
    phaseshift = params['phase']
    freq = params['frequency']
    decay = params['decay']
    model = amp * sin(x*freq + phaseshift) * exp(-x*x*decay)
    return (data-model) / uncertainty


params = Parameters()
params.add('amp', value=10)
params.add('decay', value=0.007)
params.add('phase', value=0.2)
params.add('frequency', value=3.0)

out = minimize(residual, params, args=(x, data, uncertainty))
```

At first look, we simply replaced a list of values with a dictionary, so that we can access Parameters by name. Just by itself, this is better, as it allows separation of the objective function from the code using it.

Note that creation of Parameters here could also be done as:

New in version 1.2.0.

```python
from lmfit import create_params

params = create_params(amp=10, decay=0.007, phase=0.2, frequency=3.0)
```

where keyword/value pairs set Parameter names and their initial values.

Whether using create_params() or Parameters, the resulting params object is an instance of Parameters, which acts like a dictionary, with keys being the Parameter names and values being individual Parameter objects. These Parameter objects hold the value and several other attributes that control how a Parameter acts. For example, Parameters can be fixed or bounded; setting the attributes that control this behavior can be done during definition, as with:

```python
params = Parameters()
params.add('amp', value=10, vary=False)
params.add('decay', value=0.007, min=0.0)
params.add('phase', value=0.2)
params.add('frequency', value=3.0, max=10)
```

Here vary=False will prevent the value from changing in the fit, and min=0.0 will set a lower bound on that parameter’s value. The same thing can be accomplished by providing a dictionary of attribute values to create_params():

New in version 1.2.0.

```python
params = create_params(amp={'value': 10, 'vary': False},
                       decay={'value': 0.007, 'min': 0},
                       phase=0.2,
                       frequency={'value': 3.0, 'max': 10})
```

Parameter attributes can also be modified after they have been created:

```python
params['amp'].vary = False
params['decay'].min = 0.10
```

Importantly, our objective function remains unchanged. This means the objective function can simply express the parametrized phenomenon to be calculated, accessing Parameter values by name and separating the choice of parameters to be varied in the fit.

The params object can be copied and modified to make many user-level changes to the model and fitting process. Of course, most of the information about how your data is modeled goes into the objective function, but the approach here allows some external control; that is, control by the user performing the fit, instead of by the author of the objective function.

Finally, in addition to the Parameters approach to fitting data, lmfit allows switching optimization methods without changing the objective function, provides tools for generating fitting reports, and provides a better determination of confidence levels for Parameters.

© Copyright 2024, Matthew Newville, Till Stensitzki, Renee Otten, and others. Created using Sphinx 7.2.6.
