author     Picca Frédéric-Emmanuel <picca@synchrotron-soleil.fr>  2017-08-18 14:48:52 +0200
committer  Picca Frédéric-Emmanuel <picca@synchrotron-soleil.fr>  2017-08-18 14:48:52 +0200
commit     f7bdc2acff3c13a6d632c28c4569690ab106eed7 (patch)
tree       9d67cdb7152ee4e711379e03fe0546c7c3b97303 /doc/source/Tutorials/fit.rst
Import Upstream version 0.5.0+dfsg
Diffstat (limited to 'doc/source/Tutorials/fit.rst')
-rw-r--r--  doc/source/Tutorials/fit.rst  635
1 files changed, 635 insertions, 0 deletions
diff --git a/doc/source/Tutorials/fit.rst b/doc/source/Tutorials/fit.rst
new file mode 100644
index 0000000..1305299
--- /dev/null
+++ b/doc/source/Tutorials/fit.rst
@@ -0,0 +1,635 @@
+
+.. _fit-tutorial:
+
+Fit tools
+---------
+
+.. contents:: :local:
+
+.. _leastsq-tutorial:
+
+Using :func:`leastsq`
++++++++++++++++++++++
+
+.. currentmodule:: silx.math.fit
+
+Running an iterative fit with :func:`leastsq` involves the following steps:
+
+    - designing a fit model function that has the signature ``f(x, ...)``,
+      where ``x`` is an array of values of the independent variable and all
+      remaining parameters are the parameters to be fitted
+    - defining the sequence of initial values for all parameters to be fitted.
+      You can usually start with ``[1., 1., ...]`` if you don't know a better
+      estimate. The algorithm is robust enough to converge to a solution most
+      of the time.
+    - setting constraints (optional)
+
+Performing a fit requires the following data:
+
+    - an array of ``x`` values (abscissa, independent variable)
+    - an array of ``y`` data points
+    - optionally, a ``sigma`` array of uncertainties associated with each data
+      point (by default, each data point is assigned a weight of 1)
+
+Standard fit
+************
+
+Let's demonstrate this process in a short example, using synthetic data.
+We generate an array of synthetic data using a polynomial function of degree 4,
+and then use :func:`leastsq` to recover the function's parameters.
+
+.. code-block:: python
+
+ import numpy
+ from silx.math.fit import leastsq
+
+ # create some synthetic polynomial data
+ x = numpy.arange(1000)
+ y = 2.4 * x**4 - 10. * x**3 + 15.2 * x**2 - 24.6 * x + 150.
+
+ # define our fit function: a generic polynomial of degree 4
+ def poly4(x, a, b, c, d, e):
+ return a * x**4 + b * x**3 + c * x**2 + d * x + e
+
+ # The fit is an iterative process that requires an initial
+ # estimation of the parameters. Let's just use 1s.
+ initial_parameters = numpy.array([1., 1., 1., 1., 1.])
+
+ # Run fit
+ fitresult = leastsq(model=poly4,
+ xdata=x,
+ ydata=y,
+ p0=initial_parameters,
+ full_output=True)
+
+    # leastsq with full_output=True returns 3 objects
+ optimal_parameters, covariance, infodict = fitresult
+ # the first object is an array with the fitted parameters
+ a, b, c, d, e = optimal_parameters
+
+ print("Fit took %d iterations" % infodict["niter"])
+ print("Reduced chi-square: %f" % infodict["reduced_chisq"])
+ print("Theoretical parameters:\n\t" +
+ "a=2.4, b=-10, c=15.2, d=-24.6, e=150")
+ print("Optimal parameters for y2 fitting:\n\t" +
+ "a=%f, b=%f, c=%f, d=%f, e=%f" % (a, b, c, d, e))
+
+The output of this program is::
+
+ Fit took 35 iterations
+ Reduced chi-square: 682592.670690
+ Theoretical parameters:
+ a=2.4, b=-10, c=15.2, d=-24.6, e=150
+ Optimal parameters for y fitting:
+ a=2.400000, b=-9.999665, c=14.970422, d=31.683448, e=-3216.131136
+
+We can see that this fit result is poor. In particular, parameters ``d`` and ``e``
+are very poorly fitted.
+This is most likely due to numerical rounding errors. As we are dealing with
+very large values in our ``y`` array, we are affected by the limits of how
+floating point numbers are represented by computers. The larger a value, the
+larger its rounding error.
+
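+To get a feeling for the magnitude of these rounding errors, we can look at the
+spacing between consecutive double precision floating point numbers at the scale
+of our largest ``y`` values (around ``2.4e12``). This is a plain *numpy*
+illustration, not part of the fit itself:
+
+.. code-block:: python
+
+    import numpy
+
+    # absolute gap between adjacent float64 values near the largest y value;
+    # every intermediate result is rounded to a multiple of roughly this step
+    print(numpy.spacing(2.4e12))   # about 0.00049
+
+Errors of this magnitude accumulate in the sums computed internally by the fit
+and degrade the estimation of the smallest parameters.
+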
+If you limit the ``x`` range to deal with
+smaller ``y`` values, the fit result becomes perfect. In our example, replacing ``x``
+with::
+
+ x = numpy.arange(100)
+
+produces the following result::
+
+ Fit took 9 iterations
+ Reduced chi-square: 0.000000
+ Theoretical parameters:
+ a=2.4, b=-10, c=15.2, d=-24.6, e=150
+ Optimal parameters for y fitting:
+ a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+
+
+Constrained fit
+***************
+
+Let's now revert to our initial ``x = numpy.arange(1000)`` and experiment
+with different approaches to improving the fit.
+
+The :func:`leastsq` function provides
+a way to set constraints on parameters. You can for instance require that a given
+parameter remain equal to its initial value, define an acceptable range
+for it to vary in, or require that a parameter be equal to another parameter
+multiplied by a certain factor. This is very useful when you have
+enough knowledge to make reasonable assumptions about some parameters.
+
+In our case, we will set constraints on ``d`` and ``e``: we will constrain ``d``
+to stay in the range between -25 and -23, and fix ``e`` to 150.
+
+Replace the call to :func:`leastsq` with the following lines:
+
+.. code-block:: python
+
+ # Define constraints
+    cons = [[0, 0, 0],        # a: no constraint
+            [0, 0, 0],        # b: no constraint
+            [0, 0, 0],        # c: no constraint
+            [2, -25., -23.],  # -25 < d < -23
+            [3, 0, 0]]        # e is fixed to its initial value
+ fitresult = leastsq(poly4, x, y,
+ # initial values must be consistent with constraints
+ p0=[1., 1., 1., -24., 150.],
+ constraints=cons,
+ full_output=True)
+
+The output of this program is::
+
+ Constrained fit took 100 iterations
+ Reduced chi-square: 3.749280
+ Theoretical parameters:
+ a=2.4, b=-10, c=15.2, d=-24.6, e=150
+ Optimal parameters for y fitting:
+ a=2.400000, b=-9.999999, c=15.199648, d=-24.533014, e=150.000000
+
+The chi-square value is much improved and the results are much better, at the
+cost of more iterations.
+
+Weighted fit
+************
+A third approach to improve our fit is to define uncertainties for the data.
+The larger the uncertainty on a data point, the smaller its weight will be
+in the least-squares problem.
+
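+When a ``sigma`` array is supplied, the quantity minimized is, in the standard
+weighted least-squares formulation, the weighted sum of squared residuals (the
+reduced chi-square reported in ``infodict`` is this sum divided by the number of
+degrees of freedom):
+
+.. math::
+
+    \chi^2 = \sum_i \left( \frac{y_i - f(x_i)}{\sigma_i} \right)^2
+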
+In our case, we do not know the uncertainties associated with our data. We could
+estimate the uncertainties due to numerical rounding errors, but let's just use
+a common approach that requires less work: use the square root of the data values
+as their uncertainty:
+
+.. code-block:: python
+
+ sigma = numpy.sqrt(y)
+
+ # Fit y
+ fitresult = leastsq(model=poly4,
+ xdata=x,
+ ydata=y,
+ sigma=sigma,
+ p0=initial_parameters,
+ full_output=True)
+
+This results in a great improvement::
+
+ Weighted fit took 6 iterations
+ Reduced chi-square: 0.000000
+ Theoretical parameters:
+ a=2.4, b=-10, c=15.2, d=-24.6, e=150
+ Optimal parameters for y fitting:
+ a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+The resulting fit is perfect. The very large ``y`` values, with their very large
+associated uncertainties, are practically excluded from the fit process. The fit
+converged even faster than with the solution of limiting the ``x`` range to
+0 -- 100.
+
+.. _fitmanager-tutorial:
+
+Using :class:`FitManager`
++++++++++++++++++++++++++
+
+.. currentmodule:: silx.math.fit.fitmanager
+
+A :class:`FitManager` is a tool for handling fit functions, associating
+estimation functions with them to estimate the initial parameters, modifying
+the configuration parameters of the fit (enabling or disabling weights...) or
+of the estimation function, and choosing a background model.
+
+It provides an abstraction layer on top of :func:`leastsq`.
+
+Weighted polynomial fit
+***********************
+
+The following program accomplishes the same weighted fit of a polynomial as in
+the previous tutorial (`Weighted fit`_).
+
+.. code-block:: python
+
+ import numpy
+ from silx.math.fit.fitmanager import FitManager
+
+    # create synthetic polynomial data
+ x = numpy.arange(1000).astype(numpy.float)
+ y = 2.4 * x**4 - 10. * x**3 + 15.2 * x**2 - 24.6 * x + 150.
+
+ # define our fit function: a generic polynomial of degree 4
+ def poly4(x, a, b, c, d, e):
+ return a * x**4 + b * x**3 + c * x**2 + d * x + e
+
+    # define an estimation function that returns initial parameters
+ # and constraints
+ def esti(x, y):
+ p0 = numpy.array([1., 1., 1., 1., 1.])
+ cons = numpy.zeros(shape=(5, 3))
+ return p0, cons
+
+ # Fitting
+ fit = FitManager()
+ fit.setdata(x=x, y=y)
+
+ fit.addtheory("polynomial",
+ function=poly4,
+ # any list of 5 parameter names would be OK
+ parameters=["A", "B", "C", "D", "E"],
+ estimate=esti)
+ fit.settheory('polynomial')
+ fit.configure(WeightFlag=True)
+ fit.estimate()
+ fit.runfit()
+
+ print("\n\nFit took %d iterations" % fit.niter)
+ print("Reduced chi-square: %f" % fit.chisq)
+ print("Theoretical parameters:\n\t" +
+ "a=2.4, b=-10, c=15.2, d=-24.6, e=150")
+ a, b, c, d, e = (param['fitresult'] for param in fit.fit_results)
+ print("Optimal parameters for y2 fitting:\n\t" +
+ "a=%f, b=%f, c=%f, d=%f, e=%f" % (a, b, c, d, e))
+
+
+The result is the same as in our weighted :func:`leastsq` example,
+as expected::
+
+ Fit took 6 iterations
+ Reduced chi-square: 0.000000
+ Theoretical parameters:
+ a=2.4, b=-10, c=15.2, d=-24.6, e=150
+ Optimal parameters for y2 fitting:
+ a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+Fitting gaussians
+*****************
+
+The :class:`FitManager` object is especially useful for fitting multi-peak
+gaussian-shaped spectra. The *silx* module :mod:`silx.math.fit.fittheories`
+provides fit functions and their associated estimation functions that are
+specifically designed for this purpose.
+
+These fit functions can handle a variable number of parameters defining a
+variable number of peaks, and the estimation functions use a peak detection
+algorithm to determine how many initial parameters must be returned.
+
+For the sake of the example, let's test the multi-peak fitting on synthetic
+data, generated using another *silx* module: :mod:`silx.math.fit.functions`.
+
+.. code-block:: python
+
+ import numpy
+ from silx.math.fit.functions import sum_gauss
+ from silx.math.fit import fittheories
+ from silx.math.fit.fitmanager import FitManager
+
+ # Create synthetic data with a sum of gaussian functions
+ x = numpy.arange(1000).astype(numpy.float)
+
+ # height, center x, fwhm
+ p = [1000, 100., 250, # 1st peak
+ 255, 690., 45, # 2nd peak
+ 1500, 800.5, 95] # 3rd peak
+
+ y = sum_gauss(x, *p)
+
+ # Fitting
+ fit = FitManager()
+ fit.setdata(x=x, y=y)
+ fit.loadtheories(fittheories)
+ fit.settheory('Gaussians')
+ fit.estimate()
+ fit.runfit()
+
+ print("Searched parameters = %s" % p)
+ print("Obtained parameters : ")
+    for param in fit.fit_results:
+        print(param['name'], ' = ', param['fitresult'])
+ print("chisq = ", fit.chisq)
+
+And the result of this program is::
+
+ Searched parameters = [1000, 100.0, 250, 255, 690.0, 45, 1500, 800.5, 95]
+ Obtained parameters :
+ ('Height1', ' = ', 1000.0)
+ ('Position1', ' = ', 100.0)
+ ('FWHM1', ' = ', 250.0)
+ ('Height2', ' = ', 255.0)
+ ('Position2', ' = ', 690.0)
+ ('FWHM2', ' = ', 44.999999999999993)
+ ('Height3', ' = ', 1500.0)
+ ('Position3', ' = ', 800.5)
+ ('FWHM3', ' = ', 95.000000000000014)
+ ('chisq = ', 0.0)
+
+In addition to gaussians, we could have fitted several other similar types of
+functions: asymmetric gaussian functions, lorentzian functions,
+Pseudo-Voigt functions or hypermet tailing functions.
+
+The :meth:`loadtheories` method can also be used to load user defined
+functions. Instead of a module, a path to a Python source file can be given
+as a parameter. This source file must adhere to certain conventions, as explained
+in the documentation of :mod:`silx.math.fit.fittheories` and
+:class:`silx.math.fit.fittheory.FitTheory`.
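+
+The exact conventions are documented in the modules mentioned above. As a rough
+sketch (assuming the module-level ``THEORY`` dictionary convention used by
+:mod:`silx.math.fit.fittheories`; the *slit* function and its estimator below
+are purely hypothetical examples), such a source file could look like this:
+
+.. code-block:: python
+
+    # my_theories.py -- hypothetical user-defined fit theory file
+    import numpy
+    from silx.math.fit.fittheory import FitTheory
+
+    def slit(x, height, center, width):
+        """Rectangular slit function."""
+        return height * (numpy.fabs(x - center) <= 0.5 * width)
+
+    def estimate_slit(x, y):
+        """Return rough initial parameters and constraints (all FREE)."""
+        p0 = [max(y), x[numpy.argmax(y)], (max(x) - min(x)) / 10.]
+        constraints = numpy.zeros((len(p0), 3))
+        return p0, constraints
+
+    THEORY = {
+        "slit": FitTheory(function=slit,
+                          parameters=["height", "center", "width"],
+                          estimate=estimate_slit),
+    }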
+
+Subtracting a background
+************************
+
+:class:`FitManager` provides a few standard background theories, for cases when
+a background signal is superimposed on the multi-peak spectrum.
+
+For example, let's add a linear background to our synthetic data, and see how
+:class:`FitManager` handles the fitting.
+
+In our previous example, redefine ``y`` as follows:
+
+.. code-block:: python
+
+ p = [1000, 100., 250,
+ 255, 690., 45,
+ 1500, 800.5, 95]
+ y = sum_gauss(x, *p)
+ # add a synthetic linear background
+ y += 0.13 * x + 100.
+
+Before the line ``fit.estimate()``, add the following line:
+
+.. code-block:: python
+
+ fit.setbackground('Linear')
+
+The result becomes::
+
+ Searched parameters = [1000, 100.0, 250, 255, 690.0, 45, 1500, 800.5, 95]
+ Obtained parameters :
+ ('Constant', ' = ', 100.00000000000001)
+ ('Slope', ' = ', 0.12999999999999998)
+ ('Height1', ' = ', 1000.0)
+ ('Position1', ' = ', 100.0)
+ ('FWHM1', ' = ', 249.99999999999997)
+ ('Height2', ' = ', 255.00000000000003)
+ ('Position2', ' = ', 690.0)
+ ('FWHM2', ' = ', 44.999999999999993)
+ ('Height3', ' = ', 1500.0)
+ ('Position3', ' = ', 800.5)
+ ('FWHM3', ' = ', 95.0)
+ ('chisq = ', 3.1789004676997597e-27)
+
+The available background theories are: *Linear*, *Constant* and *Strip*.
+
+The strip background is a popular background model that can compute and
+subtract any background shape as long as its curvature is significantly
+lower than the peaks' curvature. In other words, as long as the background
+signal is significantly smoother than the actual signal, it can be easily
+computed.
+
+The main parameters required by the strip function are the strip width *w*
+and the number of iterations. At each iteration, if the content of channel *i*,
+``y(i)``, is above the average of the two channels located *w* channels away,
+``y(i-w)`` and ``y(i+w)``, then ``y(i)`` is replaced by that average.
+At the end of the process we are left with something that resembles a spectrum
+in which the peaks have been "stripped".
+
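+To make this iteration explicit, here is a naive pure *numpy* sketch of the
+algorithm described above (illustrative only; use the optimized
+:func:`silx.math.fit.filters.strip` function, as in the example below):
+
+.. code-block:: python
+
+    import numpy
+
+    def naive_strip(y, w, niterations):
+        bg = numpy.array(y, dtype=numpy.float64)
+        for _ in range(niterations):
+            # average of the two channels located w channels away
+            neighbors_mean = 0.5 * (bg[:-2 * w] + bg[2 * w:])
+            # replace y(i) by that average whenever y(i) is above it
+            bg[w:-w] = numpy.minimum(bg[w:-w], neighbors_mean)
+        return bg
+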
+The following example illustrates the strip background removal process:
+
+.. code-block:: python
+
+ from silx.sx import plot
+ from silx.gui import qt
+ import numpy
+ from silx.math.fit.filters import strip
+ from silx.math.fit.functions import sum_gauss
+
+ x = numpy.arange(5000)
+ # (height1, center1, fwhm1, ...) 5 peaks
+ params1 = (50, 500, 100,
+ 20, 2000, 200,
+ 50, 2250, 100,
+ 40, 3000, 75,
+ 23, 4000, 150)
+ y0 = sum_gauss(x, *params1)
+
+ # random values between [-1;1]
+ noise = 2 * numpy.random.random(5000) - 1
+ # make it +- 5%
+ noise *= 0.05
+
+ # 2 gaussians with very large fwhm, as background signal
+ actual_bg = sum_gauss(x, 15, 3500, 3000, 5, 1000, 1500)
+
+ # Add 5% random noise to gaussians and add background
+ y = y0 * (1 + noise) + actual_bg
+
+ # compute strip background model
+ strip_bg = strip(y, w=5, niterations=5000)
+
+ # plot results
+ app = qt.QApplication([])
+ plot(x, y, x, actual_bg, x, strip_bg)
+ plot(x, y, x, (y - strip_bg))
+ app.exec_()
+
+.. |imgStrip1| image:: img/stripbg_plot1.png
+ :height: 300px
+ :align: middle
+
+.. |imgStrip2| image:: img/stripbg_plot2.png
+ :height: 300px
+ :align: middle
+
+.. list-table::
+ :widths: 1 2
+
+ * - |imgStrip1|
+ - Data with background in black (``y``), actual background in red, computed strip
+ background in green
+ * - |imgStrip2|
+ - Data with background in blue, data after subtracting strip background in black
+
+The strip process also removes part of the statistical noise, so the computed
+strip background will be slightly lower than the actual background. This can be
+mitigated by smoothing the data prior to the strip computation.
+
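+For instance, a simple moving-average smoothing with plain *numpy* could be
+applied before calling :func:`strip` (the 5-point kernel below is an arbitrary
+illustration, not part of the *silx* API):
+
+.. code-block:: python
+
+    # continuing the previous example: y is the noisy spectrum with background
+    kernel = numpy.ones(5) / 5.
+    smoothed = numpy.convolve(y, kernel, mode='same')
+    strip_bg = strip(smoothed, w=5, niterations=5000)
+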
+See the `PyMca documentation <http://pymca.sourceforge.net/stripbackground.html>`_
+for more information on the strip background.
+
+To configure the strip background model of :class:`FitManager`, use its :meth:`configure`
+method to modify the following parameters:
+
+    - *StripWidth*: the strip width parameter *w*, mentioned earlier
+    - *StripNIterations*: the number of iterations
+    - *StripThresholdFactor*: if this parameter is left at its default value 1,
+      the algorithm behaves as explained earlier: ``y(i)`` is compared to the
+      average of ``y(i-w)`` and ``y(i+w)``.
+      If this factor is set to another value *f*, ``y(i)`` is compared to the
+      average multiplied by *f*.
+    - *SmoothStrip*: if this parameter is set to ``True``, a smoothing is applied
+      prior to the strip.
+
+
+These parameters can be modified like this:
+
+.. code-block:: python
+
+ # ...
+    fit.setbackground('Strip')
+ fit.configure(StripWidth=5,
+ StripNIterations=5000,
+ StripThresholdFactor=1.1,
+ SmoothStrip=True)
+ # ...
+
+Using a strip background has performance implications. You should try to keep
+the number of iterations as low as possible if you need to run batch fitting
+using this model. Increasing the strip width can help reduce the number of
+iterations, at the risk of underestimating the background signal.
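+
+For example, for a batch of similar spectra you might trade some accuracy for
+speed with a wider strip and fewer iterations (the values below are arbitrary
+and must be adjusted to your data):
+
+.. code-block:: python
+
+    # wider strip and fewer iterations: faster, but a coarser background estimate
+    fit.configure(StripWidth=25,
+                  StripNIterations=1000,
+                  SmoothStrip=True)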
+
+.. _fitwidget-tutorial:
+
+Using :class:`FitWidget`
+++++++++++++++++++++++++
+
+.. currentmodule:: silx.gui.fit.FitWidget
+
+Simple usage
+************
+
+The :class:`FitWidget` is a graphical interface for :class:`FitManager`.
+
+.. code-block:: python
+
+ import numpy
+ from silx.gui import qt
+ from silx.gui.fit import FitWidget
+ from silx.math.fit.functions import sum_gauss
+
+ x = numpy.arange(2000).astype(numpy.float)
+ constant_bg = 3.14
+
+ # gaussian parameters: height, position, fwhm
+ p = numpy.array([1000, 100., 30.0,
+ 500, 300., 25.,
+ 1700, 500., 35.,
+ 750, 700., 30.0,
+ 1234, 900., 29.5,
+ 302, 1100., 30.5,
+ 75, 1300., 75.])
+ y = sum_gauss(x, *p) + constant_bg
+
+ a = qt.QApplication([])
+
+ w = FitWidget()
+ w.setData(x=x, y=y)
+ w.show()
+
+ a.exec_()
+
+.. |imgFitWidget1| image:: img/fitwidget1.png
+ :width: 300px
+ :align: middle
+
+.. |imgFitWidget2| image:: img/fitwidget2.png
+ :width: 300px
+ :align: middle
+
+.. |imgFitWidget3| image:: img/fitwidget3.png
+ :width: 300px
+ :align: middle
+
+.. |imgFitWidget4| image:: img/fitwidget4.png
+ :width: 300px
+ :align: middle
+
+Executing this code opens the following widget.
+
+ |imgFitWidget1|
+
+The functions you can choose from are the standard gaussian-shaped functions
+from :mod:`silx.math.fit.fittheories`. At the top of the list, you will find
+the *Add Function(s)* option, which allows you to load user-defined fit
+theories from a *.py* source file.
+
+After selecting the *Constant* background model and clicking the *Estimate*
+button, the widget displays this:
+
+ |imgFitWidget2|
+
+The 7 peaks have been detected, and their parameters estimated.
+Also, the estimation function defined some constraints (positive height and
+positive full-width at half-maximum).
+
+You can modify the values in the estimation column of the table, to use different
+initial parameters for the fit.
+
+The individual constraints can be modified prior to fitting. It is also possible
+to modify the constraints globally by clicking the *Configure* button to open a
+configuration dialog. To get help on the meaning of the various parameters,
+hover the mouse over the corresponding check box or entry widget to display a
+tooltip help message.
+
+ |imgFitWidget3|
+
+The other configuration tabs can be modified to change the peak search parameters
+and the strip background parameters prior to the estimation.
+After closing the configuration dialog, you must re-run the estimation
+by clicking the *Estimate* button.
+
+After all configuration parameters and all constraints are set according to your
+preferences, you can click the *Start Fit* button. This runs the fit and displays
+the results in the *Fit Value* column of the table.
+
+ |imgFitWidget4|
+
+Customizing the functions
+*************************
+
+.. |imgFitWidget5| image:: img/fitwidget5.png
+ :width: 300px
+ :align: middle
+
+The :class:`FitWidget` can be initialized with a non-standard
+:class:`FitManager`, to customize the available functions.
+
+.. code-block:: python
+
+ from silx.gui import qt
+ from silx.math.fit import FitManager
+ from silx.gui.fit import FitWidget
+
+ def linearfun(x, a, b):
+ return a * x + b
+
+ # create synthetic data for the example
+ x = list(range(0, 100))
+ y = [linearfun(x_, 2.0, 3.0) for x_ in x]
+
+ # we need to create a custom fit manager and add our theory
+ myfitmngr = FitManager()
+ myfitmngr.setdata(x, y)
+ myfitmngr.addtheory("my linear function",
+ function=linearfun,
+ parameters=["a", "b"])
+
+ a = qt.QApplication([])
+
+ # our fit widget can now use our custom fit manager
+ fw = FitWidget(fitmngr=myfitmngr)
+ fw.show()
+
+ a.exec_()
+
+In our previous example, we didn't load a custom :class:`FitManager`.
+Therefore, the fit widget automatically initialized a fit manager and
+loaded the standard gaussian functions.
+
+This time, we initialized our own :class:`FitManager` and loaded our
+own function, so only this function is presented as an option in the GUI.
+
+Our custom function does not provide an associated estimation function, so
+the default estimation function of :class:`FitManager` was used. This
+default estimation function returns an array of ones with the same length as the
+list of parameter names, and sets all constraints to *FREE*.
+
+ |imgFitWidget5|
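+
+For reference, the default estimation described above is roughly equivalent to
+the following helper (a sketch, not the actual *silx* implementation; the
+``default_estimate`` name is hypothetical):
+
+.. code-block:: python
+
+    import numpy
+
+    def default_estimate(x, y, nparams=2):
+        """Return one initial value of 1. per parameter and FREE constraints."""
+        initial_params = numpy.ones(nparams)
+        # one (code, min, max) triplet per parameter; code 0 means FREE
+        constraints = numpy.zeros((nparams, 3))
+        return initial_params, constraints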