Import Upstream version 0.5.0+dfsg

author: Picca Frédéric-Emmanuel <picca@synchrotron-soleil.fr> 2017-08-18 14:48:52 +0200
committer: Picca Frédéric-Emmanuel <picca@synchrotron-soleil.fr> 2017-08-18 14:48:52 +0200
commit: f7bdc2acff3c13a6d632c28c4569690ab106eed7 (patch)
tree: 9d67cdb7152ee4e711379e03fe0546c7c3b97303 /doc/source/Tutorials/fit.rst
1 files changed, 635 insertions, 0 deletions
diff --git a/doc/source/Tutorials/fit.rst b/doc/source/Tutorials/fit.rst
new file mode 100644
index 0000000..1305299
--- /dev/null
+++ b/doc/source/Tutorials/fit.rst
@@ -0,0 +1,635 @@
+
+.. _fit-tutorial:
+
+Fit tools
+---------
+
+.. contents:: :local:
+
+.. _leastsq-tutorial:
+
+Using :func:`leastsq`
++++++++++++++++++++++
+
+.. currentmodule:: silx.math.fit
+
+Running an iterative fit with :func:`leastsq` involves the following steps:
+
+    - designing a fit model function that has the signature ``f(x, ...)``,
+      where ``x`` is an array of values of the independant variable and all
+      remaining parameters are the parameters to be fitted
+    - defining the sequence of initial values for all parameters to be fitted.
+      You can usually start with ``[1., 1., ...]`` if you don't know a better
+      estimate. The algorithm is robust enough to converge to a solution most
+      of the time.
+    - setting constraints (optional)
+
+Data required to perform a fit is:
+
+    - an array of ``x`` values (abscissa, independant variable)
+    - an array of ``y`` data points
+    - the ``sigma`` array of uncertainties associated to each data point.
+      This is optional, by default each data point gets assigned a weight of 1.
+
+Standard fit
+************
+
+Let's demonstrate this process in a short example, using synthetic data.
+We generate an array of synthetic data using a polynomial function of degree 4,
+and try to use :func:`leastsq` to find back the functions parameters.
+
+.. code-block:: python
+
+   import numpy
+   from silx.math.fit import leastsq
+
+   # create some synthetic polynomial data
+   x = numpy.arange(1000)
+   y = 2.4 * x**4 - 10. * x**3 + 15.2 * x**2 - 24.6 * x + 150.
+
+   # define our fit function: a generic polynomial of degree 4
+   def poly4(x, a, b, c, d, e):
+       return a * x**4 + b * x**3 + c * x**2 + d * x + e
+
+   # The fit is an iterative process that requires an initial
+   # estimation of the parameters. Let's just use 1s.
+   initial_parameters = numpy.array([1., 1., 1., 1., 1.])
+
+   # Run fit
+   fitresult = leastsq(model=poly4,
+                       xdata=x,
+                       ydata=y,
+                       p0=initial_parameters,
+                       full_output=True)
+
+   # leastsq with full_output=True returns 3 objets
+   optimal_parameters, covariance, infodict = fitresult
+   # the first object is an array with the fitted parameters
+   a, b, c, d, e = optimal_parameters
+
+   print("Fit took %d iterations" % infodict["niter"])
+   print("Reduced chi-square: %f" % infodict["reduced_chisq"])
+   print("Theoretical parameters:\n\t" +
+         "a=2.4, b=-10, c=15.2, d=-24.6, e=150")
+   print("Optimal parameters for y2 fitting:\n\t" +
+         "a=%f, b=%f, c=%f, d=%f, e=%f" % (a, b, c, d, e))
+
+The output of this program is::
+
+   Fit took 35 iterations
+   Reduced chi-square: 682592.670690
+   Theoretical parameters:
+       a=2.4, b=-10, c=15.2, d=-24.6, e=150
+   Optimal parameters for y fitting:
+       a=2.400000, b=-9.999665, c=14.970422, d=31.683448, e=-3216.131136
+
+We can see that this fit result is poor. In particular, parameters ``d`` and ``e``
+are very poorly fitted.
+This is most likely due to numerical rounding errors. As we are dealing with
+very large values in our ``y`` array, we are affected by the limits of how
+floating point numbers are represented by computers. The larger a value, the
+larger its rounding error.
+
+If you limit the ``x`` range to deal with
+smaller ``y`` values, the fit result becomes perfect. In our example, replacing ``x``
+with::
+
+    x = numpy.arange(100)
+
+produces the following result::
+
+   Fit took 9 iterations
+   Reduced chi-square: 0.000000
+   Theoretical parameters:
+       a=2.4, b=-10, c=15.2, d=-24.6, e=150
+   Optimal parameters for y fitting:
+       a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+
+
+Constrained fit
+***************
+
+But let's revert back to our initial ``x = numpy.arange(1000)``, to experiment
+with different approaches to improving the fit.
+
+The :func:`leastsq` functions provides
+a way to set constraints on parameters. You can for instance assert that a given
+parameter must remain equal to it's initial value, or define an acceptable range
+for it to vary, or decide that a parameter must be equal to another parameter
+multiplied by a certain factor. This is very useful in cases in which you have
+enough knowledge to make reasonable assumptions on some parameters.
+
+In our case, we will set constraints on ``d`` and ``e``. We will quote ``d`` to
+stay in the range between -25 and -24, and fix ``e`` to 150.
+
+Replace the call to :func:`leastsq` by following lines:
+
+.. code-block:: python
+
+   # Define constraints
+   cons = [[0, 0, 0],          # a: no constraint
+           [0, 0, 0],          # b: no constraint
+           [0, 0, 0],          # c: no constraint
+           [2, -25., -23.],    # -25 < d < -24
+           [3, 0, 0]]          # e is fixed to initial value
+   fitresult = leastsq(poly4, x, y,
+                       # initial values must be consistent with constraints
+                       p0=[1., 1., 1., -24., 150.],
+                       constraints=cons,
+                       full_output=True)
+
+The output of this program is::
+
+   Constrained fit took 100 iterations
+   Reduced chi-square: 3.749280
+   Theoretical parameters:
+       a=2.4, b=-10, c=15.2, d=-24.6, e=150
+   Optimal parameters for y fitting:
+       a=2.400000, b=-9.999999, c=15.199648, d=-24.533014, e=150.000000
+
+The chi-square value is much improved and the results are much better, at the
+cost of more iterations.
+
+Weighted fit
+************
+A third approach to improve our fit is to define uncertainties for the data.
+The larger the uncertainty on a data sample, the smaller its weight will be
+in the least-square problem.
+
+In our case, we do not know the uncertainties associated to our data. We could
+determine the uncertainties due to numerical rounding errors, but let's just use
+a common approach that requires less work: use the square-root of the data values
+as their uncertainty value:
+
+.. code-block:: python
+
+   sigma = numpy.sqrt(y)
+
+   # Fit y
+   fitresult = leastsq(model=poly4,
+                       xdata=x,
+                       ydata=y,
+                       sigma=sigma,
+                       p0=initial_parameters,
+                       full_output=True)
+
+This results in a great improvement::
+
+   Weighted fit took 6 iterations
+   Reduced chi-square: 0.000000
+   Theoretical parameters:
+       a=2.4, b=-10, c=15.2, d=-24.6, e=150
+   Optimal parameters for y fitting:
+       a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+The resulting fit is perfect. The very large ``y`` values with their very large
+associated uncertainties have been practicaly rejected from the fit process. The fit
+converged even faster than with the solution of limiting the ``x`` range to
+0 -- 100.
+
+.. _fitmanager-tutorial:
+
+Using :class:`FitManager`
++++++++++++++++++++++++++
+
+.. currentmodule:: silx.math.fit.fitmanager
+
+A :class:`FitManager` is a tool that provides a way of handling fit functions,
+associating estimation functions to estimate the initial parameters, modify
+the configuration parameters for the fit (enabling or disabling weights...) or
+for the estimation function, and choosing a background model.
+
+It provides an abstraction layer on top of :func:`leastsq`.
+
+Weighted polynomial fit
+***********************
+
+The following program accomplishes the same weighted fit of a polynomial as in
+the previous tutorial (`Weighted fit`_)
+
+.. code-block:: python
+
+    import numpy
+    from silx.math.fit.fitmanager import FitManager
+
+    # Create synthetic data with a sum of gaussian functions
+    x = numpy.arange(1000).astype(numpy.float)
+    y = 2.4 * x**4 - 10. * x**3 + 15.2 * x**2 - 24.6 * x + 150.
+
+    # define our fit function: a generic polynomial of degree 4
+    def poly4(x, a, b, c, d, e):
+        return a * x**4 + b * x**3 + c * x**2 + d * x + e
+
+    # define an estimation function to that returns initial parameters
+    # and constraints
+    def esti(x, y):
+        p0 = numpy.array([1., 1., 1., 1., 1.])
+        cons = numpy.zeros(shape=(5, 3))
+        return p0, cons
+
+    # Fitting
+    fit = FitManager()
+    fit.setdata(x=x, y=y)
+
+    fit.addtheory("polynomial",
+                  function=poly4,
+                  # any list of 5 parameter names would be OK
+                  parameters=["A", "B", "C", "D", "E"],
+                  estimate=esti)
+    fit.settheory('polynomial')
+    fit.configure(WeightFlag=True)
+    fit.estimate()
+    fit.runfit()
+
+    print("\n\nFit took %d iterations" % fit.niter)
+    print("Reduced chi-square: %f" % fit.chisq)
+    print("Theoretical parameters:\n\t" +
+          "a=2.4, b=-10, c=15.2, d=-24.6, e=150")
+    a, b, c, d, e = (param['fitresult'] for param in fit.fit_results)
+    print("Optimal parameters for y2 fitting:\n\t" +
+          "a=%f, b=%f, c=%f, d=%f, e=%f" % (a, b, c, d, e))
+
+
+The result is the same as in our weighted :func:`leastsq` example,
+as expected::
+
+    Fit took 6 iterations
+    Reduced chi-square: 0.000000
+    Theoretical parameters:
+        a=2.4, b=-10, c=15.2, d=-24.6, e=150
+    Optimal parameters for y2 fitting:
+        a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+Fitting gaussians
+*****************
+
+The :class:`FitManager` object is especially useful for fitting multi-peak
+gaussian-shaped spectra. The *silx* module :mod:`silx.math.fit.fittheories`
+provides fit functions and their associated estimation functions that are
+specifically designed for this purpose.
+
+These fit functions can handle variable number of parameters defining a
+variable number of peaks, and the estimation functions use a peak detection
+algorithm to determine how many initial parameters must be returned.
+
+For the sake of the example, let's test the multi-peak fitting on synthetic
+data, generated using another *silx* module: :mod:`silx.math.fit.functions`.
+
+.. code-block:: python
+
+    import numpy
+    from silx.math.fit.functions import sum_gauss
+    from silx.math.fit import fittheories
+    from silx.math.fit.fitmanager import FitManager
+
+    # Create synthetic data with a sum of gaussian functions
+    x = numpy.arange(1000).astype(numpy.float)
+
+    # height, center x, fwhm
+    p = [1000, 100., 250,     # 1st peak
+         255, 690., 45,       # 2nd peak
+         1500, 800.5, 95]     # 3rd peak
+
+    y = sum_gauss(x, *p)
+
+    # Fitting
+    fit = FitManager()
+    fit.setdata(x=x, y=y)
+    fit.loadtheories(fittheories)
+    fit.settheory('Gaussians')
+    fit.estimate()
+    fit.runfit()
+
+    print("Searched parameters = %s" % p)
+    print("Obtained parameters : ")
+    dummy_list = []
+    for param in fit.fit_results:
+        print(param['name'], ' = ', param['fitresult'])
+        dummy_list.append(param['fitresult'])
+    print("chisq = ", fit.chisq)
+
+And the result of this program is::
+
+    Searched parameters = [1000, 100.0, 250, 255, 690.0, 45, 1500, 800.5, 95]
+    Obtained parameters :
+    ('Height1', ' = ', 1000.0)
+    ('Position1', ' = ', 100.0)
+    ('FWHM1', ' = ', 250.0)
+    ('Height2', ' = ', 255.0)
+    ('Position2', ' = ', 690.0)
+    ('FWHM2', ' = ', 44.999999999999993)
+    ('Height3', ' = ', 1500.0)
+    ('Position3', ' = ', 800.5)
+    ('FWHM3', ' = ', 95.000000000000014)
+    ('chisq = ', 0.0)
+
+In addition to gaussians, we could have fitted several other similar type of
+functions: asymetric gaussian functions, lorentzian functions,
+Pseudo-Voigt functions or hypermet tailing functions.
+
+The :meth:`loadtheories` method can also be used to load user defined
+functions. Instead of a module, a path to a Python source file can be given
+as a parameter. This source file must adhere to certain conventions, as explained
+in the documentation of :mod:`silx.math.fit.fittheories` and
+:mod:`silx.math.fit.fittheory.FitTheory`.
+
+Subtracting a background
+************************
+
+:class:`FitManager` provides a few standard background theories, for cases when
+a background signal is superimposed on the multi-peak spectrum.
+
+For example, let's add a linear background to our synthetic data, and see how
+:class:`FitManager` handles the fitting.
+
+In our previous example, redefine ``y`` as follows:
+
+.. code-block:: python
+
+    p = [1000, 100., 250,
+         255, 690., 45,
+         1500, 800.5, 95]
+    y = sum_gauss(x, *p)
+    # add a synthetic linear background
+    y += 0.13 * x + 100.
+
+Before the line ``fit.estimate()``, add the following line:
+
+.. code-block:: python
+
+    fit.setbackground('Linear')
+
+The result becomes::
+
+    Searched parameters = [1000, 100.0, 250, 255, 690.0, 45, 1500, 800.5, 95]
+    Obtained parameters :
+    ('Constant', ' = ', 100.00000000000001)
+    ('Slope', ' = ', 0.12999999999999998)
+    ('Height1', ' = ', 1000.0)
+    ('Position1', ' = ', 100.0)
+    ('FWHM1', ' = ', 249.99999999999997)
+    ('Height2', ' = ', 255.00000000000003)
+    ('Position2', ' = ', 690.0)
+    ('FWHM2', ' = ', 44.999999999999993)
+    ('Height3', ' = ', 1500.0)
+    ('Position3', ' = ', 800.5)
+    ('FWHM3', ' = ', 95.0)
+    ('chisq = ', 3.1789004676997597e-27)
+
+The available background theories are: *Linear*, *Constant* and *Strip*.
+
+The strip background is a popular background model that can compute and
+subtract any background shape as long as its curvature is significantly
+lower than the peaks' curvature. In other words, as long as the background
+signal is significantly smoother than the actual signal, it can be easily
+computed.
+
+The main parameters required by the strip function are the strip width *w*
+and the number of iterations. At each iteration, if the contents of channel *i*,
+``y(i)``, is above the average of the contents of the channels at *w* channels of
+distance, ``y(i-w)`` and ``y(i+w)``,  ``y(i)`` is replaced by the average.
+At the end of the process we are left with something that resembles a spectrum
+in which the peaks have been "stripped".
+
+The following example illustrates the strip background removal process:
+
+.. code-block:: python
+
+    from silx.sx import plot
+    from silx.gui import qt
+    import numpy
+    from silx.math.fit.filters import strip
+    from silx.math.fit.functions import sum_gauss
+
+    x = numpy.arange(5000)
+    # (height1, center1, fwhm1, ...) 5 peaks
+    params1 = (50, 500, 100,
+               20, 2000, 200,
+               50, 2250, 100,
+               40, 3000, 75,
+               23, 4000, 150)
+    y0 = sum_gauss(x, *params1)
+
+    # random values between [-1;1]
+    noise = 2 * numpy.random.random(5000) - 1
+    # make it +- 5%
+    noise *= 0.05
+
+    # 2 gaussians with very large fwhm, as background signal
+    actual_bg = sum_gauss(x, 15, 3500, 3000, 5, 1000, 1500)
+
+    # Add 5% random noise to gaussians and add background
+    y = y0 * (1 + noise) + actual_bg
+
+    # compute strip background model
+    strip_bg = strip(y, w=5, niterations=5000)
+
+    # plot results
+    app = qt.QApplication([])
+    plot(x, y, x, actual_bg, x, strip_bg)
+    plot(x, y, x, (y - strip_bg))
+    app.exec_()
+
+.. |imgStrip1| image:: img/stripbg_plot1.png
+   :height: 300px
+   :align: middle
+
+.. |imgStrip2| image:: img/stripbg_plot2.png
+   :height: 300px
+   :align: middle
+
+.. list-table::
+   :widths: 1 2
+
+   * - |imgStrip1|
+     - Data with background in black (``y``), actual background in red, computed strip
+       background in green
+   * - |imgStrip2|
+     - Data with background in blue, data after subtracting strip background in black
+
+The strip also removes the statistical noise, so the computed strip background
+will be slightly lower than the actual background. This can be solved by
+performing a smoothing prior to the strip computation.
+
+See the `PyMca documentation <http://pymca.sourceforge.net/stripbackground.html>`_
+for more information on the strip background.
+
+To configure the strip background model of :class:`FitManager`, use its :meth:`configure`
+method to modify the following parameters:
+
+ - *StripWidth*: strip width parameter *w*, mentionned earlier
+ - *StripNIterations*: number of iterations
+ - *StripThresholdFactor*: if this parameter is left to its default value 1,
+   the algorithm behaves as explained earlier: ``y(i)`` is compared to the average of
+   ``y(i-1)`` and ``y(i+1)``.
+   If this factor is set to another value *f*, ``y(i)`` is compared to the
+   average multiplied by ``f``.
+ - *SmoothStrip*: if this parameter is set to ``True``, a smoothing is applied
+   prior to the strip.
+
+
+These parameters can be modified like this:
+
+.. code-block:: python
+
+    # ...
+    fit.settheory('Strip')
+    fit.configure(StripWidth=5,
+                  StripNIterations=5000,
+                  StripThresholdFactor=1.1,
+                  SmoothStrip=True)
+    # ...
+
+Using a strip background has performance implications. You should try to keep
+the number of iterations as low as possible if you need to run batch fitting
+using this model. Increasing the strip width can help reducing the number of
+iterations, with the risk of underestimating the background signal.
+
+.. _fitwidget-tutorial:
+
+Using :class:`FitWidget`
+++++++++++++++++++++++++
+
+.. currentmodule:: silx.gui.fit.FitWidget
+
+Simple usage
+************
+
+The :class:`FitWidget` is a graphical interface for :class:`FitManager`.
+
+.. code-block:: python
+
+    import numpy
+    from silx.gui import qt
+    from silx.gui.fit import FitWidget
+    from silx.math.fit.functions import sum_gauss
+
+    x = numpy.arange(2000).astype(numpy.float)
+    constant_bg = 3.14
+
+    # gaussian parameters: height, position, fwhm
+    p = numpy.array([1000, 100., 30.0,
+                     500, 300., 25.,
+                     1700, 500., 35.,
+                     750, 700., 30.0,
+                     1234, 900., 29.5,
+                     302, 1100., 30.5,
+                     75, 1300., 75.])
+    y = sum_gauss(x, *p) + constant_bg
+
+    a = qt.QApplication([])
+
+    w = FitWidget()
+    w.setData(x=x, y=y)
+    w.show()
+
+    a.exec_()
+
+.. |imgFitWidget1| image:: img/fitwidget1.png
+   :width: 300px
+   :align: middle
+
+.. |imgFitWidget2| image:: img/fitwidget2.png
+   :width: 300px
+   :align: middle
+
+.. |imgFitWidget3| image:: img/fitwidget3.png
+   :width: 300px
+   :align: middle
+
+.. |imgFitWidget4| image:: img/fitwidget4.png
+   :width: 300px
+   :align: middle
+
+Executing this code opens the following widget.
+
+    |imgFitWidget1|
+
+The functions you can choose from are the standard gaussian-shaped functions
+from :mod:`silx.math.fit.fittheories`. At the top of the list, you will find
+the *Add Function(s)* option, that allows you to load your user defined fit
+theories from a *.py* source file.
+
+After selecting the *Constant* background model and clicking the *Estimate*
+button, the widget displays this:
+
+    |imgFitWidget2|
+
+The 7 peaks have been detected, and their parameters estimated.
+Also, the estimation function defined some constraints (positive height and
+positive full-width at half-maximum).
+
+You can modify the values in the estimation column of the table, to use different
+initial parameters for the fit.
+
+The individual constraints can be modified prior to fitting. It is also possible to
+modify the constraints globally by clicking the *Configure* button' to open a
+configuration dialog. To get help on the meaning of the various parameters,
+hover the mouse on the corresponding check box or entry widget, to display a
+tooltip help message.
+
+    |imgFitWidget3|
+
+The other configuration tabs can be modified to change the peak search parameters
+and the strip background parameters prior to the estimation.
+After closing the configuration dialog, you must re-run the estimation
+by clicking the *Estimate* button.
+
+After all configuration parameters and all constrants are set according to your
+preferences, you can click the *Start Fit* button. This runs the fit and displays
+the results in the *Fit Value* column of the table.
+
+    |imgFitWidget4|
+
+Customizing the functions
+*************************
+
+.. |imgFitWidget5| image:: img/fitwidget5.png
+   :width: 300px
+   :align: middle
+
+The :class:`FitWidget` can be initialized with a non-standard
+:class:`FitManager`, to customize the available functions.
+
+.. code-block:: python
+
+    from silx.gui import qt
+    from silx.math.fit import FitManager
+    from silx.gui.fit import FitWidget
+
+    def linearfun(x, a, b):
+        return a * x + b
+
+    # create synthetic data for the example
+    x = list(range(0, 100))
+    y = [linearfun(x_, 2.0, 3.0) for x_ in x]
+
+    # we need to create a custom fit manager and add our theory
+    myfitmngr = FitManager()
+    myfitmngr.setdata(x, y)
+    myfitmngr.addtheory("my linear function",
+                        function=linearfun,
+                        parameters=["a", "b"])
+
+    a = qt.QApplication([])
+
+    # our fit widget can now use our custom fit manager
+    fw = FitWidget(fitmngr=myfitmngr)
+    fw.show()
+
+    a.exec_()
+
+In our previous example, we didn't load a custom :class:`FitManager`.
+Therefore, the fit widget automatically initialized a fit manager and
+loaded the custom gaussian functions.
+
+This time, we initialized our own :class:`FitManager` and loaded our
+own function, so only this function is presented as an option in the GUI.
+
+Our custom function does not provide an associated estimation function, so
+the default estimation function of :class:`FitManager` was used. This
+default estimation function returns an array of ones the same length as the
+list of *parameter* names, and set all constraints to *FREE*.
+
+    |imgFitWidget5|
author	Picca Frédéric-Emmanuel <picca@synchrotron-soleil.fr>	2017-08-18 14:48:52 +0200
committer	Picca Frédéric-Emmanuel <picca@synchrotron-soleil.fr>	2017-08-18 14:48:52 +0200
commit	f7bdc2acff3c13a6d632c28c4569690ab106eed7 (patch)
tree	9d67cdb7152ee4e711379e03fe0546c7c3b97303 /doc/source/Tutorials/fit.rst