Diffstat (limited to 'doc/source/Tutorials/fit.rst')

 doc/source/Tutorials/fit.rst | 124 ++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 50 deletions(-)
diff --git a/doc/source/Tutorials/fit.rst b/doc/source/Tutorials/fit.rst
index 1305299..9889274 100644
--- a/doc/source/Tutorials/fit.rst
+++ b/doc/source/Tutorials/fit.rst
@@ -16,7 +16,7 @@ Using :func:`leastsq`
 Running an iterative fit with :func:`leastsq` involves the following steps:
 
 - designing a fit model function that has the signature ``f(x, ...)``,
-  where ``x`` is an array of values of the independant variable and all
+  where ``x`` is an array of values of the independent variable and all
   remaining parameters are the parameters to be fitted
 - defining the sequence of initial values for all parameters to be fitted.
   You can usually start with ``[1., 1., ...]`` if you don't know a better
@@ -26,13 +26,13 @@ Running an iterative fit with :func:`leastsq` involves the following steps:
 
 Data required to perform a fit is:
 
-  - an array of ``x`` values (abscissa, independant variable)
+  - an array of ``x`` values (abscissa, independent variable)
   - an array of ``y`` data points
   - the ``sigma`` array of uncertainties associated to each data point.
     This is optional, by default each data point gets assigned a weight of 1.
 
-Standard fit
-************
+Default (unweighted) fit
+************************
 
 Let's demonstrate this process in a short example, using synthetic data.
 We generate an array of synthetic data using a polynomial function of degree 4,
@@ -83,16 +83,22 @@ The output of this program is::
     Optimal parameters for y fitting:
         a=2.400000, b=-9.999665, c=14.970422, d=31.683448, e=-3216.131136
 
+.. note::
+
+    The exact results may vary depending on your Python version.
+
 We can see that this fit result is poor. In particular, parameters ``d`` and ``e``
 are very poorly fitted.
 
-This is most likely due to numerical rounding errors. As we are dealing with
-very large values in our ``y`` array, we are affected by the limits of how
-floating point numbers are represented by computers. The larger a value, the
-larger its rounding error.
-If you limit the ``x`` range to deal with
-smaller ``y`` values, the fit result becomes perfect. In our example, replacing ``x``
-with::
+This is due to the fact that data points with large values have a stronger influence
+in the fit process. In our example, as ``x`` increases, ``y`` increases fast.
+The influence of the weighting, and how to solve this issue, is explained in more detail
+in the next section.
+
+In the meantime, if you simply limit the ``x`` range to deal with
+smaller ``y`` values, you can notice that the fit result becomes perfect.
+
+In our example, replacing ``x`` with::
 
     x = numpy.arange(100)
 
@@ -106,11 +112,66 @@ produces the following result::
         a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
 
+Weighted fit
+************
+
+Since the fitting algorithm minimizes the sum of squared differences between input
+and evaluated data, points with a higher ``y`` value have a greater weight in the fitting process.
+A solution to this problem, if we want to improve our fit, is to define uncertainties
+for the data.
+The larger the uncertainty on a data sample, the smaller its weight will be
+in the least-square problem.
+
+It is important to set the uncertainties correctly, or you risk favoring either
+the lower values or the higher values in your data.
+
+The common approach in counting experiments is to use the square-root of the data
+values as the uncertainty value (assuming a Poissonian law).
+Let's apply it to our previous example:
+
+.. code-block:: python
+
+    sigma = numpy.sqrt(y)
+
+    # Fit y
+    fitresult = leastsq(model=poly4,
+                        xdata=x,
+                        ydata=y,
+                        sigma=sigma,
+                        p0=initial_parameters,
+                        full_output=True)
+
+This results in a great improvement::
+
+    Weighted fit took 6 iterations
+    Reduced chi-square: 0.000000
+    Theoretical parameters:
+        a=2.4, b=-10, c=15.2, d=-24.6, e=150
+    Optimal parameters for y fitting:
+        a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
+
+The resulting fit is perfect. The fit converged even faster than when
+we limited the ``x`` range to 0 -- 100.
+
+To use a real world example, when fitting x-ray fluorescence spectroscopy data,
+this common approach means that we consider the variance of each channel to be
+the number of counts in that channel.
+That corresponds to assuming a normal distribution.
+The true distribution being a Poisson distribution, the Gaussian distribution
+is a good approximation for channels with a high number of counts,
+but the approximation is not valid when the number of counts in a channel is small.
+
+Therefore, in spectra where the overall statistics are very low, a
+weighted fit can lead the fitting process to fit the background,
+considering the peaks as outliers, because the fit will consider a
+channel with 1 count 100 times more relevant than a channel with 100
+counts.
+
 Constrained fit
 ***************
 
-But let's revert back to our initial ``x = numpy.arange(1000)``, to experiment
+But let's revert back to our unweighted fit, to experiment
 with different approaches to improving the fit.
 
 The :func:`leastsq` functions provides
@@ -151,43 +212,6 @@ The output of this program is::
 The chi-square value is much improved and the results are much better, at the
 cost of more iterations.
 
-Weighted fit
-************
-
-A third approach to improve our fit is to define uncertainties for the data.
-The larger the uncertainty on a data sample, the smaller its weight will be
-in the least-square problem.
-
-In our case, we do not know the uncertainties associated to our data. We could
-determine the uncertainties due to numerical rounding errors, but let's just use
-a common approach that requires less work: use the square-root of the data values
-as their uncertainty value:
-
-.. code-block:: python
-
-    sigma = numpy.sqrt(y)
-
-    # Fit y
-    fitresult = leastsq(model=poly4,
-                        xdata=x,
-                        ydata=y,
-                        sigma=sigma,
-                        p0=initial_parameters,
-                        full_output=True)
-
-This results in a great improvement::
-
-    Weighted fit took 6 iterations
-    Reduced chi-square: 0.000000
-    Theoretical parameters:
-        a=2.4, b=-10, c=15.2, d=-24.6, e=150
-    Optimal parameters for y fitting:
-        a=2.400000, b=-10.000000, c=15.200000, d=-24.600000, e=150.000000
-
-The resulting fit is perfect. The very large ``y`` values with their very large
-associated uncertainties have been practicaly rejected from the fit process. The fit
-converged even faster than with the solution of limiting the ``x`` range to
-0 -- 100.
-
 .. _fitmanager-tutorial:
 
 Using :class:`FitManager`
@@ -326,7 +350,7 @@ And the result of this program is::
 
 In addition to gaussians, we could have fitted several other similar type of
 functions: asymetric gaussian functions, lorentzian functions,
-Pseudo-Voigt functions or hypermet tailing functions.
+pseudo-voigt functions or hypermet tailing functions.
 
 The :meth:`loadtheories` method can also be used to load user defined functions.
 Instead of a module, a path to a Python source file can be given
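The weighted-fit snippet added by this patch is not self-contained: ``poly4``,
``x``, ``y`` and ``initial_parameters`` are defined earlier in the tutorial and
do not appear in the diff. Below is a minimal runnable sketch of the same
workflow, under stated assumptions: the model and the synthetic data are
reconstructed from the theoretical parameters the tutorial prints
(a=2.4, b=-10, c=15.2, d=-24.6, e=150), with no added noise, and the
``leastsq`` import path and call mirror the usage shown in the patch.

.. code-block:: python

    # Minimal sketch of the weighted fit documented by this patch.
    # Assumption: poly4 and the synthetic data are reconstructed from the
    # theoretical parameters printed by the tutorial (noise-free data).
    import numpy
    from silx.math.fit import leastsq

    def poly4(x, a, b, c, d, e):
        # Fit model with the required signature f(x, ...)
        return a * x**4 + b * x**3 + c * x**2 + d * x + e

    x = numpy.arange(1000, dtype=numpy.float64)
    y = poly4(x, 2.4, -10., 15.2, -24.6, 150.)

    # Poissonian assumption: sigma = sqrt(y), so each data point gets
    # weight 1 / y in the least-squares sum, taming the very large y values
    sigma = numpy.sqrt(y)

    initial_parameters = [1., 1., 1., 1., 1.]
    fitresult = leastsq(model=poly4,
                        xdata=x,
                        ydata=y,
                        sigma=sigma,
                        p0=initial_parameters,
                        full_output=True)

    # The patch stores the whole return value in fitresult; with
    # full_output=True its first element holds the optimal parameters.
    print("Optimal parameters:", fitresult[0])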