author Picca Frédéric-Emmanuel <picca@debian.org> 2017-10-07 07:59:01 +0200
committer Picca Frédéric-Emmanuel <picca@debian.org> 2017-10-07 07:59:01 +0200
commit bfa4dba15485b4192f8bbe13345e9658c97ecf76 (patch)
tree fb9c6e5860881fbde902f7cbdbd41dc4a3a9fb5d /doc/source/Tutorials/io.rst
parent f7bdc2acff3c13a6d632c28c4569690ab106eed7 (diff)
New upstream version 0.6.0+dfsg
Diffstat (limited to 'doc/source/Tutorials/io.rst')
-rw-r--r-- doc/source/Tutorials/io.rst 289
1 files changed, 289 insertions, 0 deletions
diff --git a/doc/source/Tutorials/io.rst b/doc/source/Tutorials/io.rst
new file mode 100644
index 0000000..139ad2d
--- /dev/null
+++ b/doc/source/Tutorials/io.rst
@@ -0,0 +1,289 @@
+
+Getting started with silx.io
+============================
+
+This tutorial explains how to read data files using the :meth:`silx.io.open` function.
+
+The target audience is developers with no prior knowledge of the *h5py* library.
+
+If you are already familiar with *h5py*, you just need to know that
+the :meth:`silx.io.open` function returns objects that mimic *h5py* file objects,
+and that the main supported file formats are:
+
+ - HDF5
+ - all formats supported by the *FabIO* library
+ - SPEC data files
+
+Familiarity with the python *dictionary* type and the numpy *ndarray* type
+is a prerequisite for this tutorial.
+
+
+Background
+----------
+
+In the past, it was necessary to learn how to use multiple libraries to read multiple
+data formats. The library *FabIO* was designed to read images in many formats, but not to read
+more heterogeneous formats, such as *HDF5* or *SPEC*.
+
+To read *SPEC* data files in Python, a common solution was to use the *PyMca* module
+:mod:`PyMca5.PyMcaIO.specfilewrapper`.
+Regarding HDF5 files, the de-facto standard for reading them in Python is to
+use the *h5py* library.
+
+*silx* tries to address this situation by providing a unified way to read all
+data formats supported at the ESRF. Today, HDF5 is the preferred format to store
+data for many scientific institutions, including most synchrotrons.
+It was therefore decided to provide tools for reading data that mimic the *h5py* library's API.
+
+
+Definitions
+-----------
+
+HDF5
+++++
+
+The *HDF5* format is a *hierarchical data format*, designed to store and
+organize large amounts of data.
+
+An HDF5 file contains a number of *datasets*, which are multidimensional arrays
+of a homogeneous type.
+
+These datasets are stored in container structures
+called *groups*. Groups can also be stored in other groups, which makes it
+possible to define a hierarchical tree structure.
+
+Both datasets and groups may have *attributes* attached to them. Attributes are
+used to document the object. They are similar to datasets in several ways
+(data container of homogeneous type), but they are typically much smaller.
+
+A common analogy is to compare an HDF5 file to a filesystem.
+Groups are analogous to directories, datasets are analogous to files,
+and attributes are analogous to file metadata (creation date, last modification date, etc.).
+
+.. image:: img/silx_view_edf.png
+ :width: 400px
+
+
+h5py
+++++
+
+The *h5py* library is a Pythonic interface to the `HDF5 <https://support.hdfgroup.org/HDF5/>`_ binary data format.
+
+It exposes an HDF5 group as a python object that resembles a python
+dictionary, and an HDF5 dataset or attribute as an object that resembles a
+numpy array.
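+
+As a minimal sketch of these two ideas, the snippet below uses *h5py* directly
+(not *silx*) on a throwaway in-memory file. The file name ``sketch.h5`` and the
+names ``scan1``, ``counts`` and ``units`` are made up for this illustration.
+
+.. code-block:: python
+
+ import h5py
+ import numpy
+
+ # The "core" driver with backing_store=False keeps the file in memory only;
+ # nothing is written to disk and the file name is just a label.
+ with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5:
+     group = h5.create_group("scan1")
+     dataset = group.create_dataset("counts", data=numpy.arange(5))
+     dataset.attrs["units"] = "counts"
+
+     # A group is used like a dictionary...
+     names = list(h5.keys())
+     # ...and a dataset like a numpy array, with a shape and slicing.
+     shape = h5["scan1/counts"].shape
+     values = h5["scan1/counts"][1:3]
+     # Attributes are reached through a dictionary-like ``attrs`` object.
+     units = h5["scan1/counts"].attrs["units"]
+
+ print(names, shape, values, units)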
+
+API description
+---------------
+
+All main objects, File, Group and Dataset, share the following attributes:
+
+ - :attr:`attrs`: Attributes, as a dictionary of metadata for the group or dataset.
+ - :attr:`basename`: String giving the basename of this group or dataset.
+ - :attr:`name`: String giving the full path to this group or dataset, relative
+ to the root group (file).
+ - :attr:`file`: File object at the root of the tree structure containing this
+ group or dataset.
+ - :attr:`parent`: Group object containing this group or dataset.
+
+File object
++++++++++++
+
+The API of the file objects returned by the :meth:`silx.io.open`
+function tries to be as close as possible to the API of the :class:`h5py.File`
+objects used to read HDF5 data.
+
+An h5py file is a group with just a few extra attributes and methods.
+
+The objects defined in `silx.io` implement a subset of these attributes and methods:
+
+ - :attr:`filename`: Name of the file on disk.
+ - :attr:`mode`: String indicating if the file is open in read mode ("r")
+ or write mode ("w"). :meth:`silx.io.open` always returns objects in read mode.
+ - :meth:`close`: Close this file. All open objects will become invalid.
+
+The :attr:`parent` of a file is `None`, and its :attr:`name` is an empty string.
+
+Group object
+++++++++++++
+
+Group objects behave like python dictionaries.
+
+You can iterate over a group's :meth:`keys`, which are the names of the objects
+encapsulated by the group (datasets and sub-groups). The :meth:`values` method
+returns an iterator over the encapsulated objects. The :meth:`items` method returns
+an iterator over `(name, value)` pairs.
+
+Groups provide a :meth:`get` method that retrieves an item, or information about an item.
+Like standard python dictionaries, a `default` parameter can be used to specify
+a value to be returned if the given name is not a member of the group.
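+
+The behaviour of :meth:`get` can be sketched with *h5py* itself, on the assumption
+that the *silx* objects follow the same convention. The in-memory file and the
+group name ``measurement`` are hypothetical.
+
+.. code-block:: python
+
+ import h5py
+
+ # In-memory file, nothing is written to disk.
+ with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5:
+     h5.create_group("measurement")
+
+     # Existing member: the object itself is returned.
+     found_name = h5.get("measurement").name
+     # Missing member: None by default, as with dict.get()...
+     missing = h5.get("no_such_item")
+     # ...or the value passed as ``default``.
+     fallback = h5.get("no_such_item", default="not found")
+
+ print(found_name, missing, fallback)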
+
+Two methods are provided to recursively visit all members of a group: :meth:`visit`
+and :meth:`visititems`. The former takes as argument a *callable* with the signature
+``callable(name) -> None or return value``. The latter takes as argument a *callable*
+with the signature ``callable(name, object) -> None or return value`` (``object`` being
+a group or dataset instance).
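+
+The following sketch shows both methods with *h5py*, whose API the *silx* objects
+mimic. In *h5py*, a callable that returns ``None`` lets the visit continue, while
+any other return value stops the visit and is handed back to the caller; whether
+every *silx* object behaves identically is an assumption here. The dataset names
+and the helper functions ``collect_datasets`` and ``find_phi`` are hypothetical.
+
+.. code-block:: python
+
+ import h5py
+ import numpy
+
+ dataset_paths = []
+
+ def collect_datasets(name, obj):
+     """visititems callback: remember the path of every dataset."""
+     if isinstance(obj, h5py.Dataset):
+         dataset_paths.append(name)
+     # Implicitly returns None, so the visit goes on.
+
+ def find_phi(name):
+     """visit callback: stop at the first path ending with 'Phi'."""
+     if name.endswith("Phi"):
+         return name
+
+ with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5:
+     # Intermediate groups are created automatically.
+     h5.create_dataset("scan/title", data=numpy.float32(0.0))
+     h5.create_dataset("scan/measurement/Phi", data=numpy.linspace(0.61, 1.61, 21))
+
+     all_names = []
+     h5.visit(all_names.append)        # list.append returns None
+     h5.visititems(collect_datasets)
+     phi_path = h5.visit(find_phi)     # first non-None return value
+
+ print(all_names)
+ print(dataset_paths)
+ print(phi_path)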
+
+Example
+-------
+
+Accessing data
+++++++++++++++
+
+In this first example, we open a SPEC data file and print some of its contents.
+
+.. code-block:: python
+
+ >>> import silx.io
+ >>> sf = silx.io.open("data/CuZnO_2.spec")
+ >>> sf
+ <silx.io.spech5.SpecH5 at 0x7f00d0760f90>
+ >>> print(list(sf.keys()))
+ ['1.1', '2.1', '3.1', '4.1', '5.1', '6.1', '7.1', ...]
+ >>> print(sf["1.1"])
+ <silx.io.spech5.ScanGroup object at 0x7f00d0715b90>
+
+
+We just opened a file, keeping a reference to the file object as ``sf``.
+We then printed all items contained in this root group. We can see that all
+these items are groups. Let's look at what is inside these groups, and find
+datasets:
+
+
+.. code-block:: python
+
+ >>> grp = sf["2.1"]
+ >>> for name in grp:
+ ...     item = grp[name]
+ ...     print("Found item " + name)
+ ...     if silx.io.is_dataset(item):
+ ...         print("'%s' is a dataset.\n" % name)
+ ...     elif silx.io.is_group(item):
+ ...         print("'%s' is a group.\n" % name)
+ ...
+ Found item title
+ 'title' is a dataset.
+
+ Found item start_time
+ 'start_time' is a dataset.
+
+ Found item instrument
+ 'instrument' is a group.
+
+ Found item measurement
+ 'measurement' is a group.
+
+ Found item sample
+ 'sample' is a group.
+
+We could have replaced the first three lines with this single line,
+by iterating over the iterator returned by the group method :meth:`items`:
+
+.. code-block:: python
+
+ >>> for name, item in sf["2.1"].items():
+ ...
+
+In addition to :meth:`silx.io.is_group` and :meth:`silx.io.is_dataset`,
+you can also use :meth:`silx.io.is_file` and :meth:`silx.io.is_softlink`.
+
+
+Let's look at a dataset:
+
+.. code-block:: python
+
+ >>> print(sf["2.1/title"])
+ <HDF5-like dataset "title": shape (), type "|S29">
+
+As you can see, printing a dataset does not print the data itself; it only prints a
+representation of the dataset object. The information printed tells us that the
+object is similar to a numpy array, with a *shape* and a *type*.
+
+In this case, we are dealing with a scalar dataset, so we can use the same syntax as
+in numpy to access the scalar value, ``result = dset[()]``:
+
+.. code-block:: python
+
+ >>> print(sf["2.1/title"][()])
+ 2 ascan phi 0.61 1.61 20 1
+
+Similarly, you need to use numpy slicing to access values in a numeric array:
+
+.. code-block:: python
+
+ >>> print(sf["2.1/measurement/Phi"])
+ <HDF5-like dataset "Phi": shape (21,), type "<f4">
+ >>> print(sf["2.1/measurement/Phi"][0:10])
+ [ 0.61000001 0.66000003 0.70999998 0.75999999 0.81 0.86000001
+ 0.91000003 0.95999998 1.00999999 1.05999994]
+ >>> entire_phi_array = sf["2.1/measurement/Phi"][:]
+
+Here we could read the entire array by slicing it with ``[:]``, because we know
+it is a 1D array. For a 2D array, the slicing argument would have been ``[:, :]``.
+
+For a dataset of unknown dimensionality (including scalar datasets), the
+``Ellipsis`` object (represented by ``...``) can be used to slice the object.
+
+.. code-block:: python
+
+ >>> print(sf["2.1/title"][...])
+ 2 ascan phi 0.61 1.61 20 1
+ >>> print(sf["2.1/measurement/Phi"][...])
+ [ 0.61000001 0.66000003 0.70999998 0.75999999 0.81 0.86000001
+ 0.91000003 0.95999998 1.00999999 1.05999994 1.11000001 1.15999997
+ 1.21000004 1.25999999 1.30999994 1.36000001 1.40999997 1.46000004
+ 1.50999999 1.55999994 1.61000001]
+
+To read more about the usage of ``Ellipsis`` to slice arrays, see
+`Indexing numpy arrays <http://scipy-cookbook.readthedocs.io/items/Indexing.html?highlight=indexing#Multidimensional-slices>`_
+in the scipy documentation.
+
+Note that slicing a scalar dataset with ``[()]`` is not strictly equivalent to
+slicing with ``[...]``. The former gives you the actual scalar value in
+the dataset, while the latter always gives you an array object, which happens to
+be 0D in the case of a scalar.
+
+.. code-block:: python
+
+ >>> sf["2.1/instrument/positioners/Delta"][()]
+ 0.0
+ >>> sf["2.1/instrument/positioners/Delta"][...]
+ array(0.0, dtype=float32)
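+
+The same distinction exists for plain numpy arrays, which the dataset objects
+imitate. Here is a minimal sketch using numpy only, with no file involved
+(``zero_d`` is a name chosen for this illustration):
+
+.. code-block:: python
+
+ import numpy
+
+ zero_d = numpy.array(0.0, dtype=numpy.float32)   # a 0D array
+
+ scalar = zero_d[()]     # numpy scalar (numpy.float32)
+ array = zero_d[...]     # still an ndarray, with ndim == 0
+
+ print(type(scalar).__name__, type(array).__name__, array.ndim)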
+
+Closing the file
+++++++++++++++++
+
+You should always make sure to close the files that you opened. The simple way of
+closing a file is to call its :meth:`close` method.
+
+.. code-block:: python
+
+ import silx.io
+ sf = silx.io.open("data/CuZnO_2.spec")
+
+ # read the information you need...
+ maxPhi = sf["2.1/measurement/Phi"][...].max()
+
+ sf.close()
+
+The drawback of this method is that, if an error is raised while processing
+the file, the program might never reach the ``sf.close()`` line.
+Leaving files open can cause various issues for the rest of your program,
+such as consuming memory or being unable to reopen the file when you need it.
+
+The best way to ensure the file is always properly closed is to use the file
+object as a context manager, in a ``with`` statement:
+
+.. code-block:: python
+
+ import silx.io
+
+ with silx.io.open("data/CuZnO_2.spec") as sf:
+     # read the information you need...
+     maxPhi = sf["2.1/measurement/Phi"][...].max()
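+
+To see why the ``with`` statement is safer, the sketch below uses a toy class in
+place of a real file object. ``ToyFile`` is hypothetical and is not part of
+*silx*; it only records whether ``close()`` was called.
+
+.. code-block:: python
+
+ class ToyFile:
+     """Hypothetical stand-in for a file object (not a silx class)."""
+
+     def __init__(self):
+         self.closed = False
+
+     def close(self):
+         self.closed = True
+
+     def __enter__(self):
+         return self
+
+     def __exit__(self, exc_type, exc_value, traceback):
+         self.close()
+         return False  # do not swallow the exception
+
+ toy = ToyFile()
+ try:
+     with toy:
+         raise ValueError("error while processing the file")
+ except ValueError:
+     pass
+
+ # close() was called even though the body of the with block failed.
+ closed_after_error = toy.closed
+ print(closed_after_error)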
+
+
+Additional resources
+--------------------
+
+- `h5py documentation <http://docs.h5py.org/en/latest/>`_
+- `Formats supported by FabIO <http://www.silx.org/doc/fabio/dev/getting_started.html#list-of-file-formats-that-fabio-can-read-and-write>`_
+- `Spec file h5py-like structure <http://www.silx.org/doc/silx/dev/modules/io/spech5.html#api-description>`_
+- `HDF5 format documentation <https://support.hdfgroup.org/HDF5/>`_