summaryrefslogtreecommitdiff
path: root/doc/source/Tutorials/specfile_to_hdf5.rst
blob: 32d942a5cf911d0fe2e5564b8ec12c1483c0158a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
SpecFile as HDF5
================

Introduction to SPEC data files
-------------------------------

SPEC data files are ASCII files.
They contain two general types of block of lines:

 - header lines starting with a ``#`` immediately followed by one or more characters
   identifying the information that follows
 - data lines

Header lines
++++++++++++

There are two types of headers. The first type is the *file header*. File headers always start
with a ``#F`` line.
The metadata stored in a file header applies to all the content of the data file, until a
new file header is encountered. There can be more than one file header, but a file with
multiple headers can be treated as multiple SPEC files concatenated into a single one.
File headers are sometimes missing.

A file header contains general information:

 - ``#F`` - file name
 - ``#E`` - epoch
 - ``#D`` - file time and date
 - ``#C`` - First comment (SPEC title, SPEC user)
 - ``#O`` - Motor names (separated by at least two blank spaces)

The second type of header is the *scan header*. A scan header must start with a ``#S`` line
and must be preceded by an empty line. This also applies to files without file headers: in
such a case, the file must start with an empty line.
The metadata stored in scan headers applies to a single block of data lines.

A scan header contains following information:

 - ``#S`` - Mandatory first line showing the scan number and the
   command that was used to record the scan
 - ``#D`` - scan time and date
 - ``#Q`` - *H, K, L* values
 - ``#P`` - Motor positions (corresponding motor names are in file header ``#O``)
 - ``#N`` - Number of data columns in the following data block
 - ``#L`` - Column labels (``#N`` labels separated by two blank spaces)

Users can also define their own type of header lines in their macros.

There can sometimes be a block of scan header lines after a data block, but before the ``#S`` of the next
scan.

Data lines
++++++++++

Data blocks are structured as 2D arrays. Each line contains ``#N`` values, each value
corresponding to the label with the same position in the ``#L`` scan header line.
This implies that each column corresponds to one series of measurements.

A column typically contains motor positions for a given positioner, a timestamp or the measurement
of a sensor.

MCA data
++++++++

Newer SPEC files can also contain multi-channel analyser data, in between each *normal* data line.
A multichannel analyser records multiple values per single measurement.
This is typically a histogram of number of counts against channels (*MCA spectrum*), to analyze energy distribution
of a process.

SPEC data files containing MCA data have additional scan header lines:

 - ``#@MCA %16C`` - a spectrum will usually extend for more than one line.
   This indicates a number of 16 values per line.
 - ``#@CHANN`` - contains 4 values:

   - the number of channels per spectrum
   - the first channel number
   - the last channel number
   - the increment between two channel numbers (usually 1)
 - ``#@CALIB`` - 3 polynomial calibration values a, b, c. ( i.e. energy = a + b * channel + c * channel ^ 2)
 - ``#@CTIME`` - 3 values: preset time, live time, elapsed time

The actual MCA data for a single spectrum usually spans over multiple lines.
A spectrum starts on a new line with a ``@A``, and when it span over multiple lines, all
lines except the last one end with a continuation character ``\``.

Example of SPEC file
++++++++++++++++++++

Example of file header::

    #F ./data/binary_mixtures_mca1.100211
    #E 1295362398
    #D Thu Feb 10 22:43:43 2011
    #C id10b  User = opid10
    #O0    delta     gamma     omega     theta        mu     sigma    sigmat        xt
    #O1       zt       zt1       thd      chid      rhod        xd        yd        zd
    #O2     att0      arcf        zf      PhiD     phigH     chigH       ygH
    #O3      zgH     phigV     chigV       xgV       ygV       zgV   gslithg   gslitho
    #O4  gslitvo   gslitvg    slit1T    slit1B    slit1F    slit1R   slit1hg   slit1ho
    #O5  slit1vg   slit1vo       s0T       s0B       s0R       s0F
    #O6     s0hg      s0ho      s0vg      s0vo       TRT
    #O7       pi    trough       hv1    mpxthl    apdwin    apdthl     apdhv     xcrl2
    #O8   thcrl2     zcrl2     picou     picod    vdrift    vmulti      vglo      vghi
    #O9     rien

Example of scan and data block, without MCA::

    #S 30  ascan  tz3 29.35 29.75  100 0.5
    #D Sat Oct 31 15:43:21 1998
    #T 0.5  (Seconds)
    #G0 0
    #G1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    #G2 0
    #Q
    #P0 40.135381 40.262001 65.6 70 35 -1.83 0 -36.1
    #P1 0 0 -1.98 0 0 35.6 86.2 -29.5
    #P2 3.0688882 24.893749 295.98749 28 -27.249938
    #N 22
    #L TZ3    Epoch  Seconds  If2  If3  If5  If6  If7  If8  I0  It  ItdI0  If1dI0  If2dI0  If3dI0  If4dI0  If5dI0  If6dI0  If7dI0  If8dI0  If1  If4
    29.35  45246 0.000264 478 302 206 201 209 264 177860 646 0.00363207 0.00468346 0.00268751 0.00169796 0.00146745 0.00115821 0.0011301 0.00117508 0.00148431 833 261
    29.353976  45249 0.000295 549 330 219 208 227 295 178021 684 0.00384224 0.00537577 0.00308391 0.00185371 0.00158408 0.00123019 0.0011684 0.00127513 0.00165711 957 282
    29.357952  45251 0.000313 604 368 231 215 229 313 178166 686 0.00385034 0.00603931 0.0033901 0.00206549 0.00166698 0.00129654 0.00120674 0.00128532 0.00175679 1076 297
    29.362028  45253 0.000333 671 390 237 226 236 333 178387 672 0.00376709 0.00683346 0.00376148 0.00218626 0.00176582 0.00132857 0.00126691 0.00132297 0.00186673 1219 315
    29.366004  45256 0.000343 734 419 248 229 236 343 178082 664 0.00372862 0.00765939 0.0041217 0.00235285 0.00185308 0.00139262 0.00128592 0.00132523 0.00192608 1364 330
    29.36998  45258 0.00036 847 448 254 229 248 360 178342 668 0.00374561 0.00857342 0.0047493 0.00251203 0.00194009 0.00142423 0.00128405 0.00139059 0.00201859 1529 346

Synthetic example of file with 3 scans. The last scan includes MCA data.

::

    #F /tmp/sf.dat
    #E 1455180875
    #D Thu Feb 11 09:54:35 2016
    #C imaging  User = opid17
    #O0 Pslit HGap  MRTSlit UP  MRTSlit DOWN
    #O1 Sslit1 VOff  Sslit1 HOff  Sslit1 VGap
    #o0 pshg mrtu mrtd
    #o2 ss1vo ss1ho ss1vg

    #S 1  ascan  ss1vo -4.55687 -0.556875  40 0.2
    #D Thu Feb 11 09:55:20 2016
    #T 0.2  (Seconds)
    #P0 180.005 -0.66875 0.87125
    #P1 14.74255 16.197579 12.238283
    #N 3
    #L MRTSlit UP  second column  3rd_col
    -1.23 5.89  8
    8.478100E+01  5 1.56
    3.14 2.73 -3.14
    1.2 2.3 3.4

    #S 25  ascan  c3th 1.33245 1.52245  40 0.15
    #D Sat 2015/03/14 03:53:50
    #P0 80.005 -1.66875 1.87125
    #P1 4.74255 6.197579 2.238283
    #N 4
    #L column0  column1  col2  col3
    0.0 0.1 0.2 0.3
    1.0 1.1 1.2 1.3
    2.0 2.1 2.2 2.3
    3.0 3.1 3.2 3.3

    #S 1 aaaaaa
    #D Thu Feb 11 10:00:32 2016
    #@MCA %16C
    #@CHANN 20 0 19 1
    #@CALIB 1.2 2.3 3.4
    #@CTIME 123.4 234.5 345.6
    #N 2
    #L uno  duo
    1 2
    @A 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15\
    16 17 18 19
    3 4
    @A 0 0 2 4 15 10 5 1 0 0 0 0 1 0 0 0\
    0 0 0 0
    5 6
    @A 0 0 0 0 5 7 2 0 0 0 0 0 1 0 0 0\
    0 0 0 1

Reading a SpecFile as an HDF5 file
----------------------------------

Introduction to the spech5 module
+++++++++++++++++++++++++++++++++

The *silx* module :mod:`silx.io.spech5` can be used to expose SPEC files in a hierarchical tree structure
and access them through an API that mimics the *h5py* Python library used to read HDF5 files.

The structure exposed is as follows::

  /
      1.1/
          title = "…"
          start_time = "…"
          instrument/
              specfile/
                  file_header = "…"
                  scan_header = "…"
              positioners/
                  motor_name = value

              mca_0/
                  data = …
                  calibration = …
                  channels = …
                  preset_time = …
                  elapsed_time = …
                  live_time = …

              mca_1/


          measurement/
              colname0 = …
              colname1 = …

              mca_0/
                   data -> /1.1/instrument/mca_0/data
                   info -> /1.1/instrument/mca_0/

          sample/
              ub_matrix = …
              unit_cell = …
              unit_cell_abc = …
              unit_cell_alphabetagamma = …
      2.1/


Scans appear as *Groups* at the root level. The name of a scan group is
made of two numbers, the first one being the *scan number* from the ``#S``
header line, and the second one being the *scan order*.
If a scan number appears multiple times in a SPEC file, the scan order is incremented.
For examples, the scan *3.2* designates the second occurence of scan number 3 in a given file.

Data is stored in the ``measurement`` subgroup, one dataset per column. The dataset name
is the column label as it appears on the ``#L`` header line.

The ``instrument`` subgroup contains following subgroups:

    - ``specfile`` - contains two datasets, ``file_header`` and ``scan_header``,
      containing all header lines as a long string. Lines are delimited by the ``\n`` character.
    - ``positioners`` - contains one dataset per motor (positioner), containing
      either the single motor position from the ``#P`` header line, or a complete 1D array
      of positions if the motor names corresponds to a data column (i.e. if the motor name
      from the ``#O`` header line is identical to a label on the ``#L`` header line)
    - one subgroup per MCA analyser/device containing a 2D ``data`` array with all spectra
      recorded by this analyser, as well as datasets for the various MCA metadata
      (``#@`` header lines). The first dimension of the ``data`` array corresponds to the number
      of points and the second one to the spectrum length.


In addition the the data columns, this group contains one subgroup per MCA analyser/device
with links to the data already contained in  ``instrument/mca_...``

spech5 examples
+++++++++++++++

Accessing groups and datasets:

.. code-block:: python

    from silx.io.spech5 import SpecH5

    # Open a SpecFile
    sfh5 = SpecH5("test.dat")

    # using SpecH5 as a regular group to access scans
    scan1group = sfh5["1.1"]   # This retrieves scan 1.1
    scan1group = sfh5[0]       # This retrieves the first scan irrespectively of its number.
    instrument_group = scan1group["instrument"]

    # alternative: full path access
    measurement_group = sfh5["/1.1/measurement"]

    # accessing a scan data column by name as a 1D numpy array
    data_array = measurement_group["Pslit HGap"]

    # accessing all mca-spectra for one MCA device as a 2D array
    mca_0_spectra = measurement_group["mca_0/data"]


Files and groups can be treated as iterators, which allows looping through them.

.. code-block:: python

    # get all column names (labels) in all scans in a file
    for scan_group in SpecH5("test.dat"):
        dataset_names = [item.name in scan_group["measurement"] if not
                         item.name.startswith("mca")]
        print("Found labels in scan " + scan_group.name + " :")
        print(", ".join(dataset_names))

Converting SPEC data to HDF5
++++++++++++++++++++++++++++

Using the convert module
************************

The *silx* module :mod:`silx.io.convert` can be used to convert a SPEC file into a
HDF5 file with the same structure as the one exposed by the :mod:`spech5` module.

.. code-block:: python

    from silx.io.convert import convert

    convert("/home/pierre/myspecfile.dat", "myfile.h5")


You can then read the file with any HDF5 reader.


The function :func:`silx.io.convert.convert` is a simplified version of a
more flexible function :func:`silx.io.convert.write_to_h5`.

The latter allows you to write scans into a specific HDF5 group in the output directory.
You can also decide whether you want to overwrite an existing file, or append data to it.
You can specify whether existing data with the same name as input data should be overwritten
or ignored.

This allows you to repeatedly transfer new content of a SPEC file to an existing HDF5 file, in between
two scans.

The following script is an example of a command line interface to :func:`write_to_h5`.

.. literalinclude:: ../../../examples/writetoh5.py
   :lines: 44-

Using the convert application
*****************************

.. versionadded:: 0.6

*silx* also provides a ``silx convert`` command line application, which allows you to
perform standard conversions without having to write your own program.

Type ``silx convert --help`` in a terminal to see all available options.

The simplest command to convert a single SPEC file to an HDF5 file would be::

    silx convert myspecfile.dat

As no output name is supplied, the input file name is reused but the extension is
modified from *.dat* to *.h5*.

The following example allows you to append the content of a SPEC file to an
existing HDF5 file::

    silx convert myspecfile.dat -m a -o myhdf5file.h5

You could write the file into a specific group of the HDF5 file by providing
the complete URI in the format ``file_path::group_path``. For instance::

    silx convert myspecfile.dat -m a -o archive.h5::/2017-09-20/SPEC