author     Matthias Vogelgesang <matthias.vogelgesang@kit.edu>   2017-01-13 15:43:08 +0100
committer  Matthias Vogelgesang <matthias.vogelgesang@kit.edu>   2017-01-13 15:43:08 +0100
commit     ab6b0e404dbe78483acea19e3c5226ef934da434
tree       22ac17b839eaf22f3c1f3e11531d0a667d4af96c /docs/manual
parent     bc5d8d0f2b762cf56e34fefba2d530c1c546ee83
Restructure docs and add section about broadcasts
Diffstat (limited to 'docs/manual')

 docs/manual/using/background.rst |  40 ----
 docs/manual/using/cluster.rst    |  52 ----
 docs/manual/using/execution.rst  | 127 ++++
 docs/manual/using/index.rst      |   3 +--

 4 files changed, 128 insertions(+), 94 deletions(-)
diff --git a/docs/manual/using/background.rst b/docs/manual/using/background.rst
deleted file mode 100644
index 027244d..0000000
--- a/docs/manual/using/background.rst
+++ /dev/null
@@ -1,40 +0,0 @@
-.. _using-objects:
-
-====================
-Technical Background
-====================
-
-Relationship between graph and scheduler
-========================================
-
-A ``Ufo.Graph`` represents a network of interconnected filter nodes. New nodes
-can be added and existing node relationships be queried. Also, the graph can be
-serialized as a JSON structure with ``ufo_graph_save_to_json`` and read back
-again with ``ufo_graph_read_from_json``.
-
-The ``Ufo.Scheduler`` on the other hand is an implementation of a strategy *how*
-to execute the filters contained in a graph. Therefore, the scheduler is passed
-a graph object on execution.
-
-
-Profiling
-=========
-
-By default, the scheduler measures the run-time from initial setup until
-processing of the last data item finished. You can get the time in seconds via the
-``time`` property ::
-
-    g = Ufo.TaskGraph()
-    scheduler = Ufo.Scheduler()
-    scheduler.run(g)
-    print("Time spent: {}s".format(scheduler.time))
-
-To get more fine-grained insight into the execution, you can enable tracing ::
-
-    scheduler.props.enable_tracing = True
-    scheduler.run(g)
-
-and analyse the generated traces for OpenCL (saved in ``opencl.PID.json``) and
-general events (saved in ``trace.PID.json``). To visualize the trace events, you
-can either use the distributed ``ufo-prof`` tool or Google Chrome or Chromium by
-going to chrome://tracing and loading the JSON files.
diff --git a/docs/manual/using/cluster.rst b/docs/manual/using/cluster.rst
deleted file mode 100644
index e266f2c..0000000
--- a/docs/manual/using/cluster.rst
+++ /dev/null
@@ -1,52 +0,0 @@
-.. _using-cluster:
-
-==========================
-Running tasks in a cluster
-==========================
-
-The UFO framework comes with built-in cluster capabilities based on ZeroMQ 3.2.
-Contrary to bulk cluster approaches (e.g. solving large linear systems), UFO
-tries to distribute `streamed` data on a set of multiple machines. On each
-remote slave, ``ufod`` must be started. By default, the server binds to port
-5555 on any available network adapter. To change this, use the ``-l/--listen``
-option::
-
-    $ ufod --listen tcp://ib0:5555
-
-will let ``ufod`` use the first Infiniband-over-IP connection.
-
-On the master host, you pass the remote slave addresses to the scheduler object.
-In Python this would look like this::
-
-    sched = Ufo.Scheduler(remotes=['tcp://foo.bar.org:5555'])
-
-Address are notated according to `ZeroMQ <http://api.zeromq.org/3-2:zmq-tcp>`_.
-
-
-Streaming vs. replication
-=========================
-
-Work can be executed in two ways: `streaming`, which means data is transferred
-from a master machine to all slaves and returned to the master after computation
-is finished and `replicated` in which each slaves works on its own subset of the
-initial input data. The former must be used if the length of the stream is
-unknown before execution, otherwise the stream could not be split up into equal
-partitions.
-
-Initially, the scheduler is set to streaming mode. To switch to replication
-mode, you have to prepare the scheduler::
-
-    sched = Ufo.Scheduler(remotes=remotes)
-    sched.set_remote_mode(Ufo.RemoteMode.REPLICATE)
-    sched.run(graph)
-
-
-Improving small kernel launches
-===============================
-
-UFO uses a single OpenCL context to manage multiple GPUs in a transparent way.
-For applications and plugins that require many small kernel launches, multi-GPU
-performance suffers on NVIDIA systems due to bad scaling of the kernel launch
-time. In order to improve performance on machines with multiple GPUs it is
-strongly advised to run multiple ``ufod`` services with differently chosen GPUs
-and ports.
diff --git a/docs/manual/using/execution.rst b/docs/manual/using/execution.rst
new file mode 100644
index 0000000..47d200f
--- /dev/null
+++ b/docs/manual/using/execution.rst
@@ -0,0 +1,127 @@
+==============
+Task execution
+==============
+
+This section provides a deeper look into the technical background concerning
+scheduling and task execution. The execution model of the UFO framework is based
+on the ``Ufo.TaskGraph`` that represents a network of interconnected task
+nodes and the ``Ufo.BaseScheduler`` that runs these tasks according to a
+pre-defined strategy. The ``Ufo.Scheduler`` is a concrete implementation and is
+the default choice because it is able to instantiate tasks in a multi-GPU
+environment. For greater flexibility, the ``Ufo.FixedScheduler`` can be used to
+define arbitrary GPU mappings.
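+
+As a minimal sketch of this model, the following pipeline connects a reader to
+a writer and hands the graph to the default scheduler. The plugin names and the
+reader's ``path`` property are assumptions for illustration and may differ in
+your installation::
+
+    from gi.repository import Ufo
+
+    pm = Ufo.PluginManager()
+    graph = Ufo.TaskGraph()
+
+    # Assumed plugin names and properties; adapt to your installation.
+    reader = pm.get_task('read')
+    reader.set_properties(path='projections/*.tif')
+
+    writer = pm.get_task('write')
+    writer.set_properties(filename='out-%05i.tif')
+
+    graph.connect_nodes(reader, writer)
+
+    # The default scheduler takes care of mapping the tasks onto the
+    # available (GPU) resources.
+    Ufo.Scheduler().run(graph)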
+
+
+Profiling execution
+===================
+
+By default, the scheduler measures the run-time from initial setup until
+processing of the last data item has finished. You can get the time in seconds
+via the ``time`` property ::
+
+    g = Ufo.TaskGraph()
+    scheduler = Ufo.Scheduler()
+    scheduler.run(g)
+    print("Time spent: {}s".format(scheduler.time))
+
+To get more fine-grained insight into the execution, you can enable tracing ::
+
+    scheduler.props.enable_tracing = True
+    scheduler.run(g)
+
+and analyse the generated traces for OpenCL (saved in ``opencl.PID.json``) and
+general events (saved in ``trace.PID.json``). To visualize the trace events,
+you can either use the distributed ``ufo-prof`` tool or Google Chrome or
+Chromium by going to chrome://tracing and loading the JSON files.
+
+
+Broadcasting results
+====================
+
+.. highlight:: python
+
+Connecting a task output to multiple consumers will in most cases cause
+undefined results, because the data items are distributed among the consumers
+rather than copied to each of them. A certain class of problems can be solved
+by inserting explicit ``Ufo.CopyTask`` nodes and executing the graph with a
+``Ufo.FixedScheduler``. In the following example, we want to write the same
+data twice, each time with a different filename prefix::
+
+    from gi.repository import Ufo
+
+    pm = Ufo.PluginManager()
+    sched = Ufo.FixedScheduler()
+    graph = Ufo.TaskGraph()
+    copy = Ufo.CopyTask()
+
+    data = pm.get_task('read')
+
+    write1 = pm.get_task('write')
+    write1.set_properties(filename='w1-%05i.tif')
+
+    write2 = pm.get_task('write')
+    write2.set_properties(filename='w2-%05i.tif')
+
+    graph.connect_nodes(data, copy)
+    graph.connect_nodes(copy, write1)
+    graph.connect_nodes(copy, write2)
+
+    sched.run(graph)
+
+.. note::
+
+    The copy task node is not a regular plugin but part of the core API and
+    thus cannot be used with tools like ``ufo-runjson`` or ``ufo-launch``.
+
+
+Running tasks in a cluster
+==========================
+
+.. highlight:: bash
+
+The UFO framework comes with built-in cluster capabilities based on ZeroMQ 3.2.
+Contrary to bulk cluster approaches (e.g. solving large linear systems), UFO
+distributes `streamed` data across a set of machines. On each remote slave,
+``ufod`` must be started. By default, the server binds to port 5555 on any
+available network adapter. To change this, use the ``-l/--listen`` option::
+
+    $ ufod --listen tcp://ib0:5555
+
+This will let ``ufod`` use the first Infiniband-over-IP connection.
+
+On the master host, you pass the remote slave addresses to the scheduler
+object. In Python, this looks like::
+
+    sched = Ufo.Scheduler(remotes=['tcp://foo.bar.org:5555'])
+
+Addresses are written in `ZeroMQ notation <http://api.zeromq.org/3-2:zmq-tcp>`_.
+
+
+Streaming vs. replication
+-------------------------
+
+Work can be executed in two ways: `streaming`, in which data is transferred
+from the master machine to all slaves and returned to the master after
+computation has finished, and `replicated`, in which each slave works on its
+own subset of the initial input data. The former must be used if the length of
+the stream is unknown before execution, because otherwise the stream cannot be
+split into equal partitions.
+
+Initially, the scheduler is set to streaming mode. To switch to replication
+mode, you have to prepare the scheduler::
+
+    sched = Ufo.Scheduler(remotes=remotes)
+    sched.set_remote_mode(Ufo.RemoteMode.REPLICATE)
+    sched.run(graph)
+
+
+Improving small kernel launches
+-------------------------------
+
+UFO uses a single OpenCL context to manage multiple GPUs in a transparent way.
+For applications and plugins that require many small kernel launches, multi-GPU
+performance suffers on NVIDIA systems due to poor scaling of the kernel launch
+time. To improve performance on machines with multiple GPUs, it is strongly
+advised to run multiple ``ufod`` services, each with a different GPU and port.
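+
+A minimal sketch of the master side under this scheme, assuming one ``ufod``
+service per GPU has already been started on each host; the host names and port
+numbers are hypothetical, and how each service is pinned to a single GPU is
+deployment-specific::
+
+    # One remote address per ufod service, i.e. one per GPU and port.
+    remotes = ['tcp://node1:5555', 'tcp://node1:5556',
+               'tcp://node2:5555', 'tcp://node2:5556']
+
+    sched = Ufo.Scheduler(remotes=remotes)
+    sched.run(graph)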
diff --git a/docs/manual/using/index.rst b/docs/manual/using/index.rst
index 3da6812..dd291ad 100644
--- a/docs/manual/using/index.rst
+++ b/docs/manual/using/index.rst
@@ -25,6 +25,5 @@ own image processing pipeline or implement a new filter.
 
     quickstart.rst
     env.rst
-    background.rst
-    cluster.rst
+    execution.rst
     json.rst