Document NVIDIA multi-GPU performance issue

author: Matthias Vogelgesang <matthias.vogelgesang@kit.edu> 2015-07-07 17:01:37 +0200
committer: Matthias Vogelgesang <matthias.vogelgesang@kit.edu> 2015-07-07 17:01:37 +0200
commit: 9a99cd39053ba9646e618663bdf3cc72c7890fa7 (patch)
tree: b72df267be9562d7f309ea3c2b2309c9cd1024d5 /docs
parent: ed71be36a5694ed2583e4bdf2061aad10b648e1a (diff)
1 files changed, 11 insertions, 0 deletions
diff --git a/docs/manual/using/cluster.rst b/docs/manual/using/cluster.rst
index 7496b25..e266f2c 100644
--- a/docs/manual/using/cluster.rst
+++ b/docs/manual/using/cluster.rst
@@ -39,3 +39,14 @@ mode, you have to prepare the scheduler::
     sched = Ufo.Scheduler(remotes=remotes)
     sched.set_remote_mode(Ufo.RemoteMode.REPLICATE)
     sched.run(graph)
+
+
+Improving small kernel launches
+===============================
+
+UFO uses a single OpenCL context to manage multiple GPUs transparently.
+However, on NVIDIA systems the kernel launch time scales poorly with the number
+of GPUs, so applications and plugins that issue many small kernel launches see
+poor multi-GPU performance. To improve performance on machines with multiple
+GPUs, it is strongly advised to run multiple ``ufod`` services, each using a
+different GPU and listening on a different port.
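The advice above (one ``ufod`` service per GPU, each on its own port) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not UFO's own tooling: `build_ufod_plan` and `launch` are hypothetical helpers, GPU selection relies on the `CUDA_VISIBLE_DEVICES` environment variable (honored by NVIDIA drivers), and the `--listen` option with a ZeroMQ-style `tcp://*:PORT` address is an assumed `ufod` command line that should be checked against `ufod --help`.

```python
import os
import subprocess
from typing import Dict, List


def build_ufod_plan(gpu_ids: List[int], base_port: int = 5555,
                    host: str = "localhost") -> List[Dict[str, object]]:
    """Describe one ufod service per GPU: environment, argv and remote address.

    Assumptions (hypothetical, not from the UFO manual):
    * CUDA_VISIBLE_DEVICES limits each ufod process to a single NVIDIA GPU.
    * ufod accepts a ``--listen`` address; verify with ``ufod --help``.
    """
    plan = []
    for offset, gpu in enumerate(gpu_ids):
        port = base_port + offset                # a distinct port per service
        env = dict(os.environ)                   # copy so the caller's env is untouched
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)   # expose exactly one GPU
        plan.append({
            "gpu": gpu,
            "env": env,
            "argv": ["ufod", "--listen", "tcp://*:{}".format(port)],  # assumed flag
            "remote": "tcp://{}:{}".format(host, port),  # address for the scheduler
        })
    return plan


def launch(plan: List[Dict[str, object]]) -> List[subprocess.Popen]:
    """Start every ufod service described in ``plan`` and return the processes."""
    return [subprocess.Popen(item["argv"], env=item["env"]) for item in plan]


# Plan two services, one for GPU 0 and one for GPU 1.
plan = build_ufod_plan([0, 1])
remotes = [item["remote"] for item in plan]
```

After calling `launch(plan)`, the resulting `remotes` list could be handed to the scheduler as in the replication example, e.g. `Ufo.Scheduler(remotes=remotes)`. Giving each process a single visible GPU keeps every OpenCL context to one device, which is the point of the workaround.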