Diffstat (limited to 'opcodes/pvsanal.xml')
-rw-r--r-- opcodes/pvsanal.xml 28
1 files changed, 12 insertions, 16 deletions
diff --git a/opcodes/pvsanal.xml b/opcodes/pvsanal.xml
index 2d7d4dc..ea0695c 100644
--- a/opcodes/pvsanal.xml
+++ b/opcodes/pvsanal.xml
@@ -6,13 +6,11 @@
<refentrytitle>pvsanal</refentrytitle>
</refmeta>
-
-
<refnamediv>
<refname>pvsanal</refname>
<refpurpose>
Generate an fsig from a mono audio source ain, using phase vocoder overlap-add analysis.
- </refpurpose>
+ </refpurpose>
</refnamediv>
<refsect1>
@@ -30,23 +28,23 @@
<refsect1>
<title>Initialization</title>
<para>
- <emphasis>ifftsize</emphasis> -- The FFT size in samples. Need not be a power of two (though these are especially efficient), but must be even. Odd numbers are rounded up internally. ifftsize determines the number of analysis bins in fsig, as ifftsize/2 + 1. For example, where ifftsize = 1024, fsig will contain 513 analysis bins, ordered linearly from the fundamental to Nyquist. The fundamental of analysis (which in principle gives the lowest resolvable frequency) is determined as sr/ifftsize. Thus, for the example just given and assuming sr = 44100, the fundamental of analysis is 43.07Hz. In practice, due to the phase-preserving nature of the phase vocoder, the frequency of any bin can deviate bilaterally, so that DC components are recorded. Given a strongly pitched signal, frequencies in adjacent bins can bunch very closely together, around partials in the source, and the lowest bins may even have negative frequencies.
+ <emphasis>ifftsize</emphasis> -- The FFT size in samples. Need not be a power of two (though these are especially efficient), but must be even. Odd numbers are rounded up internally. <emphasis>ifftsize</emphasis> determines the number of analysis bins in <emphasis>fsig</emphasis>, as <emphasis>ifftsize/2 + 1</emphasis>. For example, where <emphasis>ifftsize</emphasis> = 1024, <emphasis>fsig</emphasis> will contain 513 analysis bins, ordered linearly from the fundamental to Nyquist. The fundamental of analysis (which in principle gives the lowest resolvable frequency) is determined as <emphasis>sr/ifftsize</emphasis>. Thus, for the example just given and assuming <emphasis>sr</emphasis> = 44100, the fundamental of analysis is 43.07Hz. In practice, due to the phase-preserving nature of the phase vocoder, the frequency of any bin can deviate bilaterally, so that DC components are recorded. Given a strongly pitched signal, frequencies in adjacent bins can bunch very closely together, around partials in the source, and the lowest bins may even have negative frequencies.
</para>
<para>
- As a rule, the only reason to use a non power-of-two value for ifftsize would be to match the known fundamental frequency of a strongly pitched source. Values with many small factors can be almost as efficient as power-of-two sizes; for example: 384, for a source pitched at around low A=110Hz.
+ As a rule, the only reason to use a non power-of-two value for <emphasis>ifftsize</emphasis> would be to match the known fundamental frequency of a strongly pitched source. Values with many small factors can be almost as efficient as power-of-two sizes; for example: 384, for a source pitched at around low A=110Hz.
</para>
<para>
- <emphasis>ioverlap</emphasis> -- The distance in samples (<quote>hop size</quote>) between overlapping analysis frames. As a rule, this needs to be at least ifftsize/4, e.g. 256 for the example above. ioverlap determines the underlying analysis rate, as sr/ioverlap. ioverlap need not be a simple factor of ifftsize; for example a value of 160 would be legal. The choice of ioverlap may be dictated by the degree of pitch modification applied to the fsig, if any. As a rule of thumb, the more extreme the pitch shift, the higher the analysis rate needs to be, and hence the smaller the value for ioverlap. A higher analysis rate can also be advantageous with broadband transient sounds, such as drums (where a small analysis window gives less smearing, but more frequency-related errors).
+ <emphasis>ioverlap</emphasis> -- The distance in samples (<quote>hop size</quote>) between overlapping analysis frames. As a rule, this needs to be at least <emphasis>ifftsize/4</emphasis>, e.g. 256 for the example above. <emphasis>ioverlap</emphasis> determines the underlying analysis rate, as <emphasis>sr/ioverlap</emphasis>. <emphasis>ioverlap</emphasis> need not be a simple factor of <emphasis>ifftsize</emphasis>; for example a value of 160 would be legal. The choice of <emphasis>ioverlap</emphasis> may be dictated by the degree of pitch modification applied to the <emphasis>fsig</emphasis>, if any. As a rule of thumb, the more extreme the pitch shift, the higher the analysis rate needs to be, and hence the smaller the value for <emphasis>ioverlap</emphasis>. A higher analysis rate can also be advantageous with broadband transient sounds, such as drums (where a small analysis window gives less smearing, but more frequency-related errors).
</para>
<para>
- Note that it is possible, and reasonable, to have distinct fsigs in an orchestra (even in the same instrument), running at different analysis rates. Interaction between such fsigs is currently unsupported, and the fsig assignment opcode does not allow copying between fsigs with different properties, even if the only difference is in ioverlap. However, this is not a closed issue, as it is possible in theory to achieve crude rate conversion (especially with regard to in-memory analysis files) in ways analogous to time-domain techniques.
+ Note that it is possible, and reasonable, to have distinct fsigs in an orchestra (even in the same instrument), running at different analysis rates. Interaction between such fsigs is currently unsupported, and the fsig assignment opcode does not allow copying between fsigs with different properties, even if the only difference is in <emphasis>ioverlap</emphasis>. However, this is not a closed issue, as it is possible in theory to achieve crude rate conversion (especially with regard to in-memory analysis files) in ways analogous to time-domain techniques.
</para>
<para>
- <emphasis>iwinsize</emphasis> -- The size in samples of the analysis window filter (as set by iwintype). This must be at least ifftsize, and can usefully be larger. Though other proportions are permitted, it is recommended that iwinsize always be an integral multiple of ifftsize, e.g. 2048 for the example above. Internally, the analysis window (Hamming, von Hann) is multiplied by a sinc function, so that amplitudes are zero at the boundaries between frames. The larger analysis window size has been found to be especially important for oscillator bank resynthesis (e.g. using pvsadsyn), as it has the effect of increasing the frequency resolution of the analysis, and hence the accuracy of the resynthesis. As noted above, iwinsize determines the overall latency of the analysis/resynthesis system. In many cases, and especially in the absence of pitch modifications, it will be found that setting iwinsize=ifftsize works very well, and offers the lowest latency.
+ <emphasis>iwinsize</emphasis> -- The size in samples of the analysis window filter (as set by <emphasis>iwintype</emphasis>). This must be at least <emphasis>ifftsize</emphasis>, and can usefully be larger. Though other proportions are permitted, it is recommended that <emphasis>iwinsize</emphasis> always be an integral multiple of <emphasis>ifftsize</emphasis>, e.g. 2048 for the example above. Internally, the analysis window (Hamming, von Hann) is multiplied by a sinc function, so that amplitudes are zero at the boundaries between frames. The larger analysis window size has been found to be especially important for oscillator bank resynthesis (e.g. using <emphasis>pvsadsyn</emphasis>), as it has the effect of increasing the frequency resolution of the analysis, and hence the accuracy of the resynthesis. As noted above, <emphasis>iwinsize</emphasis> determines the overall latency of the analysis/resynthesis system. In many cases, and especially in the absence of pitch modifications, it will be found that setting <emphasis>iwinsize=ifftsize</emphasis> works very well, and offers the lowest latency.
</para>
<para>
@@ -98,14 +96,12 @@
<refsect1>
<title>Examples</title>
<para>
- <informalexample>
- <programlisting>
-ain in ; live source
-ffin pvsanal ain,1024,256,2048,0 ; analyze, using Hamming
-ffout pvsmaska ffin,1,0.75 ; apply eq from f-table
-aout pvsynth ffout ; and resynthesize
- </programlisting>
- </informalexample>
+ Here is an example of the pvsanal opcode. It uses the file <ulink url="examples/pvsanal.csd"><citetitle>pvsanal.csd</citetitle></ulink>.
+ <example>
+ <title>Example of the pvsanal opcode.</title>
+ <para>See the sections <link linkend="UsingRealTime"><citetitle>Real-time Audio</citetitle></link> and <link linkend="CommandFlags"><citetitle>Command Line Flags</citetitle></link> for more information on using command line flags.</para>
+ <xi:include href="examples-xml/pvsanal.csd.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+ </example>
</para>
</refsect1>
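
The arithmetic described in the Initialization text above (bin count, fundamental of analysis, analysis rate) can be checked with a short sketch. This is not part of the commit or of Csound itself: the function name `analysis_parameters` and the dictionary keys are hypothetical, chosen only for illustration, and the rounding-up of odd FFT sizes simply mirrors what the manual text says happens internally.

```python
def analysis_parameters(sr, ifftsize, ioverlap):
    """Derive the quantities the pvsanal text describes from its init arguments.

    Hypothetical helper for illustration only; it is not a Csound API.
    """
    # The manual states that odd FFT sizes are rounded up internally.
    if ifftsize % 2:
        ifftsize += 1
    return {
        "ifftsize": ifftsize,
        # Number of analysis bins in the fsig: ifftsize/2 + 1
        "bins": ifftsize // 2 + 1,
        # Fundamental of analysis: sr/ifftsize
        "fundamental_hz": sr / ifftsize,
        # Underlying analysis rate: sr/ioverlap
        "analysis_rate_hz": sr / ioverlap,
    }


# The example values used in the text: sr = 44100, ifftsize = 1024, ioverlap = 256.
params = analysis_parameters(44100, 1024, 256)
print(params["bins"])                        # 513
print(round(params["fundamental_hz"], 2))    # 43.07
```

With the same sample rate, the non power-of-two size 384 mentioned in the text gives a fundamental of analysis of 44100/384, roughly 114.8 Hz, which is in the neighbourhood of the low A at 110 Hz.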