\chapter{Optimizations} \label{chapter:opt} Yosys employs a number of optimizations to generate better and cleaner results. This chapter outlines these optimizations. \section{Simple Optimizations} The Yosys pass {\tt opt} runs a number of simple optimizations. This includes removing unused signals and cells and const folding. It is recommended to run this pass after each major step in the synthesis script. At the time of this writing the {\tt opt} pass executes the following passes that each perform a simple optimization: \begin{itemize} \item Once at the beginning of {\tt opt}: \begin{itemize} \item {\tt opt\_const} \item {\tt opt\_share -nomux} \end{itemize} \item Repeat until result is stable: \begin{itemize} \item {\tt opt\_muxtree} \item {\tt opt\_reduce} \item {\tt opt\_share} \item {\tt opt\_rmdff} \item {\tt opt\_clean} \item {\tt opt\_const} \end{itemize} \end{itemize} The following section describes each of the {\tt opt\_*} passes. \subsection{The opt\_const pass} This pass performs const folding on the internal combinational cell types described in Chap.~\ref{chapter:celllib}. This means a cell with all constant inputs is replaced with the constant value this cell drives. In some cases this pass can also optimize cells with some constant inputs. \begin{table} \hfil \begin{tabular}{cc|c} A-Input & B-Input & Replacement \\ \hline any & 0 & 0 \\ 0 & any & 0 \\ 1 & 1 & 1 \\ \hline X/Z & X/Z & X \\ 1 & X/Z & X \\ X/Z & 1 & X \\ \hline any & X/Z & 0 \\ X/Z & any & 0 \\ \hline $a$ & 1 & $a$ \\ 1 & $b$ & $b$ \\ \end{tabular} \caption{Const folding rules for {\tt\$\_AND\_} cells as used in {\tt opt\_const}.} \label{tab:opt_const_and} \end{table} Table~\ref{tab:opt_const_and} shows the replacement rules used for optimizing an {\tt\$\_AND\_} gate. The first three rules implement the obvious const folding rules. Note that `any' might include dynamic values calculated by other parts of the circuit. The following three lines propagate undef (X) states. These are the only three cases in which it is allowed to propagate an undef according to Sec.~5.1.10 of IEEE Std. 1364-2005 \cite{Verilog2005}. The next two lines assume the value 0 for undef states. These two rules are only used if no other subsitutions are possible in the current module. If other substitutions are possible they are performed first, in the hope that the `any' will change to an undef value or a 1 and therefore the output can be set to undef. The last two lines simply replace an {\tt\$\_AND\_} gate with one constant-1 input with a buffer. Besides this basic const folding the {\tt opt\_const} pass can replace 1-bit wide {\tt \$eq} and {\tt \$ne} cells with buffers or not-gates if one input is constant. The {\tt opt\_const} pass is very conservative regarding optimizing {\tt \$mux} cells, as these cells are often used to model decision-trees and breaking these trees can interfere with other optimizations. \subsection{The opt\_muxtree pass} This pass optimizes trees of multiplexer cells by analyzing the select inputs. Consider the following simple example: \begin{lstlisting}[numbers=left,frame=single,language=Verilog] module uut(a, y); input a; output [1:0] y = a ? (a ? 1 : 2) : 3; endmodule \end{lstlisting} The output can never be 2, as this would require \lstinline[language=Verilog];a; to be 1 for the outer multiplexer and 0 for the inner multiplexer. The {\tt opt\_muxtree} pass detects this contradiction and replaces the inner multiplexer with a constant 1, yielding the logic for \lstinline[language=Verilog];y = a ? 1 : 3;. \subsection{The opt\_reduce pass} \begin{sloppypar} This is a simple optimization pass that identifies and consolidates identical input bits to {\tt \$reduce\_and} and {\tt \$reduce\_or} cells. It also sorts the input bits to ease identification of shareable {\tt \$reduce\_and} and {\tt \$reduce\_or} cells in other passes. \end{sloppypar} This pass also identifies and consolidates identical inputs to multiplexer cells. In this case the new shared select bit is driven using a {\tt \$reduce\_or} cell that combines the original select bits. Lastly this pass consolidates trees of {\tt \$reduce\_and} cells and trees of {\tt \$reduce\_or} cells to single large {\tt \$reduce\_and} or {\tt \$reduce\_or} cells. These three simple optimizations are performed in a loop until a stable result is produced. \subsection{The opt\_rmdff pass} This pass identifies single-bit d-type flip-flops ({\tt \$\_DFF\_*}, {\tt \$dff}, and {\tt \$adff} cells) with a constant data input and replaces them with a constant driver. \subsection{The opt\_clean pass} This pass identifies unused signals and cells and removes them from the design. It also creates an \B{unused\_bits} attribute on wires with unused bits. This attribute can be used for debugging or by other optimization passes. \subsection{The opt\_share pass} This pass performs trivial resource sharing. This means that this pass identifies cells with identical inputs and replaces them with a single instance of the cell. The option {\tt -nomux} can be used to disable resource sharing for multiplexer cells ({\tt \$mux}, {\tt \$pmux}, and {\tt \$safe\_pmux}). This can be useful as it prevents multiplexer trees to be merged, which might prevent {\tt opt\_muxtree} to identify possible optimizations. \section{FSM Extraction and Encoding} The {\tt fsm} pass performs finite-state-machine (FSM) extraction and recoding. The {\tt fsm} pass simply executes the following other passes: \begin{itemize} \item Identify and extract FSMs: \begin{itemize} \item {\tt fsm\_detect} \item {\tt fsm\_extract} \end{itemize} \item Basic optimizations: \begin{itemize} \item {\tt fsm\_opt} \item {\tt opt\_clean} \item {\tt fsm\_opt} \end{itemize} \item Expanding to nearby gate-logic (if called with {\tt -expand}): \begin{itemize} \item {\tt fsm\_expand} \item {\tt opt\_clean} \item {\tt fsm\_opt} \end{itemize} \item Re-code FSM states (unless called with {\tt -norecode}): \begin{itemize} \item {\tt fsm\_recode} \end{itemize} \item Print information about FSMs: \begin{itemize} \item {\tt fsm\_info} \end{itemize} \item Export FSMs in KISS2 file format (if called with {\tt -export}): \begin{itemize} \item {\tt fsm\_export} \end{itemize} \item Map FSMs to RTL cells (unless called with {\tt -nomap}): \begin{itemize} \item {\tt fsm\_map} \end{itemize} \end{itemize} The {\tt fsm\_detect} pass identifies FSM state registers and marks them using the \B{fsm\_encoding}{\tt = "auto"} attribute. The {\tt fsm\_extract} extracts all FSMs marked using the \B{fsm\_encoding} attribute (unless \B{fsm\_encoding} is set to {\tt "none"}) and replaces the corresponding RTL cells with a {\tt \$fsm} cell. All other {\tt fsm\_*} passes operate on these {\tt \$fsm} cells. The {\tt fsm\_map} call finally replaces the {\tt \$fsm} cells with RTL cells. Note that these optimizations operate on an RTL netlist. I.e.~the {\tt fsm} pass should be executed after the {\tt proc} pass has transformed all {\tt RTLIL::Process} objects to RTL cells. The algorithms used for FSM detection and extraction are influenced by a more general reported technique \cite{fsmextract}. \subsection{FSM Detection} The {\tt fsm\_detect} pass identifies FSM state registers. It sets the \B{fsm\_encoding}{\tt = "auto"} attribute on any (multi-bit) wire that matches the following description: \begin{itemize} \item Does not already have the \B{fsm\_encoding} attribute. \item Is not an output of the containing module. \item Is driven by single {\tt \$dff} or {\tt \$adff} cell. \item The \B{D}-Input of this {\tt \$dff} or {\tt \$adff} cell is driven by a multiplexer tree that only has constants or the old state value on its leaves. \item The state value is only used in the said multiplexer tree or by simple relational cells that compare the state value to a constant (usually {\tt \$eq} cells). \end{itemize} This heuristic has proven to work very well. It is possible to overwrite it by setting \B{fsm\_encoding}{\tt = "auto"} on registers that should be considered FSM state registers and setting \B{fsm\_encoding}{\tt = "none"} on registers that match the above criteria but should not be considered FSM state registers. \subsection{FSM Extraction} The {\tt fsm\_extract} pass operates on all state signals marked with the \B{fsm\_encoding} ({\tt != "none"}) attribute. For each state signal the following information is determined: \begin{itemize} \item The state registers \item The asynchronous reset state if the state registers use asynchronous reset \item All states and the control input signals used in the state transition functions \item The control output signals calculated from the state signals and control inputs \item A table of all state transitions and corresponding control inputs- and outputs \end{itemize} The state registers (and asynchronous reset state, if applicable) is simply determined by identifying the driver for the state signal. From there the {\tt \$mux}-tree driving the state register inputs is recursively traversed. All select inputs are control signals and the leaves of the {\tt \$mux}-tree are the states. The algorithm fails if a non-constant leaf that is not the state signal itself is found. The list of control outputs is initialized with the bits from the state signal. It is then extended by adding all values that are calculated by cells that compare the state signal with a constant value. In most cases this will cover all uses of the state register, thus rendering the state encoding arbitrary. If however a design uses e.g.~a single bit of the state value to drive a control output directly, this bit of the state signal will be transformed to a control output of the same value. Finally, a transition table for the FSM is generated. This is done by using the {\tt ConstEval} C++ helper class (defined in {\tt kernel/consteval.h}) that can be used to evaluate parts of the design. The {\tt ConstEval} class can be asked to calculate a given set of result signals using a set of signal-value assignments. It can also be passed a list of stop-signals that abort the {\tt ConstEval} algorithm if the value of a stop-signal is needed in order to calculate the result signals. The {\tt fsm\_extract} pass uses the {\tt ConstEval} class in the following way to create a transition table. For each state: \begin{enumerate} \item Create a {\tt ConstEval} object for the module containing the FSM \item Add all control inputs to the list of stop signals \item Set the state signal to the current state \item Try to evaluate the next state and control output \label{enum:fsm_extract_cealg_try} \item If step~\ref{enum:fsm_extract_cealg_try} was not successful: \begin{itemize} \item Recursively goto step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 0. \item Recursively goto step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 1. \end{itemize} \item If step~\ref{enum:fsm_extract_cealg_try} was successful: Emit transition \end{enumerate} Finally a {\tt \$fsm} cell is created with the generated transition table and added to the module. This new cell is connected to the control signals and the old drivers for the control outputs are disconnected. \subsection{FSM Optimization} The {\tt fsm\_opt} pass performs basic optimizations on {\tt \$fsm} cells (not including state recoding). The following optimizations are performed (in this order): \begin{itemize} \item Unused control outputs are removed from the {\tt \$fsm} cell. The attribute \B{unused\_bits} (that is usually set by the {\tt opt\_clean} pass) is used to determine which control outputs are unused. \item Control inputs that are connected to the same driver are merged. \item When a control input is driven by a control output, the control input is removed and the transition table altered to give the same performance without the external feedback path. \item Entries in the transition table that yield the same output and only differ in the value of a single control input bit are merged and the different bit is removed from the sensitivity list (turned into a don't-care bit). \item Constant inputs are removed and the transition table is alterered to give an unchanged behaviour. \item Unused inputs are removed. \end{itemize} \subsection{FSM Recoding} The {\tt fsm\_recode} pass assigns new bit pattern to the states. Usually this also implies a change in the width of the state signal. At the moment of this writing only one-hot encoding with all-zero for the reset state is supported. The {\tt fsm\_recode} pass can also write a text file with the changes performed by it that can be used when verifying designs synthesized by Yosys using Synopsys Formality \citeweblink{Formality}. \section{Logic Optimization} Yosys can perform multi-level combinational logic optimization on gate-level netlists using the external program ABC \citeweblink{ABC}. The {\tt abc} pass extracts the combinational gate-level parts of the design, passes it through ABC, and re-integrates the results. The {\tt abc} pass can also be used to perform other operations using ABC, such as technology mapping (see Sec.~\ref{sec:techmap_extern} for details).