summaryrefslogtreecommitdiff
path: root/fpoptimizer/fpoptimizer.txt
diff options
context:
space:
mode:
Diffstat (limited to 'fpoptimizer/fpoptimizer.txt')
-rw-r--r--fpoptimizer/fpoptimizer.txt357
1 files changed, 357 insertions, 0 deletions
diff --git a/fpoptimizer/fpoptimizer.txt b/fpoptimizer/fpoptimizer.txt
new file mode 100644
index 0000000..35bb70b
--- /dev/null
+++ b/fpoptimizer/fpoptimizer.txt
@@ -0,0 +1,357 @@
+FILES:
+ fpoptimizer.hh - C++ structures
+ fpoptimizer.dat - Optimization operations
+
+PLAN:
+1. Convert fpoptimizer.dat into code that initializes Grammar
+ - either automatically or manually
+ - avoid runtime parsing of .dat file
+ - prefer static const structs over constructor code
+ *DONE: See fpoptimizer_grammar_gen.y .
+ Produces fpoptimizer_grammar.cc .
+
+2. Augment the CodeTree mechanism the following ways:
+ - Add mechanism that generates a hash value for the contents
+ of the tree, for easy comparison of identical subtrees
+ *DONE
+ - Any time a CodeTree changes in any way that may affect the
+ hash, update the hash value for this node and recursively
+ to all parents
+ (Note: Ensure that a CodeTree may only have one parent.)
+
+2. Create code that matches a given CodeTree node to Grammar,
+ and finds the rules that match (only one should match at most,
+ but it can match a number of different ways, but that does not
+ matter; only utilize the first way)
+ - If a matching rule is found, replace either the matched params
+ or the entire node as instructed in the rule. Ensure that the
+ node gets its hash updated, along with its parents recursively.
+ - The matching can be optimized by grouping the rules by the Opcode
+ Note: The matching can get really complex and/or heavy in the
+ cases where the same operand (or worse, the same set of operands)
+ is expected to be found in two different subtrees. It may require
+ some O(n!) algorithms. However, those rules can be safely skipped
+ if the runtime sees that it would be too costly to check them.
+
+3. Algorithm for matching trees:
+ First,
+ match all nodes in the entire tree. (Do this only once.)
+ Any time a hash of a node is updated,
+ match that node.
+ (When the hashes are changed recursively for a node and all
+ its parents, match all those affected nodes after the hashes
+ have been regenerated.)
+ Repeat until no substitutions are done.
+
+ Overall, first apply this algorithm using the [ENTRY] rules only.
+ Then, apply the rules with [INTERMEDIATE] rules only.
+ Finally, apply the rules with [FINAL] rules only.
+
+3.5. Algorithm for matching nodes:
+ TODO
+
+4a. Algorithm for handling bytecode:
+ First, convert the bytecode into a CodeTree.
+ Then run the matching algorithm.
+ Last, convert the produced CodeTree back into bytecode.
+
+ When generating the bytecode, additional optimizations can be performed:
+ - Converting raise-to-integer-powers into sequences of cInv/cDup/cFetch/cMul
+ using the powi algorithm already existing in fpoptimizer.cc
+ (be wary of generating too long sequences, though)
+ *DONE
+ - Converting multiply-by-integers into sequences of cNeg/cDup/cFetch/cAdd
+ using the powi algorithm already existing in fpoptimizer.cc
+ (be wary of generating too long sequences, though)
+ *DONE
+ - Reordering cMul operands such that inverted operands
+ don't come first, to avoid creating cInv unless necessary.
+ When inverted operands come in non-head positions, use cDiv instead of cMul.
+ *DONE
+ - Reordering cAdd operands such that negated operands
+ don't come first, to avoid creating cNeg unless necessary.
+ When negated operands come in non-head positions, use cSub instead of cAdd.
+ *DONE
+ - When an identical subtree is synthesized into bytecode more than
+ once in a row, use cDup for the subsequential occurrences.
+ To that point, reorder any commutative operands so as to
+ increase the chances of cDup being utilized.
+ *TODO
+
+4b. Optional algorithm if an SSA evaluator is supported
+ (possibly only if a JIT backend exists):
+ 1. First, convert the bytecode into a CodeTree.
+ 2. Then run the matching algorithm.
+ 3. Then convert the produced CodeTree into SSA format.
+ SSA is a medium-level intermediate language (chapter 4.3 in [1], [2]),
+ in which each instruction takes one of the following forms:
+ target = command source1 source2 <...>
+ - commands such as add, mul, pow, cos, max, less
+ jump <label>
+ branch <label> if source == 0
+ target = phi source1 source2
+ - phi is used after two branches merge, to select
+ either result from the branches into one variable;
+ it is not a fparser function.
+ and in which every variable is written to only once,
+ i.e. "x = x+5" never occurs, but is instead written
+ as "x2 = x1+5"
+ <label> can be a pointer to another codeblock.
+ 4. Then run generic optimizations on the SSA code, such
+ as common subexpression elimination (CSE, described
+ in chapter 13.1 in [1]).
+ Try also global value numbering (GVN, see [3]).
+ Note: After CSE, it's difficult to convert
+ the code back into bytecode.
+ Note: Due to the hashing, it is possible that some
+ of the CSE can be done automatically in the SSA
+ generation phase by peeling the CodeTree depth-first,
+ storing identical trees only once.
+ 5a. (Apply this option if we do not have JIT.)
+ Eliminate the "single assignment" property of SSA
+ by mapping out the lifetimes of the variables and
+ reusing the same variables where possible. This
+ decreases the memory usage of the evaluator and
+ improves the cache efficiency.
+ 5b. (Apply this option if we do have JIT.)
+ Perform register allocation for the SSA code,
+ as per chapter 16 in [1]. Doing this before
+ actual JIT will make the JIT more straightforward.
+ Doing code scheduling could be useful as well,
+ though it gets somewhat complex. (Chapter 17 in [1].)
+ If there's a library we can use from step 3 forward, it'd be great.
+ Note: There are many typical optimizations that we don't need to do.
+ For instance, we don't need to do "dead code elimination", because
+ dead code is never produced in the first place.
+
+[1] = Advanced Compiler Design & Implementation by Steven S. Muchnick, ISBN 1-55860-320-4
+[2] = http://en.wikipedia.org/wiki/Static_single_assignment_form
+[3] = http://en.wikipedia.org/wiki/Global_value_numbering
+
+
+
+
+----------------------------------------
+------------------------------------------------------
+CODETREE - A REPRESENTATION OF MATHEMATICAL EXPRESSION
+------------------------------------------------------
+FPoptimizer changes the bytecode expression into tree format.
+
+An FPoptimizer expression is called a CodeTree.
+
+A CodeTree node has:
+ - 0 to N children
+ - Type (Opcode)
+ - Type-dependent other fields
+ Different type of nodes:
+ cVar:
+ Has a type-dependent field "Var", which
+ identifies the var number
+ cImmed:
+ Has a type-dependent field "Value", which
+ describes the constant value of this node
+ cFCall and cPCall:
+ Has a type-dependent field "FuncNo", which
+ describes the function number called
+ Anything else:
+ Has a number of children which describe
+ the parameters to the operation.
+ For example, if the type (Opcode) is cSin,
+ it has a single child, which is a CodeTree
+ that describes the parameter of the sin() call.
+ If the type (Opcode) is cAdd, it has an arbitrary
+ number of children, which are all added together.
+ Examples of some expressions in tree format:
+ Expression: sin(x)*cos(y)
+ Tree: Opcode=cMul
+ Child[0]:
+ Opcode=cSin
+ Child[0]:
+ Opcode=cVar
+ Var=0
+ Child[1]:
+ Opcode=cCos
+ Child[0]:
+ Opcode=cVar
+ Var=1
+ Expression: 1.0 + 2.0 - 3.0 - (4.0 + -5.5)
+ Actual expression (converted by bytecode reader):
+ 1.0 + 2.0 + (-1 * 3.0) + (-1 * (4.0 + -5.5))
+ Tree: Opcode=cAdd
+ Child[0]:
+ Opcode=cImmed
+ Value=1.0
+ Child[1]:
+ Opcode=cImmed
+ Value=2.0
+ Child[2]:
+ Opcode=cMul
+ Child[0]:
+ Opcode=cImmed
+ Value=-1.0
+ Child[1]:
+ Opcode=cImmed
+ Value=3.0
+ Child[3]:
+ Opcode=cMul
+ Child[0]:
+ Opcode=cImmed
+ Value=-1.0
+ Child[1]:
+ Opcode=cAdd
+ Child[0]:
+ Opcode=cImmed
+ Value=4.0
+ Child[1]:
+ Opcode=cImmed
+ Value=-5.5
+
+------------------------------------------------------
+GRAMMAR RULE - A TREE SUBSTITUTION RULE DESCRIPTION
+------------------------------------------------------
+The Grammar describes changes (optimizations) that
+can be performed to the CodeTree.
+
+The Grammar is composed of Rules, each of which describes
+a matching-tree, and a replacement.
+
+A matching-tree node (ParamSpec) has:
+ -------
+ - Opcode
+ - Method of parameter matching
+ - 0 to N params (ParamSpec)
+ Different types of methods:
+ PositionalParams:
+ Indicates that the tree given must match
+ the params given, in exactly that order;
+ there may not be other children in the tree.
+ SelectedParams:
+ Indicates that the tree given must match
+ the params given, in any order;
+ there may not be other children in the tree.
+ AnyParams:
+ Indicates that the tree given must match
+ the params given; in any order;
+ the tree may additionally have other children,
+ too. If the parameter list contains a RestHolder,
+ those other children are captured there.
+
+List of parameters (an index to MatchedParams)
+ - Single parent
+ - Type (opcode)
+ - 0 to N children
+ - Type-dependent other fields
+ Different types of nodes:
+ NumConstant:
+ Indicates that a CodeTree
+ with type=cImmed and
+ a particular Value is matched.
+ The value is found in pack.clist[node.index].
+ ImmedHolder:
+ Indicates that a CodeTree
+ with type=cImmed and
+ some Value is matched.
+ The ImmedHolder fields are enumerated
+ such that in a single CodeTree matching
+ operation, all ImmedHolders having the
+ same index value must match CodeTrees
+ evaluating to the same numeric value.
+ For example, if an ImmedHolder with index=0
+ happens to match a CodeTree with Value=1.443,
+ all other ImmedHolders with index=0 must
+ also match a CodeTree with Value=1.443.
+ NamedHolder
+ Indicates that any CodeTree is matched regardless of type.
+ The NamedHolder fields are enumerated
+ such that in a single CodeTree matching
+ operation, all NamedHolders having the
+ same index value must match CodeTrees
+ that are identical to the first one.
+ For example, if an NamedHolder with index=0
+ happens to match a CodeTree representing
+ the expression sin(x),
+ all other NamedHolders with index=0 must
+ also match a CodeTree representing
+ the expression sin(x).
+ SubFunction
+ Indicates a subtree that must be matched.
+ RestHolder
+ Within the target CodeTree, matches anything not
+ matched by the other rules in the parent ParamSpec.
+ Anything else:
+ Indicates a mathematical expression whose
+ value is treated by the NumConstant rules above.
+
+Example rules:
+ Grammar file syntax: cMin x x
+ Internal structure: Function.opcode = cMin
+ Function.index refers to MatchedParams where
+ MatchedParams.type = AnyParams
+ MatchedParams.count = 2
+ MatchedParams.index refers to ParamSpec where
+ ParamSpec[0].opcode = NamedHolder
+ ParamSpec[0].index = 0 -- "x" is namedholder 0
+ ParamSpec[1].opcode = NamedHolder
+ ParamSpec[1].index = 0 -- another x
+ Explanation:
+ This rule matches a CodeTree
+ where the opcode = cMin and that
+ tree has two identical subtrees.
+
+ Grammar file syntax: cMul x <1> (cPow [(cMul x <2>) -1])
+ Internal structure: Function.opcode = cMul
+ Function.index refers to MatchedParams where
+ MatchedParams.type = AnyParams
+ MatchedParams.count = 3
+ MatchedParams.index refers to ParamSpec where
+ ParamSpec[0].opcode = NamedHolder
+ ParamSpec[0].index = 0 -- "x" is namedholder 0
+ ParamSpec[1].opcode = RestHolder
+ ParamSpec[1].index = 1 -- RestHolder #1 captures anything else within that cMul group
+ ParamSpec[2].opcode = SubFunction
+ ParamSpec[2].index refers to Function where
+ Function.opcode = cPow
+ Function.index refers to MatchedParams where
+ MatchedParams.type = PositionalParams
+ MatchedParams.count = 2
+ MatchedParams.index refers to ParamSpec where
+ ParamSpec[0].opcode = SubFunction
+ ParamSpec[0].index refers to Function where
+ Function.opcode = cMul
+ Function.index refers to MatchedParams where
+ MatchedParams.type = AnyParams
+ MatchedParams.count = 2
+ MatchedParams.index refers to ParamSpec where
+ ParamSpec[0].opcode = NamedHolder
+ ParamSpec[0].index = 0 -- must match a subtree identical to that of "x" earlier
+ ParamSpec[1].opcode = RestHolder
+ ParamSpec[1].index = 2 -- RestHolder #2 captures anything else within that cMul group
+ ParamSpec[1].opcode = NumConstant
+ ParamSpec[1].value = -1
+ Explanation:
+ This rule matches a CodeTree of type cMul,
+ where there exists at least two children,
+ where one of them is an arbitrary expression,
+ call that "x",
+ and one of them is a CodeTree of type cPow,
+ where there are exactly two children,
+ where the first child is a CodeTree of type cMul,
+ where there exists at least one child,
+ where that one child is an arbitrary expression,
+ which must match the "x" captured earlier,
+ and anything else from that inner cMul codetree
+ is captured into restholder <2>;
+ and the second child of the cPow codetree
+ is a CodeTree of type cImmed, where its Value equals -1;
+ and anything else from that outer cMul codetree
+ is captured into restholder <1>.
+ This rule matches expressions such as:
+ sin(z)*5*( (sin(z)*7) ^ (-1))
+ sin(z) is captured into NamedHolder x
+ 5 is captured into RestHolder <1>
+ 7 is captured into RestHolder <2>
+ Or
+ 6*((6*w)^(-1))
+ value 6 is captured into NamedHolder x
+ nothing is captured into RestHolder <1>
+ w is captured into RestHolder <2>.