summaryrefslogtreecommitdiff
path: root/libs/subcircuit/README
blob: 757a9f540b5bdd31f07b8103a2b862a5ee0462f2 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
  **************************************************************************
  *                                                                        *
  *                    The SubCircuit C++11 library                        *
  *                                                                        *
  * An implementation of a modified Ullmann Subgraph Isomorphism Algorithm *
  * for coarse grain logic networks.                      by Clifford Wolf *
  *                                                                        *
  **************************************************************************

============
Introduction
============

This is a library that implements a modified Ullmann Subgraph Isomorphism
Algorithm with additional features aimed at working with coarse grain logic
networks. It also contains a simple frequent subcircuit mining algorithm.

A simple command line tool that exposes the features of the library is also
included.


C++11 Warning
-------------

This project is written in C++11. Use appropriate compiler switches to compile
it. Tested with clang version 3.0 and option -std=c++11. Also tested with gcc
version 4.6.3 and option -std=c++0x.


========
Features
========

The input is two graphs (needle and haystack) that represent coarse grain
logic networks. The algorithm identifies all subgraphs of haystack that are
isomorphic to needle.

The following additional features over the regular Ullmann Subgraph Isomorphism
Algorithm are provided by the library.

 * The graphs are attributed hypergraphs capable of representing netlists:

	- Nodes represent the logic cells:
		- Nodes have types and only match compatible types
		- Nodes have ports with variable bit-width

	- Hyperedges represent the signals:
		- Each hyperedge connects one to many bits on ports on nodes

	- Callback functions for advanced attributes and compatibility rules:
		Any set of node-node compatibility rules and edge-edge
		compatibility rules can be implemented by providing
		the necessary callback functions.

 * The algorithm is very efficient when all or many bits of one port are
   connected to bits of the same other port. This is usually the case
   in coarse grain logic networks. But the algorithm does not add any
   restrictions in this area; it is just optimized for this scenario.

 * The algorithm can be configured to allow larger ports in needle cells to
   match smaller ports in haystack cells in certain situations. This way it
   is possible to e.g. have a 32-bit adder cell in the needle match a
   16-bit adder cell in the haystack.

 * The algorithm can be configured to perform port-swapping on certain
   ports on certain cell types to match commutative operations properly.

   This is, however, not implemented very efficiently when a larger number
   of permutations is possible on a cell type. Therefore it is recommended
   to only use swap groups with only a few members and a few such groups
   on one cell type type.

   Also note, that the algorithm can not resolve complex dependencies
   between the port swappings of different cells. Therefore it is
   recommended to only use port swapping on input pins of commutative
   operations, where such complex dependencies can not emerge.

 * The algorithm can be configured to distinguish between internal signals
   of the needle and externally visible signals. The needle will only
   match a subgraph of the haystack if that subgraph does not expose the
   internal signal to nodes in the haystack outside the matching subgraph.

 * The algorithm can recognize a subcircuit even if some or all of its
   inputs and/or outputs are shorted together.

 * Explicit fast support for constant signals without extra nodes for
   constant drivers.

 * Support for finding only non-overlapping matches.

 * A simple miner for frequent subcircuts that operates on the same circuit
   description format.

 * The public API of the library is using std::string identifiers for
   nodes, node types and ports. Internally the costly part of the
   algorithm is only using integer values, thus speeding up the
   algorithm without exposing complex internal encodings to the caller.


=================
API Documentation
=================

This section gives a brief overview of the API. For a working example, have a
look at the demo.cc example program in this directory.


Setting up graphs
-----------------

Instanciate the SubCircuit::Graph class and use the methods of this class to
set up the circuit.

	SubCircuit::Graph myGraph;

For each node in the circuit call the createNode() method. Specify the
identifier for the node and also the type of function implemented by the node.
Then call createPort() for each port of this node.

E.g. the following code adds a node "myAdder" of type "add" with three 32 bit
wide ports "A", "B" and "Y". Note that SubCircuit does not care which port is
an input and which port is an output. The last (and optional) argument to
createPort() specifies the minimum number of bits required for this port in the
haystack (this field is only used in the needle graph). So in this example the
node would e.g. also match any adder with a bit width smaller 32.

	myGraph.createNode("myAdder", "add");
	myGraph.createPort("myAdder", "A", 32, 1);
	myGraph.createPort("myAdder", "B", 32, 1);
	myGraph.createPort("myAdder", "Y", 32, 1);

The createConnection() method can be used to connect the nodes. It internally
creates a hypergraph. So the following code does not only connect cell1.Y with
cell2.A and cell3.A but also implicitly cell2.A with cell3.A.

	myGraph.createConnection("cell1", "Y", "cell2", "A");
	myGraph.createConnection("cell1", "Y", "cell3", "A");

Redundent calls to createConnection() are ignored. As long as the method is
called after the relevant nodes and ports are created, the order in which the
createConnection() calls are performed is irrelevant.

The createConnection() method can also be used to connect single bit signals.
In this case the start bit for both ports must be provided as well as an
optional width (which defaults to 1). E.g.  the following calls can be used to
connect the 32 bit port cell4.Y to the 32 bit port cell5.A with a one bit left
rotate shift,

	myGraph.createConnection("cell4", "Y", 0, "cell5", "A", 1, 31);
	myGraph.createConnection("cell4", "Y", 31, "cell5", "A", 0);

The method createConstant() can be used to add a constant driver to a signal.
The signal value is encoded as one char by bit, allowing for multi-valued
logic matching. The follwoing command sets the lowest bit of cell6.A to a
logic 1:

	myGraph.createConnection("cell6", "A", 0, '1');

It is also possible to set an entire port to a integer value, using the
encodings '0' and '1' for the binary digits:

	myGraph.createConnection("cell6", "A", 42);

The method markExtern() can be used to mark a signal as externally visible. In
a needle graph this means, this signal may match a signal in the haystack that
is used outside the matching subgraph. In a haystack graph this means, this
signal is used outside the haystack graph. I.e. an internal signal of the
needle won't match an external signal of the haystack regardless where the
signal is used in the haystack.

In some application one may disable this extern/intern checks. This can easily
be achieved by marking all signals in the needle as extern. This can be done
using the Graph::markAllExtern() method.


Setting up and running solvers
------------------------------

To actually run the subgraph isomorphism algorithm, an instance of
SubCircuit::Solver must be created.

	SubCircuit::Solver mySolver;

The addGraph() method can be used to register graphs with the solver:

	mySolver.addGraph("graph1", myGraph);
	mySolver.addGraph("graph2", myOtherGraph);

Usually nodes in the needle and the haystack must have the same type identifier
to match each other. Additionally pairs of compatible needle and haystack node
pairs can be registered using the addCompatibleTypes() method:

	mySolver.addCompatibleTypes("alu", "add");
	mySolver.addCompatibleTypes("alu", "sub");
	mySolver.addCompatibleTypes("alu", "and");
	mySolver.addCompatibleTypes("alu", "or");
	mySolver.addCompatibleTypes("alu", "xor");

Note that nodes in needle and haystack must also use the same naming convention
for their ports in order to be considered compatible by the algorithm.

Similarly the method addCompatibleConstants() can be used the specify which
constant values in the needle should match which constant value in the haystack.
Equal values always do match.

	mySolver.addCompatibleConstants('x', '0');
	mySolver.addCompatibleConstants('x', '1');

Some cells implement commutative operations that don't care if their input
operands are swapped. For this cell types it is possible to register groups
of swappable ports. Let's consider a cell "macc23" that implements the
function Y = (A * B) + (C * D * E):

	mySolver.addSwappablePorts("macc23", "A", "B");
	mySolver.addSwappablePorts("macc23", "C", "D", "E");

Sometimes the rules for port swapping are a more complicated and the swapping
of one port is related to the swapping of another port. Let's consider a cell
"macc22" that implements the function Y = (A * B) + (C * D):

	mySolver.addSwappablePorts("macc22", "A", "B");
	mySolver.addSwappablePorts("macc22", "C", "D");

	std::map<std::string, std::string> portMapping;
	portMapping["A"] = "C";
	portMapping["B"] = "D";
	portMapping["C"] = "A";
	portMapping["D"] = "B";
	mySolver.addSwappablePortsPermutation("macc22", portMapping);

I.e. the method mySolver.addSwappablePortsPermutation() can be used to register
additional permutations for a node type of which one or none is applied on top
of the permutations yielded by the permutations generated by the swap groups.

Note that two solutions that differ only in the applied port swapping are not
reported as separate solutions. Instead only one of them is selected (in most
cases the one with less port swapping as it is usually identified first).

Once everything has been set up, the solve() method can be used to actually
search for isomorphic subgraphs. The first argument to solve() is an
std::vector<SubCircuit::Solver::Result> objects to which all found solutions
are appended. The second argument is the identifier under which the needle
graph has been registered and the third argument is the identifier under which
the haystack graph has been registered:

	std::vector<SubCircuit::Solver::Result> results;
	mySolver.solve(results, "graph1", "graph2");

The SubCircuit::Solver::Result object is a simple data structure that contains
the mappings between needle and haystack nodes, port mappings after the port
swapping and some additional metadata. See "subcircuit.h" and "demo.cc" for
details.

The solve() method has a third optional boolean argument. If it is set to
false, solve will not return any solutions that contain haystack nodes that
have been part of a previously found solution. This way it is e.g. easy
to implement a greedy macro cell matching algorithm:

	std::vector<SubCircuit::Solver::Result> results;
	mySolver.solve(results, "macroCell1", "circuit", false);
	mySolver.solve(results, "macroCell2", "circuit", false);
	mySolver.solve(results, "macroCell3", "circuit", false);

After this code has been executed, the results vector contains all
non-overlapping matches of the three macrocells. The method
clearOverlapHistory() can be used to reset the internal state used
for this feature. The default value for the third argument to solve()
is true (allow overlapping). The optional boolean fourth argument to the
Graph::createNode() method can be used to mark a node as shareable even
in non-overlapping solver mode.

The solve() method also has a fourth optional integer argument. If it is set to
a positive integer, this integer specifies the maximum number of solutions to
be appended to the results vector, i.e. to terminate the algorithm early when
the set number of matches is found. When this fourth argument is negative or
omitted all matches are found and appended.

An alternative version of the solve() method supports an additional argument
after they haystack graph identifier that specifies initial mappings for
the algorithm. In the following example only the haystack nodes cell_1 and
cell_2 are considered as mappings for the needle node cell_A:

	std::map<std::string, std::set<std::string>> initialMappings;
	initialMappings["cell_A"].insert("cell_1");
	initialMappings["cell_A"].insert("cell_2");

	std::vector<SubCircuit::Solver::Result> results;
	mySolver.solve(results, "graph1", "graph2", initialMappings);

The clearConfig() method can be used to clear all data registered using
addCompatibleTypes(), addCompatibleConstants(), addSwappablePorts() and
addSwappablePortsPermutation() but retaining the graphs and the overlap state.


Using user callback function
----------------------------

For more complex tasks it is possible to derive a class from SubCircuit::Solver
that overloads one or more of the following virtual methods. The userData
arguments to the following methods are void pointers that can be passed as
third argument to Graph::createNode() and are simly passed thru to the user
callback functions together with the node id whenever a node is referenced.

bool userCompareNodes(needleGraphId, needleNodeId, needleUserData, haystackGraphId, haystackNodeId, haystackUserData):

	Perform additional checks on a pair of nodes (one from the needle, one
	from the haystack) to determine if the nodes are compatible. The default
	implementation always returns true.


bool userCompareEdge(needleGraphId, needleFromNodeId, needleFromUserData, needleToNodeId, needleToUserData,
		haystackGraphId, haystackFromNodeId, haystackFromUserData, haystackToNodeId, haystackToUserData):

	Perform additional checks on a pair of a pair of adjacent nodes (one
	adjacent pair from the needle and one adjacent pair from the haystack)
	to determine wheter this edge from the needle is compatible with
	that edge from the haystack. The default implementation always
	returns true.

bool userCheckSolution(result):

	Perform additional checks on a solution before appending it to the
	results vector. When this function returns false, the solution is
	ignored. The default implementation always returns true.


Mining for frequent SubCircuits
-------------------------------

The solver also contains a miner for frequent subcircuits. The following code
fragment will find all frequent subcircuits with at least minNodes nodes and
at most maxNodes nodes that occurs at least minMatches times: 

	std::vector<SubCircuit::Solver::MineResult> results;
	mySolver.mine(results, minNodes, maxNodes, minMatches);

The mine() method has an optional fifth parameter that limits the number of
matches counted in one graph. This can be useful when mining for circuits that
are found in at least a number of graphs. E.g. the following call would find
all subcircuits with 5 nodes that are found in at least 7 of the registered
graphs:

	mySolver.mine(results, 5, 5, 7, 1);

Note that this miner is not very efficient and therefore its use is not
recommended for large circuits. Also note that the miner is working under the
assumption that subgraph isomorphism is bidirectional. This is not the case in
circuits with gates with shorted pins. This can result in undetected frequent
subcircuits in some corner cases.


Debugging
---------

For debugging purposes the SubCircuit::Solver class implements a setVerbose()
method. When called once, all further calls to the solve() method cause the
algorithm to dump out a lot of debug information to stdout.

In conjunction with setVerbose() one can also overload the userAnnotateEdge()
method in order to add additional information about the edges to the debug
output.


===================
Shell Documentation
===================

This package also contains a small command-line tool called "scshell" that can
be used for experimentation with the algorithm. This program reads a series of
commands from stdin and reports its findings to stdout on exit.

	$ ./scshell < test_macc22.txt 

	...

	Match #3: (macc22 in macc4x2)
	  add_1 -> add_2 A:B B:A Y:Y
	  mul_1 -> mul_4 A:A B:B Y:Y
	  mul_2 -> mul_3 A:A B:B Y:Y

The following commands can be used in scshell to specify graphs:

	graph <graph_name>
	...
	endgraph

		Used to specify a graph with the given name. Only the commands
		"node", "connect" and "extern" may be used within the graph ...
		endgraph block.

	node <node_name> [<port_name> [<bits> [<min_bits>]]]+

		Used to create a node and ports. This command is a direct frontend
		to the Graph::createNode() and Graph::createPort() methods.

	connect <from_node> <from_port> <to_node> <to_port>
	connect <from_node> <from_port> <from_bit> <to_node> <to_port> <to_bit>
	connect <from_node> <from_port> <from_bit> <to_node> <to_port> <to_bit> <width>

		Used to connect the nodes in the graph via Graph::createConnection().

	constant <node> <port> [<bit>] <value>

		Call Graph::createConstant().

	extern <node> [<port> [<bit>]]+

		Mark signals as extern via Graph::markExtern().

	allextern

		Mark all signals as extern via Graph::markAllExtern().

The following commands can be used in scshell outside a graph ... endgraph block:

	compatible <needle_type> <haystack_type>

		Call Solver::addCompatibleTypes().

	constcompat <needle_value> <haystack_value>

		Call Solver::addCompatibleConstants().

	swapgroup <needle_type> <port>+

		Call Solver::addSwappablePorts().

	swapperm <needle_type> <ports>+ : <ports>+

		Call Solver::addSwappablePortsPermutation(). Both port lists must
		have the same length and the second one must be a permutation of the
		first one.

	initmap <needle_node> <haystack_node>+

		Add an entry to the initial mappings for the next solve command.
		This mappings are automatically reset after the solve command.

	solve <needle_graph> <haystack_graph> [<allow_overlap> [<max_solutions>]]

		Call Solver::solve(). The <allow_overlap> must be "1" or "true"
		for true and "0" or "false" for false.

	mine <min_nodes> <max_nodes> <min_matches> [<limit_matches_per_graph>]

		Call Solver::mine().

	expect <number>

		Print all results so far since the last call to expect. Expect
		<number> results and exit with error code 1 if a different number
		of results have been found.

	clearoverlap

		Call Solver::clearOverlapHistory().

	clearconfig

		Call Solver::clearConfig().

	verbose

		Call Solver::setVerbose().