private/ntos/dlc/sm2c/idea2.doc


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81


*******************

NEW OPTIMAL ALGORITHM FOR PASS 4
--------------------------------

Algorithm searches the most common weighted subgroup A in a union of groups
and divides the old union to two new unions u1 and u2.  All groups 
in union u1 has subgroup A as a member.  Subset A is joined to
tree of the sorted atoms and it is excluded from any further operations.
The algorithm is executed again for the new unions u1 and u1 and it 
is repeated until no common atoms can be found.  Then all groups
are joined to the current tree node.
The number of the atoms in the common subset may also be 1.

After all this the same subsets in the tree structure could be converted
to subprocedures, when the state machine is compiled (the secondary
subsets will be separated to different nodes, when the tree is split.
In the first time we could search ALL possible subsets and change all 
subsets consisting of 3 or more atoms to new procedures.
We could keep a global data base or possible subsets, and use it whenever 
a new union is analysed.  The same data base could include also counters
for the existing references in the tree.  Thus the data base could be
used directly when the biggest of most referenced subsets would be 
compiled to separate procedures.

Analysis:  
    Produces probably the optimal solution, but may take a quite 
    long time.  Still this is not a NP- complete problem,  because
    there are many heuristics, that can be used to limit computation. 

Data structures: 
    Atom: (id, count, data pointer)
    Group: link of atoms 
    Union: link list of groups
    Subset: (count, number of atoms (weigth), links to atoms)

Subprocedures:

    SearchBestSubset( u1, s )
        Returns TRUE, if the best subtree was found (saved in s).
        ((This will be the most difficult part of the procedure,
          needs probably hash tables etc.
        The short cut:  
            Stop when  min( a, b ) * (A.weigth + B.weigth) < current maximum
            

    u2 = DivideUnions( u1, s )
        Removes all groups in union u1 having subset s to union u2.

    AddNewGroupToNode( pNode, s, CreateNode )
        Add the given group to the tree and optionally create new
        tree node for it and returns the address of the node.
    
And here is the algorithm:

BuildOptimalTree( pNode, u, s )
//++
    pNode - current node of the tree
    u - current union
    s - static storage for subset and for the sorting of the atoms in unions
--//
{
    while (SearchBestSubset( u, s ))
    {
        BuildOptimalTree( 
            AddNewGroupToNode( pNode, s, TRUE ),
            DivideUnions( u, s ),
            s
            );
    }
    for (all groups left in u)
        AddNewGroupToNode( pNode, s, FALSE );
}


(This will be done as soon as I will have couple extra days,
 ie. when I will retire)
*******************************