Set Mania

We are going to explore how set's are handled in guile-log prolog. Lets start off simple.

To define a set just do:

X is {1,2,3}.

X = {1,2,3}

We have added common set operators e.g.

% union e.g. all combined elements (cup in latex).
?- X is {1,2}  {2,3}.

X = {1,2,3}.

% intersection, e.g. all elements located in both sets (cap in latex)
?- X is {1,2}  {2,3}.

X = {2}.

% set addition e.g. all elements of the sets not located in both (oplus in latex)
?- X is {1,2}  {2,3}.

X = {1,3}.

% sed difference e.g. removing all elements in the second set from the first (setminus in latex).
?- X is {1,2}  {2,3}.

X is {1}.

A new set is not nessesary constructed at each operation, usually we take one of the arguments datastructure and modify that. Ouch you may say, well it's not that terrible. We are working with functional hashmaps that mimics a cons list e.g. vhashes E.g. we are building a new datastructure with the old one semantically untouched. Removing an element a is also not that terrible, just add a map a->deleted and only when there is a lot of fluff we reconstruct the whole map. This means that we can efficiently support a functional interface in stead of the standard mutable interface you see in languages like java and C#. Later we will se that new map bindings will override already stored maps. This is simple to implement using these functional hashmaps.

An element as a datastructure will be interpretted value wise, e.g. two lists with the same content will be equal. Any found prolog varibles will be interpreted as an adress identity. This means that if we have an object X put as a set member and it is later bound to 1, then the variable edentity will still remain and the key does not change. So the set element will be flatted into a scheme datastructure. In the future we will introduce a gard symbol so that you can do e.g. X is {[1,guard(X)]}, then with this X will be copied over verbatim. In all we recomend that varibles are not used as set elemets unless you really now what you are doing.

The first special addition we will make is that a det is a union of a finite set and the complement of a finite set e.g. Aᶜ (unicode superscript c) we can then define the set operators as,

(ABᶜ) is (BA)
(ABᶜ) is (AB)

(AᶜBᶜ) is (AB)
(AᶜBᶜ) is (AB)

⊕,∖ follows from these relations. The conclusion is that any such set can indeed be represented as a generilized set where we define a generalized set as either Set = A or Set = Aᶜ. This is well defined. When one wants a a true set from this representatation one takes a world Ω and intersect it with theSet and for the complement you get (Ω∖A).

The next addition to our set datastructure is to assume that the elements in the set can be a key value pair where the key is the usual set element, e.g. to enable the set to be a map. The basic set operations defined on this will always bind the key to the value located for this in the leftmost set where it is defined. We will see how to preserve this operation even for complemented sets, but's lets see the simple things,

% notise how the keys follow the set logic with the addition that leftmost
% value take precedence

X is {1-a,2-a}{2-b,3-b}.

X = {1-a,2-a,3-b}

X is {1-a,2-a}{2-b,3-b}.

X = {2-a}

Now, there is two roads to go from here. Either we are satisfied or we can demand that the information of the complement for a set be recorded in case complements is used e.g. Basically we want this to be true,

% This shows that we want the elements in the complement to be saved later
% for use if used before the other operator.
X is {1-a}, Y is Xᶜ  {1-b}

X = {1-a}.
Y = {1-a}.

% {1-a} is in the complement of X and the system remembers that in
% the calculation of Y.
X is {1-a,2-a}{2-b}, Y is Xᶜ  {1-c}

X = {2-a}.
Y = {1-a}.

If we first forget about complement, we could just focus on set's and simply for each map A also collect the information that have been skipt in the complement you would then simply have another set B, in the complement of A that contains the extra key value information that will later be materlized in taking a complement. We simply realize that we can use the notation A = A∖B and this is our notation which is both true acording to set theory and contains all extra information that is needed for retrieving correct values associated to the keys. The union and intersection becomes:

(A1B1)(A2B2) = (A1A2)(B1(A1A2)B2).
(A1B1)(A2B2) = (A1A2)((B1A2)(B2A1)).

Note that the order is important here, this means that basically every set element that has ever been touched uppon is stored in either A or B which is kind of expensive. Therefore this is an issue that is critical to control and it is possible to inhibit this feature. Therefore it is good to separate a generlized set/map in one set part and one map part and therfore we end up with the following representation:

% A is a pure map, B is a pure map and C is a pure set,
% The set representation is then A∖B∪C, with A∩B=∅,
% the union and intersection becomes
(A1B1  C1)  (A2B2  C2) = (   ((A1A2)(A1C2)(A2C1))
                                    ((B1(A1A2)B2)(A1C2)(A2C1))
                                   )  (C1  C2).

(A1B1  C1)  (A2B2  C2) = (   (A1A2)
                                     ((B1A2)(B2A1))
                               )  (C1  C2).

When we finally add complemented sets we end up with the representation:

% A is a pure map, B is a pure map and C is a pure set, 
% For repr A∖B∪C:  A∩B=∅.
% For repr A∖B∪Cᶜ: A∩B=∅.

(AB  Cᶜ) = (BAᶜ)C = (BC)(A(BC))  (CA)
(AB  C)  = (BAᶜ)Cᶜ = (BC)(A(BC))(AC)

And we see that the operation of complement is closed, e.g. taking the complement the representation still holds.

The intersection becomes:

(A1B1  C1)  (A2B2  C2) = (   
                                   ((A1A2)(A1C2)(A2C1))
                                  ((B1(A1A2)B2)(A1C2)(A2C1)))
                               )  (C1C2)

(A1B1  C1)  (A2B2  C2ᶜ) = (   
                                   ((A1A2)(A1C2)(A2C1))
                                  ((B1(A1A2)B2)(A1C2)(A2C1)))
                               )  (C1C2)

(A1B1  C1ᶜ)  (A2B2  C2) = (   
                                   ((A1A2)(A1C2)(A2C1))
                                  ((B1(A1A2)B2)(A1C2)(A2C1)))
                               )  (C2C1)
(A1B1  C1ᶜ)  (A2B2  C2ᶜ) = (
                                   ((A1A2)(A1C2)(A2C1))
                                  ((B1(A1A2)B2)(A1C2)(A2C1)))
                                )  (C1C2)

E.g. The representation is closed under intersection.

Finally we need to check it for union as well e.g.

(A1B1  C1)  (A2B2  C2) = (
                                  (A1A2)
                                 ((B1A2)(B2A1))
                               )  (C1  C2)

(A1B1  C1)  (A2B2  C2ᶜ) = (
                                  (A1A2)
                                 ((B1A2)(B2A1))
                                )  (C2C1)

(A1B1  C1ᶜ)  (A2B2  C2) = (
                                  (A1A2)
                                 ((B1A2)(B2A1))
                                )  (C1C2)

(A1B1  C1ᶜ)  (A2B2  C2ᶜ) = (
                                   (A1A2)
                                  ((B1A2)(B2A1))
                                )  (C1C2)

And this is also closed. Because all other operations can be expressed from these operations the system is closed under all set operations.

Testing for an element etc is done through ("in" in latex) , the semantics is easiest to show with examples:

?- X is {1,2-a,3-a}, K  X.

K = 2.

K = 3.

K = 1.


?- X is {1,2-a,3-a}, 1  X.

yes

?- X is {1,2-a,3-a}, 4  X.

no

?- X is {1,2-a,3-a}, 1-V  X.

V = #f.

?- X is {1,2-a,3-a}, 2-V  X.

V = a.

?- X is {1,2-a,3-a}, K-V  X.

K = 2,
V = a.

K = 3,
V = a.

K = 1,
V = #f.

Now ordering of the entering objects is preserved for pure sets and pure map sets. And this is in the spec. Actually the underlying datastructure allows to tag a set element or a key value pair as unordered e.g. some optimisations can be done if we relax the ordering condition, taking advantage of this will come in the future. Let's in stead look at prolog code in action using the set framework. First list_to_set (defined in the module):

list_to_set(L,S) :-
  reverse(L,LL),
  list_to_set(LL,,S).

list_to_set([],S,S).

list_to_set([X|L],S,SS) :-
 (
  (\+var(X), X = K-V) ->
    S2 is {K-V}  S;
    S2 is {X}    S
 ),
 list_to_set(L,S2,SS).

?- list_to_set([1,2],S).

 S = {1,2}.

Simple as a pancake. Now here is a useful procedure, mapset:

mapset(S,Map,SS) :- 
  findall(V, (_-V  (Map  S), V \= #f),L), 
  list_to_set(L,SS).

?- X is {2,1},Map is {1-2,2-1,3-0}, mapset(X,Map,S).

S = {2,1}.

Note that we take the intersection with S last. This will mean that Map decides the order of the resulting set, but in case S containes a key value pair Map take precidence and this is more important.

We define an equal function for sets, and this means that =,== etc do works as expected up to a certain level. The value part is not seen by equal? e.g.:

?- X is {1}, Y is {1-1}, X==Y.

yes

This might be unsatisfying and it is unsure what to do, probably we will define a fluid that specify wha kind of equal we would like to have for set's (there is many thinkable variants).

Also worth noting is that testing to find that two sets are different are cheap, it is when two sets are equal, that generally we need to check n times. We do that by using a nifty hash and explore the properties of the xorfuntion. Namely:

xor(a1,a2,a3,...) = ? % does not depend of the order
xor(a1,a1)        = 0
xor(0 ,a )        = a

This means that maintaining the hash for deletion and addition is a simple one op. So a ~64bit hash is always maintaind for the set and they most often different for different sets. This means that set of sets is much more effective than what you first would think.

Finally we have the subset operators: ⊆,⊂ (subseteq, subset in latex) The second one test for subset or equal and the second one for a pure subset e.g:

?- {1,2}  {1}.

no

?- {1,2}  {1,2}.

yes

?- {1,2}  {1,2,3}.

yes

?- {1,2}  {1}.

no

?- {1,2}  {1,2}.

no

?- {1,2}  {1,2,3}.

yes

Happy Hacking.

links

social