# Hash tables

Hash tables are a real workhorse for programmers: an effective way to store an association between a key and a value. For a small number of pairs a simple association list is quite efficient, but for larger numbers of elements a hash table is usually best, at least until the table grows so large that the computer can no longer cache it, at which point lookups become less efficient. For really large association tables some kind of custom tree structure is usually used instead. Guile has vhashes, a semi-functional hash table system, as well as normal hash tables. Guile's hash tables are customizable, but unfortunately the implementation means that all custom utility functions need to go via C and then back to Scheme, which is both inefficient and semantically inferior; hence any smart custom setup that you think will improve speed ends up much slower than the standard one. The driver behind this investigation was that in the python-on-guile project we want to use the same hash strategy as CPython, translate immediate integers to the same integers, and much more, in order to match the speed of the Python dictionary implementation. So the problem is not semantics, but efficiency.

Another observation is that a move-to-front algorithm, which assumes that one uses only a fraction of the elements of the hash at a time and moves those quickly to the front of the list, is a good strategy and can lead to dramatic speedups compared to not using it.

Finally, it is observed that current CPU architectures have very fast operations on 128-bit registers, e.g. multi-byte comparisons, that one can take advantage of to extremely quickly move elements to the front, delete elements, find free slots, and find matching slots.

In all, we want to make a Scheme/C implementation (which, when Guile has matured, can be pushed to Scheme only) that pushes more code to Scheme but keeps some code in C for optimal efficiency, e.g. code that only handles hash values and never touches the keys.

So for a key-value pair we have the key, k, the value, v, and the hash of the key, h. The underlying C algorithm cannot handle computing the hash from the key, or checking keys for equality, because we want these to be customizable and going via C is not a good option; to maintain efficiency in Guile, that work needs to be pushed to Scheme. The underlying structure of the hash table is a vector of slots,

$$HashTable = [Slot_1, Slot_2, \ldots, Slot_{2^k}]$$

Each Slot has the form,

$$Slot = [BV_{16,8},Head_{8},SCMV_{8}]$$

Where $$BV_{16,8}$$ is a vector of 16 bytes (128 bits in total), each of which is either 0, meaning free, or a 1-byte hash key taking values $$1,\ldots,255$$. The first byte (index 0) represents the cached handle, and the last byte (index 15) indicates the rest list; it is always 0 so that an allocation can point to it while no hash key can ever match it.

$$HEAD$$ (8 bytes on 64-bit platforms) is an SCM value holding the key-value pair that is at the front in the move-to-front order, and $$SCMV$$ is a dynamic SCM vector that can take 4, 8, or 16 items, where the last slot in this vector always represents the rest list. The rest list is used when we have hash clashes, e.g. when two 1-byte hashes are the same for different keys, and also when we have reached the limit of 15 stored pairs in the $$Slot$$.

In a lookup one divides the bits of the hash h into $$[k_0,k_1,k_2]$$, where $$k_2$$ (the least significant part) is used to find the slot in the hash table. $$k_1$$ is then matched (in one SIMD instruction) against all the stored 1-byte hashes in the $$BV$$ vector. If the match is at the front, $$HEAD$$ is returned; this is typically the hot path. Else, if there is a match, the system pulls the pair up from the $$SCMV$$ vector and swaps it to the front (this gives super-fast context switches when one moves from one working set to another). Finally, if there was no match, it dispatches back to the Scheme code, which will look for a match in the rest assoc list. Note that when the Scheme code gets a pair back from the C code, it must still compare the keys to check that the pair is actually the correct one; if the keys match we have found a hit, else the Scheme code looks in the rest list, if it exists.

To see the C code, see the link C_CODE.

To see the Guile Scheme code, see SCM_CODE.

So how does all this look in the basic hash implementation at the Scheme level?

Here is the hash-ref implementation,

    ;; v       : hash table
    ;; k       : key
    ;; h       : hash value (immediate integer)
    ;; eq      : the equality predicate, typically equal? or eq?

    (define-inlinable (stis-c-ref v k default h eq)
      ;; This will look into the rest list to see if there is a match.
      (define (slow-ref) (stis-d-ref v k default h eq))

      ;; stis-b-search is the C code, returns
      ;; #f    : if no match
      ;; pair  : if a candidate is found
      ;; else  : we need to look in the rest list
      ;; aif is an anaphoric if, binding the test value to `it'.
      (aif it (stis-b-search v h)
           (if (pair? it)
               (if (eq (car it) k)
                   (cdr it)
                   (slow-ref))
               (slow-ref))
           default))


Similarly, to set a key-value pair we have the code,

    ;; v       : hash table
    ;; k       : key
    ;; h       : hash value (immediate integer)
    ;; val     : value
    ;; eq      : the equality predicate, typically equal? or eq?

    (define-inlinable (stis-c-set! v k val h eq)
      ;; Use this if we need to add or modify a key-value pair in the rest assoc
      (define (slow-set!) (stis-d-set! v k val h eq))

      ;; stis-b-search is the C lookup function
      (aif it (stis-b-search v h)
           (if (pair? it)
               ;; We found a pair
               (if (eq (car it) k)
                   ;; It is a match
                   (begin
                     (set-cdr! it val)
                     #f)
                   (slow-set!))
               (slow-set!))

           ;; We must add a new pair, use fast C function
           ;; that handles the bit magic
           (let ((it (cons k val)))
             it)))


Now, one should know that this primitive hash table is basically a smob, as we need to be able to use a custom mark procedure for the GC, so we can't use an object or a struct. But in the higher-level interface we use a class to represent the full hash table; here is its specification:

    (define-class <H> () kind m nm km a na ka da name)
    ;; kind : a possibly customized type of hash
    ;; m    : the primitive hash table smob
    ;; nm   : size of hash table
    ;; km   : number of elements in the hash
    ;; a    : a dynamic list of the handles (key-value pairs) in insertion order
    ;; na   : size of this list
    ;; ka   : number of active elements
    ;; da   : number of the active elements that are deleted
    ;; name : get a nice printout of the class with a name for debugging


To make a stis hash table use,

    (make-stis-hash-table kind #:optional (n 32) (m 2) #:key (name #f))
    ;; kind : kind of hash table, i.e. which eq and which hash function to use
    ;; n    : initial size of the table
    ;; m    : 2^m pre-allocated vectors; possible values m = 2, 3, 4
    ;; name : name of hash table for debug purposes and nice printout


With this hash table one creates a new type through,

    (make-stis-hash-type kind hash eq)


With obvious semantics, as it just stores the type together with its accessors (a clashing type will be overwritten).

We also have the old Guile hash table interface,

    ;; ACCESSORS
    (stis-hash-ref h k #:optional (d #f))
    (stis-hash-set! h k v)
    (stis-hash-remove! h k)

    ;; Functional iterators (all much faster than Guile's)
    (stis-hash-for-each f h)
    (stis-hash-fold f s h)
    (stis-hash-map f h)


These accessor constructs are slow, though, and preferably one uses the following macro, which expands the class into variables and uses local variables in the macro-expanded code for full speed (the power of Scheme/Lisp):

    (with-stis-hash ((H1 : ref1)
                     (H2 : ref2 set2)
                     (H3 : ref3 set3 del3)
                     (H4 : ref4 _    del4))
      ...)


As an example, consider a utility to combine two hash tables,

    (define (add a <- b)
      (with-stis-hash ((a : _ set))
        (stis-hash-for-each set b)))


Speed-wise, the functional iterators often increase speed by a large factor, something like 7x. Small hashes can be slightly faster, and we have seen real speedups when we can take advantage of the move-to-front capability for larger hash tables. Using specialized hashes also means super-fast behavior in some cases.

For speed, the ref C functions are accessed through VM instructions. Also, when testing, the hash function was optimized as a special VM instruction; in all we added two VM instructions that JIT correctly.