Nyquist / XLISP 2.0  -  Contents | Tutorials | Examples | Reference

Hash Tables


The internal XLISP 'hash' function from 'xlsym.c':

/* hash - hash a symbol name string */
int hash(char *str, int len)
{
    int i;
    for (i = 0; *str; )
        i = (i << 2) ^ *str++;
    i %= len;
    return (i < 0 ? -i : i);
}

In XLISP this would look like:

(defun lisp-hash (string table-size)
  (let ((i 0))
    (dotimes (index (length string))
      (setq i (logxor (bsh i 2) (char-code (char string index)))))
    (setq i (rem i table-size))
    (if (minusp i) (- i) i)))

A hash function is a kind of random number generator, where the same input always produces the same output number. The XLISP hash function computes equally distributed integer numbers in a given range from the characters of an input string.

A very simple example:

  1. We want to store 4 strings in 2 lists, stored in 2 array elements:

    > (setq my-array (make-array 2))
    #(NIL NIL)  ; NIL NIL = two empty lists
    

    If the array index is computed by the hash function, then the equally distributed numbers make sure that every list will contain approximately the same number of strings:

    > (dolist (string '("a" "b" "c" "d") my-array)
        (push string (aref my-array (hash string (length my-array)))))
    #(("d" "b") ("c" "a"))
    

    The order of the strings in the array was computed by the hash function, it is not the same order as given to dolist.

  2. If we now search for a string in the lists then the hash function will tell us the number of the array element with the list containing the string because the same input string to the hash function always produces the same output number, as long as the same 'table-size' [the same number of array elements] is used:

    > (dolist (string '("a" "b" "c" "d"))
        (format t "~s = ~s~%" string
                  (aref my-array (hash string (length my-array)))))
    "a" = ("c" "a")
    "b" = ("d" "b")
    "c" = ("c" "a")
    "d" = ("d" "b")
    NIL
    

    The hash function will always find the correct list as long as the number of array elements has not changed.

The two main tasks of the hash function are:

  1. Make sure that all lists contain approximately the same number of elements, independent from the characters in the input strings, no matter if the strings are very similar or completely different. With the hash function it will nearly never happen that one list contains all strings while all other lists are empty.

  2. With the same 'name' and 'table-size' arguments the hash function will always return exactly the same integer number, so a string can always be found no matter in what order the strings are stored in the lists of the array.

Now we can find strings stored in lists, but we want to store and find arbitrary things. Therefore we replace the ordinary lists with association lists:

> (setq my-array (make-array 2))
#(() ())

> (dolist (a-cons '(("a" . 1) ("b" . 2) ("c" . 3) ("d" . 4)) my-array)
    (push a-cons (aref my-array (hash (car a-cons) (length my-array)))))
#((("d" . 4) ("b" . 2)) (("c" . 3) ("a" . 1)))

We now have an array like this:

Array 
  0  Association List 1  →  (("d" . 4) ("b" . 2))
  1  Association List 2  →  (("c" . 3) ("a" . 1))

The association lists give the flexibility to store an arbitrary number of key/value pairs, we are not limited by the fixed number of array elements, while the array together with the hash function gives much more speed than a single association list if we want to manage a big number of key/value pairs.

With a big number of key/value pairs it is faster to keep them in many small association lists than in one single big list. Arrays provide random access, where every element can be accessed in the same time, while a list can only be searched from the beginning up to the matching element. The longer the list, the slower the search becomes.

With the hash function we find the association list containing the key:

> (dolist (key '("a" "b" "c" "d"))
    (format t "~s = ~s~%" key
              (aref my-array (hash key (length my-array)))))
"a" = (("c" . 3) ("a" . 1))
"b" = (("d" . 4) ("b" . 2))
"c" = (("c" . 3) ("a" . 1))
"d" = (("d" . 4) ("b" . 2))
NIL

With the assoc function we find the key/value pair:

> (dolist (key '("a" "b" "c" "d"))
    (format t "~s = ~s~%" key
              (assoc key (aref my-array (hash key (length my-array)))
                     :test #'equal)))
"a" = ("a" . 1)
"b" = ("b" . 2)
"c" = ("c" . 3)
"d" = ("d" . 4)
NIL

With the cdr function we get the value:

> (dolist (key '("a" "b" "c" "d"))
    (format t "~s = ~s~%" key
              (cdr (assoc key (aref my-array (hash key (length my-array)))
                          :test #'equal))))
"a" = 1
"b" = 2
"c" = 3
"d" = 4
NIL

And now we have our first working hash-table.

But we still have one problem. The hash function works only with symbols or strings, while assoc can also work with numbers, strings and even lists as 'key' argument. To make our hash-table work with all types assoc can handle, we must make the hash function happy and convert the 'key' argument with format into a string before computing the hash index:

> (setq my-array (make-array 2))
#(() ())

> (dolist (a-cons '((#\x . 1) ((y z) . 2) (12 . 3) (6.5 . 4)) my-array)
    (push a-cons (aref my-array (hash (format nil "~s" (car a-cons))
                                      (length my-array)))))
#(((12 . 3) (#\x . 1)) ((6.5 . 4) ((Y Z) . 2)))

> (dolist (key '(#\x (y z) 12 6.5))
    (format t "~s = ~s~%" key
              (cdr (assoc key (aref my-array (hash (format nil "~s" key)
                                                   (length my-array)))
                          :test #'equal))))
#\x = 1
(Y Z) = 2
12 = 3
6.5 = 4
NIL

Wonderful.

A final quirk still needs to be solved. Maybe you have noticed the :test argument to assoc. Like with all Lisp functions providing :test arguments, the assoc :test defaults to eql [because eq is unreliable with numbers, and eql is faster than equal], but eql doesn't work with floating-point numbers, strings and lists, so we had to use equal.

The typical Lisp solution is to provide a :test argument to the 'make-hash-table' function, so the programmer can choose which function to use. The :test argument to 'make-hash-table' becomes a property of the hash-table itself, so the :test only needs to be given once, at the time when the hash-table is created, and not every time the hash-table is accessed afterwards.

We have the problem that hash-tables are no built-in XLISP data type and we want use make-hash-table in the same way as make-array:

(setq my-hash-table (make-hash-table size :test #'equal))

Here the make-hash-table function has no access to the property list of the 'my-hash-table' symbol, so the only solution is to make the :test function become part of the hash-table itself:

Array 
  0  Test Function  →  the :test argument to assoc
  1  Association List 1  →  ((key1 . value1) ... (keyN . valueN))
  2  Association List 2  →  ((key1 . value1) ... (keyN . valueN))
  3  Association List 3  →  ((key1 . value1) ... (keyN . valueN))
... 
  n  Association List n  →  ((key1 . value1) ... (keyN . valueN))

This is the final layout of our hash-tables, so we can start to implement the hash-table functions.

  Back to top


make-hash-table


(defun make-hash-table (size &optional (test #'eql))
  (and (< size 1) (error "hash-table minimum size is 1" size))
  (let ((hash-table (make-array (1+ size))))
    (setf (aref hash-table 0) test)
    hash-table))

(defun gethash (key hash-table)
  (let* ((size   (1- (length hash-table)))
         (index  (1+ (hash (format nil "~s" key) size)))
         (a-list (aref hash-table index))
         (test   (aref hash-table 0)))
    (cdr (assoc key a-list :test test))))

(defun puthash (key value hash-table)
  (let* ((size   (1- (length hash-table)))
         (index  (1+ (hash (format nil "~s" key) size)))
         (a-list (aref hash-table index))
         (test   (aref hash-table 0))
         (a-cons (assoc key a-list :test test)))
    (setf (aref hash-table index)
          (cons (cons key value)
                (if a-cons
                    (remove-if #'(lambda (x)
                                   (funcall test key (car x)))
                               a-list)
                    a-list)))))

(defun remhash (key hash-table)
  (let* ((size   (1- (length hash-table)))
         (index  (1+ (hash (format nil "~s" key) size)))
         (a-list (aref hash-table index))
         (test   (aref hash-table 0))
         (a-cons (assoc key a-list :test test)))
    (and a-cons
         (setf (aref hash-table index)
               (remove-if #'(lambda (x)
                              (funcall test key (car x)))
                          a-list)))
    a-cons))

(defun clrhash (hash-table)
  (let ((size (1- (length hash-table))))
    (do ((index 1 (1+ index)))
        ((> index size))
      (setf (aref hash-table index) nil))
    hash-table))

(defun hash-table-p (expr)
  (and (arrayp expr)             ; expression is an array
       (> (length expr) 1)       ; with more than one elements
       (fboundp (aref expr 0))   ; first element is a function
       (let ((size (1- (length expr))))   ; all other
         (do ((index 1 (1+ index)))       ; elements are lists
             ((or (> index size)
                  (not (listp (aref expr index))))
              (> index size))))))

(defun hash-table-count (hash-table)
  (let ((size (1- (length hash-table)))
        (entries 0))
    (do ((index 1 (1+ index)))
        ((> index size))
      (setf entries (+ entries (length (aref hash-table index)))))
    entries))

(defun hash-table-size (hash-table)
  (1- (length hash-table)))

(defun hash-table-test (hash-table)
  (aref hash-table 0))

(defun print-hash-table (hash-table)
  (if (not (arrayp hash-table))
      (format t ";; Not an array: ~s~%" hash-table)
      (dotimes (index (length hash-table))
        (let ((element (aref hash-table index)))
          (cond ((not (listp element))
                 (format t ";; array element ~a: ~s~%" index element))
                ((null element)
                 (format t ";; bucket ~a: ()~%" index))
                (t
                 (format t ";; bucket ~a:~%" index)
                 (let ((entry-counter 1))
                   (dolist (entry element)
                     (format t ";; ~a.~a: ~s~%" index entry-counter entry)
                     (incf entry-counter)))))))))

  Back to top


Nyquist / XLISP 2.0  -  Contents | Tutorials | Examples | Reference