The internal XLISP 'hash' function from 'xlsym.c':
/* hash - hash a symbol name string */ int hash(char *str, int len) { int i; for (i = 0; *str; ) i = (i << 2) ^ *str++; i %= len; return (i < 0 ? -i : i); }
In XLISP this would look like:
(defun lisp-hash (string table-size) (let ((i 0)) (dotimes (index (length string)) (setq i (logxor (bsh i 2) (char-code (char string index))))) (setq i (rem i table-size)) (if (minusp i) (- i) i)))
A hash function is a kind of random
number generator, where the same input always produces the same output
number.
A very simple example:
We want to store
> (setq my-array (make-array 2)) #(NIL NIL) ; NIL NIL = two empty lists
If the array index is computed by the hash function, then the equally distributed numbers make sure that every list will contain approximately the same number of strings:
> (dolist (string '("a" "b" "c" "d") my-array) (push string (aref my-array (hash string (length my-array))))) #(("d" "b") ("c" "a"))
The order of the strings in the array was computed by the hash function, it is not the same order as given to dolist.
> (dolist (string '("a" "b" "c" "d")) (format t "~s = ~s~%" string (aref my-array (hash string (length my-array))))) "a" = ("c" "a") "b" = ("d" "b") "c" = ("c" "a") "d" = ("d" "b") NIL
The hash function will always find the correct list as long as the number of array elements has not changed.
The two main tasks of the hash
Make sure that all lists contain approximately the same number of elements, independent from the characters in the input strings, no matter if the strings are very similar or completely different. With the hash function it will nearly never happen that one list contains all strings while all other lists are empty.
With the same 'name' and '
Now we can find strings stored in lists, but we want to store and find arbitrary things. Therefore we replace the ordinary lists with association lists:
> (setq my-array (make-array 2)) #(() ()) > (dolist (a-cons '(("a" . 1) ("b" . 2) ("c" . 3) ("d" . 4)) my-array) (push a-cons (aref my-array (hash (car a-cons) (length my-array))))) #((("d" . 4) ("b" . 2)) (("c" . 3) ("a" . 1)))
We now have an array like this:
0 |
(("d" . 4) ("b" . 2)) |
||
1 |
(("c" . 3) ("a" . 1)) |
The association lists give the flexibility to store an arbitrary number
of
With a big number of key/value pairs it is faster to keep them in many
small association lists than in one single
With the hash function we find the
association list containing
> (dolist (key '("a" "b" "c" "d")) (format t "~s = ~s~%" key (aref my-array (hash key (length my-array))))) "a" = (("c" . 3) ("a" . 1)) "b" = (("d" . 4) ("b" . 2)) "c" = (("c" . 3) ("a" . 1)) "d" = (("d" . 4) ("b" . 2)) NIL
With the assoc function we find the
> (dolist (key '("a" "b" "c" "d")) (format t "~s = ~s~%" key (assoc key (aref my-array (hash key (length my-array))) :test #'equal))) "a" = ("a" . 1) "b" = ("b" . 2) "c" = ("c" . 3) "d" = ("d" . 4) NIL
With the cdr function we get the value:
> (dolist (key '("a" "b" "c" "d")) (format t "~s = ~s~%" key (cdr (assoc key (aref my-array (hash key (length my-array))) :test #'equal)))) "a" = 1 "b" = 2 "c" = 3 "d" = 4 NIL
And now we have our first working
But we still have one problem.
> (setq my-array (make-array 2)) #(() ()) > (dolist (a-cons '((#\x . 1) ((y z) . 2) (12 . 3) (6.5 . 4)) my-array) (push a-cons (aref my-array (hash (format nil "~s" (car a-cons)) (length my-array))))) #(((12 . 3) (#\x . 1)) ((6.5 . 4) ((Y Z) . 2))) > (dolist (key '(#\x (y z) 12 6.5)) (format t "~s = ~s~%" key (cdr (assoc key (aref my-array (hash (format nil "~s" key) (length my-array))) :test #'equal)))) #\x = 1 (Y Z) = 2 12 = 3 6.5 = 4 NIL
Wonderful.
A final quirk still needs to be solved. Maybe you have noticed the :test
argument to assoc. Like with all Lisp
functions providing :test arguments, the
assoc :test defaults to
eql [because
eq is unreliable with numbers, and
eql is faster
The typical Lisp solution is to provide a :test argument to the
'make-hash-table' function, so the programmer can choose which function to
use.
We have the problem that
(setq my-hash-table (make-hash-table size :test #'equal))
Here the make-hash-table function has no access to the property list of
the '
0 |
|||
1 |
|||
2 |
|||
3 |
|||
|
|||
n |
This is the final layout of our
(defun make-hash-table (size &optional (test #'eql)) (and (< size 1) (error "hash-table minimum size is 1" size)) (let ((hash-table (make-array (1+ size)))) (setf (aref hash-table 0) test) hash-table)) (defun gethash (key hash-table) (let* ((size (1- (length hash-table))) (index (1+ (hash (format nil "~s" key) size))) (a-list (aref hash-table index)) (test (aref hash-table 0))) (cdr (assoc key a-list :test test)))) (defun puthash (key value hash-table) (let* ((size (1- (length hash-table))) (index (1+ (hash (format nil "~s" key) size))) (a-list (aref hash-table index)) (test (aref hash-table 0)) (a-cons (assoc key a-list :test test))) (setf (aref hash-table index) (cons (cons key value) (if a-cons (remove-if #'(lambda (x) (funcall test key (car x))) a-list) a-list))))) (defun remhash (key hash-table) (let* ((size (1- (length hash-table))) (index (1+ (hash (format nil "~s" key) size))) (a-list (aref hash-table index)) (test (aref hash-table 0)) (a-cons (assoc key a-list :test test))) (and a-cons (setf (aref hash-table index) (remove-if #'(lambda (x) (funcall test key (car x))) a-list))) a-cons)) (defun clrhash (hash-table) (let ((size (1- (length hash-table)))) (do ((index 1 (1+ index))) ((> index size)) (setf (aref hash-table index) nil)) hash-table)) (defun hash-table-p (expr) (and (arrayp expr) ; expression is an array (> (length expr) 1) ; with more than one elements (fboundp (aref expr 0)) ; first element is a function (let ((size (1- (length expr)))) ; all other (do ((index 1 (1+ index))) ; elements are lists ((or (> index size) (not (listp (aref expr index)))) (> index size)))))) (defun hash-table-count (hash-table) (let ((size (1- (length hash-table))) (entries 0)) (do ((index 1 (1+ index))) ((> index size)) (setf entries (+ entries (length (aref hash-table index))))) entries)) (defun hash-table-size (hash-table) (1- (length hash-table))) (defun hash-table-test (hash-table) (aref hash-table 0)) (defun print-hash-table (hash-table) (if (not (arrayp hash-table)) (format t ";; Not an array: ~s~%" hash-table) (dotimes (index (length hash-table)) (let ((element (aref hash-table index))) (cond ((not (listp element)) (format t ";; array element ~a: ~s~%" index element)) ((null element) (format t ";; bucket ~a: ()~%" index)) (t (format t ";; bucket ~a:~%" index) (let ((entry-counter 1)) (dolist (entry element) (format t ";; ~a.~a: ~s~%" index entry-counter entry) (incf entry-counter)))))))))