Half-hearted hash table
- July 8, 2012
Hash tables changed slightly between OCaml 3.12.1 and OCaml 4.00.0. While some care has been taken for forward compatibility, you might encounter strange behavior if you accidentally try to backport a hash table.
Here are two snippets of code:
(* dump.ml *)
let _ =
  let h = Hashtbl.create 2 in
  Hashtbl.add h 23l "Hello";
  Hashtbl.add h 42l "World";
  let oc = open_out_bin "dump" in
  output_value oc h;
  close_out oc
(* read.ml *)
let _ =
  let ic = open_in_bin "dump" in
  let h = (input_value ic : (int32, string) Hashtbl.t) in
  Printf.printf "iter\n%!";
  Hashtbl.iter (fun k v -> Printf.printf "%ld -> %s\n" k v) h;
  Printf.printf "find\n%!";
  let s1 = Hashtbl.find h 23l in
  let s2 = Hashtbl.find h 42l in
  Printf.printf "print\n%!";
  Printf.printf "%s %s\n" s1 s2;
  close_in ic
Now, here is the output I got from running read:
$ ./read
iter
42 -> World
23 -> Hello
find
Fatal error: exception Not_found
What kind of sorcery is this!?
The problem is that I work on two machines, one of which is not mine and is quite hostile. So, instead of setting up my whole build environment on it, I just hacked my path to point to my boss’s OCaml build directory. dump (of course, what I presented here is only a simplification of it) has to run on this machine, because it has a PowerPC architecture, which is useful in this project. However, I run read on my own machine, because it’s much simpler. Both used to run OCaml 3.12.1, since the project can’t be built under 4.00.
However, one day, the boss updated OCaml on the PowerPC machine to 4.00. After that, I re-ran dump, oblivious to that change, and was then a bit puzzled by read’s output! («It used to work!»™)
So, why does Hashtbl.iter behave well, while Hashtbl.find can’t find the keys? It’s just that iter browses through the buckets, ignoring the hash function entirely, while find hashes the key and looks only into the bucket for that particular hash. Since the hash function changed, but not the underlying representation of hash tables, iter succeeds while find fails.
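To make the mechanism concrete, here is a minimal sketch of a toy bucket-array table. It is not the real Hashtbl implementation: the table type, old_hash and new_hash are all hypothetical stand-ins, the last two playing the role of the 3.12.1 and 4.00.0 hash functions. The table is filled with one hash function and queried with the other.

```ocaml
(* A toy hash table: an array of buckets, each an association list.
   This only illustrates the idea; it is NOT how Hashtbl is implemented. *)
type ('k, 'v) table = ('k * 'v) list array

(* Hypothetical stand-ins for the "old" and "new" hash functions. *)
let old_hash k = k
let new_hash k = k + 1

let create n : ('k, 'v) table = Array.make n []

(* Insertion puts the binding in the bucket chosen by [hash]. *)
let add hash (t : ('k, 'v) table) k v =
  let i = hash k mod Array.length t in
  t.(i) <- (k, v) :: t.(i)

(* Iteration walks every bucket and never calls a hash function. *)
let iter f (t : ('k, 'v) table) =
  Array.iter (List.iter (fun (k, v) -> f k v)) t

(* Lookup only inspects the bucket chosen by [hash]. *)
let find hash (t : ('k, 'v) table) k =
  List.assoc k t.(hash k mod Array.length t)

let () =
  let t = create 2 in
  add old_hash t 23 "Hello";
  add old_hash t 42 "World";
  (* Works: every binding is visited regardless of hashing. *)
  iter (fun k v -> Printf.printf "%d -> %s\n" k v) t;
  (* Fails: new_hash sends 23 to the bucket that only holds 42. *)
  (try print_endline (find new_hash t 23)
   with Not_found -> print_endline "Not_found")
```

Looking up with old_hash instead finds both keys, which is the situation before the upgrade.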
Conclusion: Beware when passing serialized data structures between heterogeneous environments. Well, we already knew that, didn’t we? :-)
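One possible workaround on the reading side, sketched under the assumption that traversal still works on the foreign table (as iter did above): collect the bindings by traversal alone, then insert them into a fresh table so that they are hashed with the local hash function. The name rehash is hypothetical, and this has not been tested across OCaml versions.

```ocaml
(* Rebuild a table using only traversal on the input, so that every
   key is re-hashed by the Hashtbl of the running OCaml version.
   [rehash] is a hypothetical helper, not part of the standard library. *)
let rehash h =
  (* fold visits the most recent binding of a key first, so consing
     leaves the oldest binding first in the resulting list. *)
  let bindings = Hashtbl.fold (fun k v acc -> (k, v) :: acc) h [] in
  let fresh = Hashtbl.create (max 1 (List.length bindings)) in
  (* Adding oldest-first preserves the shadowing order of duplicates. *)
  List.iter (fun (k, v) -> Hashtbl.add fresh k v) bindings;
  fresh
```

In read.ml this would be applied right after input_value, before any call to Hashtbl.find. Dumping a plain list of bindings instead of the table itself would avoid the problem at the source.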