Compression Digest
compression/_posts/2015-07-11-hash.md
Introduction to Hash Algorithms
[Literal] This document introduces hash functions, which map a set of N items into a structure that allows for O(1) expected time dictionary queries, contrasting with O(log N) time for sorted structures. [AI Synthesis] Hashing offers a trade-off between space complexity (O(N)) and query time.
Key points
- [Literal] Hashing involves two main steps: computing a hash function to transform keys into array indices and a collision-resolution process for keys that map to the same index.
- [Literal] A hash function $h$ maps a universe $U$ of keys to an array $T$ of size $m$, denoted as $h : U \rightarrow \{0, 1, ..., m-1\}$.
- [Literal] The uniform hashing assumption states that any element in $U$ is equally likely to hash into any of the $m$ slots, independently of other elements.
- [Literal] A good hash function must be deterministic, efficient to compute, and uniformly distribute keys across the possible indices to minimize collisions.
- [Literal] The division method, $h(k) = k \pmod{m}$, is a common hashing technique, particularly effective when $m$ is a prime number unrelated to key patterns.
- [Literal] Hashing by multiplication and universal hashing are other methods mentioned.
- [Literal] Collisions occur when two different keys hash to the same slot.
- [Literal] Collision resolution techniques include separate chaining (using linked lists for each slot) and open-addressing methods like linear probing.
- [AI Synthesis] Separate chaining uses extra space for pointers to manage collisions, while open-addressing utilizes empty slots within the table itself.
- [Literal] Hash tables are advantageous because keys don't need to be ordered, and they can achieve constant-time performance in practice.
- [Literal] Hash tables are not ideal for scenarios requiring sorted output, small dictionaries where computing a good hash function is difficult, or real-time systems due to expensive resizing operations.
Patterns / reminders
- [AI Synthesis] The core benefit of hashing is achieving average O(1) lookup time by trading space for speed, making it suitable for large datasets where quick access is paramount.
- [AI Synthesis] The effectiveness of a hash table heavily relies on the quality of the hash function and the chosen table size ($m$) to minimize collisions.
- [AI Synthesis] When choosing between separate chaining and open-addressing, consider memory overhead (pointers vs. empty slots) and the complexity of probing sequences.
Sources
- (Source: raw/_posts/2015-07-11-hash.md)