Theory, meet Practice

Introduction to Algorithms, Cormen et al. (first edition, 1990, page 229-230):

If a malicious adversary chooses the keys to be hashed, then he can choose n keys that all hash to the same slot, yielding and average retrieval time of Θ(n). Any fixed hash function is vulnerable to this sort of worst-case behavior; the only effective way to improve the situation is to choose the hash function randomly in a way that is independent of the keys that are actually going to be stored. This approach, called universal hashing, yields good performance on the average, no matter what keys are chosen by the adversary.

ArsTechnica: Huge portions of the Web vulnerable to hashing denial-of-service attack:

Researchers have shown how a flaw that is common to most popular Web programming languages can be used to launch denial-of-service attacks by exploiting hash tables. Announced publicly on Wednesday at the Chaos Communication Congress event in Germany, the flaw affects a long list of technologies, including PHP, ASP.NET, Java, Python, Ruby, Apache Tomcat, Apache Geronimo, Jetty, and Glassfish, as well as Google’s open source JavaScript engine V8. The vendors and developers behind these technologies are working to close the vulnerability, with Microsoft warning of “imminent public release of exploit code” for what is known as a hash collision attack.

Researchers Alexander Klink and Julian Wälde explained that the theory behind such attacks has been known since at least 2003, when it was described in a paper for the Usenix security conference, and influenced the developers of Perl and CRuby to “change their hash functions to include randomization.”

But Klink and Wälde showed that “PHP 5, Java, ASP.NET as well as V8 are fully vulnerable to this issue and PHP 4, Python and Ruby are partially vulnerable, depending on version or whether the server running the code is a 32-bit or 64-bit machine.”

“This attack is mostly independent of the underlying Web application and just relies on a common fact of how Web application servers typically work,” the team wrote, noting that such attacks would force Web application servers “to use 99% of CPU for several minutes to hours for a single HTTP request.”

“Hash tables are a commonly used data structure in most programming languages,” they explained. “Web application servers or platforms commonly parse attacker-controlled POST form data into hash tables automatically, so that they can be accessed by application developers. If the language does not provide a randomized hash function or the application server does not recognize attacks using multi-collisions, an attacker can degenerate the hash table by sending lots of colliding keys. The algorithmic complexity of inserting n elements into the table then goes to O(n**2), making it possible to exhaust hours of CPU time using a single HTTP request.”

Funny. I was just reading Introduction to Algorithms the other day and and read about this problem, then came across the second link just now on JWZ’s blog.