List, Dict, Set, and Frozen Set Performance in Python

12 Oct 2010

I was wondering today what was the fastest way to check whether a particular string is in a long list of strings. Until discovering the set type, I would sometimes create a dictionary whose keys where the values from my list and values were all ones, because dictionary lookup would clearly be faster than array walking.

I learned about set, and this seemed more elegant (and probably saves memory storage space compared to the dict strategy), so I have been using that lately.

Today, I was curious and wanted to test the performance of each to know for sure, and ending up asking @hmason if she knew a good place to get a random corpus of words. When I told here the purpose, she said frozenset would be the fastest for sure. I had never heard of frozenset. So I threw that into the test as well.

Turns out frozenset and set are the way to go (I am sticking with set because it’s a bit more flexible and pretty close in performance).

Results:

In [1]: import set_list_dict_performance
In [2]: set_list_dict_performance.test_performance()
['frozenset', 0.00027542114257812501]
['set', 0.00042564868927001952]
['dict', 0.00062072277069091797]
['list', 0.098987483978271479]

Comically, you’ll note that I am using the slowest lookup in n_random_words. Should be using set instead of list, clearly! ^_^

Sourceon github

Updates:

Great discussion on stack overflow about this stuff here and modified tests on github here