I was wondering today what was the fastest way to check whether a particular string is in a long list of strings. Until discovering the set type, I would sometimes create a dictionary whose keys where the values from my list and values were all ones, because dictionary lookup would clearly be faster than array walking.
I learned about set, and this seemed more elegant (and probably saves memory storage space compared to the dict strategy), so I have been using that lately.
Today, I was curious and wanted to test the performance of each to know for sure, and ending up asking @hmason if she knew a good place to get a random corpus of words. When I told here the purpose, she said frozenset would be the fastest for sure. I had never heard of frozenset. So I threw that into the test as well.
Turns out frozenset and set are the way to go (I am sticking with set because it’s a bit more flexible and pretty close in performance).
In : import set_list_dict_performance In : set_list_dict_performance.test_performance() ['frozenset', 0.00027542114257812501] ['set', 0.00042564868927001952] ['dict', 0.00062072277069091797] ['list', 0.098987483978271479]
Comically, you’ll note that I am using the slowest lookup in n_random_words. Should be using set instead of list, clearly! ^_^