Doug Turnbull created a new Python package called SearchArray for experimenting with search relevance tuning.
It can supercharge any dataframe into a BM25-powered term/phrase index. Under the hood it’s a Pandas extension array backed by a traditional inverted index. Its tokenizers are just Python functions that turn strings into lists of tokens. Its stemmers are just… boring Python packages.
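To make the "inverted index plus BM25" idea concrete, here is a minimal standard-library sketch. It is not SearchArray's code or API: the function names (`whitespace_tokenizer`, `build_index`, `bm25_scores`), the sample documents, and the `k1`/`b` defaults are all assumptions chosen for illustration, and the scoring formula is one common BM25 variant rather than necessarily the one the package uses.

```python
import math
from collections import Counter, defaultdict


def whitespace_tokenizer(text):
    """A tokenizer is just a function from a string to a list of tokens."""
    return text.lower().split()


def build_index(docs, tokenizer=whitespace_tokenizer):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    doc_lens = []
    for doc_id, text in enumerate(docs):
        tokens = tokenizer(text)
        doc_lens.append(len(tokens))
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_lens


def bm25_scores(term, postings, doc_lens, k1=1.2, b=0.75):
    """Return one BM25 score per document for a single query term."""
    n_docs = len(doc_lens)
    if n_docs == 0:
        return []
    avg_len = sum(doc_lens) / n_docs
    matches = postings.get(term, {})
    df = len(matches)  # number of documents containing the term
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    scores = [0.0] * n_docs
    for doc_id, tf in matches.items():
        # Longer-than-average documents are penalised via the b parameter.
        norm = k1 * (1 - b + b * doc_lens[doc_id] / avg_len)
        scores[doc_id] = idf * tf / (tf + norm)
    return scores


# Hypothetical sample data, only for demonstration.
docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "relevance tuning for search",
]
postings, doc_lens = build_index(docs)
scores = bm25_scores("cat", postings, doc_lens)
```

Swapping in a different tokenizer is just a matter of passing another function to `build_index`, which is the property that makes this style of index convenient for quick experiments.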
Previously, to run a search relevance experiment, I’d have to stand up a bunch of systems. But now, with SearchArray, everything can just run in a single Colab notebook.
I like this idea a lot. A Pandas-based search backend makes a lot of sense for small-scale relevance experiments. Once you’ve homed in on something that seems promising, you can translate it to Lucene or whatever.