The fraud was in the code

Molly White covers some of the code that was entered into evidence at the FTX trial, in the form of GitHub screenshots. Reading it was weird for me because I’ve never before seen a criminal case full of screenshots of software I work on.

Code can be hard to understand, but this code was strong evidence for the prosecution because it was more or less:

if is_alameda_research:
    do_fraud()

New life goal: Never have my code entered into evidence at a criminal trial.

— November 28, 2023

SearchArray

Doug Turnbull created a new Python package called SearchArray for experimenting with search relevance tuning.

It can supercharge any dataframe into a BM25-powered term/phrase index. Under the hood it’s a Pandas extension array backed by a traditional inverted index. Its tokenizers are just Python functions that turn strings into lists of tokens. Its stemmers are just… boring Python packages.
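To make that description concrete, here is a minimal sketch in plain Python of a tokenizer function feeding an inverted index with BM25 scoring. This is not SearchArray's actual API or implementation: the names `tokenize`, `build_index`, and `bm25_scores` are made up for illustration, and the BM25 variant (Lucene-style IDF) and the `k1`/`b` defaults are my own assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """A tokenizer is just a function: string in, list of tokens out."""
    return text.lower().split()


def build_index(docs, tokenizer=tokenize):
    """Build an inverted index mapping term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    doc_lengths = []
    for doc_id, text in enumerate(docs):
        tokens = tokenizer(text)
        doc_lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_lengths


def bm25_scores(term, postings, doc_lengths, k1=1.2, b=0.75):
    """Score every document for a single term (illustrative BM25 variant)."""
    n_docs = len(doc_lengths)
    if n_docs == 0:
        return []
    avg_len = sum(doc_lengths) / n_docs
    matches = postings.get(term, {})
    # Assumed IDF form: the Lucene-style variant, which never goes negative.
    idf = math.log(1 + (n_docs - len(matches) + 0.5) / (len(matches) + 0.5))
    scores = [0.0] * n_docs
    for doc_id, tf in matches.items():
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
        scores[doc_id] = idf * tf * (k1 + 1) / (tf + norm)
    return scores


# Hypothetical toy corpus, one string per "row".
docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "search relevance tuning with pandas",
]
postings, doc_lengths = build_index(docs)
scores = bm25_scores("cat", postings, doc_lengths)
```

The `scores` list has one entry per document, so it could be attached to a dataframe as a column and sorted. Swapping in a different tokenizer just means passing another function to `build_index`.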

Previously, to run a search relevance experiment, I’d have to stand up a bunch of systems. But now, with SearchArray, everything can just run in a single Colab notebook.

I like this idea a lot. A Pandas-based search backend makes a lot of sense for small-scale relevance experiments. When you’ve homed in on something that seems promising, you can translate it to Lucene or whatever.

— November 28, 2023