I recently published a long tutorial about how and why to build your own query parser for your application’s search feature. I wrote it because I’ve seen the pain of not doing it.
In the first application I worked on in my career, I added full-text search using Lucene. This was before Solr or Elasticsearch, so I learned a lot building my own indexing and search directly. When we started, we used the venerable Lucene QueryParser
. However, we ran into problems – mainly around user input. I looked at how QueryParser
was implemented. It used JavaCC to parse user input and build the underlying Lucene query objects. Following the built-in parser as a model, I built a simpler one that we could safely expose to users. It was my first experience building a parser for real.
I always wanted to build a query parser at my last job, but there was never time. At my current job, we also ran into similar issues. So I decided to write a tutorial.
First, I wanted to write the code. I started reading about parser generators in Ruby (my strongest language right now) and quickly settled on Parslet because of its great documentation and easy-to-use API. Writing the code was surprisingly easy. I ran into hurdles along the way, of course, but I was able to get something working really quickly. Writing tests and building a query generation harness helped me find problems I missed earlier.
Writing the tutorial took longer. I had to do a lot of research into PEG parsers to avoid writing something incorrect. I sent drafts to former coworkers and got some great feedback that I incorporated into the tutorial.
Along with the code for the parser itself and the tutorial, I also wrote a mini publishing system. The tutorial is written in Markdown with a preprocessor to include snippets of code and SVGs (for the diagrams), then put it into an HTML layout. The Markdown is processed by CommonMarker (one of the nicest Markdown parsers) and the source code is colorized by Rouge. Writing this little publishing system reminded me how fun it can be to write software for yourself! It’s janky, but it was fun to write and it gets the job done.