Querying s-expressions in Common Lisp

Tuesday, April 1, 2008

One downside of using Common Lisp to get things done is that occasionally libraries taken for granted in other languages are missing. I was using the excellent s-xml library to convert XML to s-expressions, when I realized that Common Lisp has no library equivalent in functionality to an implementation of XQuery.

I needed to extract quite a bit of information from the s-expressions I acquired. Using car, cdr, member, and find quickly became infeasible. I had to roll my own query mechanism. It took me about twenty minutes to produce an acceptable solution. It's only around 45 lines of code.

The code is fairly specific to how I parse expressions generated by s-xml. Its symbol handling needs to be modified to fit more general needs. It's an interpreter, not a compiler, so it's not as fast as it could be (though it's fast enough for me, and I deal with quite a bit of data). However, it's a good starting point, and an excellent demonstration of Lisp's expressive power.

Consider the following s-expression:

(setf expr
      '(foo
        (bar baz 1
         (bak 1)
         (bam 2)
	 (bat 3))))

Normally, we'd have to write specialized Lisp code to obtain parts we're interested in. We can now query it directly:

(s-query expr '(bar))
=> ((bar baz 1 (bak 1) (bam 2) (bat 3)))

(s-query expr '(bar bam))
=> ((bam 2))

We can also handle querying attributes (by using keywords) and element contents (represented by last element and queried by using a tilde):

(s-query expr '(bar :baz))
=> (1)

(s-query expr '(bar bam ~))
=> (2)

What we've done so far is useful, but isn't terribly interesting. We provided the query engine with a full path and retrieved the results. In practice things often get more complicated. The query system also supports wildcards via a star selector, and a choice via a list of elements:

(s-query expr '(bar *))
=> ((bak 1) (bam 2) (bat 3))

(s-query expr '(bar (bak bat)))
=> ((bak 1) (bat 3))

We can, of course, continue querying heterogeneous results above for common elements:

(s-query expr '(bar (bak bat) ~))
=> (1 3)

Under 50 lines of code, implemented in possibly less time than it takes to install and understand a library, this is pretty impressive. If I find the time I'll make the code more extensible and robust, convert it to a compiler, and release it as an ASDF-Installable library. In the meantime, I hope people find it useful in it's current form.

Comments?

If you have any questions, comments, or suggestions, please drop a note at coffeemug@gmail.com. I'll be glad to hear your feedback.