Designing esgrep’s query language

Ahmad B. Amireh

August 22, 2023

On my third attempt at solving the problem of searching JavaScript source code for statements instead of text, I finally arrived at an interface that is both functional and elegant. It has also stirred some controversy so I’ve set out to put into words (1) why I perceive this to be the best solution in the design space visible to me, and (2) the road that led me to it.

Prelude

The scanner program, esgrep, works by accepting a query from the user that describes a target statement they’re looking for, and then scans the source’s syntax tree for matching constructs to finally print the relevant portions. Implementation aside, the user interface itself, or rather, the way the target is described has been a stubborn problem to address.
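
To make that flow concrete, here is a minimal sketch in TypeScript of what the pipeline amounts to, assuming an ESTree-compatible parser; parseQuery is a hypothetical stand-in for the query compiler this post is about:

import { parse, type Node } from "acorn";
import { full } from "acorn-walk";

// a matcher maps a syntax tree node to a boolean
type Matcher = (node: Node) => boolean;

// hypothetical: compiles the user's query into a matcher (the topic of this post)
declare function parseQuery(query: string): Matcher;

function scan(source: string, query: string): Node[] {
  const matches = parseQuery(query);
  const hits: Node[] = [];

  // visit every node in the tree and collect the ones the query selects
  full(parse(source, { ecmaVersion: "latest" }), (node) => {
    if (matches(node)) hits.push(node);
  });

  return hits;
}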

Let me explain.

If I were to envision the ideal language to host such queries, I would describe it as follows, in order of importance:

  1. Expressive: the user describes their target in clear terms and without ambiguity
  2. Transparent: its syntax does not distract the user from their goal
  3. Simple: the user learns it once and doesn’t need to re-learn it on every use
  4. Expansive: it does not grow in complexity as it grows in functionality
  5. Succinct: it can be used reasonably well on the command-line
  6. Abstract: it does not require a degree to learn, yet neither does it leak implementation details

In pursuit of such a host, I explored several languages and approaches.

SQL was ruled out early on as it can get verbose and would be impractical for use on the command line. Ideally, I want queries to fit on a single line; multiline statements in terminal emulators do not make for the greatest experience.

XPath and CSS selectors

In JavaScript’s ecosystem, CSS selectors are used to query the DOM, which has the shape of a tree. Since the scanner’s input is also a tree, my first attempt was to use CSS selector syntax and semantics to express AST selection. It was a natural first choice because it’d be familiar to my audience.

The general idea, sketched in code right after this list, is to:

  1. map node names to elements, e.g. CallExpression to call
  2. map node properties to attributes, e.g. a FunctionDeclaration that is annotated with {async: true} would map to fun[async]
  3. map sub expressions to axes, e.g. arguments to a CallExpression would map to call > arg
  4. apply filters on elements using the : operator, e.g. a positional filter to select the first argument in a function call would map to call > arg:nth-of-type(1)
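
As a sketch (the table and helper below are invented for illustration, not esgrep's real internals), the first two mappings amount to a lookup from ESTree node types to element names, with node properties read off as attributes:

// invented lookup table: ESTree node types to element names
const elements: Record<string, string> = {
  CallExpression: "call",
  FunctionDeclaration: "fun",
  TemplateLiteral: "tpl",
};

// node properties double as attributes, so fun[async] holds when the
// FunctionDeclaration node carries { async: true }
function hasAttribute(node: Record<string, unknown>, name: string): boolean {
  return Boolean(node[name]);
}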

To judge how well the language can carry the semantics of the scanner, let’s map some AST selections to CSS selectors.

Test #1: a string literal of any value:

str

While probably not the most useful selector on its own, it still mapped well: clear and succinct. What if the target were a string that contains the word “hello”? We’ll extend the str selector with a value attribute that the user can fill in.

Test #2: a string literal containing the word “hello”:

str[value*="hello"]

The introduction of a named attribute (value) means that every user of this selector is now required to remember its name in order to use the selector at all. You could argue that that’s a good thing, or a leak, but let’s reserve judgment.

The scanner works by applying filters to AST nodes, which are functions that map a node to a boolean. But it also has operators that may or may not map to a value. One such operator is ref(), which dereferences a literal value to select its references. We’ll expose it as an element function.
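
In code, that distinction might look like the following sketch (the types and names are mine, for illustration):

import type { Node } from "acorn";

// a filter maps a single node to a boolean
type Filter = (node: Node) => boolean;

// str[value*="hello"] would compile into something along these lines
const helloString: Filter = (node) => {
  const value = (node as { value?: unknown }).value;
  return node.type === "Literal" && typeof value === "string" && value.includes("hello");
};

// an operator such as ref() instead maps a selection to a new selection,
// here from literals to the identifiers that reference them
type Operator = (selection: Node[]) => Node[];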

Test #3: a reference to a string literal that contains the word “hello”:

str[value*="hello"]: ref();

I like it. Composition is one of the defining features of declarative query languages, so let’s add support for a union operator that would select x or y.

Test #4: a string literal that contains the word “hello” or a template literal that contains the word “hello” in one of its spans:

str[value*="hello"], tpl[quasis*="hello"]

Do you notice the second leak in the interface? Suddenly, the user has to know about an implementation detail of the lexical components of a TemplateLiteral and remember it in order to select one. “What’s a quasis?” would be the first thing they’d ask, just as I once did, and it’s not a joy I feel the need to share.

On the positive side, CSS selectors already have union support through the , (comma) operator so the translation mapped nicely. There is also native support for complements through the :not() function, and intersection (on attributes) by conjoining their brackets a la fun[name="x"][async].

Watch, though, how it starts to fall apart when attribute values themselves become selectors. Pick the most basic and common selector, id(), which selects statements with a matching Identifier, and apply it to an attribute.

Test #5: find a call to the function x:

call[receiver=id[value="x"]]

We’re quickly exiting familiar CSS selector territory. I have yet to see such a CSS selector, and the syntax highlighter on this page that is choking on the inner expression id[value="x"] suggests that I never will — this isn’t valid CSS anymore! So, how to solve this? I could get crafty and find ways to extend the syntax so that it closely resembles valid CSS, but it still wouldn’t be valid. And I’d need to communicate that to the user, but that’s not the worst of it: to issue such a query, they must always be aware of this divergence and of how I chose to address it.

You could say that id[value="x"] is overstated and I will agree, but keep in mind that the scanner needs to accept selectors in far more positions than just the root. Attributes should be selectable, with id() or otherwise. If we go back to Test #3 where references to a string were selected, what if we were looking for a call to the function x made with such references as its last argument?

call[receiver=id[value="x"]]
  > arg[value=str[value*="hello"]:ref()]:last()

Yes, it can be expressed. Can we still reasonably compose the str part, though? It might be a template literal, or a string, and we don’t particularly care:

call[receiver=id[value="x"]]
  > arg[
      value=str[value*="hello"]:ref(),
            tpl[quasis*="hello"]:ref()
    ]:last()

Reasonable is a subjective term, but I think it’s fair to say that the syntax has too many primitives. There’s also hardly any consistency – I’d need to constantly consult a manual to write such a query. How do you even indent it? Merely writing this example took substantial mental effort!

Retracing my thoughts: in the process of writing this query, my mind had to juggle the attribute names each node exposes (value, quasis), which filters exist and where a : filter such as :ref() or :last() may attach, and where the union (,) and axis (>) operators are legal, all at once.

I want to minimize this mental load. Perhaps we can do better by converting the argument’s value expression into a descendant of it; then we may employ the axis operator (>) to tidy it:

call[receiver=id[value="x"]]
  > arg:last()
    > str[value*="hello"]:ref(),
      tpl[quasis*="hello"]:ref()

Better! Except that this decision was completely arbitrary and even then, it can only work for nodes like call arguments that comprise exactly one expression.

All in all, this is a painful experience. So maybe hijacking languages made for querying XML nodes was not such a great idea after all. What if we introduce a DSL instead? With complete control over the syntax, surely we can do better.

An ECMAScript replica

Perhaps the biggest advantage to building a Domain Specific Language (DSL) is in the degree to which it lets you tailor an experience. And what I ended up doing counteracted that advantage entirely…

First, I knew what I did not want the language to be: a novel one. This is in no small part due to jq, one of the references I made early in the post. I’ve used it a lot over the years and it’s excellent – once I get it to work. It seemed that no matter how much I used it, I was not becoming adept at it, which prompted me to look into its design and try to learn why.

This is entirely a subjective take and not a formal critique. I find jq to suffer from an identity crisis in its interface: it looks like JavaScript, and Bash, and JSON, while being none of them, which would normally be fine if only the language were more consistent.

For example, every time I use it I find myself asking the same small questions about quoting, piping, and which of several equivalent forms to reach for.

To be fair, a glance at the manual usually unblocks me. My biggest takeaway was that I did not appreciate the many different ways jq lets me do the same thing. A choice with no impact only adds fatigue. And, to a lesser extent, the number of primitives I need to keep within cognitive reach at all times is tiresome.

Okay, so what do I want the language to be, then? What I chose was: a familiar one. Naturally, what’s most familiar to a JavaScript audience is JavaScript itself.

Welcome to my first and biggest mistake…

By voluntarily setting the constraint that the language ought to look like another, I came full circle back to my earlier experience with XPath and CSS. Design decisions were constantly made in light of “Well, how does JavaScript do it?”, followed by a search for an adaptation that may or may not resemble the original. Outside such an artificial constraint, the driving factor would have been: “What is the simplest way to do this?”

This is what it looked like:

// Any string
:string

// Any object with 0 or more properties
:object

// A string or an object (composition)
(:string | :object)

// The "foo" string
"foo"

// Object with 0 properties (i.e. an empty object)
{}

// Object with the "a" property
{ a }

// Object with properties that are neither "a" nor "b"
{ ^a, ^b }

// Call to function "f"
f()

// Call to function "f" on "this"
this.f()

// Call to function "f" on either "this" or "store"
(this | store).f()

// Call to function "then" (e.g. Promise) with the second
// argument being a function that returns another Promise
// using the "new" keyword
**.then(*, :func(*, :of(Promise)))

// an instance of the default export of the module "class.js"
:of(:exportOf(class.js))

// a RegExp literal with "foo" for a pattern
/foo/

// A Link JSX element without an href attribute
<Link ^href />

And this is how it would (somewhat) compare to the previous language:

// a string of any value
:string

// a string with the word "hello":
"hello"

// a string with the word "hello", or any reference to it:
:ref("hello")

// a string or template literal with *exactly* the word "hello":
"hello" | `hello`

// a call to the function x
x()

// a call to the function x with the last argument being a string
// with exactly the word "hello", or any reference to it:
x(**, :ref("hello"))

// the above, but with either a string or a template literal:
x(**, :ref("hello") | :ref(`hello`))

You can see the rest of it in its manual page, which unironically has a preamble on how to read the manual itself. You know what, it might make for a good demo: it looks quirky and curious and inviting. But go ahead and use it, and I guarantee you won’t go back to it. Not willingly, at least.

Again, same or similar symptoms to previous approaches:

  1. mental fatigue during use, and
  2. inadvertent unlearning after use, and
  3. constant uncertainty over whether there were no matches to be found or your query was simply malformed, whatever that may look like.

This pattern would repeat itself: an existing, familiar language translates mostly, but not quite entirely, well into what the scanner needs to receive in order to operate. And I kept resisting the urge to design a new one because of the learning overhead it would impose on users.

That is, until I tried to apply symbolic expressions to the problem. Suddenly, it was no longer possible to resist.

Symbolic expressions (SEXP)

“Elegance is beauty that shows unusual effectiveness and simplicity.” -Wikipedia

I can hardly find a better testament to this statement (emphasis mine) than in what SEXP did to esgrep’s query language problem. I’m under no illusion that the solution is free of shortcomings, but from where I stand, SEXP simply smashes the competition.

Without further ado, behold my newfound love:

; a string of any value
(str)

; a string with the word "hello":
(str /hello/)

; a string with the word "hello", or any reference to it:
(ref (str /hello/))

; a string or a template literal with the word "hello":
(or (str /hello/) (tpl /hello/))

; a call to the function x
(call x)

; a call to the function x with the last argument being a string
; with the word "hello", or any reference to it:
(call x (arg -1 (ref (str /hello/))))

; the above, but with either a string or a template literal:
(call x (arg -1 (ref (or (str /hello/)
                         (tpl /hello/)))))

; the above again, but this time with a superselector to save the day:
(call x (arg -1 (ref (str+ /hello/))))

LOVE, LOVE, LOVE. Listen. The parentheses suck, yes. Look past that, right to where the elegance is bound to strike you: just about everything the scanner performs can be expressed in precisely two syntax primitives: an atom, and a list. An atom is just an atom, while a list may contain atoms or other lists in turn, and onwards to your heart’s desire.

Language construct selectors? A list. Composition? A list of lists. Dereferencing? A list and an operand that selects the referent, which is also a list!

That there are only two primitives has forced me to treat them as the precious commodity they are and to pay close attention to what is and isn’t allowed, and where. I can only regard that as beneficial both to myself as a maintainer and to users, as the coming sections aim to substantiate.
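
For the sake of illustration, the entire grammar can be captured in a two-line type (a sketch; the names are mine):

// the whole grammar in two primitives: an atom, or a list of expressions
type Atom = string;
type SExp = Atom | SExp[];

// (call x (arg -1 (ref (str /hello/)))) parses into nested lists of atoms
const query: SExp = ["call", "x", ["arg", "-1", ["ref", ["str", "/hello/"]]]];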

Atoms first.

Atoms as parameters

Atoms - wherever they appear - represent parameters supplied by the user. This includes identifiers, literal values, locations, and qualifiers, like x, "hello", 1, and async.

The tight syntax leaves little room for ambiguity, which in turn gives me room to optimize for the 80% use case. A prime example of this is how atoms can stand in¹ for identifier selectors: an atom like fetch, that is not surrounded by any symbol, expands into (id fetch):

(call fetch) ; is equivalent to
(call (id fetch))

This is tremendously useful as you have to deal with identifiers very often. It saves you keystrokes and reads better.

When surrounded by "" (double quotes), atoms expand into a string literal selector:

(call fetch (arg 1 "/users")) ; is equivalent to
(call fetch (arg 1 (str /users)))

And when surrounded by // (forward slashes), atoms expand into a regex pattern selector:

(str /foo/i) ; is equivalent to
(str (pat foo i))

And finally, there is the small but potent placeholder atom _ (underscore), which stands for “nothing or anything”:

(call fetch (arg _ "/users")) ; a call to fetch with "/users"
(call _ (arg _ "/users")) ; a call with "/users"

This is actually a keyword that has no corresponding selector, so there’s no expansion at play.
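
Taken together, the expansions are small enough to fit in one sketch (illustrative only; the real tokenizer is more careful than this):

type SExp = string | SExp[];

function expandAtom(atom: string): SExp {
  if (atom === "_") return atom;                  // keyword: no expansion
  if (atom.startsWith('"') && atom.endsWith('"')) {
    return ["str", atom.slice(1, -1)];            // "/users" -> (str /users)
  }
  const pattern = /^\/(.*)\/(\w*)$/.exec(atom);
  if (pattern) {
    return pattern[2]
      ? ["pat", pattern[1], pattern[2]]           // /foo/i -> (pat foo i)
      : ["pat", pattern[1]];                      // /foo/  -> (pat foo)
  }
  return ["id", atom];                            // fetch  -> (id fetch)
}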

Lists as functions

The second primitive of the language is the list, which is denoted by a pair of parentheses (()). Lists are used to represent every operator and filter provided by the scanner, with the name listed as the first element of the list.

Filters pertain to language features and select the target, while operators pertain to scanner features and mutate that selection.
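
Evaluating a list, then, is a dispatch on its first element; a minimal sketch (the registry is invented for illustration):

type SExp = string | SExp[];
type Filter = (node: unknown) => boolean;

// every filter and operator registers a builder under its name
const builders = new Map<string, (args: SExp[]) => Filter>();

function evaluate(expr: SExp): Filter {
  // bare atoms are expanded into lists first, as described in the previous section
  const [head, ...args] = Array.isArray(expr) ? expr : [expr];
  const build = builders.get(head as string);
  if (!build) throw new Error(`unknown selector or operator: ${String(head)}`);
  return build(args);
}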

I’m still debating whether operators should be marked with a reserved symbol, akin to Clojure with its leading : (colon) symbol:

(:ref (str))

That would give me greater room to expand the set of operators without risk of conflict with language-specific filters. At the same time, it adds complexity on the user’s end.

The toll of positional parameters

In contrast to previous iterations, the SEXP variant does not accept keywords for arguments. This is both good and bad. It’s not as leaky, which is good: the user doesn’t need to remember a word like quasis or such. But they still need to know where a parameter is supplied, which I still can’t find a way to avoid.

I did consider a system of inference, where a supplied value aligns to its intended parameter automatically based on its type. For example, consider the (fun) selector signature:

(fun [name] [args] [qualifiers])

Although the first position may legally hold only a name selector, in the following query it can be predictably left blank:

; a function with at least one argument
(fun (arg 1))

By looking at (arg) we can infer that it is only legal in the second parameter position ([args]), so it aligns as the value of that parameter. There is no ambiguity in this case, but there is in others. The inconsistency of only situationally being able to infer, even when properly signalled, is not a price I’m willing to pay.
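
A sketch of what that inference would look like for (fun), with the slot tables invented for illustration:

type SExp = string | SExp[];

// (fun [name] [args] [qualifiers]): which slots may a given list legally fill?
const funSlots = ["name", "args", "qualifiers"];
const legalSlots: Record<string, string[]> = {
  arg: ["args"], // an (arg ...) list is legal only in the [args] position
};

// returns the single slot a value can bind to, or null when the binding is ambiguous
function inferSlot(value: SExp): string | null {
  if (!Array.isArray(value)) return null; // a bare atom could fill several slots
  const candidates = legalSlots[value[0] as string] ?? funSlots;
  return candidates.length === 1 ? candidates[0] : null;
}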

SEXP at scale

One of the qualities I desired in the language was expansiveness: that it can accommodate new functionality without adding to the complexity of use.

Watch how 8 different selectors can scream together into a wail of beauty:

(call (mem t
           (ref
              (call
                (import @canvas/i18n useScope))))
      (arg _ (obj (prop count))))

This query is actually from a real-life case at my previous job, saving me from likely a day’s worth of writing yet another AST scanning program to isolate the statements that were causing a bug. Calls to the function I18n.t() — the one you get by instantiating an object from the useScope method exported by the @canvas/i18n package — were being made with an object that contained a count property of a non-numeric value. The value types were obviously not available to a static analyzer, as they were not literals, but what the analyzer could do was cut the roughly 9,000 calls I had to go through by hand down to a few hundred.

  1. I couldn’t trust my text search to actually find those count properties, as they could be anywhere from 0 to 5 lines away, and it would grab all kinds of garbage along the way anyway.
  2. The alternative was to write an ad-hoc esprima or similar program to find those statements for me, and endure going through the ECMAScript spec and the relevant tooling handbooks yet another time.

So, I decided to (re)build esgrep once and for all.

Polishing a gem

Parentheses… or, the elephant in the room. Formatting a SEXP query is somewhat of an awkward dance. A list expression is defined by a balanced pair of parentheses, and once you’re 5+ levels deep, it starts to get unwieldy.

There are ways to mitigate the formatting problem; it won’t be eliminated altogether, but it can perhaps be turned into a minor problem, or a non-problem. Algorithms like Parinfer and aggressive-indent-mode are possible candidates. The idea is to employ an intelligence layer to aid in (a) indenting, and (b) balancing parens.

Function signatures are another potential usage problem, since they expect parameters at specific positions. I can’t expect users to remember the signatures or positions, but consistency can help a great deal here.

Consistency

I tried my best to establish a few signature rules towards a consistent interface, which can be visualized as follows:

[id XOR value] > [properties] > [descendants] > [qualifiers]

The rules might sound arbitrary when listed like this, and they might be; I arrived at them through intuition. For example, an array literal has no name and no properties; it has nothing, really, outside of its value: its elements, or descendants. So when I found myself using the (arr) selector, my mind expected nothing but the elements of the array, and I declared them first:

(arr [els])

In the case of a CallExpression, the first thing that comes to mind is the name of the function being called (the callee, in formal terms), and then the arguments the call is made with. So the signature becomes:

(call [callee] [args])

Ultimately, my guiding principle and goal is “less interface”. The transparency of the language is key to the usability of the scanner. If one way is decidedly more intuitive, it becomes the way. The less syntax and fewer rules there are in order to use it, the better.

Environment

Rules aside, editors and IDEs can also help improve usability. Something similar to VSCode’s inlay hints that would show the signature of the selector the user is currently building could go a long way.

Macros

Last but not least, on the verbosity front, and for where a query tends to stretch or repeat itself: macros. User-defined macros can reduce cognitive overhead, allow for domain-specific selectors, save keystrokes, and just make for better readability overall.

Take the query shown previously in the scalability section:

(call (mem t
           (ref
              (call
                (import @canvas/i18n useScope))))
      (arg _ (obj (prop count))))

We could redefine the callee in terms of an i18n-t! macro and store it somewhere:

(def i18n-t
     (mem t
          (ref
            (call
             (import @canvas/i18n useScope)))))

And use it later:

(call i18n-t! (arg _ (obj (prop count))))
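
A sketch of the expansion itself, with the macro registry invented for illustration (the ! suffix marking a macro use is read off the example above):

type SExp = string | SExp[];

// user-defined macros, keyed by name
const macros = new Map<string, SExp>();
macros.set("i18n-t", ["mem", "t", ["ref", ["call", ["import", "@canvas/i18n", "useScope"]]]]);

// replaces every name! atom with its stored definition, recursively
function expandMacros(expr: SExp): SExp {
  if (typeof expr === "string") {
    return expr.endsWith("!") ? macros.get(expr.slice(0, -1)) ?? expr : expr;
  }
  return expr.map(expandMacros);
}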

Scoring SEXP

In the beginning, I described what the ideal host for esgrep’s queries would look like. This is how I would grade the SEXP variant on that scale:

  Expressive  = 1
  Transparent = 0 (converging to 1, depending on how accustomed to it you become)
  Simple      = 1
  Expansive   = 1
  Succinct    = 0 (parens can trip you up, and you may need multiple lines)
  Abstract    = 1

Overall, I’m pleased with it. It’s a solid foundation with room to grow. Thanks for reading!


  1. There still is a reason to use the explicit forms of the selectors an atom can expand into. One such reason is to specify qualifiers, which you seldom need to do. As such, there is no hypocrisy in my earlier criticism of jq’s overloads: these expansions are purposeful shorthands, not choices with no impact.↩︎