SQL in io-ts, Part One: Mutual Recursion & Sub-Select

August 10, 2022

Recently, my team and I successfully completed a daunting task. We were tasked with solving a problem that involved enhancing our application’s security. At the time the front-end was sending raw SQL statements to a third-party API for data retrieval. We needed to create a new flow where the backend would inspect the query before proxying the request to the existing API.

By interjecting the backend into the flow, we were able to accomplish two key objectives:

Restrict the input with additional business rules;
Restrict access to data by applying object-level permissions.

For our efforts to be successful, it was paramount that the front-end would be open to adopting this new endpoint. We knew that the front-end could already send any arbitrary query to the existing API. Therefore, we concluded the only way we could get adoption is to ensure the interface could continue to support any arbitrary query.

To adopt this endpoint, the front-end was asked to change a raw SQL string into a structured JSON, such as:

SELECT * FROM my_table

{
  "select": [{
    "column": "*"
  }],
  "from": [{
    "operator": "FROM",
    "tableName": "my_table"
  }]
}

In this series of articles, you’ll learn about how we updated our Nestjs backend by adding a new POST endpoint having a fully documented, validated, and parsed input JSON body with complete SQL support.

In this first article, you’ll get a glimpse of the io-ts schema we created. In later articles, we’ll explore more details of the schema.

Along the way you’ll read about how we overcame a repeating challenge in this endeavor: mutual recursion.

What is Mutual Recursion?

If you’re unfamiliar with the term mutual recursion, here is a definition from Wikipedia:

In mathematics and computer science, mutual recursion is a form of recursion where two mathematical or computational objects, such as functions or datatypes, are defined in terms of each other.

You might have encountered recursion before and probably have written recursive algorithms. For example, you can implement the math operator modulo with a recursive function. Another common implementation of a function utilizing recursion is tree traversal.

A straightforward example of recursion is:

// hello callstack limit
const foo = () => {
  foo();
}

The kind of recursion shown above is not too difficult to achieve: the function foo is allowed to call itself because foo is in scope.

Mutual recursion adds another wrinkle. With mutual recursion, two or more functions or data types not only refer to themselves but also to each other.

Now, consider the goal at hand and the complexity of the problem becomes apparent: define a JSON structure that supports complete SQL syntax.

Consider some of these possibilities when writing a SQL SELECT statement. Such a statement could have a great deal of mutual recursion such as:

😁 SELECT some columns FROM a table (maybe);

🤔 Some of those columns could be nested SELECT statements themselves;

😖 Some of those tables might also be nested SELECT statements;

😵 The statement might begin with a WITH clause, which is a collection of all the above, accessible anywhere in the SELECT statement;

☠ By the way WITH also supports the RECURSIVE keyword, allowing for internal recursion.

This is just the beginning of describing a SQL SELECT statement.

While trying to implement this complex input, we ran into a limitation with the Nestjs framework.

Decorators Do Not Support Mutual Recursion

When we tried to define the structure of a SQL statement in our Nestjs data transfer objects (DTOs), we immediately encountered a limitation.

Nestjs controllers rely heavily on decorators for both API documentation and validation but unfortunately, decorators do not support mutual recursion.

The problem lies with how mutual recursion is solved: returning the reference to the mutually recursive class lazily requires a deferred function call. If you don’t do this, you inevitably encounter a runtime error trying to access a value in the temporal dead zone.

This limitation stems from the lack of support of decorators in class expressions.

Because of this limitation, we could not accurately express the entire SQL structure with using the traditional Nestjs decorator-based approach.

Validation Solution

It is unfortunate that the SQL endpoint could not use the existing and amazing class validators we had used elsewhere in the Nestjs application. We needed another validator that could handle mutual recursion.

Arriving at the final solution took a couple of attempts to get right.

Attempt One: Joi

Inspired by the Nestjs documentation, we initially tried using joi to validate the incoming SQL request. We followed the existing examples by creating joi schemas and validation pipes.

At first blush, the outcome was actually quite pleasant.

Since joi has support for mutual recursion, we were able to express complex, recursive SQL structures in our validation schemas. Additionally, the validation errors were very succinct and helpful.

Unfortunately, this approach created a new problem.

Since we wanted to use the OpenAPI decorators to generate Swagger documentation, we still needed to create DTOs for our application.

The problem with the joi approach is that if either the DTOs or the joi validation schema needed to change, both needed to be kept in sync. To make matters worse, there was no compiler support to prevent drift between the two structures.

Attempt Two: io-ts

To ensure a more stable solution, we wanted to rely on the TypeScript compiler to flag the code in places where the two structures were out of sync.

To achieve this, we scrapped the joi schema approach and implemented the validations with another npm package: io-ts. The motivation for choosing io-ts is the package’s clever use of TypeScript’s type assertions to provide both static type checking and runtime type checking.

In making this move, we defined the specifications of the SQL structure using the available io-ts types and combinators. From this, we were then able to infer the TypeScript definition of those types. Finally, to achieve the goal of tying the DTOs to the validations, we applied the implements keyword.

Although we still had to keep both the io-ts schema and the DTOs in sync, the new workflow was more palatable. Any change introduced in the schema would cause the compiler to complain until the implementation was updated.

The Sub-Select Schema

When defining the interface for our SQL endpoint, we will start by introducing the structure of the sub-select.

🍿 In later articles, we’ll explore in greater detail the more interesting shapes such as ExpressionItem and FromItems.

This structure describes the top-level properties of the standard SQL SELECT clauses you’d expect to see: WITH, SELECT (optionally DISTINCT ), FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT and OFFSET.

There are two parts to this snippet: the SubSelect interface and the io-ts runtime shape TSubSelect, as shown below:

import * as t from 'io-ts';

export interface SubSelect {
  with?: WithItem[];
  select: SelectItem[];
  distinct?: boolean;
  from?: FromItems;
  where?: ExpressionItem[];
  groupBy?: ExpressionItem[];
  having?: ExpressionItem[];
  orderBy?: OrderByItem[];
  limit?: number | null;
  offset?: number | null;
}

export const TSubSelect: t.Type<SubSelect> = t.recursion('SubSelect', () =>
  t.intersection([
    t.type({
      select: t.array(TSelectItem),
    }),
    t.partial({
      with: t.array(TWithItem),
      distinct: t.boolean,
      from: TFromItems,
      where: t.array(TExpressionItem),
      groupBy: t.array(TExpressionItem),
      having: t.array(TExpressionItem),
      orderBy: t.array(TOrderByItem),
      limit: t.union([t.number, t.null]),
      offset: t.union([t.number, t.null]),
    }),
  ]),
);

😵 There is a lot to unpack here! We’ll go through various examples of each of these structures to further explain what’s going on. To further understand this structure, begin reading the innermost properties and then expand outward.

If this is your first time seeing an io-ts example, then the above is a great introduction for you. Basically, every foundational element you would need is present.

This table compares each property’s TypeScript type against its corresponding io-ts type:

Property	Description	Typescript	io-ts
with	Array of CTEs	WithItem[]	t.array(TWithItem)
select	Array of columns	SelectItem[]	t.array(TSelectItem)
distinct	Adds DISTINCT	boolean	t.boolean
from	Array of tables	FromItems	TFromItems
where	Array of expressions	ExpressionItem[]	t.array(TExpressionItem)
groupBy	Array of expressions	ExpressionItem[]	t.array(TExpressionItem)
having	Array of expressions	ExpressionItem[]	t.array(TExpressionItem)
orderBy	Array of sorts	OrderByItem[]	t.array(TOrderByItem)
limit	Adds LIMIT	limit \| null	t.union([t.number, t.null])
offset	Adds OFFSET	limit \| null	t.union([t.number, t.null])

These define the properties of an object, and objects can be made with two io-ts types:

t.type Defines an object where each property is required;
t.partial Defines an object where each property is optional.

As you can see by our TypeScript interface, we want the select property to be required but the remaining properties are optional. This is done in TypeScript by specifying ? after a property.

To make a composite object that has a mixture of required and optional properties we create an intersection of the t.type and t.partial interfaces using t.intersection .

Lastly, when creating a recursive structure, we must first define the interface and then we re-implement that in io-ts by creating a function that can lazily return the shape of the SubSelect. This is accomplished using the t.recursion type.

😬 Unfortunately, both parts are necessary because io-ts cannot implement recursion without a hint.

While it is unfortunate that we cannot avoid some repetition, we can ensure that the two structures are equivalent. Specifying the t.Type<> type annotation on the left-hand side of the assignment operation causes the TypeScript compiler to verify the value being assigned is compatible with that type. In this case, the type of the return value is inferred and then checked against the static template.

We’ve covered the top-level properties of the SubSelect at a high level. We’ve also introduced the basics of io-ts shapes. In later articles, we’ll explore more details about the SubSelect. Finally, we’ll build on this foundation to create even more complex structures and achieve the FullSelect.

Example

Here is an example of the types of queries the schema supports along with the generated SQL the parser produces.

This SQL SELECT statement will return the estimated number of rows in a table. The query relies on physical disk usage attributes to compute an estimate.

SELECT
  CAST(
    CASE
      WHEN reltuples < 0
        THEN NULL
      WHEN relpages = 0
        THEN CAST(0 AS REAL)
      ELSE
        reltuples / relpages
    END * (
      PG_RELATION_SIZE(oid)
      / CAST(CURRENT_SETTING('block_size') AS INTEGER)
    ) AS BIGINT
  )
FROM pg_catalog.pg_class
WHERE oid = CAST('my_table.my_schema' AS REGCLASS)

To implement this query, the schema looks like this:

{
  "select": [
    {
      "operator": "CAST",
      "expression": {
        "operator": "*",
        "source": {
          "operator": "CASE",
          "when": [
            {
              "where": {
                "operator": "LT",
                "source": { "column": "reltuples" },
                "target": { "value": 0 }
              },
              "then": { "value": null }
            },
            {
              "where": {
                "operator": "EQ",
                "source": { "column": "relpages" },
                "target": { "value": 0 }
              },
              "then": {
                "operator": "CAST",
                "expression": { "value": 0 },
                "dataType": "REAL"
              }
            }
          ],
          "else": {
            "operator": "/",
            "source": { "column": "reltuples" },
            "target": { "column": "relpages" }
          }
        },
        "target": {
          "operator": "()",
          "value": {
            "operator": "/",
            "source": {
              "functionName": "PG_RELATION_SIZE",
              "arguments": [{ "column": "oid" }]
            },
            "target": {
              "operator": "CAST",
              "expression": {
                "functionName": "CURRENT_SETTING",
                "arguments": [{ "value": "block_size" }]
              },
              "dataType": "INTEGER"
            }
          }
        }
      },
      "dataType": "BIGINT"
    }
  ],
  "from": [
    {
      "alias": "",
      "operator": "FROM",
      "allowedTableName": "pg_catalog.pg_class"
    }
  ],
  "where": [
    {
      "operator": "EQ",
      "source": { "column": "oid" },
      "target": {
        "operator": "CAST",
        "expression": { "value": "my_schema.my_table" },
        "dataType": "REGCLASS"
      }
    }
  ]
}

What’s Next?

Building the schema for the SQL endpoint was both challenging and rewarding. Along the way, we learned how to build immensely rich and detailed io-ts schemas. We also learned more about previously unexplored SQL syntax.

The schema we created employed a number of techniques such as enumerations and discriminating unions, intersections of unions, and mutual recursion. It is highly extensible and quite fun to maintain.

What’s more, the front-end was able to consume this new API and convert all of their existing, raw SQL strings into structured JSON requests.

In the next article, we’ll dive deeper into the components that define the schema. In later articles, we’ll explore how the structure is parsed and the SQL string is created. I can tell you that we would never have built this level of capability if the parser was difficult to develop. When you see the parser, you’ll be amazed at how very little code was required to add all these features.

Even later articles will explore the highly detailed Swagger documentation we generated through Nestjs Swagger decorators.