Crunch Data with furnace-sql

Crunch Data with furnace-sql
Photo by yasin hemmati / Unsplash

Crunching data from my local files, with SQL, is a task that I come across ever so often. I was never happy with the CLI tools out there and I’m not interested in signing up for some cloud product. So, when I got started with bun a few months ago I immediately saw a way to make use of bun:sqlite !

That’s how furnace-sql was born. It’s in an early state and, for now, maybe just useful to me. But, I’m really happy to be back into posting random stuff I found useful to Github/npm.

What Does Furnace SQL Do?

This library exposes a method that lets you load up a bunch of csv or parquet files into a local SQLite database and then query it via TypeScript. Here’s a super simple example, let’s assume we have a bunch of csv files that have these 3 columns:

id,date,amount
abc1,2024-02-01,1000
abc2,2024-02-02,4300
abc3,2024-02-02,120
...etc

And then, let’s load it and crunch it:

import { executeTransformation } from "furnace-sql/dist/src/index";

const myTransformation = await executeTransformation<MyType>(
  {
    transformationScript: "SELECT SUM(amount) FROM transactions;",
    tables: [
      {
        name: "transactions",
        schema: [
          { key: "id", type: "TEXT" }, { key: "date", type: "TEXT" }, { key: "amount", type: "INTEGER" },
        ],
        files: [
          {
            type: "csv",
            path: `data/t1.csv`,
          },
          {
            type: "csv",
            path: `data/t2.csv`,
          },
        ],
      },
    ],
  },
  { debug: true },
);

The variable myTransformation will return an object:

{
   result: MyType[],
   db: Database,
}

Where the result will be an array of items coming from the SELECT statement we used. And the db will return the actual database instance in case you want to do any extra queries.

Beyond Local Querying

A common use case for a lot of systems is to crunch data from various text/log files for analytical or reporting purposes. This kind of stuff is usually non-mission critical and you really only care about two things: implementing this quickly and not paying a lot of money for it.

I believe something like this to be a great fit for that! You can easily integrate it into an existing system that runs on bun or package it up in a simple serverless runtime like Fargate.

I plan to implement a streaming interface in the future, so integrating with something like S3 will be trivial and Lambda will become a viable option as well. This will be much easier to set up and maintain than something like AWS Glue and will come with the advantage of high portability – just move to wherever is cheapest!

Filip Danic

Filip Danic

Software Development Manager at Amazon and recovering JavaScript enthusiast. Sharing my experience and insights with the broader tech community via this blog!
The Netherlands