URL: https://spql-lang.com/
We're in pre-alpha mode. Got any feedback? florents@tselai.com
SPQL

like sed, awk, and jq, but for LLMs.


echo '"https://www.reddit.com/r/python/top.json"' |
spql '
    # Issue a get request
    http_get |
    # Process the json response with standard jq
    .data.children[] | 
    {
        "title": .data.title, 
        "author": .data.author,
        "title_tokens": .data.title | tokenize("hf-internal-testing/tiny-random-gpt2") ,
        "title_prompt": (.data.title | prompt("llama2c")),
        "title_embedding_4": (.data.title | embed("hf-internal-testing/tiny-random-gpt2") | .[0:4]),
        "title_embedding_ndims": (.data.title | embed("hf-internal-testing/tiny-random-gpt2") | ndims)
    }
'


spql is a command line tool, programming language and database extension for
working with LLMs.

spql’s goal is to allow the creation of composable pipelines of vectors in a
minimal but powerful way. It is equally usable and flexible across different
execution contexts.

It borrows from the Unix philosophy, but with a twist: it treats vectors, not
text, as the universal interface. In an LLM world, however, text and vectors are
not that different, are they?

The pattern is clear: sed, awk, grep, and friends worked for text. jq worked for
JSON data. spql works on vectors.

In the snippet above, notice how spql glues things together:

 * Everything is executed in the shell.

 * The http_get function runs within the program itself (powered by curl).

 * The tokenizer and the embedding model are written in Python and provided by
   Hugging Face.

 * The llama2c model is coded in C and is local.

 * Input and output are JSON.


🚀 GETTING STARTED


INSTALLATION

In the current alpha version, the easiest way to try out spql is to wrap its
docker containers in bash functions.

Running these in your shell will provide you with a spql bash function and a
sqlite3 bash function backed by an instance pre-bundled with spql.

function spql() {
  docker run -v "$HOME/.cache/huggingface:/.cache/huggingface" -i florents/spql:v0.1.0a1 "$@"
}



function sqlite3() {
  docker run -i --entrypoint /usr/bin/sqlite3 florents/spql:v0.1.0a1 "$@"
}



This will allow you to replicate the examples below.


USAGE

Here’s how spql queries can be executed in different contexts:

Shell

echo '[1,2,3]' | spql 'vector | ndims'
echo '"Hello world"' | spql 'prompt'
echo '"Hello world"' | spql 'embed("hf-internal-testing/tiny-random-gpt2")' 


SQLite

sqlite3 <<SQL
.load spqlite

select spql('[1,2,3]', 'vector | ndims');
select spql('"Hello world"', 'prompt');
select spql('"Hello world"', 'embed("hf-internal-testing/tiny-random-gpt2")');

SQL


Postgres

Not available yet; coming soon. The intended usage is:

psql <<SQL

create extension spql;
select spql('[1,2,3]'::jsonb, 'vector | ndims');
select spql('"Hello world"'::jsonb, 'prompt');
select spql('"Hello world"'::jsonb, 'embed("hf-internal-testing/tiny-random-gpt2")');

SQL


DuckDB

Not available yet; coming soon. The intended usage is:

duckdb <<SQL
INSTALL spql;
LOAD spql;

select spql('[1,2,3]', 'vector | ndims');
select spql('"Hello world"', 'prompt');
select spql('"Hello world"', 'embed("hf-internal-testing/tiny-random-gpt2")');

SQL




🌟 FEATURES

spql is equally usable as a CLI tool, a programming language and as a database
extension.

Here are the features:

 * Standard library of vector operations: L2 distance, inner product, and cosine
   distance

 * Interact with LLMs to generate, embed, and tokenize content.

 * Support for both local and remote LLMs.

 * Support for sentence-transformers and Hugging Face transformers.

 * Support for llamafile models.

 * Out-of-the-box extension for SQLite, with Postgres and DuckDB coming soon.

 * 100% jq-compliant. Your existing jq programs continue to work. Your jq-fu can
   still make you a codegolf superstar.


🧱 EXAMPLES

spql simply extends jq by adding some custom types and functions.

This means that standard jq syntax and functionality are 100% available, so
jq’s manual is a good place to start.

The examples below showcase some of these types and functions, but they are
mostly here to whet your appetite.

See the Reference for details.


PROMPTS

Try multiple prompts in one go:

echo '["Hello", "Hi", "Howdy", "¡Hola!", "Γειά σου μαν μου"]' | 
spql '
    .[] | 
    {
        "prompt": ., 
        "response": prompt
    }
'




PIPE | MODELS TOGETHER

Now this is cool: You can pipe different models together in a Unix-like fashion.

echo '"Tell me a story"' |
spql 'prompt("llama2c")' |
spql -c 'embed("hf-internal-testing/tiny-random-gpt2")'




EMBEDDINGS

echo '
[
  ["King", "Queen"],
  ["Table", "Tableau"],
  ["Dog", "Hot"]
]
' | 
spql 'map([ (.[0] | embed) , (.[1] | embed) ])'



spql -n '[1,2,3,4] | l2_norm'



spql -n '[1,2,3,4] | l2_distance([11,22,33,43])'



spql -n '[1,2,3,4] | dot_product([1,3,5,40])'



spql -n '[1,2,3,4] | cosine_similarity([1,3,5,40])'
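
As a sanity check on the four vector calls above, here is a minimal plain-Python sketch of the textbook formulas they are named after. It does not call spql; the function names only mirror the spql ones, and it is an assumption that spql's built-ins follow these standard definitions.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def l2_norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot_product(a, a))


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot_product(a, b) / (l2_norm(a) * l2_norm(b))


v = [1, 2, 3, 4]
print(l2_norm(v))                             # sqrt(30)
print(l2_distance(v, [11, 22, 33, 43]))       # sqrt(2921)
print(dot_product(v, [1, 3, 5, 40]))          # 182
print(cosine_similarity(v, [1, 3, 5, 40]))    # 182 / (sqrt(30) * sqrt(1635))
```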



Issue a GET request, extract a piece of JSON with standard jq, embed it, and
compute the L2 norm of the embedding.

echo '"https://www.reddit.com/r/llm/top.json"' |
spql '
    http_get |
    .data.children[0].data.title |
    embed("hf-internal-testing/tiny-random-gpt2") |
    l2_norm
'




COMPLEX PROGRAMS

In the previous example we embedded only the first element in the children
array. To embed the title of each element we could do it like this:

echo '"https://www.reddit.com/r/python/top.json"' |
spql '
    # Issue a get request
    http_get |
    # Process the json response with standard jq
    .data.children[] | 
    {
        "title": .data.title, 
        "author": .data.author,
        "title_tokens": .data.title | tokenize ,
        "title_prompt": (.data.title | prompt("llama2c")),
        "title_embedding_4": (.data.title | embed | .[0:4]),
        "title_embedding_ndims": (.data.title | embed("hf-internal-testing/tiny-random-gpt2") | ndims)
    }
'




PROGRAM FROM FILES

For complex programs you can use the -f src.jq argument to execute programs
written in files.

cat << EOF > /tmp/myprogram.jq
. | length
EOF



echo '[1,2,3]' | spql -f /tmp/myprogram.jq




PASSING --ARGUMENTS

Here’s an example shell script that iterates over the lines of a CSV file and
generates an embedding for its title column. Notice how bash variables can be
passed to the program using --arg.

cat <<EOF > /tmp/spql.csv
id,name,title
1,John Doe,Software Engineer
2,Jane Doe,Data Scientist
3,James Doe,Product Manager
EOF



# Skip the header line using tail -n +2
tail -n +2 "/tmp/spql.csv" | while IFS=, read -r id name title
do
  spql --arg id "$id" --arg name "$name" --arg title "$title" -n '{
    id: $id,
    name: $name,
    title: $title,
    title_embedding: ($title | embed)
  }'
done
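
The loop above starts one spql process per row. An alternative, sketched below in plain Python, is to convert the CSV into one JSON object per line first. The helper name csv_to_json_lines is hypothetical, and whether spql reads a stream of JSON values from stdin the way jq does is an assumption that has not been checked here.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Turn CSV text with a header row into a list of JSON object strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every CSV field arrives as a string, just like values passed via --arg.
    return [json.dumps(dict(row), ensure_ascii=False) for row in reader]


sample = """id,name,title
1,John Doe,Software Engineer
2,Jane Doe,Data Scientist
3,James Doe,Product Manager
"""

for line in csv_to_json_lines(sample):
    print(line)
```

If the streaming assumption holds, the printed lines could be piped into a single invocation along the lines of spql '. + {title_embedding: (.title | embed)}'.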




SQLITE

Let’s generate some alternative openings for classic books:

sqlite3 <<SQL
.load spqlite

CREATE TABLE books
(
    author  TEXT,
    year    INTEGER,
    title   TEXT,
    opening TEXT
);

INSERT INTO books (author, year, title, opening)
VALUES ('"Jane Austen"', 1813, 'Pride and Prejudice',
        '"It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."'),
       ('"Charles Dickens"', 1859, 'A Tale of Two Cities',
        '"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness..."'),
       ('"Herman Melville"', 1851, 'Moby-Dick', '"Call me Ishmael."'),
       ('"Leo Tolstoy"', 1869, 'War and Peace',
        '"Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes."'),
       ('"Mary Shelley"', 1818, 'Frankenstein',
        '"You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings."'),
       ('"F. Scott Fitzgerald"', 1925, 'The Great Gatsby',
        '"In my younger and more vulnerable years my father gave me some advice that I have been turning over in my mind ever since."'),
       ('"Mark Twain"', 1884, 'The Adventures of Huckleberry Finn',
        '"You don’t know about me without you have read a book by the name of The Adventures of Tom Sawyer; but that ain’t no matter."'),
       ('"George Orwell"', 1949, '1984', '"It was a bright cold day in April, and the clocks were striking thirteen."'),
       ('"J.R.R. Tolkien"', 1954, 'The Lord of the Rings',
        '"When Mr. Bilbo Baggins of Bag End announced that he would shortly be celebrating his eleventy-first birthday with a party of special magnificence, there was much talk and excitement in Hobbiton."'),
       ('"Emily Brontë"', 1847, 'Wuthering Heights',
        '"I have just returned from a visit to my landlord – the solitary neighbour that I shall be troubled with."');

SELECT title, spql(opening, 'prompt')
FROM books;
SQL




💾 DATABASE EXTENSIONS

spql can be embedded natively in SQL queries via database extensions. The SQLite
extension is available now; the Postgres and DuckDB extensions are coming soon.

The signature is straightforward:

SELECT spql(json, prog)
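
The first argument is JSON text, which is why the examples pass '"Hello world"' with the inner double quotes. When the value comes from application code, a JSON encoder can produce that text. A minimal Python sketch; to_spql_input is a hypothetical helper name, not part of spql:

```python
import json


def to_spql_input(value):
    """Serialize a Python value to the JSON text expected as spql's input."""
    # ensure_ascii=False keeps non-ASCII text (e.g. Greek) readable as-is.
    return json.dumps(value, ensure_ascii=False)


print(to_spql_input("Hello world"))  # "Hello world"  (double quotes included)
print(to_spql_input([1, 2, 3]))      # [1, 2, 3]
```

The resulting string can then be bound as the first parameter of the SQL call instead of being spliced into the query text.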




🤔 HOW

spql is built on top of jq. It uses jq’s compiler, syntax, and standard library,
and adds custom types and functions suitable for LLMs. It embeds Python natively
to give access to Python’s vast machine learning ecosystem.


🤔 WHY

The future (probably) belongs to LLMs targeting specific vertical domains, piped
together with more general purpose LLMs. We’ll need tools to glue these things
together.


📖 REFERENCE

Like standard jq, spql functions have the generic format:
input | func(arg0; arg1; ...)

If a function requires no arguments other than its input, it is called like a
simple filter: input | func

In many cases, depending on the context, the second format simply uses sane
defaults for the arguments.

Plural names in arguments and return values indicate an array of that type. For
example, consider the knn function with the following signature:

vector | knn(vectors; number=5) → vectors

It expects a single vector as input, an array of vectors as the first argument,
and a number as the second argument (which defaults to 5 if not provided), and
it returns an array of vectors.
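
To make the shape of that signature concrete, here is a minimal plain-Python sketch of an exact (brute-force) nearest-neighbor search with the same input, arguments, and default. It is not spql's implementation, and the L2 metric is an assumption, since the text above does not say which distance knn uses.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(vector, vectors, number=5):
    """Return the `number` vectors closest to `vector` (assumed metric: L2)."""
    # sorted() is stable, so candidates at equal distance keep their input order.
    return sorted(vectors, key=lambda v: l2_distance(vector, v))[:number]


candidates = [[3, 4], [1, 0], [0, 2], [10, 10]]
print(knn([0, 0], candidates, 2))  # [[1, 0], [0, 2]]
```

Sorting every candidate costs O(n log n) per query, which is fine for small arrays; approximate indexes such as HNSW trade exactness for speed on large ones.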


VECTORS

 * array | vector → vector

 * vector | ndims → number

 * vector | l2_norm → number

 * vector | l2_distance(vector) → number

 * vector | cosine_distance(vector) → number

 * vector | cosine_similarity(vector) → number

 * vector | dot_product(vector) → number


NEAREST NEIGHBOR SEARCH

 * vector | knn_exact(vectors; number) → vectors

 * vectors | knn_hnsw(vector; number) → vectors


EMBED

 * text | embed → vector

 * text | embed(model) → vector

 * text | embed(model; params) → vector


PROMPT

 * text | prompt → prompt

 * text | prompt(model) → prompt

 * text | prompt(model; params) → prompt


TOKENIZE

 * text | tokenize → vector

 * text | tokenize(model) → vector

 * text | tokenize(model; params) → vector


HTTP

 * text | http_get → object

 * url | http_get(params) → object

 * body | http_post(url) → object

 * body | http_post(url; params) → object


CONFIGURATION


ENVIRONMENT VARIABLES

The following environment variables are taken into account during execution:

 * SP_* prefixed environment variables

 * Hugging Face HF_* prefixed environment variables
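
When debugging a run, it can help to see which variables with these prefixes are set in the current environment. A small Python sketch; the helper name relevant_env is hypothetical, and the individual variable names spql reads are not listed here, so this only filters by prefix:

```python
import os

# Prefixes named in the list above.
PREFIXES = ("SP_", "HF_")


def relevant_env(environ):
    """Return the subset of `environ` whose names start with SP_ or HF_."""
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith(PREFIXES)
    }


# Print names only, since values such as access tokens may be sensitive.
for name in relevant_env(os.environ):
    print(name)
```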


🤔 FAQ


I’VE HEARD/SEEN THAT JQ IS COMPLICATED

It’s true that many jq programs out there look fancy and seem to be showing
off, but this doesn’t have to be the case. The perceived complexity stems from
the fact that jq has a rather small standard library and people hesitate to
write multi-line programs in jq.


ONLY JSON?

spql is based on jq, so it consumes and returns JSON, but I’ve been thinking
about other input formats, like Arrow.


WHAT LANGUAGE IS THIS WRITTEN IN?

It’s written in C, just like jq.


WHERE CAN I SEE THE FORMAL SPEC OF SPQL / JQ?

jq is not formally specified, but there is a recent effort at a formal spec.


ACKNOWLEDGMENTS

Thanks to the jq community and especially to the contributors who breathed fresh
air into the project.



Copyright © 2024, Florents Tselai <florents@tselai.com>
Made with Sphinx and @pradyunsg's Furo
     