spql-lang.com
URL:
https://spql-lang.com/
Submission: On March 25 via api from US — Scanned from US
# SPQL

*like sed, awk, and jq, but for LLMs.*

> We're in pre-alpha mode. Got any feedback? florents@tselai.com

Usage | Installation | Examples | Features | How | Reference

```shell
echo '"https://www.reddit.com/r/python/top.json"' | spql '
# Issue a GET request
http_get |
# Process the JSON response with standard jq
.data.children[] | {
  "title": .data.title,
  "author": .data.author,
  "title_tokens": (.data.title | tokenize("hf-internal-testing/tiny-random-gpt2")),
  "title_prompt": (.data.title | prompt("llama2c")),
  "title_embedding_4": (.data.title | embed("hf-internal-testing/tiny-random-gpt2") | .[0:4]),
  "title_embedding_ndims": (.data.title | embed("hf-internal-testing/tiny-random-gpt2") | ndims)
}'
```

Here's how spql queries can be executed in different contexts:

**Shell**

```shell
echo '[1,2,3]' | spql 'vector | ndims'
echo '"Hello world"' | spql 'prompt'
echo '"Hello world"' | spql 'embed("hf-internal-testing/tiny-random-gpt2")'
```

**SQLite**

```shell
sqlite3 <<SQL
.load spqlite
select spql('[1,2,3]', 'vector | ndims');
select spql('"Hello world"', 'prompt');
select spql('"Hello world"', 'embed("hf-internal-testing/tiny-random-gpt2")');
SQL
```

**Postgres** (N/A yet; coming soon)

```shell
psql <<SQL
create extension spql;
select spql('[1,2,3]'::jsonb, 'vector | ndims');
select spql('"Hello world"'::jsonb, 'prompt');
select spql('"Hello world"'::jsonb, 'embed("hf-internal-testing/tiny-random-gpt2")');
SQL
```

**DuckDB** (N/A yet; coming soon)

```shell
duckdb <<SQL
INSTALL spql;
LOAD spql;
select spql('[1,2,3]', 'vector | ndims');
select spql('"Hello world"', 'prompt');
select spql('"Hello world"', 'embed("hf-internal-testing/tiny-random-gpt2")');
SQL
```

spql is a command-line tool, programming language, and database extension for working with LLMs. Its goal is to enable composable pipelines of vectors in a minimal but powerful way, equally usable and flexible across different execution contexts. It borrows from the Unix philosophy, but with a twist: it treats the vector, not text, as the universal interface. In an LLM world, however, text and vectors are not that different, are they?

The pattern is clear: sed, awk, grep, and friends worked for text. jq worked for JSON. spql works on vectors.

In the snippet above, notice how spql glues things together:

* Everything is executed in the shell.
* The `http_get` function runs within the program itself (powered by curl).
* The tokenizer and the embedding model are written in Python and provided by Hugging Face.
* The llama2c model is written in C and runs locally.
* Input and output are JSON.

## 🚀 Getting Started

### Installation

In the current alpha version, the easiest way to try out spql is to alias bash functions to Docker containers. Running these in your shell will provide you with a `spql` bash function and a `sqlite3` instance pre-bundled with spql.

```shell
function spql() {
  docker run -v $HOME/.cache/huggingface:/.cache/huggingface -i florents/spql:v0.1.0a1 "$@"
}
```

```shell
function sqlite3() {
  docker run -i --entrypoint /usr/bin/sqlite3 florents/spql:v0.1.0a1 "$@"
}
```

This will allow you to replicate the examples below.
### Usage

Here's how spql queries can be executed in different contexts:

**Shell**

```shell
echo '[1,2,3]' | spql 'vector | ndims'
echo '"Hello world"' | spql 'prompt'
echo '"Hello world"' | spql 'embed("hf-internal-testing/tiny-random-gpt2")'
```

**SQLite**

```shell
sqlite3 <<SQL
.load spqlite
select spql('[1,2,3]', 'vector | ndims');
select spql('"Hello world"', 'prompt');
select spql('"Hello world"', 'embed("hf-internal-testing/tiny-random-gpt2")');
SQL
```

**Postgres** (N/A yet; coming soon)

```shell
psql <<SQL
create extension spql;
select spql('[1,2,3]'::jsonb, 'vector | ndims');
select spql('"Hello world"'::jsonb, 'prompt');
select spql('"Hello world"'::jsonb, 'embed("hf-internal-testing/tiny-random-gpt2")');
SQL
```

**DuckDB** (N/A yet; coming soon)

```shell
duckdb <<SQL
INSTALL spql;
LOAD spql;
select spql('[1,2,3]', 'vector | ndims');
select spql('"Hello world"', 'prompt');
select spql('"Hello world"', 'embed("hf-internal-testing/tiny-random-gpt2")');
SQL
```

## 🌟 Features

spql is equally usable as a CLI tool, a programming language, and a database extension. Here are the features:

* Standard library of vector operations: L2 distance, inner product, and cosine distance.
* Interact with LLMs to generate, embed, and tokenize content.
* Support for both local and remote LLMs.
* Support for sentence-transformers and Hugging Face transformers.
* Support for llamafile models.
* Out-of-the-box extensions for Postgres, SQLite, and DuckDB.
* 100% jq-compliant: your existing jq programs continue to work, and your jq-fu can still make you a code-golf superstar.

## 🧱 Examples

spql simply extends jq by adding some custom types and functions. This means that standard jq syntax and functionality is 100% available, so have a look at jq's manual as well. The examples below showcase some of the available types and functions, but are mostly here to whet your appetite. See the Reference for details.
### Prompts

Try multiple prompts in one go:

```shell
echo '["Hello", "Hi", "Howdy", "¡Hola!", "Γειά σου μαν μου"]' | spql '
.[] | {
  "prompt": .,
  "response": prompt
}'
```

### Pipe | Models Together

Now this is cool: you can pipe different models together in a Unix-like fashion.

```shell
echo '"Tell me a story"' | spql 'prompt("llama2c")' | spql -c 'embed("hf-internal-testing/tiny-random-gpt2")'
```

### Embeddings

```shell
echo '
[
  ["King", "Queen"],
  ["Table", "Tableau"],
  ["Dog", "Hot"]
]
' | spql 'map([ (.[0] | embed), (.[1] | embed) ])'
```

```shell
spql -n '[1,2,3,4] | l2_norm'
spql -n '[1,2,3,4] | l2_distance([11,22,33,43])'
spql -n '[1,2,3,4] | dot_product([1,3,5,40])'
spql -n '[1,2,3,4] | cosine_similarity([1,3,5,40])'
```

Issue a GET request, extract a piece of JSON with standard jq, and embed the result:

```shell
echo '"https://www.reddit.com/r/llm/top.json"' | spql '
http_get |
.data.children[0].data.title |
embed("hf-internal-testing/tiny-random-gpt2") |
l2_norm
'
```

### Complex Programs

In the previous example we embedded only the first element in the children array. To embed the title of each element, we could do it like this:

```shell
echo '"https://www.reddit.com/r/python/top.json"' | spql '
# Issue a GET request
http_get |
# Process the JSON response with standard jq
.data.children[] | {
  "title": .data.title,
  "author": .data.author,
  "title_tokens": (.data.title | tokenize),
  "title_prompt": (.data.title | prompt("llama2c")),
  "title_embedding_4": (.data.title | embed | .[0:4]),
  "title_embedding_ndims": (.data.title | embed("hf-internal-testing/tiny-random-gpt2") | ndims)
}'
```

### Program from files

For complex programs you can use the `-f src.jq` argument to execute programs written in files.

```shell
cat << EOF > /tmp/myprogram.jq
. | length
EOF
```

```shell
spql -f /tmp/myprogram.jq
```

### Passing --arguments

Here's an example shell script that iterates over the lines of a CSV file and generates embeddings for its columns. Notice how bash variables can be passed to the program using `--arg`.

```shell
cat <<EOF > /tmp/spql.csv
id,name,title
1,John Doe,Software Engineer
2,Jane Doe,Data Scientist
3,James Doe,Product Manager
EOF
```

```shell
# Skip the header line using tail -n +2
tail -n +2 "/tmp/spql.csv" | while IFS=, read -r id name title
do
  spql --arg id "$id" --arg name "$name" --arg title "$title" -n '{
    id: $id,
    name: $name,
    title: $title,
    title_embedding: ($title | embed)
  }'
done
```

### SQLite

Let's generate some alternative openings for classic books:

```shell
sqlite3 <<SQL
.load spqlite
CREATE TABLE books (
  author TEXT,
  year INTEGER,
  title TEXT,
  opening TEXT
);
INSERT INTO books (author, year, title, opening) VALUES
('"Jane Austen"', 1813, 'Pride and Prejudice', '"It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."'),
('"Charles Dickens"', 1859, 'A Tale of Two Cities', '"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness..."'),
('"Herman Melville"', 1851, 'Moby-Dick', '"Call me Ishmael."'),
('"Leo Tolstoy"', 1869, 'War and Peace', '"Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes."'),
('"Mary Shelley"', 1818, 'Frankenstein', '"You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings."'),
('"F. Scott Fitzgerald"', 1925, 'The Great Gatsby', '"In my younger and more vulnerable years my father gave me some advice that I have been turning over in my mind ever since."'),
('"Mark Twain"', 1884, 'The Adventures of Huckleberry Finn', '"You don’t know about me without you have read a book by the name of The Adventures of Tom Sawyer; but that ain’t no matter."'),
('"George Orwell"', 1949, '1984', '"It was a bright cold day in April, and the clocks were striking thirteen."'),
('"J.R.R. Tolkien"', 1954, 'The Lord of the Rings', '"When Mr. Bilbo Baggins of Bag End announced that he would shortly be celebrating his eleventy-first birthday with a party of special magnificence, there was much talk and excitement in Hobbiton."'),
('"Emily Brontë"', 1847, 'Wuthering Heights', '"I have just returned from a visit to my landlord – the solitary neighbour that I shall be troubled with."');
SELECT title, spql(opening, 'prompt') FROM books;
SQL
```

## 💾 Database Extensions

spql can be easily and natively embedded in SQL queries via database extensions, available for SQLite, Postgres, and DuckDB. The signature is straightforward:

```sql
SELECT spql(json, prog)
```

## 🤔 How

spql is built on top of jq: it uses jq's compiler, syntax, and standard library, but adds custom types and functions suitable for LLMs. It embeds Python natively to enable access to its vast machine-learning ecosystem.

## 🤔 Why

The future (probably) belongs to LLMs targeting specific vertical domains, piped together with more general-purpose LLMs. We'll need tools to glue these things together.

## 📖 Reference

Like standard jq, spql functions have the generic format:

```
input | func(arg0; arg1; ...)
```

If a function requires no arguments other than its input, it is called like a simple filter:

```
input | func
```

In many cases, depending on the context, this second form simply uses sane defaults for the arguments. Plurals in arguments and return types indicate an array.
For example, consider the `knn` function with the following signature:

```
vector | knn(vectors; number=5) → vectors
```

It expects a single vector as input, an array of vectors as its first argument, and a number as its second argument (set to 5 if not provided); it returns an array of vectors.

### Vectors

* `array | vector → vector`
* `vector | ndims → number`
* `vector | l2_norm → number`
* `vector | l2_distance(vector) → number`
* `vector | cosine_distance(vector) → number`
* `vector | cosine_similarity(vector) → number`
* `vector | dot_product(vector) → number`

### Nearest Neighbor Search

* `vector | knn_exact(vectors; number) → vectors`
* `vectors | knn_hnsw(vector; number) → vectors`

### Embed

* `text | embed → vector`
* `text | embed(model) → vector`
* `text | embed(model; params) → vector`

### Prompt

* `text | prompt → prompt`
* `text | prompt(model) → prompt`
* `text | prompt(model; params) → prompt`

### Tokenize

* `text | tokenize → vector`
* `text | tokenize(model) → vector`
* `text | tokenize(model; params) → vector`

### HTTP

* `text | http_get → object`
* `url | http_get(params) → object`
* `body | http_post(url) → object`
* `body | http_post(url; params) → object`

### Configuration

#### Environment Variables

The following environment variables are taken into account during execution:

* `SP_*`-prefixed environment variables
* Hugging Face `HF_` environment variables

## 🤔 FAQ

### I've heard/seen that jq is complicated

It's true that many jq programs out there can seem fancy and show-offish, but this doesn't have to be the case. The perceived complexity stems from the fact that jq has a rather small standard library and people hesitate to write multi-line programs in jq.

### Only JSON?

spql is based on jq, so it consumes and returns JSON, but I've been thinking about other input formats, like Arrow.

### What language is this written in?

It's written in C, just like jq.

### Where can I see the formal spec of spql / jq?

jq is not formally specified, but there is a recent effort toward a formal spec.
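To make the vector primitives listed in the Reference concrete, here is a minimal Python sketch of the math behind `l2_norm`, `l2_distance`, `dot_product`, `cosine_similarity`, `cosine_distance`, and `knn_exact`, assuming vectors are plain lists of numbers. This illustrates the definitions only, not spql's actual implementation, and it assumes the common convention that cosine distance is one minus cosine similarity:

```python
import math

def l2_norm(v):
    # Euclidean length of a vector.
    return math.sqrt(sum(x * x for x in v))

def l2_distance(a, b):
    # Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Inner product of two vectors.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product normalized by both vector lengths.
    return dot_product(a, b) / (l2_norm(a) * l2_norm(b))

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity.
    return 1.0 - cosine_similarity(a, b)

def knn_exact(v, vectors, k=5):
    # Brute-force k nearest neighbors of v by L2 distance.
    return sorted(vectors, key=lambda u: l2_distance(v, u))[:k]

print(l2_norm([1, 2, 3, 4]))                              # sqrt(30) ≈ 5.477
print(dot_product([1, 2, 3, 4], [1, 3, 5, 40]))           # 182
print(knn_exact([0, 0], [[1, 1], [5, 5], [0, 1]], k=2))   # [[0, 1], [1, 1]]
```

The spql one-liners in the Embeddings examples, such as `spql -n '[1,2,3,4] | dot_product([1,3,5,40])'`, compute exactly these quantities.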
## Acknowledgments

Thanks to the jq community, and especially to the contributors who breathed fresh air into the project.

Copyright © 2024, Florents Tselai <florents@tselai.com>

Made with Sphinx and @pradyunsg's Furo