Semantic query processing on multimodal data

Semantic Queries

ThalamusDB processes queries with semantic operators using LLMs.

SELECT H.pic
FROM HolidayPictures H, ProfilePictures P
WHERE P.name in ('Alice', 'Bob')
AND NLFILTER(H.pic, 
  'this is a picture of the beach')
AND NLJOIN(H.pic, P.pic, 
  'the same person appears in both pictures');

This query retrieves beach pictures showing Alice or Bob. NLFILTER keeps only the holiday pictures that show the beach. NLJOIN pairs holiday pictures with profile pictures in which the same person appears.
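To make the semantics of the two operators concrete, here is a toy model in Python. It is only a sketch: the hand-written tags and the functions is_beach and same_person are hypothetical stand-ins for the LLM calls behind NLFILTER and NLJOIN, and none of this reflects how ThalamusDB is implemented internally.

```python
from itertools import product

# Toy rows: hand-written tags stand in for real image content.
# All file names and tags are hypothetical.
holiday_pictures = [
    {"pic": "h1.jpg", "scene": "beach", "people": {"Alice"}},
    {"pic": "h2.jpg", "scene": "mountain", "people": {"Bob"}},
    {"pic": "h3.jpg", "scene": "beach", "people": {"Carol"}},
]
profile_pictures = [
    {"name": "Alice", "pic": "p_alice.png", "people": {"Alice"}},
    {"name": "Bob", "pic": "p_bob.png", "people": {"Bob"}},
    {"name": "Carol", "pic": "p_carol.png", "people": {"Carol"}},
]


def is_beach(holiday):
    # Stand-in for NLFILTER(H.pic, 'this is a picture of the beach').
    return holiday["scene"] == "beach"


def same_person(holiday, profile):
    # Stand-in for NLJOIN(H.pic, P.pic,
    #   'the same person appears in both pictures').
    return bool(holiday["people"] & profile["people"])


def run_query(holidays, profiles):
    # WHERE P.name in ('Alice', 'Bob')
    wanted = [p for p in profiles if p["name"] in ("Alice", "Bob")]
    # NLFILTER: keep only rows for which the predicate holds.
    kept = [h for h in holidays if is_beach(h)]
    # NLJOIN: keep only row pairs for which the predicate holds.
    return [h["pic"] for h, p in product(kept, wanted) if same_person(h, p)]


result = run_query(holiday_pictures, profile_pictures)
print(result)
```

The filter is a predicate over single rows, while the join is a predicate over pairs of rows, which is why the join is typically the more expensive operator when each predicate evaluation is an LLM call.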

Multimodal Data

ThalamusDB processes tables and many unstructured data types.

ThalamusDB processes queries on tables, supporting all SQL types.

ThalamusDB analyzes text according to natural language instructions.

ThalamusDB analyzes images in PNG, JPG, and JPEG format via LLMs.

ThalamusDB processes audio data in WAV and MP3 format via audio-capable LLMs.

Simply store paths to pictures and audio files in text columns. ThalamusDB automatically detects the data type of referenced files and selects a suitable LLM for processing.
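The source does not say how the data type of a referenced file is detected. As a rough illustration, the sketch below assumes detection by file extension, using the formats listed above. The function detect_modality and the two extension sets are hypothetical names, not part of ThalamusDB.

```python
from pathlib import Path

# Formats taken from the list above; the extension-based rule itself
# is an assumption made for illustration.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def detect_modality(cell):
    """Classify the content of a text column cell as image, audio, or text."""
    suffix = Path(cell).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    # Anything else is treated as plain text to analyze directly.
    return "text"


for value in ["photos/beach.JPG", "calls/meeting.mp3", "A sunny day at the sea"]:
    print(value, "->", detect_modality(value))
```

Under this assumption, a single text column can mix file paths and plain text, and each cell is routed to a model suited to its modality.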

Reducing Costs

ThalamusDB reduces costs through approximate processing.

Users can set bounds on per-query processing costs. ThalamusDB then produces the best result it can within that budget.

Users can set constraints on result error. ThalamusDB tries to minimize overheads while satisfying those constraints.

During processing, ThalamusDB regularly displays partial results, computed from the part of the database processed so far.
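The interplay of cost bounds, error constraints, and partial results can be sketched for a simple count query. The example below is a minimal illustration under stated assumptions, not ThalamusDB's algorithm: cost is measured as the number of predicate evaluations (standing in for LLM calls), error is measured as the width of the interval around the count, and bounded_count, max_calls, and max_width are hypothetical names.

```python
def bounded_count(rows, predicate, max_calls, max_width=0):
    """Count rows satisfying predicate with at most max_calls evaluations.

    Returns (lower, upper, calls): the true count is guaranteed to lie
    in [lower, upper]. Processing stops early once the interval width
    is at most max_width.
    """
    total = len(rows)
    matches = 0
    calls = 0
    while calls < min(max_calls, total):
        # Each unseen row may or may not match, so the interval width
        # equals the number of unseen rows.
        if total - calls <= max_width:
            break
        if predicate(rows[calls]):
            matches += 1
        calls += 1
        # Partial result after each step.
        print(f"after {calls} calls: count in [{matches}, {matches + total - calls}]")
    return matches, matches + (total - calls), calls


# Hypothetical data: scene labels stand in for pictures.
scenes = ["beach", "city", "beach", "forest", "beach", "beach"]


def is_beach(scene):
    return scene == "beach"


budgeted = bounded_count(scenes, is_beach, max_calls=4)
print(budgeted)
```

The bounds are deterministic rather than probabilistic: every matching row seen so far counts toward the lower bound, and every unseen row could still match, which gives the upper bound. Tightening max_width trades more evaluations for a narrower interval, and lowering max_calls does the opposite.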

Comparison to Other Engines

ThalamusDB differs from other engines in its focus on approximate query processing. It evaluates semantic operators on subsets of the data to derive deterministic bounds on query aggregates, along with partial query results. A comparison by data types and interface (as of 8/4/25):

Criterion	ThalamusDB	LOTUS	Palimpzest	FlockMTL	CAESURA
Data Types	Text, Table, Image, Sound	Text, Table, Image	Text, Table, Image	Text, Table	Text, Table, Image
Interface	Semantic SQL	Python	Python	Semantic SQL	Natural Language

Resources

Learn about ThalamusDB in the documentation and papers.

Documentation

Learn how to use the latest ThalamusDB version by reading the manual here.

Research Paper

Dive deep into the technical ideas behind ThalamusDB by reading the latest paper here.

Get ThalamusDB

You can obtain ThalamusDB in multiple ways.

Installation via PIP

Run the following commands in the terminal:

pip install thalamusdb
thalamusdb [PathToDuckDBDatabase]

These commands install ThalamusDB and start the ThalamusDB console.

Download the Code

Run the following commands in the terminal:

git clone https://github.com/itrummer/thalamusdb
cd thalamusdb
pip install -r requirements.txt

These commands download the ThalamusDB code and install its requirements.