Model Configuration

Note: to access the model configuration files, users need to clone the code repository (installation via pip is insufficient).

During query processing, ThalamusDB dynamically selects the most suitable language model for each semantic operator. Currently, only models by OpenAI are supported. Users can configure the way in which ThalamusDB selects models. The associated configuration file is located at config/models.json. This is the default content of the file:

{
	"models":[
		{"id": "gpt-4o", "modalities":["text", "image"], "priority": 10},
		{"id": "gpt-4o-audio-preview", "modalities":["text", "audio"], "priority": 10}
	]
}

Each entry in the models list is a dictionary with the following properties:

Property Semantics
id The model ID used by OpenAI
modalities A list of supported data modalities
priority ThalamusDB prefers models with higher priority

The following data modalities are recognized:

  • text
  • image
  • audio

When selecting models, ThalamusDB first filters to the models that support all required data types. Note that ThalamusDB supports joins across different data modalities (e.g., matching images with associated text descriptions). In that case, ThalamusDB requires models that support all relevant data modalities.

After narrowing down the choice to the models that support all required data modalities, ThalamusDB considers the priority. Among all eligible models, ThalamusDB selects a model with the highest priority. Ties are broken arbitrarily.