Use a Custom LLM or Embedder
Configure NQRust Analytics to run on the LLM and embedding provider of your choice via config.yaml.
NQRust Analytics is provider-agnostic: you choose which LLM generates SQL and
which embedding model indexes your schema. Model access is routed through a
LiteLLM-style abstraction, so most providers can be enabled by editing
config.yaml alone.
For the most predictable results, well-tested chat models such as OpenAI
GPT-4o or GPT-4o-mini tend to perform best. Smaller or less common models work,
but may require more tuning. The installer's default generation model is
gpt-4o-mini.
Steps
Check whether your model is already supported
Before configuring anything, confirm that your provider and model are known to
the underlying LiteLLM layer. If they are, you can usually enable them by
editing config.yaml without writing any code. The installer also ships
ready-made templates for 15 providers (OpenAI, Anthropic, Azure OpenAI, AWS
Bedrock, Google AI Studio, Google Vertex AI, DeepSeek, xAI Grok, Groq, Qwen3,
Zhipu, OpenRouter, Ollama, LM Studio, and a generic local-LLM template), which
are the fastest path for common cases.
Start from a template
Rather than authoring config.yaml by hand, copy the template that matches your
provider and use it as a base. Provide your credentials, your API
endpoint, and the model name(s) you intend to use.
Configure your LLM provider
In config.yaml, define the LLM block. Set the model name, the API endpoint,
and any provider-specific options. A typical LLM block looks like this:

    type: llm
    provider: litellm_llm
    models:
      - model: gpt-4o-mini
        api_base: https://api.openai.com/v1
        timeout: 120
        kwargs:
          temperature: 0
          n: 1
          max_tokens: 4096
          response_format:
            type: json_object

Models are referenced elsewhere using a <provider>.<model_name> notation, for
example litellm_llm.gpt-4o-mini.
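To make the <provider>.<model_name> notation concrete, here is a minimal Python sketch that splits such a reference into its two parts. The helper name parse_model_ref is hypothetical and not part of NQRust Analytics. Splitting at the first dot only is an assumption, made so that model names which themselves contain dots (such as ollama_chat/llama3.1:8b) stay intact.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str  # e.g. "litellm_llm"
    model: str     # e.g. "gpt-4o-mini"


def parse_model_ref(ref: str) -> ModelRef:
    """Split '<provider>.<model_name>' into its parts (hypothetical helper).

    Assumption: only the first dot separates provider from model, because
    model names may contain dots of their own.
    """
    provider, sep, model = ref.partition(".")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>.<model_name>', got {ref!r}")
    return ModelRef(provider, model)


print(parse_model_ref("litellm_llm.gpt-4o-mini"))
print(parse_model_ref("litellm_llm.ollama_chat/llama3.1:8b"))
```

A check like this can catch a reference that is missing its provider prefix before the service is restarted.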
Configure your embedder
If you are also customizing the embedding model, define an embedder block and
make sure the embedding dimension matches what your vector store expects. Set
embedding_model_dim in the document_store block accordingly:

    type: embedder
    provider: litellm_embedder
    models:
      - model: text-embedding-3-large
        api_base: https://api.openai.com/v1
        timeout: 120
    ---
    type: document_store
    provider: qdrant
    embedding_model_dim: 3072

The value of embedding_model_dim must equal the output dimension of the
embedder model (3072 for text-embedding-3-large), or indexing will fail.
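The dimension check can be scripted before a restart. The following is a rough Python sketch, not part of the product: the function name check_embedding_dim and the lookup table are hypothetical, and the table lists only the default output sizes of three OpenAI embedding models. For any other model, take the dimension from the provider's documentation.

```python
# Default output dimensions of some OpenAI embedding models.
# Illustrative table only: extend it for the models you actually use.
KNOWN_EMBEDDING_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def check_embedding_dim(embedder_model: str, store_dim: int) -> None:
    """Raise if embedding_model_dim disagrees with the embedder's output size."""
    expected = KNOWN_EMBEDDING_DIMS.get(embedder_model)
    if expected is None:
        raise KeyError(f"unknown embedder {embedder_model!r}; look up its dimension")
    if expected != store_dim:
        raise ValueError(
            f"{embedder_model} produces {expected}-dimensional vectors, "
            f"but embedding_model_dim is {store_dim}"
        )


# Values taken from the example configuration: passes silently.
check_embedding_dim("text-embedding-3-large", 3072)
```

Switching to text-embedding-3-small while leaving embedding_model_dim at 3072 would raise a ValueError here, which is the mismatch that otherwise surfaces as an indexing failure.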
Restart the service
Apply the changes by restarting NQRust Analytics. The new model configuration takes effect on startup.
Running a local model with Ollama
Ollama lets you run open models locally. You can run it either as
a desktop application or inside a Docker container; the only difference is the
endpoint that config.yaml targets.
When the service and Ollama both run directly on the same host, use the host's
local address (Ollama listens on http://localhost:11434 by default).
When NQRust Analytics runs in Docker and needs to reach an Ollama instance on
the host machine, use the host-gateway address instead of localhost, because
localhost inside a container refers to the container itself.

    type: llm
    provider: litellm_llm
    models:
      - model: ollama_chat/llama3.1:8b
        api_base: http://host.docker.internal:11434
        timeout: 120
        kwargs:
          temperature: 0

Confirm that the container can reach the Ollama endpoint. A misconfigured
api_base is the most common cause of connection errors with local models.
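One way to confirm reachability is a small probe run from the same environment as NQRust Analytics (for a Docker deployment, from inside the container). This is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes Ollama is listening on its default port 11434 and answers a plain GET on its root URL with HTTP 200.

```python
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default port, as in the example configuration


def ollama_api_base(analytics_in_docker: bool) -> str:
    """Pick the api_base host depending on where NQRust Analytics runs."""
    # Inside a container, localhost is the container itself, so the
    # host-gateway name is used to reach Ollama on the host machine.
    host = "host.docker.internal" if analytics_in_docker else "localhost"
    return f"http://{host}:{OLLAMA_PORT}"


def is_reachable(api_base: str, timeout: float = 3.0) -> bool:
    """Return True if a GET on api_base answers with HTTP 200."""
    try:
        with urllib.request.urlopen(api_base, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError, refused connections, and timeouts.
        return False
```

For example, is_reachable(ollama_api_base(analytics_in_docker=True)) returning False from inside the container suggests that api_base, or the container's host-gateway mapping, needs attention before any model settings do.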
