Introduction
The Dria Compute Node (dria-node) is a single Rust binary that lets you serve AI models on the Dria network and earn rewards. It runs models locally using llama.cpp — no Ollama or external dependencies required.
Single binary — no Docker, no Ollama, no complex setup.
Built-in model management — downloads, caches, and benchmarks GGUF models automatically.
GPU acceleration — supports Apple Metal, NVIDIA CUDA, and AMD ROCm out of the box.
Auto-updates — checks for new versions on startup and updates itself.
Reconnection — automatically reconnects with exponential backoff if the connection drops.
Installation
- macOS / Linux (Homebrew)
- macOS / Linux (Shell Script)
- Windows (PowerShell)
- AMD ROCm (Linux)
- Build from Source
Setup
Run the interactive setup wizard to select and download a model. The wizard will:
- Detect your available RAM and filter models that fit your system.
- Let you pick a model from the supported list.
- Download the GGUF model file from HuggingFace.
- Run a test inference and print your benchmark TPS.
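The wizard's RAM-filtering step can be sketched as follows. The model names and memory figures below are illustrative placeholders, not the actual Dria catalog:

```rust
// Hypothetical catalog: (model name, approximate RAM required in GiB).
// These entries are examples only, not the real supported-model list.
fn models_that_fit(catalog: &[(&str, f64)], available_ram_gib: f64) -> Vec<String> {
    catalog
        .iter()
        .filter(|(_, ram_gib)| *ram_gib <= available_ram_gib)
        .map(|(name, _)| name.to_string())
        .collect()
}

fn main() {
    let catalog = [
        ("tiny-model-q4", 1.0),
        ("mid-model-q8", 9.0),
        ("large-model-q8", 27.0),
    ];
    // On a 16 GiB machine, only the models that fit are offered.
    println!("{:?}", models_that_fit(&catalog, 16.0));
}
```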
Starting Your Node
Once setup is complete, start your node.
Configuration
All flags can also be set via environment variables:

| Flag | Env Var | Default | Description |
|---|---|---|---|
| --wallet | DRIA_WALLET | (required) | Ethereum private key (hex, 32 bytes) |
| --model | DRIA_MODELS | (required) | Model(s) to serve, comma-separated |
| --gpu-layers | DRIA_GPU_LAYERS | 0 (CPU only) | GPU layers to offload (-1 = all) |
| --max-concurrent | DRIA_MAX_CONCURRENT | 1 | Max parallel inference tasks |
| --data-dir | DRIA_DATA_DIR | ~/.dria | Directory for cached models |
| --quant | DRIA_QUANT | Per-model default | Override GGUF quantization (e.g. Q8_0) |
| --context-size | DRIA_CONTEXT_SIZE | Model's native | Max context window (tokens) |
| --kv-quant | DRIA_KV_QUANT | q8_0 | KV cache quantization (f16, f32, q8_0, q4_0, q4_1, q5_0, q5_1) |
| --router-url | DRIA_ROUTER_URL | quic.dria.co:4001 | Router URL |
| --skip-update | DRIA_SKIP_UPDATE | false | Skip auto-update check on startup |
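Putting the table together, a launch can be configured either way. The wallet key and model name below are placeholders, not real values:

```shell
# Flags on the command line (placeholder wallet key and model name):
dria-node --wallet 0xYOUR_PRIVATE_KEY --model example-model --gpu-layers -1

# Equivalent configuration via environment variables:
export DRIA_WALLET=0xYOUR_PRIVATE_KEY
export DRIA_MODELS=example-model
export DRIA_GPU_LAYERS=-1
dria-node
```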
Set RUST_LOG=debug for verbose logging during troubleshooting.
GPU Acceleration
dria-node supports GPU acceleration for faster inference:
- Apple Metal — Enabled automatically on macOS with Apple Silicon.
- NVIDIA CUDA — Use the CUDA build, or --features cuda when building from source.
- AMD ROCm — Use the ROCm install script, or --features rocm when building from source.
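When building from source, the feature flags above follow standard Cargo feature syntax. A sketch, assuming a typical Cargo project layout:

```shell
# CPU-only build:
cargo build --release

# NVIDIA CUDA build:
cargo build --release --features cuda

# AMD ROCm build:
cargo build --release --features rocm
```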
System Requirements
- OS: macOS (Intel + Apple Silicon), Linux (x86_64, arm64), Windows (x86_64)
- RAM: Minimum ~1 GB (for smallest model) to ~27 GB (for largest model) — see Selecting Models for details
- Disk: Space for GGUF model files (0.5 GB to 24.5 GB depending on model)
- Network: Outbound UDP port 4001 (QUIC connection to router)
- GPU (optional): Apple Metal, NVIDIA CUDA, or AMD ROCm 6.x
How It Works
Your node connects to the Dria router network via QUIC (a fast, encrypted UDP-based protocol). Here's what happens:
- Authentication — The router sends a random challenge. Your node signs it with your private key to prove identity.
- Registration — Your node announces which model(s) it can serve.
- Task assignment — The router forwards inference requests from users to your node based on model availability and capacity.
- Inference — Your node runs the model locally and streams results back.
- Backpressure — If your node is at capacity, it rejects new tasks and the router re-routes to another node.
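If the connection drops at any point, the node reconnects with exponential backoff (see Introduction). The delay schedule can be sketched like this; the base delay and cap are illustrative assumptions, not dria-node's actual values:

```rust
use std::time::Duration;

// Exponential backoff with a cap: delay doubles each attempt until
// it reaches cap_ms. Base and cap values here are assumptions for
// illustration only.
fn backoff_delay(attempt: u32, base_ms: u64, cap_ms: u64) -> Duration {
    // Clamp the exponent so the shift cannot overflow u64.
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(exp.min(cap_ms))
}

fn main() {
    // Delays: 500ms, 1s, 2s, 4s, 8s, 16s, then capped at 30s.
    for attempt in 0..8 {
        println!("attempt {attempt}: {:?}", backoff_delay(attempt, 500, 30_000));
    }
}
```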
The node supports text, vision (image), and audio inference depending on the model. See Selecting Models for model capabilities.
