HyperLite
No API keys. No usage tracking. No network after the initial model download. Everything runs on your hardware.
TUI visible in milliseconds. Sessions load instantly. Streamed tokens hit the screen with no runtime overhead.
26 tools — files, shell, web, git, RAG, memory. The model reads, writes, searches, and chains multi-step tasks inside your terminal.
Index an entire repo with local embeddings. Relevant chunks are retrieved and injected before each message — semantic search over your whole project, fully offline.
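The retrieve-and-inject step can be sketched roughly as follows. This is an illustration, not HyperLite's implementation: `embed` is a toy bag-of-words stand-in for a real local embedding model, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k indexed chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Prepend the retrieved chunks to the user message."""
    context = "\n---\n".join(top_chunks(query, chunks, k))
    return f"Context:\n{context}\n\nUser: {query}"


# Hypothetical chunks standing in for an indexed repository.
chunks = [
    "load a session: open the sqlite database and read stored messages",
    "render markdown with syntax highlighting in the terminal",
    "download a model file and show a progress bar",
]
prompt = build_prompt("how do I load a session from sqlite", chunks, k=1)
print(prompt)
```

A real index would compute the chunk embeddings once at indexing time and store them, rather than re-embedding every chunk per query as this sketch does.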
Save facts the model carries across every session — preferences, project details, recurring patterns. Stored in SQLite, retrieved by semantic similarity.
Opt-in git integration. Injects current branch, staged changes, diff stats, and unpushed commits into every prompt. Plus git_status, git_log, git_diff, and git_blame tools.
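As a minimal sketch of how such context could be gathered and turned into a prompt prefix: the git subcommands below are standard, but the helper names and the output layout are assumptions, not HyperLite's actual format.

```python
import subprocess


def git(*args: str) -> str:
    """Run a git command; return stripped stdout, or '' if it fails."""
    try:
        out = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return out.stdout.strip()


def collect_git_context() -> dict[str, str]:
    """Gather the four pieces of state named above from the current repo."""
    return {
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "staged": git("diff", "--cached", "--name-only"),
        "diff_stat": git("diff", "--shortstat"),
        # Empty when the branch has no upstream configured.
        "unpushed": git("log", "@{u}..HEAD", "--oneline"),
    }


def format_git_context(ctx: dict[str, str]) -> str:
    """Render non-empty fields as one line each (layout is hypothetical)."""
    lines = ["[git context]"]
    for label, value in ctx.items():
        if value:
            joined = ", ".join(value.splitlines())
            lines.append(f"{label}: {joined}")
    return "\n".join(lines)


sample = {
    "branch": "main",
    "staged": "a.py\nb.py",
    "diff_stat": "",
    "unpushed": "abc123 fix typo",
}
print(format_git_context(sample))
```

Calling `format_git_context(collect_git_context())` inside a repository would produce the same layout from live state; outside a repository every field comes back empty and only the header line remains.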
llamafile (auto-managed), llama.cpp, Ollama, LM Studio, KoboldCpp, LocalAI, vLLM, Jan, text-generation-webui, GPT4All.
Full markdown, syntax highlighting, collapsible reasoning blocks, fuzzy session search, and a reorganized command palette with Sessions / Agent / Display / Options tabs.
Every conversation in SQLite with WAL mode and 256 MB mmap. Instant load. Fuzzy search across all sessions.
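For reference, WAL mode and a 256 MB memory map are both set through standard SQLite pragmas. The sketch below assumes a hypothetical `messages` schema; the real table layout is not described here.

```python
import os
import sqlite3
import tempfile

MMAP_BYTES = 256 * 1024 * 1024  # 256 MB


def open_sessions_db(path: str) -> sqlite3.Connection:
    """Open a session store with WAL journaling and memory-mapped I/O."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Upper bound on memory-mapped bytes; the SQLite build may cap or ignore it.
    conn.execute(f"PRAGMA mmap_size={MMAP_BYTES}")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
    )
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "sessions.db")
conn = open_sessions_db(db_path)
conn.execute(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    ("demo", "user", "hello"),
)
conn.commit()
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(mode, count)
```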
Tokens stream directly from the inference server. Hardware — CPU, RAM, GPU — visible in the sidebar. Model and provider shown at a glance. Agent tab in the command palette controls everything AI-related.
Switching models is Ctrl+M. Switching agents is Ctrl+A. On an RTX 4090 with a 14B model, expect 50+ tok/s.
Built-in model picker lists recommended GGUF models filtered to your hardware. Select and press Enter to download directly from HuggingFace CDN with a live progress bar.
Models are saved to ~/.hyperlite/models/ and available immediately. The list ranges from SmolLM2 1.7B to Llama 3.3 70B.
Press Ctrl+K for the command palette. Four tabs — Sessions, Agent, Display, Options. Agent tab is home to model switching, RAG indexing, memory, and git context controls.
Tab between panels. Arrow keys navigate. Enter runs. Esc closes.
A purpose-built variant for the Raspberry Pi 5 and ARM64 single-board computers. Stripped down and optimised — no RAG embedding overhead, no memory embedding model, no ONNX runtime. Just a native ARM64 binary and the fastest possible inference for the hardware.
| Model | Params | Tokens/sec |
|---|---|---|
| SmolLM2 | 1.7B | 35–50 |
| Qwen2.5 | 3B | 22–32 |
| Phi-4 Mini | 3.8B | 18–28 |
| Llama 3.2 | 3B | 20–30 |
| Mistral | 7B | 10–14 |
| Llama 3.1 | 8B | 9–13 |
The standard llamafile is an x86_64 binary. Running it on a Pi triggers QEMU emulation — 5–10× slower. HyperLite-PI compiles llama-server natively from source on first launch, targeting Cortex-A76 directly.
Compiler auto-detects the CPU and enables every available instruction set — NEON SIMD, int8 dot product, hardware AES. All the gains from the silicon already in the Pi.
KV cache stored at Q8 instead of F16, which roughly halves KV-cache memory traffic per token. Model weights locked in RAM with --mlock, so no page faults during inference.
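A quick back-of-the-envelope check of the KV cache saving, using Llama 3.1 8B (32 layers, 8 KV heads, head dimension 128) and an assumed 4096-token context. The exact llama-server invocation is not shown here; in upstream llama.cpp the relevant options are `--cache-type-k q8_0`, `--cache-type-v q8_0`, and `--mlock`. The sketch assumes llama.cpp's `q8_0` block layout of 32 int8 values plus one f16 scale.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: float) -> float:
    """Total size of the K and V caches for a full context window."""
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elems * bytes_per_elem


F16_BYTES = 2.0
Q8_0_BYTES = 34 / 32  # 32 int8 values + a 2-byte scale per block

llama31_8b = dict(n_layers=32, n_kv_heads=8, head_dim=128)
n_ctx = 4096  # assumed context length for this example

f16 = kv_cache_bytes(**llama31_8b, n_ctx=n_ctx, bytes_per_elem=F16_BYTES)
q8 = kv_cache_bytes(**llama31_8b, n_ctx=n_ctx, bytes_per_elem=Q8_0_BYTES)

MIB = 1024 * 1024
print(f"F16 : {f16 / MIB:.0f} MiB")   # 512 MiB
print(f"Q8_0: {q8 / MIB:.0f} MiB")    # 272 MiB
print(f"ratio: {q8 / f16:.3f}")       # 0.531
```

Because of the per-block scale, the saving is slightly under a full halving: about 47% less cache to read per generated token.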
No ONNX runtime, no fastembed, no embedding model download. RAG, persistent memory, and the git agent are intentionally excluded — a Pi needs every bit of RAM for the LLM, not infrastructure overhead.
All backends probed concurrently at startup. Only reachable servers appear in the model picker.
| Backend | Port | Formats | Management |
|---|---|---|---|
| Direct GGUF | 18080 | GGUF · GGML | auto-managed |
| Ollama | 11434 | GGUF · GGML · SafeTensors | external |
| llama.cpp | 8080 | GGUF · GGML | external |
| LM Studio | 1234 | GGUF · EXL2 | external |
| KoboldCpp | 5001 | GGUF · GGML | external |
| text-generation-webui | 5000 | GGUF · GPTQ · AWQ · EXL2 · SafeTensors | external |
| LocalAI | 8080 | GGUF · GPTQ · SafeTensors · ONNX | external |
| vLLM | 8000 | SafeTensors · GPTQ · AWQ · EXL2 | external |
| Jan.ai | 1337 | GGUF | external |
| GPT4All | 4891 | GGUF | external |
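Concurrent probing of the ports in the table can be sketched as below. This is an assumption about the approach, not HyperLite's code: it only checks that a TCP connection to `127.0.0.1` succeeds, and the function names and timeout are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Default ports taken from the table above.
BACKENDS = {
    "Direct GGUF": 18080,
    "Ollama": 11434,
    "llama.cpp": 8080,
    "LM Studio": 1234,
    "KoboldCpp": 5001,
    "text-generation-webui": 5000,
    "LocalAI": 8080,
    "vLLM": 8000,
    "Jan.ai": 1337,
    "GPT4All": 4891,
}


def is_reachable(port: int, host: str = "127.0.0.1", timeout: float = 0.3) -> bool:
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(backends: dict[str, int]) -> list[str]:
    """Check all backends in parallel; return the names that responded."""
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        results = list(pool.map(is_reachable, backends.values()))
    return [name for name, ok in zip(backends, results) if ok]


print(probe(BACKENDS))
```

Since llama.cpp and LocalAI share port 8080, a bare TCP check cannot tell them apart; distinguishing them would need a follow-up request to a backend-specific HTTP endpoint.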