How Teale Works

Teale is a decentralized AI inference network that turns Apple Silicon Macs into compute nodes and connects them peer-to-peer, with no central inference servers.

The big picture

Every Teale node can run inference locally, share it with nearby devices over LAN, or serve it across the internet to anyone in the network. There is no cloud backend. The relay server exists only for discovery and signaling --- it never sees your prompts or completions.

  • Device = supply. Any supported device (Mac, Windows, iOS, Android) runs inference and earns Teale Credits for availability and for serving requests.
  • iPhone = demand. iPhones consume inference from local models or remote Macs.
  • No central storage. Conversations, wallets, and keys live on-device. Nothing is stored on a server.

The InferenceProvider chain

When you send a prompt, Teale tries to handle it as close to you as possible. The request flows through a chain of providers, each implementing the same InferenceProvider protocol, until one handles it:

Request
|
v
[MLXProvider] ---- On-device Apple MLX inference
| (model not loaded or throttled)
v
[LlamaCppProvider] ---- On-device llama.cpp subprocess
| (no local capacity)
v
[ClusterProvider] ---- LAN peer with model loaded and lower load
| (no LAN peers available)
v
[WANProvider] ---- WAN peer via relay or direct QUIC
| (complex multi-model request)
v
[Compiler] ---- Mixture of Models: decompose across multiple models

Each provider either handles the request and returns a streaming response, or passes it down the chain. The caller sees the same interface regardless of where inference actually runs.
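The chain described above can be sketched as a chain-of-responsibility in Swift. This is a hypothetical illustration of the pattern, not the actual Teale source: the protocol name `InferenceProvider` comes from the docs, but the method signatures and the `InferenceRequest` / `InferenceError` types here are assumptions.

```swift
/// Hypothetical shape of the shared provider protocol. Each provider
/// either returns a token stream or nil to pass the request onward.
protocol InferenceProvider {
    func handle(_ request: InferenceRequest) async throws -> AsyncStream<String>?
}

/// Illustrative request type (fields assumed for this sketch).
struct InferenceRequest {
    let prompt: String
    let model: String
}

enum InferenceError: Error {
    case noProviderAvailable
}

/// Walks the chain in order and returns the first provider's stream.
/// Callers see the same streaming interface wherever inference runs.
func route(_ request: InferenceRequest,
           through providers: [InferenceProvider]) async throws -> AsyncStream<String> {
    for provider in providers {
        if let stream = try await provider.handle(request) {
            return stream
        }
    }
    throw InferenceError.noProviderAvailable
}
```

A chain ordered `[mlx, llamaCpp, cluster, wan, compiler]` would reproduce the fall-through diagram above: each element declines by returning nil, and the request moves one step further from the device.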

Module architecture

Teale is built as 24 Swift modules in a single Swift Package Manager workspace, plus a Rust cross-platform binary (teale-node) for Linux, Windows, and Android.

Core modules

  • SharedTypes --- Protocols, API types, hardware types. Zero dependencies.
  • HardwareProfile --- Chip/RAM/GPU detection, thermal and power monitors
  • InferenceEngine --- Provider-agnostic engine manager and adaptive throttler
  • MLXInference --- Apple MLX wrapper, HuggingFace downloader, tokenizer adapter
  • LlamaCppKit --- llama.cpp subprocess management via HTTP, GGUF support
  • ModelManager --- Model catalog, cache, and download service

Networking modules

  • ClusterKit --- LAN discovery (Bonjour), NWConnection transport, routing
  • WANKit --- WAN P2P via QUIC, STUN/NAT traversal, relay signaling
  • TealeNetKit --- Private TealeNet (PTN) certificate authority and membership

Economy and identity modules

  • CreditKit --- Credit economy, pricing, local ledger, wallet, analytics
  • WalletKit --- Solana/USDC settlement, BIP39 key generation, deposit/withdrawal
  • AuthKit --- Supabase auth, device management, Sign in with Apple + Phone OTP

Intelligence modules

  • CompilerKit --- Mixture of Models request compilation and fan-out execution
  • AgentKit --- Agent-to-agent protocol, negotiation, directory
  • ChatKit --- Encrypted group chat, message sync, tool connections

Interface modules

  • LocalAPI --- Hummingbird HTTP server, OpenAI-compatible endpoints at localhost:11435
  • AppCore --- Shared app logic between macOS and iOS targets
  • TealeSDK --- Embeddable SDK for third-party apps
  • TealeSDKUI --- Pre-built SwiftUI components for TealeSDK
  • InferencePoolApp --- macOS MenuBarExtra app (executable target)
  • TealeCompanion --- iOS companion app (executable target)
  • TealeCLI --- Command-line interface
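Because LocalAPI serves OpenAI-compatible endpoints on localhost:11435, any standard HTTP client can talk to a node. A minimal sketch with URLSession follows; the `/v1/chat/completions` path follows the usual OpenAI convention (assumed here, since the docs name only the port), and the model identifier is a placeholder.

```swift
import Foundation

/// Sends a single chat prompt to the local Teale node and returns the
/// raw JSON response body. Path and payload shape follow the OpenAI
/// chat-completions convention; "example-model" is a placeholder id.
func askLocalNode(prompt: String) async throws -> Data {
    let url = URL(string: "http://localhost:11435/v1/chat/completions")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "example-model",
        "messages": [["role": "user", "content": prompt]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data
}
```

Any existing OpenAI client library pointed at `http://localhost:11435` should work the same way, which is the point of keeping the endpoints compatible.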

Network tiers

Teale organizes connections into four tiers, keeping traffic as close to the user as possible:

  1. Local --- On-device inference via MLX or llama.cpp. No network latency, zero cost.
  2. LAN --- Peers on the same local network, discovered via Bonjour/mDNS. Sub-millisecond latency, no internet required.
  3. PTN (Private TealeNet) --- A private subnet of trusted nodes with CA-signed certificates. Gets 70% of scheduling priority.
  4. WWTN (Wider World Teale Network) --- The public network of all Teale nodes. Market-priced via reverse auction.
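The tier ordering and the 70% PTN scheduling bias can be sketched as a selection function. This is an illustrative model only; the enum cases, weights as a random roll, and function names are assumptions, not the Teale scheduler.

```swift
/// Illustrative network tiers, ordered nearest-first.
enum NetworkTier: Hashable {
    case local, lan, ptn, wwtn
}

/// Prefer the closest tier that has capacity. Among remote tiers,
/// give PTN peers 70% of scheduling slots, as described above.
/// `roll` is a uniform random value in [0, 1).
func choose(from available: Set<NetworkTier>, roll: Double) -> NetworkTier? {
    if available.contains(.local) { return .local }
    if available.contains(.lan) { return .lan }
    switch (available.contains(.ptn), available.contains(.wwtn)) {
    case (true, true):  return roll < 0.7 ? .ptn : .wwtn
    case (true, false): return .ptn
    case (false, true): return .wwtn
    case (false, false): return nil
    }
}
```

The key property is that randomness only enters at the boundary between the two remote tiers; local and LAN capacity always win outright.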

Key design decisions

Protocol-first architecture. The InferenceProvider protocol is the single abstraction that every backend implements. Adding a new inference source (CoreML, remote API, hardware accelerator) means implementing one protocol.

Length-prefixed JSON over TCP. All inter-node messages use a custom NWProtocolFramer that length-prefixes JSON payloads over persistent TCP connections. Simple, debuggable, and efficient.
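The framing scheme is simple enough to show end to end. The sketch below assumes a 4-byte big-endian length header ahead of the JSON payload; the header width and byte order are assumptions, since the docs describe the framer only as length-prefixing JSON.

```swift
import Foundation

/// Prepends a 4-byte big-endian length header to a JSON payload.
func frame(_ json: Data) -> Data {
    var length = UInt32(json.count).bigEndian
    var packet = Data(bytes: &length, count: 4)
    packet.append(json)
    return packet
}

/// Reads one framed message back out, or returns nil if the buffer
/// does not yet hold a complete frame (headers can arrive split
/// across TCP reads).
func unframe(_ packet: Data) -> Data? {
    guard packet.count >= 4 else { return nil }
    let length = packet.prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    guard packet.count >= 4 + Int(length) else { return nil }
    return packet.dropFirst(4).prefix(Int(length))
}
```

Partial-frame handling is the main thing a real NWProtocolFramer adds on top of this: it buffers bytes until a whole message is available, then delivers it upward as one unit.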

No tokens, no speculation. The economy runs on USDC stablecoins. Providers earn 95% of the inference cost. There is no native token and no speculative element.