Supported Models

Curated model catalog with sizes, RAM requirements, and HuggingFace repositories.

Model Catalog

All models are 4-bit quantized (q4) for the MLX framework. Rankings reflect network demand -- lower rank means more frequently requested.

| Rank | Model | Parameters | Quantization | Size | RAM Required | HuggingFace Repo |
|------|-------|------------|--------------|------|--------------|------------------|
| 1 | Llama 3.1 8B Instruct | 8B | q4 | 4.5 GB | 10 GB | mlx-community/Meta-Llama-3.1-8B-Instruct-4bit |
| 2 | Qwen 3 8B | 8B | q4 | 4.5 GB | 10 GB | mlx-community/Qwen3-8B-4bit |
| 3 | Gemma 3 4B Instruct | 4B | q4 | 2.5 GB | 6 GB | mlx-community/gemma-3-4b-it-qat-4bit |
| 4 | Llama 3.2 3B Instruct | 3B | q4 | 1.8 GB | 6 GB | mlx-community/Llama-3.2-3B-Instruct-4bit |
| 5 | Llama 3.2 1B Instruct | 1B | q4 | 0.7 GB | 4 GB | mlx-community/Llama-3.2-1B-Instruct-4bit |
| 6 | Phi 4 | 14B | q4 | 8.0 GB | 14 GB | mlx-community/phi-4-4bit |
| 7 | Mistral Small 24B | 24B | q4 | 13.0 GB | 20 GB | mlx-community/Mistral-Small-24B-Instruct-2501-4bit |
| 8 | Gemma 3 27B Instruct | 27B | q4 | 15.0 GB | 24 GB | mlx-community/gemma-3-27b-it-qat-4bit |
| 9 | Qwen 3 32B | 32B | q4 | 18.0 GB | 28 GB | mlx-community/Qwen3-32B-4bit |
| 10 | Llama 4 Scout 109B (MoE) | 109B | q4 | 56.0 GB | 72 GB | mlx-community/Llama-4-Scout-17Bx16E-Instruct-4bit |
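Each catalog row can be modeled as a simple value type. The sketch below is illustrative only: `CatalogEntry` and its field names are assumptions for this example, not Teale's actual types.

```swift
// Hypothetical model of one catalog row; the type and field names are
// illustrative assumptions, not Teale's actual API.
struct CatalogEntry {
    let rank: Int              // network-demand ranking (lower = more requested)
    let name: String
    let parameters: String     // e.g. "8B"
    let quantization: String   // "q4" for every entry in this catalog
    let sizeGB: Double         // download size of the quantized weights
    let ramRequiredGB: Int     // minimum RAM to load and run the model
    let repo: String           // HuggingFace repository identifier
}

let llama8B = CatalogEntry(
    rank: 1,
    name: "Llama 3.1 8B Instruct",
    parameters: "8B",
    quantization: "q4",
    sizeGB: 4.5,
    ramRequiredGB: 10,
    repo: "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"
)
```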

Model Families

| Family | Provider | Models |
|--------|----------|--------|
| Llama | Meta | 1B, 3B, 8B, 109B MoE |
| Qwen | Alibaba | 8B, 32B |
| Gemma | Google | 4B, 27B |
| Phi | Microsoft | 14B |
| Mistral | Mistral AI | 24B |

RAM Recommendations

| Available RAM | Recommended Models |
|---------------|--------------------|
| 4-6 GB | Llama 3.2 1B |
| 6-10 GB | Llama 3.2 3B, Gemma 3 4B |
| 10-14 GB | Llama 3.1 8B, Qwen 3 8B |
| 14-20 GB | Phi 4 14B |
| 20-28 GB | Mistral Small 24B, Gemma 3 27B |
| 28-64 GB | Qwen 3 32B |
| 64+ GB | Llama 4 Scout 109B MoE |

Model Selection

Teale automatically selects models based on available hardware. The `ModelCatalog` provides:

  • `availableModels(for: hardware)` -- all models that fit in available RAM
  • `topModels(for: hardware, limit: 3)` -- the most popular models that fit, sorted by demand ranking
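A minimal sketch of how that selection could work, based only on the behavior described above. The `Model` and `ModelCatalog` shapes here (taking a RAM figure rather than a hardware descriptor) are assumptions for illustration, not Teale's verified implementation.

```swift
// Sketch of RAM-based model selection; types and signatures are assumptions
// derived from the documented behavior, not Teale's actual API.
struct Model {
    let name: String
    let rank: Int           // demand ranking (lower = more requested)
    let ramRequiredGB: Int
}

struct ModelCatalog {
    let models: [Model]

    // All models whose RAM requirement fits within available RAM.
    func availableModels(forRAM availableGB: Int) -> [Model] {
        models.filter { $0.ramRequiredGB <= availableGB }
    }

    // The most in-demand models that fit, sorted by demand ranking.
    func topModels(forRAM availableGB: Int, limit: Int = 3) -> [Model] {
        Array(availableModels(forRAM: availableGB)
            .sorted { $0.rank < $1.rank }
            .prefix(limit))
    }
}

// Sample data drawn from the catalog table above.
let catalog = ModelCatalog(models: [
    Model(name: "Llama 3.1 8B Instruct", rank: 1, ramRequiredGB: 10),
    Model(name: "Qwen 3 8B", rank: 2, ramRequiredGB: 10),
    Model(name: "Gemma 3 4B Instruct", rank: 3, ramRequiredGB: 6),
    Model(name: "Phi 4", rank: 6, ramRequiredGB: 14),
])

// On a 12 GB machine, Phi 4 (14 GB) is excluded; the rest are
// returned in demand order.
let picks = catalog.topModels(forRAM: 12)
```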

GGUF Models (Cross-Platform)

On non-Apple platforms, Teale uses llama.cpp with GGUF-format models instead of MLX. Equivalent GGUF builds of the catalog models are published on HuggingFace (e.g., Qwen/Qwen3-8B-GGUF), typically under separate repositories from the MLX conversions.

The GGUF backend supports:

  • NVIDIA GPUs via CUDA
  • AMD GPUs via ROCm
  • CPU-only inference on any platform
  • Vulkan for cross-platform GPU acceleration

Model sizes and RAM requirements are similar between MLX and GGUF at the same quantization level.

Quantization Levels

| Level | Description | Size Multiplier | Quality |
|-------|-------------|-----------------|---------|
| q4 | 4-bit quantization | 1.0x (baseline) | Good for most tasks |
| q8 | 8-bit quantization | ~2x | Better quality, more RAM |
| fp16 | Half-precision float | ~4x | Full quality, maximum RAM |

The catalog uses q4 by default for the best balance of quality and memory efficiency. Higher-precision variants (q8, fp16) are available by specifying alternate HuggingFace repos.
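The size multipliers follow directly from bits per weight: q4 stores roughly 4 bits per parameter, q8 doubles that, and fp16 (16 bits) quadruples it. A back-of-the-envelope estimate, ignoring quantization scales and metadata overhead (which is why real q4 files run slightly larger than this formula predicts):

```swift
// Rough weight-storage estimate: parameters × bits-per-weight / 8 bytes.
// Real files are somewhat larger (per-group quantization scales, metadata).
func approxSizeGB(parameters billions: Double, bitsPerWeight: Double) -> Double {
    billions * 1e9 * bitsPerWeight / 8 / 1e9
}

let q4Size = approxSizeGB(parameters: 8, bitsPerWeight: 4)    // 4.0 GB, near the listed 4.5 GB
let fp16Size = approxSizeGB(parameters: 8, bitsPerWeight: 16) // 16.0 GB, the ~4x multiplier
```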