Awesome AI Models Matrix π§
Research-based list of AI models, development tools, and automation resources. Use it to compare releases, pricing, benchmarks, and deployment options from official sources.
Document Version: 1.7
Last Updated: 2026-04-01 20:24 UTC
Verification Scope: README sections reviewed against docs/UPDATE_RULES.md and expanded with official model, software, and browser automation sources
Repository: https://github.com/ReadyPixels/AI_Models_Matrix
Comprehensive documentation of Large Language Models (LLMs), Small Language Models (SLMs), and specialized AI models available today.
State-of-the-art proprietary AI models with cutting-edge capabilities from leading AI labs.
Model
Company
Context
Key Features
Pricing
Claude Opus 4.6
Anthropic
1M
Agent teams, enhanced coding/reasoning
$5 / $25
Claude Sonnet 4.6
Anthropic
1M
Near-Opus performance, Sonnet price
$3 / $15
GPT-5.3-Codex
OpenAI
400K
Agentic coding, 128K output
TBD
Gemini 3.1 Pro
Google
1M
77.1% ARC-AGI-2, 2x reasoning boost
$2 / $12
Gemini 3 Deep Think
Google
1M+
84.6% ARC-AGI-2, science/research
Ultra subscription
GLM-5
Zhipu AI
200K
Agentic engineering, long-horizon tasks
$1.00 / $3.20
MiniMax-M2.5
MiniMax
200K
Coding/refactoring, tool calling, long context
$0.30 / $1.20
Kimi K2.5
Moonshot AI
256K
Native multimodal, thinking & agent tasks
$0.60 / $3.00
DeepSeek-V4
DeepSeek
1M+
Engram memory, coding focus
Pay-per-token
Qwen3.5-Max
Alibaba
128K
Hybrid attention, native VLM
Pay-per-token
Gemini 3 Pro
Google
1M+
PhD-level reasoning, agentic tool-use
Tiered pricing
Gemini 3 Flash
Google
10M
Pro-grade reasoning, Flash speed
$0.30 / $2.50
Gemini 3.1 Flash-Lite
Google
1M
Fast, cost-efficient multimodal model for high-volume tasks
$0.25 / $1.50
GPT-5.4
OpenAI
1M
State-of-the-art coding, computer use, tool search
$2.50 / $15.00
GPT-5.4 mini
OpenAI
400K
Fast GPT-5.4 variant for coding and multimodal tasks
$0.75 / $4.50
Mistral Large 3
Mistral AI
128K
675B params, MoE, Open-weight
Varies
Claude Sonnet 4.5
Anthropic
200K
SWE-bench leader, best coding
$3 / $15
Llama 4 Scout
Meta
10M
Open-weight context king
Free (self-host)
Llama 4 Maverick
Meta
128K
400B params, multimodal
Free (self-host)
Grok 4
xAI
128K
First-principles reasoning
$3 / $15
Grok 4 Fast
xAI
128K
Cost-efficient variant
$0.20 / $1.50
Category
#1
#2
#3
Coding
Claude Opus 4.6
GPT-5.3-Codex
Claude Sonnet 4.5
Reasoning
Gemini 3 Deep Think
Qwen3-Max-Thinking
o3
Open Source
DeepSeek-V4
Qwen3.5-Max
Llama 4
Cost Efficiency
DeepSeek-V3.1
Grok 4 Fast
GLM-4.7-FlashX
Context Window
Gemini 3 Flash (10M)
Llama 4 Scout (10M)
Claude Opus 4.6 (1M)
Self-hostable models with permissive licenses or open weights for privacy, cost control, and customization.
Model
Company
Params
Context
License
DeepSeek-V4
DeepSeek
671B
1M+
MIT
Qwen3.5-Max
Alibaba
1T+
128K
Apache 2.0
Qwen3-Max-Thinking
Alibaba
1T+
128K
Apache 2.0
Mistral Large 3
Mistral AI
675B (MoE)
128K
Apache 2.0
Llama 4 Scout
Meta
109B
10M
Community
Llama 4 Maverick
Meta
400B
128K
Community
GPT-OSS-120B
OpenAI
117B
128K
Apache 2.0
GPT-OSS-20B
OpenAI
21B
128K
Apache 2.0
Qwen3-Coder
Alibaba
480B
128K
Apache 2.0
GLM-4.7
Zhipu AI
400B+ MoE
128K
Open Weight
Phi-4
Microsoft
14B
128K
MIT
Granite 4.0
IBM
8B-3B
128K
Apache 2.0
DeepSeek-Coder-V2
DeepSeek
236B
128K
MIT
Yi-Coder
01.AI
9B/1.5B
128K
Apache 2.0
Local Inference Tools:
Ollama - Easy local deployment
LM Studio - User-friendly GUI
llama.cpp - Efficient CPU inference
vLLM - High-throughput serving
SGLang - Structured generation
Cloud Deployment:
Hugging Face Inference - Managed deployment
AWS SageMaker - Full control
Google Cloud Vertex - Integrated
RunPod - GPU rental
Specialized AI models optimized for software development tasks.
SWE-bench Verified Leaderboard
Rank
Model
Company
Score
π₯ #1
Claude Opus 4.6
Anthropic
SOTA
π₯ #2
GPT-5.3-Codex
OpenAI
Agentic leader
π₯ #3
Claude Sonnet 4.5
Anthropic
~92%
#4
GPT-OSS-120B
OpenAI
91.4% AIME
#5
Kimi K2.5
Moonshot AI
Excellent
Model
Developer
Pricing
Best For
Claude Opus 4.6
Anthropic
$5 / $25 per 1M
Agentic coding, complex tasks
GPT-5.3-Codex
OpenAI
TBD
Agentic coding, 7+ hour autonomy
Claude Haiku 4.5
Anthropic
$1 / $5 per 1M
Low-latency coding, sub-agents, computer use
GLM-5-Code
Zhipu AI
$1.20 / $5.00 per 1M
Code generation, refactoring
MiniMax-M2.5
MiniMax
$0.30 / $1.20 per 1M
Code generation, refactoring
Claude Sonnet 4.5
Anthropic
$3 / $15 per 1M
Code review, refactoring
Codestral
Mistral AI
$0.30 / $0.90
Real-time completion
Grok Code Fast
xAI
$0.20 / $1.50
Most used (50% share)
Open-Source Coding Models
Model
Developer
License
Hardware
GPT-OSS-120B
OpenAI
Apache 2.0
80-160 GB VRAM
Qwen3-Coder
Alibaba
Apache 2.0
160-320 GB VRAM
DeepSeek-Coder-V2
DeepSeek
MIT
48-80 GB VRAM
GLM-4.6
Zhipu AI
Open Weight
80-160 GB VRAM
Phi-4
Microsoft
MIT
24-48 GB VRAM
Models optimized for step-by-step reasoning, mathematical problem-solving, and complex logical inference.
Rank
Model
Score
Notes
π₯ #1
Gemini 3 Deep Think
84.6% ARC-AGI-2
Science/research focus
π₯ #2
Qwen3-Max-Thinking
100%
Perfect AIME score
π₯ #3
GPT-5 Pro (with tools)
100%
With Python tools
#4
GPT-OSS-120B
91.4%
Open-source leader
#5
o3
~96.5%
OpenAI reasoning
#6
DeepSeek-R1
81%
Pure RL-based
Model
Type
Context
Pricing
Gemini 3 Deep Think
Reasoning
1M+
Ultra subscription
Qwen3-Max-Thinking
Reasoning/Coding
128K
$1.20 / $6.00
o3 / o1-Pro
Reasoning
128K
$2-150 / $8-600
Gemini 3 Pro
General/Multimodal
1M+
$2 / $12
DeepSeek-R1
Reasoning
128K
$0.50 / $2.15
Claude Sonnet 4.5
Hybrid
200K
$3 / $15
Mathematical Problem Solving : Qwen3-Max-Thinking, GPT-5 Pro, Gemini 3 Pro
Scientific Analysis : Claude Opus 4.6, GPT-5, Gemini 3 Pro
Strategic Planning : o3/o1-Pro, Claude Sonnet 4.5, DeepSeek-R1
Code Debugging : Claude Sonnet 4.5, GPT-5.3-Codex, DeepSeek-V3.1
Models capable of processing and generating multiple types of content: text, images, audio, and video.
Leading Multimodal Models
Model
Developer
Context
Key Features
GPT-5
OpenAI
400K
Unified multimodal, audio
Gemini 3 Pro
Google
1M+
Native multimodal, video
Claude Sonnet 4.5
Anthropic
200K
Document understanding
Llama 4 Maverick
Meta
128K
Open multimodal
Model
MMMU
MathVista
DocVQA
Gemini 3 Pro
SOTA
SOTA
SOTA
GPT-5
Excellent
Excellent
Excellent
Claude Sonnet 4.5
Strong
Strong
Excellent
Llama 4 Maverick
Good
Good
Good
Model
Speech-to-Text
Text-to-Speech
Video Input
Gemini 3 Pro
β
β
β
GPT-5
β
β
β οΈ
Whisper v3
β
β
β
Model
Developer
License
Best For
Flux.1
Black Forest Labs
Apache 2.0
High-fidelity art
Stable Diffusion 3.5
Stability AI
Community License
Fine-tuning
GLM-Image
Zhipu AI (Z.ai)
API
Fast image generation
CogView-4
Zhipu AI (Z.ai)
API
Creative image generation
Hardware Requirements π₯οΈ
Comprehensive hardware specifications for self-hosting AI models.
Quick Reference by Model Size
Model
Params
Q4 Size
Min VRAM
Rec VRAM
Min RAM
Phi-4
14B
8 GB
24 GB
48 GB
32 GB
GPT-OSS-20B
21B
12 GB
24 GB
48 GB
32 GB
Llama 4 Scout
109B
66 GB
48 GB
80 GB
96 GB
GPT-OSS-120B
117B
70 GB
80 GB
160 GB
128 GB
DeepSeek-Coder-V2
236B
143 GB
48 GB
80 GB
192 GB
Llama 4 Maverick
400B
242 GB
160 GB
320 GB
320 GB
DeepSeek-V4
671B
404 GB
80 GB
320 GB
512 GB
Qwen3-Max-Thinking
1T+
600+ GB
160 GB
640 GB
768 GB
Consumer/Entry Level (24-48 GB VRAM):
Phi-4, GPT-OSS-20B, Yi-Coder, Qwen2.5-Coder
Recommended GPUs : RTX 3090 (24GB), RTX 4090 (24GB)
Professional (80-160 GB VRAM):
Llama 4 Scout, GPT-OSS-120B, DeepSeek-Coder-V2
Recommended GPUs : A100 80GB, 2x A100 40GB
Enterprise (320+ GB VRAM):
Llama 4 Maverick, GLM-4.7, DeepSeek-V4, Qwen3-Max-Thinking
Recommended GPUs : 4x A100 80GB, 8x A100 80GB
Level
Bits
Size vs FP16
Quality
Use Case
FP16/BF16
16
100%
Best
Training
Q8_0
8
~50%
Excellent
High-quality inference
Q4_K_M
4
~25%
Good
Recommended for deployment
Q3_K_M
3
~19%
Fair
Limited resources
Development Tools π οΈ
AI-powered tools for software development, from IDEs and CLI tools to API providers and IDE extensions.
Integrated Development Environments with built-in AI capabilities.
IDE
Platform
Version
Release Date
Pricing
Key Features
GitHub
Firebase Studio
Web
-
-
Free (3 workspaces, up to 30 with Google Developer Program)
Cloud-based, Gemini, MCP
β
Lingma IDE (ιδΉη΅η )
Windows, macOS
-
-
Free (download)
Built-in agent, MCP tool use, terminal command execution
β
Tonkotsu
Windows, macOS
-
-
Free (during early access)
Team of agents, workflow
β
OpenCode
Windows, macOS, Linux
-
-
Free (OSS)
Terminal, desktop, IDE extension, multi-provider
π
Codex app
Windows
-
2026-03-04 00:00 UTC
Included with Codex plans
Multiple agents, isolated worktrees, reviewable diffs, CLI and IDE interop
β
Visual Studio
Windows, macOS
17.14.12+, 18.1.0+
2026-01-06 00:00 UTC
Free / $250/yr
Gemini 3 Flash integration, faster performance, zero-migration upgrades, real-time profiler agent
β
IntelliJ IDEA
Windows, macOS, Linux
2025.3.2
2026-01
Free / $149/yr
Java 24 support, Kotlin K2 mode, performance and memory improvements
β
Editor
Platform
Version
Release Date
Pricing
Key Features
GitHub
Zed
macOS, Windows, Linux
0.226.3
2026-03-03 00:00 UTC
Free (OSS) + Copilot $10/mo
Fast, collaboration, Gemini and Claude, Zeta AI, agent thread history, edit prediction providers, self-hosted OpenAI-compatible servers
π
Dyad
Windows, macOS, Linux
-
-
Free (OSS)
Local generation, BYO keys
π
Memex
macOS, Windows
-
-
Freemium (Free + $10/mo)
Agentic, browserβdesktop
β
IDE
Platform
Version
Release Date
Pricing
Autonomous
MCP
GitHub
Cursor
Windows, macOS, Linux
0.46+
2026-02-12 00:00 UTC
Freemium (Free + Pro $19/mo or $39/mo)
β
β
β
Windsurf
Windows, macOS, Linux
1.9552+
2026-02-12 00:00 UTC
Freemium (Free + Pro)
β
β
β
Trae
macOS, Windows
-
-
Free
β
β
β
PearAI
Windows, macOS, Linux
-
-
Free (OSS)
β
β
β
Void
Windows, macOS, Linux
-
-
Free (OSS)
β
β
β
Kiro
Windows, macOS, Linux
-
-
Free (Preview)
β
β
β
IDE
Platform
Version
Release Date
Pricing
Self-Hostable
Best For
GitHub
Replit 3
Web
-
-
Free Starter, Core $20/mo , Pro $100/mo
β
Learning/Prototyping
β
Bolt.new
Web
-
-
Free, Pro $20-25/mo, Teams $30/user/mo
β
Quick apps
β
Bolt.diy
Self-hosted
-
-
Free (MIT), bring your own API
β
Self-hosted
π
Lovable
Web
-
-
Free (5 credits/day), Pro $25/mo, Business $50/mo
β
UI/Full-stack
β
v0
Web
-
-
Free ($5 credits/mo), Premium $20/mo, Teams $30/user
β
React components
β
Gitpod
Web
-
-
Free + Paid
β
Cloud dev environments
β
Rork
Web
-
-
Free & Paid (credits)
β
Mobile apps (iOS/Android)
β
Google Antigravity
Web
-
-
Google AI Pro / Ultra
Agent-first development with Gemini-powered coding
β
Jules
Web
-
2025-05-20 00:00 UTC
Free beta, higher limits on Google AI Pro / Ultra
Async repo agent, reviewable diffs, GitHub integration
β
Command-line AI tools for autonomous coding and terminal enhancement.
Tool
Platform
Pricing
Key Features
GitHub
Aider
Windows, macOS, Linux
Free
Gold standard, Architect mode, thinking tokens
π
Claude Code 2.1+
macOS, Linux, Windows
Free + API
Fast mode for Opus 4.6, simple mode file editing, Unicode fix
π
Codex CLI
Windows, macOS, Linux
Included
Sandbox, approval modes
π
Goose
Windows, macOS, Linux
Free (Apache-2.0)
MCP, extensible, desktop app, 25+ providers
π
GPT-Pilot
Windows, macOS, Linux
Free
Full dev team simulation
π
OpenHands
Windows, macOS, Linux
Free
Cloud agents, MCP
π
Mentat
Windows, macOS, Linux
Free
Multi-file coordination
π
Tool
Developer
Pricing
Best For
Gemini CLI
Google
Free
Google ecosystem
Cursor CLI
Cursor
Free tier
Terminal + IDE bridge
Qwen Code
Alibaba
Free
Qwen optimization
Qodo CLI
Qodo
Free tier
Testing and review
Tool
Platform
Pricing
Key Features
Warp Terminal
macOS, Linux, Windows
Free
AI Agents, workflow sharing
Fig
macOS, Linux
Free
Autocomplete, AI suggestions
Extensions and plugins that add AI capabilities to existing IDEs.
Universal (Cross-Platform)
Add-on
Platform
Pricing
Context
Best For
GitHub
GitHub Copilot
VS Code, JetBrains, Vim
Free / $10/mo / $39/mo
Large
General coding
β
Supermaven
VS Code, JetBrains, Neovim
Free / $10/mo
1M
Large codebases
β
Codeium
VS Code, JetBrains, Vim
Free / $15/mo / $60/mo
Medium
Free alternative
β
Continue
VS Code, JetBrains
Free (OSS)
Custom
Self-hosted
π
Cody
VS Code, JetBrains, Web
Free (discontinued) / Enterprise Starter $19/mo / Enterprise $59/mo
Enterprise
Code search
π
Tabnine
VS Code, JetBrains, VS, Eclipse
Free / $39/mo
Local
Privacy
β
Add-on
Pricing
Autonomous
MCP
Best For
GitHub
Codex
Free (with ChatGPT Plus $20/mo or Pro $200/mo)
β
β
OpenAI's official coding agent
π
Cline
Free
β
β
Full agent
π
GitHub Copilot (Agent Mode)
$0 / $10 / $39/mo
β οΈ
β
Guided agent workflows
β
RooCode
Free/Pro
β οΈ
β
Complex tasks
β
Keploy
OSS/Enterprise
β
β
Testing
β
Add-on
Pricing
Claude Agent
Best For
JetBrains AI Assistant
$10/mo (Pro), $249/yr (Ultimate)
β
Deep IDE integration
JetBrains Claude Agent
Included in subscription
β
Native agent
Services for accessing AI models via API.
Provider
Models
Pricing
OpenAI
GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, o3, Codex
Pay-per-token
Anthropic
Claude 4.6, Claude Haiku 4.5
Pay-per-token
Google AI Studio
Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Gemini 3 Flash
Free / Pay
Z.ai (Zhipu AI)
GLM-5, GLM-5-Code, GLM-4.7
Pay-per-token
MiniMax
MiniMax-M2.5/M2.1/M2
Pay-per-token
Cohere
Command, Embed, Rerank
Pay-per-token
AI21 Labs
Jamba
Pay-per-token
Perplexity
Sonar / Sonar Pro / Sonar Reasoning Pro
Pay-per-token + request fees
Moonshot AI
Kimi (kimi-k2.5, kimi-k2-thinking)
Pay-per-token
ByteDance (Volcengine)
Doubao
Pay-per-token
Tencent (Hunyuan)
Hunyuan
Pay-per-token
Baidu (ERNIE)
ERNIE
Pay-per-token
DeepSeek
DeepSeek-V4/R1
Pay-per-token
Mistral AI
Mistral Large 3
Pay-per-token
xAI
Grok-4
Pay-per-token
Unified APIs & Aggregators
Provider
Models
Key Features
OpenRouter
200+
Crypto/fiat, rankings
Hugging Face
Thousands
Serverless inference
Provider
Specialization
Speed
Together AI
Llama/Qwen/Mistral
Fast
Fireworks AI
FireAttention
Low-latency
Groq
LPU
>500 T/s
Cerebras
Wafer-Scale
>2000 T/s
Provider
Type
Best For
RunPod
GPU Rental
Flexibility
Replicate
Model-as-a-Service
Quick deployment
Vultr
Global Cloud
Hourly
Hyperbolic
Decentralized
Crypto/Fiat
AI-powered tools for automating browser and desktop tasks.
Tools and frameworks for AI-powered browser automation.
Browser
Pricing
Open Source
Local AI
Best For
GitHub
Sigma AI Browser
Freemium (Free + Pro $9.99/mo)
β
β
Offline AI, agentic browsing
β
ChatGPT Atlas
Free (with ChatGPT subscription)
β
β
OpenAI integration, macOS
β
Genspark
Freemium (Free + Plus $19.99/mo + Pro $249/mo)
β
β
AI Super Agent, research
β
BrowserOS
Free
β
β
Privacy-focused
β
Brave Leo
Freemium (Free + Premium)
β
β οΈ (Experimental)
Privacy-focused AI
β
Fellou
Freemium (Free for 4 tasks, $19/mo Pro)
β
β
True agentic browser
β
Perplexity Comet
Free (with Pro $20/mo) or $5/mo Standalone
β
β
Research
β
Dia
Freemium (Free limited, $20/mo Pro, Dia+ $50/mo)
β
β
Arc replacement
β
Opera Neon
$19.90/mo
β
β
Agentic browsing
β
Opera One (Aria)
Free
β
β
Built-in AI assistant
β
Edge Copilot
Free (Copilot Pro $20/mo)
β
β
Enterprise AI browser
β
BrowserGPT
Freemium (Free + Premium)
β
β
Mobile-first AI browser (iOS/Android)
β
Arc Max
Free
β
β
AI-enhanced browsing, macOS
β
Extension
Pricing
Free
Multi-Agent
Best For
GitHub
Harpa AI
Free
β
β
Automation recipes
β
MultiOn
Free/Paid
β οΈ
β
Complex tasks
β
NanoBrowser
Free
β
β
Local control
β
Library
Language
Best For
GitHub
Browser-use
Python
Agentic automation
π
Stagehand
TypeScript
Web apps
π
LaVague
Python
NL to code
π
Skyvern
Python
CV-based automation
π
Service
Platform
Pricing
Best For
GitHub
ChatGPT agent
ChatGPT
Plus / Pro / Team
Guided browser tasks, research, forms, and spreadsheets
β
Project Mariner
Google AI Ultra
Included with Google AI Ultra
Multi-step browser tasks, shopping, and reservations
β
Skyvern Cloud
Cloud API
Paid
Resilient automation
π
Browserbase
Cloud API
Paid
Stealth mode, session recording
β
Platforms and runtimes for running or connecting AI agents.
Project
Type
Pricing
Self-Hostable
Best For
Official
OpenClaw
Personal AI assistant
Free (OSS)
β
Always-on assistant across chat channels
π
NanoClaw
Lightweight agent framework
Free (OSS)
β
Containerized agents for WhatsApp, Telegram, Slack, Discord
π
CrewAI
Multi-agent orchestration
Free (OSS) / Enterprise
β
Team-based AI agent workflows
π
AutoGen
Multi-agent framework
Free (OSS)
β
Conversational agent collaboration
π
LangGraph
Agent framework
Free (OSS) / LangSmith paid
β
Stateful, cyclic agent workflows
π
Dify
LLM app platform
Free (OSS) / Cloud plans
β
Visual workflow builder, RAG, agents
π
n8n
Workflow automation
Free (OSS) / Cloud from $20/mo
β
No-code automation with AI agent nodes
π
Flowise
LLM orchestration
Free (OSS)
β
Drag-and-drop LLM flow builder
π
Lindy
AI agent builder
Freemium (Free + Pro $49/mo)
β
No-code AI agents for business tasks
π
Relevance AI
Agent platform
Freemium (Free + Paid plans)
β
Build and deploy AI agents, no-code
π
Moltbook
Agent social network
Free
β
Discovering and pairing with AI agents
π
Managed cloud services for building and deploying AI agents at scale.
Service
Provider
Pricing
Best For
Google Vertex AI Agent Builder
Google Cloud
Pay-per-use
Enterprise agents grounded in Google Search and data stores
Amazon Bedrock Agents
AWS
Pay-per-use
Serverless agents with knowledge bases and guardrails
Azure AI Agent Service
Microsoft Azure
Pay-per-use
Enterprise agents with Azure AI Search and OpenAI integration
Desktop Automation π₯οΈ
AI agents and tools for automating desktop tasks and OS-level interactions.
Agent
Platform
Vision-Based
Cross-Platform
Best For
GitHub
Agent S
Cross-platform
β
β
Research/SOTA
π
Simular Agent S2
Cross-platform
β
β
Latest SOTA, improved grounding
π
Open Interpreter
Cross-platform
β οΈ
β
Natural language computer control, 63K+ stars
π
Bytebot
Linux (Docker)
β
β
Self-hosted
β
UFO
Windows
β
β
Windows automation
π
Open-Interface
Cross-platform
β
β
General use
π
Anthropic Computer Use
API
β
β
Beta capability
β
OpenAI Operator
API
β
β
Guided browser computer use
β
Microsoft Fara-7B
Cross-platform
β
β
Open-weight vision grounding model
π
Amazon Nova Act
API
β
β
AWS browser automation SDK
β
Manus AI
Cloud
β
β
General-purpose cloud agent
β
Tool
Platform
Best For
Ui.Vision RPA
Windows, macOS, Linux
Visual automation
OmniParser V2
Cross-platform
Screen parsing
Tool
Platform
Key Features
GitHub
PyAutoGUI
Cross-platform
Simple API, fail-safe
β
Nut.js
Cross-platform
Visual search, image matching
β
OpenAdapt
Windows, macOS
Learning from demonstration
π
Tutorials, how-tos, and in-depth guides for getting the most out of AI models and tools.
A beginner-friendly introduction to AI models and how to start using them effectively.
Concept
Description
Parameters
Size of model (B = billions). More = more capable
Context Window
How much text model can process (128K standard)
Tokens
Basic units of text (~0.75 words per token)
Method
Best For
Setup Difficulty
Web Interfaces
Quick experiments
Easiest
API Access
Building applications
Easy
Self-Hosting
Privacy, no API costs
Medium-Hard
IDE Integration
Daily coding
Easy
Model Recommendations by Task
Task
Free Option
Premium Option
Chat
Llama 4 (self-hosted)
GPT-5, Claude
Coding
DeepSeek-Coder-V2
Claude Opus 4.6
Reasoning
DeepSeek-R1
Gemini 3 Deep Think, o3
Long docs
Llama 4 Scout
Gemini 3 Flash
Vision
Llama 4 Maverick
GPT-5, Gemini 3
Model Selection Guide π―
A comprehensive guide to choosing the right AI model for your specific needs.
Need
π Free / Self-Host
π Best Quality
β‘ Fast / Autonomous
π» Coding
DeepSeek-Coder-V2
Claude Opus 4.6
GPT-5.3-Codex
π§ Reasoning / Math
DeepSeek-R1
Gemini 3 Deep Think
o3
π¬ General Chat
Llama 4 (self-hosted)
GPT-5, Claude Opus 4.6
Gemini 3 Flash
π¨ Vision
Llama 4 Maverick
GPT-5, Gemini 3 Pro
Gemini 3 Flash
π₯οΈ Self-Hosting
Phi-4
DeepSeek-V4
vLLM / SGLang (serving)
Budget
Options
Free
Self-hosted (Llama 4, Qwen3, Mistral)
$0-10/mo
API entry tiers, Gemini Flash
$10-50/mo
Copilot, Claude API, GPT-5 API
$50+/mo
Heavy usage, multiple models
Self-Hosting Guide π₯οΈ
A comprehensive guide to running AI models on your own hardware.
Benefit
Description
Privacy
Data never leaves your infrastructure
Cost Control
No per-token API costs for unlimited usage
Customization
Fine-tune models for specific needs
No Rate Limits
Process as much as hardware allows
Offline Access
Work without internet
For installation and usage instructions, refer to the official Ollama documentation .
Local GPU Quick Guide (NVIDIA RTX 5090 / Laptop 64 GB RAM)
Recommended apps (local-first):
Ollama - Simple local runtime with a local HTTP API
LM Studio - Desktop UI for downloading and running models locally
llama.cpp - Fast local inference (CPU/GPU), great for quantized models
Open WebUI - Optional local web UI (pairs well with local runtimes)
If you want βserver-styleβ hosting (advanced):
vLLM - High-throughput serving for NVIDIA GPUs
SGLang - Structured generation and serving workflows
Practical setup (works for both desktop and laptop):
Install the latest NVIDIA drivers (enable GPU acceleration in your chosen app)
Start with smaller quantized models (Q4 is a common βbest defaultβ)
Keep context windows realistic for local hardware (lower context = faster, less memory)
Watch VRAM first, then system RAM; reduce model size or quantization if either saturates
Prefer running locally on localhost and only expose to LAN if you understand firewall rules
What fits on your hardware (quick rules):
Hardware
Good starting point
Notes
RTX 5090 desktop GPU
14Bβ70B quantized
Best experience for coding agents and longer contexts
Laptop, 64 GB RAM
7Bβ14B quantized
Great for offline chat/coding; keep context moderate
Option
Best For
Pros
Cons
Local Machine
Personal use
Simple, no latency
Limited hardware
Dedicated Server
Team use
Full control
Maintenance
Cloud GPU Rental
Experimentation
On-demand
Hourly costs
Kubernetes
Enterprise
Scalable
Complex
Comprehensive pricing comparisons and cost calculations.
Tier
Price Range
Models
π Free
$0
Self-hosted, free tiers
π΅ Budget
$0.07 - $0.50/1M
GLM-4.7-FlashX, GLM-4-32B-0414-128K, Yi-Lightning, GPT-5.4 nano, Gemini 3.1 Flash-Lite, DeepSeek-V3.1, MiniMax-M2.5
π° Mid-range
$0.60 - $15.00/1M
GPT-5.4 mini, Claude Haiku 4.5, Kimi K2.5, Sonar, GLM-5, GPT-5.4, Claude Sonnet
π Premium
$15.00 - $600.00/1M
GPT-5.4 Pro, Claude Opus, o1-Pro
Subscription Pricing (Monthly, USD)
AI chat apps
Product
Plans (USD)
Notes
Official Source
ChatGPT
Go $8 , Plus $20 , Pro $200 , Business $25/seat (annual) or $30/seat (monthly), Enterprise (contact sales)
Consumer prices are US-listed; Go is localized in some markets
π
Claude
Pro $20 , Max $100 (5Γ) or $200 (20Γ), Team/Enterprise (see pricing)
Prices shown exclude applicable taxes; availability varies by region
π
Google AI (Gemini)
Plus $7.99 , Pro $19.99 , Ultra $249.99
US pricing; some regions/local pricing differ
π
Coding assistants
Tool
Plans (USD)
Notes
Official Source
GitHub Copilot
Free $0 , Pro $10 , Pro+ $39 , Business $19/user , Enterprise $39/user
Annual options available for Pro/Pro+
π
Model
Input
Output
Best For
GLM-4.7-FlashX
$0.07
$0.40
Fast budget tasks
GLM-4-32B-0414-128K
$0.10
$0.10
Budget chat/coding
GPT-5.4 nano
$0.20
$1.25
Classification and lightweight subagents
Gemini 3.1 Flash-Lite
$0.25
$1.50
High-volume multimodal tasks
DeepSeek-V3.1
$0.27
$0.41
Everything
Gemini 3 Flash
$0.30
$2.50
Long context
MiniMax-M2.5
$0.30
$1.20
Coding, long context
GLM-4.6
$0.60
$2.20
General purpose
Kimi K2.5
$0.60
$3.00
Multimodal + agent tasks
GPT-5.4 mini
$0.75
$4.50
Fast coding and multimodal tasks
Claude Haiku 4.5
$1.00
$5.00
Low-latency coding and sub-agents
GLM-5
$1.00
$3.20
Agentic engineering
Perplexity Sonar
$1.00
$1.00
Web-grounded chat (request fees apply)
Perplexity Sonar Reasoning Pro
$2.00
$8.00
Reasoning + search (request fees apply)
GPT-5.4
$2.50
$15.00
Frontier coding and professional work
Perplexity Sonar Pro
$3.00
$15.00
Higher quality + search (request fees apply)
Claude Sonnet 4.5
$3.00
$15.00
Best coding
Claude Opus 4.6
$5.00
$25.00
Agentic coding
Note: Some search-grounded models charge both token rates and per-request search/context fees. See Perplexityβs official pricing for details: https://docs.perplexity.ai/docs/getting-started/pricing
Self-Hosting vs API (Monthly)
Usage Level
Self-Host (A100)
API (GPT-5)
Winner
Light (1M tokens)
$300 (rental)
$10
API
Medium (100M tokens)
$300
$1,000
Self-host
Heavy (1B tokens)
$300
$10,000
Self-host
Enterprise (10B+ tokens)
$2,000 (owned)
$100,000+
Self-host
Reference materials including glossary, comparison tables, and data sources.
Definitions of common terms used throughout the documentation.
Term
Definition
Agent
AI system that autonomously performs tasks and interacts with environments
API
Interface for programmatically accessing AI models
Attention Mechanism
Neural network component focusing on relevant input parts
Benchmark
Standardized test measuring model performance
Chain-of-Thought (CoT)
Prompting technique showing step-by-step reasoning
Term
Definition
Fine-Tuning
Adapting pre-trained model to specific tasks
Frontier Model
State-of-the-art proprietary model
GPU
Hardware accelerator essential for ML
LLM
Large Language Model
LoRA
Efficient fine-tuning method
Term
Definition
MCP
Model Context Protocol for tool interaction
MMLU
Massive Multitask Language Understanding benchmark
MoE
Mixture of Experts architecture
Multimodal
Processing multiple input types
RAG
Retrieval-Augmented Generation
Term
Definition
Self-Hosting
Running models on own infrastructure
SLM
Small Language Model
SWE-bench
Benchmark for real GitHub issue resolution
Token
Basic unit of text processing
VRAM
GPU memory for model storage
Side-by-side comparisons of AI models sorted by various criteria.
Sort by Latest Update (Default)
π’ Company
π€ Model
π¦ Version
π
Release Date
π Latest Updated
π» Coding
π Benchmarks
π° Price
π₯οΈ Self-Host
π Official Site
π€ OpenAI
GPT-5
5.4 mini
2026-03-17 00:00 UTC
2026-03-17 00:00 UTC β
β
N/A
$0.75 / $4.50
β
π
π€ OpenAI
GPT-5
5.4
2026-03-05 00:00 UTC
2026-03-05 00:00 UTC β
β
N/A
$2.50 / $15.00
β
π
π Google DeepMind
Gemini 3.1
Flash-Lite
2026-03-03 00:00 UTC
2026-03-03 00:00 UTC β
β
N/A
$0.25 / $1.50
β
π
π¬ DeepSeek
DeepSeek
V4
2026-02-17 00:00 UTC
2026-02-17 00:00 UTC
β
N/A
Pay-per-token
β
π
π Google DeepMind
Gemini 3
Deep Think
2026-02-12 00:00 UTC
2026-02-12 00:00 UTC β
β
84.6% ARC-AGI-2
Ultra subscription
β
π
π¨π³ Zhipu AI
GLM
5
2026-02-12 00:00 UTC
2026-02-12 00:00 UTC β
β
SWE-bench 77.8
$1.00 / $3.20
β
π
π€ Anthropic
Claude
Opus 4.6
2026-02-05 00:00 UTC
2026-02-05 00:00 UTC β
β
SWE-bench SOTA
$5 / $25
β
π
π€ OpenAI
GPT-5
5.3-Codex
2026-02-05 00:00 UTC
2026-02-05 00:00 UTC β
β
Agentic leader
TBD
β
π
π Moonshot AI
Kimi
K2.5
2026-01-29 00:00 UTC
2026-02-02 00:00 UTC β
β
N/A
$0.60 / $3.00
β
π
Release Windows (Month-level)
π’ Company
π€ Model
π
Release Window
Notes
π Official Site
π§ MiniMax
MiniMax M2.5
2026-02
$0.30 / $1.20
π
π¨π³ Alibaba/Qwen
Qwen 3.5-Max
2026-02
Open-source release window
π
π Google DeepMind
Gemini 3.1 Flash-Lite
2026-03
$0.25 / $1.50
π
π Google DeepMind
Gemini 3 Pro
2026-01
Tiered pricing
π
π€ OpenAI
GPT-5.4 family
2026-03
GPT-5.4, GPT-5.4 mini, GPT-5.4 nano
π
π» Mistral AI
Mistral Large 3
2026-01
Open-weight
π
Rank
Model
Input
Output
License
1
Self-hosted
$0
$0
Various
2
GLM-4.7-Flash
$0
$0
Free
3
GLM-4.7-FlashX
$0.07
$0.40
API
4
GLM-4-32B-0414-128K
$0.10
$0.10
API
5
Yi-Lightning
$0.14
$0.42
Apache 2.0
6
GPT-5.4 nano
$0.20
$1.25
Proprietary
7
Gemini 3.1 Flash-Lite
$0.25
$1.50
Proprietary
8
DeepSeek-V3.1
$0.27
$0.41
MIT
9
Gemini 3 Flash
$0.30
$2.50
Proprietary
10
MiniMax-M2.5
$0.30
$1.20
Proprietary
Sort by Performance (Coding)
Rank
Model
HumanEval
Self-Host
1
Claude Sonnet 4.5
~92%
β
2
GPT-OSS-120B
~89%
β
3
DeepSeek-Coder-V2
~92%
β
4
Qwen3-Coder
~92%
β
5
DeepSeek-V3.1
82%+
β
Rank
Model
Context
Best For
1
Gemini 3 Flash
10M
Entire libraries
2
Llama 4 Scout
10M
Long-document RAG
3
Gemini 3 Pro
1M+
Research papers
4
Kimi K2.5
256K
Large codebases
Attribution, verification sources, and methodology.
Benchmark
Source
Description
HumanEval
OpenAI
164 Python programming problems
SWE-bench
Princeton
Real GitHub issue resolution
MMLU
UC Berkeley
57 subjects, multi-task
AIME
MAA
American Invitational Math Exam
ARC-AGI
ARC Prize
Abstract reasoning challenge
Primary Source Review - Check official documentation
Cross-Validation - Compare multiple sources
Timestamp Verification - All data includes verification date
Update Tracking - Monitor official channels
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License - see the LICENSE file for details.
Last Updated: 2026-04-01 20:24 UTC
Maintained by: ReadyPixels LLC
Made with β€οΈ by ReadyPixels LLC