Llama 3.1 8B Instant on Groq
Ultra-cheap Groq text model for high-volume chat, classification, and routing.
- Input: $0.05 / 1M tokens
- Output: $0.08 / 1M tokens
- Cached read: — / 1M tokens
- Cached write: — / 1M tokens
- Batch discount: —%
- Source: Llama 3.1 8B Instant on Groq pricing
- Verified: Apr 2, 2026 (High)
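To make the per-million-token prices concrete, the following is a minimal sketch of a list-price cost estimate. The helper name `estimate_cost_usd` and the constant names are hypothetical. The only inputs taken from this page are the $0.05 input and $0.08 output prices. Cached and batch pricing are not listed here, so the sketch ignores them.

```python
from decimal import Decimal

# Prices copied from the listing on this page, in USD per 1M tokens.
INPUT_USD_PER_MTOK = Decimal("0.05")
OUTPUT_USD_PER_MTOK = Decimal("0.08")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: list-price cost, ignoring caching and batch discounts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


# Example workload (illustrative numbers, not from the source):
# 1M requests at 500 input tokens and 100 output tokens each.
workload_cost = estimate_cost_usd(500 * 1_000_000, 100 * 1_000_000)
print(f"Estimated list-price cost: ${workload_cost}")
```

`Decimal` is used instead of floats so that small per-token amounts do not accumulate rounding error across large volumes.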
Capabilities
- Modalities: text→text
- Capabilities: batchSupport, promptCaching, functionCalling, structuredOutputs
- Strengths: Lowest Groq cost, very fast
- Tradeoffs: Smallest model in current Groq compare set
Benchmark Coverage
| Benchmark | Version | Score | Date | Source | Notes |
|---|---|---|---|---|---|
Release History
| Release | Alias | Lifecycle | Release Date | Deprecation | Shutdown | Summary |
|---|---|---|---|---|---|---|
| Llama 3.1 8B Instant on Groq | groq-llama-3-1-8b-instant | Active | Sep 1, 2024 | — | — | Current published model family snapshot. |
Host Coverage
| Host | Type | Context | Pricing Note | Differences |
|---|---|---|---|---|
| Groq API | first-party | 131.1K | Reference production Groq pricing. | Production model tier |
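The context figure in the table can be used as a pre-flight check before sending a request. The sketch below rests on two assumptions that the table does not confirm: that "131.1K" is a rounded 131,072 (2**17) tokens, and that the prompt and the reserved output share one window. The function name `fits_context` is hypothetical.

```python
# Assumption: "131.1K" in the Host Coverage table is a rounded 131,072 tokens.
CONTEXT_WINDOW_TOKENS = 131_072


def fits_context(
    prompt_tokens: int,
    max_output_tokens: int,
    window: int = CONTEXT_WINDOW_TOKENS,
) -> bool:
    """Hypothetical check: prompt plus reserved output must fit in the window.

    Assumes input and output tokens count against the same window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        return False
    return prompt_tokens + max_output_tokens <= window


print(fits_context(120_000, 8_000))   # 128,000 tokens requested
print(fits_context(130_000, 2_000))   # 132,000 tokens requested
```

Token counts depend on the tokenizer, so the inputs would need to come from a tokenizer that matches the model.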
Migration Guidance
Default Groq budget tier for simple generation, routing, and classification.
Replacement models: groq-llama-3-3-70b-versatile
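One way to apply this guidance is a small routing table that keeps simple work on the budget tier and sends everything else to the listed replacement. This is a sketch under stated assumptions, not Groq's own routing logic. The task labels and the `pick_model` helper are hypothetical, and the strings are this page's catalog aliases, which may differ from the model IDs the Groq API expects.

```python
# Catalog aliases as they appear on this page (may not match Groq API model IDs).
BUDGET_MODEL = "groq-llama-3-1-8b-instant"
REPLACEMENT_MODEL = "groq-llama-3-3-70b-versatile"

# Hypothetical task labels based on the "simple generation, routing,
# and classification" wording in the guidance.
SIMPLE_TASKS = frozenset({"generation", "routing", "classification"})


def pick_model(task: str) -> str:
    """Hypothetical router: budget tier for simple tasks, replacement otherwise."""
    if task.strip().lower() in SIMPLE_TASKS:
        return BUDGET_MODEL
    return REPLACEMENT_MODEL


for task in ("classification", "multi-step analysis"):
    print(task, "->", pick_model(task))
```

Unknown task labels fall through to the larger model, which trades cost for a safer default.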
Change Events
| Date | Type | Title | Description | Source |
|---|---|---|---|---|
| Sep 1, 2024 | family_added | Llama 3.1 8B Instant on Groq published | Initial public model family launch. | Llama 3.1 8B Instant on Groq release notes |
Other models from Groq
GPT-OSS 120B on Groq, GPT-OSS 20B on Groq, Groq Compound, Llama 3.3 70B Versatile on Groq, Llama 4 Scout on Groq, Qwen3 32B on Groq