Compare · AI Inference

What's on offer.

APIs for running AI and machine learning model inference.

Offerings

This page lists 50 offerings, each with its service, pricing model, starting price, region count, and features.

Offering rows for AI Inference
Service	Offering	Pricing model	Starting price	Regions	Features
Google Gemini	Gemma 4 26B A4B	Pay-as-you-go	$0.130 per 1M input tokens	0	Context Window: 262144 tokens; Input Modalities: image, text, video
Google Gemini	Gemma 4 31B	Pay-as-you-go	$0.140 per 1M input tokens	0	Context Window: 262144 tokens; Input Modalities: image, text, video
Google Gemini	Lyria 3 Clip Preview	Free	Price pending	0	Context Window: 1048576 tokens; Input Modalities: text, image
Google Gemini	Lyria 3 Pro Preview	Free	Price pending	0	Context Window: 1048576 tokens; Input Modalities: text, image
Google Workspace	Gemini for Google Workspace	Subscription	$20 per user/month (Gemini Business add-on)	0	Gmail AI: Draft, summarize, reply; Docs AI: Write, rewrite, proofread; +2 more
Grammarly	Grammarly AI Writing Assistant	Freemium	Free	0	Real-time Suggestions: true; Generative AI Writing: true; +3 more
Grammarly	Grammarly Business	Subscription	$15 per member/month (billed annually, minimum 3 seats)	0	Style Guide: Company style guide enforcement; Brand Tone: Custom tone settings; +2 more
Grammarly	Grammarly GO (AI Writing Features)	Subscription	$12 per month (Pro plan, billed annually)	0	Generation: Draft emails and documents; Rewriting: Full-paragraph rewrites; +2 more
Graphcore	Graphcore Poplar SDK	Free	Free	0	Compiler: Poplar Graph Compiler; PyTorch Integration: PopTorch; +2 more
Groq	Groq Compound	Custom	Price pending	0	Context Window: 131072 tokens; Input Modalities: text
Groq	GPT-OSS 120B on Groq	Pay-as-you-go	$0.150 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text
Groq	GPT-OSS 20B on Groq	Pay-as-you-go	$0.075 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text
Groq	Groq LPU AI Inference API	Usage-based	Free	2	Inference Speed: 500+ tokens/second; Latency: <1ms per token; +3 more
Groq	Groq LLaMA Inference	Usage-based	$0.0001 per 1K input tokens (LLaMA 3 8B)	0	Inference Speed: 750+ tokens/sec; Models Available: LLaMA 3, Mixtral, Gemma, Whisper; +3 more
Groq	Llama 3.1 8B Instant on Groq	Pay-as-you-go	$0.050 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text
Groq	Llama 3.3 70B Versatile on Groq	Pay-as-you-go	$0.590 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text
Groq	Llama 4 Scout on Groq	Pay-as-you-go	$0.110 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text, image
Groq	Groq Mixtral Inference	Usage-based	$0.0002 per 1K input tokens	0	Architecture: Mixture of Experts 8x7B; Speed: 500+ tokens/sec; +3 more
Groq	Qwen3 32B on Groq	Pay-as-you-go	$0.290 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text
H2O.ai	H2O.ai Model Deployment (Driverless AI + MLOps)	Subscription	Free	2	Scoring: REST API + batch; Champion-Challenger: A/B traffic splitting; +2 more
Hailuo AI	Hailuo AI MiniMax API	Usage-based	$0.0002 per 1K input tokens	0	MiniMax-Text-01: 456B parameter model; Multimodal: Text, image, audio, video; +3 more
Helicone	Helicone - AI Gateway and Observability	Freemium	Free	0	Provider Support: OpenAI, Anthropic, Azure, Gemini, 30+; Request Logging: Full request/response capture; +3 more
Hugging Face	Hugging Face Inference Endpoints	Usage-based	$0.032 per hour (CPU)	4	One-Click Deployment: true; Auto-scaling: true; +3 more
Humanloop	Humanloop LLM Development Platform	Freemium	Free	0	Prompt Management: true; Evaluation Framework: true; +3 more
HyperWrite	HyperWrite AI Models	Subscription	$19.99 per month (Premium)	0	Multi-Model Access: GPT-4, Claude, proprietary; Intelligent Routing: Auto model selection; +3 more
IBM Cloud	IBM Cloud - Watson Machine Learning	Usage-based	Free	5	Model Serving: Online + batch scoring; AutoAI: Automated ML pipeline; +3 more
IBM Research	Granite 4.0 Micro	Pay-as-you-go	$0.017 per 1M input tokens	0	Context Window: 131000 tokens; Input Modalities: text
IBM watsonx	IBM watsonx.ai Foundation Models	Usage-based	$0.0001 per 1K tokens (Granite 3B)	5	Granite Models: 3B–20B parameters; Prompt Engineering: true; +3 more
Inception Labs	Mercury	Pay-as-you-go	$0.250 per 1M input tokens	0	Context Window: 128000 tokens; Input Modalities: text
Inception Labs	Mercury 2	Pay-as-you-go	$0.250 per 1M input tokens	0	Context Window: 128000 tokens; Input Modalities: text
Inception Labs	Mercury Coder	Pay-as-you-go	$0.250 per 1M input tokens	0	Context Window: 128000 tokens; Input Modalities: text
Insilico Medicine	Insilico Medicine PandaOmics & Chemistry42	Enterprise	Free	0	PandaOmics: AI target discovery; Chemistry42: Generative drug design; +3 more
Jasper	Jasper AI Marketing Content Platform	Subscription	$39 per month	0	Brand Voice: true; Long-form Content: true; +3 more
Kagi	Kagi - AI Search Summarization	Subscription	$5 per month	0	FastGPT: Instant AI answers; Universal Summarizer: Any URL summarization; +2 more
Khan Academy	Khanmigo - AI Tutoring Assistant	Subscription	$4 per month	0	Socratic Tutoring: Question-based guidance; Subject Coverage: Math, science, humanities; +2 more
Kling AI	Kling AI - Video & Image Generation API	Usage-based	$0.140 per video generation	0	Video API: 5-10 second clips, 1080p; Async Processing: Job queue + webhooks; +2 more
Lambda	Lambda - GPU Cloud AI Inference	Usage-based	$0.500 per hour (A10 GPU)	5	GPU Options: H100, A100, A10, V100; Persistent Storage: Attached storage volumes; +3 more
LanceDB	LanceDB RAG & AI Application Backend	Freemium	Free	3	RAG-Optimized Search: Hybrid ANN + BM25 retrieval; LangChain Integration: Native LangChain vector store; +3 more
Lepton AI	Lepton AI Inference & Deployment Platform	Usage-based	$0.0003 per 1K tokens (Llama 3 8B)	2	Python SDK: true; OpenAI-Compatible API: true; +3 more
Lightning AI	Lightning AI Inference & Deployment	Usage-based	Free	3	PyTorch-Native Deployment: Lightning App + PyTorch Serve; Automatic Batching: Dynamic request batching; +3 more
Liquid AI	LFM2-24B-A2B	Pay-as-you-go	$0.030 per 1M input tokens	0	Context Window: 32768 tokens; Input Modalities: text
Liquid AI	LFM2.5-1.2B-Instruct	Free	Price pending	0	Context Window: 32768 tokens; Input Modalities: text
Liquid AI	LFM2.5-1.2B-Thinking	Free	Price pending	0	Context Window: 32768 tokens; Input Modalities: text
Llama API	Meta Llama API	Usage-based	$0.0002 per 1K tokens (Llama 3.1 8B)	2	Model Access: Llama 3.1 & 3.2; Multimodal: Llama 3.2 Vision; +3 more
Luma AI	Luma AI API (Dream Machine)	Usage-based	$0.140 per video generation (5-second clip)	0	Dream Machine API: Ray 2 model; Image-to-Video: Animate any image; +3 more
Meta AI	Meta AI & Llama Model Family	Open source	Free	0	Llama 3.1 405B: true; Multimodal (Llama 3.2): true; +3 more
Meta AI	Llama 3.1 405B	Pay-as-you-go	$3.50 per 1M input tokens	0	Context Window: 128000 tokens; Input Modalities: text
Meta AI	Llama 3.1 70B	Pay-as-you-go	$0.590 per 1M input tokens	0	Context Window: 128000 tokens; Input Modalities: text
Meta AI	Llama 3.1 70B Instruct	Pay-as-you-go	$0.400 per 1M input tokens	0	Context Window: 131072 tokens; Input Modalities: text
Meta AI	Llama 3.1 8B	Pay-as-you-go	$0.050 per 1M input tokens	0	Context Window: 128000 tokens; Input Modalities: text
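
The token-priced rows above mix two units: some quote a price per 1K tokens and others per 1M tokens, which makes them hard to compare at a glance. The following is a minimal Python sketch that converts a few of the listed prices to a common per-1M basis. The helper names `per_million` and `input_cost` are hypothetical, introduced only for illustration, and the figures cover input tokens only, since output-token prices are not listed on this page.

```python
from decimal import Decimal

# (price in USD, number of tokens that price covers), copied from the rows above.
LISTED = {
    "Groq LLaMA Inference": (Decimal("0.0001"), 1_000),
    "Groq Mixtral Inference": (Decimal("0.0002"), 1_000),
    "Llama 3.1 8B Instant on Groq": (Decimal("0.050"), 1_000_000),
    "Llama 3.3 70B Versatile on Groq": (Decimal("0.590"), 1_000_000),
    "Llama 3.1 405B": (Decimal("3.50"), 1_000_000),
}


def per_million(price: Decimal, tokens_per_unit: int) -> Decimal:
    """Rescale a price quoted per `tokens_per_unit` tokens to USD per 1M tokens."""
    return price * Decimal(1_000_000) / Decimal(tokens_per_unit)


def input_cost(price: Decimal, tokens_per_unit: int, input_tokens: int) -> Decimal:
    """Estimate the input-side cost in USD of sending `input_tokens` tokens."""
    return price * Decimal(input_tokens) / Decimal(tokens_per_unit)


if __name__ == "__main__":
    # Print every sample row on the same per-1M basis.
    for name, (price, unit) in LISTED.items():
        print(f"{name}: ${per_million(price, unit):.3f} per 1M input tokens")

    # Input-side cost of one prompt that fills a 131072-token context window.
    price, unit = LISTED["Llama 3.3 70B Versatile on Groq"]
    print(f"Full-context prompt: ${input_cost(price, unit, 131_072):.4f}")
```

`Decimal` is used instead of `float` so that small per-token prices are rescaled without binary rounding noise. On this basis, a row quoted at $0.0001 per 1K tokens works out to $0.10 per 1M tokens.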

Showing 151–200 of 495 offerings