Groq API Integration

Available Models

LLaMA2-70b-chat

Experience the power of LLaMA2-70b with unprecedented speed. Groq's LPU™ technology delivers responses in milliseconds.

  • Ultra-low latency (~100ms)
  • High accuracy responses
  • 70B parameter model
  • $0.70 per million input tokens, $0.70 per million output tokens

Mixtral-8x7b-chat

Mixtral offers a strong balance of speed and capability, with benchmark performance comparable to GPT-3.5 on many tasks.

  • Fastest Mixtral inference available
  • Mixture of Experts architecture
  • Excellent coding capabilities
  • $0.27 per million input tokens, $0.27 per million output tokens
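
A minimal request sketch for either model above, assuming Groq's OpenAI-compatible chat-completions endpoint at `api.groq.com/openai/v1`; the concrete model identifiers used here (`llama2-70b-4096`, `mixtral-8x7b-32768`) are assumptions and should be confirmed in the Groq console:

```swift
import Foundation

// Chat request/response shapes for the OpenAI-compatible endpoint (sketch).
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

struct ChatResponse: Codable {
    struct Choice: Codable { let message: ChatMessage }
    let choices: [Choice]
}

/// Sends a single prompt to the given Groq model and returns the reply text.
func sendChat(apiKey: String, model: String, prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: model, messages: [ChatMessage(role: "user", content: prompt)])
    )

    let (data, _) = try await URLSession.shared.data(for: request)
    let decoded = try JSONDecoder().decode(ChatResponse.self, from: data)
    return decoded.choices.first?.message.content ?? ""
}

// Switching models is just a different model string:
// let reply = try await sendChat(apiKey: key, model: "mixtral-8x7b-32768", prompt: "Hello")
```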

Why Choose Groq?

Unmatched Speed

Groq's LPU™ technology delivers the fastest inference speeds in the industry, with response times measured in milliseconds.

  • ~100ms latency
  • No throttling
  • Consistent performance

Cost Effective

Competitive pricing combined with superior speed makes Groq the most cost-effective solution for high-performance AI applications.

  • Transparent pricing
  • No minimum spend
  • Pay only for what you use

Technical Specifications

  • Secure API key storage with iOS Keychain (sketched below)
  • Native iOS integration with SwiftUI (view sketch below)
  • Support for streaming responses (streaming sketch below)
  • Automatic token counting and cost estimation (cost sketch below)
  • Real-time response monitoring
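
A minimal sketch of the Keychain storage item above, using the Security framework's generic-password class; the service and account strings are placeholder values:

```swift
import Foundation
import Security

// Keychain-backed storage for the Groq API key (sketch).
// Service/account identifiers below are placeholders.
enum KeychainStore {
    private static let service = "com.example.groqclient"
    private static let account = "groq-api-key"

    /// Stores (or replaces) the API key as a generic-password item.
    @discardableResult
    static func save(apiKey: String) -> Bool {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: service,
            kSecAttrAccount as String: account
        ]
        _ = SecItemDelete(query as CFDictionary)   // drop any existing item first

        var attributes = query
        attributes[kSecValueData as String] = Data(apiKey.utf8)
        return SecItemAdd(attributes as CFDictionary, nil) == errSecSuccess
    }

    /// Reads the API key back, or returns nil if none is stored.
    static func load() -> String? {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: service,
            kSecAttrAccount as String: account,
            kSecReturnData as String: true,
            kSecMatchLimit as String: kSecMatchLimitOne
        ]
        var result: AnyObject?
        guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
              let data = result as? Data else { return nil }
        return String(data: data, encoding: .utf8)
    }
}
```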
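
Streaming responses can be consumed with `URLSession`'s async byte stream. A sketch, assuming the OpenAI-compatible server-sent-events format (`data: {json}` lines, terminated by `data: [DONE]`); the `StreamChunk` field names mirror that assumed format, and the endpoint is the same as in the request sketch:

```swift
import Foundation

// Incremental chunk shape for streamed completions (assumed SSE format).
struct StreamChunk: Codable {
    struct Choice: Codable {
        struct Delta: Codable { let content: String? }
        let delta: Delta
    }
    let choices: [Choice]
}

/// Streams a completion and invokes `onToken` for each partial piece of text.
func streamChat(apiKey: String, model: String, prompt: String,
                onToken: (String) -> Void) async throws {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": model,
        "stream": true,
        "messages": [["role": "user", "content": prompt]]
    ] as [String: Any])

    let (bytes, _) = try await URLSession.shared.bytes(for: request)
    for try await line in bytes.lines {
        guard line.hasPrefix("data: ") else { continue }      // skip keep-alives
        let payload = String(line.dropFirst("data: ".count))
        if payload == "[DONE]" { break }                       // end of stream
        if let chunk = try? JSONDecoder().decode(StreamChunk.self, from: Data(payload.utf8)),
           let token = chunk.choices.first?.delta.content {
            onToken(token)                                     // deliver partial text
        }
    }
}
```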
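
Token counting and cost estimation reduce to simple arithmetic over the per-million-token prices listed earlier, applied to the token counts the API reports back (this sketch assumes the OpenAI-compatible `usage` counts; the model IDs are placeholders as before):

```swift
// Per-million-token prices from the model list above (USD).
// Model identifiers are assumed placeholders; map them to whatever
// IDs the app actually sends.
struct ModelPricing {
    let inputPerMillion: Double
    let outputPerMillion: Double
}

let pricing: [String: ModelPricing] = [
    "llama2-70b-4096": ModelPricing(inputPerMillion: 0.70, outputPerMillion: 0.70),
    "mixtral-8x7b-32768": ModelPricing(inputPerMillion: 0.27, outputPerMillion: 0.27)
]

/// Estimated cost in USD for one request, given token counts reported by the API.
func estimatedCost(model: String, inputTokens: Int, outputTokens: Int) -> Double? {
    guard let price = pricing[model] else { return nil }
    return Double(inputTokens) / 1_000_000 * price.inputPerMillion
         + Double(outputTokens) / 1_000_000 * price.outputPerMillion
}

// Example: 12,000 input + 3,000 output tokens on Mixtral
// ≈ 0.012 × $0.27 + 0.003 × $0.27 ≈ $0.004
```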
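
Finally, a sketch of how these pieces could sit behind a SwiftUI view; `sendChat` and `KeychainStore` are the hypothetical helpers from the sketches above, and the model ID is again a placeholder:

```swift
import SwiftUI

// Minimal SwiftUI screen wired to the request and Keychain sketches above.
struct ChatView: View {
    @State private var prompt = ""
    @State private var reply = ""

    var body: some View {
        VStack(spacing: 12) {
            TextField("Ask something…", text: $prompt)
                .textFieldStyle(.roundedBorder)
            Button("Send") {
                Task {
                    guard let key = KeychainStore.load() else { return }
                    // Model ID is an assumed placeholder.
                    reply = (try? await sendChat(apiKey: key,
                                                 model: "mixtral-8x7b-32768",
                                                 prompt: prompt)) ?? "Request failed"
                }
            }
            ScrollView {
                Text(reply)
                    .frame(maxWidth: .infinity, alignment: .leading)
            }
        }
        .padding()
    }
}
```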