Groq API Integration

Available Models

LLaMA2-70b-chat

Experience the power of LLaMA2-70b with unprecedented speed. Groq's LPU™ technology delivers responses in milliseconds.

  • Ultra-low latency (~100ms)
  • High accuracy responses
  • 70B parameter model
  • $0.70 per million input tokens, $0.70 per million output tokens

Mixtral-8x7b-chat

Mixtral offers a strong balance of speed and capability, with benchmark performance comparable to GPT-3.5 on many tasks.

  • Fastest Mixtral inference available
  • Mixture of Experts architecture
  • Excellent coding capabilities
  • $0.27 per million input tokens, $0.27 per million output tokens
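
A minimal request sketch for either model above, assuming Groq's OpenAI-compatible chat-completions endpoint at `api.groq.com/openai/v1`; the concrete model identifiers used here (`llama2-70b-4096`, `mixtral-8x7b-32768`) are assumptions and should be confirmed in the Groq console:

```swift
import Foundation

// Chat request/response shapes for the OpenAI-compatible endpoint (sketch).
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

struct ChatResponse: Codable {
    struct Choice: Codable { let message: ChatMessage }
    let choices: [Choice]
}

/// Sends a single prompt to the given Groq model and returns the reply text.
func sendChat(apiKey: String, model: String, prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: model, messages: [ChatMessage(role: "user", content: prompt)])
    )

    let (data, _) = try await URLSession.shared.data(for: request)
    let decoded = try JSONDecoder().decode(ChatResponse.self, from: data)
    return decoded.choices.first?.message.content ?? ""
}

// Switching models is just a different model string:
// let reply = try await sendChat(apiKey: key, model: "mixtral-8x7b-32768", prompt: "Hello")
```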

Why Choose Groq?

Unmatched Speed

Groq's LPU™ technology delivers the fastest inference speeds in the industry, with response times measured in milliseconds.

  • ~100ms latency
  • No throttling
  • Consistent performance

Cost Effective

Competitive pricing combined with superior speed makes Groq the most cost-effective solution for high-performance AI applications.

  • Transparent pricing
  • No minimum spend
  • Pay only for what you use

Technical Specifications

  • Secure API key storage with iOS Keychain (sketched below)
  • Native iOS integration with SwiftUI (view sketch below)
  • Support for streaming responses (streaming sketch below)
  • Automatic token counting and cost estimation (cost sketch below)
  • Real-time response monitoring
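
A minimal sketch of the Keychain storage item above, using the Security framework's generic-password class; the service and account strings are placeholder values:

```swift
import Foundation
import Security

// Keychain-backed storage for the Groq API key (sketch).
// Service/account identifiers below are placeholders.
enum KeychainStore {
    private static let service = "com.example.groqclient"
    private static let account = "groq-api-key"

    /// Stores (or replaces) the API key as a generic-password item.
    @discardableResult
    static func save(apiKey: String) -> Bool {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: service,
            kSecAttrAccount as String: account
        ]
        _ = SecItemDelete(query as CFDictionary)   // drop any existing item first

        var attributes = query
        attributes[kSecValueData as String] = Data(apiKey.utf8)
        return SecItemAdd(attributes as CFDictionary, nil) == errSecSuccess
    }

    /// Reads the API key back, or returns nil if none is stored.
    static func load() -> String? {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: service,
            kSecAttrAccount as String: account,
            kSecReturnData as String: true,
            kSecMatchLimit as String: kSecMatchLimitOne
        ]
        var result: AnyObject?
        guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
              let data = result as? Data else { return nil }
        return String(data: data, encoding: .utf8)
    }
}
```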
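
Streaming responses can be consumed with `URLSession`'s async byte stream. A sketch, assuming the OpenAI-compatible server-sent-events format (`data: {json}` lines, terminated by `data: [DONE]`); the `StreamChunk` field names mirror that assumed format, and the endpoint is the same as in the request sketch:

```swift
import Foundation

// Incremental chunk shape for streamed completions (assumed SSE format).
struct StreamChunk: Codable {
    struct Choice: Codable {
        struct Delta: Codable { let content: String? }
        let delta: Delta
    }
    let choices: [Choice]
}

/// Streams a completion and invokes `onToken` for each partial piece of text.
func streamChat(apiKey: String, model: String, prompt: String,
                onToken: (String) -> Void) async throws {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": model,
        "stream": true,
        "messages": [["role": "user", "content": prompt]]
    ] as [String: Any])

    let (bytes, _) = try await URLSession.shared.bytes(for: request)
    for try await line in bytes.lines {
        guard line.hasPrefix("data: ") else { continue }      // skip keep-alives
        let payload = String(line.dropFirst("data: ".count))
        if payload == "[DONE]" { break }                       // end of stream
        if let chunk = try? JSONDecoder().decode(StreamChunk.self, from: Data(payload.utf8)),
           let token = chunk.choices.first?.delta.content {
            onToken(token)                                     // deliver partial text
        }
    }
}
```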
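
Token counting and cost estimation reduce to simple arithmetic over the per-million-token prices listed earlier, applied to the token counts the API reports back (this sketch assumes the OpenAI-compatible `usage` counts; the model IDs are placeholders as before):

```swift
// Per-million-token prices from the model list above (USD).
// Model identifiers are assumed placeholders; map them to whatever
// IDs the app actually sends.
struct ModelPricing {
    let inputPerMillion: Double
    let outputPerMillion: Double
}

let pricing: [String: ModelPricing] = [
    "llama2-70b-4096": ModelPricing(inputPerMillion: 0.70, outputPerMillion: 0.70),
    "mixtral-8x7b-32768": ModelPricing(inputPerMillion: 0.27, outputPerMillion: 0.27)
]

/// Estimated cost in USD for one request, given token counts reported by the API.
func estimatedCost(model: String, inputTokens: Int, outputTokens: Int) -> Double? {
    guard let price = pricing[model] else { return nil }
    return Double(inputTokens) / 1_000_000 * price.inputPerMillion
         + Double(outputTokens) / 1_000_000 * price.outputPerMillion
}

// Example: 12,000 input + 3,000 output tokens on Mixtral
// ≈ 0.012 × $0.27 + 0.003 × $0.27 ≈ $0.004
```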
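
Finally, a sketch of how these pieces could sit behind a SwiftUI view; `sendChat` and `KeychainStore` are the hypothetical helpers from the sketches above, and the model ID is again a placeholder:

```swift
import SwiftUI

// Minimal SwiftUI screen wired to the request and Keychain sketches above.
struct ChatView: View {
    @State private var prompt = ""
    @State private var reply = ""

    var body: some View {
        VStack(spacing: 12) {
            TextField("Ask something…", text: $prompt)
                .textFieldStyle(.roundedBorder)
            Button("Send") {
                Task {
                    guard let key = KeychainStore.load() else { return }
                    // Model ID is an assumed placeholder.
                    reply = (try? await sendChat(apiKey: key,
                                                 model: "mixtral-8x7b-32768",
                                                 prompt: prompt)) ?? "Request failed"
                }
            }
            ScrollView {
                Text(reply)
                    .frame(maxWidth: .infinity, alignment: .leading)
            }
        }
        .padding()
    }
}
```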