To help organizations scale their use of AI without overstretching their budgets, we're adding two new ways to reduce costs on consistent and asynchronous workloads:
- Discounted usage on committed throughput: Customers with a sustained level of tokens per minute (TPM) usage on GPT-4 or GPT-4 Turbo can request access to provisioned throughput, with discounts ranging from 10% to 50% depending on commitment size.
- Reduced costs on asynchronous workloads: Customers can use our new Batch API to run non-urgent workloads asynchronously. Batch API requests are priced at 50% off shared pricing, come with substantially higher rate limits, and return results within 24 hours. This is ideal for use cases such as model evaluation, offline classification, summarization, and synthetic data generation.
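A batch job is submitted by uploading a JSONL file where each line is one request tagged with a unique `custom_id`, then creating a batch with a 24-hour completion window. The sketch below is a minimal illustration assuming the OpenAI Python SDK; the document texts, prompts, and `docs-summary-*` id scheme are placeholders, not part of the release.

```python
import json

# Illustrative inputs for an offline summarization batch (placeholder text).
documents = ["First document text...", "Second document text..."]

def build_batch_lines(docs, model="gpt-4-turbo"):
    """Build JSONL batch input: one request per line, each with a unique
    custom_id so results can be matched back when the batch completes."""
    lines = []
    for i, doc in enumerate(docs):
        request = {
            "custom_id": f"docs-summary-{i}",  # hypothetical id scheme
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": "Summarize the document."},
                    {"role": "user", "content": doc},
                ],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)

batch_input = build_batch_lines(documents)

# To submit (requires an API key, so shown as comments only):
#   client = openai.OpenAI()
#   file = client.files.create(file=open("input.jsonl", "rb"), purpose="batch")
#   batch = client.batches.create(input_file_id=file.id,
#                                 endpoint="/v1/chat/completions",
#                                 completion_window="24h")
# Results are returned as an output JSONL file once the batch finishes.
```

Because each request carries its own `custom_id`, results in the output file can be joined back to inputs regardless of the order in which they complete.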
We plan to continue adding new features focused on enterprise-grade security, administrative controls, and cost management. To learn more about these releases, visit our API documentation or contact our team to discuss custom solutions for your business.