OctoAI (formerly known as OctoML) today announced the launch of OctoStack, its new end-to-end solution for deploying generative AI models in an enterprise's private cloud, whether on-premises or in a virtual private cloud from one of the major providers, including AWS, Google Cloud, and Microsoft Azure, as well as CoreWeave, Lambda Labs, Snowflake, and others.
In its early days, OctoAI focused almost exclusively on optimizing models to run more efficiently. Built on the Apache TVM machine-learning compiler framework, the company launched its TVM-as-a-Service platform and, over time, expanded it into a fuller model-serving offering that combined its optimization expertise with a DevOps platform. With the rise of generative AI, the team launched the fully managed OctoAI platform to help its users serve and fine-tune existing models. OctoStack, in essence, is that OctoAI platform, but for private deployments.
OctoAI CEO and co-founder Luis Ceze told me that the company now has more than 25,000 developers on the platform and hundreds of paying customers in production. Many of these, Ceze said, are GenAI-native companies. The market of traditional enterprises wanting to adopt generative AI is significantly larger, though, so it is perhaps unsurprising that OctoAI is now going after them as well with OctoStack.
“One thing that became clear is that, as the enterprise market moves from last year's experimentation to deployments, everyone is looking around because they are nervous about sending data through an API,” Ceze said. “Two: many of them have also committed to their own compute, so why would I buy an API when I already have my own compute? And three: no matter what certifications you get and how big a name you have, they feel that their AI is as valuable as their data and they don't want to ship it off. So there is a really clear need in the enterprise to have the deployment under their control.”
Ceze noted that the team had been building the architecture to deliver both its SaaS and self-hosted platforms for some time. And while the SaaS platform is optimized for Nvidia hardware, OctoStack can support a much wider range of hardware, including AMD GPUs and AWS's Inferentia accelerators, which in turn makes the optimization challenge quite a bit harder (while also playing to OctoAI's strengths).
Deploying OctoStack should be easy for most enterprises, as OctoAI delivers the platform as ready-to-use containers with their associated Helm charts for deployment. For developers, the API remains the same, regardless of whether they are targeting the SaaS product or OctoAI in their private cloud.
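The "same API everywhere" claim means that, from a developer's perspective, moving from the SaaS product to a private deployment should amount to changing one configuration value. A minimal sketch of that idea in Python; note that every concrete name here (the URLs, the `/v1/chat/completions` path, the model id) is an illustrative assumption, not OctoAI's documented API:

```python
# Hypothetical sketch: only the base URL differs between the hosted SaaS
# platform and an in-VPC OctoStack deployment; the request payload that a
# developer assembles is identical in both cases.

def build_request(prompt: str, base_url: str) -> dict:
    """Assemble a chat-style inference request for a given endpoint."""
    return {
        "url": f"{base_url}/v1/chat/completions",  # assumed path, not documented
        "json": {
            "model": "example-llm",  # placeholder model id
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Targeting the SaaS endpoint vs. a private OctoStack endpoint is a
# single-value change, typically driven by configuration:
saas = build_request("Summarize this doc.", "https://api.example-saas.com")
private = build_request("Summarize this doc.", "http://octostack.internal:8080")

assert saas["json"] == private["json"]  # identical payload either way
```

The design point this illustrates is that application code never needs to branch on where the model is hosted; the deployment target lives entirely in configuration.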
The canonical enterprise use case remains text summarization and RAG to let users chat with their internal documents, but some companies are also applying these models to their internal code bases to run their own code-generation models (similar to what GitHub now offers to Copilot Enterprise users).
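The chat-with-your-documents pattern mentioned above is, at its core, retrieval-augmented generation: retrieve the internal documents most relevant to a question, then pass them to the model as context. A minimal, self-contained sketch of that flow, using toy word-overlap scoring as a stand-in for a real embedding index (the documents and scoring are illustrative, not part of any OctoAI product):

```python
from collections import Counter

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of words shared by query and document."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved documents into the prompt as grounding context."""
    context = "\n---\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Example internal documents (made up for illustration):
docs = [
    "Vacation policy: employees accrue 20 days of paid leave per year.",
    "Expense policy: meals over $50 require manager approval.",
    "Security policy: rotate credentials every 90 days.",
]
prompt = build_prompt("How many vacation days do employees get", docs)
```

In a production deployment, `score`/`retrieve` would be replaced by a vector database over document embeddings, and `prompt` would be sent to the model behind the private endpoint; the overall shape of the pipeline stays the same.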
For many companies, being able to do so in a secure environment that is strictly under their control is what now allows them to put these technologies into production for their employees and customers.
“For our performance- and security-sensitive use case, it is imperative that the models that process call data run in an environment that offers flexibility, scale, and security,” said Joshua Kennedy-White, CRO at Apate.ai. “OctoStack lets us easily and efficiently run the customized models we need, within environments we choose, and deliver the scale our customers require.”