We’re excited to announce a new embedding model that is significantly more capable, cost-effective, and simpler to use. The new model, text-embedding-ada-002, replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.
Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Since the initial launch of the OpenAI /embeddings endpoint, many applications have incorporated embeddings to personalize, recommend, and search content.
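To make "relationships between concepts" concrete: a common way to compare two embeddings is cosine similarity, where vectors pointing in similar directions score close to 1. The sketch below assumes toy three-dimensional vectors invented purely for illustration (real text-embedding-ada-002 vectors have 1536 dimensions), and `cosine_similarity` is a hypothetical helper, not part of the OpenAI library.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real embeddings.
pig = [0.9, 0.1, 0.0]
hog = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(pig, hog))  # close to 1: related concepts
print(cosine_similarity(pig, car))  # close to 0: unrelated concepts
```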
You can query the /embeddings endpoint for the new model with two lines of code using our OpenAI Python Library, just as you could with previous models:
import openai
response = openai.Embedding.create(
input="porcine pals say",
model="text-embedding-ada-002"
)
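The response holds one vector per input under its `data` field, each entry tagged with the `index` of the input it came from. The helper below reads the vectors out of a response shaped like the pre-1.0 `openai` library's dict-like output (`data[i]["embedding"]`). To keep it runnable without an API key, it is exercised on a hand-built stand-in dict with made-up values; `extract_embeddings` is a hypothetical name.

```python
from typing import Any

def extract_embeddings(response: Any) -> list[list[float]]:
    """Return embedding vectors ordered by the index of their input."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

# Hand-built stand-in for a real API response (values are made up;
# real text-embedding-ada-002 vectors have 1536 floats).
fake_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}

vectors = extract_embeddings(fake_response)
print(vectors)  # [[0.1, 0.2], [0.3, 0.4]]
```

With a live call, the same helper would be applied to the `response` returned by `openai.Embedding.create`.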
Model improvements
Stronger performance. text-embedding-ada-002 outperforms all the old embedding models on text search, code search, and sentence similarity tasks and achieves comparable performance on text classification. For each task category, we evaluate the models on the datasets used to evaluate the old embeddings.
Unification of capabilities. We have significantly simplified the interface of the /embeddings endpoint by merging the five separate models (text-similarity, text-search-query, text-search-doc, code-search-text, and code-search-code) into a single new model. This single representation performs better than our previous embedding models across a diverse set of text search, sentence similarity, and code search benchmarks.
Longer context. The context length of the new model is increased by a factor of four, from 2048 to 8192 tokens, making it more convenient to work with long documents.
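Even at 8192 tokens, some documents will exceed the limit. A common workaround is to split the token sequence into windows that fit and embed each window separately. The sketch below operates on a plain list of token IDs so it runs without extra dependencies; obtaining those IDs from a tokenizer (for example the `tiktoken` package's `cl100k_base` encoding) is an assumption left outside the sketch. `chunk_tokens` and its `overlap` parameter are hypothetical.

```python
MAX_TOKENS = 8192  # context length of text-embedding-ada-002

def chunk_tokens(tokens: list[int], max_tokens: int = MAX_TOKENS,
                 overlap: int = 0) -> list[list[int]]:
    """Split tokens into windows of at most max_tokens; consecutive windows share `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

# A hypothetical 20,000-token document (token IDs are placeholders).
doc = list(range(20_000))
chunks = chunk_tokens(doc)
print([len(c) for c in chunks])  # [8192, 8192, 3616]
```

Each chunk can then be sent to the endpoint on its own; a small `overlap` keeps sentences that straddle a boundary represented in both neighboring chunks.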
Smaller embedding size. The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost-effective to work with in vector databases.
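To see why the smaller size matters for vector databases, here is the raw storage arithmetic. It assumes each dimension is stored as a 4-byte float32 (databases may use other formats, compression, or index overhead) and a hypothetical corpus of one million vectors; the davinci-001 dimension of 12288 follows from the "one-eighth" figure.

```python
BYTES_PER_FLOAT32 = 4
ADA_002_DIMS = 1536
DAVINCI_001_DIMS = ADA_002_DIMS * 8  # one-eighth relationship -> 12288

def raw_storage_bytes(num_vectors: int, dims: int) -> int:
    """Bytes for num_vectors dense float32 vectors, ignoring index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32

n = 1_000_000  # hypothetical corpus size
ada = raw_storage_bytes(n, ADA_002_DIMS)
davinci = raw_storage_bytes(n, DAVINCI_001_DIMS)
print(f"ada-002:     {ada / 1e9:.3f} GB")      # 6.144 GB
print(f"davinci-001: {davinci / 1e9:.3f} GB")  # 49.152 GB
print(davinci // ada)                          # 8
```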
Reduced price. We have reduced the price of new embedding models by 90% compared to old models of the same size. The new model achieves better or similar performance compared with the old Davinci models at a 99.8% lower price.
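A percentage reduction is easy to misread, so here is what the two figures mean in multiples. The sketch uses a hypothetical old price of 1.0 in arbitrary units, since only the relative change matters; `reduced_price` is a hypothetical helper.

```python
def reduced_price(old_price: float, percent_lower: float) -> float:
    """Price after a percentage reduction, e.g. 99.8 leaves 0.2% of the old price."""
    return old_price * (1 - percent_lower / 100)

old = 1.0  # hypothetical old price in arbitrary units
for pct in (90, 99.8):
    new = reduced_price(old, pct)
    # 90% lower is one-tenth of the price; 99.8% lower is one five-hundredth.
    print(f"{pct}% lower -> {new:.3f} units, {old / new:.0f}x cheaper")
```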
Overall, the new embedding model is a much more powerful tool for natural language processing and coding tasks. We are excited to see how our customers will use it to build even more capable applications in their respective fields.
Limitations
The new text-embedding-ada-002 model does not outperform text-similarity-davinci-001 on the SentEval linear probing classification benchmark. For tasks that require training a lightweight linear layer on top of embedding vectors for classification prediction, we suggest comparing the new model to text-similarity-davinci-001 and choosing whichever model gives optimal performance.
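One way to run that comparison is a linear probe: fit a logistic-regression layer on each model's embeddings of the same labeled texts and keep the model with the higher accuracy. The sketch below is a minimal, dependency-free version using batch gradient descent; the tiny vectors are invented stand-ins for real embeddings, and `train_linear_probe` and `accuracy` are hypothetical helpers. In practice a library classifier such as scikit-learn's `LogisticRegression` and a held-out evaluation split would be the usual choices.

```python
import math

def train_linear_probe(X, y, lr=0.5, epochs=200):
    """Fit logistic-regression weights and bias with batch gradient descent."""
    dims, n = len(X[0]), len(X)
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            for j in range(dims):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    """Fraction of examples whose predicted label (z > 0) matches y."""
    hits = sum(int((sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) == bool(yi))
               for xi, yi in zip(X, y))
    return hits / len(X)

# Hypothetical embeddings of the same four labeled texts from two models.
labels = [1, 1, 0, 0]
candidates = {
    "model_a": [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]],
    # model_b maps a positive and a negative text to the same vector,
    # so no linear layer can classify all four correctly.
    "model_b": [[0.6, 0.4], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]],
}
scores = {}
for name, X in candidates.items():
    w, b = train_linear_probe(X, labels)
    scores[name] = accuracy(w, b, X, labels)  # training accuracy, for brevity
best = max(scores, key=scores.get)
print(scores, best)
```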
Please see the Limitations and Risks section in the embeddings documentation for general limitations of our embedding models.
Embedding API examples in action
Kalendar AI is a sales outreach product that uses embeddings to match the right sales pitch to the right customers out of a dataset containing 340 million profiles. This automation relies on the similarity between embeddings of customer profiles and sales pitches to rank the most suitable matches, eliminating 40-56% of unwanted targeting compared to their previous approach.
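The matching step described above can be sketched as ranking candidate pitches by cosine similarity to a customer-profile embedding and keeping the top k. This is an illustration under assumptions, not the company's actual pipeline: the three-dimensional vectors and pitch names are made up, and `top_matches` is a hypothetical helper.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_matches(query, candidates, k=2):
    """Return the k (name, score) pairs most similar to query, best first."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in candidates.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical embeddings: one customer profile and three sales pitches.
profile = [0.9, 0.2, 0.1]
pitches = {
    "cloud_cost_savings": [0.8, 0.3, 0.0],
    "hiring_platform": [0.1, 0.9, 0.2],
    "security_audit": [0.6, 0.1, 0.5],
}
for name, score in top_matches(profile, pitches):
    print(f"{name}: {score:.3f}")
```

`heapq.nlargest` avoids sorting every candidate, which matters when scoring one pitch against many profiles; at the scale of hundreds of millions of profiles a vector database or approximate nearest-neighbor index would typically replace this linear scan.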
Notion, the online workspace company, will use OpenAI’s new embeddings to improve Notion search beyond today’s keyword matching systems.