New and improved embedding model

We’re excited to announce a new embedding model that is significantly more capable, more cost-effective, and simpler to use. The new model, text-embedding-ada-002, replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.

Read documentation

Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Since the initial launch of the OpenAI /embeddings endpoint, many applications have incorporated embeddings to personalize, recommend, and search content.
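To make "relationships between concepts" concrete, the sketch below compares vectors with cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors and their names are hypothetical stand-ins chosen for illustration; real embeddings have many more dimensions and come from the endpoint.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (hypothetical values).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related concepts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```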

You can query the /embeddings endpoint for the new model with two lines of code using our OpenAI Python Library, just like you could with previous models:

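import openai

# Request an embedding for a single input string from the new model.
# (openai.Embedding.create is the interface of the pre-1.0 openai package.)
response = openai.Embedding.create(
  input="porcine pals say",
  model="text-embedding-ada-002"
)

print(response)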
{
  "data": [
    {
      "embedding": [
        -0.0108,
        -0.0107,
        0.0323,
        ...
        -0.0114
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list"
}
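To work with the result, the vector has to be pulled out of the response structure printed above. The following is a minimal sketch that operates on a plain dictionary of the same shape; `sample_response` is a truncated stand-in (the elided values are not filled in) and `extract_embeddings` is a hypothetical helper name, not part of the library.

```python
# Truncated stand-in for the response printed above; real vectors are much longer.
sample_response = {
    "data": [
        {
            "embedding": [-0.0108, -0.0107, 0.0323, -0.0114],
            "index": 0,
            "object": "embedding",
        }
    ],
    "model": "text-embedding-ada-002",
    "object": "list",
}

def extract_embeddings(response):
    """Return the embedding vectors ordered by their 'index' field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

vectors = extract_embeddings(sample_response)
print(len(vectors), len(vectors[0]))  # 1 4
```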

Model improvements

Stronger performance. text-embedding-ada-002 outperforms all the old embedding models on text search, code search, and sentence similarity tasks and achieves comparable performance on text classification. For each task category, we evaluated the models on the datasets used for the old embeddings.

Unification of capabilities. We’ve significantly simplified the interface of the /embeddings endpoint by merging the five separate models (text-similarity, text-search-query, text-search-doc, code-search-text, and code-search-code) into a single new model. This single representation performs better than our previous embedding models across a diverse set of text search, sentence similarity, and code search benchmarks.

Longer context. The context length of the new model is increased by a factor of four, from 2048 to 8192 tokens, making it more convenient to work with long documents.
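Documents longer than the limit still have to be split before they are embedded. The sketch below shows one simple way to do that, assuming the text has already been turned into a list of token ids by a tokenizer that is not shown here; `chunk_tokens` and the 20,000-token document are hypothetical.

```python
MAX_TOKENS = 8192  # context length stated above for the new model

def chunk_tokens(tokens, max_tokens=MAX_TOKENS):
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

# Hypothetical document of 20,000 token ids (integers stand in for real tokens).
document_tokens = list(range(20000))
chunks = chunk_tokens(document_tokens)
print([len(chunk) for chunk in chunks])  # [8192, 8192, 3616]
```

With the previous 2048 limit the same document would need ten chunks instead of three.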

Smaller embedding size. The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost-effective to work with in vector databases.
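A quick back-of-the-envelope calculation shows what the smaller size can mean for storage. It assumes each value is stored as a 32-bit float and ignores any index overhead a vector database adds; the one-million-vector collection is a hypothetical example.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is stored as a 32-bit float

def storage_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage needed for num_vectors vectors of the given dimensionality."""
    return num_vectors * dimensions * bytes_per_value

new_dims = 1536
old_dims = new_dims * 8  # davinci-001 embeddings are eight times larger: 12288

n = 1_000_000  # hypothetical collection size
print(storage_bytes(n, new_dims) / 1e9)  # 6.144 (GB)
print(storage_bytes(n, old_dims) / 1e9)  # 49.152 (GB)
```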

Reduced price. We have reduced the price of the new embedding model by 90% compared to old models of the same size. The new model achieves better or similar performance than the old Davinci models at a 99.8% lower price.

Overall, the new embedding model is a much more powerful tool for natural language processing and coding tasks. We are excited to see how our customers will use it to build even more capable applications in their respective fields.

Limitations

The new text-embedding-ada-002 model does not outperform text-similarity-davinci-001 on the SentEval linear probing classification benchmark. For tasks that require training a lightweight linear layer on top of embedding vectors for classification prediction, we suggest comparing the new model to text-similarity-davinci-001 and choosing whichever model gives the best performance.
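For readers unfamiliar with linear probing, the sketch below trains a small logistic-regression layer on top of fixed vectors, which is the general idea behind such a benchmark. It is an illustration only, not the SentEval setup: the two-dimensional vectors, labels, function names, and hyperparameters are all hypothetical. To compare two embedding models, one would train the same probe on each model's vectors and compare held-out accuracy.

```python
import math

def train_linear_probe(vectors, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights on fixed vectors (labels are 0 or 1)."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - y                   # gradient of the log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

# Toy 2-dimensional vectors standing in for embeddings (hypothetical values).
train_x = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
train_y = [1, 1, 0, 0]

weights, bias = train_linear_probe(train_x, train_y)
print([predict(weights, bias, x) for x in train_x])
```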

Please see the Limitations & Risks section in the embeddings documentation for general limitations of our embedding models.

Embedding API examples in action

Kalendar AI is a sales outreach product that uses embeddings to match the right sales pitch to the right customers out of a dataset containing 340 million profiles. This automation relies on the similarity between embeddings of customer profiles and sales pitches to rank the most suitable matches, eliminating 40-56% of unwanted targeting compared to their previous approach.
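The general pattern behind this kind of matching can be sketched as ranking candidates by similarity to a query vector. This is not Kalendar AI's actual system: the names and the two-dimensional vectors are hypothetical, and the vectors are assumed to be unit length so that a plain dot product equals cosine similarity.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, dot(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical unit-length vectors standing in for profile and pitch embeddings.
customer_profile = [0.6, 0.8]
sales_pitches = {
    "pitch_a": [1.0, 0.0],
    "pitch_b": [0.0, 1.0],
    "pitch_c": [0.8, 0.6],
}

ranking = rank_by_similarity(customer_profile, sales_pitches)
print([name for name, _ in ranking])  # ['pitch_c', 'pitch_b', 'pitch_a']
```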

Notion, the online workspace company, will use OpenAI’s new embeddings to improve Notion search beyond today’s keyword matching systems.

Technical Terrence Team
