Vector databases are all the rage, judging by the number of startups entering the space and the investors paying for a piece of the pie. The proliferation of large language models (LLMs) and the generative AI (GenAI) movement have created fertile ground for vector database technologies to flourish.
While traditional relational databases such as Postgres or MySQL are well suited to structured data (predefined data types that can be filed neatly in rows and columns), they work less well for unstructured data such as images, videos, emails, social media posts and any other data that does not adhere to a predefined data model.
Vector databases, on the other hand, store and process data in the form of vector embeddings, which convert text, documents, images and other data into numerical representations that capture the meaning of, and relationships between, different data points. This is well suited to machine learning, as the database stores data spatially according to how relevant each item is to the others, making it easy to retrieve semantically similar data.
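To make the idea of retrieving "semantically similar" data concrete, here is a minimal sketch of similarity search over embeddings. It assumes tiny, hand-made three-dimensional vectors and made-up item names; real embeddings come from a model and have hundreds or thousands of dimensions, and production vector databases typically use approximate indexes rather than the brute-force scan shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
EMBEDDINGS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "espresso machine": [0.0, 0.1, 0.9],
}

def most_similar(query_vector, embeddings, top_k=2):
    """Brute-force search: score every stored vector and keep the best top_k names."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in embeddings.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical embedding of a query such as "jogging footwear".
query = [0.95, 0.05, 0.0]
print(most_similar(query, EMBEDDINGS))
```

The two footwear items rank above the espresso machine because their vectors point in nearly the same direction as the query, which is the sense in which such a database stores data "spatially."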
This is particularly useful for LLMs, such as OpenAI's GPT-4, as it allows the AI chatbot to better understand the context of a conversation by analyzing previous similar conversations. Vector search is also useful for all kinds of real-time applications, such as content recommendations on social media or e-commerce apps, as it can look at what a user has searched for and retrieve similar items in an instant.
Vector search can also help reduce “hallucinations” in LLM applications by providing additional information that may not have been available in the original training data set.
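As a rough illustration of how retrieved information can be fed to a model, the sketch below picks the stored snippet closest to a question and places it in the prompt. The snippets, their two-dimensional vectors, the `build_prompt` helper and the prompt wording are all hypothetical, and no actual LLM is called; the sketch only shows the retrieval and prompt-assembly steps.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical knowledge snippets paired with made-up embeddings.
KNOWLEDGE = [
    ("Return policy: items can be returned within 30 days.", [1.0, 0.0]),
    ("Shipping: orders ship within 2 business days.", [0.0, 1.0]),
]

def build_prompt(question, question_vector, knowledge, top_k=1):
    """Rank snippets by similarity to the question and put the best ones in the prompt."""
    ranked = sorted(
        knowledge,
        key=lambda item: dot(question_vector, item[1]),
        reverse=True,
    )
    context = "\n".join(text for text, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The question vector is also invented; a real system would embed the question with a model.
prompt = build_prompt("How long do I have to return an item?", [0.9, 0.1], KNOWLEDGE)
print(prompt)
```

Because the model is handed the relevant snippet at question time, it does not need to have seen that information during training.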
“Without using vector similarity search, AI/ML applications can still be developed, but more retraining and fine-tuning would be required,” Andre Zayarni, CEO and co-founder of vector search startup Qdrant, explained to TechCrunch. “Vector databases come into play when there is a large data set and a tool is needed to work with vector embeddings in an efficient and convenient way.”
In January, Qdrant raised $28 million in funding to capitalize on growth that saw it become one of the top 10 fastest-growing commercial open source startups last year. And it's far from the only vector database startup to raise money lately: Vespa, Weaviate, Pinecone and Chroma collectively raised $200 million last year for various vector offerings.
Since the beginning of the year, we have also seen Index Ventures lead a $9.5 million seed round in Superlinked, a platform that transforms complex data into vector embeddings. And a few weeks ago, Y Combinator (YC) unveiled its Winter '24 cohort, which included Lantern, a startup that sells a hosted vector search engine for Postgres.
Elsewhere, Marqo raised a $4.4 million seed round late last year, quickly followed by a $12.5 million Series A round in February. The Marqo platform provides a full range of out-of-the-box vector tools, covering vector generation, storage and retrieval, which lets users bypass third-party tools from the likes of OpenAI or Hugging Face, and it delivers everything through a single API.
Marqo co-founders Tom Hamer and Jesse Clark previously worked in engineering roles at Amazon, where they realized the “huge unmet need” for flexible, semantic search across different modalities, such as text and images. That's when they jumped ship to form Marqo in 2021.
“Working with visual search and robotics at Amazon was when I really looked at vector search; I was thinking about new ways to discover products, and that converged very quickly on vector search,” Clark told TechCrunch. “In robotics, I was using multimodal search to look through a lot of our images and identify if there were errant things like hoses and packages. Otherwise, this would have been very difficult to solve.”
Enter the enterprise
While vector databases are having a moment amid the ChatGPT hoopla and the GenAI movement, they are not a panacea for all enterprise search scenarios.
“Dedicated databases tend to focus entirely on specific use cases and can therefore design their architecture for the performance of necessary tasks, as well as the user experience, compared to general-purpose databases, which must be adapted to the current design,” Peter Zaitsev, founder of database services and support company Percona, explained to TechCrunch.
Specialized databases may excel at one thing to the exclusion of others, which is why we are starting to see database incumbents such as Elastic, Redis, OpenSearch, Cassandra, Oracle and MongoDB adding vector search smarts to the mix, as are cloud service providers like Microsoft's Azure, Amazon's AWS and Cloudflare.
Zaitsev compares this latest trend with what happened with JSON more than a decade ago, when web applications became more prevalent and developers needed a language-independent data format that was easy for humans to read and write. In that case, a new class of database emerged in the form of document databases like MongoDB, while existing relational databases also introduced JSON support.
“I think the same thing is likely to happen with vector databases,” Zaitsev told TechCrunch. “Users who are building very complicated, large-scale AI applications will use dedicated vector search databases, while people who need to create some AI functionality for their existing application are more likely to use vector search functionality in the databases they already use.”
But Zayarni and his colleagues at Qdrant are betting that native solutions built entirely around vectors will provide the “speed, memory safety and scale” needed as vector data explodes, compared to companies that add vector search as an afterthought.
“Their argument is, 'we can also do vector searches, if necessary,'” Zayarni said. “Our argument is: 'we do advanced vector search in the best way possible.' It's all a matter of specialization. In fact, we recommend starting with any database you already have in your technology stack. At some point, users will face limitations if vector search is a critical component of their solution.”