🧠Knowledge Series #34: Vector databases - a simple guide
Everything you need to know about this new type of database that powers AI applications - explained simply
🔒The Knowledge Series is a collection of easy-to-read guides designed to help you plug the gaps in your tech knowledge so that you feel more confident when chatting to colleagues. Clearly explained in plain English. One topic at a time.
Hi product people 👋,
Thanks to their critical role in AI feature development, vector databases are snapping up major funding rounds at the moment. In January, Qdrant secured $28 million to capitalize on growth that made it one of the top 10 fastest-growing commercial open source startups last year.
But what exactly are they, how are they different from other types of databases, and how can product teams use them?
Getting your head around vector databases can be tricky. In this guide, we’ll focus only on the essential, simple things to know about vector databases so that the next time someone in your team mentions them, you’ll understand the basics.
Coming up:
What is a vector database?
Why vector databases are popular for AI feature development
Real-world examples of how companies like Pinterest, Airbnb and Spotify use vector databases to power their features
An end-to-end example to bring the key concepts to life
The differences between a vector database and other databases
Important terminology worth knowing
Tools and types of vector databases
What is a vector database?
In our Knowledge Series posts on databases so far, we’ve covered the two most common types of databases: SQL and NoSQL databases. But vector databases are an entirely different category of database.
Unlike SQL databases, which store data in tables, or NoSQL databases, which often store data in documents, vector databases store and retrieve data based on vector representations, or embeddings, of that data.
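If you like to see ideas in code, here is a minimal sketch of what "storing and retrieving by vector" can look like. It is a toy written for this guide, not how any real vector database is built: the `TinyVectorStore` class, the item names and the two-number vectors are all made up for illustration, and real products use far longer vectors and much smarter search.

```python
import math


class TinyVectorStore:
    """Toy in-memory store (hypothetical): keeps (name, vector) pairs."""

    def __init__(self):
        self._items = []  # list of (name, vector) tuples

    def add(self, name, vector):
        # Store the item alongside its vector representation.
        self._items.append((name, list(vector)))

    def query(self, vector, k=1):
        # Rank every stored item by straight-line (Euclidean) distance
        # to the query vector and return the names of the k closest.
        ranked = sorted(self._items, key=lambda item: math.dist(item[1], vector))
        return [name for name, _ in ranked[:k]]


store = TinyVectorStore()
store.add("cat photo", [0.9, 0.1])      # made-up vectors for illustration
store.add("dog photo", [0.8, 0.3])
store.add("invoice PDF", [0.1, 0.9])

# Ask for the two items whose vectors sit closest to the query vector.
nearest = store.query([0.88, 0.12], k=2)
print(nearest)
```

The point to notice is that the query is not "find the row where name equals X". It is "find whatever is closest to this list of numbers", which is the main difference from the table and document lookups described above.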
There are some fairly complicated parts of vector databases to get your head around but let’s first take a look at one of the core parts of a vector database: the vector representations, or embeddings.
What’s a vector representation?
Imagine being able to take something, like your favorite movie, song or photograph, and transform it into a representation which reflects all of its important characteristics. That’s the idea behind vector databases, and the representations they use are known as embeddings.
These embeddings represent data as a series of numbers (a vector), where each number captures some feature or attribute of the data being represented.
Each type of embedding captures specific characteristics of the original data type and transforms it into a numerical format that can be efficiently processed, compared and analyzed using various machine learning algorithms.
In theory, any type of data can be transformed into a vector embedding so long as there is a meaningful way to represent the data numerically.
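To make this concrete, here is a small sketch that turns three imaginary films into vectors and compares them. Everything in it is an assumption for illustration: the film names, the three hand-picked features and the scores are invented. Real embeddings are produced by a model, usually have hundreds of numbers, and their individual numbers rarely map to a tidy label like "action". Cosine similarity is used here as one common way to compare vectors, not the only one.

```python
import math

# Hypothetical feature order: [action, comedy, romance]. Scores are made up.
films = {
    "action_film_a": [0.9, 0.3, 0.1],
    "action_film_b": [1.0, 0.1, 0.1],
    "romcom": [0.0, 0.6, 0.9],
}


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_action = cosine_similarity(films["action_film_a"], films["action_film_b"])
sim_romcom = cosine_similarity(films["action_film_a"], films["romcom"])

print(f"action A vs action B: {sim_action:.2f}")
print(f"action A vs romcom:   {sim_romcom:.2f}")
```

The two action films score much closer to each other than either does to the romcom, which is the kind of "these things are alike" signal that embeddings are meant to capture.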
Examples of different types of embeddings
When machine learning engineering teams work with vector databases, they first transform data into numerical representations, typically using pre-trained models. Here are some of the most powerful models that can be used for creating different types of embeddings: