Vector Databases

Nov 21, 2023

The rise of vector databases has led many people to wonder why we can't simply use a SQL database.

Well, let’s find out...

VECTOR DATABASES

Vector databases store data as high-dimensional vectors, which represent features or attributes of the data mathematically. Each vector has a fixed number of dimensions, ranging from tens to thousands depending on the complexity and granularity of the data. These vectors are generated by passing raw data, such as text, images, audio, and video, through an embedding model.
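To make "embedding raw data into a fixed number of dimensions" concrete, here is a minimal Python sketch. It is an assumption for illustration only: `toy_embed` is a hypothetical stand-in for a real embedding model. It hashes each word into one of `DIM` buckets (a hypothetical 16 here), so every text, whatever its length, maps to a vector with the same number of dimensions. Unlike a real model, it captures word overlap, not meaning.

```python
import hashlib
import math

DIM = 16  # hypothetical; real embedding models typically use hundreds or thousands

def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-length, unit-length vector of hashed word counts.

    Stand-in for a real embedding model: it reflects shared words, not meaning.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        # hashlib gives a stable hash (Python's built-in hash() is salted per run)
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize so vectors are comparable regardless of text length
    return [x / norm for x in vec] if norm > 0 else vec

v = toy_embed("Vector databases store embeddings")
print(len(v))  # 16
```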

ADVANTAGES

Similarity Search: Finds the stored items most similar to a given item by computing the similarity between their vector embeddings.
Fast Retrieval of Data: Distance measures between vectors (Euclidean, Manhattan, cosine, and Chebyshev) are used to group similar data together, which enables fast data retrieval.
Improved query performance for similarity-based queries.
Highly scalable and flexible.
High-Dimensional Search: Supports search over vectors with many dimensions, so the rich features captured by embeddings can be used directly.

For example, we can use a vector database to:

Find images that are similar to a given image based on their visual content and style
Find documents that are similar to a given document based on their topic and sentiment
Find products that are similar to a given product based on their features and ratings
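The four distance measures named under "Fast Retrieval of Data" can be written in a few lines each. This is a minimal Python sketch assuming plain lists of floats of equal length; the helper names are hypothetical, not any vector database's API.

```python
import math

def _check(a, b):
    # zip() would silently truncate vectors of different lengths
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

def euclidean(a, b):
    # Straight-line (L2) distance
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1)
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest difference along any single dimension (L-infinity)
    _check(a, b)
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends on the angle between vectors, not their length
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))  # 5.0 7.0 4.0
```

For all four measures, a lower value means the vectors are closer. Cosine distance ignores vector length, which is one reason cosine-based measures are a common choice for comparing embeddings.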

QUERY VECTOR

To perform a similarity search, we use a query vector that represents the information we want, and retrieve the matching entries from the vector database. The query vector can be derived from either:

The same type of data as the stored vectors (e.g., using an image as a query for an image database)
A different type of data (e.g., using text as a query for an image database). This requires an embedding model that maps both types of data into the same vector space.

Next, we need a similarity measure that calculates how close or distant two vectors are in the vector space. Common choices include cosine similarity, Euclidean distance, Hamming distance, and the Jaccard index.
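Hamming distance and the Jaccard index are usually applied to binary vectors (or sets) rather than real-valued embeddings. A minimal Python sketch, assuming equal-length 0/1 vectors; returning 1.0 for two all-zero vectors is a convention chosen here, not a universal rule.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for binary vectors, treating 1s as set members."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)     # intersection size
    either = sum(1 for x, y in zip(a, b) if x or y)    # union size
    # Convention assumed here: two empty sets count as identical
    return both / either if either else 1.0

u = [1, 0, 1, 1, 0]
v = [1, 1, 0, 1, 0]
print(hamming_distance(u, v))  # 2
print(jaccard_index(u, v))     # 0.5
```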

The result of the similarity search is a ranked list of the vectors with the highest similarity scores to the query vector. We can then retrieve the raw data associated with each vector from the original source or index.
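The retrieval step can be sketched as a brute-force search: score every stored vector against the query with cosine similarity, sort by score, keep the top k, and look up the raw data for each result. The toy `index` and `raw_data` dictionaries and the `top_k` helper are hypothetical, made up for illustration.

```python
import math

def cosine_similarity(a, b):
    # Returns 0.0 for zero vectors instead of dividing by zero
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, index, raw_data, k=2):
    """Return (id, score, raw item) tuples for the k most similar vectors, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(item_id, score, raw_data[item_id]) for item_id, score in scored[:k]]

# Hypothetical toy data: 3-dimensional vectors keyed by id
index = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.0],
    "doc3": [0.8, 0.2, 0.1],
}
raw_data = {
    "doc1": "An article about cats",
    "doc2": "An article about finance",
    "doc3": "An article about kittens",
}

for item_id, score, text in top_k([1.0, 0.0, 0.0], index, raw_data):
    print(item_id, round(score, 3), text)  # prints doc1 first, then doc3
```

This sketch compares the query against every stored vector, which is fine for a handful of items; production vector databases typically rely on approximate nearest-neighbor indexes to avoid a full scan.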

APPLICATIONS

Natural language processing
Computer vision
Recommendation systems
Other areas that require semantic understanding and matching of data
Helping large language models (LLMs) generate more relevant and coherent text: through an AI plugin, an LLM can draw on a vector database that stores information about different topics, keywords, facts, opinions, and/or sources related to the desired domain or genre.
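One common way to give an LLM access to stored information, assumed here since the text does not spell out how the plugin works, is to retrieve the passages most similar to the question and prepend them to the prompt. A minimal Python sketch with hypothetical 2-dimensional vectors; `build_augmented_prompt` is a hypothetical helper, and the actual LLM call is not shown.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_augmented_prompt(question, question_vec, store, k=2):
    """Prepend the k stored passages most similar to the question.

    `store` is a list of (vector, passage) pairs; the returned prompt would
    then be sent to an LLM (that call is outside this sketch).
    """
    ranked = sorted(store, key=lambda item: cosine_similarity(question_vec, item[0]), reverse=True)
    context = "\n".join(f"- {passage}" for _, passage in ranked[:k])
    return f"Use the following context to answer.\nContext:\n{context}\n\nQuestion: {question}"

# Hypothetical store: vectors would normally come from an embedding model
store = [
    ([1.0, 0.0], "Qdrant is a vector database."),
    ([0.0, 1.0], "Paris is the capital of France."),
    ([0.9, 0.1], "Milvus is a vector database."),
]
prompt = build_augmented_prompt("Name a vector database.", [1.0, 0.0], store)
print(prompt)
```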

POPULAR VDBs

Pinecone, Milvus, Chroma, Weaviate, Deep Lake, Qdrant, Vespa, etc.

Let’s connect and make a project together: 🐈‍⬛


Written by Arion Das
