Vector Databases

Arion Das
3 min read · Nov 21, 2023

The rise of vector databases has led many people to wonder why we can’t simply use a SQL database.

Well, let’s find out...

VECTOR DATABASES

Vector databases store data as high-dimensional vectors, which represent features or attributes of the data numerically. Each vector has a fixed number of dimensions, ranging from tens to thousands depending on the complexity and granularity of the data. The vectors are generated by passing raw data, such as text, images, audio, or video, through an embedding model.
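
To make the idea of "embedding" concrete, here is a minimal sketch that turns text into a fixed-length vector by counting words. The vocabulary and the `embed` function are hypothetical and only for illustration; real systems use learned embedding models that produce dense vectors with hundreds or thousands of dimensions.

```python
from collections import Counter

# Hypothetical toy vocabulary: each word is one dimension of the vector.
VOCABULARY = ["cat", "dog", "fish", "car", "road"]

def embed(text: str) -> list[float]:
    """Map text to a vector of word counts over VOCABULARY."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCABULARY]

print(embed("The cat chased the dog"))  # [1.0, 1.0, 0.0, 0.0, 0.0]
```

Texts that mention the same words end up with similar vectors, which is the property a similarity search relies on.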

SIMILARITY SEARCH

ADVANTAGES

  • Similarity Search: Finds the stored objects that are closest to a given object by comparing their vector embeddings, rather than by matching exact values.
  • Fast Retrieval of Data: Distances between vectors (Euclidean, Manhattan, Cosine, and Chebyshev) are used to organize and index the data, so the vectors nearest to a query can be retrieved quickly.
  • Improved query performance for similarity queries.
  • Highly scalable and flexible.
  • High-Dimensional Search: Supports search over vectors with hundreds or thousands of dimensions, so many kinds of data (text, images, audio) can be searched in the same way.
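
The four distances named in the list above can be sketched in plain Python as follows. The function names and sample vectors are chosen here for illustration, and cosine is written as a distance (one minus the cosine similarity), which is one common convention.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0
print(chebyshev(a, b))  # 4.0
print(cosine_distance(a, b))
```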

For example, we can use a vector database to:

  • Find images that are similar to a given image based on their visual content and style
  • Find documents that are similar to a given document based on their topic and sentiment
  • Find products that are similar to a given product based on their features and ratings

QUERY VECTOR

To perform a similarity search, we use a query vector that represents the information we are looking for. The query vector can be derived from either:

  • The same type of data as the stored vectors (e.g., using an image as a query for an image database)
  • A different type of data (e.g., using text as a query for an image database).

Then, we need a similarity measure that calculates how close or distant two vectors are in the vector space. The similarity measure can be based on various metrics, such as cosine similarity, Euclidean distance, Hamming distance, or the Jaccard index.
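
Hamming distance and the Jaccard index are usually applied to binary vectors or sets rather than to real-valued embeddings. A minimal sketch, with function names and sample data chosen here for illustration:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)

print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
print(jaccard_index({"red", "round", "sweet"}, {"red", "sweet", "sour"}))  # 0.5
```

A lower Hamming distance means more similar, while a higher Jaccard index means more similar, so the ranking direction depends on the measure chosen.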

TYPES OF DISTANCES

The result of the similarity search and retrieval is a ranked list of vectors having the highest similarity scores with the query vector. We can then access the corresponding raw data associated with each vector from the original source or index.
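
The ranking step can be sketched as a brute-force search that scores every stored vector against the query and keeps the top results. This is only an illustration: the ids, vectors, and `raw_data` mapping are hypothetical, and vector databases typically rely on approximate nearest-neighbor indexes rather than comparing against every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, vectors, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine_similarity(query, vec))
              for item_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy data: stored vectors and the raw data they were derived from.
vectors = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [1.0, 1.0]}
raw_data = {
    "doc1": "An article about cats",
    "doc2": "An article about cars",
    "doc3": "An article about cats in cars",
}

results = search([1.0, 0.1], vectors, k=2)
for item_id, score in results:
    # Use the id to look up the original item behind each matching vector.
    print(item_id, round(score, 3), raw_data[item_id])
```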

APPLICATIONS

  • Natural language processing
  • Computer vision
  • Recommendation systems
  • Areas requiring semantic understanding and matching of data.
  • Retrieval for large language models (LLMs): a vector database, exposed for example through an AI plugin, lets an LLM look up relevant context and generate more relevant and coherent text.
  • For this purpose, the database stores information about topics, keywords, facts, opinions, and/or sources related to the desired domain or genre.

POPULAR VDBs

Pinecone, Milvus, Chroma, Weaviate, Deep Lake, Qdrant, Vespa, etc.

Let’s connect and make a project together: 🐈‍⬛

The Rotation of the Earth really makes my day

Written by Arion Das

AI Eng intern @CareerCafe || Ex NLP Research Intern @Oracle | Gen-AI Research | LLMs | NLP | Deep Learning | LinkedIn: https://www.linkedin.com/in/arion-das/