
Qdrant DB

Qdrant is an open-source vector database designed to handle large volumes of high-dimensional vector data with precision and speed. Its primary function is to store, search, and manage embeddings, which are essential for various machine learning applications such as image and text retrieval, recommendation systems, and similarity searches. By indexing vector data efficiently, Qdrant optimizes both storage and retrieval operations, enabling rapid querying and robust handling of complex datasets.

Qdrant’s architecture is specifically tailored to facilitate fast and scalable vector operations. It employs advanced indexing techniques like HNSW (Hierarchical Navigable Small World graphs), which significantly reduce the search time for nearest neighbors in high-dimensional spaces. This makes Qdrant an excellent choice for applications requiring real-time vector similarity searching. Read more about Qdrant DB here.
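To make the nearest-neighbor problem concrete, here is a minimal sketch of the exact, brute-force search that an index such as HNSW is designed to avoid. It is not Qdrant's implementation: the function names and the sample vectors are made up for illustration, and it uses Euclidean distance for simplicity. Every query compares against every stored vector, which is what becomes too slow at scale.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return the k closest vectors as (index, distance), nearest first.

    This scans every vector, so the cost grows linearly with the dataset.
    """
    scored = ((i, euclidean(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data: four 2-dimensional vectors
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.2, 0.1]]
nearest = exact_nearest([0.0, 0.0], vectors, k=2)
print(nearest)  # indices 0 and 3 are the two closest vectors
```

A graph-based index trades a small amount of accuracy for visiting only a fraction of the vectors per query, which is why it suits real-time search over large collections.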

Prerequisites for Integrating Qdrant

System Requirements

Provision 2 Virtual Machines (VMs) on Ori Global Cloud by following the steps here.

Operating System Specifications:

  • Recommended operating systems are Ubuntu 20.04 or later. These systems provide the best support for the necessary tools and libraries.

  • For the server: a single CPU-only VM with next-generation Intel processors and a minimum of 8 cores (up to 64 cores).

  • For the client: the second VM can be an NVIDIA GPU instance with CUDA Compute Capability, to leverage GPU acceleration. Ensure that the NVIDIA drivers are up to date and that CUDA Toolkit 11.8 or newer is installed to support all necessary GPU operations.

Prior Installations:

  • On one of the VMs, Docker must be installed to facilitate the installation of Qdrant. Refer here to install Docker on your provisioned VM.

  • Python 3.10 or newer should be installed, as it works well with the required libraries and tools.

info

Ubuntu 22.04 and later ship with Python 3.10 or newer. To check the installed version, start the interpreter with the following command:

python3

Output

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

If you need to upgrade Python, follow the link here.
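If you prefer to check the version from a script, for example as part of a setup routine, a small sketch like the following would do. The helper name meets_minimum and the MINIMUM_PYTHON constant are illustrative, not part of any library.

```python
import sys

# Minimum version stated in the prerequisites above
MINIMUM_PYTHON = (3, 10)


def meets_minimum(version, minimum=MINIMUM_PYTHON) -> bool:
    """Compare the (major, minor) part of a version tuple to the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
if meets_minimum(sys.version_info):
    print(f"Python {current} satisfies the 3.10+ requirement")
else:
    print(f"Python {current} is older than 3.10, consider upgrading")
```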

Networking specifications

Each Qdrant instance needs the following ports to be open:

  • Port 6333: This is the default port, reserved for the HTTP API. This port is used for accessing monitoring, health, and metrics endpoints, which are essential for managing and observing the operational status of Qdrant.
  • Port 6334: Dedicated to the gRPC API, enabling robust, high-performance communication between clients and the Qdrant server. This API is crucial for handling vector database operations with efficiency.
  • Port 6335: Utilized for distributed deployments. This port is critical for inter-node communication within a Qdrant cluster, ensuring data consistency and availability across different instances.
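Once the server is running, you may want to confirm from the client VM that these ports are actually reachable. The sketch below uses only the Python standard library; port_is_open and check_ports are hypothetical helper names, and the host you pass in is whatever address your server VM has.

```python
import socket
from typing import Dict, Iterable

# Ports listed above and what each one is used for
QDRANT_PORTS = {
    6333: "HTTP API",
    6334: "gRPC API",
    6335: "cluster communication",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


def check_ports(host: str, ports: Iterable[int] = tuple(QDRANT_PORTS)) -> Dict[int, bool]:
    """Probe each port and report whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}


# Example usage (replace the placeholder with your server VM's address):
# for port, is_open in check_ports("<your-qdrant-server-VM-IP>").items():
#     print(port, QDRANT_PORTS[port], "open" if is_open else "closed")
```

A port reported as closed can mean either a firewall rule or that the port was not published by the container. For instance, a docker run command that only passes -p 6333:6333 exposes the HTTP API but not ports 6334 or 6335.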

Installing Qdrant

Pull Qdrant's open-source Docker image to begin the installation:

docker pull qdrant/qdrant

Start the Qdrant Docker container, publishing its default port 6333:

docker run -p 6333:6333 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant

It should produce the following output

           _                 _
  __ _  __| |_ __ __ _ _ __ | |_
 / _` |/ _` | '__/ _` | '_ \| __|
| (_| | (_| | | | (_| | | | | |_
 \__, |\__,_|_|  \__,_|_| |_|\__|
    |_|

Version: 1.9.4, build: 671cf97b
Access web UI at http://localhost:6333/dashboard

2024-05-30T10:44:26.814877Z INFO storage::content_manager::consensus::persistent: Initializing new raft state at ./storage/raft_state.json
2024-05-30T10:44:26.827607Z INFO qdrant: Distributed mode disabled
2024-05-30T10:44:26.827654Z INFO qdrant: Telemetry reporting enabled, id: 89959819-671f-4340-9322-563200352350
2024-05-30T10:44:26.831880Z INFO qdrant::actix: TLS disabled for REST API
2024-05-30T10:44:26.832123Z INFO qdrant::actix: Qdrant HTTP listening on 6333
2024-05-30T10:44:26.832248Z INFO actix_server::builder: Starting 7 workers
2024-05-30T10:44:26.832349Z INFO actix_server::server: Actix runtime found; starting in Actix runtime
2024-05-30T10:44:26.840279Z INFO qdrant::tonic: Qdrant gRPC listening on 6334
2024-05-30T10:44:26.840312Z INFO qdrant::tonic: TLS disabled for gRPC API
info

When setting up Qdrant as described above, a local directory named qdrant_storage was created. This directory serves as the storage location for all your collections and their associated metadata.

You may also run Qdrant using Kubernetes, Docker Compose, or build from source. Refer to Qdrant's installation guide.
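Besides reading the container logs, you can confirm that the server is reachable over HTTP. The sketch below assumes that the root endpoint of the HTTP API answers with a small JSON document containing "title" and "version" fields; get_server_info is an illustrative helper name, and only the standard library is used.

```python
import json
from urllib.request import urlopen


def get_server_info(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON returned by the server's root endpoint.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urlopen(base_url.rstrip("/") + "/", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (replace the placeholder with your server VM's address):
# info = get_server_info("http://<your-qdrant-server-VM-IP>:6333")
# print(info.get("title"), info.get("version"))
```

If the call succeeds, the reported version should match the one printed in the startup banner.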

Connecting with Qdrant Client

After setting up your second VM, we'll connect the Qdrant server with its client.

1. Set up a Virtual Environment

Once Qdrant is up and running, let's set up a virtual environment on another VM with some packages.

# Install the Python virtual environment package
sudo apt install python3-venv

# Create a virtual environment with a name of your choice
python3 -m venv <virtual-env-name>

# Activate the virtual environment
source <virtual-env-name>/bin/activate

2. Install qdrant-client

We'll now install Qdrant's client library by running the following command.

pip install qdrant-client pandas numpy faker
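To confirm that the packages landed in the active virtual environment, you could run a quick check like the one below. installed_version is a hypothetical helper built on the standard library's importlib.metadata; it simply reports the version of each distribution, or None if it is missing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The four packages installed by the pip command above
for name in ("qdrant-client", "pandas", "numpy", "faker"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```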

3. Instantiate the client

We'll now instantiate the client using the QdrantClient class. The URL points at the server's HTTP API port, not at the /dashboard path of the web UI.

from qdrant_client import QdrantClient

client = QdrantClient(url="http://<your-qdrant-server-VM-IP>:6333")

4. Start creating Collections

Let's begin by creating a collection in the database. A collection is a named set of points, where each point consists of a vector and an optional payload. When creating a collection, we define the dimensionality of its vectors; here, each vector has 2048 dimensions.

from qdrant_client.models import CollectionStatus, Distance, VectorParams

collection_name = "new_collection"

# Drops any existing collection with this name, then creates it afresh
client.recreate_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=2048, distance=Distance.COSINE),
)

This should return True.

Below is the screenshot showing the new collection added to the Qdrant Dashboard:

The Distance attribute determines the method employed to calculate the distance between vectors. The COSINE distance evaluates the cosine of the angle between two vectors, which is particularly useful for identifying or sorting similar vectors within a group.
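As a worked illustration of the cosine measure, the following standalone sketch computes it for a few small vectors. The function cosine_similarity is written here for illustration and is not taken from the Qdrant client; it shows that the result depends only on the direction of the vectors, not on their length.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same direction, different length: similarity is (approximately) 1
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors: similarity is 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions: similarity is -1
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(same_direction, orthogonal, opposite)
```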

info

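recreate_collection() deletes any existing collection with the same name before creating the new one. To avoid overwriting an existing collection, use the client.create_collection() method instead, which fails if the name is already taken.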

We can gather information about the collection by retrieving it through our client. This is useful for testing during development, as it confirms that the collection exists and is in the expected state.

collection_info = client.get_collection(collection_name=collection_name)
list(collection_info)
assert collection_info.status == CollectionStatus.GREEN
assert collection_info.vectors_count == 0

We can also see the collection status on the Dashboard UI.
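If the dashboard is not convenient, for example on a headless machine, the same status can be read from the HTTP API. The sketch below assumes the GET /collections/<name> endpoint wraps its answer in a "result" object with a "status" field; the two helper names are illustrative and only the standard library is used.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def parse_collection_status(body: bytes) -> str:
    """Extract the collection status (e.g. "green") from a response body."""
    payload = json.loads(body)
    return payload["result"]["status"]


def fetch_collection_status(
    base_url: str, collection_name: str, timeout: float = 5.0
) -> str:
    """Request collection info over HTTP and return its status string."""
    url = f"{base_url.rstrip('/')}/collections/{quote(collection_name, safe='')}"
    with urlopen(url, timeout=timeout) as response:
        return parse_collection_status(response.read())


# Example usage (replace the placeholder with your server VM's address):
# print(fetch_collection_status("http://<your-qdrant-server-VM-IP>:6333", "new_collection"))
```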

You can now start adding some data and vectors to the collection.

To learn more and try some examples, refer to Qdrant’s GitHub repo.