
Supercharge your embedding pipeline with a minimalist and lightning-fast framework built in Rust 🦀
Explore the docs »

View Demo · Examples · Request Feature · Search in Audio Space

EmbedAnything is a minimalist, highly performant, multi-source, multimodal, and local embedding pipeline built in Rust. Whether you're working with text, images, audio, PDFs, websites, or other media, EmbedAnything simplifies generating embeddings from diverse sources and streaming them to a vector database. Dense, sparse, and late-interaction embeddings are all supported.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. How to add a custom model and chunk size

To receive future updates, sign up for our newsletter:

💡 What is Vector Streaming

Vector Streaming lets you generate embeddings for a file and stream them out as they are produced. Given a 10 GB file, it generates embeddings chunk by chunk, semantically segmented, so you can write them to the vector database of your choice without ever holding the full set of embeddings in RAM.
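As a rough illustration of the pattern (not the library's exact streaming API), the sketch below embeds a file and flushes the results to a stand-in sink in fixed-size batches; with a real adapter, chunks are pushed to your vector database as they are produced. store_batch and BATCH are hypothetical names.

import embed_anything
from embed_anything import EmbeddingModel, WhichModel

model = EmbeddingModel.from_pretrained_local(
    WhichModel.Bert, model_id="sentence-transformers/all-MiniLM-L6-v2"
)

def store_batch(batch):
    # Hypothetical sink standing in for a vector-database client.
    print(f"flushed {len(batch)} vectors")

data = embed_anything.embed_file("test_files/test.pdf", embedder=model)

BATCH = 64  # flush in small batches instead of holding everything in memory
for i in range(0, len(data), BATCH):
    store_batch(data[i : i + BATCH])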

EmbedAnything × Weaviate

🚀 Key Features

  • Local Embedding: Works with local embedding models like BERT and Jina
  • ColPali: Support for ColPali in the GPU version
  • Splade: Support for sparse embeddings for hybrid search
  • Cloud Embedding Models: Supports OpenAI and Cohere
  • Multimodality: Works with text (PDF, TXT, MD), image (JPG), and audio (WAV) sources
  • Rust: All file processing is done in Rust for speed and efficiency
  • Candle: Hardware acceleration is handled via Candle
  • Python Interface: Packaged as a Python library for seamless integration into your existing projects
  • Vector Streaming: Continuously create and stream embeddings when resources are limited

🦀 Why Embed Anything

➡️Faster execution
➡️Memory safety: Rust enforces memory safety at compile time, preventing the memory leaks and crashes that can plague other languages
➡️True multithreading
➡️Runs language and embedding models locally and efficiently
➡️Candle allows inference on CUDA-enabled GPUs right out of the box
➡️Reduced memory usage


🧑‍🚀 Getting Started

📩 Installation

pip install embed-anything

For GPUs and special models like ColPali:

pip install embed-anything-gpu

📝 Usage

import embed_anything
from embed_anything import EmbeddingModel, WhichModel

model = EmbeddingModel.from_pretrained_local(
    WhichModel.Bert, model_id="sentence-transformers/all-MiniLM-L6-v2"
)
data = embed_anything.embed_file("test_files/test.pdf", embedder=model)
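Each item in data is an EmbedData object carrying the chunk text, its vector, and metadata; these are the attribute names used throughout the examples that follow.

# Inspect the first chunk: its text, vector dimensionality, and metadata.
print(data[0].text[:80])
print(len(data[0].embedding))
print(data[0].metadata)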

Supported Models

Model    | HF link
---------|------------------------------------------------------------
Jina     | Jina models
Bert     | All BERT-based models
CLIP     | openai/clip-*
Whisper  | OpenAI Whisper models
ColPali  | vidore/colpali-v1.2-merged
ColBERT  | answerdotai/answerai-colbert-small-v1, jinaai/jina-colbert-v2, and more
Splade   | Splade models and other Splade-like models
Reranker | Jina reranker models, Xenova/bge-reranker

♠️ Splade Models

from embed_anything import EmbeddingModel, WhichModel

model = EmbeddingModel.from_pretrained_hf(
    WhichModel.SparseBert, "naver/splade-v3"
)
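A hedged usage sketch: this pairs embed_query (used in the image example below) with the sparse model; treat the pairing as an assumption. Sparse embeddings have most dimensions at zero, which is what enables hybrid search.

import embed_anything

# Sparse query embedding: only a handful of vocabulary slots carry weight.
data = embed_anything.embed_query(["what is splade?"], embedder=model)
print(len(data[0].embedding))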

👁️ ColPali Models

from embed_anything import ColpaliModel

model: ColpaliModel = ColpaliModel.from_pretrained("vidore/colpali-v1.2-merged", None)
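ColPali embeds documents page by page, producing a multi-vector embedding per page. A minimal usage sketch, assuming an embed_file method on the model as in the upstream examples; the file path is a placeholder.

# Embed a PDF; each page yields a multi-vector (late-interaction) embedding.
embeddings = model.embed_file("test_files/attention.pdf", batch_size=1)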

📷 Image Embeddings

Requirements: a directory of images to search over. For example, test_files contains images of cats, dogs, etc.

import embed_anything
import numpy as np
from embed_anything import EmbedData
from PIL import Image

model = embed_anything.EmbeddingModel.from_pretrained_local(
    embed_anything.WhichModel.Clip,
    model_id="openai/clip-vit-base-patch16",
    # revision="refs/pr/15",
)
data: list[EmbedData] = embed_anything.embed_directory("test_files", embedder=model)
embeddings = np.array([d.embedding for d in data])
query = ["Photo of a monkey?"]
query_embedding = np.array(
    embed_anything.embed_query(query, embedder=model)[0].embedding
)
similarities = np.dot(embeddings, query_embedding)
max_index = np.argmax(similarities)
# For image files, the text field holds the file path.
Image.open(data[max_index].text).show()
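Note that the dot product above equals cosine similarity only if the vectors are unit-normalized. An optional refinement:

# Normalize so scores are comparable regardless of vector magnitude.
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query_embedding = query_embedding / np.linalg.norm(query_embedding)
similarities = embeddings @ query_embedding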

🔊 Audio Embedding using Whisper

Requirements: audio .wav files.

import embed_anything
from embed_anything import JinaConfig, EmbedConfig, AudioDecoderConfig
import time

start_time = time.time()

# choose any whisper or distilwhisper model 
# from https://huggingface.co/distil-whisper or 
# https://huggingface.co/collections/openai/whisper-release-6501bba2cf999715fd953013
audio_decoder_config = AudioDecoderConfig(
    decoder_model_id="openai/whisper-tiny.en",
    decoder_revision="main",
    model_type="tiny-en",
    quantized=False,
)
jina_config = JinaConfig(
    model_id="jinaai/jina-embeddings-v2-small-en", revision="main", chunk_size=100
)

config = EmbedConfig(jina=jina_config, audio_decoder=audio_decoder_config)
data = embed_anything.embed_file(
    "test_files/audio/samples_hp0.wav", embedder="Audio", config=config
)
print(data[0].metadata)
end_time = time.time()
print("Time taken: ", end_time - start_time)

ColBERT

Several ColBERT models are supported. The tested models are:

  • jinaai/jina-colbert-v2
  • answerdotai/answerai-colbert-small-v1
  • onnx-models/jina-colbert-v1-en-onnx

from embed_anything import ColbertModel

sentences = [
    "The quick brown fox jumps over the lazy dog",
    "The cat is sleeping on the mat",
    "The dog is barking at the moon",
    "I love pizza",
    "The dog is sitting in the park",
]

model = ColbertModel.from_pretrained_onnx(
    "jinaai/jina-colbert-v2", path_in_repo="onnx/model.onnx"
)
embeddings = model.embed(sentences, batch_size=2)
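ColBERT is a late-interaction model: queries and documents are compared token by token rather than through a single pooled vector. A minimal sketch of the MaxSim scoring, assuming each item's .embedding is a (num_tokens, dim) token matrix (an assumption about the return shape):

import numpy as np

def maxsim(query_tokens, doc_tokens):
    # For each query token, take the best-matching document token,
    # then sum those maxima: the ColBERT late-interaction score.
    sim = query_tokens @ doc_tokens.T  # (q_len, d_len) similarity matrix
    return float(sim.max(axis=1).sum())

q = np.array(embeddings[0].embedding)
d = np.array(embeddings[1].embedding)
print(maxsim(q, d))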

⬆️Reranker Model

We support reranker models that are available as ONNX models. The currently tested models are:

  1. jinaai/jina-reranker-v2-base-multilingual
  2. jinaai/jina-reranker-v1-tiny-en
  3. jinaai/jina-reranker-v1-turbo-en
  4. Xenova/bge-reranker-base
  5. Xenova/bge-reranker-large

from embed_anything import Reranker, Dtype, RerankerResult, DocumentRank

reranker = Reranker.from_pretrained("jinaai/jina-reranker-v1-turbo-en", dtype=Dtype.FP16)

results: RerankerResult = reranker.rerank(
    ["What is the capital of France?"],
    ["France is a country in Europe.", "Paris is the capital of France."],
    2,
)

documents: list[DocumentRank] = results[0].documents

The output is a list of documents with their relevance scores and rank for each input query.
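For example, to print the reranked order (field names are taken from the DocumentRank type imported above; treat the exact attributes as assumptions):

# Walk the ranked documents for the first (and only) query.
for doc in documents:
    print(doc.rank, doc.relevance_score, doc.document)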

Using ONNX Models

To use ONNX models, you can either use the ONNXModel enum or the model_id from the Hugging Face model.

from embed_anything import EmbeddingModel, WhichModel, ONNXModel

model = EmbeddingModel.from_pretrained_onnx(
    WhichModel.Bert, model_name=ONNXModel.AllMiniLML6V2Q
)

For some models, you can also specify the dtype to use:

from embed_anything import Dtype

model = EmbeddingModel.from_pretrained_onnx(
    WhichModel.Bert, ONNXModel.ModernBERTBase, dtype=Dtype.Q4F16
)

Using the above method is the safest way to ensure the model works correctly, since these models are tested. If you want to use other models, such as fine-tuned ones, you can load them with hf_model_id and path_in_repo, as shown below.

model = EmbeddingModel.from_pretrained_onnx(
    WhichModel.Jina,
    hf_model_id="jinaai/jina-embeddings-v2-small-en",
    path_in_repo="model.onnx",
)

To see all the ONNX models supported with model_name, see here.