Version 0.5

We are thrilled to share that EmbedAnything version 0.5 is out now, and it is packed with exciting developments: support for ModernBERT and reranker models, along with ingestion pipeline support for DocX and HTML. Let's get into the details.

Best of all is support for late-interaction models, both ColPali and ColBERT, on ONNX.
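For the ONNX late-interaction path, a minimal loading sketch looks like this. The class and method names follow EmbedAnything's Python examples and are stated here as assumptions, and the model id and PDF path are placeholders for the quantized checkpoint mentioned below, not exact values from the release:

```python
# Sketch only: names follow EmbedAnything's Python examples; the model id and
# PDF path are placeholders rather than the exact 0.5 checkpoint.
from embed_anything import ColpaliModel

# Load an ONNX-quantized ColPali checkpoint from Hugging Face.
model = ColpaliModel.from_pretrained_onnx(
    "starlight-ai/colpali-onnx",  # placeholder id: use the checkpoint from our Hugging Face page
    None,                         # second argument (revision/path) assumed optional
)

# Late-interaction embedding of a PDF, page by page.
embeddings = model.embed_file("report.pdf", batch_size=1)
```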

  1. ModernBERT Support: It made quite a splash, so we were more than happy to add it to EmbedAnything, the fastest inference engine. In addition to being faster and more accurate, ModernBERT also increases context length to 8k tokens (compared to just 512 for most encoders) and is the first encoder-only model to include a large amount of code in its training data.
  2. ColPali - ONNX: Running the ColPali model directly on a local machine is not always feasible. To address this, we developed a quantized version of ColPali. Find it on our Hugging Face page, link here. You can run it both on Candle and on ONNX.
  3. ColBERT: ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
  4. Rerankers: EmbedAnything recently contributed reranking-model support to Candle in order to bring it into our own library, and it works with any kind of reranking model. Precision meets performance! Use reranking models to refine your retrieval results for even greater accuracy.
  5. Jina V3: We also contributed Jina V3 support, so EmbedAnything can seamlessly integrate any V3 model.
  6. DOCX Processing:

    Effortlessly extract text from .docx files and convert it into embeddings. Simplify your document workflows like never before! (See the quick-start sketch at the end of this post.)

  7. π—›π—§π— π—Ÿ π—£π—Ώπ—Όπ—°π—²π˜€π˜€π—Άπ—»π—΄:

Parsing and embedding HTML documents just got easier!

✅ Extract rich metadata with embeddings
✅ Handle code blocks separately for better context

Supercharge your documentation retrieval with these advanced capabilities.
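To close, here is a quick-start sketch that runs both the new DOCX and HTML pipelines. It assumes the standard `embed_anything` Python API (`EmbeddingModel.from_pretrained_hf` and `embed_file`); the model id and file paths are placeholders, and if the release exposes a dedicated HTML entry point you would call that instead:

```python
# Sketch only: model id and file paths are placeholders; embed_file is assumed
# to route .docx and .html files through the new ingestion pipelines.
import embed_anything
from embed_anything import EmbeddingModel, WhichModel

# Any supported dense text embedder works here.
model = EmbeddingModel.from_pretrained_hf(
    WhichModel.Bert,
    model_id="sentence-transformers/all-MiniLM-L12-v2",  # placeholder model id
)

# Embed a Word document and an HTML page with the same call.
docx_embeddings = embed_anything.embed_file("contract.docx", embedder=model)
html_embeddings = embed_anything.embed_file("docs/index.html", embedder=model)

# Each returned item is assumed to carry the chunk text and metadata
# alongside its vector, per the "rich metadata" note above.
for item in html_embeddings:
    print(item.text, item.metadata)
```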