Version 0.5
We are thrilled to share that EmbedAnything version 0.5 is out now, packed with major developments: support for ModernBERT and reranker models, plus ingestion-pipeline support for DocX and HTML. Let's get into the details.
Best of all is support for late-interaction models, both ColPali and ColBERT, on ONNX.
- ModernBERT support: It made quite a splash, and we were obliged to add it to EmbedAnything, the fastest inference engine. In addition to being faster and more accurate, ModernBERT also increases the context length to 8k tokens (compared to just 512 for most encoders) and is the first encoder-only model to include a large amount of code in its training data.
- ColPali ONNX: Running the ColPali model directly on a local machine might not always be feasible. To address this, we developed a quantized version of ColPali. Find it on our Hugging Face, link here. You can run it on both Candle and ONNX.
- ColBERT: ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
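ColBERT's speed comes from late interaction: query and document are encoded into per-token vectors independently, and relevance is the sum, over query tokens, of each token's maximum similarity to any document token (MaxSim). A minimal sketch of that scoring step, with toy 2-d vectors standing in for real per-token embeddings:

```python
# Late-interaction (MaxSim) scoring as used by ColBERT:
# score(q, d) = sum over query tokens of the best dot-product
# match against any document token. Toy vectors, not real embeddings.

def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot-product match in the document."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_vecs)
        for q_vec in query_vecs
    )

query = [(1.0, 0.0), (0.0, 1.0)]  # two query-token vectors
doc_a = [(0.9, 0.1), (0.2, 0.8)]  # relevant document
doc_b = [(0.1, 0.1), (0.2, 0.1)]  # unrelated document

assert maxsim_score(query, doc_a) > maxsim_score(query, doc_b)
```

Because documents are encoded offline and only this cheap max/sum interaction happens at query time, the search stays fast even over large collections.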
- Rerankers: EmbedAnything recently contributed support for reranking models to Candle so that we could add them to our own library. It supports any kind of reranking model. Precision meets performance! Use reranking models to refine your retrieval results for even greater accuracy.
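Rerankers typically slot into a two-stage pipeline: a fast embedding search returns candidates, and a reranking model rescores each candidate jointly with the query. A minimal sketch of that second stage — `cross_encoder_score` below is a hypothetical word-overlap stand-in, not a real reranking model:

```python
def cross_encoder_score(query, doc):
    """Toy stand-in for a reranking model: fraction of query words in the doc."""
    q_words = query.lower().split()
    return sum(w in doc.lower() for w in q_words) / len(q_words)

def rerank(query, candidates, top_k=2):
    """Rescore retrieval candidates and keep the best top_k."""
    scored = sorted(candidates, key=lambda d: cross_encoder_score(query, d), reverse=True)
    return scored[:top_k]

candidates = [
    "ColBERT enables scalable BERT-based search",
    "A recipe for sourdough bread",
    "Reranking models refine retrieval results",
]
print(rerank("reranking retrieval models", candidates, top_k=1))
# ['Reranking models refine retrieval results']
```

Swapping the toy scorer for a real cross-encoder keeps the same shape: score every (query, candidate) pair, sort, truncate.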
- Jina V3: We also contributed support for Jina V3 models, so EmbedAnything can seamlessly integrate any V3 model.
**Docx processing**
Effortlessly extract text from .docx files and convert it into embeddings. Simplify your document workflows like never before!
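Under the hood, a .docx file is just a zip archive whose body text lives in `word/document.xml`, with the text itself inside `<w:t>` elements. A minimal standard-library sketch of the extraction step (the real pipeline does more, e.g. chunking and embedding):

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the docx body XML.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraphs(path_or_file):
    """Return one string per paragraph (<w:p>) in a .docx file."""
    with zipfile.ZipFile(path_or_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{W}t"))
        for p in root.iter(f"{W}p")
    ]
```

Each returned paragraph can then be fed straight into the embedding step as its own chunk.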
**HTML processing:**
Parsing and embedding HTML documents just got easier!
✅ Extract rich metadata with embeddings
✅ Handle code blocks separately for better context
Supercharge your documentation retrieval with these advanced capabilities.
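Handling code blocks separately means keeping `<pre>`/`<code>` contents out of the prose chunks so each can be embedded in its own context. A conceptual standard-library sketch of that split (not EmbedAnything's internal parser):

```python
from html.parser import HTMLParser

class CodeAwareExtractor(HTMLParser):
    """Collect prose and <pre>/<code> contents into separate buckets,
    so code blocks can be chunked and embedded on their own."""
    def __init__(self):
        super().__init__()
        self.prose, self.code = [], []
        self._in_code = 0  # nesting depth inside <pre>/<code>

    def handle_starttag(self, tag, attrs):
        if tag in ("pre", "code"):
            self._in_code += 1

    def handle_endtag(self, tag):
        if tag in ("pre", "code") and self._in_code:
            self._in_code -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            (self.code if self._in_code else self.prose).append(text)

parser = CodeAwareExtractor()
parser.feed("<p>Install with pip:</p><pre><code>pip install embed-anything</code></pre>")
print(parser.prose)  # ['Install with pip:']
print(parser.code)   # ['pip install embed-anything']
```

Embedding the two buckets separately keeps queries about code from being diluted by surrounding narrative text, and vice versa.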