Vector embeddings power a lot of modern search and retrieval systems. In practice, though, choosing an embedding model is less about leaderboards and more about engineering tradeoffs:
- How many tokens per minute can I push through it?
- How much GPU memory does it need?

In this post I will walk through a small benchmark setup for four popular self-hosted embedding models.
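
As a rough illustration of the first question, here is a minimal sketch of how tokens per minute could be measured. It is not the benchmark harness itself. The names `tokens_per_minute`, `fake_embed`, and `whitespace_tokens` are hypothetical stand-ins I made up for this example; a real run would swap in an actual model call and that model's own tokenizer.

```python
import time
from typing import Callable, List, Sequence


def tokens_per_minute(
    embed: Callable[[Sequence[str]], List[List[float]]],
    texts: Sequence[str],
    count_tokens: Callable[[str], int],
    batch_size: int = 32,
) -> float:
    """Embed `texts` in batches and return throughput in tokens per minute."""
    # Count tokens up front so tokenization cost is not part of the timing.
    total_tokens = sum(count_tokens(t) for t in texts)

    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        embed(texts[i : i + batch_size])
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)

    return total_tokens / elapsed * 60.0


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    # Hypothetical placeholder for a real model: sleeps briefly, returns zeros.
    time.sleep(0.001)
    return [[0.0] * 8 for _ in batch]


def whitespace_tokens(text: str) -> int:
    # Assumption: whitespace splitting as a crude proxy for a real tokenizer.
    return len(text.split())


if __name__ == "__main__":
    docs = ["the quick brown fox jumps over the lazy dog"] * 100
    rate = tokens_per_minute(fake_embed, docs, whitespace_tokens, batch_size=16)
    print(f"{rate:,.0f} tokens/minute")
```

Passing the embedding function and the token counter in as arguments keeps the timing loop identical across models, so differences in the numbers come from the model and not the harness. For the second question, if the models happen to run under PyTorch, `torch.cuda.max_memory_allocated()` is one way to read peak GPU memory after a run.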
