Benchmarking Self-Hosted Embedding Models

Vector embeddings power many modern search and retrieval systems. In practice, though, choosing an embedding model is less about leaderboard rankings and more about engineering tradeoffs:

  • How many tokens per minute can I push through it?
  • How much GPU memory does it need?
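
To make the first question concrete, here is a minimal sketch of how tokens per minute could be measured. It is not the benchmark harness itself. The names tokens_per_minute, whitespace_token_count, and fake_embed are hypothetical helpers made up for illustration, and the whitespace split is only a crude stand-in for a model's real tokenizer.

```python
import time
from typing import Callable, Sequence


def tokens_per_minute(
    embed: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    count_tokens: Callable[[str], int],
    batch_size: int = 32,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Embed `texts` in batches and return throughput in tokens per minute."""
    # Count tokens up front so tokenization cost is not part of the timing.
    total_tokens = sum(count_tokens(t) for t in texts)

    start = clock()
    for i in range(0, len(texts), batch_size):
        embed(texts[i : i + batch_size])
    elapsed = clock() - start

    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    # tokens / seconds * 60 = tokens per minute
    return total_tokens / elapsed * 60.0


def whitespace_token_count(text: str) -> int:
    # Assumption: crude approximation; a real tokenizer will give other counts.
    return len(text.split())


def fake_embed(batch: Sequence[str]) -> list:
    # Assumption: placeholder for a real model call, returns dummy vectors.
    return [[0.0] * 4 for _ in batch]


if __name__ == "__main__":
    sample = ["vector search is fun", "embeddings map text to vectors"] * 100
    tpm = tokens_per_minute(fake_embed, sample, whitespace_token_count)
    print(f"{tpm:,.0f} tokens/min")
```

Swapping fake_embed for a real model's encode call and whitespace_token_count for that model's tokenizer gives a number that can be compared across models, provided the batch size and input texts stay the same. The clock parameter exists only so the timing can be replaced in tests.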

In this post I will walk through a small benchmark setup for four popular self-hosted embedding models.