Posted by modinfo 5 days ago
About streaming ingestion support. Meilisearch support basic HTTP requests and is capable of batching task to index them faster. In v1.12 [2], we released our new indexer version that is much faster, leverages high usage of parallel processing, and reduces disk writes.
[1]: https://www.meilisearch.com/blog/hybrid-search [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
Can you clarify what you mean by "fusion ranking"?
All hybrid search requires a method to blend keyword and vector search results. RRF is one approach, and cross-encoder-based rerankers is another.
Whether it uses re-ranking or not depends on how you want to stretch the definition. Meilisearch does not use the rank of the documents in each list of results to compute the final list of results.
Rather, Meilisearch attributes a relevancy score to each result and then orders the results in the final list by comparing the relevancy score of the documents in each list of results.
This is usually much better than any method that uses the rank of the documents, because the rank of a document doesn't tell you if the document is relevant, only that it is more relevant than documents that ranked after it in that list of hits. As a result, these methods tend to mix good and bad results. As semantic and full-text search are complementary, one method is best for some queries and the other for different queries, and taking results by only considering their rank in their respective list of results is really bizarre to me.
I gather other search engines might be doing it that way because they cannot produce a comparable relevancy score for both the full-text search results and the semantic search results.
I'm not sure why the website mentions Reciprocal Rank Fusion (RRF) (I'm just a dev, not in charge of this particular blog article), but it doesn't sound right to me. Maybe something got lost in translation. I'll try and have it fixed. EDIT: Reported, this is being fixed.
By the way, this way of comparing scores from multiple lists of results generalizes very well, which is how Meilisearch is able to provide its "federated search" feature, which is quite unique across search engines, I believe.
Federated search allows comparing the results of multiple queries against possibly multiple indexes or embedders.
It's a feature, not a bug :D
The cool thing about that is that it is the OS that will get to choose which process to allocate memory, and you can always run it somewhere with less memory available, and it will work the same way
But your OS will share the Meilisearch memory with other process seamlessly, you don’t have anything to do. In htop it’s the yellow bar, and it’s basically a big cache shared between all processes.
* Meili uses 50% RAM
* I use 10-20% with another script to occasionally index new data
* When indexing, Meili could jump to use 70-80% RAM
* Meili is OOM killed
While Meilisearch is capable of limiting it's resident (actual mallocs) memory. However, it requires a bare minimum (about 1GiB).
This is a trick I learned years ago with other mmap-based systems.
What was your strategy to measure that?