Lucene SIMD Codec Benchmark and Future Steps
Tech notes
The heap problem
Edge cases
Further Plans
- provide the codec and the benchmark as separate modules;
- apply the SIMD codec to DocValues and Norms, which should improve generic sorting, scoring and faceting. Because ordinals in DocValues are not increasing the way postings are, https://github.com/lemire/FastPFor should be incorporated;
- complete the codec with support for frequencies, offsets and positions to make it fully functional;
- presumably, a SIMD facet component might gain something from vectorization; however, decoding ordinals might not be the biggest problem in faceting, as described here;
- execute binary operations such as intersections directly on compressed data with SIMD instructions, see https://github.com/lemire/SIMDCompressionAndIntersection;
- native code might access mmapped index files without boundary checks and without copying to heap arrays;
- implementing roaring bitmaps might help with dense postings.
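The point about DocValues ordinals can be made concrete with a small sketch. It is not taken from the codec itself: the class and method names (`BitWidthSketch`, `bitsRequired`, `gaps`) and the sample numbers are made up for illustration. It compares how many bits per value a bit-packed block would need when the values are stored raw and when they are stored as gaps between neighbours.

```java
// Illustrative sketch only; names and sample data are hypothetical.
class BitWidthSketch {

    // Width in bits of the widest value in the block (at least 1).
    // Negative values have the sign bit set, so they report 32 bits.
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v; // OR keeps the highest set bit seen in any value
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Differences between consecutive values (length n - 1).
    static int[] gaps(int[] values) {
        int[] out = new int[Math.max(0, values.length - 1)];
        for (int i = 1; i < values.length; i++) {
            out[i - 1] = values[i] - values[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] postings = {1000, 1003, 1010, 1011, 1040}; // increasing doc IDs
        int[] ordinals = {7, 2, 7, 0, 5};                // unordered ordinals

        System.out.println("postings raw:  " + bitsRequired(postings));
        System.out.println("postings gaps: " + bitsRequired(gaps(postings)));
        System.out.println("ordinals raw:  " + bitsRequired(ordinals));
        System.out.println("ordinals gaps: " + bitsRequired(gaps(ordinals)));
    }
}
```

For increasing postings the gaps are small and non-negative, so they pack into fewer bits than the raw doc IDs. For unordered ordinals the gaps can be negative, which defeats plain delta coding, so packing the raw values (or another scheme that does not assume ordering) is the better fit.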
There are still questions to clarify:
- will critical natives work in Java 9 and later?
- couldn't the JIT's vectorization heuristics make an explicit SIMD codec redundant?
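To illustrate the second question, here is a minimal sketch of two loop shapes a decoder typically contains. The class and method names are hypothetical, and whether a given JIT actually vectorizes either loop depends on the JVM version and hardware; the sketch shows only the structural difference, not measured behaviour.

```java
// Illustrative sketch only; names are hypothetical.
class VectorizationSketch {

    // Every iteration is independent of the others, which is the kind of
    // loop an auto-vectorizer can in principle turn into SIMD instructions.
    static void addBase(int[] values, int base) {
        for (int i = 0; i < values.length; i++) {
            values[i] += base;
        }
    }

    // Restoring doc IDs from gaps: each element depends on the previous
    // result, a loop-carried dependency that is much harder to vectorize
    // automatically.
    static void prefixSum(int[] gaps) {
        for (int i = 1; i < gaps.length; i++) {
            gaps[i] += gaps[i - 1];
        }
    }
}
```

If the JIT handled only the first shape, an explicit SIMD codec would still have work to do on the second; that is the part worth measuring before drawing conclusions.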
We’d like to thank everyone who contributed their research and made it possible for us to conduct ours.