PR: RISC-V RVV Optimization for density-rs - Initial Results and Request for Review

Hi @g1mv and community members, thanks for the previous feedback! 🙌

I've completed the initial optimization work on density-rs, focusing on RISC-V with the RVV vector extension. Below, I'll first summarize what I've done, then explain the current issues, and I'd welcome review of the results and any feedback! 😄

What I've Done

Based on your suggestions and discussions, I prioritized optimizing the Chameleon algorithm's core loops (e.g., encode_quad and encode_batch), and extended similar improvements to Cheetah and Lion. The optimizations include:

Manual RVV Vectorization:
- In encode_batch, used RVV intrinsics (e.g., vle32_v_u32m1, vmul_vx_u32m1, vsrl_vx_u32m1, vluxei32_v_u32m1, and vmseq_vv_m_b32) for hash calculations, dictionary accesses, and conflict detection.
- Handled hash uniqueness and ordering: when hashes within a batch conflict, the code falls back to the scalar path so that dictionary updates stay correct (referencing your "case a and b" analysis).
- Gated the RVV code behind #[cfg(all(target_arch = "riscv64", target_feature = "v"))] for compatibility, and used vsetvli to handle varying VLEN.
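To make the conflict handling concrete, here is a minimal portable sketch of the idea, not the code in this PR. It models the vectorized batch as "all lookups, then all updates" over four chunks and falls back to a sequential scalar path whenever two chunks hash to the same slot. The constants HASH_MULTIPLIER and HASH_BITS, the batch width of 4, the hit bitmask, and all function names are assumptions chosen for illustration; plain loops stand in for the RVV gather and compare intrinsics.

```rust
// Illustrative constants (assumptions, not taken from this PR).
const HASH_MULTIPLIER: u32 = 0x9D6E_F916;
const HASH_BITS: u32 = 16;
const DICT_SIZE: usize = 1 << HASH_BITS;

/// Multiply-shift hash of one 32-bit chunk into a dictionary slot.
fn hash(chunk: u32) -> usize {
    (chunk.wrapping_mul(HASH_MULTIPLIER) >> (32 - HASH_BITS)) as usize
}

/// True when at least two chunks in the batch map to the same slot.
fn has_conflict(hashes: &[usize; 4]) -> bool {
    for i in 0..4 {
        for j in (i + 1)..4 {
            if hashes[i] == hashes[j] {
                return true;
            }
        }
    }
    false
}

/// Scalar path: strictly sequential lookup-then-update per chunk.
/// Returns a bitmask with bit i set when chunk i was found in the dictionary.
fn encode_scalar(batch: &[u32; 4], dict: &mut [u32]) -> u8 {
    let mut hits = 0u8;
    for (i, &chunk) in batch.iter().enumerate() {
        let h = hash(chunk);
        if dict[h] == chunk {
            hits |= 1 << i;
        } else {
            dict[h] = chunk;
        }
    }
    hits
}

/// Batch path: all lookups first, then all updates. This reordering is only
/// equivalent to the scalar path when every chunk targets a distinct slot,
/// so any conflict sends the whole batch to the scalar path.
fn encode_batch(batch: &[u32; 4], dict: &mut [u32]) -> u8 {
    let hashes = [hash(batch[0]), hash(batch[1]), hash(batch[2]), hash(batch[3])];
    if has_conflict(&hashes) {
        return encode_scalar(batch, dict);
    }
    let mut hits = 0u8;
    // Lookup phase (stand-in for an indexed vector load plus compare).
    for i in 0..4 {
        if dict[hashes[i]] == batch[i] {
            hits |= 1 << i;
        }
    }
    // Update phase: write back only the chunks that missed.
    for i in 0..4 {
        if hits & (1 << i) == 0 {
            dict[hashes[i]] = batch[i];
        }
    }
    hits
}

/// Compile-time dispatch: batch path only when the target has the V extension.
#[cfg(all(target_arch = "riscv64", target_feature = "v"))]
fn encode(batch: &[u32; 4], dict: &mut [u32]) -> u8 {
    encode_batch(batch, dict)
}

#[cfg(not(all(target_arch = "riscv64", target_feature = "v")))]
fn encode(batch: &[u32; 4], dict: &mut [u32]) -> u8 {
    encode_scalar(batch, dict)
}
```

With a zero-initialized dictionary of DICT_SIZE entries, encoding the same conflict-free batch twice yields no hits the first time and four hits the second, and a batch of four identical chunks produces the same result through either path.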
Algorithm Improvements:
- Reduced branch overhead and memory accesses (e.g., optimized hash multiplication and shifts).
- Experimented with dynamic mode switching (enabling non-updating batches when the update rate drops below 0.1); this is still preliminary.
- Benchmarked with dickens.txt (10.19 MB), comparing throughput before and after the changes (default vs. optimized).
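The mode-switching idea can be sketched roughly as follows. This is a hypothetical illustration of the 0.1 threshold mentioned above, not the PR's implementation: the UpdateRateTracker type, the Mode enum, and the choice to count would-be dictionary updates per lookup are all assumptions.

```rust
/// Encoding mode chosen for the next batch (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    /// Dictionary is updated on every miss.
    Updating,
    /// Dictionary is read-only for the batch.
    NonUpdating,
}

/// Counts lookups and the dictionary updates they triggered (or would have).
struct UpdateRateTracker {
    lookups: u32,
    updates: u32,
}

impl UpdateRateTracker {
    fn new() -> Self {
        UpdateRateTracker { lookups: 0, updates: 0 }
    }

    /// Record one lookup; `updated` is true when it caused a dictionary write.
    fn record(&mut self, updated: bool) {
        self.lookups += 1;
        if updated {
            self.updates += 1;
        }
    }

    /// Pick non-updating mode when updates / lookups < 0.1.
    /// The comparison is done in integers to avoid floating-point edge cases.
    fn mode(&self) -> Mode {
        if self.lookups > 0 && self.updates * 10 < self.lookups {
            Mode::NonUpdating
        } else {
            Mode::Updating
        }
    }
}
```

In this sketch, 5 updates in 100 lookups selects Mode::NonUpdating, while exactly 10 in 100 stays in Mode::Updating because the threshold is strict.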

Performance Comparison: values are median throughput in MB/s; compression ratios are unchanged by the optimizations.

Algorithm	Operation	Before (MB/s)	After (MB/s)	Change	Compression ratio
Chameleon	Compress (raw)	380.2	494.0	+30%	1.749x
	Decompress (raw)	494.4	503.1	+2%
Cheetah	Compress (raw)	220.8	264.5	+20%	1.860x
	Decompress (raw)	291.4	287.2	-1%
Lion	Compress (raw)	135.3	150.7	+11%	1.966x
	Decompress (raw)	144.9	143.5	-1%
LZ4	Compress (raw)	82.15	79.26	-3.5%	1.585x
	Decompress (raw)	174.2	190.5	+9%
Snappy	Compress (stream)	83.69	83.46	-0.3%	1.607x
	Decompress (stream)	141	141.7	+0.5%
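For anyone reproducing the Change column, the arithmetic is the relative difference between the two medians. The helpers below are a small sketch with hypothetical names, not the benchmark harness used for this PR.

```rust
/// Median of a set of throughput samples (MB/s); None for an empty slice.
fn median(samples: &mut [f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_by(|a, b| a.total_cmp(b));
    let mid = samples.len() / 2;
    if samples.len() % 2 == 1 {
        Some(samples[mid])
    } else {
        // Even count: average the two middle samples.
        Some((samples[mid - 1] + samples[mid]) / 2.0)
    }
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For the Chameleon compression row, percent_change(380.2, 494.0) is about 29.9, which rounds to the +30% shown in the table.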

Key Achievements: Chameleon compression reached 494 MB/s, close to the 500 MB/s goal! 🎯 Compression throughput improved by 11% to 30% across Chameleon, Cheetah, and Lion, while their decompression stayed within about 2% of the baseline (Cheetah and Lion each dropped roughly 1%).

Code Cleanup:
- Stuck to stable Rust, no external crates.
- Added runtime fallbacks for non-RVV hardware.
- Fixed some of the compiler warnings; the unused BYTE_SIZE_U128 and std::arch::riscv64::* warnings remain (to be fixed in the PR).

These changes build on your feedback (e.g., dynamic vectorization ideas and architectural preferences) and ensure cross-platform adaptability.

RISC-V RVV: Boost Chameleon to 494 MB/s, Fix Tests WIP