CPU Load Gen: High-frequency search burst script (10–1000 Hz)
Tooling: Custom Python benchmark with randomized query load (a minimal sketch of such a client follows this list)
YaCy Peers:
agent-asx → running on RAID 5
agent-ramdrive → running on tmpfs RAM disk
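For reference, here is a minimal sketch of the kind of burst client described above, not the actual benchmark script used for these numbers. It assumes the peer's default search API at /yacysearch.json; PEER, WORDS, RATE_HZ, and DURATION_S are placeholder values.

```python
# Minimal sketch of a randomized search-burst client, NOT the original benchmark.
# Assumes a YaCy peer answering on its default search API, /yacysearch.json.
# PEER, WORDS, RATE_HZ and DURATION_S are placeholder assumptions.
import random
import statistics
import time
import urllib.parse
import urllib.request

PEER = "http://localhost:8090"   # target peer (assumption)
WORDS = ["linux", "raid", "yacy", "tmpfs", "search", "index", "java", "disk"]
RATE_HZ = 100                    # target burst rate; the test swept 10-1000 Hz
DURATION_S = 30

def one_query() -> float:
    """Issue one randomized search and return the response time in seconds."""
    params = urllib.parse.urlencode({
        "query": " ".join(random.sample(WORDS, 2)),
        "maximumRecords": 10,
    })
    start = time.monotonic()
    with urllib.request.urlopen(f"{PEER}/yacysearch.json?{params}", timeout=60) as resp:
        resp.read()
    return time.monotonic() - start

def run() -> None:
    latencies = []
    deadline = time.monotonic() + DURATION_S
    while time.monotonic() < deadline:
        latencies.append(one_query())
        # crude pacing; a real burst generator would fan out over threads
        time.sleep(max(0.0, 1.0 / RATE_HZ - latencies[-1]))
    print(f"min {min(latencies):.2f} s  max {max(latencies):.2f} s  "
          f"avg {statistics.mean(latencies):.2f} s  n={len(latencies)}")

if __name__ == "__main__":
    run()
```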
Observations
agent-asx (RAID 5 backend)
CPU Usage: Peaks at 1600%
Response Times:
Min: 0.01 s
Max: 51.62 s (!)
Average: 0.82 s
Issues: Performance degraded under load, with visible spikes in response time due to disk I/O bottlenecks (jbd2 activity and thread stalls; a /proc/diskstats sampling sketch follows after these observations)
agent-ramdrive (RAM-backed storage)
CPU Usage: Peaks at 909%
Response Times:
Min: 0.01 s
Max: 7.89 s
Average: 0.15 s
Result: Maintains low-latency searches even under extreme query pressure
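For anyone who wants to cross-check the jbd2 / thread-stall observation: the write await and queue pressure of the backing device can be watched during a burst by sampling /proc/diskstats. A rough sketch follows; the device name is a placeholder, and iostat -x reports the same counters more conveniently.

```python
# Rough sketch: sample /proc/diskstats each second to watch write await and
# queue pressure on the device behind the RAID volume while a burst runs.
# DEVICE is a placeholder assumption; iostat -x reports the same counters.
import time

DEVICE = "sda"        # block device name as it appears in /proc/diskstats (assumption)
INTERVAL_S = 1.0

def read_diskstats(device: str):
    """Return (writes_completed, ms_writing, weighted_ms_in_io) for one device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[7]), int(fields[10]), int(fields[13])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

prev = read_diskstats(DEVICE)
while True:
    time.sleep(INTERVAL_S)
    cur = read_diskstats(DEVICE)
    writes = cur[0] - prev[0]
    await_ms = (cur[1] - prev[1]) / writes if writes else 0.0
    weighted_ms = cur[2] - prev[2]   # grows with both queue depth and latency
    print(f"writes/s={writes:>6}  avg write await={await_ms:6.1f} ms  "
          f"weighted I/O time={weighted_ms:>7} ms")
    prev = cur
```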
Conclusion
Running YaCy on a RAM-backed temp store significantly improves query responsiveness and stability under high load. While RAID 5 handles normal indexing workloads, it chokes under bursty traffic, introducing latencies of 50+ seconds.
RAM-backed deployments on high-memory systems like the DL360 G8 are ideal for:
I am curious why your CPU usage is so high. I have it running on an i9-10980XE box (18 cores) off of a RAID 1 array of hard drives, and I can completely saturate a 1 Gbit/s internet link if I allow it to, yet consume less than 10% CPU. Given that I can completely saturate a 1 Gb/s link with it on a hard-drive RAID array, I haven't seen much incentive to wear out SSDs fast.
High CPU usage is a common illusion when I/O contention is present. Most systems, Linux very much included, do not clearly distinguish between busy and wait-locked threads in their CPU reporting. Write-intensive workloads on traditional RAID make matters worse by starving the I/O read path with queued writes. Add in millions of small-file operations on a single filesystem, and it almost stops mattering how fast the individual disks are. It's no surprise at all that RAM-backed storage for something like this performs well.
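To make the busy-versus-waiting point concrete: the system-wide split is sitting right there in /proc/stat, it just isn't what most per-process CPU readouts show you. A quick sketch using plain Linux accounting, nothing YaCy-specific:

```python
# Split apparent load into genuinely-busy CPU time and iowait by sampling the
# aggregate "cpu" line of /proc/stat. A process monitor showing 1600% can be
# counting threads that spend most of their time blocked on disk.
import time

def cpu_times():
    """Return (busy, iowait, total) jiffies from the first line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    user, nice, system, idle, iowait = fields[:5]
    busy = user + nice + system + sum(fields[5:8])   # plus irq, softirq, steal
    return busy, iowait, busy + iowait + idle

prev = cpu_times()
while True:
    time.sleep(2)
    cur = cpu_times()
    d_busy, d_wait, d_total = (c - p for c, p in zip(cur, prev))
    print(f"busy {100 * d_busy / d_total:5.1f}%   iowait {100 * d_wait / d_total:5.1f}%")
    prev = cur
```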
Certain software-defined storage configurations can handle this better, as can tiering or lazily replicating processed data to slower volumes. The only thing higher-level applications can do to improve situations like this (assuming the Java garbage collector isn't eating too much into it, but that's not my suspicion) is to bundle I/O operations from many small files into fewer larger files. Inodes and files that are smaller than your chosen block size force read-modify-write cycles for what appears to be a single operation, each of which must go through RAID parity calculations, so it gets expensive quickly. I don't know that there's all that much YaCy can do about the small-file problem (arguably the biggest offender) without a significant redesign.
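Purely as an illustration of the bundling idea (this is not YaCy's actual on-disk format): append tiny records to one large segment file with a length prefix and keep an offset index, so thousands of logical writes turn into sequential appends and a single fsync instead of thousands of small parity-triggering writes.

```python
# Toy log-structured segment writer: many small "documents" become one large
# sequential file plus an in-memory offset index. Not YaCy's storage format,
# just the general bundling technique.
import os
import struct

class SegmentWriter:
    def __init__(self, path: str):
        self.f = open(path, "wb")
        self.offset = 0
        self.index = {}                  # key -> (offset, length) of the value

    def put(self, key: str, value: bytes) -> None:
        header = struct.pack(">I", len(value))       # 4-byte length prefix
        self.f.write(header + value)
        self.index[key] = (self.offset + len(header), len(value))
        self.offset += len(header) + len(value)

    def sync(self) -> None:
        self.f.flush()
        os.fsync(self.f.fileno())        # one durability barrier for many records

# 100k tiny writes: one sequential stream and one fsync instead of 100k files.
seg = SegmentWriter("/tmp/segment.dat")
for i in range(100_000):
    seg.put(f"doc{i}", b"tiny payload")
seg.sync()
```

The cost shifts to compaction and index maintenance, which is roughly the kind of redesign that would be involved.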
I'm no YaCy authority here, but I formerly did storage benchmarking and validation as my primary career, followed by tuning performance for a mostly-Java distributed monitoring system whose I/O patterns would put YaCy's to shame. Things get real weird when you have to start caring about NUMA domains and how much cache each individual component of the I/O path has, from the individual core on down.