- CPU Load Gen: High-frequency search burst script (10–1000 Hz)
- Tooling: Custom Python benchmark with randomized query load (a rough sketch of such a script follows below)
- YaCy Peers:
  - agent-asx → running on RAID 5
  - agent-ramdrive → running on tmpfs RAM disk
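For context, here is a minimal sketch of what a burst driver along these lines could look like. This is not the actual benchmark script; the peer URL, query terms, worker count, and the use of YaCy's yacysearch.json endpoint are assumptions for illustration.

```python
# Rough sketch of a burst-style load generator (not the actual benchmark
# script). Assumptions: a local peer at PEER, YaCy's yacysearch.json
# endpoint, and the third-party "requests" library.
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests

PEER = "http://localhost:8090"                     # assumption: adjust per peer
WORDS = ["linux", "raid", "tmpfs", "java", "search", "index"]  # sample terms

latencies = []                                     # per-request wall time, seconds

def one_query():
    query = " ".join(random.sample(WORDS, k=2))
    start = time.perf_counter()
    try:
        requests.get(f"{PEER}/yacysearch.json",
                     params={"query": query, "maximumRecords": 10},
                     timeout=60)
    except requests.RequestException:
        pass                                       # errors/timeouts could be counted separately
    latencies.append(time.perf_counter() - start)  # list.append is atomic under the GIL

with ThreadPoolExecutor(max_workers=200) as pool:  # bounded concurrency
    for hz in (10, 100, 1000):                     # sweep the 10–1000 Hz range
        interval = 1.0 / hz
        burst_end = time.perf_counter() + 10       # 10-second burst per rate
        while time.perf_counter() < burst_end:
            pool.submit(one_query)
            time.sleep(interval)
        time.sleep(10)                             # cooldown between bursts
# leaving the "with" block waits for all outstanding queries to finish

print(f"n={len(latencies)}  min={min(latencies):.2f}s  "
      f"max={max(latencies):.2f}s  avg={sum(latencies) / len(latencies):.2f}s")
```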
Observations
agent-asx (RAID 5 backend)

- CPU Usage: Peaks at 1600%
- Response Times:
  - Min: 0.01 s
  - Max: 51.62 s (!)
  - Average: 0.82 s
- Issues: Performance degraded under load, with visible spikes in response time due to disk I/O bottlenecks (jbd2 activity + thread stalls); see the sketch after these observations

agent-ramdrive (RAM-backed storage)

- CPU Usage: Peaks at 909%
- Response Times:
  - Min: 0.01 s
  - Max: 7.89 s
  - Average: 0.15 s
- Result: Maintains low-latency searches even under extreme query pressure
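The jbd2/thread-stall diagnosis above can be spot-checked during a burst with something like the following Linux-only sketch (my own, not part of the benchmark): it counts threads sitting in uninterruptible sleep, the usual signature of tasks stalled on disk I/O.

```python
# Rough sketch (Linux-only, standard /proc layout assumed): count threads in
# uninterruptible sleep ("D" state) and flag any jbd2 journal threads among them.
import glob

def d_state_tasks():
    stalled = []
    for stat_path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(stat_path) as f:
                data = f.read()
        except OSError:
            continue                          # task exited while we were scanning
        # comm is wrapped in parentheses and may contain spaces
        lpar, rpar = data.find("("), data.rfind(")")
        comm = data[lpar + 1:rpar]
        state = data[rpar + 2:].split()[0]
        if state == "D":
            stalled.append(comm)
    return stalled

if __name__ == "__main__":
    stalled = d_state_tasks()
    print(f"{len(stalled)} thread(s) in D state")
    print("jbd2 journal threads among them:",
          [c for c in stalled if c.startswith("jbd2")])
```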
Conclusion
Running YaCy on a RAM-backed temp store significantly improves query responsiveness and stability under high load. While RAID 5 can handle normal indexing workloads, it chokes under bursty traffic, introducing latencies of 50+ seconds.
RAM-backed deployments on high-memory systems like the DL360 G8 are ideal for:
I am curious why your CPU usage is so high. I have it running on an i9-10980xe box (18 cores), off of a RAID1 array of hard drives, and I can completely saturate a 1 Gbit/s internet link if I allow it to, yet consume less than 10% CPU. Given that I can completely saturate a 1 Gbit/s link with it on a hard-drive RAID array, I haven’t seen much incentive to wear out SSDs fast.
High CPU usage is a common illusion when I/O contention is present. Most systems, Linux especially, don’t properly distinguish between genuinely busy threads and threads blocked waiting on I/O. Write-intensive workloads on traditional RAID make matters worse by starving the I/O read path with queued writes. Add in millions of small-file operations on a single filesystem, and it almost stops mattering how fast the individual disks are. It’s no surprise at all that RAM-backed storage would perform well for something like this.
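To make the busy-versus-waiting point concrete, here is a small Linux-only sketch (not from the original tests) that samples /proc/stat twice and splits the interval into genuinely busy time and iowait time:

```python
# Linux-only sketch: sample the aggregate "cpu" line of /proc/stat twice and
# report how the interval split between genuinely busy time and iowait.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # fields: user nice system idle iowait irq softirq steal ...
        fields = [int(x) for x in f.readline().split()[1:]]
    user, nice, system, idle, iowait = fields[:5]
    busy = user + nice + system + sum(fields[5:8])   # + irq, softirq, steal
    return busy, iowait, sum(fields)

def report(interval=5.0):
    busy0, wait0, total0 = cpu_times()
    time.sleep(interval)
    busy1, wait1, total1 = cpu_times()
    span = (total1 - total0) or 1
    print(f"busy:   {100 * (busy1 - busy0) / span:5.1f}%")
    print(f"iowait: {100 * (wait1 - wait0) / span:5.1f}%")

if __name__ == "__main__":
    report()
```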
Certain software-defined storage configurations can handle this better, as can tiering or lazily replicating processed data to slower volumes. The only thing higher-level applications can do to improve situations like this (assuming the Java garbage collector isn’t eating too much into it, but that’s not my suspicion) is to bundle I/O operations from many small files into fewer larger files. Inodes and files smaller than your chosen block size require multiple read-modify-write requests for what appears to be a single operation, each of which must go through RAID parity calculations, so it gets expensive quickly. I don’t know that there’s all that much YaCy can do about the small-file problem (arguably the biggest offender) without a significant redesign.
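To illustrate the small-file cost in isolation, here is a toy comparison (my own illustration, not YaCy's on-disk format): the same records written as one file each versus appended to a single larger file. On a parity RAID, each tiny file adds metadata updates and sub-block read-modify-write cycles on top of the data itself.

```python
# Toy illustration of the small-file problem (not YaCy's actual storage
# format): write the same 100k tiny records as individual files, then as
# appends to one larger file, and compare wall time.
import os
import time

RECORDS = 100_000
PAYLOAD = b"x" * 200                    # record well under a typical 4 KiB block

def many_small_files(base="small_files"):
    os.makedirs(base, exist_ok=True)
    for i in range(RECORDS):            # one inode + sub-block write per record
        with open(os.path.join(base, f"{i}.rec"), "wb") as f:
            f.write(PAYLOAD)

def one_big_file(path="bundled.rec"):
    with open(path, "wb") as f:
        for _ in range(RECORDS):        # sequential appends coalesce into full blocks
            f.write(PAYLOAD)

for variant in (many_small_files, one_big_file):
    start = time.perf_counter()
    variant()
    print(f"{variant.__name__}: {time.perf_counter() - start:.2f}s")
```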
I’m no YaCy authority here, but I formerly did storage benchmarking and validation as my primary career, which was followed by tuning performance for a mostly-Java distributed monitoring system whose I/O patterns would put YaCy to shame. Things get real weird when you have to start caring about NUMA domains and how much cache each individual component of the I/O path has from the individual core-down.
YaCy Burst Load Benchmark: Corretto 21.0.7.6.1 Test Results
After resolving Corretto issue #99 and upgrading to Amazon Corretto 21.0.7.6.1, I ran a controlled burst test using JSON queries at 1 kHz, with a 10-second cooldown between bursts.
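A note on the CPU figures reported below: I read percentages above 100% as top-style per-process values summed across cores (my assumption). A small sketch of how they can be sampled alongside the bursts, using the third-party psutil library; the process-matching heuristic is an assumption:

```python
# Sketch: sample top-style CPU% of the YaCy JVM once per second. psutil sums
# the percentage across cores, so values like 1100% mean roughly eleven
# cores' worth of runnable threads. Third-party dependency: psutil.
import time
import psutil

def find_yacy_jvm():
    """Locate the java process running YaCy by inspecting command lines (heuristic)."""
    for proc in psutil.process_iter(["name", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if "java" in (proc.info["name"] or "") and "yacy" in cmdline.lower():
            return proc
    raise RuntimeError("no YaCy JVM found")

if __name__ == "__main__":
    jvm = find_yacy_jvm()
    for _ in range(30):                              # ~30 s of samples
        pct = jvm.cpu_percent(interval=1.0)          # blocks 1 s; summed across cores
        print(f"{time.strftime('%H:%M:%S')}  {pct:7.1f}%")
```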
Test System
- Hardware: HP DL360 Gen8
- RAM: 384 GB ECC
- Java: Amazon Corretto 21.0.7.6.1
- YaCy Peers:
  - peer-universal on port 8093 (running from tmpfs)
  - peer-asx on port 8055 (running from RAID 5 backend)
Performance Summary
peer-ramdrive (RAM-backed tmpfs)

| Metric        | Value               |
|---------------|---------------------|
| CPU Usage     | 1100% → 700% → 500% |
| Min Response  | 0.01 s              |
| Max Response  | 13.99 s             |
| Avg Response  | 0.61 s              |
| Std Deviation | 0.59 s              |
peer-asx (RAID 5-backed)

| Metric        | Value                |
|---------------|----------------------|
| CPU Usage     | 1100% → 1500% → 900% |
| Min Response  | 0.01 s               |
| Max Response  | 36.5 s               |
| Std Deviation | 1.66 s               |
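For completeness, the summary rows above can be reproduced from the raw per-request timings with the standard library alone; a sketch, assuming latencies were collected in seconds as in the load-generator sketch earlier in the thread:

```python
# Sketch: derive the table's summary statistics from a list of per-request
# latencies in seconds (e.g. the `latencies` list from the earlier sketch).
import statistics

def summarize(latencies):
    return {
        "min": min(latencies),
        "max": max(latencies),
        "avg": statistics.fmean(latencies),
        "std": statistics.stdev(latencies),   # sample standard deviation
    }

# Example with made-up numbers (not the measured data):
print(summarize([0.01, 0.12, 0.08, 0.61, 1.40, 0.05]))
```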
Conclusion
The RAM-backed peer (peer-ramdrive) performed significantly better under burst load, with consistently low latency and tight standard deviation.
The RAID 5 peer (peer-asx) exhibited much higher CPU usage and tail-latency spikes of up to 36.5 seconds, likely due to disk I/O and thread contention.
I’m unable to overload the CPUs to the same degree as in my earlier tests with stress testers written in QB64 and Python.