- CPU Load Gen: High-frequency search burst script (10–1000 Hz)
- Tooling: Custom Python benchmark with randomized query load (a rough sketch of such a script follows below)
- YaCy Peers:
  - agent-asx → running on RAID 5
  - agent-ramdrive → running on tmpfs RAM disk
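For context, here is a minimal sketch of what a burst driver along these lines could look like. This is not the actual benchmark script; the peer URL, query terms, worker count, and the use of YaCy's yacysearch.json endpoint are assumptions for illustration.

```python
# Rough sketch of a burst-style load generator (not the actual benchmark
# script). Assumptions: a local peer at PEER, YaCy's yacysearch.json
# endpoint, and the third-party "requests" library.
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests

PEER = "http://localhost:8090"                     # assumption: adjust per peer
WORDS = ["linux", "raid", "tmpfs", "java", "search", "index"]  # sample terms

latencies = []                                     # per-request wall time, seconds

def one_query():
    query = " ".join(random.sample(WORDS, k=2))
    start = time.perf_counter()
    try:
        requests.get(f"{PEER}/yacysearch.json",
                     params={"query": query, "maximumRecords": 10},
                     timeout=60)
    except requests.RequestException:
        pass                                       # errors/timeouts could be counted separately
    latencies.append(time.perf_counter() - start)  # list.append is atomic under the GIL

with ThreadPoolExecutor(max_workers=200) as pool:  # bounded concurrency
    for hz in (10, 100, 1000):                     # sweep the 10–1000 Hz range
        interval = 1.0 / hz
        burst_end = time.perf_counter() + 10       # 10-second burst per rate
        while time.perf_counter() < burst_end:
            pool.submit(one_query)
            time.sleep(interval)
        time.sleep(10)                             # cooldown between bursts
# leaving the "with" block waits for all outstanding queries to finish

print(f"n={len(latencies)}  min={min(latencies):.2f}s  "
      f"max={max(latencies):.2f}s  avg={sum(latencies) / len(latencies):.2f}s")
```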
Observations
agent-asx (RAID 5 backend)

- CPU Usage: Peaks at 1600%
- Response Times:
  - Min: 0.01 s
  - Max: 51.62 s (!)
  - Average: 0.82 s
- Issues: Performance degraded under load, with visible spikes in response time due to disk I/O bottlenecks (jbd2 activity + thread stalls); see the sketch after these observations

agent-ramdrive (RAM-backed storage)

- CPU Usage: Peaks at 909%
- Response Times:
  - Min: 0.01 s
  - Max: 7.89 s
  - Average: 0.15 s
- Result: Maintains low-latency searches even under extreme query pressure
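The jbd2/thread-stall diagnosis above can be spot-checked during a burst with something like the following Linux-only sketch (my own, not part of the benchmark): it counts threads sitting in uninterruptible sleep, the usual signature of tasks stalled on disk I/O.

```python
# Rough sketch (Linux-only, standard /proc layout assumed): count threads in
# uninterruptible sleep ("D" state) and flag any jbd2 journal threads among them.
import glob

def d_state_tasks():
    stalled = []
    for stat_path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(stat_path) as f:
                data = f.read()
        except OSError:
            continue                          # task exited while we were scanning
        # comm is wrapped in parentheses and may contain spaces
        lpar, rpar = data.find("("), data.rfind(")")
        comm = data[lpar + 1:rpar]
        state = data[rpar + 2:].split()[0]
        if state == "D":
            stalled.append(comm)
    return stalled

if __name__ == "__main__":
    stalled = d_state_tasks()
    print(f"{len(stalled)} thread(s) in D state")
    print("jbd2 journal threads among them:",
          [c for c in stalled if c.startswith("jbd2")])
```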
Conclusion
Running YaCy on a RAM-backed temp store significantly improves query responsiveness and stability under high load. While RAID 5 can handle normal indexing workloads, it chokes under bursty traffic, introducing latencies of 50+ seconds.
RAM-backed deployments on high-memory systems like the DL360 G8 are ideal for:
I am curious why your CPU usage is so high. I have it running on an i9-10980xe box (18 cores), off of a RAID1 array of hard drives, and I can completely saturate a 1 Gbit/s internet link if I allow it to, yet consume less than 10% CPU. Given that I can completely saturate a 1 Gbit/s link with it on a hard-drive RAID array, I haven’t seen much incentive to wear out SSDs fast.
High CPU usage is a common illusion when I/O contention is present. Most systems, Linux especially, don’t properly distinguish between genuinely busy threads and threads blocked waiting on I/O. Write-intensive workloads on traditional RAID make matters worse by starving the I/O read path with queued writes. Add in millions of small-file operations on a single filesystem, and it almost stops mattering how fast the individual disks are. It’s no surprise at all that RAM-backed storage would perform well for something like this.
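To make the busy-versus-waiting point concrete, here is a small Linux-only sketch (not from the original tests) that samples /proc/stat twice and splits the interval into genuinely busy time and iowait time:

```python
# Linux-only sketch: sample the aggregate "cpu" line of /proc/stat twice and
# report how the interval split between genuinely busy time and iowait.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # fields: user nice system idle iowait irq softirq steal ...
        fields = [int(x) for x in f.readline().split()[1:]]
    user, nice, system, idle, iowait = fields[:5]
    busy = user + nice + system + sum(fields[5:8])   # + irq, softirq, steal
    return busy, iowait, sum(fields)

def report(interval=5.0):
    busy0, wait0, total0 = cpu_times()
    time.sleep(interval)
    busy1, wait1, total1 = cpu_times()
    span = (total1 - total0) or 1
    print(f"busy:   {100 * (busy1 - busy0) / span:5.1f}%")
    print(f"iowait: {100 * (wait1 - wait0) / span:5.1f}%")

if __name__ == "__main__":
    report()
```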
Certain software-defined storage configurations can handle this better, as can tiering or lazily replicating processed data to slower volumes. The only thing higher-level applications can do to improve situations like this (assuming the Java garbage collector isn’t eating too much into it, but that’s not my suspicion) is to bundle I/O operations from many small files into fewer larger files. Inodes and files smaller than your chosen block size require multiple read-modify-write requests for what appears to be a single operation, each of which must go through RAID parity calculations, so it gets expensive quickly. I don’t know that there’s all that much YaCy can do about the small-file problem (arguably the biggest offender) without a significant redesign.
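To illustrate the small-file cost in isolation, here is a toy comparison (my own illustration, not YaCy's on-disk format): the same records written as one file each versus appended to a single larger file. On a parity RAID, each tiny file adds metadata updates and sub-block read-modify-write cycles on top of the data itself.

```python
# Toy illustration of the small-file problem (not YaCy's actual storage
# format): write the same 100k tiny records as individual files, then as
# appends to one larger file, and compare wall time.
import os
import time

RECORDS = 100_000
PAYLOAD = b"x" * 200                    # record well under a typical 4 KiB block

def many_small_files(base="small_files"):
    os.makedirs(base, exist_ok=True)
    for i in range(RECORDS):            # one inode + sub-block write per record
        with open(os.path.join(base, f"{i}.rec"), "wb") as f:
            f.write(PAYLOAD)

def one_big_file(path="bundled.rec"):
    with open(path, "wb") as f:
        for _ in range(RECORDS):        # sequential appends coalesce into full blocks
            f.write(PAYLOAD)

for variant in (many_small_files, one_big_file):
    start = time.perf_counter()
    variant()
    print(f"{variant.__name__}: {time.perf_counter() - start:.2f}s")
```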
I’m no YaCy authority here, but I formerly did storage benchmarking and validation as my primary career, which was followed by tuning performance for a mostly-Java distributed monitoring system whose I/O patterns would put YaCy to shame. Things get real weird when you have to start caring about NUMA domains and how much cache each individual component of the I/O path has from the individual core-down.
YaCy Burst Load Benchmark: Corretto 21.0.7.6.1 Test Results
After resolving Corretto issue #99 and upgrading to Amazon Corretto 21.0.7.6.1, I ran a controlled burst test using JSON queries at 1 kHz, with a 10-second cooldown between bursts.
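A note on the CPU figures reported below: I read percentages above 100% as top-style per-process values summed across cores (my assumption). A small sketch of how they can be sampled alongside the bursts, using the third-party psutil library; the process-matching heuristic is an assumption:

```python
# Sketch: sample top-style CPU% of the YaCy JVM once per second. psutil sums
# the percentage across cores, so values like 1100% mean roughly eleven
# cores' worth of runnable threads. Third-party dependency: psutil.
import time
import psutil

def find_yacy_jvm():
    """Locate the java process running YaCy by inspecting command lines (heuristic)."""
    for proc in psutil.process_iter(["name", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if "java" in (proc.info["name"] or "") and "yacy" in cmdline.lower():
            return proc
    raise RuntimeError("no YaCy JVM found")

if __name__ == "__main__":
    jvm = find_yacy_jvm()
    for _ in range(30):                              # ~30 s of samples
        pct = jvm.cpu_percent(interval=1.0)          # blocks 1 s; summed across cores
        print(f"{time.strftime('%H:%M:%S')}  {pct:7.1f}%")
```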
Test System
- Hardware: HP DL360 Gen8
- RAM: 384 GB ECC
- Java: Amazon Corretto 21.0.7.6.1
- YaCy Peers:
  - peer-universal on port 8093 (running from tmpfs)
  - peer-asx on port 8055 (running from RAID 5 backend)
Performance Summary
peer-ramdrive (RAM-backed tmpfs)

| Metric        | Value               |
|---------------|---------------------|
| CPU Usage     | 1100% → 700% → 500% |
| Min Response  | 0.01 s              |
| Max Response  | 13.99 s             |
| Avg Response  | 0.61 s              |
| Std Deviation | 0.59 s              |
peer-asx (RAID 5-backed)

| Metric        | Value                |
|---------------|----------------------|
| CPU Usage     | 1100% → 1500% → 900% |
| Min Response  | 0.01 s               |
| Max Response  | 36.5 s               |
| Std Deviation | 1.66 s               |
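For completeness, the summary rows above can be reproduced from the raw per-request timings with the standard library alone; a sketch, assuming latencies were collected in seconds as in the load-generator sketch earlier in the thread:

```python
# Sketch: derive the table's summary statistics from a list of per-request
# latencies in seconds (e.g. the `latencies` list from the earlier sketch).
import statistics

def summarize(latencies):
    return {
        "min": min(latencies),
        "max": max(latencies),
        "avg": statistics.fmean(latencies),
        "std": statistics.stdev(latencies),   # sample standard deviation
    }

# Example with made-up numbers (not the measured data):
print(summarize([0.01, 0.12, 0.08, 0.61, 1.40, 0.05]))
```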
Conclusion
The RAM-backed peer (peer-ramdrive) performed significantly better under burst load, with consistently low latency and tight standard deviation.
The RAID 5 peer (peer-asx) exhibited much higher CPU usage and tail-latency spikes of up to 36.5 seconds, likely due to disk I/O and thread contention.
I’m unable to overload the CPUs to the same degree as in my earlier tests with stress testers written in QB64 and Python.