πŸ”¬ YaCy Peer Performance: RAID 5 vs RAM Drive (DL360 Gen8, 384 GB RAM)

:test_tube: Test Setup

  • System: HP DL360 Gen8, 384 GB ECC RAM
  • CPU Load Gen: High-frequency search burst script (10–1000 Hz)
  • Tooling: Custom Python benchmark with randomized query load
  • YaCy Peers:
    • agent-asx β†’ running on RAID 5
    • agent-ramdrive β†’ running on tmpfs RAM disk

:gear: Observations

:satellite: agent-asx (RAID 5 backend)

  • CPU Usage: Peaks at 1600%
  • Response Times:
    • Min: 0.01 s
    • Max: 51.62 s (!)
    • Average: 0.82 s
  • Issues: Performance degraded under load, visible spikes in response time due to disk I/O bottlenecks (jbd2 activity + thread stalls)

:zap: agent-ramdrive (RAM-backed storage)

  • CPU Usage: Peaks at 909%
  • Response Times:
    • Min: 0.01 s
    • Max: 7.89 s
    • Average: 0.15 s
  • Result: Maintains low-latency searches even under extreme query pressure

:white_check_mark: Conclusion

Running YaCy on a RAM-backed temp store significantly improves query responsiveness and stability under high load. While RAID 5 can handle normal indexing workloads, it chokes under bursty traffic, introducing latency up to 50+ seconds.

:brain: RAM-backed deployments on high-memory systems like the DL360 Gen8 are ideal for:

  • Distributed P2P indexing
  • API front-ends
  • Latency-sensitive peer-to-peer workloads

cool! is the raid5 array made of ssds or hdds?

SSDs, 10 of them.

I am curious why your CPU usage is so high. I have it running on an i9-10980XE box (18 cores) off a RAID 1 array of hard drives, and I can completely saturate a 1 Gbit/s internet link if I allow it to, yet it consumes less than 10% CPU. Since a hard-drive RAID array can already saturate the link, I haven't seen much incentive to wear out SSDs.

http://192.168.1.55:8055/yacysearch.json?query=random search string

I issue the above search 1000 times a second, i.e. 1 kHz.
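At that rate, Little's law gives a feel for the concurrency involved: average in-flight requests = request rate Γ— average latency. A back-of-envelope sketch (the function name is mine, numbers are from the results above):

```python
def inflight(rate_hz, avg_latency_s):
    """Little's law: average number of requests in flight at steady state."""
    return rate_hz * avg_latency_s

# At 1 kHz with the RAID 5 peer's 0.82 s average, ~820 requests are in
# flight at once, so a pool of 195 blocking worker threads becomes the
# real rate limiter and queueing inflates the tail latency.
print(inflight(1000, 0.82))   # β‰ˆ 820
print(inflight(1000, 0.15))   # tmpfs peer: β‰ˆ 150, fits in a 195-thread pool
```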

It's only while my system is under extreme load that the values are this high.

My load tester in Python:

#!/usr/bin/env python3
import requests
import time
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

# Configuration
YACY_PEERS = ["http://localhost:8090"]
FREQ_HZ = 10
BURST_DURATION_SEC = 10
BURST_INTERVAL_SEC = 10
TOTAL_BURSTS = 3
TIMEOUT = 120
MAX_THREADS = 195

# Sample search terms
QUERY_POOL = [
    "cloudparty", "nextcloud", "linux", "java", "ssd", "ramdisk", "fast upload",
    "yacy peer", "distributed index", "apache", "python3", "webdav", "cloudflare",
    "speedtest", "upload limit", "server benchmark", "proxy settings", "ipv6 ready",
    "peer to peer", "distributed computing", "smokingwheels"
]

# Global list of all response times; list.append is atomic under the GIL,
# so the worker threads can share it safely.
response_times = []

def do_search(peer_url):
    query = random.choice(QUERY_POOL)
    start = time.time()
    try:
        r = requests.get(f"{peer_url}/yacysearch.json", params={"query": query}, timeout=TIMEOUT)
        elapsed = time.time() - start
        response_times.append(elapsed)
        if r.status_code == 200:
            print(f"[{peer_url}] OK ({elapsed:.2f}s) - {query}")
        else:
            print(f"[{peer_url}] Error {r.status_code} ({elapsed:.2f}s) - {query}")
    except Exception as e:
        elapsed = time.time() - start
        response_times.append(elapsed)
        print(f"[{peer_url}] Failed: {e} ({elapsed:.2f}s) - {query}")

def burst():
    # Submit FREQ_HZ requests per second for BURST_DURATION_SEC seconds;
    # the executor's context manager waits for all of them to complete.
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        for _ in range(FREQ_HZ * BURST_DURATION_SEC):
            for peer in YACY_PEERS:
                executor.submit(do_search, peer)
            time.sleep(1 / FREQ_HZ)

def print_summary():
    if response_times:
        print("\nπŸ“Š Summary of Response Times")
        print(f"Total Requests: {len(response_times)}")
        print(f"Min: {min(response_times):.2f}s")
        print(f"Max: {max(response_times):.2f}s")
        print(f"Avg: {sum(response_times)/len(response_times):.2f}s")
        if len(response_times) > 1:
            print(f"Std Dev: {statistics.stdev(response_times):.2f}s")
        else:
            print("Std Dev: N/A")
    else:
        print("No responses recorded.")

if __name__ == "__main__":
    for i in range(TOTAL_BURSTS):
        print(f"🌩️ Burst {i+1}/{TOTAL_BURSTS}")
        burst()
        if i < TOTAL_BURSTS - 1:
            print(f"πŸ›Œ Cooling off for {BURST_INTERVAL_SEC}s...")
            time.sleep(BURST_INTERVAL_SEC)
    print_summary()
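The gap between max and average (51.62 s vs 0.82 s on the RAID 5 peer) suggests a heavy tail, which min/max/avg can hide. A small sketch that could be added alongside `print_summary()` (the function name is mine; it assumes the same `response_times` list):

```python
import statistics

def percentile_summary(times):
    """Return p50/p95/p99 for a list of response times in seconds."""
    if len(times) < 2:
        return None
    # statistics.quantiles with n=100 yields 99 cut points;
    # index k-1 is the k-th percentile.
    q = statistics.quantiles(times, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# Example with synthetic data: mostly fast responses plus a few stalls.
times = [0.1] * 95 + [5.0] * 4 + [50.0]
print(percentile_summary(times))
```

Reporting p95/p99 would make peer-to-peer comparisons much more telling than a single max value.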

I would be interested in other peers' results and specs; we could build a table!
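To kick off that table, here is a tiny sketch (the column layout is just a suggestion) that formats one peer's benchmark results as a Markdown table row, seeded with the two results above:

```python
def results_row(peer, storage, cpu_peak, tmin, tmax, tavg):
    """Format one peer's benchmark results as a Markdown table row."""
    return f"| {peer} | {storage} | {cpu_peak}% | {tmin:.2f} s | {tmax:.2f} s | {tavg:.2f} s |"

# Header plus the two peers measured above.
print("| Peer | Storage | CPU peak | Min | Max | Avg |")
print("|---|---|---|---|---|---|")
print(results_row("agent-asx", "RAID 5 (10x SSD)", 1600, 0.01, 51.62, 0.82))
print(results_row("agent-ramdrive", "tmpfs", 909, 0.01, 7.89, 0.15))
```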

Ok, makes sense. I thought crawling was what was loading it and didn't consider the search traffic.