Peer running out of memory when overloaded with searching requests

:test_tube: YaCy Stress Testing & Optimization: 12-Month Benchmark Overview

:round_pushpin: Overview:

Over the past 12 months, I have conducted extensive stress testing on YaCy using various configurations and backend setups, including RAID 5, RAM drives, and JVM tuning. The goal was to evaluate response time under burst load conditions, identify potential memory leaks, and assess the impact of JVM updates on system stability.


:wrench: System Setup:

  • Server: HP DL360 Gen8
  • RAM: 384 GB ECC
  • Java: Amazon Corretto 21
  • Storage Backends:
    • RAID 5
    • RAMDisk (tmpfs for /var and /var/tmp)

:hammer_and_wrench: Issue Encountered:

  • During testing, the Java process crashed, leaving behind a diagnostic log.
  • Upon inspection, the log opened with the JVM message "A fatal error has been detected by the Java Runtime Environment:", and the crash appeared to be related to memory parsing issues.
  • The issue was reported to Amazon Corretto 21 on GitHub:

Suggested Resolution: Update to Amazon Corretto 21.0.7.6-1.


:white_check_mark: Resolution & Testing:

After updating to 21.0.7.6-1, I ran several test scripts, including:

  • Python Burst Script: JSON API search at 1 kHz with 10s cooldown
  • QB64 Custom Program: High-frequency search testing at 15 Hz and 100 Hz
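
For anyone who wants to reproduce the burst pattern, here is a minimal sketch of what such a Python script could look like. It is not the exact script used in these tests. The URL assumes a local peer on YaCy's default port 8090 answering on yacysearch.json, and the names search, run_burst, and run_bursts are illustrative only.

```python
import json
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: local peer, default port, JSON search endpoint.
BASE_URL = "http://localhost:8090/yacysearch.json"


def search(query, base_url=BASE_URL, timeout=10.0):
    """Send one search request and return (ok, seconds_elapsed)."""
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # parse the body so broken replies count as failures
        ok = True
    except Exception:
        # A load tester only needs to record the failure, not diagnose it.
        ok = False
    return ok, time.perf_counter() - start


def run_burst(send, queries, rate_hz, workers=64, sleep=time.sleep):
    """Dispatch send(query) once every 1/rate_hz seconds; return results in order."""
    interval = 1.0 / rate_hz
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i, query in enumerate(queries):
            # Pace against absolute deadlines so timing error does not accumulate.
            delay = start + i * interval - time.perf_counter()
            if delay > 0:
                sleep(delay)
            futures.append(pool.submit(send, query))
        return [f.result() for f in futures]


def run_bursts(send, queries, rate_hz, bursts, cooldown_s=10.0, sleep=time.sleep):
    """Run several bursts with a cooldown pause between them."""
    results = []
    for n in range(bursts):
        results.append(run_burst(send, queries, rate_hz, sleep=sleep))
        if n < bursts - 1:
            sleep(cooldown_s)
    return results
```

Against a live peer, a call such as run_bursts(search, queries, rate_hz=1000, bursts=3) would approximate the 1 kHz pattern with a 10 s cooldown. Whether a single Python process actually sustains 1 kHz depends on the worker count and on how quickly the peer answers, so the achieved rate is worth measuring rather than assuming.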

Findings:

  • Memory usage decreased slightly with the JVM update, likely due to improved garbage collection and heap management.
  • RAMDisk significantly improved response times, reducing disk I/O contention.
  • RAID 5 backend remained the primary source of latency spikes, particularly under 100 Hz sustained query load.
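
To put numbers on latency spikes when comparing backends, per-request timings can be reduced to a few percentiles. The helper below is only a sketch of one way to do that. The name summarize_latencies and the choice of p50/p95/p99 are mine and are not taken from the original test scripts.

```python
import statistics


def summarize_latencies(samples_s):
    """Return median, p95, p99 and max latency in milliseconds."""
    if len(samples_s) < 2:
        raise ValueError("need at least two samples")
    ms = sorted(s * 1000.0 for s in samples_s)
    # quantiles(n=100) yields 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(ms),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": ms[-1],
    }
```

Running the same load against the RAID 5 and RAMDisk setups and comparing p99 and max side by side makes spikes visible that an average would hide.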

:package: Java Options:

To further optimize JVM performance, the following Java options were applied:

JAVA_ARGS="-server -Djava.awt.headless=true -Dfile.encoding=UTF-8"
JAVA_ARGS="-XX:+UseZGC -XX:ZUncommitDelay=60 -XX:+UseLargePages $JAVA_ARGS"
JAVA_ARGS="-XX:+PerfDisableSharedMem $JAVA_ARGS"

Explanation:

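  • -XX:+UseZGC: Enables the Z Garbage Collector, a concurrent collector designed for low pause times.
  • -XX:ZUncommitDelay=60: Lets ZGC return heap memory to the operating system once it has been unused for 60 seconds.
  • -XX:+UseLargePages: Backs the heap with large pages, which reduces TLB misses and improves memory access efficiency. Huge pages must be configured in the OS for this to take effect.
  • -XX:+PerfDisableSharedMem: Stops the JVM from writing its performance counters to a memory-mapped hsperfdata file, avoiding pauses tied to that file's I/O. Tools that rely on that file, such as jps and jstat, will no longer see the process.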
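
Since -XX:+UseLargePages generally only helps when the kernel has huge pages reserved, it is worth checking that before trusting the flag. The sketch below reads the HugePages_Total, HugePages_Free, and Hugepagesize fields from /proc/meminfo on Linux. The function names are illustrative, and this was not part of the original test setup.

```python
def parse_hugepages(meminfo_text):
    """Extract huge page counters from /proc/meminfo-style text."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    found = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Counters are plain integers; Hugepagesize is reported in kB.
            found[key] = int(rest.split()[0])
    return found


def hugepages_reserved(meminfo_text):
    """True if the kernel has at least one explicit huge page reserved."""
    return parse_hugepages(meminfo_text).get("HugePages_Total", 0) > 0
```

On the server, passing the contents of /proc/meminfo to hugepages_reserved gives a quick yes/no. If HugePages_Total is 0, the JVM is most likely running on regular pages despite the flag.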

:bar_chart: Benchmark Data & Analysis:


:framed_picture: Visuals:

  • After JVM Update (screenshot captions):
    • After starting 4000 queries at 100 Hz
    • At 1 min (note CPU avg 392)
    • At 22k searches: Hz dropped
    • At 49k searches: 9 min

My stress tester PC (Ubuntu 24.04) locked me out and took me to the logon screen.

:mag: Next Steps:

  • Further testing with increased thread pool size and advanced GC tuning (ZGC with string deduplication).
  • Investigate potential I/O scheduler optimizations to reduce disk wait states under concurrent load.
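
As a starting point for the I/O scheduler investigation, the active scheduler for a block device can be read from sysfs on Linux, where the kernel marks the current choice with square brackets. This is a small sketch. The function names are illustrative, and the device name passed in (for example "sda") is an assumption about the system.

```python
from pathlib import Path


def parse_scheduler(text):
    """Return (active, available) from a /sys/block/<dev>/queue/scheduler line."""
    names = text.split()
    # The kernel wraps the active scheduler in brackets, e.g. "[mq-deadline]".
    active = next(
        (n[1:-1] for n in names if n.startswith("[") and n.endswith("]")), None
    )
    available = [n.strip("[]") for n in names]
    return active, available


def read_scheduler(device):
    """Read and parse the scheduler line for a block device such as 'sda'."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return parse_scheduler(path.read_text())
```

Recording the active scheduler next to each benchmark run makes it easier to attribute any change in disk wait times to the scheduler rather than to other tuning.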