YaCy stops after some hours

I have YaCy v1.924_20210209 running on a FreeBSD host.
It starts fine, has already crawled some sites, and is connected to the p2p network.

But each time I start it, it stops after a few hours, and I can't find anything relevant in the logs under DATA/LOG, except perhaps a threaddump.txt file.
The threaddump.txt has an mtime of 10h02.
The last yacy00.log has an mtime of 10h07.

The contents of the threaddump.txt file are at https://termbin.com/psvr

What should I do?

I started it again and it stopped after a few hours.
The last lines of output were:

I 2022/04/17 17:42:16 STACKCRAWL URL 'https://github.com/flathub/org.onionshare.OnionShare' file extension is not supported and indexing of linked non-parsable documents is disabled.
I 2022/04/17 17:42:16 REJECTED https://github.com/flathub/org.onionshare.OnionShare - URL 'https://github.com/flathub/org.onionshare.OnionShare' file extension is not supported and indexing of linked non-parsable documents is disabled.
I 2022/04/17 17:42:16 REJECTED https://www.githubstatus.com/ - url does not match must-match filter (smb|ftp|https?)://(www.)?(\Qgithub.com\E.*)
I 2022/04/17 17:42:16 REJECTED https://github.githubassets.com/images/modules/profile/badge--acv-64.png - url does not match must-match filter (smb|ftp|https?)://(www.)?(\Qgithub.com\E.*)
I 2022/04/17 17:42:16 REJECTED https://docs.github.com/categories/setting-up-and-managing-your-github-profile - url does not match must-match filter (smb|ftp|https?)://(www.)?(\Qgithub.com\E.*)
I 2022/04/17 17:42:16 REJECTED https://education.github.com/ - url does not match must-match filter (smb|ftp|https?)://(www.)?(\Qgithub.com\E.*)
I 2022/04/17 17:42:16 REJECTED https://lab.github.com/ - url does not match must-match filter (smb|ftp|https?)://(www.)?(\Qgithub.com\E.*)
I 2022/04/17 17:42:16 REJECTED https://support.github.com/?tags=dotcom-footer - url does not match must-match filter (smb|ftp|https?)://(www.)?(\Qgithub.com\E.*)
I 2022/04/17 17:42:16 SWITCHBOARD Excluded 14 words in URL https://github.com/proletarius101
I 2022/04/17 17:42:16 Fulltext indexing: L6rZksS6MKD4 https://github.com/proletarius101
Killed

It exited with error code 137:

$ echo $?
137
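For reference, an exit status above 128 means the process was terminated by a signal (status = 128 + signal number), so 137 decodes to SIGKILL. That usually points at something outside the JVM killing the process, such as the kernel reclaiming memory on a memory-starved host, rather than a clean shutdown. A quick way to decode it:

```shell
# Exit statuses above 128 encode death-by-signal: status - 128 = signal number.
status=137
sig=$((status - 128))   # 137 - 128 = 9
kill -l "$sig"          # prints the signal name: KILL
```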

How many crawlers are you running? What does the System Status page report for memory usage?

The first time it happened, only three or four crawlers were running.
Later, about eight were running.

Today there was some error output:

        at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.add(MirrorSolrConnector.java:204)
        at net.yacy.search.index.Fulltext.putDocument(Fulltext.java:370)
        at net.yacy.search.index.Segment.putDocument(Segment.java:556)
        at net.yacy.search.index.Segment.storeDocument(Segment.java:639)
        at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3468)
        at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3382)
        at net.yacy.search.Switchboard$7.process(Switchboard.java:1058)
        at net.yacy.search.Switchboard$7.process(Switchboard.java:1054)
        at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72)
        at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:680)
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:694)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1613)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1608)
        at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:969)
        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:341)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:288)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:235)
        ... 45 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

The command invoked, as shown by ps, was:

yacy      5525 259.7 27.2 2969172 984388 15  INJ  22:39       124:20.38 /usr/local/openjdk8-jre/bin/java -Xms90m -Xmx600m -server -Djav
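That `-Xmx600m` caps the Java heap at 600 MB, which lines up with the `GC overhead limit exceeded` error above. A hedged sketch of extra JVM arguments that raise the cap and capture a heap dump at the next OOM: the `-XX` flags are standard HotSpot options, but where to inject them depends on how your install launches YaCy, and the dump path below is just an example.

```shell
# Sketch: JVM arguments to pass to YaCy's java invocation (check your start
# script for where to put them). The heap-dump flags are standard HotSpot
# options; /var/tmp/yacy-oom.hprof is an example location, not a YaCy default.
JAVA_ARGS="-Xms90m -Xmx1200m"
JAVA_ARGS="$JAVA_ARGS -XX:+HeapDumpOnOutOfMemoryError"
JAVA_ARGS="$JAVA_ARGS -XX:HeapDumpPath=/var/tmp/yacy-oom.hprof"
echo "$JAVA_ARGS"
```

The resulting .hprof file can then be opened in a heap analyzer (e.g. Eclipse MAT) to see what is filling the heap.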

I'd like to answer the question about memory, but it seems I can no longer start the service.
When I tried to start it, I got:

 >> YaCy started as daemon process. Administration at http://localhost:8090 <<
70_search@70_search:~/yacy $ W 2022/04/18 23:41:57 Cache file and metadata size is not equal, starting a cleanup thread...
W 2022/04/18 23:42:00 BROWSER System unknown
W 2022/04/18 23:42:09 YACY rejected bad yacy news record (3): attributes length (1026) exceeds maximum (974)
W 2022/04/18 23:42:40 YACY rejected bad yacy news record (3): attributes length (1026) exceeds maximum (974)
Control+C
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Looks like you ran out of memory. How much memory does your machine have?

The server (VPS) has 3.5GB of RAM.

Hi all.
So, no clues on how to solve this issue?
Is this the end of my attempt to run YaCy?

Hi,
it's a pleasure to have a FreeBSD fellow here!

YaCy is quite RAM-hungry and, as far as I know, that's by design. I struggle with RAM all the time. It occupies only the amount of RAM specified in "Maximum Used Memory", so increasing this value should help. Sometimes "Database Optimisation" also helps, but it takes some time to run.

The solution I'm thinking of is to use an external Solr (on the same machine), which should help, but I haven't had time to try it yet.

I have experimented with YaCy for about a year and a half now, at times so angry that I think I'll leave it, but the concept seems good and I haven't found any similar project. The implementation is sluggish, resource-hungry, and buggy, but it still makes sense for me. So let's report bugs and push the developers to fix them.

Are you a Java developer?

I am starting to have the same issue. It works great and gets crawling done without problems. The system stopping started for me when I activated the Process Scheduler: I had 7 re-crawls scheduled at different hours of the night. Last night I took it down to 2 processes, and YaCy hung again. I am going to turn all processes off to see what happens tonight.