Crawler loader queue gets stuck downloading even though parser.extensions.deny= includes e.g. mp4

I’ve been adding extensions and restarting the server to clear the queue.
My internet connection gets swamped and crawling slows down a bit.

From DATA/SETTINGS/yacy.conf, line 313:

parser.extensions.deny=iso,apk,dmg,aiff,flac,aifc,m4p,wma,wav,ogg,ra,mp4,mp3,m4b,m4a,rm,aif,zip,gz,rar,pdf,deb,dmg,exe,mov

I have also tried adding a dot to the extensions, e.g. .iso,.apk
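
For reference, here’s a quick Python check of which queued URLs actually end in one of those extensions (this is just a sketch of my own, independent of how YaCy itself matches them; the deny string is copied straight from the config line above):

```python
from urllib.parse import urlparse

# Deny list copied from parser.extensions.deny in yacy.conf above.
DENY = "iso,apk,dmg,aiff,flac,aifc,m4p,wma,wav,ogg,ra,mp4,mp3,m4b,m4a,rm,aif,zip,gz,rar,pdf,deb,dmg,exe,mov"
denied = {ext.strip().lstrip(".").lower() for ext in DENY.split(",")}

def is_denied(url: str) -> bool:
    """Return True if the URL path ends in one of the denied extensions."""
    path = urlparse(url).path
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return ext in denied

# Example: paste URLs from the loader queue here to see which ones I expect to be skipped.
for url in ["http://example.com/archive.zip", "http://example.com/page.html"]:
    print("DENY " if is_denied(url) else "ALLOW", url)
```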

I’m not sure, but I noticed that YaCy somehow overwrites the config file while running. The safest way seems to be to edit the config while the instance is down and then start it again.
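
If it helps, this is roughly what I mean by editing while the instance is down (a rough Python sketch, assuming the DATA/SETTINGS/yacy.conf path mentioned above and an example deny list; run it only while YaCy is stopped, otherwise the running instance will rewrite the file):

```python
from pathlib import Path

# Run only while YaCy is stopped, since a running instance rewrites yacy.conf.
CONF = Path("DATA/SETTINGS/yacy.conf")   # path as used in the post above
NEW_VALUE = "iso,apk,dmg,mp4,mp3,zip,gz,rar,pdf,deb,exe,mov"  # example list

lines = CONF.read_text().splitlines()
for i, line in enumerate(lines):
    if line.startswith("parser.extensions.deny="):
        lines[i] = "parser.extensions.deny=" + NEW_VALUE
        break
else:
    # Key not found: append it at the end of the file.
    lines.append("parser.extensions.deny=" + NEW_VALUE)

CONF.write_text("\n".join(lines) + "\n")
print("updated", CONF)
```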

Hello, I can confirm this; it’s the same for me.
As a workaround, I use “Do not load URLs with an unsupported file extension” in the “Advanced Crawler”. Unfortunately those settings are also ignored by the “Autocrawler” and “shallow crawl”, so these features are useless for me, as the queue very quickly gets clogged with mp3, mp4, tgz and so on.


I saw this in my pihole log: quite a few sites were repeatedly being looked up in DNS while I was crawling.

Not sure why.

Also, I have been using my pihole to block sites that cause the loader to swamp my internet connection.

In testing I have set the loader queue as low as 5 to try to limit DNS requests.
Note: you have to restart YaCy for this to take effect!

I had a problem after 11:00 and tried to slow YaCy’s DNS requests down, to no effect, even after slowing the crawler settings down as well.

Note: Ubuntu 22.04 desktop OS was used.
I’m currently running a test on a Debian 12 server and YaCy is working well.

I wrote a program in QB64 to test DNS servers’ response time on random 4-character .com names.
It’s just a shell out to the dig command.
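
I don’t have the listing handy, but it’s roughly equivalent to this Python sketch (the resolver IP and sample count here are placeholders, not my actual values):

```python
import random
import string
import subprocess
import time

DNS_SERVER = "192.168.1.2"   # placeholder: put your pihole / resolver IP here
SAMPLES = 20                  # placeholder sample count

def random_com() -> str:
    """Random 4-character .com name, matching the QB64 test described above."""
    return "".join(random.choices(string.ascii_lowercase, k=4)) + ".com"

times = []
for _ in range(SAMPLES):
    name = random_com()
    start = time.monotonic()
    # Shell out to dig, same as the QB64 program; +short keeps the output small.
    subprocess.run(["dig", "+short", "@" + DNS_SERVER, name],
                   stdout=subprocess.DEVNULL, check=False)
    elapsed = (time.monotonic() - start) * 1000
    times.append(elapsed)
    print(f"{name}: {elapsed:.1f} ms")

print(f"average over {SAMPLES} lookups: {sum(times) / len(times):.1f} ms")
```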

[Screenshot from 2024-03-11 15-22-11]