How to prevent crawling of disallowed path?

DusteDdk · 5 January 2024 14:05

I run a small webring (geekring.net) and get quite a lot of traffic from YaCy. I appreciate it indexing the webring-related pages, but everything under /sites/ is disallowed, because those URLs are just redirection rules: requesting anything under /sites/ doesn't return any content, it simply redirects to a ring-member site.

I have the following robots.txt file:
User-agent: *
Crawl-delay: 5
Disallow: /site/
Sitemap: https://geekring.net/sitemap.xml
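To see what a standards-following crawler would make of this file, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not YaCy's own parser, which is written in Java. It assumes the deployed file is exactly the one quoted above. The user-agent string `yacybot` and the path `/sites/example` are only illustrative; the `*` group applies to any bot.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /site/
Sitemap: https://geekring.net/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative user-agent; any name falls back to the "*" group here.
AGENT = "yacybot"

for path in ("/site/example", "/sites/example"):
    url = "https://geekring.net" + path
    print(path, "allowed" if parser.can_fetch(AGENT, url) else "disallowed")

print("crawl delay:", parser.crawl_delay(AGENT))
```

This prints `/site/example disallowed`, `/sites/example allowed` and `crawl delay: 5`. Robots.txt rules are prefix matches, and `/sites/example` does not start with `/site/`. So if the redirects really live under /sites/, a `Disallow: /sites/` line would be needed to cover them.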

okybaca · 9 January 2024 10:57

Hi DusteDdk, and welcome!

So the problem is that YaCy doesn't respect robots.txt, right?

YaCy definitely has a mechanism for respecting robots.txt, and I believe it cannot be switched off, so there must be an error somewhere. Can you reproduce this with your own YaCy instance?
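As a rough illustration of the check a compliant crawler performs, here is a minimal sketch of the path-matching rules from RFC 9309 (the IETF Robots Exclusion Protocol): the longest matching prefix wins, Allow wins a tie, and a path that no rule matches is allowed. It is not YaCy's code. `Rule` and `is_allowed` are hypothetical names for illustration, and wildcards (`*`, `$`) are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rule:
    allow: bool  # True for an "Allow:" line, False for "Disallow:" (hypothetical type)
    path: str    # path prefix as written in robots.txt


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Decide whether `path` may be crawled under one group's rules.

    Longest matching prefix wins; on a tie, Allow wins; a path that no
    rule matches is allowed. Wildcards (* and $) are not handled here.
    """
    best: Rule | None = None
    for rule in rules:
        # An empty path (e.g. a bare "Disallow:") matches nothing.
        if not rule.path or not path.startswith(rule.path):
            continue
        longer = best is None or len(rule.path) > len(best.path)
        tie_allow = best is not None and len(rule.path) == len(best.path) and rule.allow
        if longer or tie_allow:
            best = rule
    return True if best is None else best.allow


# The rule quoted in the question does not cover /sites/...
quoted = [Rule(allow=False, path="/site/")]
print(is_allowed(quoted, "/sites/example"))  # True
print(is_allowed(quoted, "/site/example"))   # False

# ...while a more specific Allow can reopen part of a disallowed tree.
mixed = [Rule(allow=False, path="/sites/"), Rule(allow=True, path="/sites/about")]
print(is_allowed(mixed, "/sites/about"))     # True
print(is_allowed(mixed, "/sites/other"))     # False
```

Python's `urllib.robotparser` uses first-match order rather than longest match. The two approaches only disagree when Allow and Disallow rules overlap, which is why the sketch spells out the RFC 9309 precedence explicitly.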