I’ve been trying to mass-add RSS feeds to YaCy via the API, with no luck yet.
I did get a web crawl working after plugging the example from https://yacy.net/api/crawler/ into ChatGPT.
I changed “range” from “wide” to “domain”:
curl -G "http://localhost:8090/Crawler_p.html" \
--data-urlencode "crawlingDomMaxPages=10000" \
--data-urlencode "range=domain" \
--data-urlencode "intention=" \
--data-urlencode "sitemapURL=" \
--data-urlencode "crawlingQ=on" \
--data-urlencode "crawlingMode=url" \
--data-urlencode "crawlingURL=https://community.searchlab.eu/" \
--data-urlencode "crawlingFile=" \
--data-urlencode "mustnotmatch=" \
--data-urlencode "crawlingFile$file=" \
--data-urlencode "crawlingstart=Neuen Crawl starten" \
--data-urlencode "mustmatch=.*" \
--data-urlencode "createBookmark=on" \
--data-urlencode "bookmarkFolder=/crawlStart" \
--data-urlencode "xsstopw=on" \
--data-urlencode "indexMedia=on" \
--data-urlencode "crawlingIfOlderUnit=hour" \
--data-urlencode "cachePolicy=iffresh" \
--data-urlencode "indexText=on" \
--data-urlencode "crawlingIfOlderCheck=on" \
--data-urlencode "bookmarkTitle=" \
--data-urlencode "crawlingDomFilterDepth=1" \
--data-urlencode "crawlingDomFilterCheck=on" \
--data-urlencode "crawlingIfOlderNumber=1" \
--data-urlencode "crawlingDepth=4"
And here’s a way to sanity-check a list of URLs before crawling them (note curl needs --fail here, otherwise a 404 still exits 0 and lands in valid_urls.txt):
xargs -n 1 -P 10 -I {} bash -c 'curl -s -o /dev/null --fail --max-time 10 "$1" && echo "$1" >> valid_urls.txt || echo "$1" >> invalid_urls.txt' _ {} < feeds.csv
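Once that finishes, point the loop above at valid_urls.txt instead of feeds.csv so you only submit feeds that actually respond.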