Crawl regex and auto crawl

koks · 18 May 2020 13:37

1st question

How can I tell YaCy to crawl only pages that match this pattern?

http://example.com/*/abc/*.html

2nd question

Can YaCy run a crawl automatically every day?

koks · 23 May 2020 09:21

3rd question

How can I exclude <div id="someDiv"> in some.html from the search results?

Orbiter · 4 June 2020 12:08

The crawl start page provides fields for regular expressions that include or exclude URLs.
There is also a crawl scheduler, where you can set a crawl to repeat on a daily basis.

zooom · 16 August 2020 12:29

The pattern in your first question is a file glob, not a regex.

Maybe you should try http:\/\/example\.com\/.+\/abc.+\.html
I'm never sure myself which syntax to use, because regexes are sometimes mixed (e.g. in the blacklist) with file globs (patterns with *).
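
To check a pattern like this before pasting it into a crawl filter, it can help to try it against a few sample URLs. Below is a minimal sketch in Python, not YaCy itself. It assumes the filter has to match the whole URL (hence `fullmatch`), and the sample URLs and the `STRICT` variant are made up for illustration. YaCy is written in Java, whose regex syntax agrees with Python's for these simple patterns, but that is worth verifying in your own instance.

```python
import re

# Pattern suggested in the post above (the escaped slashes are harmless).
SUGGESTED = re.compile(r"http:\/\/example\.com\/.+\/abc.+\.html")

# Hypothetical stricter variant that keeps the slash after "abc",
# closer to the original glob http://example.com/*/abc/*.html
STRICT = re.compile(r"http://example\.com/.+/abc/.+\.html")

# Made-up sample URLs for illustration only.
SAMPLES = [
    "http://example.com/foo/abc/page.html",
    "http://example.com/foo/abcdef.html",
    "http://example.com/abc/page.html",
    "http://example.com/foo/abc/page.htm",
]


def matches(pattern, url):
    """Return True if the pattern matches the entire URL (assumed filter behaviour)."""
    return pattern.fullmatch(url) is not None


if __name__ == "__main__":
    for url in SAMPLES:
        print(url, matches(SUGGESTED, url), matches(STRICT, url))
```

In this sketch the suggested pattern also accepts http://example.com/foo/abcdef.html, because nothing forces a slash after "abc"; the stricter variant rejects it. Both require at least one path segment before /abc/, since `.+` needs at least one character.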