Database full of website URLs

Hi, can you force YaCy to access a URL database of 25,000 websites that have been curated?


Easy, just convert those 25,000 websites into a text-based URL list, one URL per line, and paste that list into the Crawl Start URL window. I tried this a few weeks ago.
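If the 25,000 sites live in a database rather than a text file, a small script can dump them into that one-URL-per-line format. Here is a minimal sketch in Python, assuming a SQLite database with a hypothetical websites table and url column (adjust the names to your actual schema):

```python
# dump_urls.py - export curated website URLs to a plain text list,
# one URL per line, ready to paste into YaCy's Crawl Start URL window.
# Assumes a SQLite file "curated.db" with a table "websites(url TEXT)";
# both names are placeholders for whatever your database actually uses.
import sqlite3

def dump_url_list(db_path: str = "curated.db", out_path: str = "crawl_list.txt") -> int:
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT url FROM websites").fetchall()
    finally:
        conn.close()

    # Normalize: strip whitespace, drop empties, de-duplicate while keeping order.
    seen = set()
    urls = []
    for (url,) in rows:
        url = (url or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)

    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
    return len(urls)

if __name__ == "__main__":
    count = dump_url_list()
    print(f"Wrote {count} URLs to crawl_list.txt")
```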

It takes about an hour for YaCy to ingest that huge batch of URLs in the Crawl Start, but it works.

You can adjust the crawl start with sensible or crazy settings, e.g. (sensible) crawl depth = 0 to index only the given URLs, or a larger depth if you want a full crawl starting from each of those 25,000 URLs.
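If pasting 25,000 lines into the web form is too slow, the same crawl start can in principle be submitted over HTTP, since the Crawl Start form posts to Crawler_p.html on the YaCy server. This is a rough sketch only, assuming a local instance on port 8090, admin digest authentication, and the form field names crawlingURL / crawlingDepth as they appear in recent YaCy versions; verify them against your instance's crawl-start form before relying on this:

```python
# start_crawl.py - rough sketch: submit a crawl start to a local YaCy instance.
# Endpoint and field names (Crawler_p.html, crawlingURL, crawlingDepth) are taken
# from YaCy's crawl-start web form and may differ between versions; check your
# instance's form and adjust accordingly. Credentials below are placeholders.
import requests
from requests.auth import HTTPDigestAuth

YACY = "http://localhost:8090"          # assumed local YaCy instance
AUTH = HTTPDigestAuth("admin", "yacy")  # placeholder admin credentials

def start_crawl(urls: list[str], depth: int = 0) -> None:
    # crawlingDepth=0 means: index only the listed URLs, follow no links.
    data = {
        "crawlingstart": "Start New Crawl",
        "crawlingMode": "url",
        "crawlingURL": "\n".join(urls),
        "crawlingDepth": str(depth),
    }
    resp = requests.post(f"{YACY}/Crawler_p.html", data=data, auth=AUTH, timeout=120)
    resp.raise_for_status()
    print("Crawl start submitted:", resp.status_code)

if __name__ == "__main__":
    with open("crawl_list.txt", encoding="utf-8") as f:
        url_list = [line.strip() for line in f if line.strip()]
    start_crawl(url_list, depth=0)
```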


Let us know how long it takes. I am interested.


Do you know if you can push URLs into the URL window from a database and update the pasted crawl list, i.e. remove and add entries?

Thank you
