(Newbie help) Want to index only specific websites

In my setup, I use bookmarks exported from my browser as the feed for the YaCy indexer, so everything I bookmark gets indexed.
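If you want to review or filter the bookmarked links before handing them to the crawler, you can pull them out of the export yourself. Below is a minimal sketch, assuming the browser writes the common Netscape-style bookmarks HTML (each bookmark as an `<A HREF="...">` tag). The function and class names are made up for illustration; this is not part of YaCy.

```python
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collects http(s) href values from <a> tags in a bookmarks HTML export (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> arrives as "a"/"href"
        if tag != "a":
            return
        for name, value in attrs:
            # skip browser-internal entries such as "place:" or "javascript:" links
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_bookmark_urls(html_text):
    """Return the bookmarked web URLs in file order, without duplicates."""
    parser = BookmarkLinkParser()
    parser.feed(html_text)
    parser.close()
    return list(dict.fromkeys(parser.urls))
```

Usage: `extract_bookmark_urls(open("bookmarks.html", encoding="utf-8").read())` gives a plain list you can edit and write out one URL per line. Whether your YaCy version accepts such a list directly in its crawl start form is worth checking there.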

In this setup, I use a crawl depth of 1 (only the links I bookmark are indexed), but you can use a depth of 2, and then all the links from the bookmarked pages are indexed as well.

When I know I’ll be interested in a site in general, I just crawl the whole domain. This can take a long time in some cases (a small blog has on average several thousand pages, a small magazine something like 20 000 to 100 000, and the New York Times around 15 000 000 including the whole archive).
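Crawling "the whole domain" boils down to following links while only keeping those that stay on the site. A minimal sketch of such a check, assuming subdomains (e.g. `www.`) count as part of the site; YaCy has its own URL filters in the crawl start form, and this is not its implementation.

```python
from urllib.parse import urlsplit


def in_domain(url, domain):
    """True if url's host is `domain` itself or a subdomain of it (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    # compare on a dot boundary so "notnytimes.com" does not match "nytimes.com"
    return host == domain or host.endswith("." + domain)
```

For example, `in_domain("https://www.nytimes.com/2020/x", "nytimes.com")` is `True`, while `in_domain("https://notnytimes.com/", "nytimes.com")` is `False`.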

There is also a “Heuristic” function for search results (Search Portal Integration > Ranking and Heuristic > Heuristic > shallow crawl on all displayed search results = yes). With this setting, after a search shows its results, all links from the result pages are crawled.
I personally don’t use this function because of the garbage it pulls in and the performance cost, but it could be useful in your case.

Yeah, that’s definitely possible: First Steps > Use Case & Account > Search portal for your own pages.
