YaCy Crawl Depth Question

EDIT: @isle See this: Clarification on crawling levels?

@transysthor
While I can’t explain the crawl depth levels accurately in a meaningful manner (for all I know, YaCy applies the set depth level every time it branches outside of the current domain and into a new one), I think I can perhaps shed some light on Rows to fetch at once etc…

If you go to Crawler Monitor, you’ll see the solr search api mentioned, linking to something like this:

https://peach.stembod.online:8443/solr/select?core=collection1&q=*:*&start=0&rows=3 (replace host and port as needed)
(This is also useful for testing any desired Auto Crawler solr query string instead of the default *:*; see the links below for more.)

The number of rows, at least in most databases, usually refers to the number of database entries to fetch (think of it like a spreadsheet or table, which has rows and columns).

And as you can see, that request asks for 3 results (rows=3). In combination with start=0 (the offset to start from), that means ‘fetch me rows 0 through 2’.
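
To make the offset arithmetic explicit (row indices are zero-based):

  • start=0&rows=3 → rows 0, 1, 2
  • start=3&rows=3 → rows 3, 4, 5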

So Rows to fetch at once (with the default setting of 100, and the default *:* query) would mean:

  • select?core=collection1&q=*:*&start=0&rows=100 (rows 0 through 99)

And then, when the Auto Crawler is done with that set, it probably does

  • select?core=collection1&q=*:*&start=100&rows=100 (rows 100 through 199), then
  • select?core=collection1&q=*:*&start=200&rows=100 (rows 200 through 299), and so on… (a rough sketch of this paging loop follows below)
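
Purely to illustrate that start/rows arithmetic, here is a minimal Python sketch, assuming the host/port above and Solr’s wt=json response format. It’s my guess at the paging pattern, not YaCy’s actual crawler code:

```python
# Minimal sketch of the paging pattern described above; an assumption
# about the Auto Crawler's behavior, not YaCy's actual implementation.
import json
from urllib.request import urlopen

BASE = "https://peach.stembod.online:8443/solr/select"  # replace host/port as needed
ROWS = 100  # the "Rows to fetch at once" setting

start = 0
while True:
    # wt=json asks Solr for a JSON response instead of the default XML
    url = f"{BASE}?core=collection1&q=*:*&wt=json&start={start}&rows={ROWS}"
    with urlopen(url) as resp:
        docs = json.load(resp)["response"]["docs"]
    if not docs:
        break  # past the last row, done
    for doc in docs:
        pass  # each document would get queued for crawling here
    start += ROWS  # next page: start=100, 200, 300, ...
```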

And with Deepcrawl every set to the default of 50, that would mean results 50, 150, 250, etc. get set to be deep crawled (default depth 3), while the others get crawled at the shallow depth (default 2(?)).

I’m guessing… and I’m not sure how it deals with the various custom collections, e.g. user.
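
If ‘Deepcrawl every’ simply means ‘every Nth result gets the deep depth’, the selection might look like the sketch below. This is purely my reading (exactly which indices qualify depends on how YaCy counts them); the depth values are the defaults mentioned above:

```python
# One possible reading of "Deepcrawl every"; an assumption,
# not YaCy's actual implementation.
DEEPCRAWL_EVERY = 50  # "Deepcrawl every" default
DEEP_DEPTH = 3        # deep crawl depth default
SHALLOW_DEPTH = 2     # shallow crawl depth default (unverified)

def crawl_depth(result_index: int) -> int:
    """Give every DEEPCRAWL_EVERY-th result the deep depth,
    everything else the shallow depth."""
    if result_index > 0 and result_index % DEEPCRAWL_EVERY == 0:
        return DEEP_DEPTH
    return SHALLOW_DEPTH

# e.g. crawl_depth(50) -> 3, crawl_depth(51) -> 2
```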

In regard to the Query setting, I’ve found this useful:

in combination with looking at the fields present in YaCy’s IndexSchema_p.html page.
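
For example, a custom Query string could restrict the Auto Crawler to one host, or to recently loaded documents. The field names below are ones I’d expect to find in the schema; verify them against your own IndexSchema_p.html before relying on them:

  • host_s:example.com
  • load_date_dt:[NOW-30DAYS TO NOW]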