How to activate and rank by CR - citation rank

From GitHub:

There is in fact an attempt to compute PageRank in YaCy. The process is somewhat hidden and disabled by default.

  • This is called CR (citation rank) in the source code, because PageRank is essentially a citation rank.
  • The process of collecting backlinks is extremely time- and IO-intensive. It runs in a so-called postprocessing step.
  • Postprocessing is part of the cleaning process and only runs if the Solr field process_sxt is activated. That field is deactivated by default, so postprocessing can be enabled by activating it (this can be done in the front-end).
  • The actual computation is done in source/net/yacy/search/schema/CollectionConfiguration.java (master branch of the yacy/yacy_search_server repository on GitHub).
  • The result is a backlink counter, which can then be used as part of the ranking.
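
To make the idea of a backlink counter concrete, here is a minimal sketch in Python. It is not YaCy's code (the real computation lives in the Java class named above); the function names, the link-pair input format, and the logarithmic boost with its weight are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import log10


def count_backlinks(links):
    """Count distinct inbound links per URL.

    `links` is an iterable of (source_url, target_url) pairs.
    This input format is an assumption, not YaCy's data model.
    """
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # a page linking to itself is not a citation
            sources[dst].add(src)
    return {url: len(srcs) for url, srcs in sources.items()}


def boosted_score(text_score, backlinks, weight=0.5):
    """Hypothetical ranking formula: text relevance plus a damped backlink boost."""
    return text_score + weight * log10(1 + backlinks)


# Tiny example graph: "c" is cited by "a" (twice) and "b"; "a" is cited by "c".
links = [("a", "c"), ("b", "c"), ("a", "c"), ("c", "a"), ("c", "c")]
counts = count_backlinks(links)
```

Even this toy version has to see every link in the index once, which hints at why collecting backlinks over a large index is so IO-heavy.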

The implementation should be considered experimental only. It turned out that the computation increased IO and CPU activity on the user's machine so much that a normal user would no longer accept the application. Therefore it was deactivated.

I would consider running a new implementation of PageRank as a process for YaCy Grid. “Legacy” YaCy would not be the right place for such intensive computations. It’s too bad, but you also have to consider user acceptance.
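
For anyone wondering why a full PageRank is so much heavier than a plain backlink count, here is a minimal sketch of the textbook power-iteration method in Python. It is not taken from YaCy or YaCy Grid; the graph format (a dict of outgoing links), the damping factor of 0.85, and the fixed iteration count are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `graph` maps each node to a list of nodes it links to.
    Nodes that only appear as link targets are included automatically.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# "c" is linked from "a" and "b"; "a" is linked from "c"; nobody links to "b".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Each iteration walks the whole link graph again, so on a real index the cost is many full passes over all links rather than one.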

You are free to activate the feature for experiments; I would love to get your input here.

It can be activated under Index Administration > Index Sources & Targets > Web Structure Index. There are two options to check.

AFAIK, the difference between “citation reference index” and “webgraph search index” is the way the data is stored (kelondro or Solr).

Whether the function actually works, and how it depends on “postprocessing”, I’m not sure.

I tried these options, but they didn’t affect the search results ranking. Maybe I need to do something else?

Did you try to activate process_sxt in the Solr schema, as suggested here?

I disabled that in my case; the ‘postprocess’ was a real resource-eater and threw a lot of errors.
