Solr error: number of documents in the index cannot exceed 2147483519

When using an external Solr on a local machine, a certain limit was probably reached and crawling couldn't continue. Error from the YaCy log:

request: http://127.0.0.1:8983/solr/webgraph/update?wt=javabin&version=2 
Remote error message: Exception writing document id DIPzuvxSD0j4WPPTvvxSD0j4cb97ff6f to the index; possible analysis error: number of documents in the index cannot exceed 2147483519 
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) 
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) 
        at java.base/java.lang.Thread.run(Thread.java:833) 
E 2023/01/12 12:38:58 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient error

Is this behavior expected? Is it a bug? How can it be solved or worked around?

It is probably a hard limit of Solr that the number of documents in an index cannot exceed 2147483519, and with the webgraph core that limit is reached quite easily.
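For reference, the exact number comes from Lucene, which Solr is built on: `IndexWriter.MAX_DOCS` is defined as `Integer.MAX_VALUE - 128`, and the cap applies per Lucene index, i.e. per Solr core or shard. The small sketch below just reproduces that arithmetic and gives a rough feel for how fast webgraph can get there. It assumes (not verified here) that webgraph stores roughly one document per hyperlink; the 100 links per page is a made-up example value.

```java
// Sketch: reproduce Lucene's per-index document cap and roughly estimate
// how many crawled pages fit into a single webgraph core.
public class SolrDocLimit {
    // Same value as org.apache.lucene.index.IndexWriter.MAX_DOCS
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;

    // Hypothetical estimate: assumes one webgraph document per outgoing link.
    static long pagesUntilFull(int avgLinksPerPage) {
        return MAX_DOCS / avgLinksPerPage;
    }

    public static void main(String[] args) {
        System.out.println(MAX_DOCS);            // 2147483519
        System.out.println(pagesUntilFull(100)); // 21474835
    }
}
```

So under that assumption, a crawl of only about 21 million pages would already fill one webgraph core.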

Possibly it could be solved with SPLITSHARD, as described here. But in that case we would need to run Solr in ‘cloud mode’, is that right?
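SPLITSHARD is an action of Solr's Collections API, which is only available when Solr runs in SolrCloud mode. A minimal sketch of what such a request URL looks like, assuming a SolrCloud setup at the same address as in the log above; the collection name `webgraph` and shard name `shard1` are hypothetical example values, and whether YaCy keeps working against a split collection is not covered here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build a SolrCloud Collections API SPLITSHARD request URL.
// Base URL, collection and shard names are hypothetical examples.
public class SplitShardRequest {
    static URI splitShardUri(String solrBase, String collection, String shard) {
        String query = "action=SPLITSHARD"
                + "&collection=" + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "&shard=" + URLEncoder.encode(shard, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/admin/collections?" + query);
    }

    public static void main(String[] args) {
        URI uri = splitShardUri("http://127.0.0.1:8983/solr", "webgraph", "shard1");
        System.out.println(uri);
        // Sending it (SolrCloud only) could be done with java.net.http.HttpClient, e.g.:
        // HttpClient.newHttpClient().send(HttpRequest.newBuilder(uri).build(),
        //         HttpResponse.BodyHandlers.ofString());
    }
}
```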

@orbiter, what exactly is the difference between using ‘webgraph’ and ‘citation reference’? I couldn't find it anywhere in the documentation…

I use an external Solr (so that the database process is independent of YaCy and doesn't exhaust RAM and other hardware resources so easily). Could that be the reason? Does it happen with the embedded Solr as well? Has anyone else run into this bug? What could be a solution? Could some sort of automatic split help?
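One way to think about an "automatic split": since the cap applies per core or shard, the number of shards needed for an expected webgraph size is just a ceiling division with some headroom. A rough sketch, where the expected document count and the 80% fill factor are hypothetical example values, not YaCy or Solr settings:

```java
// Sketch: how many shards would be needed so that no single shard
// exceeds a chosen fraction of Lucene's per-index document cap.
public class ShardPlanner {
    static final long MAX_DOCS = Integer.MAX_VALUE - 128L; // per Lucene index

    // fillFactor: fraction of MAX_DOCS one shard may reach (hypothetical headroom)
    static long shardsNeeded(long expectedDocs, double fillFactor) {
        long perShard = (long) (MAX_DOCS * fillFactor);
        return (expectedDocs + perShard - 1) / perShard; // ceiling division
    }

    public static void main(String[] args) {
        // e.g. 5 billion webgraph documents, keeping each shard under 80% of the cap
        System.out.println(shardsNeeded(5_000_000_000L, 0.8)); // 3
    }
}
```

The headroom matters because splitting a shard that is already at the cap leaves no room for the documents still arriving from the crawler.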

“webgraph” is an index in Solr; “citation reference” is a built-in data structure in YaCy.

Thanks for the reply! You described the difference in storage, but more specifically, what is the difference in function? What is the impact on the sorting of results? Is it better to use “webgraph” or “citation reference”?

Has anyone tried Java developer 23?
It's just something I picked up on Twitter.
Or is this way off base?