Hi Karel,
a) Yes, you can use one (or more) external Solr cores. They are configured on the /IndexFederated_p.html page.
The documentation is on the YaCy wiki, on the Dev:Solr page.
I use YaCy for linguistic research and I am still experimenting with various setups. I am a very new user, so others may have better ideas. That said, I saw a great improvement after disabling the embedded Solr and setting up a standalone one on the same machine. Setting up a Solr server proved to be very easy (under 10 minutes of reading and setup), but since I do not have much experience with it, it might need some fine-tuning later for your needs.
Please check which Solr version your version of YaCy uses and install the same one.
b) YaCy is the crawler and Solr is the core search engine. Both can generate very heavy I/O. The embedded Solr index is located under yacy/DATA/INDEX/[network]/SEGMENTS/solr_version, e.g. yacy/DATA/INDEX/webportal/SEGMENTS/solr_8_8_1/
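For the version check mentioned above, here is a small Python sketch, not a definitive tool. It assumes the folder name (solr_8_8_1) reflects the embedded Solr version, as the path above suggests, and that the standalone server answers on Solr's system info endpoint (/admin/info/system). The base URL in the usage note is a hypothetical local install, and the function names are mine.

```python
import json
import re
import urllib.request


def version_from_dirname(name: str) -> str:
    """Turn a folder name such as 'solr_8_8_1' into '8.8.1'."""
    match = re.fullmatch(r"solr_(\d+(?:_\d+)*)", name)
    if match is None:
        raise ValueError(f"unexpected index folder name: {name!r}")
    return match.group(1).replace("_", ".")


def version_from_system_info(payload: str) -> str:
    """Read the Solr version out of the system info JSON response."""
    return json.loads(payload)["lucene"]["solr-spec-version"]


def fetch_system_info(base_url: str) -> str:
    """Download the system info JSON from a running Solr server."""
    url = f"{base_url}/admin/info/system?wt=json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

Usage would look like comparing version_from_dirname("solr_8_8_1") with version_from_system_info(fetch_system_info("http://localhost:8983/solr")), where the URL is only an example.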
By keeping the index on a separate drive, the index and the crawler no longer compete for I/O on the same device. You can do this by shutting down YaCy, moving solr_x_x to a different drive, and creating a symbolic link in the SEGMENTS directory (cd there and run ln -s /new/full/path/to/solr_x_x) so that YaCy can still find it.
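The move-and-link step above can be sketched in Python as follows. This is only an illustration under my own assumptions: the function name relocate_index is hypothetical, YaCy must already be shut down, and the paths you pass in are your own.

```python
import os
import shutil
from pathlib import Path


def relocate_index(index_dir: str, new_parent: str) -> Path:
    """Move index_dir under new_parent and leave a symlink at the old path."""
    src = Path(index_dir)
    if src.is_symlink():
        raise ValueError(f"{src} is already a symlink")
    if not src.is_dir():
        raise FileNotFoundError(src)

    dest = Path(new_parent).resolve() / src.name
    if dest.exists():
        raise FileExistsError(dest)

    # shutil.move copies and then deletes when src and dest are on different drives.
    shutil.move(str(src), str(dest))
    # The link takes the old name, so the old path keeps working.
    os.symlink(dest, src, target_is_directory=True)
    return dest
```

For example, relocate_index("yacy/DATA/INDEX/webportal/SEGMENTS/solr_8_8_1", "/mnt/index_drive"), where /mnt/index_drive is a made-up mount point. The function refuses to run if the target already exists, so it cannot overwrite an index by accident.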
Bear in mind that a drive can have high bandwidth but still show high I/O latency under heavy load, which can greatly affect heavy operations such as crawling and maintaining large indexes. Even on SSDs, latency can be a bottleneck in some situations.
VirtualBox is a good virtualizer for the desktop; however, for this kind of usage I would suggest QEMU/KVM with virtio drivers in order to avoid SATA emulation. (I am not sure whether VirtualBox supports virtio yet, but it is worth a look.)
Use tools like iotop, htop, sysstat and the like to find your bottlenecks, and watch your I/O wait.
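If you want a rough number for I/O wait without extra tools, the following Python sketch compares two snapshots of /proc/stat (Linux only). It is a simplified version of what the tools above report, not a replacement for them, and the function names are mine.

```python
import time


def parse_cpu_line(stat_text: str) -> list:
    """Return the counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    # Fields: user nice system idle iowait irq softirq steal.
    # Guest time is already counted in user/nice, so only these eight are summed.
    total = sum(b[:8]) - sum(a[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (b[4] - a[4]) / total


def sample_iowait(interval: float = 1.0) -> float:
    """Take two snapshots `interval` seconds apart and return the iowait share."""
    with open("/proc/stat") as stat_file:
        first = stat_file.read()
    time.sleep(interval)
    with open("/proc/stat") as stat_file:
        second = stat_file.read()
    return iowait_percent(first, second)
```

Calling sample_iowait(5.0) while a crawl is running gives the percentage of CPU time that was spent waiting for the disk during those five seconds. A value that stays high suggests the drive, not the CPU or memory, is the bottleneck.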
Regarding memory, if that is your bottleneck: YaCy in general has a well-written architecture as far as I have seen, but crawling and indexing are both heavy operations and the internet is vast. I come from a C++ world but I write C# for a living, and I don't want to start a religious war here, but GC can greatly affect performance in heavy usage scenarios like gaming or crawling millions of sites. I see that YaCy, although well written, is affected by this, and I don't think it is the fault of YaCy or Solr. You might need to fine-tune the Java GC parameters according to your needs. I haven't tested the latest Orbiter fix for startup parameters yet.