Understanding YaCy

Are there any documents to learn and understand YaCy? The videos are pretty fine for reproducing some things, and some pages of the wiki are missing …
How can I get a deeper understanding of the principles of SE, and is there documentation explaining the software point by point?

This 16-year-old project has a long history of “please explain everything” requests. I understand that things are confusing, but where should anyone start? For example: I was once an assistant teacher for the lecture “Information Retrieval”. We could start with a one-semester lecture, and after that we would still not have touched even the basics of YaCy, because we would have learned only theoretical basics. I doubt you want that.

It’s the same with Linux: you never ask “please explain everything”. That is too much. The better choice is an adventure where you peek around and try to understand pieces here and there. Please have a look around in this forum and in the FAQ of yacy.net, and then try out some things yourself.

To be perfectly honest, I love YaCy, and digging into it and exploring all its mysteries is rewarding at times, but the lack of clear, consistent, thorough step-by-step documentation, all in one place, can be very frustrating.

For example, see my recent post about how to use the “site:” search to instruct YaCy to search the content of just one website or domain.

The instructions I stumbled upon here: https://wiki.yacy.net/index.php/En:SearchParameters didn’t seem to work. About a year ago I spent untold hours struggling, using trial and error to try to figure out this and probably half a dozen other recondite aspects of the program.
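For anyone who wants to test the “site:” search outside the web interface, here is a minimal sketch that only builds the request URL. It assumes a local peer on the default address http://localhost:8090, the `yacysearch.json` search interface with its `query` and `maximumRecords` parameters, and that the `site:` modifier is simply appended to the search terms. The function name is made up for illustration, and whether the modifier behaves as expected on a given peer is exactly what is in question here.

```python
from urllib.parse import urlencode


def build_site_search_url(terms, domain, base="http://localhost:8090", max_records=10):
    """Build a search URL that asks the peer to restrict results to one domain.

    Assumptions: `base` points at your own peer, and the site: modifier is
    written into the query string after the search terms.
    """
    query = f"{terms} site:{domain}"
    params = {"query": query, "maximumRecords": max_records}
    # urlencode escapes the space and the colon so the URL is safe to paste
    return f"{base}/yacysearch.json?{urlencode(params)}"


# Only prints the URL; nothing is sent over the network.
print(build_site_search_url("documentation", "yacy.net"))
```

Pasting the printed URL into a browser on the machine that runs the peer is a quick way to see whether the restriction is applied.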

YaCy is a GIFT to the world of, IMO, untold value and importance, and it is vital that more people learn how to use it if we are to have a free internet. By free, I do not necessarily mean without cost; I mean “by the people, for the people”. But how can that be, if “the people” give up because they can’t understand how to use it?

It’s fine to say, you shouldn’t be using a search engine without understanding it. Great. But some of the inner workings of YaCy are so esoteric, I’m not sure there is more than one person on the planet who really knows what all is inside.

Language is an issue. Some time back I spent hours meticulously trying to translate some of the available documentation from German to English, as that was mostly all I could find.

These aren’t really criticisms, but praise and appreciation. People love the IDEA of YaCy, I read that in review after review, but people just aren’t really able to use it due to the complexity and lack of understanding.

The developers are too busy to spend all their time explaining what may be obvious to them, over and over and over again; they are busy on the cutting edge, developing something new. And most people, like myself, are too busy with the struggles of day-to-day life to figure it out by trial and error or in-depth research with little guidance.

People sincerely want “a deeper understanding”.

I have tried, and would be happy to write documentation, but I don’t understand half of what YaCy can do myself, and only wish I had the freedom to spend more time trying to figure it all out.

I suppose if I knew Java, it would all be transparent: just look at the source code. But that isn’t really true either. How many other things have been incorporated into YaCy that require in-depth study? A lot.

If someone competent enough with YaCy could work on documentation, I would be glad to contribute financially to such a project.


I totally agree. This is such a fascinating project, but it is too hard to understand and to make work. One of the reasons behind the success of evil corporations is that their services are easy to use, while one has to have a degree in computer science to use many of the alternatives (a bit exaggerated, but you get my point). Consequence: surveillance capitalism is winning, and the decay of the open net continues.

In my case, and for many others, it would be helpful to have simple instructions on:

  1. How to set up your own search engine for your website
    1.1. How to add the domain(s) you would like to be crawled
    1.2. How to exclude folders from site search
    1.3. How to automate crawling
    1.4. How to edit what I just entered (site search settings)

  2. Regular expressions: Please provide examples

I have now spent many hours trying to set this up; some things work now, but many don’t. I tried to exclude folders from search and rerun the indexer, without success. Regular expressions remain a mystery even after some googling.
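To make the question about regular expressions more concrete, here is a small sketch of what a folder-excluding pattern can look like in standard regex syntax. It is written with Python’s `re` module, not with YaCy itself, and it assumes the filter has to match the whole URL; whether a given YaCy field uses this syntax and this matching rule is an assumption to check on your own peer. The folder names `private` and `tmp` and the example URLs are made up.

```python
import re

# Hypothetical pattern: any URL that has "private" or "tmp" as a folder
# somewhere in its path, on any host.
EXCLUDE = re.compile(r"https?://[^/]+/(.*/)?(private|tmp)/.*")


def is_excluded(url):
    # fullmatch means the pattern must cover the entire URL, not just a part
    return EXCLUDE.fullmatch(url) is not None


for url in [
    "https://example.org/private/report.html",
    "https://example.org/blog/tmp/draft.html",
    "https://example.org/blog/private-notes.html",
]:
    print(url, "excluded" if is_excluded(url) else "kept")
```

The last URL is kept because `private-notes.html` is a file name, not a folder; that kind of edge case is worth trying in a regex tester before starting a crawl.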

Running a search engine is not trivial, but YaCy is AFAIK the only package available, which runs out-of-the-box within 5 minutes and contains almost all features you need.

I recommend setting up a “virgin” public P2P instance to play around with, and a second one (or more) as soon as you understand what you are doing.

1.1. Either enter some domains into the textarea or (what I prefer) maintain lists of starting points as text files. For that, I always have a subdir “starturls” where I keep the files.

1.2 ?? What do you mean? Search syntax like “not containing something”?

1.3. Menu “Crawler Monitor”->> (upper right)
All crawls are saved and listed. The last 2 columns can be edited to schedule a crawl again.

1.4 Same table left side. You cannot edit the record, but copy it into a new one.

  2. Useless for crawls. Blacklists use a weird, non-standard regex syntax. A good starting point to play around: https://regex101.com/

I stopped messing around with blacklists. Now I kick out spammers from time to time by simply deleting stuff from the index, or better: tell the crawls in more detail what you want, restrict them to dedicated domains, and crawl the levels separately.
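The “starturls” habit from 1.1 can be sketched in a few lines. This is only an illustration of keeping start points in text files, not part of YaCy: the function name, the `.txt` extension, one URL per line, and the `#` comment convention are all assumptions.

```python
from pathlib import Path


def load_start_urls(directory="starturls"):
    """Collect unique start URLs from all .txt files in a directory.

    Assumed file format: one URL per line; blank lines and lines
    starting with '#' are skipped.
    """
    urls = []
    seen = set()
    for path in sorted(Path(directory).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            url = line.strip()
            if not url or url.startswith("#"):
                continue
            if url not in seen:  # keep the first occurrence, preserve order
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    # Print the merged list so it can be pasted into the crawl start textarea.
    for url in load_start_urls():
        print(url)
```

Keeping one file per topic or per site makes it easy to rerun a crawl later with exactly the same starting points.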

Oh wow, I am so glad I found this thread. I thought it was just me who found YaCy too much to understand and so esoteric. Yes, it is fun figuring all this out, but also frustrating. I honestly believe I will contribute something to the YaCy project one day, even if it’s just a page in a wiki about what I discovered about how to make YaCy work my way.
It has made me realise that in order to make YaCy really work, it needs people to add to the project in a significant way. I am no genius programmer. I would say my talents in programming are mediocre at best, and even though the brain is willing, the clock is ticking. I’m not fortunate enough to be a genius researcher, and I’m not in a situation where I can help make YaCy better with programming. What I am thinking, though, is: could I just script a front-end search page that interacts with YaCy?

What I’m trying to say is:
I think documentation would be too much work for those who develop YaCy and experiment with all the high-level stuff. Have you ever come home from 8 hours of work in an office and tried to write a book or learn the guitar to a world-class standard? It must be like a great programmer/computer-science researcher going home and trying to write the most coherent documentation of their program ever. Their research isn’t the problem.
I think it’s up to users to seek the help of the creators and researchers, but to document for ourselves and share our own knowledge of YaCy with other YaCy users.
It’s why I love the DD-WRT project; they sort of have that going on at their web space.

I wholeheartedly join these requests!
A detailed manual is ABSOLUTELY NECESSARY!

I totally see where you are coming from and truly respect your viewpoint and see the truth of it, but I also have to disagree with a subtle aspect of it.

Many, many topics are the summation of years, even decades of learning - but that doesn’t imply there’s no way to sort of refine the field into smaller and simpler terms, for a beginner, as stepping stones, so they can learn more cumulatively over a long time. In fact, it’s very beneficial to have a simplified place to start so you can get a handhold to progress to the higher topics. If you go directly to the advanced level, there can be too many unknown concepts all at once for graceful development and progress.

I would actually like to try to explain YaCy in simpler terms, so I’m trying to do some research on those questions right now myself.

As a sum-up: we need documentation.
Possibly it could be crowd-written by experienced users, using forums etc. and reviewed or edited by the developers.
As @Orbiter wrote before, he’s going to abandon the wiki. (why?)

So what is the place where users are able to contribute? What is the most feasible way to do it?
By editing the docs in the FAQ using GitHub?

Is the structure of the docs section good? Shouldn’t it be included or linked in the installation of instances as well? What are the instructions for contributing? How do you create new sections, etc.? Why are the other pages (operation section) not linked? Will the documentation have some index, or will it be just a FAQ with links?

What sort of documentation do we need the most? A quickstart guide? An in-depth description of the technology?

I was once an assistant teacher for the lecture “Information Retrieval”. We could start with a one semester lecture and after that we did not even touch the basics of YaCy because we learned only theoretical basics.

Do you have some recordings / slides / papers? Actually, that is interesting!

Simplification of the environment. One more tool that does not need maintenance. The YaCy homepage is driven by https://www.mkdocs.org/, which is actually made for project documentation.

OK, but why has a bunch of user-contributed fixes not been merged for almost a year?

Yes, that’s true. I have pulled some of these contributions now, but unfortunately CI/CD is broken because of a server fault. I will fix this soon.

Yes, here is the place where I publish almost all of my talks: Index of /material

The lecture paper is here (sorry, some parts are in German, some are in English): https://yacy.net/material/SIM-IR-SS15-MichaelChristen-Introduction_Information_Retrieval-20150512.pdf

A scientific paper about the YaCy network is also here: https://yacy.net/material/Description_of_the_YaCy_Distributed_Web_Search_Engine_Herrmann_Ning_Diaz_Preneel_ESAT_KULeuven_COSIC_article-2459.pdf

Great, thanks, Orbiter, that’s really inspiring!