Understanding YaCy

To be perfectly honest, I love YaCy, and digging into it and exploring all its mysteries is rewarding at times, but the lack of clear, consistent, thorough step-by-step documentation, all in one place, can be very frustrating.

For example, my recent post covered how to use the “site:” search modifier to instruct YaCy to search the content of just one website or domain.

The instructions I stumbled upon here: https://wiki.yacy.net/index.php/En:SearchParameters didn’t seem to work. About a year ago I spent untold hours struggling, using trial and error to try to figure out this and probably half a dozen other recondite aspects of the program.
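
For readers trying the same thing, here is a minimal sketch of how a “site:” query could be assembled as a URL. It assumes a local instance on port 8090 (the address used later in this thread) and that the search page is reachable at `yacysearch.html` with a `query` parameter; both are assumptions to check against your own installation, and `build_site_query` is a hypothetical helper, not part of YaCy.

```python
from urllib.parse import urlencode


def build_site_query(base_url, terms, site):
    """Build a search URL that restricts the results to one site.

    Assumption: the instance accepts the query text in a "query"
    parameter on yacysearch.html and understands the site: modifier.
    """
    query = f"{terms} site:{site}"
    # urlencode escapes the space and the colon so the URL stays valid.
    return f"{base_url.rstrip('/')}/yacysearch.html?{urlencode({'query': query})}"


print(build_site_query("http://127.0.0.1:8090", "crawler", "yacy.net"))
# prints: http://127.0.0.1:8090/yacysearch.html?query=crawler+site%3Ayacy.net
```

Pasting the printed URL into a browser is a quick way to see whether the modifier behaves as described on your instance.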

YaCy is a GIFT to the world of, IMO, untold value and importance, and it is vital that more people learn how to use it if we are to have a free internet. By free, I do not necessarily mean without cost; I mean “by the people, for the people.” But how can that be if “the people” give up because they can’t understand how to use it?

It’s fine to say, you shouldn’t be using a search engine without understanding it. Great. But some of the inner workings of YaCy are so esoteric, I’m not sure there is more than one person on the planet who really knows what all is inside.

Language is an issue. Some time back I spent hours meticulously trying to translate some of the available documentation from German to English, as that was mostly all I could find.

These aren’t really criticisms, but praise and appreciation. People love the IDEA of YaCy, I read that in review after review, but people just aren’t really able to use it due to the complexity and lack of understanding.

The developers are too busy to spend all their time explaining, over and over again, what may be obvious to them; they are busy on the cutting edge, developing something new. And most people, like myself, are too busy with the struggles of day-to-day life to figure it out by trial and error or in-depth research with little guidance.

People sincerely want “a deeper understanding”.

I have tried, and would be happy to write documentation, but I don’t understand half of what YaCy can do myself, and only wish I had the freedom to spend more time trying to figure it all out.

I suppose if I knew Java, it would all be transparent. Just look at the source code, but that isn’t really true either. How many other things have been incorporated into YaCy that require in depth study? A lot.

If someone competent enough with YaCy could work on documentation, I would be glad to contribute financially to such a project.

6 Likes

I totally agree. This is such a fascinating project, but it is too hard to understand and to make work. One of the reasons evil corporations succeed is that their services are easy to use, while one has to have a degree in computer science to use many of the alternatives (a bit exaggerated, but you get my point). Consequence: surveillance capitalism is winning, and the decay of the open net continues.

In my case, and for many others, it would be helpful to have simple instructions on:

  1. How to set up your own search engine for your website
    1.1. How to add the domain(s) you would like to be crawled
    1.2. How to exclude folders from site search
    1.3. How to automate crawling
    1.4. How to edit what I just entered (site search settings)

  2. Regular expressions: Please provide examples

I have now spent many hours trying to set this up. Some things work now, but many don’t. I tried to exclude folders from search and rerun the indexer, without success. Regular expressions remain a mystery even after some googling.

2 Likes

Running a search engine is not trivial, but YaCy is AFAIK the only package available that runs out of the box within 5 minutes and contains almost all the features you need.

I recommend setting up a “virgin” public P2P instance to play around with, and a second one (or more) as soon as you understand what you are doing.

1.1. http://127.0.0.1:8090/CrawlStartExpert.html - Either enter some domains into the textarea or (what I prefer) maintain lists of starting points as text files. For that reason I always have a subdir “starturls” where I keep the files.

1.2 ?? What do you mean? Search syntax like “not containing something”?

1.3. Menu “Crawler Monitor” -> http://127.0.0.1:8090//CrawlProfileEditor_p.html (upper right)
All crawls are saved and listed. The last 2 columns can be edited to schedule a crawl again.

1.4 Same table, left side. You cannot edit the record, but you can copy it into a new one.

  2. Useless for crawls. Blacklists use a weird, non-standard regex syntax. A good starting point to play with: https://regex101.com/

I stopped messing around with blacklists. Now I kick out spammers from time to time by simply deleting stuff from the index, or better: tell the crawls in more detail what you want, restrict them to dedicated domains, and crawl the levels separately.
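
Since regular-expression examples were requested above, here is a small, general sketch of a pattern that excludes certain folders of a site. It is plain Python, not YaCy itself: it assumes the pattern is compared against the whole URL, and whether a given YaCy filter field behaves the same way is an assumption to verify on your instance (YaCy runs on Java, whose regex dialect differs from Python's in some details). The domain and folder names are hypothetical.

```python
import re

# Hypothetical pattern: any URL under /private/ or /tmp/ on example.org.
# "https?" accepts http and https, "\." is a literal dot,
# "(private|tmp)" means either folder name, ".*" means anything after it.
EXCLUDE_PATTERN = re.compile(r"https?://example\.org/(private|tmp)/.*")


def is_excluded(url):
    """Return True when the whole URL matches the exclusion pattern."""
    return EXCLUDE_PATTERN.fullmatch(url) is not None


for url in (
    "https://example.org/private/notes.html",
    "https://example.org/docs/index.html",
    "https://example.org/blog/private/post.html",
):
    print(url, is_excluded(url))
```

Only the first URL is excluded. The third is not, because the pattern requires the folder to come directly after the domain; that kind of detail is easy to test on regex101 before entering a pattern anywhere.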

1 Like

Oh wow, I am so glad I found this thread. I thought it was just me who found YaCy too much to understand and so esoteric. Yes, it is fun figuring all this out, but also frustrating. I honestly believe I will contribute something to the YaCy project one day, even if it’s just a page in a wiki about what I discovered about how to make YaCy work my way.
It has made me realise that in order to make YaCy really work, it needs people to add to the project in a significant way. I am no genius programmer. I would say my talents in programming are mediocre at best, and even though the brain is willing, the clock is ticking. I’m not fortunate enough to be a genius researcher and not in a situation where I can help make YaCy better with programming. What I am thinking, though, is: could I just script a front-end search page that interacts with YaCy?
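
As a rough sketch of what the data-handling part of such a front end might look like: the function below pulls titles and links out of a JSON search response. The response shape (`channels`, `items`, `title`, `link`) is an assumption modelled on an RSS-like layout and should be checked against what your instance actually returns; `extract_results` and the sample payload are hypothetical, and no network request is made here.

```python
import json


def extract_results(payload):
    """Collect title/link pairs from a JSON search response.

    Assumed shape: {"channels": [{"items": [{"title": ..., "link": ...}]}]}
    Missing keys are tolerated so that a different shape yields an
    empty list instead of an exception.
    """
    data = json.loads(payload)
    results = []
    for channel in data.get("channels", []):
        for item in channel.get("items", []):
            results.append(
                {"title": item.get("title", ""), "link": item.get("link", "")}
            )
    return results


# Hand-written sample standing in for a real response.
SAMPLE = '{"channels": [{"items": [{"title": "YaCy", "link": "https://yacy.net/"}]}]}'

for result in extract_results(SAMPLE):
    print(result["title"], "->", result["link"])
```

A search page would then only need to fetch the response for the user's query and render the returned list.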

What I’m trying to say is this:
I think documentation would be too much work for those who develop YaCy and experiment with all the high-level stuff. Have you ever come home from 8 hours of work in an office and tried to write a book or learn the guitar to a world-class standard? It must be like that for a great programmer or computer science researcher going home and trying to write the most coherent documentation of their program ever. Their research isn’t the problem.
I think it’s up to users to seek the help of the creators and researchers, but to document for ourselves and share our own knowledge of YaCy with other YaCy users.
It’s why I love the DD-WRT project; they sort of have that going on at their web space.

2 Likes

I warmly join these requests!
A detailed manual is ABSOLUTELY NECESSARY!

I totally see where you are coming from and truly respect your viewpoint and see the truth of it, but I also have to disagree with a subtle aspect of it.

Many, many topics are the summation of years, even decades of learning - but that doesn’t imply there’s no way to sort of refine the field into smaller and simpler terms, for a beginner, as stepping stones, so they can learn more cumulatively over a long time. In fact, it’s very beneficial to have a simplified place to start so you can get a handhold to progress to the higher topics. If you go directly to the advanced level, there can be too many unknown concepts all at once for graceful development and progress.

I would actually like to try to explain YaCy in simpler terms, so I’m trying to do some research on those questions right now myself.

As a sum-up: we need documentation.
Possibly it could be crowd-written by experienced users, using forums etc. and reviewed or edited by the developers.
As @Orbiter wrote before, he’s going to abandon the wiki. (why?)

So what is the place where users are able to contribute? What is the most feasible way to do it?
By editing the docs in the FAQ using GitHub?

Is the structure of the docs section good? Shouldn’t it be included or linked in the installed instances as well? What are the instructions for contributing? How does one create new sections, etc.? Why are the other pages (operation section) not linked? Will the documentation have some index, or will it be just an FAQ with links?

What sort of documentation do we need the most? A quickstart guide? An in-depth description of the technology?

I was once an assistant teacher for the lecture “Information Retrieval”. We could start with a one-semester lecture, and after that we would still not even have touched the basics of YaCy, because we would have learned only the theoretical basics.

Do you have some recordings, slides, or papers? That is actually interesting!

Simplification of the environment. One more tool that does not need maintenance. The YaCy homepage is driven by https://www.mkdocs.org/ which is actually made for project documentation.

OK, but why has a bunch of user-contributed fixes not been merged for almost a year?

Yes, that’s true. I pulled some of these contributions now, but unfortunately CI/CD is broken because of a server fault. I will fix this soon.

Yes, here is the place where I publish almost all of my talks: Index of /material

The lecture paper is here (sorry, some parts are in German, some are in English): https://yacy.net/material/SIM-IR-SS15-MichaelChristen-Introduction_Information_Retrieval-20150512.pdf

A scientific paper about the YaCy network is also here: https://yacy.net/material/Description_of_the_YaCy_Distributed_Web_Search_Engine_Herrmann_Ning_Diaz_Preneel_ESAT_KULeuven_COSIC_article-2459.pdf

1 Like

Great, thanks, Orbiter, that’s really inspiring!

That is a thorough academic approach. But we are not setting out to train academics. We want to explain to the user how to work with the tool.

Personally, I would like to see a manual along the following lines.

  1. Intended audience.
    Literally a couple of lines: what knowledge the author expects of the reader. A text written for professors is just as hard to read as a guide written as if for idiots.

  2. Introduction.
    The operating principles and design philosophy of the system. Not a theoretical course on information systems in general, but specifically on YaCy and Solr: what the user absolutely must understand in order not to wander blindly when managing the system.
    An indispensable rule: the first use of each special term must be followed by a one-time explanation of what the term means and in what context it will be used in the rest of the manual. The explanation belongs right there in the text or in a footnote, not somewhere at the end in a glossary, where even a diligent reader will not always find it.

  3. Explanation of the YaCy administrative interface.
    For EVERY parameter there should be the following:

  • where the parameter is located in the menu (otherwise it is hard to find!)
  • the range of permitted values
  • what the parameter does
  • what the parameter affects (dependencies)
  • for what needs the authors added the parameter to the interface
  • how to set the parameter optimally, with explanation and examples
  • when it is better not to change the parameter
  • examples of incorrect settings and what they lead to.

For each of these points, as detailed an explanation as possible is expected.

Parameters that cannot be changed (indicators) must also be explained: what they show and what their various values mean.

  4. Files that store settings and data.
  • Settings files. Location, purpose, format. (in detail)
  • Data files. Location, purpose, which operating modes they belong to. (overview)
  • Other important files that are useful to know about.
  5. Specifics of the YaCy operating modes.
    How to configure YaCy optimally for use on an intranet, for searching the disk of a single computer, for indexing selected sites, for global indexing on the Internet, and so on. What the important nuances of each mode are.
    Settings for running on weak hardware, settings for a powerful node.
    Specifics of installation on a remote server, on a personal computer, etc.
    Installation and operation under different operating systems, and the nuances involved.
    Recommendations on indexing and administration in general.

  6. Exceptional situations.
    An analysis of special, emergency, and error situations that may arise, and what to do about them.

  7. How to search.
    A full description of the search parameters. Effective techniques for searching with YaCy.

  8. Examples.
    A selection of concrete recipes and practical recommendations for the best settings. Descriptions of some instructive cases that users have encountered.

  9. Appendices.
    An explanation of the regular expression language and its use, with examples.
    Reference information.
    Other useful related materials.

Knowing the size of the YaCy administrative interface, I understand that this is a request for a good book. But it is the absolute minimum of what a user needs. I have simply listed everything that I do not know about YaCy and would like to learn.

If such a manual is written, I will take on the work of translating it into Russian free of charge.

Hi,
I tried to start fixing the documentation. I wrote the contribution info (basically for myself, to know how to contribute), tried to reorganise the FAQ in a more logical way, and added some questions from the forum as well.

Unfortunately, my pull request still hangs there, as do many typo fixes from others and a restructured download and installation section by global667.
Some of them are more than a year old, and although the developers are calling for volunteers to help with the docs, PRs are never merged, which is discouraging and demotivating.

What should be done with the docs now? What is necessary to do and what would be helpful from volunteers? Or should we start the work on some fork or what?

Oh wow, I’m sorry for not reviewing this. Your pull request is very helpful!
I pulled it now!

1 Like

Thank you!

What should be done with the docs now? What is necessary to do and what would be helpful from volunteers?

Hi there okybaca,
is there anything I can do to support your work on the documentation?

1 Like

Hi T!

Glad to hear!

I just started to improve the docs because I’m not a Java developer, and this is the only way I can help here. Fortunately, the PRs are merged quite fast now, thanks, @Orbiter!

I think you can start anywhere you wish:

  • write about a specific thing you have explored and understand
  • go through the docs and find bugs, typos or missing links
  • go through the forum and the legacy wiki and put already solved questions into the FAQ
  • if you understand German, translate from the German wiki, which is probably richer in information

Yesterday, I started a ‘Getting started’ section of the FAQ, trying to mimic the perspective of somebody who has just started with YaCy and to provide all the information a beginner would need. It’s really elementary for now, and could be expanded.

Just fork the docs repository, improve it, and push the changes. I tried to write a contribution guide, primarily for myself, as I’m a GitHub newbie; it may be of some help (and could be improved eventually).

Thanks for contributing!

And a question for all in the forum: what do we miss the most in the docs? What was the most difficult thing to understand, and what did you find out?

I struggled with the same thing. Would you share what you found out about search parameters?