Errors list of YaCy

Sviatoslav · 29 March 2023 05:08

Исполнился год с того момента, когда я приступил к запуску узла YaCy.
Вот список ошибок, или того, что я принял за ошибки в силу своего незнания или неумения правильно настроить.
Но мне простительно, поскольку никакой подробной информации по YaCy не существует, обучиться негде.
Возможно, что в новых версиях, вышедших за это время, некоторые ошибки уже исправлены, но я не имел сил и возможностей успеть проверить новые версии.
Всё ниже описанное относится к ver. 1.924/10079 работающей под Windows XP, 1GHz, 1Gb.
Для YaCy было выделено 870Mb RAM.
Большинство настроек оставлены по умолчанию.

Самая худшая трудность заключалась в быстром переполнении памяти. Мне так и не удалось решить эту проблему путем настройки YaCy. Пришлось написать собственную утилиту, сбрасывающую память когда это требуется. (Об этом я уже давал тему, которая никого не заинтересовала).

Итак, наблюдал следующие ошибки и глюки:

При достижении максимума памяти останавливается индексатор (а не только DHT).
По случаю исчерпания памяти: после очистки индекса, освобождения памяти и перезапуска ядра Solr, интерфейс все равно продолжал тормозить очень сильно - до полного перезапуска YaCy.
После исчерпания памяти или лимита дискового пространства, YaCy не могла перезагрузиться - Не стартует, пока файлы индекса не будут удалены в ручную.
Явление наблюдалось при излишне большом индексе. Опыт показал, что при выделенных 870Mb RAM, YaCu не может обрабатывать существенно больше 10Gb данных (хотя диск вмещает 20Gb). Поэтому дисковая квота для YaCy была сокращена до 8Gb.
Выдано сообщение “Свободной памяти меньше, чем 27 MB. DHT-in отключен. Пожалуйста, исправьте. Потребуется перезапуск YaCy.”
На самом деле остановлен краулер, а DHT продолжается!
Состояние исчерпанности памяти не отображено в /api/status_p.xml
Почему при переполнении памяти пользователь должен нажимать появляющуюся кнопку “Сбросить состояние”? Программа самостоятельно не может нажать эту кнопку, когда она появилась?
При периодическом сбросе переполнений памяти YaCy вошла в следующее состояние: интерфейс доступен, краулеры остановлены, команда очистки индекса не исполняется, кривая графика использования памяти везде равна точному нулю, но в списке общий расход памяти отображается правдоподобно.
Состояние сохранялось до перезапуска.
Нужен параметр “максимальный размер очереди” или автоматическое ограничение очереди при приближении расхода памяти к предельному.
Независимо от того, сколько памяти оставлено в запас, на некоторых сайтах всё равно происходит переполнение памяти и интерфейс зависает.
Перегрузка кэша. Установлен размер кэша слов 9К, и в целом он балансирует в пределах 10К, но появляются отдельные пики, доходящие до 70К. Впрочем, каких-либо сбоев это не вызывает.
Несовпадение масштабов графиков Network History.
“Count of all Active Peers Per Day” в среднем в районе 390, а “Count of all Active Peers Per Week” за этот день - в районе 630. А на графике “Count of all Active Peers Per Month” этот же день приходится около 1400.
Форма кривой на недельном и месячном графике подобна: с учетом нужного растяжения по горизонтали.
А на годовом - ход кривой не совпадает с месячным, ход кривых в принципе различен.
Неправильные даты столбчатой диаграммы внизу страницы Index Browser. На странице результатов поиска даты диаграммы тоже неправильные.
Ошибка с отрисовкой графов. Рисунок не появляется. Удаление из индекса не работает. Ошибка наблюдалась после авторегуляции при заполнении дисковой квоты, или при большом индексе.
Ошибка сбрасывается после перезагрузки.
“Удаление ошибок загрузки” периодически не работает.
Удаление индекса в этом состоянии вообще не работает, требуется перезагрузка или перезапуск ядра Solr.
Ошибка не происходит, если не установлен флажок Авторегулирования.
Index Browser не показывает список хостов. Состояние возникает (предположительно) во время исчерпания памяти или дисковой квоты и продолжается после освобождения памяти, до перезагрузки.
Во время этого сбоя диаграмма в Application Status также не отображает изменения размера индекса, нарушается также работа Планировщика с индексом.
Похоже, некорректно работает авторегуляция.
Причем сбой происходит не сразу, сначала авторегуляция несколько раз срабатывает нормально.
В индексе 708 документов. Занимают на диске 4.42Гб (по индикации в Status.html) после оптимизации базы и полного рестарта.
Ссылки показываются в Просмотре индекса, но режимы на странице IndexDeletion_p.html не находят документов для удаления (0 документов), Работает только Удаление по возрасту. После удаления и оптимизации, занятое на диске место заметно не уменьшилось.
Состояние произошло предположительно в результате использования Авторегуляции.
Затем было удалено в ручную около 4 тыс файлов в каталоге DATA/INDEX/freeworld/SEGMENTS/default
При запуске краулера через “Расширенную индикацию” не работает авторегуляция использования диска. Вместо освобождения дискового пространства краулер просто отключается. (Не paused, а останавливается совсем).
Суммарное впечатление: функция авторегуляции неработоспособна, ее использование ломает базу индекса.
IndexControlURLs_p.html, Statistics about the top-100 domains in the database:
“delete all” - адрес убирается из списка только после второго нажатия.
По истечении 14 дней индекс становится неактивен, поиск по нему ничего не находит, но эта потеря активности никак не отражена, никаких сообщений об этом не выдается.
Более того: пока попытка поиска не произведена, продолжаются автоподсказки при наборе в строке поиска, хотя индекс уже не активен.
Глюки с удалением индекса.
Открываю “Просмотр индекса”. Копирую в буфер обмена одно из имен присутствующих хостов. Перехожу в Удаление индекса по совпадению. Вставляю из буфера взятое имя хоста - совпадения не найдено, 0 документов.
Удаление документов из коллекции user через Планировщик, приводит к нарушению базы индекса и исчерпанию памяти. Индексируемые после этого документы хотя и показываются, но не обнаруживаются для deletion regex .*
Эти последствия можно ликвидировать только очисткой базы (Cleanup).
Удаление индекса не освобождает дисковое место. “Удалением по возрасту” было удалено 80 тыс документов, затем была сделана оптимизация базы. Занятое на диске пространство не уменьшилось в течение 10 часов.
Обнаружено, что место занимают файлы в каталоге \DATA\INDEX\freeworld\SEGMENTS\default и \DATA\HTCACHE\file.array
Эти файлы не удаляются программно.
Нарушение отображения графика размера индекса. Связано с переполнением пула соединений. Исчезало также отображение графика количества узлов сети. Однако связь с самими узлами оставалась и они отображались на карте сети и в окне Монитора производительности.
Состояние не исправляется самопроизвольно.
Состояние не исправляется перезапуском ядра Solr.
Кнопки перезапуска и выключения не работают.
Узел был перезапущен вызовом непосредственно Steering.html, сбой исправился.
В этом состоянии краулер не в паузе, но имеет скорость ноль. На графике не отображается работа, как в паузе. Форма кривой памяти показывает, что индексирование не совершается, но оно запустилось фактом удаленного открытия административного интерфейса.
Этот сбой мог предположительно иметь причиной повторный запуск через планировщик индексации по списку адресов, когда еще не завершилась уже запущенная индексация по этому заданию.
По шаблону .* и оптимизации после этого, индекс удаляется весь, но дисковое пространство не освобождается. Возобновление индексации очень быстро снова приводит к остановке по исчерпанию дисковой квоты. Размер нового собранного индекса при этом незначителен.
Другие способы удаления (записываемые в Планировщик) тоже не освобождают дисковое пространство.
Как выяснилось, это место занимают очереди краулера в каталоге YaCy/DATA/INDEX/freeworld/QUEUES/CrawlerCoreStacks
Crawler_p.html сообщения об остановке краулера продолжают висеть, когда они уже не актуальны. (Очень мешает!)
Не ясно, как работает краулер. То он может загрузить 200 тыс страниц с форума, на котором только 40тыс, то может не проиндексировав до конца, прекратить работу и сбросить все очереди в ноль.
Автокраулер включен, но его работа не проявилась ни разу.
“мгновенная неглубокая индексация” работает только, если найдены какие-то результаты из других узлов.
Если запрошенный host отсутствует в индексе везде, и результатов 0, то его немедленная индексация не происходит.
В Расширенном индексировании применение любого фильтра с опцией “запретить запуск домена” выдает ошибку “фильтр индексирования “(smb|ftp|https?)://(www.)?)” не совпадает с корень индексирования “-UNRESOLVED_PATTERN-”.”
(При этом “Запретить часть пути” с регулярным выражением - работает.)
Документы, помещенные в htroot\www не доступны (в том числе через YaCy-прокси) по www.[peername].yacy , они доступны по www.[peername].yacy/www/, [peername].yacy/www/ и [ip.adress]/www/
При доступе через YaCy-прокси: если в адресе конечное www не завершается слэшем, это вызывает ошибку (хотя в каталоге www присутствует Index.html ).
API Network.xml?page=5 не работает, заполнено сообщением -UNRESOLVED_PATTERN-
выдается ошибка на просмотр get_bookmarks.xml
Не отображается параметр “загрузка процессора” (Всё время -1).
Отмечено самопроизвольное закрытие YaCu. Причины неизвестны.
Проверка регулярного выражения застревает. По-видимому некоторые проверяемые комбинации способны вызывать внутреннюю ошибку.
Некоторые выражения черного списка приводят к вылетанию в ошибку страницы Blacklist_p.html
Страница “Управление черным списком”: Около поля выбора операции (внизу списка) поле выбора файла черного списка не активно и при двух файлах показывает не тот файл, который редактируется.
Самопроизвольно сбрасываются флажки эвристики “Оператор ‘site’: мгновенная неглубокая индексация” и “Загрузка результатов внешнего поиска из списка активных систем”. Моменты и причины сбрасывания не выяснены. Наблюдалось неоднократно, никакие другие настройки при этом повреждены не были.
Действия по установке этих флажков хотя и записываются в Планировщик, но при попытке их выполнить в Планировщике - не выполняются, status 404:
404 http://localhost:8090/ConfigHeuristics.html?site_on=&apicall_pk=000000000249
404 http://localhost:8090/ConfigHeuristics.html?opensearch_on=&apicall_pk=000000000250
(уважаемый автор забыл, что добавил _p в имени файла)
Планировщик. Table_API_p.html Не удалялись записи все сразу с установкой галочки “все”.
В некоторые моменты не удается удалить выбранное действие. Через некоторое время удаление опять становится возможным.
При запуске индексирования одновременно нескольких сайтов, в поле “Comment” Планировщика указан только один адрес, что приводит к недопониманию. Надо или указывать все, или указывать: “группа”.
Действие, запланированное на выполнение ежедневно, выполняется каждый раз сразу четырехкратно (показание счетчика запусков). Дата последнего запуска при этом не обновляется.
По неизвестной причине не удалялось действие в планировщике. Возвращает пустую страницу; при повторном обращении показывает не удаленное действие. Интерфейс подтормаживает.
Сбой исчез после рестарта.
Функционирование планировщика непонятно. Если задано повторение, то “нет события”, а если задать триггер события, то устанавливается “не повторять” в недоступном поле.
Недостаточно событий планировщика. Не записывается “Очистка индекса”. Нужны события по переполнению памяти или дисковой квоты.
Нужно, чтобы записывались также действия: ручной сброс переполнения памяти, очистка очередей, удаление ошибок загрузки из индекса.
Так как большинство ошибочных состояний выправляются только полным перезапуском YaCy, требуется возможность задания рестарта по любому из событий.
В целом управление Планировщиком оформлено нелогично: большинство изменений в таблице применяются немедленно, а редактирование даты почему-то с подтверждением.
По-нормальному надо было бы применять с сохранением все изменения в странице целиком, кнопкой “Сохранить”.
при использовании транслятора (Translator_p.html) невозможно просмотреть (view it) страницы yacysearchitem.html и yacysearchtrailer.html
На странице результатов поиска невозможно перевести через Translation Editor заголовки блоков слева: Location Provider, Wiki Name Space, Language, Authors, а также выпадающие подсказки к элементам. Текст информации об RSS доступен для перевода не весь (неправильное положение английского текста в теге).
Не переводятся сообщения об ограничении поиска, и др.
При клике по полю поиска на странице результатов, надпись на кнопке сменяется на непереведенную.
Вообще, недоперевод и глюки со страницей результатов поиска особенно постыдны, так как это - лицо узла, эту страницу видят посетители. Это должно быть исправлено и отлажено в самую первую очередь.
Неоднозначное поведение страницы результатов поиска. То на ней есть переключатель P2P/Stells, то этого переключателя нет.
Неясно, почему иногда ищет только в локальном индексе.
Эффект фильтра по странам практически незаметен.
Поиск через активные open-search системы не происходит, хотя флажки установлены.
Для того, чтоб удалить open-search систему, требуется заполнить поле URL, абсурдное требование.
При большом количестве результатов не показывается список страниц:
“1-10 из 2 763 ; (2 635 локально, 128 удалённо из 16 узлов YaCy)”.
Сбои при поиске изображений. Нет показа результатов, в т.ч. сообщение “10-10 из 0”.
Наблюдалось, если произведен первый клик по первому недоступному превью (thumbnails), следующие страницы перестают показывать результаты.
Сбой сложно воспроизводимый. Предположительно это связано с исчерпанием памяти в процессе такого поиска.
Режим: расширенный, поиск из других узлов.
Во время поиска, пока результаты еще не получены, на странице результатов не должно быть надписи “0 результатов из 0 узлов”, потому что она воспринимается как неудачный поиск и побуждает посетителя не ждать дальше, а закрыть страницу.
Вместо той должна быть надпись “происходит поиск…”.
Точный поиск фразы в кавычках дает несоответствующие результаты, совпадающие частично. В Гугле и Яндексе уже ЗАМУЧИЛО такое!!! ИСПРАВЬТЕ!!! ОЧЕНЬ ПРОШУ!!!
Не работает точный поиск. Например: я задал поиск фразы:
“Время было но его больше не будет”. На что получаю сообщение:
“Следующие слова являются стоп-словами и были исключены из поиска: [больше, будет, было, его, не, но].” - то есть, вместо искомой цельной фразы осталось только одно слово “Время”. Это неправильно.

Из суммы выше наблюдавшегося неизбежен вывод, что в этих условиях YaCy 1.924/10079 в принципе не способна к полноценной длительной автономной работе. Такой вывод подтверждается и взглядом на таблицу активных узлов: время непрерывной работы многих из них составляет единицы дней только.

Anyday · 2 April 2023 01:43

Hey, recently started tinkering with it myself, so hopefully I can help!
I see that you’re only giving your instance 870MB of RAM.

I’ve noticed in my tests that YaCy becomes a bit unstable with less than 3GB. The crawler can use around 2GB of RAM, but it can go as high as ~6GB. This seems to be related to any media that may be loaded from a page during a crawl. Large PDF documents seem to be the worst.

Also, with 1GB of space, I think the most URLs you could hold is about 50,000 - 100,000. It tends to get more efficient once you have about 2,000,000, but it will take around 45GB with 3-4GB of RAM to hold that.
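
To put those estimates side by side, here is a small arithmetic sketch in Python. It only divides the disk sizes quoted above by the URL counts quoted above; the function name is made up for illustration, 1GB is assumed to be 10^9 bytes, and since the inputs are rough guesses the results are order-of-magnitude at best.

```python
def bytes_per_url(disk_gb, url_count):
    """Average disk bytes per indexed URL, assuming 1 GB = 10**9 bytes."""
    return disk_gb * 1_000_000_000 / url_count


# Figures quoted in the post above (rough estimates, not measurements).
small_low = bytes_per_url(1, 100_000)    # 1GB holding 100,000 URLs
small_high = bytes_per_url(1, 50_000)    # 1GB holding 50,000 URLs
large = bytes_per_url(45, 2_000_000)     # 45GB holding 2,000,000 URLs

print(f"1GB index:  {small_low:.0f} to {small_high:.0f} bytes per URL")
print(f"45GB index: {large:.0f} bytes per URL")
```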

Sviatoslav · 2 April 2023 22:06

Yes, you are right about everything, but the peer is on a remote virtual server with 1GB of RAM.
The hosting provider will not give me more RAM, so I have to make do with it.
I have to leave at least 128MB for the OS itself, so 870MB remains for YaCy and no more.
To make crawling lighter, I had to give up indexing media; I mainly index text.
You also estimated the size of the index correctly: about 100 thousand.

Nevertheless, there are still a lot of errors that are not related to the amount of memory, and unfortunately they are not fixed.

(This post is Google Translated.)

roamn · 5 April 2023 14:10

Try wattOS, 32 or 64 bit.
I have a basic netbook running it.
Don’t do the upgrade…
Try a spare USB stick first, with the HDD unplugged.
Caution: it is Linux…

Sviatoslav · 6 April 2023 04:59

Unfortunately, there is no point in my setting up a node on my own computer by any method,
because my provider connects me through NAT (a so-called “gray” IP).
Such a node would not be reachable from the network; I have already tried it.

okybaca · 7 April 2023 08:12

Hi, since it looks really elaborate, would you mind translating your post to English, e.g. using Google Translate? Otherwise everyone who wants to read your post has to translate it themselves, and if you do it, you may even catch some mistakes in the automatic translation… Thanks!

okybaca · 7 April 2023 08:16

Recently I bought the maximum RAM for my YaCy server, and now, at some 20,000,000 pages, it’s exhausted again (8.5GB of RAM reserved for YaCy; Solr is a separate process). So it seems that as the index size increases, the amount of RAM used grows as well.

Sviatoslav · 8 April 2023 06:54

I’m not sure it will work out well. Previously, the forum had its own translation service, but now it does not exist… it probably didn’t live up to expectations.

Sviatoslav:

It’s been a year since I started running a YaCy node.
Here is a list of errors, or of what I took for errors because of my ignorance or inability to configure things correctly.
But I can be forgiven, since there is no detailed information on YaCy and nowhere to learn it.
It is possible that some of these errors have already been fixed in the new versions released during this time, but I did not have the time or energy to check them.
Everything described below applies to ver. 1.924/10079 running under Windows XP, 1GHz, 1Gb.
870Mb of RAM was allocated to YaCy.
Most settings were left at their defaults.

The worst difficulty was that memory filled up quickly. I was never able to solve this problem by configuring YaCy. I had to write my own utility that resets memory when required. (I already posted a topic about this, which interested no one.)

So, I observed the following errors and glitches:

When the maximum memory is reached, the indexer stops (not just DHT).

On memory exhaustion: after clearing the index, freeing memory, and restarting the Solr core, the interface still stayed very slow until YaCy was completely restarted.

After running out of memory or hitting the disk space limit, YaCy could not restart: it does not start until the index files are deleted manually.
This was observed with an excessively large index. Experience has shown that with 870Mb of RAM allocated, YaCy cannot handle significantly more than 10Gb of data (although the disk can hold 20Gb). Therefore, the disk quota for YaCy was reduced to 8Gb.

The message “Free memory is less than 27 MB. DHT-in is disabled. Please correct it. You will need to restart YaCy.” was displayed.
In fact, the crawler is stopped, and DHT continues!

The memory depletion status is not displayed in /api/status_p.xml

Why should the user have to click the “Reset State” button that appears when memory is full? Can’t the program click this button itself when it appears?

While memory overflows were being reset periodically, YaCy entered the following state: the interface is available, the crawlers are stopped, the index cleanup command is not executed, and the memory usage graph is exactly zero everywhere, although the total memory consumption shown in the list looks plausible.
The state persisted until the restart.

A “maximum queue size” parameter is needed, or an automatic queue limit that applies when memory consumption approaches the limit.
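
To illustrate what is being asked for, here is a minimal Python sketch of a queue that refuses new URLs either when it is full or when memory use comes too close to the limit. It is not YaCy code and says nothing about how YaCy's crawler queues are implemented: the class, its parameters, and the `used_memory_mb` callback are all hypothetical. The 870 and 27 in the example are simply the allocation and the warning threshold mentioned in this post.

```python
from collections import deque


class MemoryAwareQueue:
    """Hypothetical crawl queue with a size cap and a memory guard."""

    def __init__(self, max_size, memory_limit_mb, reserve_mb, used_memory_mb):
        self.max_size = max_size                # "maximum queue size" parameter
        self.memory_limit_mb = memory_limit_mb  # memory allocated to the process
        self.reserve_mb = reserve_mb            # headroom that must stay free
        self.used_memory_mb = used_memory_mb    # callable returning current usage
        self._items = deque()

    def offer(self, url):
        """Add a URL; return False instead of growing past either limit."""
        if len(self._items) >= self.max_size:
            return False
        if self.used_memory_mb() > self.memory_limit_mb - self.reserve_mb:
            return False
        self._items.append(url)
        return True

    def poll(self):
        """Take the oldest URL, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


# Example: 870 MB allocated, keep 27 MB free, at most 2 queued URLs.
usage = [100]  # stand-in for a real memory probe
queue = MemoryAwareQueue(2, 870, 27, lambda: usage[0])
print(queue.offer("http://example.org/a"))  # accepted
print(queue.offer("http://example.org/b"))  # accepted
print(queue.offer("http://example.org/c"))  # rejected: queue is full
```

With a guard like this, the crawler slows down on its own before the limit is reached instead of waiting for a manual reset.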

No matter how much memory is left in reserve, some sites still run out of memory and the interface freezes.

Cache overload. The word cache size is set to 9K, and in general it balances within 10K, but there are individual peaks reaching up to 70K. However, this does not cause any failures.

The scales of the Network History graphs do not match.
“Count of all Active Peers Per Day” averages around 390, while “Count of all Active Peers Per Week” for the same day is around 630. And on the “Count of all Active Peers Per Month” chart, the same day is at about 1400.
The shape of the curve on the weekly and monthly charts is similar, allowing for the necessary horizontal stretching.
On the yearly chart, however, the curve does not follow the monthly one; the two curves are fundamentally different.

Incorrect dates in the bar chart at the bottom of the Index Browser page. On the search results page, the chart dates are also incorrect.

Error with drawing graphs. The image does not appear. Deleting from the index does not work. The error was observed after autoregulation kicked in when the disk quota filled up, or when the index was large.
The error is cleared by a restart.
“Deleting load errors” periodically does not work.
Deleting the index does not work at all in this state; a restart of YaCy or of the Solr core is required.
The error does not occur if the Autoregulation checkbox is not set.

Index Browser does not show the list of hosts. The state arises (presumably) when memory or the disk quota is exhausted and continues after memory is freed, until a restart.
During this failure, the chart in Application Status also does not show changes in the index size, and the Scheduler’s work with the index is disrupted as well.
It looks like autoregulation is not working correctly.
Moreover, the failure does not occur immediately: at first, autoregulation works normally several times.

The index contains 708 documents. They occupy 4.42 GB on disk (as shown in Status.html) after optimizing the database and a full restart.
Links are shown in the Index Browser, but the modes on the IndexDeletion_p.html page find no documents to delete (0 documents). Only deletion by age works. After deletion and optimization, the disk space used did not noticeably decrease.
The state presumably arose as a result of using Autoregulation.
Then about 4 thousand files were manually deleted from the DATA/INDEX/freeworld/SEGMENTS/default directory.

When the crawler is started through the “Advanced Display”, autoregulation of disk usage does not work. Instead of freeing disk space, the crawler simply shuts down. (It is not paused; it stops altogether.)
Overall impression: the autoregulation function is unusable, and using it breaks the index database.

IndexControlURLs_p.html, Statistics about the top-100 domains in the database:
“delete all” - the address is removed from the list only after the second click.

After 14 days, the index becomes inactive and searching it finds nothing, but this loss of activity is not shown anywhere and no message about it is issued.
Moreover, until a search is actually attempted, autocomplete suggestions keep appearing while typing in the search box, even though the index is no longer active.

Glitches with deleting the index.
I open the Index Browser. I copy the name of one of the listed hosts to the clipboard. I go to index deletion by match. I paste the host name from the clipboard: no matches found, 0 documents.
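
One possible explanation, which is only an assumption and not something confirmed in this thread, is that the match is applied to the whole document URL rather than to the host name alone. Under that assumption a bare host name matches nothing, while a pattern covering the full URL does. The Python sketch below shows the difference using the standard `re` module; the URL and the helper name are made up, and YaCy's own matching rules may differ.

```python
import re


def matches_whole_url(pattern, url):
    """True only if the pattern matches the entire URL, not just a part of it."""
    return re.fullmatch(pattern, url) is not None


url = "http://example.org/forum/index.html"  # made-up document URL

print(matches_whole_url("example.org", url))                # bare host name
print(matches_whole_url(r".*example\.org.*", url))          # host wrapped in .*
print(matches_whole_url(r"https?://example\.org/.*", url))  # full URL pattern
```

If the deletion page behaves this way, wrapping the copied host name in `.*` on both sides would be the thing to try.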

Deleting documents from the user collection via the Scheduler corrupts the index database and exhausts memory. Documents indexed after that are shown, but they are not found by the deletion regex .*
These effects can only be removed by clearing the database (Cleanup).

Deleting the index does not free disk space. “Deletion by age” removed 80 thousand documents, then the database was optimized. The disk space used did not decrease over the next 10 hours.
The space turned out to be taken by files in the \DATA\INDEX\freeworld\SEGMENTS\default and \DATA\HTCACHE\file.array directories.
These files are not deleted by the program.

The index size graph is not displayed correctly. This is related to an overflow of the connection pool. The graph of the number of network nodes also disappeared. However, the connection to the nodes themselves remained, and they were shown on the network map and in the Performance Monitor window.
The state does not correct itself.
The state is not corrected by restarting the Solr core.
The restart and shutdown buttons do not work.
The node was restarted by calling Steering.html directly, and the failure went away.
In this state, the crawler is not paused, but its speed is zero. The graph shows no activity, as if it were paused. The shape of the memory curve shows that no indexing is taking place, yet indexing did start as soon as the administrative interface was opened remotely.
This failure was presumably caused by the Scheduler launching indexing of an address list again while the previous run of the same task had not yet finished.

Deleting by the pattern .* and optimizing afterwards removes the entire index, but disk space is not freed. Resuming indexing very quickly leads to another stop because the disk quota is exhausted, even though the newly built index is insignificant in size.
Other deletion methods (those recorded in the Scheduler) do not free disk space either.
As it turned out, this space is taken by the crawler queues in the YaCy/DATA/INDEX/freeworld/QUEUES/CrawlerCoreStacks directory.

On Crawler_p.html, messages about the crawler being stopped stay on screen after they are no longer relevant. (Very annoying!)

It’s not clear how the crawler works. Sometimes it loads 200 thousand pages from a forum that has only 40 thousand; sometimes it stops before indexing is finished and resets all queues to zero.

The Autocrawler is turned on, but it has never shown any sign of working.

“Instant shallow indexing” works only if some results are found on other nodes.
If the requested host is absent from every index and there are 0 results, it is not indexed immediately.

In Advanced indexing, applying any filter with the “prevent domain launch” option returns the error “indexing filter “(smb|ftp|https?)://(www.)?)” does not match the indexing root “-UNRESOLVED_PATTERN-”.”
(At the same time, “Prohibit part of the path” with a regular expression works.)
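
As a side note, the filter string quoted in that error message has one closing parenthesis too many. Whether YaCy actually builds it that way or the message merely truncates it is not known here, so treat the following as a check of the quoted text only. The Python sketch uses the standard `re` module; the `balanced` pattern is a guess at what was intended, not something taken from YaCy.

```python
import re


def check_pattern(pattern):
    """Return None if the pattern compiles, otherwise the error text."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


quoted = "(smb|ftp|https?)://(www.)?)"      # exactly as shown in the error message
balanced = r"(smb|ftp|https?)://(www\.)?"   # assumed intent: extra ")" removed

print("quoted:  ", check_pattern(quoted))    # an error text (unbalanced parenthesis)
print("balanced:", check_pattern(balanced))  # None, i.e. it compiles
```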

Documents placed in htroot\www are not accessible (including via the YaCy proxy) at www.[peername].yacy; they are accessible at www.[peername].yacy/www/, [peername].yacy/www/, and [ip.adress]/www/
When accessed via the YaCy proxy: if the trailing www in the address does not end with a slash, this causes an error (although Index.html is present in the www directory).

API Network.xml?page=5 doesn’t work; it is filled with the message -UNRESOLVED_PATTERN-

Viewing get_bookmarks.xml returns an error.

The “CPU usage” parameter is not displayed (it is always -1).

YaCy was seen to shut down spontaneously. The reasons are unknown.

Checking the regular expression gets stuck. Apparently, some tested combinations can cause an internal error.

Some blacklist expressions make the Blacklist_p.html page fail with an error.

Blacklist Management page: next to the operation selection field (at the bottom of the list), the blacklist file selection field is inactive and, when there are two files, shows a file other than the one being edited.

The heuristics checkboxes “‘site’ operator: instant shallow indexing” and “Loading external search results from the list of active systems” get cleared spontaneously. When and why they are reset has not been established. This was observed repeatedly, and no other settings were damaged.

Setting these checkboxes is recorded in the Scheduler, but when you try to run the recorded actions from the Scheduler, they are not executed, status 404:
404 http://localhost:8090/ConfigHeuristics.html?site_on=&apicall_pk=000000000249
404 http://localhost:8090/ConfigHeuristics.html?opensearch_on=&apicall_pk=000000000250
(the esteemed author forgot that he had added _p to the file name)
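
If the remark about the missing _p is right, the recorded URLs differ from working ones only in the page name. The Python sketch below shows that rewrite using the standard `urllib.parse` module. The helper name is made up, and the assumption that ConfigHeuristics_p.html is the correct target comes from the remark above, not from anything verified here.

```python
from urllib.parse import urlsplit, urlunsplit


def add_protected_suffix(url):
    """Rewrite .../Page.html to .../Page_p.html, leaving the query string intact."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html") and not path.endswith("_p.html"):
        path = path[: -len(".html")] + "_p.html"
    return urlunsplit(parts._replace(path=path))


recorded = "http://localhost:8090/ConfigHeuristics.html?site_on=&apicall_pk=000000000249"
print(add_protected_suffix(recorded))
```

A URL that already ends in _p.html is returned unchanged, so applying the helper twice is harmless.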

Scheduler, Table_API_p.html: entries could not all be deleted at once by ticking the “all” checkbox.
At some moments a selected action cannot be deleted. After some time, deletion becomes possible again.

When indexing of several sites is started at the same time, the Scheduler’s “Comment” field shows only one address, which is misleading. It should either list all of them or say “group”.

An action scheduled to run daily is executed four times in a row each time (according to the run counter). The last run date is not updated when this happens.

For some unknown reason, an action in the Scheduler could not be deleted. The request returns an empty page; on the next visit the action is still there. The interface is sluggish.
The failure disappeared after a restart.

How the Scheduler works is unclear. If a repetition is set, the event becomes “no event”, and if an event trigger is set, “do not repeat” is set in a field that cannot be edited.
There are not enough Scheduler events. “Index cleanup” is not recorded. Events for memory overflow and disk quota overflow are needed.
The following actions should be recorded as well: manually resetting a memory overflow, clearing the queues, and deleting load errors from the index.
Since most error states are corrected only by a full restart of YaCy, it should be possible to schedule a restart on any of these events.
In general, the Scheduler’s controls are illogical: most changes in the table are applied immediately, but editing the date for some reason requires confirmation.
The normal way would be to apply all changes on the page together, with a single “Save” button.

When using the translator (Translator_p.html), it is impossible to view (“view it”) the pages yacysearchitem.html and yacysearchtrailer.html

On the search results page, the Translation Editor cannot be used to translate the block headers on the left (Location Provider, Wiki Name Space, Language, Authors) or the tooltips for elements. Not all of the RSS information text is available for translation (the English text is positioned incorrectly in the tag).
Messages about search restrictions, among others, are not translated.

When you click in the search field on the results page, the label on the button changes to an untranslated one.
In general, the incomplete translation and the glitches on the search results page are especially embarrassing, since this page is the face of the node and it is what visitors see. It should be fixed and debugged first of all.

Inconsistent behavior of the search results page. Sometimes it has a P2P/Stealth switch, and sometimes the switch is missing.
It’s not clear why it sometimes searches only the local index.

The country filter effect is almost invisible.

Searching through the active open-search systems does not happen, although the checkboxes are set.

To delete an open-search system, you have to fill in the URL field, which is an absurd requirement.

When there are a large number of results, the list of pages is not shown:
“1-10 out of 2,763; (2,635 locally, 128 remotely from 16 YaCy nodes)”.

Failures in image search. No results are shown, and the message “10-10 out of 0” may appear.
This was observed when the first click was made on the first unavailable preview (thumbnail): the following pages then stop showing results.
The failure is hard to reproduce. Presumably, it is related to memory being exhausted during such a search.
Mode: advanced, search from other nodes.

During a search, while results have not yet arrived, the results page should not show the message “0 results from 0 nodes”, because it reads as a failed search and prompts the visitor to close the page instead of waiting.
It should show “searching…” instead.

An exact search for a phrase in quotation marks returns irrelevant results that match only partially. I am already FED UP with this in Google and Yandex!!! FIX IT!!! PLEASE!!!

Exact search doesn’t work. For example, I searched for the phrase:
“Время было но его больше не будет” (“There was time, but there will be no more of it”). In response I get the message:
“The following words are stop words and were excluded from the search: [больше, будет, было, его, не, но] (more, will be, was, its, not, but).” That is, instead of the whole phrase I was looking for, only one word is left: “Время” (“Time”). This is wrong.

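A minimal sketch of why the phrase collapses to a single word, and of the behavior I would expect instead. It is written in Python for illustration and is not taken from YaCy: both function names are hypothetical, and the stop-word list is simply the one from the message quoted above.

```python
# Stop words exactly as listed in the message quoted above.
STOP_WORDS = {"больше", "будет", "было", "его", "не", "но"}


def strip_stop_words(query: str) -> list:
    """Hypothetical helper: drop stop words from a query, phrase or not."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def parse_query(query: str) -> list:
    """Hypothetical alternative: keep a quoted phrase as one search term."""
    q = query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return [q[1:-1].lower()]      # exact phrase: stop words are kept
    return strip_stop_words(q)        # ordinary query: stop words removed


phrase = "Время было но его больше не будет"
print(strip_stop_words(phrase))       # only one word survives
print(parse_query('"' + phrase + '"'))  # the whole phrase survives
```

With the first function the query is reduced to `время` alone, which is the behavior reported here; the second keeps the quoted phrase intact.
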
Taken together, the observations above lead inevitably to the conclusion that under these conditions YaCy 1.924/10079 is in principle not capable of full-fledged long-term autonomous operation. This conclusion is also supported by the table of active nodes: many of them have a continuous uptime of only a few days.

Sviatoslav · 8 April 2023 07:04

It looks like it is. For this reason, I cannot make full use of a large disk when memory is small.

okybaca · 8 April 2023 07:32

Wow, that’s impressive and diligent! Thanks for translating!
It seems to me that a large part of this is behavior under low RAM and disk conditions (well, 870MB is not exactly low :-] and is the default now and recommended in the docs), which should be solved.
I don’t quite understand how YaCy handles memory; I have only observed that GC is called when memory is full.
The second thing I don’t understand is how the heaps in \DATA\INDEX\freeworld\SEGMENTS work. I have an instance with a separate Solr, and still the index is written to local text files, which are held on disk and probably somehow in memory as well (the more indices, the more RAM occupied). ‘Database optimization’ somehow merges these files together. And some merging and compressing is done upon shutdown and start-up, which is why I prefer to restart YaCy every few days (and have only a few days of uptime in the table). After a crash or after killing the Java process, the start-up, and probably some sort of repair, takes quite long.
How does it work, @Orbiter?

roamn · 8 April 2023 18:37

I have ver 1.924 10069 from the website and have had issues with deep crawls of larger-than-normal sites…
What depth do you crawl at?
I have a 4-CPU host; running at 13 times overload in Linux, it lasts 3 mins…
I think the tar.gz is made for modern servers…
When I first started it, it would last an hour or so…
It’s very hard to use a console window on a 99 AUD Android phone,
but worth every second.
Thanks for supporting this project like I have.
Try…GitHub.com/smokingwheels for some lighter versions of YaCy. But make sure you have a working backup.

Sviatoslav · 9 April 2023 06:44

I think other scattered error messages from YaCy can be collected in this topic.
Here are the latest messages that appeared after my list:

Violation of the structure of the index base (apparently similar to what I observed): Yacy decided to not recognize my ~60 GB Index anymore a few months ago (+ webcrawler not working; data does still exists)
Infinite indexing loop and index base access violation: Delete domain info from index
Index size limit (probably related to the use of 32-bit integer variables): Solr error: number of documents in the index cannot exceed 2147483519
For unknown reasons, the peer loses its name: Peer name changed on its own
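
A note on the number in the Solr error listed above: it is exactly 128 below the largest signed 32-bit integer. As far as I know this matches the hard per-index cap in Lucene (`IndexWriter.MAX_DOCS`, defined as `Integer.MAX_VALUE - 128`), so it would not be something a YaCy setting can raise. A quick check of the arithmetic in Python; the constant names are only for illustration:

```python
INT32_MAX = 2**31 - 1        # same value as Java's Integer.MAX_VALUE
MAX_DOCS = INT32_MAX - 128   # assumed to mirror Lucene's IndexWriter.MAX_DOCS

print(INT32_MAX)             # 2147483647
print(MAX_DOCS)              # 2147483519, the number in the error message
```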

okybaca · 9 April 2023 12:36

Good idea, mine are:

Solr error: number of documents in the index cannot exceed 2147483519
YaCy as a news search engine - couple of bugs / problems using yacy as a news search engine
/date vs date of indexing

okybaca · 9 April 2023 12:50

Some memory-handling issues were improved in the source code on GitHub. Are you able to build it yourself from source? The current version I use is yacy_v1.926.
You can easily back up the old version and copy/move only the DATA dir to the new version, and it should run as before.
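
The backup-and-copy step could look roughly like this. It is a sketch in Python under my own assumptions: the function name `migrate_data` and all three paths are hypothetical, only the `DATA` directory name comes from YaCy, and YaCy should be stopped before running it.

```python
import shutil
from pathlib import Path


def migrate_data(old_root: str, new_root: str, backup_root: str) -> Path:
    """Back up the old install, then copy its DATA dir into the new one.

    All three paths are placeholders chosen by the caller.
    """
    old, new, backup = Path(old_root), Path(new_root), Path(backup_root)
    if not (old / "DATA").is_dir():
        raise FileNotFoundError(f"no DATA directory in {old}")
    # Full copy of the old version, so it can be restored if anything goes wrong.
    # copytree refuses to overwrite an existing backup directory.
    shutil.copytree(old, backup)
    # Copy only DATA into the new version (dirs_exist_ok needs Python 3.8+).
    shutil.copytree(old / "DATA", new / "DATA", dirs_exist_ok=True)
    return new / "DATA"
```

It would be called as, for example, `migrate_data("C:/yacy_old", "C:/yacy_v1.926", "C:/yacy_backup")`, where all three paths are made up. Copying rather than moving keeps the old install usable until the new one is confirmed to work.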

To find out what might be wrong, you could try digging into the source code. The YaCy Java reference might help.

If you have an external machine with a public IP, you can try an SSH tunnel as described in the FAQ.

roamn · 9 April 2023 14:11

What do you see from this peer?

138.68.166.11
http://138.68.166.11
Ver 10069
http://149.28.182.181/index.php/s/GyEAT4ZqwnHGgkf
Community education and power
File is writeable but not sure what to do…
Hosted by my nextcloud server.
Using a 99 AUD Android 11 phone

Sorry wrong link.
Will keep looking on phone
http://149.28.182.181/index.php/s/GyEAT4ZqwnHGgkf
WattOS
I have tried and run pixel and yacy.
Pixel runs in my.vultr.com
Is impossible on android phone no zoom

Felix · 13 April 2023 06:40

How exactly do I find version v1.926?
Is this a stable version? Is this an official version? Why is it not on the yacy.net website?

roamn · 13 April 2023 08:07

I’m sure if you looked in GitHub you would find it. For many years I had a fork unsynchronized but don’t anymore.
Asking at a torrent site might be another way…

okybaca · 13 April 2023 08:51

Hi Felix,

You can compile it yourself out of source code at GitHub.

You’ve got to ask @Orbiter.
In my opinion, an automatically built ‘nightly’ version, auto-published on yacy.net, would be nice. Are we able to do that?

roamn · 13 April 2023 18:43

This URL about YaCy is from 6 years ago. I would have been using Ubuntu desktop 14.04
on a 10-year-old PC found at the rubbish tip. The Java version was 1.7 (7) back then, and now the minimum is 1.8 (8)…
I have no way to open files to see what I was doing at the time…

I have stress-tested YaCy for many years now on rubbish-tip hardware that is at least 10 years old.
I have had a good time doing IT to help with this project.

Sviatoslav · 14 April 2023 06:26

This time what I found is apparently not an error but a shortcoming:
YaCy cannot index this site:
https://анимевост.рф/sitemap.php
It does not find the links on the page.