YaCy SearXNG integration via Docker

slyle · 13 May 2025 02:45

Hi all,

I have been working with YaCy and SearXNG, but I have failed to connect my personal YaCy index to my SearXNG instance. The routes are correct and I set up an API account, but requests just time out: after 6 seconds on YaCy and 30 seconds on images, and on the backend as well.

Do I need to set up my YaCy index for portal use? I am pretty confused as to how YaCy should be handled in this use case, or whether it is better to simply forgo SearXNG's engine and make a separate request, pretending the container itself is honoring the engine. My settings.yml is set up as in the default case for SearXNG. This may be more of a SearXNG question, but I have homed in on the possibility that I have administered my YaCy instance incorrectly.

I also find the 8443 TLS setup for crawling and intra-YaCy communication confusing: I’m not clear whether I should be using localhost:8443 or localhost:8090 in my API calls, because accessing localhost:8443 through a browser obviously returns encrypted symbols. I am weighing that against making the API call via localhost, and I am not sure whether I even need to in these middleware Docker cases. I have tested both on the same “docker network” and off it, with the correct base URL so far as I know, and have even tried with no password in a controlled environment.

As you can tell I’m a bit confused in general, so I apologize heavily for the stream of consciousness. I’m in hour 14 and throwing all sorts of food at the wall to see what sticks.

I wanted to keep this conceptual to start; I can then provide whatever code more experienced users find relevant. I’m still learning. Thank you!

slyle · 13 May 2025 02:47

I also want to mention that my YaCy instance knows it can be seen externally, so port forwarding seems correct. I think the problem has to do with how I actually request a search through SearXNG’s engine handling of YaCy, or else this is probably better reworked myself.

roamn · 26 May 2025 10:22

Hi @slyle, welcome.

I tried the SearXNG Docker image, but I had an issue when it was referenced by IP address: I got no results from the default search engines.

So I did a full install of SearXNG as per the documentation, then started learning about it and how to add my 3 YaCy instances to it.

There is the settings.yml, but there are also engine definition files, one for each engine.

settings.yml

outgoing:
 enable_http: true

ui:
 static_use_hash: true

 preferences:
   engine_selection:
     - name: yacylocal
       category: general
       enabled: true



engines:
 - name: yacylocal
   engine: yacylocal
   shortcut: yalo
   categories: [general]
   base_url:
     - http://192.168.1.55:8090
   search_url: http://192.168.1.55:8090/yacysearch.html?query={query}
   enable_http: true
   search_mode: global    # <--- use local to reduce timeout risk
   timeout: 60.0
   disabled: false
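
Before blaming the engine settings, it may help to confirm that the YaCy JSON endpoint answers from wherever SearXNG runs. The sketch below is only a rough check, not part of SearXNG: `build_search_url` and `probe` are hypothetical helper names, and the query arguments simply mirror the ones the engine file sends. Note that inside a container, `localhost` refers to that container itself, so the base URL usually has to be the YaCy container's name on the shared Docker network or the host's IP address. The example uses plain HTTP on 8090, as in the settings above.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, resource: str = "local", count: int = 5) -> str:
    """Build a yacysearch.json URL with the same arguments the engine file uses."""
    args = {
        "query": query,
        "startRecord": 0,
        "maximumRecords": count,
        "contentdom": "text",
        "resource": resource,  # "local" avoids waiting on remote peers
    }
    return f"{base_url.rstrip('/')}/yacysearch.json?{urlencode(args)}"


def probe(base_url: str, query: str = "test", timeout: float = 10.0) -> int:
    """Fetch one page of results and return how many items came back."""
    url = build_search_url(base_url, query)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    channels = data.get("channels") or [{}]
    return len(channels[0].get("items", []))
```

Running `probe("http://192.168.1.55:8090")` from the SearXNG host or from inside its container should return a number quickly. If it raises a timeout or connection error there, the problem is network reachability rather than the engine configuration.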

yacylocal.py in the engine folder
I had to modify the SearXNG source so that if there are 2 results you don't end up with a parsing error.

# SPDX-License-Identifier: AGPL-3.0-or-later
"""YaCy_ is a free distributed search engine, built on the principles of
peer-to-peer (P2P) networks.

API: Dev:APIyacysearch_

Releases:

- https://github.com/yacy/yacy_search_server/tags
- https://download.yacy.net/

.. _Yacy: https://yacy.net/
.. _Dev:APIyacysearch: https://wiki.yacy.net/index.php/Dev:APIyacysearch

Configuration
=============

The engine has the following (additional) settings:

- :py:obj:`http_digest_auth_user`
- :py:obj:`http_digest_auth_pass`
- :py:obj:`search_mode`
- :py:obj:`search_type`

The :py:obj:`base_url` has to be set in the engine named `yacylocal` and is used by
all yacylocal engines.

.. code:: yaml

  - name: yacylocal
    engine: yacylocal
    categories: general
    search_type: text
    shortcut: yalo
    base_url:
      - http://192.168.1.55:8090
      
  - name: yacylocal images
    engine: yacylocal
    categories: images
    search_type: image
    shortcut: yailo
    disabled: true


Implementations
===============
"""
# pylint: disable=fixme

from __future__ import annotations

import random
from json import loads
from urllib.parse import urlencode
from dateutil import parser

from httpx import DigestAuth

from searx.utils import html_to_text

# about
about = {
    "website": 'https://yacy.net/',
    "wikidata_id": 'Q1759675',
    "official_api_documentation": 'https://wiki.yacy.net/index.php/Dev:API',
    "use_official_api": True,
    "require_api_key": False,
    "results": 'JSON',
}

# engine dependent config
categories = ['general']
paging = True
number_of_results = 10
http_digest_auth_user = ""
"""HTTP digest user for the local YACY instance"""
http_digest_auth_pass = ""
"""HTTP digest password for the local YACY instance"""

search_mode = 'global'
"""Yacy search mode ``global`` or ``local``.  By default, Yacy operates in ``global``
mode.

``global``
  Peer-to-Peer search

``local``
  Privacy or Stealth mode, restricts the search to local yacy instance.
"""
search_type = 'text'
"""One of ``text``, ``image`` / The search-types ``app``, ``audio`` and
``video`` are not yet implemented (Pull-Requests are welcome).
"""

base_url: list | str = 'http://192.168.1.55:8090'
"""The value is a URL or a list of URLs.  In the latter case an instance will be
selected randomly.
"""


def init(_):
    valid_types = [
        'text',
        'image',
        # 'app', 'audio', 'video',
    ]
    if search_type not in valid_types:
        raise ValueError('search_type "%s" is not one of %s' % (search_type, valid_types))


def _base_url() -> str:
    from searx.engines import engines  # pylint: disable=import-outside-toplevel

    url = engines['yacylocal'].base_url  # type: ignore
    if isinstance(url, list):
        url = random.choice(url)
    if url.endswith("/"):
        url = url[:-1]
    return url


def request(query, params):

    offset = (params['pageno'] - 1) * number_of_results
    args = {
        'query': query,
        'startRecord': offset,
        'maximumRecords': number_of_results,
        'contentdom': search_type,
        'resource': search_mode,
    }

    # add language tag if specified
    if params['language'] != 'all':
        args['lr'] = 'lang_' + params['language'].split('-')[0]

    params["url"] = f"{_base_url()}/yacysearch.json?{urlencode(args)}"

    if http_digest_auth_user and http_digest_auth_pass:
        params['auth'] = DigestAuth(http_digest_auth_user, http_digest_auth_pass)

    return params


def response(resp):
    results = []

    try:
        raw_search_results = loads(resp.text)
    except Exception as e:
        print("⚠️ YaCy JSON parse error:", e)
        print("Raw text:", resp.text[:1000])
        return []

    if not raw_search_results:
        return []

    search_results = raw_search_results.get('channels', [])
    if not search_results or not isinstance(search_results, list):
        return []

    # search_results is a non-empty list here (checked above)
    items = search_results[0].get('items', [])

    for result in items:
        if search_type == 'image':
            result_url = result.get('url') or result.get('link')
            if not result_url:
                continue
            results.append({
                'url': result_url,
                'title': result.get('title', ''),
                'content': '',
                'img_src': result.get('image', ''),
                'template': 'images.html',
            })
        else:
            publishedDate = None
            if 'pubDate' in result:
                try:
                    publishedDate = parser.parse(result['pubDate'])
                except Exception:
                    publishedDate = None
            results.append({
                'url': result.get('link', ''),
                'title': result.get('title', ''),
                'content': html_to_text(result.get('description', '')),
                'publishedDate': publishedDate,
            })

    return results
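
The defensive parsing in `response()` can be tried out without a running YaCy or SearXNG. What follows is a minimal standalone sketch of the same idea, not the engine itself: `extract_items` is a hypothetical helper, and `SAMPLE` is a made-up payload containing only the fields the engine reads (`channels`, `items`, `link`, `title`, `description`, `pubDate`). A real YaCy response carries more fields.

```python
from __future__ import annotations

from json import loads

# Hypothetical payload, reduced to the fields the engine's response() reads.
SAMPLE = """
{"channels": [{"items": [
  {"link": "https://example.org/a", "title": "A", "description": "first",
   "pubDate": "Tue, 13 May 2025 02:45:00 +0000"},
  {"link": "https://example.org/b", "title": "B", "description": "second"}
]}]}
"""


def extract_items(text: str) -> list[dict]:
    """Return the items of the first channel, or [] for empty or malformed input."""
    try:
        data = loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    if not isinstance(data, dict):
        return []
    channels = data.get("channels")
    if not isinstance(channels, list) or not channels:
        return []
    first = channels[0]
    if not isinstance(first, dict):
        return []
    items = first.get("items", [])
    return items if isinstance(items, list) else []


items = extract_items(SAMPLE)
print([item["link"] for item in items])
```

Feeding it an empty string, invalid JSON, or a payload with an empty `channels` list returns an empty list instead of raising, which is the behaviour the modified engine aims for.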

Hope that helps.