The Searchlab Project

For quite some time I have been working on a concept for a portal that uses YaCy Grid as the crawler/indexing engine. It was about two years ago that I tried to find a sponsor for the portal. Today, not only has the project started, it has also reached its first milestone!

The searchlab portal is actually live right now - but I will not share the temporary link until the new searchlab portal can be - maybe you guessed it - here, at the same place where the forums are. This means we must migrate this forum to a subdomain of searchlab.eu - I will keep you updated.

The YaCy Searchlab project is kindly sponsored by NLnet and the YaCy Patreon patrons. I would like to ask you to become a patron as well to support my work on YaCy and the searchlab.

The six milestones of the project explain pretty well where the project is going and what you can expect:

A lot more detail is contained in the README of the searchlab repository on GitHub.

If you actually want to see the portal yourself, you can easily do so using Docker:

docker run -d --rm -p 8400:8400 --name searchlab yacy/searchlab

… and then open http://localhost:8400 in your browser.
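To check that the container came up, or to stop it again, the standard Docker commands work as usual; the container name searchlab is the one given in the run command above:

docker logs -f searchlab   # follow the log output of the running container
docker stop searchlab      # stop it; --rm removes the container automatically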

If you have any ideas, suggestions or questions, please let me know!

Milestone M2 is now implemented! Unfortunately this is not visible in public right now, because that requires moving the searchlab portal to searchlab.eu - which is currently here, the location of the forum. I will therefore move the searchlab forum to community.searchlab.eu.

The searchlab is now publicly available - where this forum was: https://searchlab.eu

This is now a going-public with a “small bang” (as opposed to a “big bang”), with a small set of functions as described in M2. Most notable is the search function and the index, which is provided by a small set of test crawls; you can test it on the Search page at searchlab.eu

That search function is only the first of many, because the next milestone M3 will provide search apps. Those apps will be hosted in a separate repository: https://github.com/yacy/searchlab_apps

Great, and congrats!
Once again, what is the planned future of “legacy” YaCy? Will it be actively developed?
While a lot of folks still use old YaCy and crawl the web, is it still worth the (human & computer) time investment, or is it a slowly dying project?
Will the grid version include RWI and P2P functions and be somehow backward compatible with the legacy one?
thanx & good luck

“legacy YaCy” is an important project; it is ongoing and will go on.

In YaCy Grid there will be no RWI and no P2P function, because the purpose was precisely to build a high-performance search portal that does not need networking with other peers to work.

A “backward compatibility” of YaCy Grid/Searchlab with P2P YaCy is partly there already (the crawl start API and the search API are completely identical, only the paths differ), and I plan to implement some kind of “forward compatibility” into P2P YaCy: it would be great to make the searchlab apps usable for the old YaCy as well. Some new interfaces will be required, and I will take care that this becomes possible.
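To illustrate what that compatibility means in practice, here is a small sketch: the first call uses the classic /yacysearch.json endpoint of a local legacy YaCy peer, and the second assumes the searchlab back-end answers the same query with the same parameters under its own path. The searchlab URL below is only a placeholder; just the “same parameters, different paths” principle comes from the text above.

# search a local legacy YaCy peer (default port 8090) via its JSON search API
curl "http://localhost:8090/yacysearch.json?query=yacy"

# hypothetical: the same query against the searchlab back-end; the parameters
# are meant to be identical, only the host and path differ
curl "https://searchlab.eu/yacysearch.json?query=yacy"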

The Searchlab Apps are being implemented!

Right now there are three apps available, created from older YaCy stand-alone search web interfaces:



These apps will work with both searchlab and YaCy installations; you only have to change the address of the back-end.

The apps will appear on searchlab.eu in an apps section and will be shown in an app-store-like UI. But it will be easy for everyone to add another search app, because all of them are hosted in a new git repository: https://github.com/yacy/searchlab_apps

The way you can extend the apps is defined in the README of that repository (a small scaffolding sketch follows after the quoted guidelines):

Contributing Your Own Apps

If you like, please give us a pull request with your new app!
We love to extend the searchlab apps with community-created content.

To do so, please…

  • Create a new subfolder within htdocs/app/ with the name of your app
  • Create an app.json and fill it with an app description using at least
    the same fields as used in htdocs/app/websearch_lit/app.json.
    The app.json is used within https://searchlab.eu to show a proper visualization
    of your app.
  • You must create an index.html file within your app folder.
  • You must create a screenshot.png file with the exact size of 1024x1024.
    The image should not contain any transparency and it should show a proper
    screenshot of your app while it is producing something useful for the user.
  • You can use all css and js code as given in htdocs/app/_/css and htdocs/app/_/js,
    but you MUST NOT add any files to those directories. If you need any other
    css or js code, please link it directly from the internet or add it
    in a separate css/js path within your app folder.
  • Your app must be published under the CC0 license.
  • Make a pull request where only files within your app folder are added/modified,
    not anything else.
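As a hedged sketch of these steps - the folder name myapp and the fields in the app.json below are placeholders, so copy the real field list from htdocs/app/websearch_lit/app.json - a new app could be scaffolded like this:

# inside a checkout of the searchlab_apps repository:
mkdir -p htdocs/app/myapp

# minimal app description; the fields shown here are placeholders
cat > htdocs/app/myapp/app.json << 'EOF'
{
  "name": "myapp",
  "description": "A small example search app"
}
EOF

# the app needs at least an index.html
echo '<!DOCTYPE html><html><body>my search app</body></html>' > htdocs/app/myapp/index.html

# add a 1024x1024 screenshot.png (no transparency) with a tool of your choice,
# then commit only the new folder and open a pull request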

Mix and Merge with Searchlab

Everything that is inside the htdocs folder of the searchlab_apps repository is hosted at searchlab.eu/en/, for example:

… but with future versions of the searchlab, the content will also be available under a customized user-account path which then accesses only user-account generated content. The user-account paths will be https://searchlab.eu/<user>

That means, if a user named freedom has an account, one web interface to search the freedom index is e.g. https://searchlab.eu/freedom/app/websearch_bootstrap, which can be embedded elsewhere easily.
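Since such a per-user app page is reachable like any other web page, a quick way to check it is a plain HTTP request; the account name freedom is just the example from the paragraph above:

# fetch the per-user web search interface of the example user "freedom"
curl "https://searchlab.eu/freedom/app/websearch_bootstrap"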

Web Crawler and Data Warehouse

The next milestone is reached:

  • you can now “pseudo-login” to your own searchlab account. Currently there is no authentication, only anonymous accounts that you get when you click on the “login” button. If you remember your login-id, you can re-use that id later to access your personal assets store
  • you can now start a web crawl with options that are not all enabled yet. The web crawl is executed by a YaCy Grid network and the results of that crawl go into the search index and into the assets store (as long as you checked the storage options)

  • there is now a Data Warehouse which hosts the assets of web crawls:
    • a corpus database (a table which describes the content of your search index)
    • a crawl start history (json objects which can be used for detailed analyses and to automate crawl starts)
    • a graph database for each crawl (link structure between the documents in your index as json objects)
    • an index dump for each crawl (json objects with the parsed documents, the same ones that were pushed into elasticsearch)
    • original warc files from the crawl-loader for each crawl (compressed and optional)

There are currently no limitations on crawl starts, the start location, the crawl depth or other crawl options. You can start crawls anonymously without login, and you can also do so using the login and the anonymous user-ids. That will change slightly with the work on the next milestone, where proper authentication (possibly OAuth) is implemented.

For now, I invite everyone to try out the new functions that make the searchlab actually usable. However, when the authentication is implemented, all currently created indexes will be cleared so we can start fresh with real accounts.