Is someone using YaCy to train an LLM?

Did you also notice a huge mass of search queries prefixed with "www"?

My instance gets plenty of these, containing unrelated keywords, such as:

20250117112915 0 sq ("www" AND "references")
20250117112921 17 sq ("www" AND "2016" AND "worst" AND "labor" AND "open" AND "2018")
20250117112933 0 sq ("www" AND "2016" AND "2020" AND "keyword" AND "climate")
20250117112934 0 sq ("www" AND "cam" AND "keyword" AND "3626" AND "casino" AND "hedge" AND "herself" AND "scarab")
20250117112942 0 sq ("www" AND "cam" AND "keyword" AND "outlined.")
20250117112947 0 sq ("www" AND "cam" AND "keyword" AND "challenges" AND "communication")
20250117112954 0 sq ("www" AND "sync" AND "class" AND "keyword" AND "html")
20250117112957 0 sq ("www" AND "2016" AND "pharmacy" AND "2024")
20250117113002 0 sq ("www" AND "people" AND "roman")
20250117113128 1 sq ("www" AND "2016" AND "growth")
20250117113159 0 sq ("www" AND "pdf")
20250117113211 0 sq ("www" AND "sportingbet" AND "keyword" AND "bwin")
20250117113323 0 sq "www"
20250117113334 0 sq ("www" AND "2016" AND "keyword" AND "party" AND "access")
20250117113348 0 sq ("www" AND "2016" AND "keyword" AND "driven" AND "applications" AND "memory" AND "hopper")
20250117113357 2 sq ("www" AND "2016" AND "zoll" AND "full")
20250117113407 0 sq ("www" AND "2016" AND "keyword" AND "discrimination" AND "high" AND "new" AND "e.g.")
20250117113412 0 sq "www"
20250117113413 0 sq ("www" AND "2006")
20250117113424 1 sq ("www" AND "2016" AND "2022")
20250117113435 0 sq ("www" AND "2016" AND "management" AND "keyword" AND "biodiversity")
20250117113444 0 sq "www"
20250117113446 0 sq ("www" AND "keyword" AND "blague")
20250117113456 0 sq ("www" AND "2016" AND "2022" AND "keyword" AND "video" AND "distinct")
20250117113459 0 sq ("www" AND "2016" AND "cape")
20250117113505 0 sq ("www" AND "2016" AND "keyword" AND "violence" AND "core" AND "who")
20250117113512 0 sq ("www" AND "zip" AND "keyword" AND "download" AND "free" AND "поисковик")
20250117113521 0 sq ("www" AND "sync" AND "lib" AND "extra")
20250117113524 0 sq ("www" AND "little" AND "2022" AND "keyword" AND "all")
20250210082232 0 sq "www"
20250210082306 0 sq ("www" AND "review")
20250210082306 0 sq ("www" AND "2016" AND "programas" AND "ebook" AND "keyword" AND "changes" AND "1999" AND "mathematics")
20250210082307 0 sq ("www" AND "2016" AND "cape" AND "signed" AND "keyword" AND "cat")
20250210082307 0 sq ("www" AND "2016" AND "keyword" AND "most" AND "model" AND "rapidly" AND "novel")
20250210082309 0 sq ("www" AND "2016" AND "pharmacy" AND "2017" AND "january")
20250210082310 0 sq ("www" AND "2016" AND "keyword" AND "applications" AND "2020" AND "various" AND "review" AND "sports")
20250210082311 0 sq ("www" AND "2016" AND "2022" AND "reality" AND "keyword" AND "crime" AND "self-defense" AND "2020" AND "office")
20250210082311 0 sq ("www" AND "2016" AND "2019" AND "kenya" AND "keyword" AND "sharing" AND "protection" AND "change" AND "offers" AND "2018" AND "system")
20250210082314 0 sq ("www" AND "2016" AND "keyword" AND "100" AND "registry" AND "draw" AND "2019" AND "short" AND "some")
20250210082322 0 sq ("www" AND "2016" AND "keyword" AND "plasma" AND "science" AND "2021" AND "response" AND "hygienists’")
20250210082352 0 sq ("www" AND "2016" AND "keyword" AND "levels" AND "2018" AND "what" AND "urban")
20250210082405 0 sq ("www" AND "2016" AND "analytics" AND "keyword" AND "three" AND "arts" AND "helps")

Someone is apparently training their LLM or filling some search engine index… and using my hardware extensively.
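To put a number on it, here is a minimal Python sketch that parses lines in the format quoted above and counts how many queries start with "www", plus which terms accompany it most often. The function names (`parse_line`, `summarize`) are my own, and the meaning of the integer in the second column is not stated here, so the sketch only carries it along without interpreting it.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
#   <14-digit timestamp> <integer> sq <query>
LINE_RE = re.compile(r'^(\d{14})\s+(\d+)\s+sq\s+(.*)$')
# Every double-quoted token in the query counts as one search term.
TERM_RE = re.compile(r'"([^"]*)"')


def parse_line(line):
    """Return (timestamp, number, terms), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, number, query = match.groups()
    return timestamp, int(number), TERM_RE.findall(query)


def summarize(lines, prefix="www"):
    """Count parsed queries, those whose first term is `prefix`,
    and how often each other term appears alongside that prefix."""
    total = 0
    prefixed = 0
    companions = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a query line
        total += 1
        terms = parsed[2]
        if terms and terms[0] == prefix:
            prefixed += 1
            companions.update(terms[1:])
    return total, prefixed, companions


SAMPLE = [
    '20250117112915 0 sq ("www" AND "references")',
    '20250117113128 1 sq ("www" AND "2016" AND "growth")',
    '20250117113323 0 sq "www"',
    '20250117113424 1 sq ("www" AND "2016" AND "2022")',
]

total, prefixed, companions = summarize(SAMPLE)
print(total, prefixed, companions.most_common(3))
```

Feeding it the whole log instead of `SAMPLE` should show quickly whether fillers like "2016" and "keyword" dominate, which would point to a script rather than people.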


That looks familiar… I remember seeing something resembling it in my server logs months ago. But I do not understand the point of including “www” in the queries.

If it’s not for training AI, it could be for crashing a search engine that does not obey the dictates of worldly corporations.

Don’t know why. That’s the pattern: “www” plus random or semi-random words.

:wink: ha ha
I don’t think that YaCy is a real competitor to any corporation ;-))
Does it really work? :stuck_out_tongue:

Other search engines ban robots and every other programmatic way of using a search engine. So when a search engine doesn’t limit that, it probably attracts bots, scripts along the lines of “ask all the instances all these words…”, spammers, etc. I remember a huge flood of slot-machine-related terms last year. Google has sometimes banned me just for quick repeated searches during my research.
I suppose that’s the price we pay for being open.

Since I tunnel the incoming connections to a private IPv4 address, I see only 127.0.0.1 as the source address in the YaCy log. I don’t know whether that’s a bug or just logical. Either way, I cannot filter or ban the source IPs.
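One possible way around this, sketched in Python under a big assumption: the tunnel would have to end in an HTTP reverse proxy that adds an `X-Forwarded-For` header (a plain TCP tunnel cannot do that). The helper name `client_ip` and the trusted-proxy list are hypothetical, and nothing here says how YaCy itself treats that header; this is only what a filtering script in front of the instance might do.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted=("127.0.0.1", "::1")):
    """Best guess at the real client address.

    remote_addr   -- the socket peer address (127.0.0.1 behind a local tunnel)
    forwarded_for -- raw X-Forwarded-For header value, or None if absent
    trusted       -- addresses of proxies we control (assumed list)
    """
    # Only believe the header when the direct peer is one of our own proxies;
    # otherwise any client could forge it.
    if remote_addr not in trusted or not forwarded_for:
        return remote_addr
    # The right-most entries were appended by the proxies closest to us,
    # so walk from right to left and skip our own proxies.
    for candidate in reversed([part.strip() for part in forwarded_for.split(",")]):
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # ignore malformed entries
        if candidate not in trusted:
            return candidate
    return remote_addr


print(client_ip("127.0.0.1", "203.0.113.7"))
```

The addresses returned this way could then feed a blocklist or a per-IP request counter at the proxy, before the query ever reaches the search instance.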


While that is true, the propaganda machine has to censor any opposing voices. Even though my voice is barely heard in the world, that did not stop giants like YouTube and Twitter from suspending my accounts when they did not like my content, even though I pose no threat as a competitor. So, it’s not always about competition, but about controlling public consciousness. YaCy searches can produce results that the corporations never want you to see.

I also remember slots and related terms showing up in my YaCy results… so I spent some time deleting and blacklisting the related spam websites from my index. I don’t know if that time was well spent, but I don’t want junk in my index.