Setting up Yacy for Port 80

poz · 12 November 2020 20:12

I have tried to set Yacy to port 80 via the port settings in system administration.
It fails to start up, saying Java.net Socket connection refused…

I am on a typical Ubuntu/linux box. I can run any other install and use port 80 all day long.

I can see that there is this Default index.html Page (by forwarder) in the portal configuration. Don’t know if that will let me use port 80 separately from 8090 or not. Or where to point it.

I understand the engine shares the peer to peer with the main port. But I cannot expect regular users to use port 8090. How can I get a normal yacy page on port 80 without breaking it?

I am using it in “Search portal for your own pages” mode, so it doesn’t need to access all the other peers out there (at this moment).

I understand that Orbiter is busy with Yacy Grid and considers this version obsolete. But can somebody please help me? I have put in a couple days on this just getting this running well enough to put it into service.

Thanks,

poz · 13 November 2020 21:11

Any possibility that somebody knows how to set the search portal to port 80?
I cannot understand why the peer connection port is tied to the front-end client page, or why you wouldn’t want the standard port 80.
I took a closer look at this problem: setting the main port to 80 makes it fail at startup with…

I 2020/11/13 06:25:28 ConcurrentLog shutdown of ConcurrentLog.Worker: injection of poison message
I 2020/11/13 06:25:28 ConcurrentLog terminating ConcurrentLog.Worker with 0 cached loglines.
I 2020/11/13 06:25:28 ConcurrentLog shutdown of ConcurrentLog.Worker: terminated

It seems it’s very unhappy with port 80.

I have tried a number of ways of using apache2 to host the front-end files, but yacy uses a custom HTML rewriter, so that’s out. The YacySearch.html/Yacyinteractive.html is very complex.

IFrames will not work because, upon clicking a link, the browser will refuse to render a cross-domain page inside the frame.

Nor can you use _Blank as a target to open a new window because modern browsers will ignore it.

poz · 13 November 2020 21:16

I have even tried yacygrid. I successfully got it loaded as a docker container on a remote server, but I have no clue what to do next. It looks very complex, and there are no clear instructions. Going on 30 hrs now on this. If Orbiter could just respond for 1 minute, I could be on my way…

Orbiter · 13 November 2020 21:48

Running any service on a port below 1024 is limited to administrator accounts. For that you would need to run YaCy as root.
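
To see this restriction for yourself, here is a minimal Python sketch. It assumes a default Linux setup where ports below 1024 are reserved for root (the limit can be changed with the `net.ipv4.ip_unprivileged_port_start` sysctl). The helper names `is_privileged_port` and `try_bind` are made up for this illustration and have nothing to do with YaCy itself.

```python
import errno
import socket

# Assumption: the default Linux boundary for privileged ports.
PRIVILEGED_PORT_LIMIT = 1024


def is_privileged_port(port: int) -> bool:
    """Return True if the port normally requires root to bind."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def try_bind(port: int, host: str = "127.0.0.1") -> str:
    """Try to bind a TCP socket and report what happened."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except PermissionError:
            # Typical result for port 80 when not running as root.
            return "permission denied"
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "address in use"
            raise
        return "ok"


if __name__ == "__main__":
    for port in (80, 8090):
        print(port, is_privileged_port(port), try_bind(port))
```

Run as a normal user, the attempt on port 80 would usually report a permission problem, while 8090 binds fine unless something (such as YaCy) already listens there.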

poz · 13 November 2020 21:51

Forgive my ignorance, linux is not my primary platform. How do I do that? And if I want to have Yacy operate the peer side of things on 8090 but the front-end on 80, is that possible?

Orbiter · 13 November 2020 22:42

Yes. Running YaCy as root is discouraged, and the setup where YaCy runs on 8090 while port 80 is used “outside” is not only possible but recommended.
To do so, you need a reverse proxy such as nginx, which runs on port 80 and provides a multi-domain configuration to proxy from outside port 80 on a specific subdomain to inside port 8090.
The most authoritative reference is probably https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/ but it is not the easiest one. There are plenty of tutorials; just look for “nginx reverse proxy subdomain”.
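
As a rough illustration of what such a configuration could look like, here is a small Python sketch that prints an nginx server block. The subdomain `search.example.org`, the loopback address, and the function name `render_nginx_server_block` are placeholders picked for this example, not values from this thread; the directives are ordinary nginx proxy directives, but check them against the guide linked above before using them.

```python
def render_nginx_server_block(
    server_name: str,
    backend_port: int = 8090,
    listen_port: int = 80,
) -> str:
    """Render an nginx server block that forwards listen_port to a local YaCy."""
    # Doubled braces are literal braces in an f-string; $host etc. are
    # nginx variables and are passed through unchanged.
    return f"""server {{
    listen {listen_port};
    server_name {server_name};

    location / {{
        proxy_pass http://127.0.0.1:{backend_port};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }}
}}
"""


if __name__ == "__main__":
    # Hypothetical subdomain; replace it with your own.
    print(render_nginx_server_block("search.example.org"))
```

The printed block would typically go into a file in the nginx site configuration directory. nginx itself still has to be started with the rights to bind port 80, while YaCy keeps running as a normal user on 8090.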

poz · 13 November 2020 23:12

I actually got it to work finally with you pointing out to me what is stopping it.

I have to say, I am totally blown away by yacy. There must have been a lifetime of man-hours put into it.
I have looked into a ton of source code in packages before, but yours is off the chart in capability and features. So incredibly organized. And you put so much effort into the front-end with tips and options. You didn’t even tell anybody about yacyinteractive.html, which seems to be all that susper is. I need to work over the front-end page quite a bit to adapt it to my use case, though. Can you tell me what style of HTTP rewriting engine that is? Or is that totally custom? I don’t have a chance to work in it if I don’t know how your development workflow for page creation happened.
It would be easier if I could just tack on a PHP front-end and call your API. Has anybody ever done this?
My use case doesn’t have all the issues you mentioned in yacygrid. I don’t need to index the world, so it will stay lightning fast in my scenario. Anyway, if nobody has ever told you thank you for all your hard work, I will:
“Thank you”!
POZ
POZ

Orbiter · 14 November 2020 00:41

Great it worked!
The web interface was made some time before all these JavaScript-based frameworks were developed; actually JavaScript was a no-go for some time. So we developed our own CMS framework, which I still like a lot. It’s easy and produces static-looking web pages that are dynamically produced from templates.

If you want to attach your own front-end, please use the API (just click on the API sign in the upper right whenever it’s there).
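
A separate front-end calling the search API could look roughly like the Python sketch below. It assumes the peer answers on `/yacysearch.json` with the parameters `query`, `maximumRecords` and `startRecord`, and that the JSON reply groups hits under `channels` and `items`; treat all of these as assumptions and confirm them with the API button on the search page. The function names are invented for this example.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url: str, query: str,
                     maximum_records: int = 10, start_record: int = 0) -> str:
    """Build the search URL (endpoint and parameter names are assumptions)."""
    params = urllib.parse.urlencode({
        "query": query,
        "maximumRecords": maximum_records,
        "startRecord": start_record,
    })
    return f"{base_url.rstrip('/')}/yacysearch.json?{params}"


def extract_results(payload: dict) -> list:
    """Collect (title, link) pairs from the assumed channels/items layout."""
    results = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append((item.get("title", ""), item.get("link", "")))
    return results


def search(base_url: str, query: str, **kwargs) -> list:
    """Query a running peer and return the extracted results."""
    url = build_search_url(base_url, query, **kwargs)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_results(json.load(response))


if __name__ == "__main__":
    # Assumes a peer listening locally on the default port 8090.
    for title, link in search("http://localhost:8090", "yacy"):
        print(title, link)
```

The same two steps, building the query string and walking the response, carry over to a PHP front-end that fetches XML or JSON from the peer.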

poz · 14 November 2020 02:11

Well I guess I can’t work by editing the existing pages because I won’t be able to do any intellisense or linting if there’s no standard parser that works within visual studio.

The easiest option now is to include an extra webserver, write the equivalent pages in PHP, and get the XML responses the old-fashioned way.

But I need those query parameters for yacy. Except the site is dead.

http://www.yacy-websearch.net/wiki/index.php/Dev:APICrawler

I specifically need the expertcrawler params. And of course the yacysearch params.

One thing is probably going to give me an unsolvable problem: I need to “TAG” crawled pages with some name so I can DROP any pages submitted by malicious users. It is almost a certainty that somebody is going to poison the index with garbage, and I will need to find those pages by TAG and erase them.
The TAG is just a name from a future login page I will create. Users have to sign up to be able to index, and if they prove to be malicious, I erase all their crawling and ban them.
Do you have built-in solutions in your code that I can draw from?

ecxod · 22 September 2023 12:41

Recommending that users not run programs as superuser or root seems so funny to me, as long as they are all members of the sudo group by default, from the install of the operating system, and almost every command starts with “sudo”. :))
Same as this funny policy of not permitting daemons or services to run on low ports …

Orbiter · 25 September 2023 08:28

The page was rescued and is available here: