Cleaning DATA/SURROGATES/out/ (using tmpreaper)

So, I discovered the other day that one of my YaCy peers was complaining that its disk was full. I cleaned up whatever I could using the admin panels, but to no avail.

Shortly after, though, I discovered ~9 GB (!!) of fairly old data lingering in DATA/SURROGATES/out/

The files had apparently been neither touched nor modified in quite some time.

(Perhaps this is due to changes in the requesting peers, or to rare/never-returning peers ‘leaving behind’ their requested data? :question:)

I concluded they were probably leftover files of no use, so I decided to apply tmpreaper.

In case someone finds it useful, the following are the crontab entries I currently use to keep files from lingering in out/. They delete files that have not been accessed (atime) for 12 hours, which for files nobody reads means roughly 12 hours after they were written (mtime). The check runs at reboot and then every 15 minutes.

(Note that the user in this case is ‘yacy’; adjust the path as needed.)

#clean yacy out
@reboot                 tmpreaper 12h /home/yacy/yacy/DATA/SURROGATES/out/ >/dev/null 2>&1
*/15 * * * *            tmpreaper 12h /home/yacy/yacy/DATA/SURROGATES/out/ >/dev/null 2>&1
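Before letting the cron job delete anything, it may be worth seeing which files it would actually hit. tmpreaper's own `--test` option (see its man page) is meant for such a dry run; as an alternative that only needs find(1), here is a minimal sketch that lists, but does not delete, files older than a given number of hours. Note that it checks mtime rather than atime, so it mirrors tmpreaper with -m rather than its default behaviour. The helper name `list_stale` is made up for illustration.

```shell
#!/bin/sh
# Sketch: list (do NOT delete) regular files under a directory whose
# modification time (mtime) is older than a given number of hours.
# list_stale is a hypothetical helper name, not part of tmpreaper or YaCy.
list_stale() {
  dir=$1
  hours=$2
  # -mmin +N matches files last modified more than N minutes ago.
  find "$dir" -type f -mmin +"$((hours * 60))" -print
}

# Example: what a 12h mtime-based cleanup would hit (path from the crontab above).
OUT_DIR=/home/yacy/yacy/DATA/SURROGATES/out/
if [ -d "$OUT_DIR" ]; then
  list_stale "$OUT_DIR" 12
fi
```

Swapping `-print` for `-delete` would make find actually remove the listed files, so it makes sense to look at the listing first.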

:exclamation::exclamation: Keep in mind that the mtime argument (-m) will likely need to be added if the noatime option has been set for the disk/partition in /etc/fstab (often done for SSDs etc.), since the filesystem then does not keep ‘last time the file was accessed’ records. :exclamation::exclamation:
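To check whether that applies to the out/ directory, the mount options of the filesystem holding it can be inspected with findmnt (from util-linux). A minimal sketch, assuming the same path as in the crontab above; `time_flag_for_options` is a hypothetical helper, and the script only prints the tmpreaper command it would use instead of running it.

```shell
#!/bin/sh
# Sketch: decide whether tmpreaper needs -m (mtime), based on whether the
# filesystem holding the directory is mounted with noatime.
# time_flag_for_options is a hypothetical helper name.

# Print "-m" if the comma-separated mount options include noatime,
# otherwise print nothing (tmpreaper then uses its default, atime).
time_flag_for_options() {
  case ",$1," in
    *,noatime,*) echo "-m" ;;
    *) echo "" ;;
  esac
}

OUT_DIR=/home/yacy/yacy/DATA/SURROGATES/out/
# findmnt --target reports the filesystem that contains OUT_DIR.
if command -v findmnt >/dev/null 2>&1 && [ -d "$OUT_DIR" ]; then
  opts=$(findmnt -n -o OPTIONS --target "$OUT_DIR")
  flag=$(time_flag_for_options "$opts")
  # Print the command instead of running it, so nothing is deleted here.
  echo "tmpreaper ${flag:+$flag }12h $OUT_DIR"
fi
```

On a noatime filesystem the crontab entries would then read `tmpreaper -m 12h /home/yacy/yacy/DATA/SURROGATES/out/ >/dev/null 2>&1`.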

https://linux.die.net/man/8/tmpreaper_selinux
https://tracker.debian.org/pkg/tmpreaper
https://aur.archlinux.org/packages/tmpreaper/
https://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec73.html

I expect there is no problem doing it this way… :question:

AFAIK, the files in

are explicitly made exports.

As far as I remember, SURROGATES/out/ only contains what the user put into SURROGATES/in/ themselves. Files are moved there as soon as they have been processed from in/.

Ah, OK :+1: I simply assumed they were .gz files intended to be sent out to other peers, since their names (at least those of the ones I glanced at; there were A LOT of them) contained what looked to me quite like peer ‘hashes’ (as seen in the ‘robinson’ network setting etc.).

Sorry, I confused DATA/EXPORT with SURROGATES.

No worries

I assume they were caused by an interrupted import/unpacking of an exported XML .gz index from one of my other machines (or perhaps a WARC?), likely from when I first began experimenting with YaCy a relatively short while back :slight_smile: Anyway, there haven’t been any more ‘left-behinds’ so far.

Still, unless it could actually cause a problem :question:, I guess it doesn’t hurt to keep the tmpreaper cron job as a sort of safety net in case of a crash or power outage in the middle of an import.

And now that I’ve become a bit more accustomed to reading the YaCy log, I think I probably confused the index/document hashes in the filenames with peer hashes :smiley:

Cheers, and Merry Christmas! :beer: