So, I discovered the other day that one of my YaCy peers was complaining about the disk being full. I cleaned up whatever I could using the admin panels, but to no avail.
Shortly after, though, I discovered ~9 GB(!!) of quite old data lingering in DATA/SURROGATES/out/.
The files had apparently been neither accessed nor modified in quite some time.
(This might be because of changes in the requesting peers, or rare/never-returning peers ‘leaving behind’ their requested data.)
I concluded they were probably leftover files of no use, so I decided to use tmpreaper.
In case someone finds it useful: the following is a crontab entry I currently use to prevent lingering ‘out’ files. It deletes them 12 hours after their modification time (mtime, which for files written only once is effectively their creation time) if they have not been accessed (atime) in the last 12 hours. The check runs at reboot and then every 15 minutes.
(Note that the user in this case is ‘yacy’; adjust as needed.)
Keep in mind that the mtime argument (-m) will likely need to be added if the noatime option has been set for the disk/partition in /etc/fstab (often done for SSDs etc.), since the filesystem then no longer records when a file was last accessed.
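tmpreaper does the real work here, but the age rule described above can be illustrated with a small Python sketch (this is not tmpreaper's code). It assumes a 12-hour threshold and, by default, treats a file as stale only when both its atime and its mtime are older than that; with `use_mtime=True` it looks at mtime alone, mimicking the `-m` workaround for `noatime` mounts. `reap` and `is_stale` are hypothetical names, the sketch only removes regular files (unlike tmpreaper it leaves directories alone), and it defaults to a dry run.

```python
import os
import time
from pathlib import Path

TWELVE_HOURS = 12 * 60 * 60  # seconds; the 12h threshold from the cron setup above


def is_stale(path, max_age, now, use_mtime=False):
    """Return True if the file at `path` is older than `max_age` seconds.

    Default: both atime and mtime must be older than max_age.
    use_mtime=True: only mtime is checked (for noatime mounts, where
    reads do not update atime).
    """
    st = os.stat(path)
    mtime_old = now - st.st_mtime > max_age
    if use_mtime:
        return mtime_old
    return mtime_old and now - st.st_atime > max_age


def reap(directory, max_age=TWELVE_HOURS, use_mtime=False, dry_run=True, now=None):
    """Find stale regular files under `directory`; delete them unless dry_run.

    Returns the sorted list of stale files (deleted or not).
    """
    now = time.time() if now is None else now
    stale = []
    # Materialize the listing first so deleting does not disturb the walk.
    for entry in list(Path(directory).rglob("*")):
        if entry.is_symlink() or not entry.is_file():
            continue  # skip symlinks and directories
        if is_stale(entry, max_age, now, use_mtime):
            stale.append(entry)
            if not dry_run:
                entry.unlink()
    return sorted(stale)
```

For example, `reap("/home/yacy/yacy/DATA/SURROGATES/out")` (a hypothetical install path) would list what would be removed without touching anything; passing `dry_run=False` actually deletes it.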
As far as I remember, SURROGATES/out/ only contains what the user themselves put into SURROGATES/in/. Files are moved there as soon as they have been processed from in/.
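A minimal model of the in/ → out/ flow just described, assuming a simple "process each file, then move it" loop. This is not YaCy's actual code; `process_surrogates` and the `handle` callback are hypothetical names. In this model nothing ever deletes the moved files, which is why out/ can keep growing until someone cleans it up.

```python
import shutil
from pathlib import Path


def process_surrogates(in_dir, out_dir, handle):
    """Simplified model: process every file in in_dir, then move it to out_dir.

    `handle` stands in for whatever parses/indexes a file. If it raises,
    the file is not moved and stays in in_dir.
    Returns the list of paths the files were moved to.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(Path(in_dir).iterdir()):
        if not f.is_file():
            continue
        handle(f)  # process first...
        target = out / f.name
        shutil.move(str(f), str(target))  # ...then move to out/
        moved.append(target)
    return moved
```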
Ah, OK. I simply assumed they were .gz files intended to be sent out to other peers, since the ones I glanced at (there were A LOT of them) had names containing what looked to me a lot like peer ‘hashes’ (as seen in the ‘robinson’ network setting etc.).
I assume they were caused by an interrupted import/unpacking of an exported XML .gz index from one of my other machines (or perhaps a WARC?), likely from when I first began experimenting with YaCy a relatively short while back. Anyway, there have not been any more ‘leftovers’ so far.
Still, unless it could actually cause a problem, I guess it doesn’t hurt to keep the tmpreaper cron job as a sort of backup in case of a crash or power outage in the middle of importing something in the future.
And I probably confused the index/document hashes in the filenames with peer hashes, now that I’ve become a bit more accustomed to reading the YaCy log.
I just renamed the surrogate functionality and combined exporting/importing into one concept: packs.
The following paths have now been renamed:
DATA/SURROGATES/in → DATA/PACKS/load
DATA/SURROGATES/out → DATA/PACKS/loaded
DATA/EXPORT → DATA/PACKS/hold
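Anything that still points at the old directories (for instance the tmpreaper cron job from earlier in this thread) needs updating. A hypothetical helper for that, sketched in Python and not part of YaCy, could translate old paths using the renames listed above, with the old surrogate directories written as DATA/SURROGATES/… as in the earlier posts:

```python
from pathlib import PurePosixPath

# Old → new locations, relative to the YaCy install directory.
PACK_RENAMES = {
    "DATA/SURROGATES/in": "DATA/PACKS/load",
    "DATA/SURROGATES/out": "DATA/PACKS/loaded",
    "DATA/EXPORT": "DATA/PACKS/hold",
}


def migrate_path(path):
    """Rewrite a path under the old layout to the new PACKS layout.

    Matches the old directory as a run of whole path components anywhere
    in `path`; paths that match nothing are returned unchanged.
    """
    p = PurePosixPath(path)
    for old, new in PACK_RENAMES.items():
        old_parts = PurePosixPath(old).parts
        n = len(old_parts)
        for i in range(len(p.parts) - n + 1):
            if p.parts[i:i + n] == old_parts:
                return str(PurePosixPath(*p.parts[:i], *PurePosixPath(new).parts, *p.parts[i + n:]))
    return str(p)
```

For example, `migrate_path("/home/yacy/yacy/DATA/SURROGATES/out/")` gives `/home/yacy/yacy/DATA/PACKS/loaded`, the directory a cleanup job would now need to watch.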
There will also be a pack manager that can trigger the import, export and delete actions around this concept, as well as an option to import packs directly from web addresses.