The /date search operator sorts results by the “date indexed”. If I crawl a huge news site, all pages, even the really historical ones (the NYTimes has archives dating back to the 19th century, for example), are dated as “today” (or sometimes even in the future, if the page says so). Would it be possible to apply some heuristics to find the real publication date, probably using some combination of metadata? Other search engines manage to do this somehow.
Is it possible to switch the /date operator to use the indexed HTTP date_modified header in Solr instead (or, in the best case, a heuristically determined date)?
Inspiration could be, for example, this Python code (line 192 onwards), which uses various methods to detect the publication date.
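
To make the idea more concrete, here is a minimal sketch of what such a heuristic could look like. It is not taken from the linked code and not from the search engine itself; the function name `guess_published_date`, the list of meta keys and the order of the checks are all my own assumptions. It tries page metadata first, then a date in the URL path, then the HTTP Last-Modified value, and it ignores any date that lies in the future.

```python
import re
from datetime import date
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Meta keys that often carry a publication date (illustrative, not exhaustive).
META_KEYS = ("article:published_time", "og:published_time",
             "datepublished", "dc.date", "pubdate", "date")

ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


class _DateTagParser(HTMLParser):
    """Collects candidate date strings from <meta> and <time> tags."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        a = {k: v for k, v in attrs if v is not None}
        if tag == "meta":
            key = (a.get("property") or a.get("name") or a.get("itemprop") or "")
            if key.lower() in META_KEYS and "content" in a:
                self.candidates.append(a["content"])
        elif tag == "time" and "datetime" in a:
            self.candidates.append(a["datetime"])


def _to_date(year, month, day):
    """Build a date, returning None for impossible values such as month 13."""
    try:
        return date(int(year), int(month), int(day))
    except ValueError:
        return None


def guess_published_date(html, url="", last_modified=None, today=None):
    """Return the first plausible (non-future) date found, or None."""
    today = today or date.today()
    parser = _DateTagParser()
    parser.feed(html)

    found = []
    # 1. Metadata in the page itself.
    for text in parser.candidates:
        m = ISO_DATE.search(text)
        if m:
            found.append(_to_date(*m.groups()))
    # 2. A /YYYY/MM/DD/ pattern in the URL path.
    m = URL_DATE.search(url)
    if m:
        found.append(_to_date(*m.groups()))
    # 3. The HTTP Last-Modified header value, if the crawler kept it.
    if last_modified:
        try:
            found.append(parsedate_to_datetime(last_modified).date())
        except (TypeError, ValueError):
            pass

    for candidate in found:
        if candidate is not None and candidate <= today:
            return candidate
    return None
```

The result could then be written to whatever Solr date field /date sorts on, falling back to the indexing date only when the function returns None. Which field that would be is something I have not checked.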