Hi,
I am trying to figure out how the document push API works, so that I can “upload” a page's HTML source for indexing.
Following the examples, I tried:
❯ wget "localhost:8090/api/push_p.json?count=1&url-0=nowhere.cc/example4.txt&data-0=hello world&responseHeader-0=Last-Modified:Tue, 15 Nov 1994 12:45:26 GMT&responseHeader-0=Content-Type:text/plain&collection-0=testpush"
which returns:
{ "count":"1", "successall": "true", "item-0":{ "item":"0", "url":"nowhere.cc/example4.txt", "success": "true", "message": "localhost:8090/solr/select?q=sku:"nowhere.cc/example4.txt"" }, "countsuccess":1, "countfail":0}
which seems to confirm success.
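The command above sends raw spaces, commas and colons in the query string, in `data-0` and in both `responseHeader-0` values. The API reported success, so this may well not be the problem, but it is easy to rule out. Below is a minimal Python sketch that builds the same request with every value percent-encoded. It assumes a YaCy peer on localhost:8090 and the un-mangled `http://` form of the URL. `build_push_url` is a hypothetical helper, not a YaCy function:

```python
from urllib.parse import quote, urlencode


# Hypothetical helper (not part of YaCy): builds the push_p.json URL from the
# same parameters as the wget call, percent-encoding every value.
def build_push_url(base, url, data, headers, collection):
    # A list of pairs (not a dict) keeps the repeated responseHeader-0 keys.
    params = [("count", "1"), ("url-0", url), ("data-0", data)]
    params += [("responseHeader-0", h) for h in headers]
    params.append(("collection-0", collection))
    # quote_via=quote encodes spaces as %20 (instead of '+') and, with
    # urlencode's default safe='', also encodes ':', ',' and '/'.
    return base + "/api/push_p.json?" + urlencode(params, quote_via=quote)


push_url = build_push_url(
    "http://localhost:8090",
    "http://nowhere.cc/example4.txt",  # assumed original, un-mangled URL
    "hello world",
    ["Last-Modified:Tue, 15 Nov 1994 12:45:26 GMT", "Content-Type:text/plain"],
    "testpush",
)
print(push_url)
```

The printed URL can be passed to wget inside plain double quotes. With curl, `-G` combined with one `--data-urlencode` per parameter does the same encoding.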
The corresponding log entries (why is the size 0?):
I 2025/12/19 22:32:08 org.apache.solr.core.QuerySenderListener QuerySenderListener done.
I 2025/12/19 22:32:03 SWITCHBOARD Indexed 5 words in URL htt*://nowhere.cc/example4.txt [RISpp7B_OzwQ] Description: MimeType: | Charset: UTF-8 | Size: 0 bytes | LinkStorageTime: 2 ms | indexStorageTime: 0 ms
I 2025/12/19 22:32:03 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={}{add=[RISpp7B_OzwQ (1851973879713497088)]} 0 2
I 2025/12/19 22:32:03 Fulltext * indexing: RISpp7B_OzwQ hp://nowhere.cc/example4.txt
I 2025/12/19 22:32:03 SWITCHBOARD * Excluded 0 words in URL hp://nowhere.cc/example4.txt
W 2025/12/19 22:31:50 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ
I 2025/12/19 22:31:50 org.apache.solr.update.processor.LogUpdateProcessorFactory [webgraph] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 0
I 2025/12/19 22:31:50 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 1
W 2025/12/19 22:31:49 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ
W 2025/12/19 22:31:41 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ
W 2025/12/19 22:31:39 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ
I 2025/12/19 22:31:39 org.apache.solr.update.processor.LogUpdateProcessorFactory [webgraph] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 0
I 2025/12/19 22:31:39 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 0
(I had to mangle the links because new users are limited to 2 links per post.)
And here is the crawl results overview for the indexed file:
(5) Results for Local Crawling
These web pages had been crawled by your own crawl task.
Use Case: start a crawl by setting a crawl start point on the ‘Index Create’ page.
Statistics about 1 domains in this stack:

Domain | URLs
nowhere.cc | 1

Showing all 2 entries in this stack.

Collection | Modified | Words | Title | URL
[testpush] | 1994/11/15 | 0 | no title | hp://nowhere.cc/example4.txt

But when I search for it, I get no results.
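The `message` field of the push response names a Solr query. Opening that query directly shows whether the pushed document reached the Solr index at all, or whether the problem is on the search side. Below is a small sketch that builds the query URL with the `q` value percent-encoded, so it can be pasted into wget or a browser as-is. It assumes the same local peer and the un-mangled `http://` URL. `build_sku_query` is a hypothetical helper:

```python
from urllib.parse import quote, urlencode


# Hypothetical helper (not part of YaCy): builds the Solr query URL that the
# push response's "message" field points to, with the q value percent-encoded.
def build_sku_query(base, url):
    # sku:"<url>" is the query shown in the push response; the double quotes
    # keep the ':' and '/' inside the URL from being parsed as query syntax.
    q = 'sku:"' + url + '"'
    return base + "/solr/select?" + urlencode({"q": q}, quote_via=quote)


query_url = build_sku_query("http://localhost:8090", "http://nowhere.cc/example4.txt")
print(query_url)
# prints http://localhost:8090/solr/select?q=sku%3A%22http%3A%2F%2Fnowhere.cc%2Fexample4.txt%22
```

A `numFound` of 0 in the result would suggest the document never made it into the index. A hit would point at the search side instead.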
Does anyone have an idea what's wrong here, or a working example?
BTW: indexing and searching work great for web pages crawled from the internet.
Thanks for any help!