Document Push API example?

DoerrSt · 19 December 2025 21:47

Hi,

I am trying to figure out how the document push API works, so that I can “upload” page HTML source for indexing.

Following the examples, I tried:

❯ wget "localhost:8090/api/push_p.json?count=1&url-0=nowhere.cc/example4.txt&data-0=hello world&responseHeader-0=Last-Modified:Tue, 15 Nov 1994 12:45:26 GMT&responseHeader-0=Content-Type:text/plain&collection-0=testpush"

which returns:

{ "count":"1", "successall": "true", "item-0":{ "item":"0", "url":"nowhere.cc/example4.txt", "success": "true", "message": "localhost:8090/solr/select?q=sku:"nowhere.cc/example4.txt"" }, "countsuccess":1, "countfail":0}

which seems to confirm “success”.

Corresponding log entries (size 0?):
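For anyone scripting this check instead of reading the JSON by eye, here is a minimal Python sketch. It assumes the reply has exactly the shape shown above (string values for `count`, `successall` and each item's `success`); `summarize_push_response` is a made-up helper name, not part of the API. Note that it only reports what the server claims about the push, it says nothing about whether the document is searchable afterwards.

```python
import json


def summarize_push_response(text):
    """Return (all_ok, failed_urls) for a push_p.json reply body.

    Assumption: the reply looks like the one quoted in this post, i.e.
    "count" is a string and every pushed document has an "item-N" object
    whose "success" field is the string "true" or "false".
    """
    reply = json.loads(text)
    count = int(reply["count"])
    failed = []
    for i in range(count):
        item = reply.get(f"item-{i}", {})
        if item.get("success") != "true":
            # Fall back to the index if the item carries no URL.
            failed.append(item.get("url", f"item-{i}"))
    all_ok = reply.get("successall") == "true" and not failed
    return all_ok, failed


if __name__ == "__main__":
    sample = (
        '{"count":"1","successall":"true",'
        '"item-0":{"item":"0","url":"nowhere.cc/example4.txt","success":"true"},'
        '"countsuccess":1,"countfail":0}'
    )
    print(summarize_push_response(sample))
```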

I 2025/12/19 22:32:08 org.apache.solr.core.QuerySenderListener QuerySenderListener done.

I 2025/12/19 22:32:03 SWITCHBOARD Indexed 5 words in URL htt*://nowhere.cc/example4.txt [RISpp7B_OzwQ] Description: MimeType: | Charset: UTF-8 | Size: 0 bytes | LinkStorageTime: 2 ms | indexStorageTime: 0 ms

I 2025/12/19 22:32:03 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={}{add=[RISpp7B_OzwQ (1851973879713497088)]} 0 2

I 2025/12/19 22:32:03 Fulltext * indexing: RISpp7B_OzwQ hp://nowhere.cc/example4.txt

I 2025/12/19 22:32:03 SWITCHBOARD * Excluded 0 words in URL hp://nowhere.cc/example4.txt

W 2025/12/19 22:31:50 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ

I 2025/12/19 22:31:50 org.apache.solr.update.processor.LogUpdateProcessorFactory [webgraph] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 0

I 2025/12/19 22:31:50 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 1

W 2025/12/19 22:31:49 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ

W 2025/12/19 22:31:41 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ

W 2025/12/19 22:31:39 PLASMA * CrawlResults: URL not in index with url hash H6t8w7B_OzwQ

I 2025/12/19 22:31:39 org.apache.solr.update.processor.LogUpdateProcessorFactory [webgraph] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 0

I 2025/12/19 22:31:39 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={commit=true&softCommit=true&waitSearcher=true}{commit=} 0 0

(I had to modify the links, as new users are limited to 2 per post.)

And the overview for indexed files:

(5) Results for Local Crawling

These web pages had been crawled by your own crawl task.

Use Case: start a crawl by setting a crawl start point on the ‘Index Create’ page.

Statistics about 1 domains in this stack:

Blacklist to use: url.default.black

| Domain | URLs |
|---|---|
| nowhere.cc | 1 |

Showing all 2 entries in this stack.

| Collection | Modified | Words | Title | URL |
|---|---|---|---|---|
| [testpush] | 1994/11/15 | 0 | no title | hp://nowhere.cc/example4.txt |

But when I search for it, no results are found.

Does anyone have an idea what’s wrong here, or a working example?

BTW: Indexing and searching work great when indexing web pages on the internet.

Thanks for any help!

roamn · 22 December 2025 07:50

Hi @DoerrSt, welcome!

Try this layout.

wget "http://localhost:8090/api/push_p.json?\
count=1&\
url-0=http://example.local/hello.txt&\
data-0=Hello%20World&\
responseHeader-0=Content-Type:text/plain"
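If you end up scripting the push rather than typing the URL by hand, it may help to let a library percent-encode every value, so spaces, commas and colons never go into the query string raw. A minimal Python sketch, assuming the server decodes standard percent-encoding: the parameter names (`count`, `url-N`, `data-N`, `responseHeader-N`, `collection-N`) are taken from the posts above, while `build_push_url` and the dict layout it expects are made up for illustration.

```python
from urllib.parse import quote, urlencode


def build_push_url(base, docs):
    """Build a push_p.json URL for a list of documents.

    Hypothetical helper: each doc is a dict with 'url' and 'data', plus an
    optional 'headers' list of "Name:value" strings and an optional
    'collection' name.
    """
    params = [("count", str(len(docs)))]
    for i, doc in enumerate(docs):
        params.append((f"url-{i}", doc["url"]))
        params.append((f"data-{i}", doc["data"]))
        # A header name may repeat, so params is a list of pairs, not a dict.
        for header in doc.get("headers", []):
            params.append((f"responseHeader-{i}", header))
        if "collection" in doc:
            params.append((f"collection-{i}", doc["collection"]))
    # quote_via=quote encodes a space as %20 (the default would emit '+').
    query = urlencode(params, quote_via=quote)
    return f"{base.rstrip('/')}/api/push_p.json?{query}"


if __name__ == "__main__":
    print(build_push_url(
        "http://localhost:8090",
        [{
            "url": "http://example.local/hello.txt",
            "data": "Hello World",
            "headers": ["Content-Type:text/plain"],
        }],
    ))
```

The printed URL can be passed to wget in quotes exactly like the command above; it carries the same parameters, just with the reserved characters in the values encoded as well.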