How much disk space does an index actually need? I've been crawling a few sites for weeks and the index keeps growing. I now have about 15,000,000 documents in the index, and it takes up about 350 GB on disk.
Is the index an exact textual copy of the web pages? If so, I can understand the size. Or is the index some kind of reduced list of the words the pages contain?
Is there anything that can be done to reduce the disk usage?
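For a sense of scale, here is a quick back-of-the-envelope calculation of the average space per document, using only the two figures above. It is a rough sketch: it assumes 1 GB means 10^9 bytes and that the whole 350 GB belongs to the index, neither of which has been verified.

```python
# Figures taken from the post; 1 GB is ASSUMED to mean 10**9 bytes.
documents = 15_000_000
disk_bytes = 350 * 10**9

# Average on-disk footprint per indexed document.
bytes_per_document = disk_bytes / documents

print(f"{bytes_per_document / 1000:.1f} KB per document")  # prints "23.3 KB per document"
```

That average is only a ballpark figure for comparing against typical page sizes; it says nothing about how the space is split between the different kinds of stored data.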
Aah, OK, thanks a lot! I haven't used the advanced crawl yet; I only used the simple crawl to start with, and it doesn't seem to let you set that. Are the images also saved in the simple crawl?