solr - How to crawl a website by specifying depth
I am using Nutch 2.x, and I am trying to use the nutch command with the depth option:
$: nutch inject ./urls/seed.txt -depth 5
After running this command I get a message like:
Unrecognized arg -depth
When that failed, I tried nutch crawl instead:
$: nutch crawl ./urls/seed.txt -depth 5
which gives an error like:
Command crawl is deprecated, please use bin/crawl instead
So I tried the bin/crawl script to crawl the seed URLs. In that case, along with the depth option, it asks for a Solr URL, but I am not using Solr.
So my question is: how do I crawl a website by specifying the depth?
My question is: what do you want to achieve by crawling the pages without indexing them in Solr?
Answer to your question:
If you want to use the Nutch crawler and do not want to index into Solr, remove the following piece of code from the crawl script.
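As a rough sketch of what remains once the Solr-related steps are taken out of the crawl script, the loop below runs one generate/fetch/parse/updatedb cycle per level of depth. Everything here is an assumption to verify against your own installation: the `NUTCH_BIN` path, the `-topN 1000` value, the crawl id, and the exact arguments of each subcommand, which differ between 2.x releases (running `bin/nutch <command>` without arguments prints its usage). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: crawl to a given depth with Nutch 2.x and no Solr indexing.
# NUTCH_BIN, the crawl id and -topN 1000 are illustrative assumptions,
# not values taken from the original post.

NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"   # assumed location of the nutch launcher
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, 0 = run them

# Print a command and, unless DRY_RUN is 1, execute it.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# crawl_without_solr <seed_dir> <crawl_id> <depth>
# One round per depth level; each round follows links found in the previous one.
crawl_without_solr() {
  seed_dir="$1"
  crawl_id="$2"
  depth="$3"

  run_step "$NUTCH_BIN" inject "$seed_dir" -crawlId "$crawl_id" || return 1

  round=1
  while [ "$round" -le "$depth" ]; do
    echo "# round $round of $depth"
    run_step "$NUTCH_BIN" generate -topN 1000 -crawlId "$crawl_id" || return 1
    run_step "$NUTCH_BIN" fetch -all -crawlId "$crawl_id" || return 1
    run_step "$NUTCH_BIN" parse -all -crawlId "$crawl_id" || return 1
    # Some 2.x releases expect a batch id or -all for updatedb as well.
    run_step "$NUTCH_BIN" updatedb -crawlId "$crawl_id" || return 1
    # The indexing and dedup steps are deliberately absent from this loop.
    round=$((round + 1))
  done
}
```

Calling `crawl_without_solr ./urls mycrawl 5` prints the commands for a depth of 5. Setting `DRY_RUN=0` executes them, which is only worth doing after checking each subcommand's usage output on your Nutch version.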
Answer to your other question:
How to get the HTML content for all the links that have been crawled by Nutch (see this link):
This will definitely solve your issue.