python - When and how to use multiple spiders in one Scrapy project
I'm using Scrapy, and it is great! It's so fast to build a crawler. But the number of web sites to crawl keeps increasing and new spiders need to be created. These web sites are all of the same type, so all the spiders use the same items, pipelines and parsing process.
Contents of the project directory:
```
test/
├── scrapy.cfg
└── test
    ├── __init__.py
    ├── items.py
    ├── mybasespider.py
    ├── pipelines.py
    ├── settings.py
    ├── spider1_settings.py
    ├── spider2_settings.py
    └── spiders
        ├── __init__.py
        ├── spider1.py
        └── spider2.py
```

To reduce source code redundancy, there is a base spider MyBaseSpider in mybasespider.py. It contains about 95% of the source code and all the other spiders inherit from it; if a spider needs something special, it just overrides some class methods. In general, only a few lines of source code are needed to create a new spider.
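A rough sketch of that base-spider pattern (the actual code isn't shown in the question; the class layout, selectors and the build_item helper below are assumptions for illustration only):

```python
import scrapy


class MyBaseSpider(scrapy.Spider):
    """Holds the ~95% of logic shared by every site of this type."""

    # hypothetical default, overridable per site
    item_selector = 'div.listing'

    def parse(self, response):
        # shared parsing process: walk the listing rows and build items
        for row in response.css(self.item_selector):
            yield self.build_item(row, response)

    def build_item(self, row, response):
        # shared item construction; a subclass overrides this if a site differs
        return {
            'title': row.css('a::text').get(),
            'url': response.urljoin(row.css('a::attr(href)').get()),
        }


class Spider1(MyBaseSpider):
    # a new spider typically needs only a few lines like these
    name = 'spider1'
    start_urls = ['http://test1.com/']
    item_selector = 'table.results tr'  # site-specific override
```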
All the common settings are kept in settings.py, and one spider's special settings go in [spider name]_settings.py. For example, the special settings of spider1 in spider1_settings.py:
```python
from settings import *

LOG_FILE = 'spider1.log'
LOG_LEVEL = 'INFO'
JOBDIR = 'spider1-job'
START_URLS = ['http://test1.com/']
```
The special settings of spider2 in spider2_settings.py:

```python
from settings import *

LOG_FILE = 'spider2.log'
LOG_LEVEL = 'DEBUG'
JOBDIR = 'spider2-job'
START_URLS = ['http://test2.com/']
```
Scrapy uses LOG_FILE, LOG_LEVEL and JOBDIR before launching a spider. All the URLs in START_URLS are filled into MyBaseSpider.start_urls; different spiders have different contents, but the name START_URLS used in the base spider MyBaseSpider never changes.
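One way that filling could be done (a minimal sketch; the question doesn't show the actual mechanism, so reading the setting in from_crawler is an assumption):

```python
import scrapy


class MyBaseSpider(scrapy.Spider):

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # START_URLS comes from whichever spiderX_settings.py module is active,
        # so every subclass gets its own start_urls without redefining them
        spider.start_urls = crawler.settings.getlist('START_URLS')
        return spider
```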
The contents of scrapy.cfg:

```
[settings]
default = test.settings
spider1 = spider1.settings
spider2 = spider2.settings

[deploy]
url = http://localhost:6800/
project = test
```
To run a spider, such as spider1:

1. export SCRAPY_PROJECT=spider1
2. scrapy crawl spider1
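For reference, the same selection can be done programmatically (a sketch, not part of the question's setup, assuming it is run from the project root so scrapy.cfg can be found):

```python
import os

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Equivalent of `export SCRAPY_PROJECT=spider1`: get_project_settings()
# resolves this name through the [settings] section of scrapy.cfg.
os.environ['SCRAPY_PROJECT'] = 'spider1'

process = CrawlerProcess(get_project_settings())
process.crawl('spider1')  # spider name as registered in the project
process.start()
```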
But this way of selecting a settings module can't be used to run spiders in scrapyd: the scrapyd-deploy command always uses the 'default' project name from the [settings] section of scrapy.cfg to build an egg file and deploy it to scrapyd.
I have a few questions:

1. Is this the right way to use multiple spiders in one project if I don't build one project per spider? Are there any better ways?
2. How can I separate a spider's special settings as above so that the spiders can run in scrapyd while still reducing source code redundancy?
3. If all spiders use the same JOBDIR, is it safe to run all the spiders concurrently? Will the persistent spider state be corrupted?
Any insights would be highly appreciated.

All spiders should have their own class; you can set per-spider settings with the custom_settings class attribute, something like this:
```python
from scrapy import Spider


class MySpider1(Spider):
    name = "spider1"
    custom_settings = {'USER_AGENT': 'user_agent_for_spider1/version1'}


class MySpider2(Spider):
    name = "spider2"
    custom_settings = {'USER_AGENT': 'user_agent_for_spider2/version2'}
```
These custom_settings will override the ones from the settings.py file, so you can still set some global ones there.
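As a concrete example, the per-spider values from the question (e.g. JOBDIR and the start URLs) could move into custom_settings and class attributes; a sketch, assuming a project-wide USER_AGENT is defined in settings.py:

```python
import scrapy


class MySpider1(scrapy.Spider):
    name = 'spider1'
    start_urls = ['http://test1.com/']
    custom_settings = {
        'USER_AGENT': 'user_agent_for_spider1/version1',  # overrides settings.py
        'JOBDIR': 'spider1-job',                          # per-spider persistence dir
    }

    def parse(self, response):
        # self.settings is the merged view: custom_settings wins over settings.py
        self.logger.info('USER_AGENT=%s', self.settings.get('USER_AGENT'))
```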