Scrapy 0.14 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 699.79 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['description'] = x.select("//div[@id='description']").extract() torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract() return torrent … For brevity's sake, we intentionally left out the import statements.
179 pages | 861.70 KB | 1 year ago
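The snippets above quote the 0.x tutorial mid-sentence. For context, here is a minimal sketch of the spider they are quoting, assuming a Scrapy 0.14-era install (the HtmlXPathSelector and scrapy.contrib import paths no longer exist in modern Scrapy). Only the description and size lines appear verbatim in the snippet; the item fields, crawl rule, and URLs are reconstructed from the tutorial and should be treated as illustrative.

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.item import Item, Field
    from scrapy.selector import HtmlXPathSelector

    class TorrentItem(Item):
        url = Field()
        name = Field()
        description = Field()
        size = Field()

    class MininovaSpider(CrawlSpider):
        name = 'mininova.org'
        allowed_domains = ['mininova.org']
        start_urls = ['http://www.mininova.org/today']
        # Follow links to individual torrent pages; parse each with parse_torrent.
        rules = [Rule(SgmlLinkExtractor(allow=[r'/tor/\d+']), 'parse_torrent')]

        def parse_torrent(self, response):
            x = HtmlXPathSelector(response)
            torrent = TorrentItem()
            torrent['url'] = response.url
            torrent['name'] = x.select("//h1/text()").extract()
            torrent['description'] = x.select("//div[@id='description']").extract()
            torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract()
            return torrent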

Scrapy 0.22 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 150.62 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['description'] = sel.xpath("//div[@id='description']").extract() torrent['size'] = sel.xpath("//div[@id='info-left']/p[2]/text()[2]").extract() return torrent … The TorrentItem class is defined above. 2.1.4 Run the spider…
199 pages | 926.97 KB | 1 year ago

Scrapy 0.20 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 699.79 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['description'] = sel.xpath("//div[@id='description']").extract() torrent['size'] = sel.xpath("//div[@id='info-left']/p[2]/text()[2]").extract() return torrent … For brevity's sake, we intentionally left out the import statements.
197 pages | 917.28 KB | 1 year ago

Scrapy 0.18 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 699.79 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['description'] = x.select("//div[@id='description']").extract() torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract() return torrent … For brevity's sake, we intentionally left out the import statements.
201 pages | 929.55 KB | 1 year ago

Scrapy 0.16 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 699.79 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['description'] = x.select("//div[@id='description']").extract() torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract() return torrent … For brevity's sake, we intentionally left out the import statements.
203 pages | 931.99 KB | 1 year ago

Scrapy 0.24 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 150.62 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['description'] = response.xpath("//div[@id='description']").extract() torrent['size'] = response.xpath("//div[@id='specifications']/p[2]/text()[2]").extract() return torrent … The TorrentItem class is defined above. 2.1.4 Run the spider…
222 pages | 988.92 KB | 1 year ago
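By 0.24 the tutorial drops the explicit selector object: the response itself exposes .xpath(), as the quoted lines show. A sketch of the same callback in that style, reusing the TorrentItem from the 0.x sketch above; the url and name lines are assumptions carried over from the older versions, not quoted from the snippet.

    # Inside the spider class; TorrentItem as defined in the earlier sketch.
    def parse_torrent(self, response):
        torrent = TorrentItem()
        torrent['url'] = response.url
        torrent['name'] = response.xpath("//h1/text()").extract()
        torrent['description'] = response.xpath("//div[@id='description']").extract()
        torrent['size'] = response.xpath("//div[@id='specifications']/p[2]/text()[2]").extract()
        return torrent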

Scrapy 0.9 Documentation
…second <p> tag inside the <div> tag with id="specifications": <p>Category: Movies > Documentary</p> <p>Total size: 699.79 megabyte</p> An XPath expression to select the file size could be: //div[@id='specifications']/p[2]/text()[2] For more information about XPath see the XPath reference … torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract() return torrent … For brevity's sake, we intentionally left out the import statements.
156 pages | 764.56 KB | 1 year ago

Scrapy 1.0 Documentation
…stackoverflow_spider.py and run the spider using the runspider command: scrapy runspider stackoverflow_spider.py -o top-stackoverflow-questions.json When this finishes you will have in the top-stackoverflow-questions… … • lxml. Most Linux distributions ship prepackaged versions of lxml. Otherwise refer to http://lxml.de/installation.html • OpenSSL. This comes preinstalled in all operating systems, except Windows where… … the easiest way to store the scraped data is by using Feed exports, with the following command: scrapy crawl dmoz -o items.json That will generate an items.json file containing all scraped items, serialized in JSON.
244 pages | 1.05 MB | 1 year ago

Scrapy 1.1 Documentation
…quotes_spider.py and run the spider using the runspider command: scrapy runspider quotes_spider.py -o quotes.json When this finishes you will have in the quotes.json file a list of the quotes in JSON format … • lxml. Most Linux distributions ship prepackaged versions of lxml. Otherwise refer to http://lxml.de/installation.html • OpenSSL. This comes preinstalled in all operating systems, except Windows where… … the easiest way to store the scraped data is by using Feed exports, with the following command: scrapy crawl quotes -o quotes.json That will generate a quotes.json file containing all scraped items, serialized in JSON.
260 pages | 1.12 MB | 1 year ago
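The 1.x snippets run a single self-contained file with scrapy runspider rather than a full project. A minimal quotes_spider.py of the kind they describe, assuming Scrapy 1.1 or later; the CSS selectors are modeled on quotes.toscrape.com and are illustrative, not quoted from the listing.

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = 'quotes'
        start_urls = ['http://quotes.toscrape.com/']

        def parse(self, response):
            # Yield one dict per quote block on the page.
            for quote in response.css('div.quote'):
                yield {
                    'text': quote.css('span.text::text').extract_first(),
                    'author': quote.css('small.author::text').extract_first(),
                }
            # Follow pagination, if present (default callback is parse again).
            next_page = response.css('li.next a::attr(href)').extract_first()
            if next_page is not None:
                yield scrapy.Request(response.urljoin(next_page))

Running scrapy runspider quotes_spider.py -o quotes.json then serializes every yielded dict into quotes.json, exactly as the snippets describe.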

Scrapy 1.2 Documentation
…quotes_spider.py and run the spider using the runspider command: scrapy runspider quotes_spider.py -o quotes.json When this finishes you will have in the quotes.json file a list of the quotes in JSON format … the easiest way to store the scraped data is by using Feed exports, with the following command: scrapy crawl quotes -o quotes.json That will generate a quotes.json file containing all scraped items, serialized in JSON … up with a broken JSON file. You can also use other formats, like JSON Lines: scrapy crawl quotes -o quotes.jl The JSON Lines format is useful because it's stream-like; you can easily append new records…
266 pages | 1.10 MB | 1 year ago
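The append-friendliness the 1.2 snippet mentions is what distinguishes JSON Lines from a single JSON array: each line is an independent JSON document, so new records can be added without rewriting or re-parsing the file. A small standard-library sketch; the quotes.jl file name follows the snippet, and the record contents are made up.

    import json

    # Appending a record touches only the end of the file.
    with open('quotes.jl', 'a', encoding='utf-8') as f:
        f.write(json.dumps({'text': 'An example quote.', 'author': 'Anon'}) + '\n')

    # Reading back: one json.loads() per line, no whole-file parse needed.
    with open('quotes.jl', encoding='utf-8') as f:
        records = [json.loads(line) for line in f if line.strip()]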