Scrapy at a Glance: A Preview
Published: 2019-06-19


1. Install Scrapy.

2. Create a crawler project:

    scrapy startproject test_scrapy

3. Create a file named quotes_spider.py.

4. Copy the following code into quotes_spider.py:

    import scrapy  # import the Scrapy module


    # Define the QuotesSpider class
    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        # URL(s) the crawl starts from
        start_urls = [
            'http://quotes.toscrape.com/tag/humor/',
        ]

        def parse(self, response):  # parsing callback
            for quote in response.css('div.quote'):  # each div with class="quote"
                # Record each scraped item as a dict
                yield {
                    'text': quote.css('span.text::text').extract_first(),
                    'author': quote.xpath('span/small/text()').extract_first(),
                }

            # Address of the next page, if there is one
            next_page = response.css('li.next a::attr("href")').extract_first()
            if next_page is not None:
                yield response.follow(next_page, self.parse)

5. Run cd test_scrapy, or otherwise change to the directory that contains quotes_spider.py.

6. Run the spider:

    scrapy runspider quotes_spider.py -o quotes.json

A new file, quotes.json, appears in the directory. Open it and you should see:
    [
    {"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
    {"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
    {"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
    {"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
    {"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
    {"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
    {"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
    {"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
    {"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
    {"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
    {"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
    {"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
    ]
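
The `\u201c` and `\u201d` sequences in quotes.json are JSON escapes for curly quotation marks, not damaged text. As a rough sketch of how the file could be read back, the snippet below parses a two-item excerpt of that output with Python's standard `json` module and counts quotes per author. The helper names `load_quotes` and `count_by_author` are made up for this illustration and are not part of Scrapy.

```python
import json
from collections import Counter

# Two items copied from the quotes.json output shown in this post.
# A raw string keeps the \u201c / \u201d escapes exactly as the file stores them.
sample = r'''[
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"}
]'''


def load_quotes(raw):
    """Parse JSON text into a list of dicts (hypothetical helper)."""
    return json.loads(raw)


def count_by_author(quotes):
    """Count how many quotes each author has (hypothetical helper)."""
    return Counter(q["author"] for q in quotes)


quotes = load_quotes(sample)
for q in quotes:
    # json.loads has already turned the escapes into real curly quotes.
    print(f'{q["author"]}: {q["text"]}')
```

To work on the real file, read it with `json.load` on a file opened with `encoding="utf-8"`. If you would rather have the file contain the characters themselves instead of escapes, Scrapy's `FEED_EXPORT_ENCODING` setting can be set to `utf-8`.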
 

Reposted from: https://www.cnblogs.com/CelonY/p/10173580.html
