Python爬虫之正则表达式(3) Python 第1张Python爬虫之正则表达式(3) Python 第2张Python爬虫之正则表达式(3) Python 第3张Python爬虫之正则表达式(3) Python 第4张Python爬虫之正则表达式(3) Python 第5张Python爬虫之正则表达式(3) Python 第6张Python爬虫之正则表达式(3) Python 第7张

 1 # re.sub
 2 # 替换字符串中每一个匹配的子串后返回替换后的字符串
 3 import re
 4 content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
 5 content = re.sub('\d+', '', content)
 6 print(content)
 7 
 8 import re
 9 content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
10 content = re.sub('\d+', 'Replacement', content)
11 print(content)
12 
13 # \1 是转义字符
14 import re
15 content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
16 content = re.sub('(\d+)', r'\1 8910', content)
17 print(content)
18 
19 # re.compile
20 # 将正则字符串编译成正则表达式对象
21 # 将一个正则表达式串编译成正则对象,以便于复用该匹配模式
22 import re
23 content = '''Hello 1234567 World_This
24 is a Regex Demo'''
25 pattern = re.compile('Hello.*Demo', re.S)
26 result = re.match(pattern, content)
27 print(result)

下面是爬取豆瓣图书的实战代码

 1 import requests
 2 import re
 3 content = requests.get('https://book.douban.com/').text
 4 # print(content)
 5 pattern = re.compile('<li.*?cover.*?title="(.*?)".*?author">(.*?)</div>.*?year">(.*?)</span>.*?</li>', re.S)
 6 results = re.findall(pattern, content)
 7 for result in results:
 8     name, author, date = result
 9     author = re.sub("\s", "", author)
10     date = re.sub("\s", "", date)
11     print("【书名】:", name, " 【作者】:", author, " 【出版年】:", date)

本篇内容为:崔庆才爬虫学习笔记

SRE实战 互联网时代守护先锋,助力企业售后服务体系运筹帷幄!一键直达领取阿里云限量特价优惠。
扫码关注我们
微信号:SRE实战
拒绝背锅 运筹帷幄