基本信息
源码名称:获取上海二手房
源码大小:3.03M
文件格式:.csv
开发语言:Python
更新时间:2024-07-27
友情提示:(无需注册或充值,赞助后即可获取资源下载链接)
嘿,亲!知识可是无价之宝呢,但咱这精心整理的资料也耗费了不少心血呀。小小地破费一下,绝对物超所值哦!如有下载和支付问题,请联系我们QQ(微信同号):813200300
本次赞助数额为: 2 元×
微信扫码支付:2 元
×
请留下您的邮箱,我们将在2小时内将文件发到您的邮箱
源码介绍
爬虫获取上海二手房
def ua():
"""随机获取一个浏览器用户信息"""
user_agents = [
'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
'Opera/9.25 (Windows NT 5.1; U; en)',
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',
'Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7',
'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0',
]
agent = random.choice(user_agents)
return {
'User-Agent': agent
}
def get(url):
"""
获取网页源码
url: 目标网页的地址
return:网页源码
"""
res = requests.get(url=url, headers = ua())
return res.text
def get_url(res_text):
"""
获取源码中每个二手房详情页的url
res_text:网页源码
return:列表形式的30个二手房详情页的url
"""
re_f = '<a class="" href="(.*?)" target="_blank"'
url_list = re.findall(re_f, res_text)
return url_list
爬虫获取上海二手房
def ua():
"""随机获取一个浏览器用户信息"""
user_agents = [
'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
'Opera/9.25 (Windows NT 5.1; U; en)',
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',
'Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7',
'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0',
]
agent = random.choice(user_agents)
return {
'User-Agent': agent
}
def get(url):
"""
获取网页源码
url: 目标网页的地址
return:网页源码
"""
res = requests.get(url=url, headers = ua())
return res.text
def get_url(res_text):
"""
获取源码中每个二手房详情页的url
res_text:网页源码
return:列表形式的30个二手房详情页的url
"""
re_f = '<a class="" href="(.*?)" target="_blank"'
url_list = re.findall(re_f, res_text)
return url_list