Python Third-Party Library Sites (1)

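Remember to set up a fast package source first. Python projects often need third-party libraries, and downloads from the default index can be very slow, while a mirror inside China is much faster.

Quick download of a package:

    pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name

Domestic mirror URLs:

    Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
    Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
    University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
    Douban: http://pypi.douban.com/simple/

Usage: run pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name and you are done. For a mirror served over plain http, pip also needs --trusted-host followed by the mirror's host name.

Permanent change:

Passing -i on every install is tedious, so here is a way to make the setting permanent.

On Linux, edit ~/.pip/pip.conf (create the directory and the file if they do not exist; the leading "." in the directory name marks it as hidden) with the following content:

    [global]
    index-url = https://pypi.tuna.tsinghua.edu.cn/simple
    [install]
    trusted-host = pypi.tuna.tsinghua.edu.cn

On Windows, create a pip directory in your user directory and a new file pip.ini inside it (for example C:\Users\zzk\pip\pip.ini) with the same content.

Official site:

Requests: HTTP for Humans (official requests documentation): https://docs.python-requests.org/zh_CN/latest/

Requests:

To crawl a site you first have to request its URL, and that is where the requests module comes in. requests is an HTTP client library for Python, similar to urllib and urllib2 but with a much simpler syntax. As its official site puts it, Requests is "HTTP for Humans".

Introduction to the requests module:

It sends HTTP requests and retrieves the response data. requests is a third-party module, so it has to be installed separately in your Python (virtual) environment:

    pip install requests
    (or: pip3 install requests)

requests basics:

Sending a GET request with the requests module:

    # https://beishan.blog.csdn.net/
    import requests

    # Target URL
    url = 'https://www.baidu.com'

    # Send a GET request to the target URL
    response = requests.get(url)

    # Print the response body
    print(response.text)

The Response object:

The output of the code above contains a lot of garbled characters, because the character set used for decoding differs from the one used for encoding. The approach below fixes the garbled Chinese text:

    import requests

    url = 'https://www.baidu.com'

    # Send a GET request to the target URL
    response = requests.get(url)

    # Print the response body
    # print(response.text)
    print(response.content.decode())  # Note this line!

1. response.text is the body decoded with the character set that requests guessed. The guess comes from the HTTP headers, with chardet-style detection used only when the headers give requests nothing to go on.
2. Data travels over the network as bytes, so response.text = response.content.decode('the guessed character set').
3. You can search the page source for charset and try that character set, but keep in mind that it is not always accurate.

Difference between response.text and response.content:

1. response.text
   Type: str
   Decoding: requests makes an educated guess at the text encoding based on the HTTP headers and decodes with it
2. response.content
   Type: bytes
   Decoding: none specified

Fixing garbled Chinese text: decode response.content yourself.

1. response.content.decode() (defaults to utf-8)
2. response.content.decode("GBK")
3. Common character sets:
   utf-8
   gbk
   gb2312
   ascii (pronounced "ask-ee")
   iso-8859-1

Other commonly used attributes and methods of the Response object: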
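    # 1.2.3 Other commonly used response attributes
    import requests

    # Target URL
    url = 'https://www.baidu.com'

    # Send a GET request to the target URL
    response = requests.get(url)

    # Print the response body
    # print(response.text)
    # print(response.content.decode())  # Note this line!
    print(response.url)               # URL of the response
    print(response.status_code)       # status code of the response
    print(response.request.headers)   # headers of the request that produced this response
    print(response.headers)           # response headers
    print(response.request._cookies)  # cookies sent with the request (private attribute)
    print(response.cookies)           # cookies carried in the response

Requests in practice:

Sending requests with the requests module

Sending a request with headers:

First, write code that fetches the Baidu home page:

    import requests

    url = 'https://www.baidu.com'
    response = requests.get(url)
    print(response.content.decode())

    # Print the headers of the request that produced this response
    print(response.request.headers)

Copy the User-Agent from your browser and build a headers dictionary. After completing the code below, run it and look at the result:

    import requests

    url = 'https://www.baidu.com'
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

    # Put the User-Agent in the request headers to imitate a browser
    response = requests.get(url, headers=headers)
    print(response.content)

    # Print the request headers
    print(response.request.headers)

Sending a request with parameters:

When searching on Baidu you will often notice a ? in the URL. Everything after the question mark is the request parameters, also called the query string.

Carrying the parameters in the URL: send the request directly to a URL that already contains them.

    import requests

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
    url = 'https://www.baidu.com/s?wd=python'
    response = requests.get(url, headers=headers)

Carrying a parameter dictionary via params:

1. Build a dictionary of request parameters.
2. Pass that dictionary as params when sending the request.

    import requests

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

    # This is the target URL
    # url = 'https://www.baidu.com/s?wd=python'
    # The result is the same with or without the trailing question mark
    url = 'https://www.baidu.com/s?'

    # The request parameters are a dictionary, i.e. wd=python
    kw = {'wd': 'python'}

    # Send the request with the parameters and get the response
    response = requests.get(url, headers=headers, params=kw)
    print(response.content)

Copy the User-Agent and Cookie from your browser. The header names and values passed in headers must match those in the browser exactly, and the value of the Cookie key in the headers dictionary is a string.

    import requests

    url = 'https://github.com/USER_NAME'

    # Build the request headers dictionary
    headers = {
        # User-Agent copied from the browser
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
        # Cookie copied from the browser
        'Cookie': 'xxx the cookie string copied from the browser goes here'
    }

    # Carry the cookie string in the request headers dictionary
    resp = requests.get(url, headers=headers)
    print(resp.text)

Using the timeout parameter:

While browsing the web we often run into network fluctuations, and a request may wait a long time and still get no result. In a crawler, one request that hangs drags down the efficiency of the whole project, so we need to force each request to return within a given time or fail with an error.

1. How to use the timeout parameter: response = requests.get(url, timeout=3)
2. timeout=3 means the response must arrive within 3 seconds of sending the request; otherwise an exception is raised.

    import requests

    url = 'https://twitter.com'
    response = requests.get(url, timeout=3)  # set the timeout

Sending a POST request with requests:

    response = requests.post(url, data)

The data parameter accepts a dictionary. The other parameters of the POST function are the same as those used for sending a GET request.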
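To make the POST and timeout notes above more concrete, here is a minimal sketch rather than a definitive recipe. The endpoint https://httpbin.org/post, the form fields, and the helper names build_post and send_post are all assumptions made up for illustration; none of them come from the original post. The first helper only prepares the request, so you can see how the data dictionary is form-encoded without touching the network. The second one actually sends it with a timeout and catches the exception raised when no response arrives in time.

```python
import requests

# Hypothetical endpoint and form fields, used only for illustration.
URL = "https://httpbin.org/post"
FORM = {"kw": "python", "page": "1"}


def build_post(url, data):
    """Build (but do not send) a POST request so its encoding can be inspected."""
    return requests.Request("POST", url, data=data).prepare()


def send_post(url, data, timeout=3):
    """Send the POST request; return the response, or None if it failed or timed out."""
    try:
        response = requests.post(url, data=data, timeout=timeout)
        response.raise_for_status()  # raise an exception for 4xx/5xx status codes
        return response
    except requests.exceptions.Timeout:
        print(f"No response from {url} within {timeout} seconds")
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
    return None


# Inspect the prepared request offline: the dictionary becomes a form-encoded body.
prepared = build_post(URL, FORM)
print(prepared.method)                   # POST
print(prepared.body)                     # kw=python&page=1
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
```

With a network connection, send_post(URL, FORM) would send the same request and return either the response or None. Returning None instead of letting the exception propagate is just one design choice; in a real crawler you might prefer to retry or log the failure.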
