Language version: Python 2.7
Libraries: urllib2, chardet, BeautifulSoup
Sample code:
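import urllib2
import chardet
from bs4 import BeautifulSoup
# Download the raw bytes of the page
data = urllib2.urlopen('http://www.nitrohsu.com').read()
# Detect the encoding from the downloaded bytes
encodeStr = chardet.detect(data)['encoding']
# Pass the detected encoding to BeautifulSoup as a hint
soup = BeautifulSoup(data, from_encoding=encodeStr)
# prettify is a method, so it has to be called
print soup.prettify()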
--------------------------------------------------------------------------------
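chardet is a library that automatically detects the character encoding of a byte string such as a downloaded web page. Calling detect returns a dictionary:
{'confidence': 0.99, 'encoding': 'utf-8'}
confidence is how sure the detector is of its guess (a value between 0 and 1), and encoding is the name of the detected encoding.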
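Since the result includes a confidence value, one option is to trust the detected encoding only when that value is high enough. The following is a minimal sketch, not part of chardet itself: the helper name pick_encoding, the 0.8 threshold, and the 'utf-8' fallback are all assumptions chosen for illustration.

```python
def pick_encoding(result, min_confidence=0.8, fallback='utf-8'):
    """Choose an encoding from a chardet-style result dictionary.

    `result` is expected to look like
    {'confidence': 0.99, 'encoding': 'utf-8'}.
    The threshold and fallback are illustrative assumptions.
    """
    encoding = result.get('encoding')
    # Treat a missing or None confidence as 0.0
    confidence = result.get('confidence') or 0.0
    # Fall back when nothing was detected or the guess is weak
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding
```

In the sample code, this could be used as encodeStr = pick_encoding(chardet.detect(data)), so that a low-confidence guess does not get passed to BeautifulSoup.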
---