
Q: 'gbk' codec can't encode character '\u2022' in position 32: illegal multibyte sequence

Stack Overflow user

Asked on 2022-08-30 09:52:48

Answers: 1 · Views: 117 · Followers: 0 · Votes: 0

I have a question about writing a file.

When I call data.to_csv('/home/bio_kang/Learning/Python/film_project/top250_film_info.csv', index=None, encoding='gbk'), it raises an error: 'gbk' codec can't encode character '\u2022' in position 32: illegal multibyte sequence.

The data comes from https://movie.douban.com/top250. I scraped it from the site with requests, BeautifulSoup, and re.

Here is part of my code:
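The error can be reproduced without pandas or the scraped files. The sketch below is only an illustration: the `sample` string and the `unencodable` helper are made up for this example and are not from the original post. It shows that `'\u2022'` (a bullet character) has no mapping in Python's `gbk` codec, while `utf-8` and `gb18030` can encode it, and that `errors='replace'` substitutes a `?` instead of raising.

```python
# Hypothetical sample text containing U+2022 (BULLET), the character named in the error.
sample = "希望让人自由 \u2022 1994"


def unencodable(text, encoding):
    """Return (position, character) pairs that `encoding` cannot represent."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, ch))
    return bad


# Strict encoding to gbk raises the same exception type that to_csv reported.
try:
    sample.encode("gbk")
    gbk_error = None
except UnicodeEncodeError as exc:
    gbk_error = exc

# Positions of the characters gbk cannot encode (here, only the bullet).
bad_chars = unencodable(sample, "gbk")

# Encodings that cover all of Unicode accept the bullet.
utf8_bytes = sample.encode("utf-8")
gb18030_bytes = sample.encode("gb18030")

# Lossy fallback: the unencodable character becomes "?".
replaced = sample.encode("gbk", errors="replace")
```

Which option fits depends on what will read the file: `gb18030` is a superset of `gbk` that covers all of Unicode, and `utf-8` is the usual default, while `errors='replace'` silently loses the original character.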

import re
import pandas as pd

uni_num = []
years = []
countries = []
directors = []
actors = []
descriptions = []
for i in range(250):
    with open('/home/bio_kang/Learning/Python/film_project/film_info/film_{}.html'.format(i), 'rb') as f:
        film_info = f.read().decode('utf-8','ignore')

        pattern_uni_num = re.compile(r'<span class="pl">IMDb:</span> (.*?)<br/>') # unique number
        pattern_year = re.compile(r'<span class="year">\((.*?)\)</span>') # year
        pattern_country = re.compile(r'<span class="pl">制片国家/地区:</span>(.*?)<br/>') # country
        pattern_director = re.compile(r'<meta content=(.*?) property="video:director"/>') # director
        pattern_actor = re.compile(r'<meta content="(.*?)" property="video:actor"/>') # actors
        pattern_description = re.compile(r'<meta content="(.*?)property="og:description">') # description

        uni_num.append(str(re.findall(pattern_uni_num, film_info)).strip("[]").strip("'"))
        years.append(str(re.findall(pattern_year, film_info)).strip("[]").strip("'"))
        countries.append(str(re.findall(pattern_country, film_info)).strip("[]").strip("'").split('/')[0])
        directors.append(re.findall(pattern_director, film_info))
        actors.append(re.findall(pattern_actor, film_info))
        descriptions.append(str(re.findall(pattern_description, film_info)).strip('[]').strip('\''))

raw_data = {'encoding':uni_num, 'name':names, 'description':descriptions, 'country':countries, 'director':new_director, 'actor':new_actor, 'vote':new_votes, 'score':scores, 'year':years, 'link':urls }
data = pd.DataFrame(raw_data)
data.to_csv('/home/bio_kang/Learning/Python/film_project/top250_film_info.csv', index=None, encoding='gbk')
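One way to avoid the error, sketched here under assumptions, is to change how `to_csv` encodes the output. The small DataFrame and the temporary file names below are stand-ins for the real data and paths. The first call writes UTF-8 with a byte order mark (`utf-8-sig`), which spreadsheet programs commonly use to detect UTF-8. The second keeps `gbk` but passes `errors='replace'`, a `to_csv` parameter available in pandas 1.1 and later, so unencodable characters are written as `?`.

```python
import os
import tempfile

import pandas as pd

# Stand-in data; the real DataFrame in the question is built from the scraped pages.
data = pd.DataFrame(
    {"name": ["film A"], "description": ["hope \u2022 freedom"]}
)

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: write UTF-8 with a BOM, which can represent every character.
    utf8_path = os.path.join(tmp, "top250_utf8.csv")  # hypothetical file name
    data.to_csv(utf8_path, index=False, encoding="utf-8-sig")
    with open(utf8_path, "rb") as f:
        utf8_raw = f.read()

    # Option 2: keep gbk, but replace characters it cannot encode with "?".
    gbk_path = os.path.join(tmp, "top250_gbk.csv")  # hypothetical file name
    data.to_csv(gbk_path, index=False, encoding="gbk", errors="replace")
    with open(gbk_path, "rb") as f:
        gbk_raw = f.read()
```

Option 1 preserves the data exactly; option 2 is lossy and is only reasonable if the file must stay in `gbk` for a downstream consumer.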

python

python-3.x

encoding

1 Answer

Stack Overflow user

Answered on 2022-11-23 19:28:40

Try:

open('...', 'r', encoding='utf-8')

or utf-16

Votes: 0
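A note on this suggestion: `open` only accepts an `encoding` argument in text mode, so combining it with a binary mode such as `'rb'` raises a `ValueError`. The sketch below uses a temporary file and made-up text (both assumptions for illustration) to show a text-mode round trip in UTF-8 and the failure in binary mode.

```python
import os
import tempfile

# Made-up text containing the bullet character from the error message.
text = "肖申克的救赎 \u2022 1994"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "film_0.html")  # hypothetical file name

    # Text mode: the encoding argument controls how str is converted to bytes.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # Binary mode deals in raw bytes, so Python rejects an encoding argument.
    try:
        open(path, "rb", encoding="utf-8")
        binary_mode_error = None
    except ValueError as exc:
        binary_mode_error = str(exc)
```

Reading the input as UTF-8 does not by itself change how `to_csv` encodes the output; the `encoding` passed to `to_csv` is what determines whether `'\u2022'` can be written.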

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/73540470
