
Q: How to modify a pandas script to edit a CSV file? Remove single quotes, add double quotes, drop unnecessary columns.

Stack Overflow user

Asked on 2022-06-30 19:06:42

1 answer, 65 views, 0 followers, 0 votes

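I've just started learning pandas and may be getting lost in it, so I need some help.

I'm trying to automate the process of editing CSV files. I receive them unsorted and am trying to get them ready for some bulk updates.

So, what I need to do is:
1. Rename the required columns (I've managed this fine).
2. Replace the existing double quotes inside the title column with two single quotes.
3. Add double quotes around every title (i.e. the first column).
4. Drop columns (not sure what is best: del or df.drop).

This is the code I've tried so far: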

import pandas as pd

df = pd.read_csv('Sheet1.csv')
droped = df.drop(df.columns[[2,3,4,5,6,7,8,9,10]],axis = 1,inplace=True)
renamed = df.rename(columns={df.columns[0]: 'title', df.columns[1]: 'product_id'})
swapped = renamed[[0]].str.replace(r'\"', r"\''")
updated = swapped.update('"' + df.columns[[0]].astype(str) + '"')

print(renamed.head())

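These quotes are giving me a headache right now: some titles contain double quotes, which I need to remove, and then I need to add double quotes around each title.

The first three lines work fine: I can drop the unnecessary columns and rename the rest, but the last two operations are not written correctly. I've tried Googling it, but with no luck so far.

An example of this quote swap:

Now: Banini Movie" Teather
Wanted: "Banini Movie'' Teather"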

Error:

Traceback (most recent call last):
  File "Desktop/testing_scripts/bulk_script.py", line 6, in <module>
    swapped = renamed[[0]].str.replace(r'\"', r"\''")
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/frame.py", line 3511, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5842, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([0], dtype='int64')] are in the [columns]"

Tags: python, pandas, csv, data-science

1 answer

Stack Overflow user

Accepted answer

Answered on 2022-07-01 18:43:07

To answer your question: you just have one mistake here.

# Here, you rename your columns:
renamed = df.rename(columns={df.columns[0]: 'title', df.columns[1]: 'product_id'})
# Here, you try to select a column *labelled* 0, but [[...]] looks up labels, not positions:
renamed[[0]]
# -> KeyError: "None of [Int64Index([0], dtype='int64')] are in the [columns]"
# This error is correct: only 'title' and 'product_id' are in 'columns'.

# What it could be (single brackets return a Series, which has the .str accessor):
renamed['title']

# Or to access by column position:
renamed.iloc[:, 0]
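A minimal, self-contained sketch of the difference between selecting by label and by position, assuming a small hypothetical frame in place of the renamed CSV data. It also shows why the question's `renamed[[0]].str` could not work even with a valid label: double brackets return a DataFrame, which has no `.str` accessor.

```python
import pandas as pd

# Hypothetical stand-in for the renamed data.
renamed = pd.DataFrame({"title": ['Banini Movie" Teather'], "product_id": ["p1"]})

# [[0]] looks up a column labelled 0, which does not exist -> KeyError.
try:
    renamed[[0]]
except KeyError as exc:
    print("KeyError:", exc)

# Single brackets with the label return a Series (has .str).
titles_by_label = renamed["title"]
# .iloc selects by position instead of by label.
titles_by_position = renamed.iloc[:, 0]

print(type(renamed[["title"]]).__name__)  # DataFrame: no .str accessor
print(titles_by_label.equals(titles_by_position))  # True
```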

Reading in specific columns:

from io import StringIO 
# StringIO just allows me to treat a string like a file, 
# you'll use your file name in read_csv instead.
import pandas as pd

file = '0,1,2,3,4\nsr,ddpi,np.log(dpi),pop75,np.log(pop15)\nnp.log(pop15),sr,ddpi,np.log(dpi),pop75\npop75,np.log(pop15),sr,ddpi,np.log(dpi)\nnp.log(dpi),pop75,np.log(pop15),sr,ddpi\nddpi,np.log(dpi),pop75,np.log(pop15),sr\n'

print(pd.read_csv(StringIO(file)))
print(pd.read_csv(StringIO(file), usecols=[0,1]))

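In the output we can see that, by using the keyword usecols, we can filter for the columns we want while reading the file, instead of having to drop them afterwards: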

               0              1              2              3              4
0             sr           ddpi    np.log(dpi)          pop75  np.log(pop15)
1  np.log(pop15)             sr           ddpi    np.log(dpi)          pop75
2          pop75  np.log(pop15)             sr           ddpi    np.log(dpi)
3    np.log(dpi)          pop75  np.log(pop15)             sr           ddpi
4           ddpi    np.log(dpi)          pop75  np.log(pop15)             sr

               0              1
0             sr           ddpi
1  np.log(pop15)             sr
2          pop75  np.log(pop15)
3    np.log(dpi)          pop75
4           ddpi    np.log(dpi)
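`usecols` is not limited to positions: it also accepts header names or a callable evaluated against each header name. A small sketch, assuming a hypothetical CSV with named headers (the real file's headers are not shown in the question):

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV; the real file's headers may differ.
csv_text = "name,id,price,stock\nBanini,1,9.99,3\nOther,2,4.50,0\n"

# Select columns by header name...
by_name = pd.read_csv(StringIO(csv_text), usecols=["name", "id"])

# ...or with a callable that decides per header name.
by_callable = pd.read_csv(StringIO(csv_text), usecols=lambda col: col in {"name", "id"})

# usecols ignores element order: columns come back in file order.
reordered = pd.read_csv(StringIO(csv_text), usecols=["id", "name"])

print(by_name)
print(list(reordered.columns))
```

If a specific order is needed, select it afterwards, e.g. `reordered[["id", "name"]]`.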

Alternatively, another way to "drop" columns, when all you really want is to keep a few of them:

df = pd.read_csv(StringIO(file))
df = df[['0', '1']]
print(df)

               0              1
0             sr           ddpi
1  np.log(pop15)             sr
2          pop75  np.log(pop15)
3    np.log(dpi)          pop75
4           ddpi    np.log(dpi)
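On the `del` vs `df.drop` question, here is a minimal sketch on a toy frame with hypothetical column names. It also shows why `droped` in the question's script ends up as `None`: with `inplace=True`, `drop` modifies the frame itself and returns `None`.

```python
import pandas as pd

# Toy frame; the column names are hypothetical.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# drop() returns a new frame and leaves df untouched.
trimmed = df.drop(columns=["c", "d"])
# The same by position, similar to df.columns[[2, 3, ...]] in the question.
trimmed_by_position = df.drop(columns=df.columns[2:])

# inplace=True modifies the frame itself and returns None.
in_place = df.copy()
result = in_place.drop(columns=["c", "d"], inplace=True)

# del also works in place, but removes one column per statement.
deleted = df.copy()
del deleted["c"]
del deleted["d"]

print(list(trimmed.columns), result, list(in_place.columns), list(deleted.columns))
# ['a', 'b'] None ['a', 'b'] ['a', 'b']
```

Using `drop` without `inplace` and assigning the result is generally the easier pattern to chain; `del` is handy when removing a single column.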

Renaming columns when you know their order:

# Personally, I find these methods simpler for the case you've presented:

df = pd.read_csv(StringIO(file), usecols=[0,1])
df.columns = ['title', 'product_id']

# OR
# If your file doesn't have a header, then remove 'header=0'
df = pd.read_csv(StringIO(file), usecols=[0,1], header=0, names=['title', 'product_id'])

print(df)

           title     product_id
0             sr           ddpi
1  np.log(pop15)             sr
2          pop75  np.log(pop15)
3    np.log(dpi)          pop75
4           ddpi    np.log(dpi)

Modifying the quotes, and putting it all together:

file = '0,1,2,3,4\nsr,Banini Movie" Teather,np.log(dpi),pop75,np.log(pop15)\nnp.log(pop15),sr,ddpi,np.log(dpi),pop75\npop75,Banini Movie" Teather,sr,ddpi,np.log(dpi)\nBanini Movie" Teather,pop75,Banini Movie" Teather,sr,ddpi\nddpi,np.log(dpi),pop75,Banini Movie" Teather,sr\n'

df = pd.read_csv(StringIO(file), usecols=[0,1], header=0, names=['title', 'product_id'])
print(df)

                   title             product_id
0                     sr  Banini Movie" Teather
1          np.log(pop15)                     sr
2                  pop75  Banini Movie" Teather
3  Banini Movie" Teather                  pop75
4                   ddpi            np.log(dpi)

# To do it to one column:
df['title'] = df['title'].str.replace('"', "''")
df['title'] = '"' + df['title'] + '"'
print(df)

                      title             product_id
0                      "sr"  Banini Movie" Teather
1           "np.log(pop15)"                     sr
2                   "pop75"  Banini Movie" Teather
3  "Banini Movie'' Teather"                  pop75
4                    "ddpi"            np.log(dpi)
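A slightly more defensive variant of the same quote handling, as a sketch on hypothetical data: it passes `regex=False` explicitly (the default of `Series.str.replace` changed from `True` to `False` in pandas 2.0), fills missing titles before quoting, and uses `DataFrame.apply` so each column is processed exactly once.

```python
import pandas as pd

# Hypothetical titles: one with an embedded quote, one missing value.
titles = pd.Series(['Banini Movie" Teather', "sr", None])

# regex=False makes the literal-string replacement explicit on any version.
swapped = titles.str.replace('"', "''", regex=False)

# Missing titles are not strings, so fill them before adding the quotes.
quoted = '"' + swapped.fillna("") + '"'
print(quoted.tolist())

# The same two steps applied once to every column, without a manual loop.
df = pd.DataFrame({"title": ['Banini Movie" Teather'], "product_id": ['ab"c']})
all_quoted = df.apply(lambda col: '"' + col.str.replace('"', "''", regex=False) + '"')
print(all_quoted)
```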

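# To do it to all columns, start again from the freshly read frame
# (otherwise 'title', which was already quoted above, gets quoted twice):
df = pd.read_csv(StringIO(file), usecols=[0,1], header=0, names=['title', 'product_id'])
for col in df:
    df[col] = df[col].str.replace('"', "''")
    df[col] = '"' + df[col] + '"'

print(df)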

                      title                product_id
0                      "sr"  "Banini Movie'' Teather"
1           "np.log(pop15)"                      "sr"
2                   "pop75"  "Banini Movie'' Teather"
3  "Banini Movie'' Teather"                   "pop75"
4                    "ddpi"             "np.log(dpi)"
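If the end goal is a CSV file on disk, one thing to watch: `to_csv` with its default quoting (`csv.QUOTE_MINIMAL`) quotes and escapes any value that already contains `"`, so a manually quoted `"sr"` is written as `"""sr"""`. A sketch of an alternative, assuming it is acceptable for every field (including the header and `product_id`) to be quoted, is to leave the values unquoted and let the CSV writer add the quotes with `quoting=csv.QUOTE_ALL`. The data is hypothetical, and `StringIO` stands in for a real output file name.

```python
import csv
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the cleaned-up title/product_id frame.
df = pd.DataFrame({"title": ['Banini Movie" Teather', "sr"], "product_id": ["p1", "p2"]})

# Swap embedded double quotes for two single quotes, as in the answer.
df["title"] = df["title"].str.replace('"', "''", regex=False)

# Pitfall: quoting by hand, then writing with the default QUOTE_MINIMAL,
# makes the writer wrap and escape those quotes again.
manual = df.copy()
manual["title"] = '"' + manual["title"] + '"'
manual_csv = manual.to_csv(index=False)
print(manual_csv)  # the "sr" row is written as """sr""",p2

# Alternative: keep values unquoted and let the writer quote every field.
out = StringIO()  # use a file name here for a real file
df.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
print(out.getvalue())
```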

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/72820886
