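I'm new to Python and I'm trying to split text data and convert it into Excel columns and rows. Suppose I have 100 records and need to split each one by character position: characters 1-7 are the first column, 8-8 the second, 9-10 the third, 11-18 the fourth, 19-24 the fifth, 25-124 the sixth, and 125-1000 the seventh. The sample records below are in text.txt. I want to convert them into an Excel file based on the character positions above. Can anyone help me?
Sample text format: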
animals210 redwingsclearmist
animals220 redwingsclearmist
animals230 redwingsclearmist
animals240 redwingsclearmist
Sample output format:
0 1 2 3 4
0 animals 210 red wings clearmist
1 animals 210 red wings clearmist
2 animals 210 red wings clearmist
3 animals 210 red wings clearmist
Posted on 2021-12-15 13:17:50
You can combine itertools.tee and zip_longest to split the string at fixed indices:
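from itertools import tee, zip_longest

def split_by_index(s):
    # start positions of each field (0-based)
    indices = [0, 7, 10, 14, 20]
    # two iterators over the same indices; advance the second by one
    start, end = tee(indices)
    next(end)
    # the last pair is (20, None), so the final slice runs to the end of s
    return " ".join([s[i:j] for i, j in zip_longest(start, end)])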
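To see why the slices line up, here is a small sketch of just the pairing step, using the same indices as the function above. The names pairs and pieces are only for illustration.

```python
from itertools import tee, zip_longest

indices = [0, 7, 10, 14, 20]

# tee() gives two independent iterators over the same indices
start, end = tee(indices)
# advance the second iterator so it runs one step ahead of the first
next(end)

# zip_longest pads the shorter iterator with None, so the last
# pair is (20, None) and s[20:None] runs to the end of the string
pairs = list(zip_longest(start, end))
print(pairs)

s = "animals120 redlivinginjungle"
pieces = [s[i:j] for i, j in pairs]
print(pieces)
```

Note that the third piece keeps its leading space (" red"); the function above relies on the later whitespace split to drop it.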
With your data:
import pandas as pd

df = pd.DataFrame()
df["sentence"] = ["animals120 redlivinginjungle",
                  "animals140 redlivinginjungle",
                  "animals160 redlivinginjungle"]
sentence
0 animals120 redlivinginjungle
1 animals140 redlivinginjungle
2 animals160 redlivinginjungle
Then apply the function to create a new dataframe:
new_df = df["sentence"].apply(split_by_index).str.split(expand=True)
Output:
print(new_df)
0 1 2 3 4
0 animals 120 red living injungle
1 animals 140 red living injungle
2 animals 160 red living injungle
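One caveat: joining the pieces with spaces and then splitting on whitespace assumes that no field contains an internal space and that no field is empty. If that might not hold for your records (for example in a long free-text column), a possible variant is to return the pieces as a list and build the dataframe from those lists directly. This is only a sketch: split_to_fields is a hypothetical helper name, and the second sample row is made-up data with a space inside the fourth field.

```python
import pandas as pd

def split_to_fields(s, indices=(0, 7, 10, 14, 20)):
    # pair each start index with the next one; None makes the
    # last slice run to the end of the string
    bounds = list(indices) + [None]
    # strip padding around each field but keep internal spaces
    return [s[i:j].strip() for i, j in zip(bounds[:-1], bounds[1:])]

df = pd.DataFrame({"sentence": ["animals120 redlivinginjungle",
                                "animals140 redmy petinjungle"]})

# one list per row, expanded into one column per field
fields_df = pd.DataFrame(df["sentence"].map(split_to_fields).tolist(),
                         index=df.index)
print(fields_df)
```

Here "my pet" stays in a single column, whereas the join-and-split approach would spread it over two.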
Posted on 2021-12-15 13:00:51
Use the .str accessor:
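# new column name -> [start, end) character positions (0-based, end exclusive)
column_splits = {'first': [0, 7], 'second': [7, 10]}
for column, limits in column_splits.items():
    start, end = limits
    # 'your_column' is the column holding the raw text lines
    df[column] = df['your_column'].str[start:end]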
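As a rough sketch, the same idea can be extended to the seven ranges given in the question. The 1-based inclusive positions a-b are converted to 0-based slices [a - 1, b). The column names col1 to col7, the helper split_fixed_width, and the output file name are assumptions for illustration and do not come from the original posts.

```python
import pandas as pd

# positions from the question (1-7, 8-8, 9-10, 11-18, 19-24, 25-124,
# 125-1000), converted to 0-based half-open slices
column_splits = {
    "col1": (0, 7),
    "col2": (7, 8),
    "col3": (8, 10),
    "col4": (10, 18),
    "col5": (18, 24),
    "col6": (24, 124),
    "col7": (124, 1000),
}

def split_fixed_width(lines, splits):
    # hypothetical helper: one output column per (start, end) slice
    raw = pd.Series(lines, dtype=object)
    out = pd.DataFrame(index=raw.index)
    for column, (start, end) in splits.items():
        # slicing past the end of a short line just yields a shorter string
        out[column] = raw.str[start:end].str.strip()
    return out

# in practice the lines would come from the file, e.g.
# lines = open("text.txt").read().splitlines()
sample = ["animals210 redwingsclearmist",
          "animals220 redwingsclearmist"]
result = split_fixed_width(sample, column_splits)
print(result)

# writing to Excel needs an Excel engine such as openpyxl to be installed
# result.to_excel("output.xlsx", index=False)
```

The sample lines in the question are much shorter than the stated layout, so with these positions the later columns come out short or empty. Adjust the slices to whatever layout your real records follow.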
https://stackoverflow.com/questions/70364112