I am currently extracting a column from a file by calling awk inside os.system():
os.system("awk '{print $'%i'}' < infile > outfile"%some_column)
np.loadtxt('outfile')
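Is there an equivalent way to do this with regular expressions?
Thanks.
Edit: To clarify, I am looking for the best way to extract a specific column from a large file.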
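For comparison, here is a minimal sketch of the same pipeline without shelling out or writing a temporary file, assuming whitespace-delimited numeric data. NumPy's `loadtxt` accepts a `usecols` argument, so the column can be read directly. Note that awk's `$N` is 1-based while `usecols` is 0-based. The names `infile` and `some_column` are placeholders carried over from the question; the sample file is written only so the sketch runs on its own.

```python
import numpy as np

some_column = 2  # awk-style, 1-based column number (placeholder value)

# Write a small whitespace-delimited sample file so the sketch is self-contained
with open("infile", "w") as f:
    f.write("1 10 100\n2 20 200\n3 30 300\n")

# usecols is 0-based, so awk's $2 corresponds to usecols=1;
# a single int returns a 1-D array
values = np.loadtxt("infile", usecols=some_column - 1)
print(values)  # [10. 20. 30.]
```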
Posted on 2018-07-12 06:34:30
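Depending on what delimiter your data uses, a regex is probably overkill here. If the delimiter is simple (whitespace, or a specific character or string), the built-in str.split method is enough to separate the columns.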
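To illustrate why a regex is usually unnecessary: for whitespace-delimited lines, `str.split()` with no argument yields the same fields as `re.split(r"\s+", ...)` applied to the stripped line, and for a fixed delimiter you can pass it to `split` directly. The sample lines below are made up for illustration.

```python
import re

line = "  alpha   beta\tgamma  \n"

# str.split() with no argument splits on runs of whitespace and
# ignores leading/trailing whitespace (including the newline)
fields_split = line.split()

# The regex equivalent needs an explicit strip first; otherwise the
# leading whitespace would produce an empty first field
fields_regex = re.split(r"\s+", line.strip())

print(fields_split)                  # ['alpha', 'beta', 'gamma']
print(fields_split == fields_regex)  # True

# For a single fixed delimiter, pass it to split()
csv_line = "1,2,3"
print(csv_line.split(",")[1])        # 2
```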
Here is an example program that shows how this works:
column = 0  # First column
with open("data.txt") as file:
    data = file.readlines()
columns = list(map(lambda x: x.strip().split()[column], data))
Broken down step by step:
column = 0  # First column
# Read a file named "data.txt" into a list of lines
with open("data.txt") as file:
    data = file.readlines()
# This is where we will store the column values as we extract them
columns = []
# Iterate over each line in the file
for line in data:
    # Strip the whitespace (including the trailing newline character) from the
    # start and end of the string
    line = line.strip()
    # Split the line using the default delimiter (any run of
    # whitespace characters)
    line = line.split()
    # Extract the column data from the desired index and store it in our list
    columns.append(line[column])
# columns now holds a list of strings extracted from that column
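Since the question is about large files, note that `readlines()` loads the entire file into memory before any splitting happens. A minimal sketch, assuming whitespace-delimited numeric data, iterates over the file object line by line instead and converts each field to `float` (as `np.loadtxt` would). The helper name `extract_column` is hypothetical, and skipping blank lines is an assumption about the input.

```python
def extract_column(path, column):
    """Yield the given 0-based column of each non-blank line as a float.

    Hypothetical helper: it iterates lazily over the file object, so the
    whole file is never held in memory at once.
    """
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:  # skip blank lines (assumption about the input)
                continue
            yield float(fields[column])


# Small sample file so the sketch runs on its own
with open("data.txt", "w") as f:
    f.write("1 10\n2 20\n\n3 30\n")

column_values = list(extract_column("data.txt", 1))
print(column_values)  # [10.0, 20.0, 30.0]
```

If a NumPy array is needed at the end, `np.fromiter(extract_column("data.txt", 1), dtype=float)` builds one directly from the generator without an intermediate list.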
https://stackoverflow.com/questions/51294237