I have data in the following format:
```python
data = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2
"""
```

The structure is as follows:
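Each block starts with `[Data-x]`, where x is a number, followed by a `Data =` line whose value is either `BATCH` or `SAMP`. I'm trying to write a function that yields a list for each "batch": the first item of the list is the text block containing the line `Data = BATCH`, and the following items are the text blocks containing the line `Data = SAMP`. Here is what I have now: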
```python
def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        sample = next(textblocks)
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
```

When I run it like this:
```python
batches = get_batches(data)
for batch in batches:
    print(batch)
    print('_' * 20)
```

it only returns the first "batch":
```
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
 '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
 '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
```

whereas my expected output is:
```
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
 '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
 '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
['[Data-9]\nData = BATCH\nBatProtocol = VCA\nBatCreate = 20010725',
 '[Data-10]\nData = SAMP\nSampNum = 359\nSampLane = 1',
 '[Data-11]\nData = SAMP\nSampNum = 359\nSampLane = 2']
____________________
```

What am I missing, or how can I improve my function?
Posted 2013-04-09 19:47:11
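As @F.J explained, the real problem with your code is that the last value is never yielded. However, there are other improvements you can make, and some of them make the last-value problem easier to solve.

The one that stood out most when I first looked at your code is the pair of `if 'BATCH' in sample` checks, which can be combined into one.

Here is a version that does that, and that also uses a `for` loop over the generator instead of `while True`: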
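```python
def get_batches(data):
    # Lazily produce the non-blank text blocks
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    try:
        batch = [next(textblocks)]  # the first block starts the first batch
    except StopIteration:
        # Empty input. Since Python 3.7 (PEP 479) a StopIteration escaping
        # a generator becomes RuntimeError, so end the generator explicitly.
        return
    for sample in textblocks:
        if 'BATCH' in sample:
            # A new batch starts: emit the finished one and start over
            yield batch
            batch = []
        batch.append(sample)
    yield batch  # never empty here
```

I yield `batch` unconditionally at the end because it can never be empty at that point: if `data` is empty, the initial `next(textblocks)` raises `StopIteration`, which is caught so the generator ends without yielding anything.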
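Both answers rely on what happens when `next(textblocks)` runs out of blocks. In 2013 (Python 2), an uncaught `StopIteration` inside a generator simply ended it, which is why the question's `while True` version stopped quietly after the first batch. Since Python 3.7 (PEP 479), the same situation raises `RuntimeError` instead, so on a current interpreter the question's code prints the first batch and then crashes. The sketch below shows the difference with two small hypothetical helpers, `strict_first` and `safe_first`, written only for illustration:

```python
def strict_first(items):
    """Yield a one-element list holding the first item, via a bare next()."""
    it = iter(items)
    yield [next(it)]  # StopIteration escapes here when items is empty


def safe_first(items):
    """Same idea, but catch StopIteration and end the generator explicitly."""
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        return  # empty input: finish without yielding
    yield [first]


print(list(safe_first([])))     # []
print(list(safe_first(['a'])))  # [['a']]

try:
    print(list(strict_first([])))  # Python 3.6 and earlier: prints []
except RuntimeError as exc:
    print('RuntimeError:', exc)  # Python 3.7+: generator raised StopIteration
```

Catching `StopIteration` right at the `next()` call (or avoiding bare `next()` with a `for` loop) keeps the generator's behavior the same across Python versions.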
Posted 2013-04-09 19:34:17
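A batch is only yielded when the start of the next batch is found, so the last batch is never included. To fix this, you need something like the following at the end of the function:

```python
if batch:
    yield batch
```

However, doing only that won't work. Eventually the `next(textblocks)` inside the loop raises `StopIteration`, so no code after the `while` loop is ever executed. Here is one way to fix it with only small changes to your current code (see below for a better version):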
```python
def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        try:
            sample = next(textblocks)
        except StopIteration:
            # No more blocks: leave the loop so the final batch is yielded below
            break
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
    if batch:
        yield batch
```

Instead, I'd recommend simply using a `for` loop over `textblocks`:
```python
def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            if batch:
                yield batch
                batch = []
        batch.append(sample)
    if batch:
        yield batch
```

https://stackoverflow.com/questions/15910802
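As a quick check, the `for`-loop version of `get_batches` from the last answer can be run against the question's sample data. This sketch assumes the blocks are separated by blank lines, since the code splits on `'\n\n'`; `DATA` is that sample reproduced as a module constant. Only the `[Data-x]` header of each block is printed, to keep the output short:

```python
DATA = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2
"""


def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            # A BATCH block closes the previous batch (if any)
            if batch:
                yield batch
                batch = []
        batch.append(sample)
    # Yield whatever is left, so the last batch is not lost
    if batch:
        yield batch


for batch in get_batches(DATA):
    # strip() only for display: the first block keeps the string's leading "\n"
    print([block.strip().splitlines()[0] for block in batch])
# prints:
# ['[Data-0]', '[Data-1]', '[Data-2]']
# ['[Data-9]', '[Data-10]', '[Data-11]']
```

Empty input needs no special handling here: `get_batches('')` yields nothing, because the `for` loop never calls `next()` on an exhausted iterator by hand.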