我希望从位于指定索引indices_to_remove的原始列表标记中删除元素,并保留已删除记录的记录,但问题是,当我有两个索引要删除并从第一个索引中删除记录时,第二个记录无法正确删除。知道怎么解决这个问题吗?
deleted_records = []
tokens = ['hi', 'what', 'is', 'there']
indices_to_remove = [2,3]
for index in indices_to_remove:
deleted_tokens.append(tokens[index])
del tokens[index]发布于 2022-09-21 19:09:46
在迭代集合的同时修改集合通常不是一个好主意。
因此,最简单的方法是维护两个列表来收集这样的结果
removed, retained = [], []
tokens = ['hi', 'what', 'is', 'there']
indices_to_remove = {2, 3} # Set for faster lookups
for idx, token in enumerate(tokens):
selected_list = removed if idx in indices_to_remove else retained
selected_list.append(token)
print(removed, retained)发布于 2022-09-21 19:11:55
清单理解的速度很快。
deleted = [tokens[i] for i in sorted(indices_to_remove)]
tokens = [e for i, e in enumerate(tokens) if i not in indices_to_remove]但首先,考虑到问题中的一些含糊不清之处,为了表现起见,有些假设是:
indices_to_remove是一个set (O(1)成员资格测试),OP希望deleted保持与初始tokens 0 <= i < len(tokens)对于所有i)。如果没有,那么你会得到一个IndexError.速度试验
import random
from math import isqrt
def gen(n):
tokens = [f'x-{i}' for i in range(n)]
indices_to_remove = set(random.sample(range(n), isqrt(n)))
return tokens, indices_to_remove
# our initial solution
def f0(tokens, indices_to_remove):
deleted = [e for i, e in enumerate(tokens) if i in indices_to_remove]
tokens = [e for i, e in enumerate(tokens) if i not in indices_to_remove]
return deleted, tokens
# improved version, based on a comment by @KellyBundy
def f1(tokens, indices_to_remove):
deleted = [tokens[i] for i in sorted(indices_to_remove)]
tokens = [e for i, e in enumerate(tokens) if i not in indices_to_remove]
return deleted, tokens
# @thefourtheye version
def f4(tokens, indices_to_remove):
removed, retained = [], []
for idx, token in enumerate(tokens):
selected_list = removed if idx in indices_to_remove else retained
selected_list.append(token)
return removed, retained
# modified Kelly_1 to comply with our assumptions
# the original is certainly even faster, but modifies
# tokens in place, so harder to benchmark
def Kelly_1(tokens, indices_to_remove):
tokens = tokens.copy()
indices_to_remove = sorted(indices_to_remove)
return [*map(tokens.pop, reversed(indices_to_remove))][::-1], tokens测试:
tokens, indices_to_remove = gen(1_000_000)
%timeit f0(tokens, indices_to_remove)
134 ms ± 603 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit f4(tokens, indices_to_remove)
111 ms ± 34.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit f1(tokens, indices_to_remove)
76.9 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)下面是一个基准,用于查看运行时与规模的差异(令牌的长度;注意,要删除的令牌的数量总是~sqrt(n) )。
import perfplot
perfplot.show(
setup=gen,
kernels=[f0, f1, f4, Kelly_1],
n_range=[2**k for k in range(3, 20)],
relative_to=1,
xlabel='n elements',
equality_check=lambda x, y: x == y,
)
我们看到了关于Kelly_1的一些非常有趣的东西:尽管它做了一些额外的步骤(复制初始列表,以保护参数,对集合排序以使其成为排序列表),但这绝对会抹掉所有其他解决方案,最多可达100 K元素。在那之后,时间大幅上升。我得弄明白为什么。
用Python3.9.0进行测试。
发布于 2022-09-21 19:08:51
按降序排序索引列表。如果首先删除最大索引,则不可能影响序列前面元素的索引。
deleted_records = []
tokens = ['hi', 'what', 'is', 'there']
indices_to_remove = [2,3]
for index in sorted(indices_to_remove, reverse=True):
deleted_tokens.append(tokens[index])
del tokens[index]https://stackoverflow.com/questions/73805764
复制相似问题