## Python contingency table (content from Stack Overflow, translated and used under the CC BY-SA 3.0 license)

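• Take a large data array whose rows hold continuous (floating-point) values and discretize it into integers by binning (e.g. so each resulting row holds values 0-9)
• Slice two rows out as vectors X and Y and build a contingency table from them, giving a two-dimensional frequency distribution
• For example, a 10 x 10 array that counts the occurrences of each (xi, yi) pair
• Use the contingency table to do some information-theory math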
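The binning step (continuous values to integers 0-9) is not shown in the question. Below is a minimal sketch. It assumes equal-width bins spanning each row's own minimum and maximum, because the question does not say how the bin edges are chosen. `bin_row` is a hypothetical helper name, and the data is random placeholder input.

```python
import numpy as np

def bin_row(values, num_bins):
    """Map continuous values to integer labels 0..num_bins-1.

    Assumption: equal-width bins between the row's min and max.
    """
    values = np.asarray(values, dtype=float)
    # num_bins + 1 edges; only the interior edges are passed to digitize
    edges = np.linspace(values.min(), values.max(), num_bins + 1)
    # with interior edges, digitize returns 0..num_bins-1; the maximum
    # value falls in the last bin because it is >= the last interior edge
    return np.digitize(values, edges[1:-1])

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 1000))   # hypothetical data, one variable per row
x = bin_row(data[0], 10)
y = bin_row(data[1], 10)
```

With labels in `0..num_bins-1`, the values can be used directly as row and column indices of a `num_bins x num_bins` table, which is what the functions below rely on.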

```python
import numpy as np

def make_table(x, y, num_bins):
    # count each (x, y) pair one sample at a time
    ctable = np.zeros((num_bins, num_bins), dtype=np.dtype(int))
    for xn, yn in zip(x, y):
        ctable[xn, yn] += 1
    return ctable
```
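The loop above increments one cell per sample at the Python level. NumPy's `np.add.at` performs the same unbuffered accumulation in a single call. A plain fancy-index update `ctable[x, y] += 1` would not work here, because a repeated (x, y) pair is only incremented once. The sketch below uses a hypothetical name, `make_table_addat`, and shows both behaviours:

```python
import numpy as np

def make_table_addat(x, y, num_bins):
    ctable = np.zeros((num_bins, num_bins), dtype=int)
    # unbuffered in-place add: every (x[k], y[k]) pair contributes 1,
    # even when the same pair appears several times
    np.add.at(ctable, (x, y), 1)
    return ctable

x = np.array([0, 1, 1, 2])
y = np.array([2, 0, 0, 1])
table = make_table_addat(x, y, 3)
# table[1, 0] == 2 because the pair (1, 0) occurs twice

# for contrast: buffered fancy indexing counts the duplicate pair only once
buffered = np.zeros((3, 3), dtype=int)
buffered[x, y] += 1
# buffered[1, 0] == 1
```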

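```python
import numpy as np

def make_table2(x, y, num_bins):
    ctable = np.zeros(num_bins ** 2, dtype=np.dtype(int))
    # flatten each (x, y) pair to the single index x * num_bins + y
    reindex = np.dot(np.stack((x, y)).transpose(),
                     np.array([num_bins, 1]))
    # count how often each flat index occurs
    idx, count = np.unique(reindex, return_counts=True)
    for i, c in zip(idx, count):
        ctable[i] = c
    # fold the flat counts back into a num_bins x num_bins table
    return ctable.reshape((num_bins, num_bins))
```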
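The second version relies on row-major (C-order) flattening. Cell (x, y) of a `num_bins x num_bins` table sits at flat position `x * num_bins + y`, and `reshape((num_bins, num_bins))` maps flat position `k` back to `(k // num_bins, k % num_bins)`. NumPy exposes the same mapping as `np.ravel_multi_index` and `np.unravel_index`. A small check:

```python
import numpy as np

num_bins = 10
x = np.array([0, 3, 9])
y = np.array([7, 0, 9])

flat = x * num_bins + y                                   # [7, 30, 99]
same = np.ravel_multi_index((x, y), (num_bins, num_bins)) # same flat indices
rows, cols = np.unravel_index(flat, (num_bins, num_bins)) # back to (x, y)

# a reshaped 0..99 range holds flat index k at row k // 10, column k % 10
grid = np.arange(num_bins * num_bins).reshape((num_bins, num_bins))
```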

```python
import time

import numpy as np

def timetable(func):
    size = 5000
    bins = 10
    repeat = 1000
    start = time.time()
    for i in range(repeat):
        x = np.random.randint(0, bins, size=size)
        y = np.random.randint(0, bins, size=size)
        func(x, y, bins)
    end = time.time()
    # end - start is the total wall-clock time in seconds for all repeats
    print("Func {na}: {ti} s".format(na=func.__name__, ti=(end - start)))
```
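The timing loop above also runs `np.random.randint` on every iteration, so the cost of generating the data is included in each function's result. The sketch below generates the data once and then times only the table construction, using the standard-library `timeit` module. The name `time_table` is hypothetical, and the default sizes are copied from the question.

```python
import timeit

import numpy as np

def time_table(func, size=5000, bins=10, repeat=1000):
    """Return the mean time per call of func(x, y, bins), in milliseconds."""
    rng = np.random.default_rng()
    # generate the inputs once so only func itself is timed
    x = rng.integers(0, bins, size=size)
    y = rng.integers(0, bins, size=size)
    total = timeit.timeit(lambda: func(x, y, bins), number=repeat)
    return total / repeat * 1000.0
```

For example, `print(f"{make_table.__name__}: {time_table(make_table):.3f} ms per call")`.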

### 1 Answer

```
In [92]: %timeit np.dot(np.stack((x, y)).transpose(), np.array([bins, 1]))
109 µs ± 6.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [94]: %timeit bins*x + y
12.1 µs ± 260 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
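The two expressions timed above compute the same flat index. The arithmetic form skips the intermediate `np.stack` array, the transpose and the matrix product. A quick equivalence check on random data of the question's size:

```python
import numpy as np

bins = 10
rng = np.random.default_rng(1)
x = rng.integers(0, bins, size=5000)
y = rng.integers(0, bins, size=5000)

via_dot = np.dot(np.stack((x, y)).transpose(), np.array([bins, 1]))
via_arith = bins * x + y

print(np.array_equal(via_dot, via_arith))  # True
```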

```python
# assumes all bins * bins cells are non-empty, else reshape fails
np.unique(bins * x + y, return_counts=True)[1].reshape((bins, bins))
```
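`np.unique(..., return_counts=True)` only reports values that actually occur. The one-liner above therefore yields exactly `bins * bins` counts only when every cell is non-empty. With sparse data, `reshape` raises a `ValueError`. One workaround is to scatter the counts into a zero-filled array, which amounts to the question's second version without the Python loop. `table_from_unique` is a hypothetical name:

```python
import numpy as np

def table_from_unique(x, y, bins):
    idx, count = np.unique(bins * x + y, return_counts=True)
    table = np.zeros(bins * bins, dtype=int)
    table[idx] = count  # idx has no duplicates, so plain assignment is safe
    return table.reshape((bins, bins))

# only two of the nine cells are populated here
x = np.array([0, 0, 2])
y = np.array([1, 1, 2])
t = table_from_unique(x, y, 3)
```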

```python
np.bincount(bins * x + y, minlength=bins * bins).reshape((bins, bins))
```
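`np.bincount` returns an array of length `max(input) + 1`. If the last cell `(bins - 1, bins - 1)` never occurs, the result is shorter than `bins * bins` and the `reshape` fails. Passing `minlength=bins * bins` pads the result with zeros. A small demonstration:

```python
import numpy as np

bins = 3
x = np.array([0, 1, 1])
y = np.array([0, 2, 2])
flat = bins * x + y                 # [0, 5, 5]; the largest possible index is 8

short = np.bincount(flat)           # length 6, cannot be reshaped to (3, 3)
table = np.bincount(flat, minlength=bins * bins).reshape((bins, bins))
```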

```
In [78]: %timeit make_table(x, y, bins)  # Your first solution
3.86 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [79]: %timeit make_table2(x, y, bins)  # Your second solution
443 µs ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [101]: %timeit np.unique(bins * x + y, return_counts=True)[1].reshape((bins, bins))
307 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [118]: %timeit np.bincount(bins * x + y).reshape((10, 10))
30.3 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
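The question's last step, using the contingency table for information-theory math, is not covered by the answer. As one illustration (not necessarily the calculation the asker had in mind), mutual information can be computed by normalizing the counts to a joint distribution p(x, y). The formula is I(X;Y) = Σ p(x, y) · log2(p(x, y) / (p(x) · p(y))), summed over non-empty cells. The sketch below assumes base-2 logarithms (bits), and `mutual_information` is a hypothetical helper:

```python
import numpy as np

def mutual_information(ctable):
    """Mutual information in bits from a 2-D table of counts."""
    p_xy = ctable / ctable.sum()            # joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X (rows)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (columns)
    nz = p_xy > 0                           # treat 0 * log 0 as 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))

# Y is an exact copy of a fair binary X: 1 bit
identical = np.array([[50, 0], [0, 50]])
# X and Y independent: 0 bits
independent = np.array([[25, 25], [25, 25]])
```

Any of the tables built above, such as the `np.bincount` result, can be passed in directly.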