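numpy.amax() finds the maximum value in an array, and numpy.amin() does the same for the minimum. If I want both the max and the min, I have to call both functions, which requires passing over the (very large) array twice, and that seems slow.
Is there a function in the NumPy API that finds both max and min with only a single pass through the data?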
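For reference, the two-pass baseline the question describes might look like the minimal sketch below. The helper name `minmax_two_pass` is hypothetical, and `np.min()`/`np.max()` are used here in place of the equivalent `np.amin()`/`np.amax()`.

```python
import numpy as np


def minmax_two_pass(arr):
    # Hypothetical helper: each call scans the whole array once,
    # so the data is traversed twice in total.
    return np.min(arr), np.max(arr)


rng = np.random.default_rng(1)
x = rng.random(1_000_000)
lo, hi = minmax_two_pass(x)
print(lo <= hi)
```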
Answered on 2015-11-25 22:30:55
You could use Numba, a NumPy-aware dynamic Python compiler that uses LLVM. The resulting implementation is simple and clear:
import numpy
import numba


@numba.jit
def minmax(x):
    maximum = x[0]
    minimum = x[0]
    for i in x[1:]:
        if i > maximum:
            maximum = i
        elif i < minimum:
            minimum = i
    return (minimum, maximum)


numpy.random.seed(1)
x = numpy.random.rand(1000000)
print(minmax(x) == (x.min(), x.max()))
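It should also be faster than NumPy's min() and max() implementation, all without writing a single line of C/Fortran code.
Do your own performance tests, though, because the result always depends on your architecture, your data, your package versions, and so on.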
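One way to run such a test yourself is sketched below using the standard-library `timeit` module. The helper `best_time` and the two function names are hypothetical, the array size is arbitrary, and the `try`/`except` fallback is an assumption added so the sketch still runs (slowly, as plain Python) when Numba is not installed.

```python
import timeit

import numpy as np

try:
    import numba
    jit = numba.njit
except ImportError:
    # Assumption: without Numba the loop still runs, only as slow plain Python.
    def jit(func):
        return func


@jit
def minmax_single_pass(x):
    # Same idea as the loop above: one traversal, two running extrema.
    minimum = maximum = x[0]
    for i in range(1, x.size):
        value = x[i]
        if value > maximum:
            maximum = value
        elif value < minimum:
            minimum = value
    return minimum, maximum


def minmax_two_pass(x):
    return x.min(), x.max()


def best_time(func, data, repeat=5, number=3):
    # Best-of-`repeat` wall time per call, in seconds.
    func(data)  # warm-up call so JIT compilation is not included
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number


rng = np.random.default_rng(1)
x = rng.random(100_000)
for name, func in [("two_pass", minmax_two_pass),
                   ("single_pass", minmax_single_pass)]:
    print(f"{name}: {best_time(func, x) * 1e3:.3f} ms")
```

Taking the minimum over several repeats reduces the influence of background noise; the absolute numbers are only meaningful for the machine and package versions they were measured on.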
Answered on 2020-01-04 07:28:44
Just to give an idea of the numbers one can expect, consider the following approaches:
import numpy as np


def extrema_np(arr):
    return np.max(arr), np.min(arr)


import numba as nb


@nb.jit(nopython=True)
def extrema_loop_nb(arr):
    n = arr.size
    max_val = min_val = arr[0]
    for i in range(1, n):
        item = arr[i]
        if item > max_val:
            max_val = item
        elif item < min_val:
            min_val = item
    return max_val, min_val


import numba as nb


@nb.jit(nopython=True)
def extrema_while_nb(arr):
    n = arr.size
    odd = n % 2
    if not odd:
        n -= 1
    max_val = min_val = arr[0]
    i = 1
    while i < n:
        x = arr[i]
        y = arr[i + 1]
        if x > y:
            x, y = y, x
        min_val = min(x, min_val)
        max_val = max(y, max_val)
        i += 2
    if not odd:
        x = arr[n]
        min_val = min(x, min_val)
        max_val = max(x, max_val)
    return max_val, min_val

%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
import numpy as np


cdef void _extrema_loop_cy(
        long[:] arr,
        size_t n,
        long[:] result):
    cdef size_t i
    cdef long item, max_val, min_val
    max_val = arr[0]
    min_val = arr[0]
    for i in range(1, n):
        item = arr[i]
        if item > max_val:
            max_val = item
        elif item < min_val:
            min_val = item
    result[0] = max_val
    result[1] = min_val


def extrema_loop_cy(arr):
    result = np.zeros(2, dtype=arr.dtype)
    _extrema_loop_cy(arr, arr.size, result)
    return result[0], result[1]

%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
import numpy as np


cdef void _extrema_while_cy(
        long[:] arr,
        size_t n,
        long[:] result):
    cdef size_t i, odd
    cdef long x, y, max_val, min_val
    max_val = arr[0]
    min_val = arr[0]
    odd = n % 2
    if not odd:
        n -= 1
    i = 1
    while i < n:
        x = arr[i]
        y = arr[i + 1]
        if x > y:
            x, y = y, x
        min_val = min(x, min_val)
        max_val = max(y, max_val)
        i += 2
    if not odd:
        x = arr[n]
        min_val = min(x, min_val)
        max_val = max(x, max_val)
    result[0] = max_val
    result[1] = min_val


def extrema_while_cy(arr):
    result = np.zeros(2, dtype=arr.dtype)
    _extrema_while_cy(arr, arr.size, result)
    return result[0], result[1]

(The extrema_loop_*() approaches are similar to what is proposed here, while the extrema_while_*() approaches are based on the code from here.)
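The timings indicate that the extrema_while_*() approaches are the fastest, with extrema_while_nb() being the fastest of all. In any case, the extrema_loop_nb() and extrema_loop_cy() solutions also outperform the NumPy-only approach (calling np.max() and np.min() separately).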
Finally, note that none of these is as flexible as np.min()/np.max() (in terms of n-dimensional support, the axis parameter, and so on).
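To illustrate what that flexibility refers to, the sketch below uses the standard `axis` and `keepdims` arguments of `np.min()`/`np.max()` on a small 2-D array. The array and variable names are made up for illustration; the 1-D loop implementations above would need extra code (for example flattening, or looping over rows) to reproduce any of this.

```python
import numpy as np

arr = np.array([[3, 7, 1],
                [9, 2, 8]])

# Global extrema over all elements, regardless of dimensionality.
print(np.min(arr), np.max(arr))

# Per-column extrema (reduce along axis 0).
col_min = np.min(arr, axis=0)
col_max = np.max(arr, axis=0)

# Per-row extrema, keeping the reduced axis so the result
# still broadcasts against the original array.
row_min = np.min(arr, axis=1, keepdims=True)
row_max = np.max(arr, axis=1, keepdims=True)

print(col_min, col_max)
print(row_min.shape, row_max.shape)
```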
(The full code is available here.)
Answered on 2017-01-19 11:20:55
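In general, you can reduce the number of comparisons in a minmax algorithm by processing two elements at a time, comparing only the smaller of the two with the running minimum and only the larger with the running maximum. That takes three comparisons per pair instead of the up to four a naive approach needs, so on average only 3/4 of the comparisons are required.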
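The comparison count can be checked with a small pure-Python sketch. Both function names are hypothetical, and the naive counter assumes two comparisons per element, which is what an `if`/`elif` loop approaches when the running maximum rarely changes.

```python
def minmax_naive(values):
    """Compare every element against both running extrema."""
    minimum = maximum = values[0]
    comparisons = 0
    for item in values[1:]:
        comparisons += 2
        if item > maximum:
            maximum = item
        if item < minimum:
            minimum = item
    return minimum, maximum, comparisons


def minmax_pairwise(values):
    """Process two elements at a time: three comparisons per pair."""
    n = len(values)
    minimum = maximum = values[0]
    comparisons = 0
    i = 1
    while i + 1 < n:
        x, y = values[i], values[i + 1]
        comparisons += 3
        if x > y:            # order the pair
            x, y = y, x
        if x < minimum:      # smaller one can only affect the minimum
            minimum = x
        if y > maximum:      # larger one can only affect the maximum
            maximum = y
        i += 2
    if i < n:                # one leftover element
        comparisons += 2
        x = values[i]
        if x < minimum:
            minimum = x
        if x > maximum:
            maximum = x
    return minimum, maximum, comparisons


# 101 distinct values in scrambled order (a permutation of 0..100).
data = [(i * 37) % 101 for i in range(101)]
_, _, naive_count = minmax_naive(data)
_, _, pair_count = minmax_pairwise(data)
print(naive_count, pair_count, pair_count / naive_count)  # 200 150 0.75
```

Fewer comparisons do not automatically translate into the same reduction in run time, since memory access and branch prediction also matter.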
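This pairwise scheme can be implemented in C or Fortran (or any other low-level language), and its performance should then be almost unbeatable. I'm using numba here to illustrate the principle and to get a very fast, dtype-independent implementation: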
import numba as nb
import numpy as np


@nb.njit
def minmax(array):
    # Ravel the array and return early if it's empty
    array = array.ravel()
    length = array.size
    if not length:
        return

    # We want to process two elements at once so we need
    # an even sized array, but we preprocess the first and
    # start with the second element, so we want it "odd"
    odd = length % 2
    if not odd:
        length -= 1

    # Initialize min and max with the first item
    minimum = maximum = array[0]

    i = 1
    while i < length:
        # Get the next two items and swap them if necessary
        x = array[i]
        y = array[i+1]
        if x > y:
            x, y = y, x
        # Compare the min with the smaller one and the max
        # with the bigger one
        minimum = min(x, minimum)
        maximum = max(y, maximum)
        i += 2

    # If we had an even sized array we need to compare the
    # one remaining item too.
    if not odd:
        x = array[length]
        minimum = min(x, minimum)
        maximum = max(x, maximum)

    return minimum, maximum
It is definitely faster than the naive approach that Peque presented:
# minmax_peque is the naive loop implementation proposed by Peque
arr = np.random.random(3000000)
assert minmax(arr) == minmax_peque(arr)  # warmup and making sure they are identical
%timeit minmax(arr)  # 100 loops, best of 3: 2.1 ms per loop
%timeit minmax_peque(arr)  # 100 loops, best of 3: 2.75 ms per loop
As expected, the new minmax implementation takes only roughly 3/4 of the time the naive implementation took (2.1 / 2.75 ≈ 0.76).
https://stackoverflow.com/questions/12200580