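Pandas offers two ways to sort: by label, with sort_index(), and by value, with sort_values().
The examples below use the DataFrame generated by the following code: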
import pandas as pd
import numpy as np
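# 10 rows of random values with a shuffled index; no seed is set,
# so the numbers differ on every run.
unsorted_df = pd.DataFrame(
    np.random.randn(10, 2),
    index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
    columns=["col1", "col2"],
)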
'''
col1 col2
1 0.120356 -0.160916
4 1.268437 -0.416132
6 2.126170 -2.693228
2 1.116525 -0.262073
3 0.666465 0.845862
5 0.221342 1.641566
9 -0.977082 -0.221055
8 -0.840693 -0.645618
0 -1.902482 1.845218
7 -1.904138 0.159210
'''
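The sort_index() method sorts a DataFrame by its labels. The axis parameter selects which labels to sort and ascending sets the order. By default, the row labels are sorted in ascending order: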
sorted_df = unsorted_df.sort_index()
'''
col1 col2
0 -1.902482 1.845218
1 0.120356 -0.160916
2 1.116525 -0.262073
3 0.666465 0.845862
4 1.268437 -0.416132
5 0.221342 1.641566
6 2.126170 -2.693228
7 -1.904138 0.159210
8 -0.840693 -0.645618
9 -0.977082 -0.221055
'''
Pass a Boolean to the ascending parameter to control the sort order. ascending=False sorts in descending order:
sorted_df = unsorted_df.sort_index(ascending=False)
'''
col1 col2
9 -0.977082 -0.221055
8 -0.840693 -0.645618
7 -1.904138 0.159210
6 2.126170 -2.693228
5 0.221342 1.641566
4 1.268437 -0.416132
3 0.666465 0.845862
2 1.116525 -0.262073
1 0.120356 -0.160916
0 -1.902482 1.845218
'''
The axis parameter, 0 or 1, chooses which labels are sorted. The default, axis=0, sorts the row labels; axis=1 sorts the column labels:
sorted_df = unsorted_df.sort_index(axis=1, ascending=False)
'''
col2 col1
1 -0.160916 0.120356
4 -0.416132 1.268437
6 -2.693228 2.126170
2 -0.262073 1.116525
3 0.845862 0.666465
5 1.641566 0.221342
9 -0.221055 -0.977082
8 -0.645618 -0.840693
0 1.845218 -1.902482
7 0.159210 -1.904138
'''
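As a reproducible illustration of the two axes, here is a minimal sketch on a small hand-written DataFrame. The data and the variable names (df, by_rows, by_cols, both) are made up for this example and are not part of the original walkthrough.

```python
import pandas as pd

# Small fixed data (illustrative values), with rows and columns out of order.
df = pd.DataFrame(
    {"col2": [10, 20, 30], "col1": [1, 2, 3]},
    index=[2, 0, 1],
)

by_rows = df.sort_index()          # axis=0 (default): reorder rows by index label
by_cols = df.sort_index(axis=1)    # axis=1: reorder columns by column label
both = df.sort_index().sort_index(axis=1)  # rows first, then columns

print(by_rows.index.tolist())    # [0, 1, 2]
print(by_cols.columns.tolist())  # ['col1', 'col2']
print(by_cols.index.tolist())    # [2, 0, 1] (row order is untouched)
```

Sorting along one axis leaves the other axis as it was, so the two calls can be chained when both need to be ordered.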
The sort_values() method sorts by value. Its by parameter names the column (or list of columns) to sort by:
sorted_df = unsorted_df.sort_values(by="col1")
'''
col1 col2
7 -1.904138 0.159210
0 -1.902482 1.845218
9 -0.977082 -0.221055
8 -0.840693 -0.645618
1 0.120356 -0.160916
5 0.221342 1.641566
3 0.666465 0.845862
2 1.116525 -0.262073
4 1.268437 -0.416132
6 2.126170 -2.693228
'''
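The by parameter also accepts a list of column names, and ascending can then be a list with one Boolean per column. The sketch below uses made-up data with repeated values in col1 so that the second sort key has a visible effect; the data and names are assumptions for illustration only.

```python
import pandas as pd

# Illustrative data: col1 contains ties, so col2 decides the order within each tie.
df = pd.DataFrame(
    {"col1": [2, 1, 2, 1], "col2": [0.5, 0.1, 0.3, 0.9]},
    index=["a", "b", "c", "d"],
)

# Sort by col1 ascending, then break ties by col2 descending.
result = df.sort_values(by=["col1", "col2"], ascending=[True, False])

print(result.index.tolist())  # ['d', 'b', 'a', 'c']
```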
sort_values() takes a kind parameter that selects mergesort, heapsort, or quicksort as the sorting algorithm (quicksort is the default):
sorted_df = unsorted_df.sort_values(by="col1", kind="quicksort")
'''
col1 col2
7 -1.904138 0.159210
0 -1.902482 1.845218
9 -0.977082 -0.221055
8 -0.840693 -0.645618
1 0.120356 -0.160916
5 0.221342 1.641566
3 0.666465 0.845862
2 1.116525 -0.262073
4 1.268437 -0.416132
6 2.126170 -2.693228
'''
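With distinct values, as in the random data above, every algorithm gives the same result. The choice matters mainly when the sort column contains ties: mergesort is a stable sort, so tied rows keep their original relative order. A minimal sketch, using made-up data:

```python
import pandas as pd

# Illustrative data: col1 has ties; col2 records the original row order.
df = pd.DataFrame(
    {"col1": [1, 0, 1, 0], "col2": ["w", "x", "y", "z"]},
)

# mergesort is stable: among equal col1 values, rows stay in their input order.
stable = df.sort_values(by="col1", kind="mergesort")

print(stable["col2"].tolist())  # ['x', 'z', 'w', 'y']
print(stable.index.tolist())    # [1, 3, 0, 2]
```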