# Level Up! See How Veteran Programmers Get the Most Out of Python

1. read_csv

(Alternatively, on Linux you can use the `head` command to check the first 5 lines of any text file, for example: `head -n 5 data.txt`)
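Previewing the first few lines also works from inside pandas. A minimal sketch, assuming the goal is to inspect a large file before loading it fully: `read_csv` takes an `nrows` parameter that limits how many rows are parsed. The in-memory CSV text below is made-up sample data standing in for a real file.

```python
import io
import pandas as pd

# Made-up sample data standing in for a large CSV file on disk.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(100))

# nrows=5 parses only the first 5 data rows (the header row is not counted).
preview = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(preview)
```

With a real file you would pass its path instead of the `io.StringIO` object.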

2. select_dtypes

`df.dtypes.value_counts()`

`df.select_dtypes(include=['float64', 'int64'])`
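To see the two calls working together, here is a small sketch on a made-up DataFrame (the column names and values are assumptions for illustration only). The first call summarizes how many columns there are of each dtype, and the second keeps only the numeric ones.

```python
import pandas as pd

# Made-up sample data: one text column, one float column, one integer column.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height": [1.5, 1.7, 1.6],
    "age": [30, 25, 41],
})

# Count how many columns there are of each dtype.
print(df.dtypes.value_counts())

# Keep only the float64 and int64 columns.
numeric = df.select_dtypes(include=["float64", "int64"])
print(numeric.columns.tolist())
```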

3. Copy

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 0, 0], 'b': [1, 1, 1]})
df2 = df1  # df2 is another name for the same object, not a copy
df2['a'] = df2['a'] + 1
df1.head()  # df1 has changed as well
```

```python
df2 = df1.copy()
```

```python
from copy import deepcopy

df2 = deepcopy(df1)
```
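The difference between plain assignment and a copy is easiest to see side by side. A minimal sketch, where the variable names `alias` and `independent` are introduced here for illustration: plain assignment binds a second name to the same DataFrame, while `copy()` creates a separate object.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 0], "b": [1, 1, 1]})

alias = df1               # same object under a second name; nothing is copied
independent = df1.copy()  # separate object with its own data

# Modifying through the alias also modifies df1.
alias["a"] = alias["a"] + 1

print(alias is df1)               # both names refer to one object
print(df1["a"].tolist())          # changed through the alias
print(independent["a"].tolist())  # unaffected by the change
```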

4. Map

```python
level_map = {1: 'high', 2: 'medium', 3: 'low'}
df['c_level'] = df['c'].map(level_map)
```
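A runnable version of the mapping, as a sketch on made-up data. The extra value `4` is an assumption added here to show one detail worth knowing: keys missing from the dictionary map to `NaN` rather than raising an error.

```python
import pandas as pd

# Made-up sample data; 4 has no entry in the mapping on purpose.
df = pd.DataFrame({"c": [1, 2, 3, 4]})

level_map = {1: "high", 2: "medium", 3: "low"}
df["c_level"] = df["c"].map(level_map)

print(df)
```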

5. apply or not apply?

```python
import pandas as pd

def rule(x, y):
    if x == 'high' and y > 10:
        return 1
    else:
        return 0

df = pd.DataFrame({'c1': ['high', 'high', 'low', 'low'], 'c2': [0, 23, 17, 4]})
df['new'] = df.apply(lambda x: rule(x['c1'], x['c2']), axis=1)
df.head()
```

`df['maximum'] = df.apply(lambda x: max(x['c1'], x['c2']), axis=1)`

`df['maximum'] = df[['c1', 'c2']].max(axis=1)`

`df.apply(lambda x: round(x['c'], 0), axis=1)`
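The one-liners above contrast row-wise `apply` with built-in column operations. A minimal sketch, assuming numeric columns `c1` and `c2` with made-up values: both forms of the maximum give the same result, and the built-in method avoids calling a Python function once per row. The `Series.round` line is one possible replacement for the last `apply` call, offered here as an assumption rather than taken from the original text.

```python
import pandas as pd

# Made-up numeric sample data.
df = pd.DataFrame({"c1": [3.2, 7.8, 1.1], "c2": [5.0, 2.4, 9.6]})

# Row-wise: calls the lambda once per row.
row_wise = df.apply(lambda x: max(x["c1"], x["c2"]), axis=1)

# Built-in: computes the row maximum across the selected columns.
vectorized = df[["c1", "c2"]].max(axis=1)

# Built-in rounding on a whole column instead of apply + round.
rounded = df["c1"].round(0)

print(row_wise.tolist(), vectorized.tolist(), rounded.tolist())
```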

6. value counts

`df['c'].value_counts()`

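A. `normalize=True`: if you want to check frequencies instead of counts.

B. `dropna=False`: if you also want to include missing values in the statistics.

C. `df['c'].value_counts().reset_index()`: if you want to convert the stats table into a pandas DataFrame and manipulate it.

D. `df['c'].value_counts().sort_index()`: show the statistics sorted by value rather than by count. (The form `df['c'].value_counts().reset_index().sort_values(by='index')` relies on the reset column being named `index`, which holds only in pandas versions before 2.0.)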
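The options listed above can be tried on a small made-up column. This is a sketch only: the sample values are assumptions, and sorting by value is done here with `sort_index()`, a swapped-in choice that behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value.
df = pd.DataFrame({"c": ["b", "a", "b", np.nan, "b", "a"]})

counts = df["c"].value_counts()                    # counts, most frequent first
freqs = df["c"].value_counts(normalize=True)       # fractions of non-missing rows
with_missing = df["c"].value_counts(dropna=False)  # also counts the NaN entry
by_value = df["c"].value_counts().sort_index()     # ordered by value, not count

print(counts, freqs, with_missing, by_value, sep="\n\n")
```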

7. Number of missing values

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [1, 2, 3], 'c1': [0, 0, np.nan], 'c2': [np.nan, 1, 1]})
df = df[['id', 'c1', 'c2']]
# Count the missing values in each row across c1 and c2.
df['num_nulls'] = df[['c1', 'c2']].isnull().sum(axis=1)
df.head()
```

8. Select rows with specific IDs

```python
df_filter = df['ID'].isin(['A001', 'C022', ...])
df[df_filter]
```
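A self-contained sketch of the same filter, with made-up IDs and a hypothetical `score` column added for illustration. `isin` returns a boolean Series, so negating it with `~` selects the complementary rows.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "ID": ["A001", "B010", "C022", "D100"],
    "score": [1, 2, 3, 4],
})

wanted_ids = ["A001", "C022"]
df_filter = df["ID"].isin(wanted_ids)  # boolean mask, one entry per row

selected = df[df_filter]   # rows whose ID is in the list
excluded = df[~df_filter]  # rows whose ID is not in the list

print(selected)
print(excluded)
```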

9. Percentile groups

```python
import numpy as np

cut_points = [np.percentile(df['c'], i) for i in [50, 80, 95]]
df['group'] = 1
for i in range(3):
    # Each cut point a value falls below raises its group number by 1.
    df['group'] = df['group'] + (df['c'] < cut_points[i])
    # or <= cut_points[i]
```
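To make the group numbering concrete, here is the same loop run on made-up data, the integers 1 through 100 (an assumption for illustration). Each row starts in group 1 and moves up one group for every cut point it falls below, so group 1 ends up holding the top 5% and group 4 the bottom 50%.

```python
import numpy as np
import pandas as pd

# Made-up sample data: the integers 1..100.
df = pd.DataFrame({"c": np.arange(1, 101)})

# 50th, 80th and 95th percentiles of column c.
cut_points = [np.percentile(df["c"], i) for i in [50, 80, 95]]

df["group"] = 1
for i in range(3):
    # The boolean Series counts as 0/1, adding 1 for each cut point not reached.
    df["group"] = df["group"] + (df["c"] < cut_points[i])

group_sizes = {int(k): int(v) for k, v in df["group"].value_counts().items()}
print(group_sizes)
```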

10. to_csv

`print(df[:5].to_csv())`
