# Translation | 10 Tips and Tricks for Data Scientists Vol.2

## 2 R

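### 2.1 Look up each row's value in the column named by another column

For every row, the `Selection` column stores the name of the column whose value we want to extract.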

```r
set.seed(5)
df <- as.data.frame(matrix(sample(1:100, 12), ncol = 3))
df$Selection <- c("V1", "V3", "V2", "V3")
df
#>   V1 V2 V3 Selection
#> 1 66 41 19        V1
#> 2 57 85  3        V3
#> 3 79 94 38        V2
#> 4 75 71 58        V3
```
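```r
# One (row, column) pair per row: the column is the position of the name in Selection.
# Matrix-indexing a data frame goes through as.matrix(), which returns character here
# because Selection is a character column, hence as.numeric().
df$Value <- as.numeric(df[cbind(seq_len(nrow(df)), match(df$Selection, names(df)))])
df
#>   V1 V2 V3 Selection Value
#> 1 66 41 19        V1    66
#> 2 57 85  3        V3     3
#> 3 79 94 38        V2    94
#> 4 75 71 58        V3    58
```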
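To see what the one-liner does step by step, here is the same logic in plain Python. This is only an illustration: the rows are copied from the output above, and positions are 0-based here while R's are 1-based. `match()` turns each selected name into a column position, `cbind()` pairs that position with its row number, and indexing with the resulting two-column matrix returns one cell per row.

```python
# Plain-Python walk-through of df[cbind(seq_len(nrow(df)), match(df$Selection, names(df)))]
names = ["V1", "V2", "V3", "Selection"]
rows = [
    [66, 41, 19, "V1"],
    [57, 85, 3, "V3"],
    [79, 94, 38, "V2"],
    [75, 71, 58, "V3"],
]

selection = [row[names.index("Selection")] for row in rows]
# match(): position of each selected name among the column names
col_pos = [names.index(sel) for sel in selection]
# cbind(seq_len(nrow(df)), ...): one (row, column) pair per row
pairs = list(zip(range(len(rows)), col_pos))
# Indexing with the pairs picks exactly one cell from each row
values = [rows[r][c] for r, c in pairs]
print(values)  # [66, 3, 94, 58]
```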

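### 2.2 Create date-time features

Typical features to extract from a date-time column:

• Year
• Month
• Weekday
• Hour
• Minute
• Week of the year
• Quarter

Other useful features:

• A boolean named isWeekend: 1 for weekends, 0 otherwise.
• The part of the day (e.g. morning, afternoon, evening).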
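```r
library(tidyverse)
set.seed(5)
# 10 random timestamps between mid-2018 and the end of 2019
df <- tibble(my_date = lubridate::as_datetime(runif(10, 1530000000, 1577739600)))
df %>% mutate(Year = format(my_date, '%Y'),
              Month_Number = as.factor(format(my_date, '%m')),
              Weekday = as.factor(weekdays(my_date)),
              Hour = as.factor(format(my_date, '%H')),
              Minute = as.factor(format(my_date, '%M')),
              Week = format(my_date, '%W'),
              Quarter = lubridate::quarter(my_date, with_year = TRUE),
              # 1 for Saturday/Sunday, 0 otherwise (week_start = 1 makes Monday day 1)
              isWeekend = as.integer(lubridate::wday(my_date, week_start = 1) >= 6))
```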
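For comparison, a plain-Python sketch of the same features for a single timestamp, using only the standard `datetime` module, including `isWeekend` and a part-of-day bucket. The function name `time_features` is hypothetical, and the day-part boundaries (5-12 morning, 12-17 afternoon, 17-21 evening, otherwise night) are an assumption; pick whatever fits your data.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Derive calendar features from one timestamp (hypothetical helper)."""
    hour = ts.hour
    # Assumed day-part boundaries; adjust them to your use case
    if 5 <= hour < 12:
        part = "morning"
    elif 12 <= hour < 17:
        part = "afternoon"
    elif 17 <= hour < 21:
        part = "evening"
    else:
        part = "night"
    return {
        "Year": ts.year,
        "Month": ts.month,
        "Weekday": ts.strftime("%A"),  # locale-dependent name, like R's weekdays()
        "Hour": hour,
        "Minute": ts.minute,
        # %W: week of the year with Monday as the first day of the week
        "Week": int(ts.strftime("%W")),
        "Quarter": (ts.month - 1) // 3 + 1,
        # weekday(): Monday=0 ... Sunday=6, so 5 and 6 are the weekend
        "isWeekend": int(ts.weekday() >= 5),
        "TimeOfDay": part,
    }


features = time_features(datetime(2019, 12, 28, 14, 30))  # a Saturday afternoon
print(features["isWeekend"], features["Quarter"], features["TimeOfDay"])  # 1 4 afternoon
```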

## 3 Python

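### 3.1 Create a file from Jupyter

The `%%writefile` cell magic writes the contents of a cell to a file:

```python
%%writefile myfile.py
def my_function():
    print("Hello from a function")
```

With `-a`, the cell is appended to the file instead of overwriting it:

```python
%%writefile -a myfile.py
my_function()
```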
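Outside Jupyter, the same two steps correspond to opening the file in write mode and then in append mode. A sketch that works inside a temporary directory so nothing is left behind; `myfile.py` is just the example name used above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myfile.py"
    # write_text opens the file in "w" mode: create or overwrite, like %%writefile
    path.write_text('def my_function():\n    print("Hello from a function")\n')
    # Mode "a" appends to the end of the file, like %%writefile -a
    with path.open("a") as f:
        f.write("my_function()\n")
    content = path.read_text()

print(content)
```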

### 3.2 Look up each row's value in the column named by another column

The same task as in 2.1, in pandas:

```python
import pandas as pd

df = pd.DataFrame.from_dict({"V1": [66, 57, 79, 75], "V2": [41, 85, 94, 71],
                             "V3": [19, 3, 38, 58], "Selection": ['V1', 'V3', 'V2', 'V3']})
df
#    V1  V2  V3 Selection
# 0  66  41  19        V1
# 1  57  85   3        V3
# 2  79  94  38        V2
# 3  75  71  58        V3
```

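`DataFrame.lookup` was deprecated in pandas 1.2 and removed in pandas 2.0, so `df.lookup(df.index, df.Selection)` fails on current versions. The pandas user guide suggests `pd.factorize` combined with NumPy indexing instead:

```python
import numpy as np

# idx[i] is the position of row i's selected column within cols
idx, cols = pd.factorize(df['Selection'])
# Keep only the selectable columns (in the order of cols), then take one cell per row
df['Value'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
df
#    V1  V2  V3 Selection  Value
# 0  66  41  19        V1     66
# 1  57  85   3        V3      3
# 2  79  94  38        V2     94
# 3  75  71  58        V3     58
```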
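If performance is not a concern, a row-wise `apply` is easier to read and does not depend on `DataFrame.lookup`. A minimal sketch that rebuilds the example frame so it runs on its own; on large frames, expect it to be much slower than vectorized indexing.

```python
import pandas as pd

df = pd.DataFrame({"V1": [66, 57, 79, 75], "V2": [41, 85, 94, 71],
                   "V3": [19, 3, 38, 58], "Selection": ["V1", "V3", "V2", "V3"]})

# For each row, use the name stored in 'Selection' to index that same row
df["Value"] = df.apply(lambda row: row[row["Selection"]], axis=1)
print(df["Value"].tolist())  # [66, 3, 94, 58]
```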

### 3.3 Create a word cloud from a dictionary

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Example word -> frequency dictionary; replace it with your own
word_cloud_dict = {'Git': 100, 'GitHub': 100, 'push': 50, 'pull': 10, 'commit': 80,
                   'add': 30, 'diff': 10, 'mv': 5, 'log': 8, 'branch': 30, 'checkout': 25}
# More frequent words are drawn larger
wordcloud = WordCloud(width=1000, height=500).generate_from_frequencies(word_cloud_dict)
plt.figure(figsize=(15, 8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```
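`generate_from_frequencies` only needs a mapping from word to frequency. If you start from raw text rather than a ready-made dictionary, `collections.Counter` can build that mapping, and since `Counter` is a `dict` subclass it can be passed to `generate_from_frequencies` as is. A sketch with made-up text; the simple regex tokenizer is an assumption and drops punctuation and digits.

```python
import re
from collections import Counter

text = "git add . git commit git push git pull git push git log"
# Lowercase, keep runs of letters as words, count occurrences
frequencies = Counter(re.findall(r"[a-z]+", text.lower()))
print(frequencies.most_common(2))  # [('git', 6), ('push', 2)]
```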

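### 3.4 Check whether the columns of a pandas DataFrame contain a specific value

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["d", "e", "f"], "C": ["x", "y", "a"]})
df
#    A  B  C
# 0  a  d  x
# 1  b  e  y
# 2  c  f  a
```

```python
# True for every column that contains the value 'a' at least once
(df == 'a').any()
# A     True
# B    False
# C     True
# dtype: bool
```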
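To test for several values at once, `DataFrame.isin` builds the same kind of boolean mask, and filtering the resulting Series gives the names of the matching columns. A sketch on the same example frame; the extra value `"x"` is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["d", "e", "f"], "C": ["x", "y", "a"]})

# isin: True wherever a cell equals any of the listed values
mask = df.isin(["a", "x"]).any()
print(mask.tolist())  # [True, False, True]

# Names of the columns that contain at least one of the values
matching = mask[mask].index.tolist()
print(matching)  # ['A', 'C']

# A single column can be checked the same way
print((df["C"] == "a").any())  # True
```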

### 3.5 Save multiple pandas DataFrames to a single Excel file

```python
import pandas as pd

# df1, df2, df3 and df4 are the DataFrames you want to export (defined elsewhere).
# Store them in a dictionary: the key is the worksheet name you want to give,
# the value is the DataFrame to write to that sheet.
df_dict = {'My_First_Tab': df1, 'My_Second_Tab': df2,
           'My_Third_Tab': df3, 'My_Fourth_Tab': df4}

# Create the ExcelWriter and name the final file, e.g. Final.xlsx.
# Leaving the with-block saves and closes the file
# (ExcelWriter.save() was removed in pandas 2.0).
with pd.ExcelWriter('Final.xlsx', engine='xlsxwriter') as writer:
    # Iterate over the dictionary of DataFrames, one sheet per entry
    for my_sheet, dframe in df_dict.items():
        dframe.to_excel(writer, sheet_name=my_sheet, index=False)
```
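To check the result, `pd.read_excel` with `sheet_name=None` loads every worksheet into a dictionary keyed by sheet name, which mirrors `df_dict`. A self-contained sketch of the round trip, assuming an Excel writer engine and `openpyxl` (needed to read `.xlsx`) are installed; it writes to an in-memory buffer instead of `Final.xlsx`, and the two small frames are hypothetical.

```python
import io

import pandas as pd

# Two small example DataFrames (hypothetical data, for illustration only)
frames = {"First": pd.DataFrame({"a": [1, 2]}),
          "Second": pd.DataFrame({"b": ["x", "y"]})}

buffer = io.BytesIO()  # in-memory stand-in for a file such as Final.xlsx
with pd.ExcelWriter(buffer) as writer:
    for name, frame in frames.items():
        frame.to_excel(writer, sheet_name=name, index=False)

buffer.seek(0)
# sheet_name=None returns a dict {sheet name: DataFrame} with every worksheet
loaded = pd.read_excel(buffer, sheet_name=None)
print(list(loaded))  # ['First', 'Second']
```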

### 4.1 Version history for Google Docs and Sheets

• Open the Google document.
• At the top, click File > Version history.

## 5 Linux

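### 5.1 Copy a folder in Linux

```shell
cp -R /some/dir/ /some/other/dir/
```
• If /some/other/dir/ does not exist, it is created as a copy of /some/dir/ (its parent directory must already exist). With GNU cp, if it does exist, /some/dir/ is copied inside it as /some/other/dir/dir.
• -R copies directories recursively. GNU cp also accepts -r as an equivalent option. Options are case-sensitive; these two letters simply have the same meaning.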
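From Python, for example inside a notebook, the standard library's `shutil.copytree` performs the same recursive copy. Unlike `cp`, it creates any missing parent directories of the destination and raises `FileExistsError` if the destination already exists, unless you pass `dirs_exist_ok=True` (Python 3.8+). A sketch that runs inside a throwaway temporary directory; the paths only mirror the example ones.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "some" / "dir"
    (src / "sub").mkdir(parents=True)
    (src / "sub" / "notes.txt").write_text("hello")

    dst = Path(tmp) / "some" / "other" / "dir"
    # copytree creates dst (and the missing "other" parent) and copies recursively
    shutil.copytree(src, dst)
    copied = (dst / "sub" / "notes.txt").read_text()

print(copied)  # hello
```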

### References

[1] 10 Tips And Tricks For Data Scientists Vol.2: https://predictivehacks.com/10-tips-and-tricks-for-data-scientists-vol-2/

