Q: Splitting data stored in a dictionary into training, validation, and test sets

Stack Overflow user

Asked 2020-05-26 05:49:23

Answers 2 | Views 755 | Followers 0 | Votes 0

I have a dictionary that holds all the data read from a file. The dictionary has three keys:

name
seq
seq_len

Each of these keys maps to a list, so I have a dictionary of lists, for example:

dictionary = {'name': ['seq1', 'seq2', 'seq3', 'seq4', ..., 'seq10000'],
              'seq': ['actatsts', 'gfsfsfsg', 'gstfdh', 'gsydg', ..., 'hdbcjshy'],
              'seq_len': [8, 8, 6, 5, ..., 8]}

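Now I want to split this dictionary so that I end up with separate dictionaries for training (80%), validation, and test. How can I do this while keeping the dictionary data structure? I can't use sklearn's train_test_split here. I would appreciate your insights.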

python-3.x

dictionary

machine-learning

2 Answers

Stack Overflow user

Accepted answer

Posted 2020-05-26 06:55:12

from sklearn.model_selection import train_test_split
# Unpack so each list is split row-wise, not the three keys; returns a train/test pair per key.
splits = train_test_split(*dictionary.values(), train_size=0.8)

It would be better, though, to use pandas instead of plain lists.

import pandas as pd

df = pd.DataFrame(dictionary)
train, test = train_test_split(df, train_size=0.8)
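
When neither sklearn nor pandas is an option, as the question states, the row-wise view that a DataFrame gives can be approximated with `zip`. The following is only a minimal sketch under some assumptions: all lists have the same length, the sample data is a shortened version of the question's example, and the seed and the helper `to_dict` are illustrative names that do not come from the answer.

```python
import random

# Shortened, illustrative version of the dictionary from the question.
dictionary = {
    "name": ["seq1", "seq2", "seq3", "seq4", "seq5"],
    "seq": ["actatsts", "gfsfsfsg", "gstfdh", "gsydg", "hdbcjshy"],
    "seq_len": [8, 8, 6, 5, 8],
}

keys = list(dictionary)
# Transpose the dict of lists into one tuple per sample so rows stay together.
rows = list(zip(*dictionary.values()))
random.Random(0).shuffle(rows)  # fixed seed (assumption) for reproducibility

cut = int(len(rows) * 0.8)


def to_dict(row_subset):
    """Transpose a list of row tuples back into a dict of lists."""
    columns = zip(*row_subset) if row_subset else [[] for _ in keys]
    return {key: list(col) for key, col in zip(keys, columns)}


train = to_dict(rows[:cut])
test = to_dict(rows[cut:])
```

Shuffling whole rows rather than each list separately is what keeps every `name` paired with its own `seq` and `seq_len`.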

Votes 1

Stack Overflow user

Posted 2020-05-26 06:51:13

You can try this:

# Way 1
import pandas as pd

df = pd.DataFrame(dictionary)
train_val = df.sample(frac=0.8, random_state=42)
# Every row that was not sampled into train_val goes to the test set.
test = df.drop(train_val.index)


# Way 2
import numpy as np

np.random.seed(42)

length = len(dictionary['name'])
new_index = np.random.permutation(length)

cut = int(length * 0.8)
train_val_index = new_index[:cut]
# The remaining shuffled indices form the test set.
test_index = new_index[cut:]
train_val = {key: [value[i] for i in train_val_index] for key, value in dictionary.items()}
test = {key: [value[i] for i in test_index] for key, value in dictionary.items()}
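
The question asks for three sets, while the snippets above stop at a two-way split. The same index-shuffling idea extends to train, validation, and test using only the standard library. This is a hedged sketch, not part of the original answer: the function name `split_dict`, the 80/10/10 fractions (the question only fixes the 80%), the seed, and the toy data are all assumptions made for illustration.

```python
import random


def split_dict(data, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split a dict of equal-length lists into train/val/test dicts.

    `fractions` gives the train and validation shares; whatever is left
    after those two goes to the test set.
    """
    length = len(next(iter(data.values())))
    if any(len(values) != length for values in data.values()):
        raise ValueError("all lists must have the same length")

    # Shuffle the row indices once, then slice them into three groups.
    indices = list(range(length))
    random.Random(seed).shuffle(indices)

    n_train = int(length * fractions[0])
    n_val = int(length * fractions[1])
    parts = {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],
    }
    # Apply the same indices to every key so rows stay aligned.
    return {
        name: {key: [values[i] for i in idx] for key, values in data.items()}
        for name, idx in parts.items()
    }


# Toy data (assumption) shaped like the dictionary in the question.
dictionary = {
    "name": [f"seq{i}" for i in range(1, 11)],
    "seq": ["a" * (i % 5 + 1) for i in range(1, 11)],
}
dictionary["seq_len"] = [len(s) for s in dictionary["seq"]]

splits = split_dict(dictionary)
train, val, test = splits["train"], splits["val"], splits["test"]
```

Because the test slice takes the remainder, rounding in the first two slices never drops a sample.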

I hope this helps.

Votes 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/62015467
