
How to select a specific word from a feature

Stack Overflow user
Asked on 2019-07-28 18:55:00
1 answer, 71 views, 0 followers, 0 votes

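I am using value_counts for feature extraction to show the most frequently repeated strings, but I want to extract one specific word: every row where the word occurs should get the value 1, and the remaining NaN values should be filled with 0. What I do now is search the strings for the word manually, map each matching string to 1, and then fill the NaN values with 0 using fillna(0).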

print(train.key_skills.value_counts(), '\n')

train['key_skills'] = train['key_skills'].map({
    'Linear Regression, Insurance Analytics, Business Analysis..':1,
    'Linear Regression, Insurance Analytics, Business Analysis...':1,
    'Analytics, SAS, banking, insurance, Analytics Head':1,
    'NoSQL, Spark, Mapreduce, SQL, Cassandra, Data Science, SCALA, Big Data...':1,
    'NoSQL, Spark, Mapreduce, SQL, Cassandra, Data Science, SCALA, Big Data...':1,
    'Excel, SQL, Data Analysis, Segmentation, SAS, Data Mining, SPSS...':1,
    'Linear Regression, Business Analysis, Model Development, Segmentation, Base...':1,
    'Data analysis, SQL, Consulting, Data management, SPSS, FMCG, Analytical...':1,
    'Data Analytics, Business Intelligence, Communication Protocols...':1,
    'r, advanced analytics, segmentation, sas, machine learning...':1,
    'Data Analytics, Data Science, Predictive Modeling, Project Management...':1,
    'NLP, Neural Networks, Machine Learning, Data Mining...':1,
    'Text Mining, Hive, NoSQL, Python, R, SQL, Data Analysis, Machine Learning...':1,
    'Data Science, R, Machine Learning, Linear Regression, Cluster Analysis...':1,
    'Retail Analytics, Analytics, clustering, segmentation, ranking, correlation...':1,
    'Linear Regression, SAS, Data Analytics, Correlation, Statistics, analytic...':1,
    'Analytics, Machine Learning, TensorFlow, Pytorch, python libraries...':1,
    'Data Analytics, SQL, Statistics, R, Econometrics, Data Mining...':1,
    'Quant Analytics, Analytics, Data Analysis, Sentiment Analysis...':1,
    'machine learning, text mining, r, python, neural networks, sql, sas...':1,
    'Predictive Modeling, Logistic Regression, R, SAS, Predictive Analytics...':1,
    'Business Analyst, Data Analytics, R, Python, MATLAB, SQL, Machine Learning,...':1,
    'Business Analyst, Data Analytics, R, Python, MATLAB, SQL, Machine Learning,...':1,
    'Retail Analytics, Business Analysis, Excel, SAS, Data Analytics, VBA...':1,
    'Deep Learning, R, Machine Learning, Python, Stakeholder Management...':1,
    'Hadoop, Java, Data Science, Cloudera, Spark, Hive, Impala, Presales...':1,
    'SQL, Javascript, Automation, Python, Ruby, Analytics, Machine learning...':1,
    'machine learning, team leading, Analytics, Natural Language Processing...':1,
    'Analytics, Data Science, Program Delivery, Solutioning, Presales, Proposals...':1,
    'NLP, SAS, User Stories, Agile Development, Machine Learning, Test Scenarios...':1,
    'Analytics, Head - Analytics, data analytics, Data Science, business process...':1,
    'Java, SCALA, Spring, Python, Solr, Redis, Machine Learning, Algorithms, Web...':1,
    'Deep Learning, NLP, Spark, Information Retrieval, Java, Python...':1,
    'SCALA, Machine Learning, Java, Python, SQL, R, Pig, Data Mining, Perl...':1
})
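A minimal sketch of why the hand-built dictionary is fragile, using a few hypothetical strings rather than the asker's data. `Series.map` with a dict matches each value only as an exact, whole key, so any string that is not listed becomes NaN, even if it contains "Data Science", and the same happens to a key that differs from the stored value (for example, one shortened with "..."). `fillna(0)` then turns those rows into 0.

```python
import pandas as pd

# Hypothetical toy data in the spirit of the question's key_skills column
skills = pd.Series([
    'NoSQL, Spark, SQL, Data Science, SCALA',
    'Excel, SQL, Data Analysis, SAS',
    'Data Science, R, Machine Learning',
])

# Series.map with a dict looks up each value as a whole, exact key.
# Only the first string is listed, so the other two become NaN.
lookup = {'NoSQL, Spark, SQL, Data Science, SCALA': 1}
mapped = skills.map(lookup)

# fillna(0) turns the unmatched rows into 0; astype(int) drops the float dtype
flags = mapped.fillna(0).astype(int)
print(flags.tolist())  # [1, 0, 0]
```

The third row contains "Data Science" but still ends up as 0, because its full string is not a key in the dictionary.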

What I want is code that maps the term "Data Science" to 1 wherever it appears in the string, and puts 0 where it does not appear.


1 Answer

Stack Overflow user

Posted on 2019-07-28 19:51:47

You don't need to build the mapping by hand; combining str.contains with np.where is enough:

import pandas as pd
import numpy as np

df = pd.DataFrame()

df['train_skills'] = [
        'Linear Regression, Insurance Analytics, Business Analysis..',
        'Linear Regression, Insurance Analytics, Business Analysis...',
        'Analytics, SAS, banking, insurance, Analytics Head',
        'NoSQL, Spark, Mapreduce, SQL, Cassandra, Data Science, SCALA, Big Data...',
        'NoSQL, Spark, Mapreduce, SQL, Cassandra, Data Science, SCALA, Big Data...',
        np.nan]

###### THE LINE OF CODE YOU NEED ######
# na=False makes missing values (NaN) count as "no match", so they become 0
df['train_skills'] = np.where(df.train_skills.str.contains('Data Science', na=False), 1, 0)

Output:

   train_skills
0             0
1             0
2             0
3             1
4             1
5             0
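A variant of the same idea, sketched under two assumptions: that "Data Science" should match regardless of letter case, and that the flag should go into a new column (the hypothetical name `data_science`) so the original `key_skills` text is kept for extracting other words later. `case` and `na` are standard parameters of `Series.str.contains`; the third string below is a made-up lowercase example, the others come from the question.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'key_skills': [
    'NoSQL, Spark, Mapreduce, SQL, Cassandra, Data Science, SCALA, Big Data...',
    'Analytics, Head - Analytics, data analytics, Data Science, business process...',
    'Hadoop, Java, data science, Cloudera, Spark',  # hypothetical lowercase variant
    'Linear Regression, Insurance Analytics, Business Analysis..',
    np.nan,
]})

# case=False also matches 'data science'; na=False treats missing values as no match
has_ds = df['key_skills'].str.contains('data science', case=False, na=False)

# The result is already boolean, so astype(int) gives 1/0 without np.where
df['data_science'] = has_ds.astype(int)
print(df['data_science'].tolist())  # [1, 1, 1, 0, 0]
```

Because `na=False` already yields a boolean Series, `astype(int)` produces the same 0/1 values as `np.where(..., 1, 0)`; either form works.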
Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/57240171
