import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import ColumnTransformer
data = [[1, 3, 4, 'text',
I am learning about Pipelines and FeatureUnions in scikit-learn, so I am wondering whether it is possible to apply 'make_union' repeatedly on classes?
Consider the following code:
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.linear_model import LogisticRegression
import sklearn.datasets as d
I cannot load an instance of a custom transformer saved with sklearn.externals.joblib.dump or pickle.dump, because the original definition of the custom transformer is missing from the current Python session.
Suppose that in one Python session I define, create, and save a custom transformer; it can also be loaded in that same session:
from sklearn.base import TransformerMixin
from sklearn.base import BaseEstimator
from sklearn.externals import joblib
class CustomTransformer(BaseEstimator, TransformerMixin):
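A minimal sketch of the usual fix (an assumption, not the asker's exact setup): pickle stores only the import path of a class, never its code, so the class has to live in an importable module rather than in the session itself. Here the module is written to a temporary directory purely for illustration.

```python
import pathlib
import pickle
import sys
import tempfile

# Put the transformer definition in a real, importable module file.
mod_dir = tempfile.mkdtemp()
pathlib.Path(mod_dir, "my_transformers.py").write_text(
    "from sklearn.base import BaseEstimator, TransformerMixin\n"
    "\n"
    "class CustomTransformer(BaseEstimator, TransformerMixin):\n"
    "    def fit(self, X, y=None):\n"
    "        return self\n"
    "    def transform(self, X):\n"
    "        return X\n"
)
sys.path.insert(0, mod_dir)

import my_transformers

blob = pickle.dumps(my_transformers.CustomTransformer())

# Simulate a fresh session: forget the module, then unpickle. pickle
# re-imports my_transformers by name, so the definition is found again.
del sys.modules["my_transformers"]
restored = pickle.loads(blob)
print(type(restored).__name__)  # CustomTransformer
```

The same idea applies to joblib: as long as `my_transformers` is importable in the loading session, the dump can be restored.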
I am trying to combine CountVectorizer() with a Pipeline and a ColumnTransformer. Because CountVectorizer() produces a sparse matrix, I am using a FunctionTransformer to make sure the ColumnTransformer can hstack the resulting matrices correctly.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, FunctionTransformer
from sklearn.compose import ColumnTransformer
I want to use UMAP in my sklearn Pipeline, and I want to cache this step to speed things up. However, because I have a custom Transformer, the suggested method does not work. Example code:
from sklearn.preprocessing import FunctionTransformer
from tempfile import mkdtemp
from sklearn.pipeline import Pipeline
from umap import UMAP
from hdbscan import HDBSCAN
import seaborn as sns
I am trying to create an sklearn.compose.ColumnTransformer pipeline that transforms both categorical and continuous input data:
import pandas as pd
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.preprocessing import OneHotEncoder, FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer, make_col
I designed the following pipeline to train my model:
from sklearn.compose import make_column_selector as selector
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
cat_imputer = SimpleImputer(stra
Goal: predict probabilities for a given set of classes when the model inputs are of int, float, and object dtype (according to pandas).
I am using the following dataset from UCI:
I have created a pipeline that almost works:
# create transformers for the different variable types.
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd
im
I have a pipeline with a ColumnTransformer. One of the transformers is PCA. When I fit and transform, the data looks correct and everything works. But when I try to access the PCA's explained_variance_ratio_ from the pipeline after fitting, the attribute does not exist. All of my other transformers in the pipeline also lose the attributes they should have after fitting. What am I doing wrong?
The code looks like this:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
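A minimal sketch of the usual explanation (the step names "pre" and "pca" are assumptions): a ColumnTransformer fits clones of the transformers it is given, so the fitted PCA lives inside the fitted pipeline, not on the PCA object you originally constructed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame(np.random.RandomState(0).rand(20, 4),
                  columns=["a", "b", "c", "d"])

pre = ColumnTransformer([
    ("pca", PCA(n_components=2), ["a", "b", "c"]),
    ("scale", StandardScaler(), ["d"]),
])
pipe = Pipeline([("pre", pre)])
pipe.fit(df)

# Retrieve the *fitted* PCA from the fitted pipeline, not from `pre`.
fitted_pca = pipe.named_steps["pre"].named_transformers_["pca"]
print(fitted_pca.explained_variance_ratio_)
```

Accessing `named_transformers_` only works after fitting; the original `PCA` instance passed into the ColumnTransformer stays untouched by design.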
In sklearn, we can use a column transformer within a pipeline to apply preprocessing options to specific columns, as follows:
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler, ...
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
I have the following df
text count daytime label
I think... 4 morning pos
You should... 3 afternoon neg
Better... 7 evening neu
I tried to preprocess only the text column using a ColumnTransformer, with
from sklearn.feature_extraction.text import TfidfVectorizer
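A minimal sketch of one workable arrangement (column names taken from the df above; everything else is an assumption): text vectorizers expect a 1-D array of strings, so the "text" column is selected with a plain string, not a list.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "text": ["I think...", "You should...", "Better..."],
    "count": [4, 3, 7],
    "daytime": ["morning", "afternoon", "evening"],
    "label": ["pos", "neg", "neu"],
})

pre = ColumnTransformer(
    [("tfidf", TfidfVectorizer(), "text")],  # string selector -> 1-D column
    remainder="drop",
)
X = pre.fit_transform(df[["text", "count", "daytime"]])
print(X.shape)
```

Selecting with `["text"]` instead would hand the vectorizer a 2-D block and fail; the string/list distinction is the usual stumbling point here.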
I have a dataframe with a column containing a categorical variable that also includes NaNs.
Category
1 A
2 A
3 NaN
4 B
I want to use sklearn.compose.make_column_transformer() to prepare the df in a clean way. I tried to impute the NaN values and OneHotEncode the column with the following code:
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
tr
I cannot convert the following pipeline to PMML because "the number of input features is not specified".
A minimal example pipeline that reproduces the error is:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
I have a confusing error that I cannot explain or figure out. Apparently my preprocessing pipeline works when simply fitting the model, but fails when attempting cross-validation. I cannot interpret the error or understand what the problem is. Please help.
Preprocessing
I have created a pipeline that performs some preprocessing tasks on the data. It works, and it includes some custom transformers. The code is below.
from sklearn.pipeline import Pipeline
from sklearn.pipeline import FeatureUnion
from sklearn.base import BaseEstimator, TransformerMixin
class column_selector(BaseEstimator, TransformerMixin):
Currently I can build a model by using make_column_transformer and make_pipeline, as follows:
from sklearn.compose import make_column_transformer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
I am trying to use scikit-learn's ColumnTransformer class both as an actual DataFrame transformer and as a "monitoring" transformer, i.e. an object that watches for new classes appearing in the dataset's categorical features.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# Original DataFrame off of which transformers are fit
orig_df = pd.DataFrame(
I would like to use nearest neighbors to find, for a subclass of samples (think treated vs. untreated), the most similar samples in a dataset that has continuous, categorical, and text features.
Toy dataset:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer
from sklearn.neighbors import NearestNeighbors
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction
I am trying to use two different datasets as the training set and the test set, respectively. But with the code below I get:
File "main.py", line 84, in main_test
    X2 = tf_transformer.transform(word_counts2)
File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 1020, in transform
    n_features, expected_n_features))
ValueError: Input
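A minimal sketch of the usual fix (variable names mirror the traceback but are assumptions): the dimension mismatch disappears when the vectorizer fitted on the training documents is reused for the test documents, so both end up in the same feature space.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

train_docs = ["the sky is blue", "the sun is bright"]
test_docs = ["the bright blue sky"]

count_vect = CountVectorizer()
word_counts = count_vect.fit_transform(train_docs)   # fit on train only
tf_transformer = TfidfTransformer().fit(word_counts)

word_counts2 = count_vect.transform(test_docs)       # reuse fitted vocabulary
X2 = tf_transformer.transform(word_counts2)
print(X2.shape[1] == word_counts.shape[1])  # True: same feature space
```

Calling fit_transform a second time on the test documents would build a new, smaller vocabulary, which is exactly what triggers the ValueError above.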
I am trying to fit an SkLearn DecisionTree with data in the format below. But I get an error: Length of feature_names, 9 does not match number of features, 8. It seems the DecisionTree only has the proper categorical features after the one-hot encoding transformation, and not the numeric ones. How do I include the numeric features in the decision tree model?
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
I have written the following sample code
import numpy as np
import pandas as pd
import csv
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
text = ["this is dog" , "this is bull dog" , "
I am trying to compute tf-idf; here is my code:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from nltk.corpus import stopwords
import numpy as np
import numpy.linalg as LA
train_set = ["The sky is blue.", "The sun is bright."] # Documents
I want to define a pipeline with a OneHotEncoder for the day_of_week column. I do not understand why I get a ValueError:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
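A minimal sketch of one frequent cause (an assumption about the asker's code): OneHotEncoder needs 2-D input, so the column has to be selected with a list, ["day_of_week"], which yields a 2-D block; a bare "day_of_week" string would hand the encoder a 1-D array and raise a ValueError.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"day_of_week": ["Mon", "Tue", "Mon"], "y": [1, 2, 3]})

ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["day_of_week"])],  # list, not string
    remainder="drop",
)
X = ct.fit_transform(df)
print(X.shape)  # one column per weekday seen
```

The rule of thumb: use a list selector for transformers that expect 2-D input (encoders, scalers) and a string selector for those that expect 1-D input (text vectorizers).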
I have a pipeline with a column transformer and some custom transformers
Input In [8], in <cell line: 21>()
19 # Fit all (1) models defined in our model-search object
20 print(X_train.shape)
---> 21 best = cv_model_search.fit(X_train,y_train)
File ~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:891, in BaseSear
I need to use a custom transformer in a pipeline that relies on column names. However, the preceding pipeline transformations convert the data into a numpy array. I know I can retrieve the column names from the column transformer object after the pipeline is fitted, but I need access to the column names during the fit step. The custom transformer in the example below is a simple minimal example for illustration only, not the real transformation.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from
I am applying the following code to impute the categorical data in the dataset and then encode it:
# Encoding categorical data
# Define a Pipeline with an imputing step using SimpleImputer prior to the OneHot encoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
I am reading the scikit-learn tutorial on column transformers. The given example (https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html#sklearn.compose.make_column_selector) works, but when I try to select only a few columns it gives an error.
MWE
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.compose import make_column_transformer
I want to work out the correct naming convention for referencing an individual preprocessor, contained in a ColumnTransformer that is part of a pipeline, in the param_grid for grid_search.
Environment and sample data:
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, KBinsDiscretizer, MinMaxScaler
from
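A minimal sketch of the convention (the step names "pre", "num", "imputer", and "clf" are assumptions): nested parameters are addressed by joining each step name with a double underscore, so a parameter of the imputer inside the "num" sub-pipeline inside the "pre" ColumnTransformer becomes pre__num__imputer__strategy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X = pd.DataFrame({"x": [0.1, 0.4, np.nan, 0.8, 0.2, 0.9]})
y = [0, 0, 0, 1, 1, 1]

num = Pipeline([("imputer", SimpleImputer()), ("scale", MinMaxScaler())])
pre = ColumnTransformer([("num", num, ["x"])])
model = Pipeline([("pre", pre), ("clf", LogisticRegression())])

grid = GridSearchCV(
    model,
    param_grid={"pre__num__imputer__strategy": ["mean", "median"]},
    cv=2,
)
grid.fit(X, y)
print(grid.best_params_)
```

When in doubt, `model.get_params().keys()` lists every addressable parameter name, which is a quick way to check the spelling.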
I want to use sklearn.compose.TransformedTargetRegressor, as shown in the documentation. However, the transformer is custom, and I run into an error.
In this minimal example, the target values should be multiplied by 10 and then divided by 10 again at prediction time. (In my real application, the target has to be converted from a non-numeric format to a numeric one.)
import numpy as np
import sklearn
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
class MyTransfor
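A minimal sketch of the scaled-target idea using the built-in func/inverse_func hooks (an assumption standing in for the custom transformer class): y is multiplied by 10 before fitting and predictions are divided by 10 on the way out.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

X = np.arange(10).reshape(-1, 1).astype(float)
y = 2.0 * X.ravel()

reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=lambda t: t * 10.0,          # applied to y before fit
    inverse_func=lambda t: t / 10.0,  # applied to predictions
)
reg.fit(X, y)
print(np.allclose(reg.predict(X), y))  # True
```

With a full custom transformer instead of func/inverse_func, the object must implement fit, transform, and inverse_transform; TransformedTargetRegressor verifies by default that inverse_transform really undoes transform (check_inverse=True).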
I am trying to do classification where one file is entirely the training set and another file is entirely the test set. Is that possible? I have tried:
import pandas
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn import cross_validation
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression