Question: Scikit-Learn with Dask-Distributed using nested?

Stack Overflow user

Asked 2017-02-13 11:00:01

1 answer · 492 views · 0 followers · 2 votes

For example, suppose I have code like this:

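import distributed.joblib  # registered the 'distributed' joblib backend in 2017-era releases
from sklearn.externals.joblib import parallel_backend  # joblib as vendored by scikit-learn at the time
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

vectorizer = CountVectorizer(input=u'filename', decode_error=u'replace')
classifier = OneVsRestClassifier(LinearSVC())
pipeline = Pipeline([
    ('vect', vectorizer),
    ('clf', classifier)])

with parallel_backend('distributed', scheduler_host=host_port):
    scores = cross_val_score(pipeline, X, y, cv=10)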

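When I run this code, I can see in the dask web view (via Bokeh) that 10 tasks are created (one per fold). However, if I run the following (I know X and y should be split into training and test sets, but this is just for testing purposes):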

with parallel_backend('distributed', scheduler_host=host_port):
    pipeline.fit(X, y)

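I can see one task created per class of y (20 in my case). Is there a way to have cross_val_score run in parallel and the underlying OneVsRestClassifier run in parallel as well? Or is the original code,

with parallel_backend('distributed', scheduler_host=host_port):
    scores = cross_val_score(pipeline, X, y, cv=10)

already running the OneVsRestClassifier in parallel along with cross_val_score, and I am just not seeing it? Do I have to implement this manually with dask-distributed?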

parallel-processing

scikit-learn

data-science

dask

joblib

1 Answer

Stack Overflow user

Answered 2017-09-08 16:15:57

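At present, joblib's parallel backends are quite limited by design and cannot handle nested parallel calls. This issue is being tracked here: https://github.com/joblib/joblib/pull/538

We would also need to extend joblib's distributed backend to use http://distributed.readthedocs.io/en/latest/api.html#distributed.get_client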
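To picture what "nested" means here, the 10 folds and 20 classes from the question can be thought of as 10 x 20 independent fits. The sketch below is only an illustration of flattening those two loops into one pool of tasks. It uses the standard library rather than joblib or dask, `fit_one` is a hypothetical stand-in for fitting one binary classifier on one fold, and it ignores the per-fold vectorizer step that a real pipeline would share across classes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_FOLDS = 10    # cv=10 in the question
N_CLASSES = 20  # number of y classes reported in the question


def fit_one(fold, cls):
    """Hypothetical stand-in for fitting one binary classifier on one fold."""
    return (fold, cls)


# Flatten the two nested loops into one list of independent tasks,
# so a single pool sees every (fold, class) pair at once.
tasks = list(product(range(N_FOLDS), range(N_CLASSES)))

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves the order of the input tasks.
    results = list(pool.map(lambda t: fit_one(*t), tasks))

print(len(results))  # 200 tasks: 10 folds x 20 classes
```

With a backend that cannot nest, only one of the two levels is spread across workers at a time, which is consistent with the 10 tasks and 20 tasks the question reports seeing separately.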

Votes: 0

Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/42196002
