There is no built-in function, but what is wrong with the following?

<pre><code>probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[-n:]
</code></pre>

As suggested in one of the comments, <code>[-n:]</code> should be changed to <code>[:,-n:]</code>, so that the last n columns (classes) are taken for every row (sample) instead of the last n rows:

<pre><code>probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[:,-n:]
</code></pre>
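
The indices in <code>best_n</code> still have to be mapped back to class labels. A minimal sketch of that step follows, assuming the columns of <code>predict_proba</code> follow the order of <code>clf.classes_</code>; the <code>classes</code> and <code>probs</code> arrays here are hypothetical stand-ins for <code>clf.classes_</code> and <code>clf.predict_proba(test)</code>, not output from a fitted model.

```python
import numpy as np

# Hypothetical stand-ins for clf.classes_ and clf.predict_proba(test)
classes = np.array(["animals", "chemicals", "elements", "fruits"])
probs = np.array([
    [0.10, 0.05, 0.60, 0.25],
    [0.40, 0.30, 0.10, 0.20],
])
n = 3

# Column indices of the n most probable classes per row, in ascending order
best_n = np.argsort(probs, axis=1)[:, -n:]

# Reverse each row so the most probable class comes first
best_n_desc = best_n[:, ::-1]

# Index the label array with the index matrix to get labels per sample
top_labels = classes[best_n_desc]
print(top_labels)
```

Each row of <code>top_labels</code> lists the labels for one sample, most probable first.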


Hopefully, <a href="https://stackexchange.com/users/344360/andreas-mueller">Andreas</a> will help with this. <code>predict_proba</code> is not available when <code>loss='hinge'</code>. To get the top n classes when <code>loss='hinge'</code>, wrap the classifier in a calibrated one:

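<pre><code>from sklearn.calibration import CalibratedClassifierCV

# clfSDG: the SGDClassifier created with loss='hinge' (assumed from context)
calibrated_clf = CalibratedClassifierCV(clfSDG, cv=3, method='sigmoid')
model = calibrated_clf.fit(train.data, train.label)

probs = model.predict_proba(test_data)
# (class, probability) pairs for the first test sample, sorted ascending;
# the last n entries are the n most probable classes
sorted(zip(calibrated_clf.classes_, probs[0]), key=lambda x: x[1])[-n:]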
</code></pre>

I am not sure whether <code>clfSDG.predict</code> and <code>calibrated_clf.predict</code> will always predict the same class.


I know this has been answered, but I can add a bit more:

<pre><code>import numpy as np

# Both preds and truths have the same shape, m by k
# (m is the number of predictions, k is the number of classes).
def top_n_accuracy(preds, truths, n):
    best_n = np.argsort(preds, axis=1)[:, -n:]
    ts = np.argmax(truths, axis=1)
    successes = 0
    for i in range(ts.shape[0]):
        if ts[i] in best_n[i, :]:
            successes += 1
    return float(successes) / ts.shape[0]
</code></pre>

It's quick and dirty, but I find it useful. You can add your own error checking, etc.
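
The same top-n accuracy can probably be computed without the Python loop. The sketch below is one possible vectorized variant, assuming <code>truths</code> is one-hot encoded; the name <code>top_n_accuracy_vectorized</code> and the sample arrays are made up for illustration.

```python
import numpy as np

def top_n_accuracy_vectorized(preds, truths, n):
    """preds and truths are both (m, k): m samples, k classes; truths is one-hot."""
    # Indices of the n highest-scoring classes for every sample
    best_n = np.argsort(preds, axis=1)[:, -n:]
    # True class index for every sample
    ts = np.argmax(truths, axis=1)
    # A sample is a hit if its true class appears anywhere in its top-n row
    hits = (best_n == ts[:, None]).any(axis=1)
    return float(hits.mean())

# Made-up scores and one-hot labels for three samples and three classes
preds = np.array([[0.1, 0.2, 0.7],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.5, 0.3]])
truths = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

print(top_n_accuracy_vectorized(preds, truths, 2))
```

Recent scikit-learn versions also provide <code>sklearn.metrics.top_k_accuracy_score</code>, which takes class labels rather than one-hot rows and may be preferable to a hand-written helper.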


<code>argsort</code> returns indices in ascending order. To avoid extra loops or confusing reversed slices, you can use a simple trick: negate the probabilities before sorting.
<pre><code>probs = clf.predict_proba(test)
best_n = np.argsort(-probs, axis=1)[:, :n]
</code></pre>
Negating the probabilities reverses the sort order, so the largest probability comes first and the first n columns are the top-n classes in descending order of probability.
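
As a small illustration of the negation trick, the sketch below also pulls out the matching probabilities with <code>np.take_along_axis</code>. The <code>probs</code> array is a hypothetical stand-in for <code>clf.predict_proba(test)</code>.

```python
import numpy as np

# Hypothetical stand-in for clf.predict_proba(test): 2 samples, 4 classes
probs = np.array([
    [0.10, 0.05, 0.60, 0.25],
    [0.40, 0.30, 0.10, 0.20],
])
n = 2

# Sorting the negated array ascending is the same as sorting probs descending
best_n = np.argsort(-probs, axis=1)[:, :n]

# Gather the probabilities that belong to those indices, row by row
best_p = np.take_along_axis(probs, best_n, axis=1)

print(best_n)
print(best_p)
```

Here <code>best_n[i, 0]</code> is the most probable class index for sample <code>i</code>, and <code>best_p</code> holds the corresponding probabilities in the same order.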


As @FredFoo described in <a href="https://stackoverflow.com/questions/6910641/how-do-i-get-indices-of-n-maximum-values-in-a-numpy-array/38884051">How do I get indices of N maximum values in a NumPy array?</a>, a faster method is to use <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.argpartition.html" rel="nofollow noreferrer"><code>argpartition</code></a>.
<blockquote>
Newer NumPy versions (1.8 and up) have a function called <code>argpartition</code>
for this. To get the indices of the four largest elements, do
</blockquote>
<pre><code>&gt;&gt;&gt; a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
&gt;&gt;&gt; a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
&gt;&gt;&gt; ind = np.argpartition(a, -4)[-4:]
&gt;&gt;&gt; ind
array([1, 5, 8, 0])
&gt;&gt;&gt; a[ind]
array([4, 9, 6, 9])
</code></pre>
<blockquote>
Unlike <code>argsort</code>, this function runs in linear time in the worst case, but the returned indices are not
sorted, as can be seen from the result of evaluating <code>a[ind]</code>. If you
need that too, sort them afterwards:
</blockquote>
<pre><code>&gt;&gt;&gt; ind[np.argsort(a[ind])]
array([1, 8, 5, 0])
</code></pre>
<blockquote>
To get the <code>top-k</code> elements in sorted order in this way takes <code>O(n + k log k)</code> time.
</blockquote>
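
The quoted example works on a 1-D array, while <code>predict_proba</code> returns one row per sample. A sketch of how the same idea might be applied row-wise is shown below, assuming a NumPy version that has <code>np.take_along_axis</code>; the <code>probs</code> array is a hypothetical stand-in for <code>clf.predict_proba(test)</code>.

```python
import numpy as np

# Hypothetical stand-in for clf.predict_proba(test): 2 samples, 4 classes
probs = np.array([
    [0.10, 0.05, 0.60, 0.25],
    [0.40, 0.30, 0.10, 0.20],
])
n = 2

# Step 1: unordered indices of the n largest entries in every row
part = np.argpartition(probs, -n, axis=1)[:, -n:]

# Step 2: sort only those n candidates per row, most probable first
part_p = np.take_along_axis(probs, part, axis=1)
order = np.argsort(-part_p, axis=1)
top_n = np.take_along_axis(part, order, axis=1)

print(top_n)
```

Only the n candidates per row are fully sorted, which is where the saving over a full <code>argsort</code> comes from when the number of classes is large.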


I wrote a function that outputs a dataframe with the top n predictions and their probabilities, and ties it back to class names. Hope this is helpful!
<pre><code>import numpy as np
import pandas as pd

def return_top_n_pred_prob_df(n, model, X_test, column_name):
    predictions = model.predict_proba(X_test)
    # Negate so that argsort ranks classes from most to least probable
    preds_idx = np.argsort(-predictions)
    classes = pd.DataFrame(model.classes_, columns=['class_name'])
    classes.reset_index(inplace=True)
    top_n_preds = pd.DataFrame()
    # predictions.shape[0] is used instead of len(X_test) so sparse inputs also work
    n_docs = predictions.shape[0]
    for i in range(n):
        top_n_preds[column_name + '_prediction_{}_num'.format(i)] = [preds_idx[doc][i] for doc in range(n_docs)]
        top_n_preds[column_name + '_prediction_{}_probability'.format(i)] = [predictions[doc][preds_idx[doc][i]] for doc in range(n_docs)]
        # Join on the class index to recover the class name
        top_n_preds = top_n_preds.merge(classes, how='left', left_on=column_name + '_prediction_{}_num'.format(i), right_on='index')
        top_n_preds = top_n_preds.rename(columns={'class_name': column_name + '_prediction_{}'.format(i)})
        try:
            top_n_preds.drop(columns=['index', column_name + '_prediction_{}_num'.format(i)], inplace=True)
        except KeyError:
            pass
    return top_n_preds
</code></pre>

<pre class="lang-py prettyprint-override"><code>from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import linear_model

arr = ['dogs cats lions', 'apple pineapple orange', 'water fire earth air', 'sodium potassium calcium']
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(arr)
# Note: newer scikit-learn versions replace get_feature_names() with get_feature_names_out()
feature_names = vectorizer.get_feature_names()
Y = ['animals', 'fruits', 'elements', 'chemicals']
T = ["eating apple roasted in fire and enjoying fresh air"]
test = vectorizer.transform(T)
# Note: newer scikit-learn versions spell this loss 'log_loss'
clf = linear_model.SGDClassifier(loss='log')
clf.fit(X, Y)
x = clf.predict(test)
#prints: elements
</code></pre>

In the above code, <code>clf.predict()</code> returns only the single best prediction for a sample.
I am interested in the top 3 predictions for a particular sample. I know that <code>predict_proba</code>/<code>predict_log_proba</code> returns the probabilities for every label in list Y, but the result has to be sorted and then matched back to the labels in Y before the top 3 can be read off.
Is there a direct and efficient way to do this?

How to get Top 3 or Top N predictions using sklearn's SGDClassifier
