You could translate each to a percentage and then apply each to a known quantity. Then use the sum of the new values.

((1 - (in_degree / 15)) * 2000) + ((1 - (betweenness_centrality / 35000)) * 2000) = ?
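One possible reading of the formula above, sketched in Python. It assumes each metric is first expressed as a fraction of its maximum (15 and 35000, the ranges given in the question) and that the complement of that fraction is applied to the reference quantity 2000. The function name and parameter names are illustrative, not part of the original answer.

```python
def combined_score(in_degree, betweenness_centrality,
                   max_in_degree=15, max_betweenness=35000,
                   known_quantity=2000):
    """Express each metric as a fraction of its maximum, apply the
    complement of that fraction to a common quantity, and sum."""
    in_degree_part = (1 - in_degree / max_in_degree) * known_quantity
    betweenness_part = (1 - betweenness_centrality / max_betweenness) * known_quantity
    return in_degree_part + betweenness_part


# Hypothetical node: in_degree 3, betweenness_centrality 7000.
print(combined_score(3, 7000))
```

Under this reading, the `1 - ...` complement means that larger metric values contribute less to the sum, so the resulting order may need to be reversed when ranking.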

You seem to have a strong sense of the underlying distributions. A natural rescaling is to replace each variate with its probability. Or, if your model is incomplete, choose a transformation that approximately achieves that. Failing that, here's a related approach: if you have a lot of univariate data from which to build a histogram (of each variate), you could convert each to a 10-point scale based on whether it falls in the 0-10% percentile range, the 10-20% range, ..., or the 90-100% range. These transformed variates have, by construction, a uniform distribution on 1, 2, ..., 10, and you can combine them however you wish.
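A minimal sketch of the 10-point percentile scale described above, using only the standard library. The function name `decile_scores` and the sample data are made up for illustration; the percentile is taken as the empirical fraction of observations less than or equal to each value, which is one of several reasonable conventions.

```python
from bisect import bisect_right


def decile_scores(values):
    """Map each value to a score from 1 to 10 according to the
    10%-wide percentile band it falls in within `values`."""
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        # Number of observations <= v (empirical rank, between 1 and n).
        rank = bisect_right(ordered, v)
        # Integer ceiling of 10 * rank / n, so (0%, 10%] -> 1, ..., (90%, 100%] -> 10.
        scores.append((rank * 10 + n - 1) // n)
    return scores


# Made-up heavy-tailed samples, for illustration only.
in_degree = [0, 1, 1, 2, 2, 3, 4, 6, 9, 15]
betweenness = [0, 5, 12, 40, 90, 300, 900, 2500, 9000, 35000]

# Both variables now live on the same 1-10 scale and can be added.
index = [a + b for a, b in zip(decile_scores(in_degree), decile_scores(betweenness))]
print(index)
```

With many tied values, as is likely for a small integer metric such as `in_degree`, the scores are only approximately uniform, because tied observations all receive the same score.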

Normalizing to [0,1] would be my short-answer recommendation for combining the two values, as it will maintain the distribution shape, as you mentioned, and should solve the problem of combining the values.

If the distributions of the two variables are different, which sounds likely, this won't really give you what I think you're after, which is a combined measure of where each variable sits within its given distribution. You would have to come up with a metric that determines where in the given distribution the value lies. This could be done in many ways, one of which would be to determine how many standard deviations the given value is from the mean. You could then combine these two values in some way to get your index (addition may no longer be sufficient).

You'd have to work out what makes the most sense for the data sets you're looking at. Standard deviations may well be meaningless for your application, but you need to look at statistical measures that relate to the distribution and combine those, rather than combining absolute values, normalized or not.
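As a rough illustration of the "how many standard deviations from the mean" idea, the sketch below computes that position for each variable and combines the two by simple addition. The helper name `z_scores`, the use of the population standard deviation, and the sample data are all assumptions made for the example; as noted above, addition is only one possible way to combine the values.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return how many standard deviations each value lies from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; the position is undefined")
    return [(v - mu) / sigma for v in values]


# Made-up sample data, for illustration only.
in_degree = [0, 1, 1, 2, 3, 15]
betweenness = [0, 10, 50, 200, 1500, 35000]

# Combine the per-variable positions; here by plain addition.
index = [a + b for a, b in zip(z_scores(in_degree), z_scores(betweenness))]
print(index)
```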

Very interesting question. Could something like this work?

Let's assume that we want to scale both variables to a range of [-1,1].
Take the example of betweenness_centrality, which has a range of 0-35000.

<ol>
<li>Choose a large number on the order of the range of the variable. As an example, let's choose 25,000.</li>
<li>Create 25,000 bins in the original range [0, 35000] and 25,000 bins in the new range [-1,1].</li>
<li>For each number x-i, find the bin number it falls into in the original range. Let this be B-i.</li>
<li>Find the range of bin B-i within [-1,1].</li>
<li>Use either the max or the min of that range as the scaled version of x-i.</li>
</ol>

This preserves the power-law distribution while also scaling it down to [-1,1], and it does not have the problem experienced by (x-mean)/sd.
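The binning steps above could be sketched roughly as follows, assuming equal-width bins in both ranges. The function name `bin_rescale` and its parameters are hypothetical; the defaults use the 0-35000 range and the 25,000 bins from the example.

```python
def bin_rescale(x, lo=0.0, hi=35000.0, n_bins=25000,
                new_lo=-1.0, new_hi=1.0, use_upper_edge=False):
    """Rescale x from [lo, hi] to [new_lo, new_hi] via equal-width bins."""
    if not lo <= x <= hi:
        raise ValueError("x must lie within the original range")
    # Step 3: index B-i of the bin that x falls into in the original range.
    # The top endpoint is clamped into the last bin.
    bin_index = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
    # Step 4: the edges of bin B-i in the new range.
    width = (new_hi - new_lo) / n_bins
    lower_edge = new_lo + bin_index * width
    # Step 5: use either the min or the max edge as the scaled value.
    return lower_edge + width if use_upper_edge else lower_edge


print(bin_rescale(0.0))
print(bin_rescale(17500.0))
print(bin_rescale(35000.0, use_upper_edge=True))
```

With equal-width bins this amounts to a linear rescaling rounded to the nearest bin edge, so the shape of the distribution is kept up to the bin resolution.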

I'd like to combine a few metrics of nodes in a social network graph into a single value for rank ordering the nodes: 

<code>in_degree + betweenness_centrality = informal_power_index</code>

The problem is that <code>in_degree</code> and <code>betweenness_centrality</code> are measured on different scales, say 0-15 vs 0-35000, and follow a power law distribution (or at least definitely not the normal distribution).

Is there a good way to rescale the variables so that one won't dominate the other in determining the <code>informal_power_index</code>? 

Three obvious approaches are:

<ul>
<li>Standardizing the variables (subtract <code>mean</code> and divide by <code>stddev</code>). This seems like it would squash the distribution too much, hiding the massive difference between a value in the long tail and one near the peak.</li>
<li>Re-scaling variables to the range [0,1] by subtracting <code>min(variable)</code> and dividing by <code>max(variable) - min(variable)</code>. This seems closer to fixing the problem since it won't change the shape of the distribution, but maybe it won't really address the issue? In particular the means will be different.</li>
<li>Equalize the means by dividing each value by <code>mean(variable)</code>. This won't address the difference in scales, but perhaps the mean values are more important for the comparison?</li>
</ul>
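To make the trade-off between the second and third options concrete, here is a small sketch that applies both to made-up heavy-tailed samples. The function names and the data are illustrative assumptions. Min-max scaling puts both variables on [0,1] but leaves their means different, while dividing by the mean gives both a mean of 1 but leaves their ranges different.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale to [0, 1]: subtract the minimum, divide by (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]


def mean_scale(values):
    """Divide every value by the mean, so the result has mean 1."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean is zero; cannot divide by it")
    return [v / mu for v in values]


# Made-up sample data, for illustration only.
in_degree = [0, 1, 1, 2, 3, 15]
betweenness = [0, 10, 50, 200, 1500, 35000]

for name, data in (("in_degree", in_degree), ("betweenness", betweenness)):
    scaled = min_max_scale(data)
    by_mean = mean_scale(data)
    print(name, "min-max mean:", mean(scaled), "mean-scaled max:", max(by_mean))
```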

Any other ideas?

Correct way to standardize/scale/normalize multiple variables following power law distribution for use in linear combination
