Q: Min normalization by condition

Stack Overflow user

Asked on 2018-11-02 03:19:29

Answers: 1 · Views: 42 · Followers: 0 · Votes: 0

I have the dataset below, which contains weekly sales and other data broken down by group:

df

  Market  Week Sales  diff_data1    another2
1      1     1     5          30         -40
2      1     2     4           7          -8
3      1     3     7         100           9
4      1     4    11          92          50
5      2     1     8           0           8
6      2     2     5           0          14
7      2     3     8           9          98
8      2     4     1           3           3

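My goal is to standardize the data with two different methods: mean normalization and min normalization. Sales data should get mean normalization, and non-sales data should get min normalization. I think my mean normalization is correct, but the min normalization is a bit trickier because I have conditions on which data is selected. Below is what I have so far.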

## dplyr provides %>%, group_by(), mutate_at(), vars() and one_of()
library(dplyr)

## Column names used for grouping and for the date
group = "Market"
date = "Week"

##Function to standardize sales by dividing by the standard deviation of sales
normalized_mean <- function(x){
  return(x/(sd(x)))
}

##Function to standardize variables by subtracting min
##Used for non-sales data
normalized_min<-function(x){
  out<- ifelse(x>0, ((x-min(x)) / sd(x)),
               ifelse(x<0, ((x+max(x)) / sd(x)), 
                      ifelse(x==0, 0,0)))
  return(out)
}

if (!("Sales" %in% colnames(df))){
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_min)
} else {
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_mean)

}

The current output is:

df_index

  Market  Week Sales  diff_data1   another2
1      1     1 1.62        0.655     -1.07  
2      1     2 1.29        0.153     -0.213 
3      1     3 2.26        2.18       0.240 
4      1     4 3.55        2.01       1.33  
5      2     1 2.41        0          0.178 
6      2     2 1.51        0          0.311 
7      2     3 2.41        2.12       2.17  
8      2     4 0.302       0.707      0.0666

The output should look like this:

  Market  Week Sales  diff_data1  another2
1      1     1 1.62        0.501   0.26679
2      1     2 1.29        0       1.12053
3      1     3 2.26        2.02    1.30729
4      1     4 3.55        1.85    2.40114
5      2     1 2.41        0       7.93342
6      2     2 1.51        0      13.9334
7      2     3 2.41        2.121  97.9334
8      2     4 0.302       0.707   2.93342

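My problem is with the formula below.

How can I make the conditions apply to this kind of example? It looks as if the conditions x>0, x<0 and x==0 are not being taken into account.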

normalized_min<-function(x){
  out<- ifelse(x>0, ((x-min(x)) / sd(x)),
               ifelse(x<0, ((x+max(x)) / sd(x)), 
                      ifelse(x==0, 0,0)))
  return(out)
}

Any help would be great, thanks!

indexing

normalization

1 Answer

Stack Overflow user

Answered on 2018-11-02 04:57:28

It works once you remove the exclamation mark before "Sales"; I think you had a typo there:

if ("Sales" %in% colnames(df)){
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_min)
} else {
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_mean)

}

Output:

  Market  Week Sales diff_data1 another2
   <int> <int> <dbl>      <dbl>    <dbl>
1      1     1 0.323      0.502    0.267
2      1     2 0          0        1.12 
3      1     3 0.969      2.03     1.31 
4      1     4 2.26       1.85     2.40 
5      2     1 2.11       0        0.111
6      2     2 1.21       0        0.244
7      2     3 2.11       2.12     2.11 
8      2     4 0          0.707    0

This of course depends on what you really want.

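From your description, it seems the normalized mean should be computed (which is also what you actually get in your output), but from your example it looks as if the normalized min should be computed as soon as "Sales" is among the column names.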
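As a quick check that the conditions inside normalized_min are in fact evaluated, the function can be called directly on single-group vectors taken from the question's data. This is only a sketch in base R; the vector and result names are made up for illustration, and the redundant innermost ifelse(x==0, 0, 0) is written as a plain 0.

```r
# normalized_min from the question, with the innermost ifelse folded into 0
normalized_min <- function(x) {
  ifelse(x > 0, (x - min(x)) / sd(x),
         ifelse(x < 0, (x + max(x)) / sd(x), 0))
}

# Hypothetical names; the values are copied from the question's df
another2_m1   <- c(-40, -8, 9, 50)  # another2, Market 1
diff_data1_m2 <- c(0, 0, 9, 3)      # diff_data1, Market 2

# min(), max() and sd() are taken over the whole vector;
# only the choice of formula depends on the sign of each element
res_another2   <- normalized_min(another2_m1)
res_diff_data1 <- normalized_min(diff_data1_m2)

print(round(res_another2, 3))    # 0.267 1.121 1.307 2.401
print(round(res_diff_data1, 3))  # 0.000 0.000 2.121 0.707
```

These values agree with the normalized_min output shown earlier, which suggests that the function itself behaves as written. The if/else around the pipeline is what picks a single function for every column at once.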

If you remove the "Sales" column from the dataset, your original condition also works fine:

df <- df[,-3]


if (!("Sales" %in% colnames(df))){
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_min)
} else {
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_mean)

}

  Market  Week diff_data1 another2
   <int> <int>      <dbl>    <dbl>
1      1     1      0.502    0.267
2      1     2      0        1.12 
3      1     3      2.03     1.31 
4      1     4      1.85     2.40 
5      2     1      0        0.111
6      2     2      0        0.244
7      2     3      2.12     2.11 
8      2     4      0.707    0
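If the intent is what the question's description says (Sales divided by its standard deviation, every other column passed through normalized_min), one option is to apply the two functions to different columns in the same pipeline instead of choosing one of them with if/else. The following is a minimal sketch under that assumption. It uses across() and all_of(), so it assumes dplyr 1.0.0 or later, which is newer than the mutate_at() style used above.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, needed for across()

# Sample data copied from the question
df <- data.frame(
  Market     = c(1, 1, 1, 1, 2, 2, 2, 2),
  Week       = c(1, 2, 3, 4, 1, 2, 3, 4),
  Sales      = c(5, 4, 7, 11, 8, 5, 8, 1),
  diff_data1 = c(30, 7, 100, 92, 0, 0, 9, 3),
  another2   = c(-40, -8, 9, 50, 8, 14, 98, 3)
)

group <- "Market"
date  <- "Week"

# The two functions from the question (normalized_min slightly condensed)
normalized_mean <- function(x) x / sd(x)
normalized_min <- function(x) {
  ifelse(x > 0, (x - min(x)) / sd(x),
         ifelse(x < 0, (x + max(x)) / sd(x), 0))
}

df_index <- df %>%
  group_by(!!sym(group)) %>%
  mutate(
    # Sales: divide by the within-group standard deviation
    across(all_of("Sales"), normalized_mean),
    # All remaining columns except the date and Sales: normalized_min
    # (across() never selects the grouping column)
    across(-all_of(c(date, "Sales")), normalized_min)
  ) %>%
  ungroup()

print(df_index)
```

With the question's data this should give the Sales values from the question's current output (1.62, 1.29, ...) together with the diff_data1 and another2 values shown in the output above (0.502, 0, ... and 0.267, 1.12, ...). It does not reproduce the Market 2 values of another2 listed in the question's expected output, because those do not follow from normalized_min as written.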

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/53107995
