
Use the <code>model.matrix</code> function:

<pre><code>model.matrix( ~ Species - 1, data=iris )
</code></pre>
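The <code>- 1</code> in the formula removes the intercept. Without it, <code>model.matrix</code> applies treatment contrasts and omits the column for the first (reference) level. Here is a quick check on <code>iris</code>. This is a sketch, and the object names are only for illustration:

```r
# Default formula: an intercept plus one column per non-reference level
with_intercept <- model.matrix(~ Species, data = iris)
# "- 1" removes the intercept, so every level gets its own 0/1 column
no_intercept <- model.matrix(~ Species - 1, data = iris)

colnames(with_intercept)
# "(Intercept)", "Speciesversicolor", "Speciesvirginica"
colnames(no_intercept)
# "Speciessetosa", "Speciesversicolor", "Speciesvirginica"

# In the intercept-free version, each row has exactly one 1
all(rowSums(no_intercept) == 1)
```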


If your data frame consists only of factors (or you are working on a subset of variables that are all factors), you can also use the <code>acm.disjonctif</code> function from the <code>ade4</code> package:

<pre><code>R&gt; library(ade4)
R&gt; df &lt;- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c("red", "blue", "green", "red"), stringsAsFactors = TRUE)
R&gt; acm.disjonctif(df)
  eggs.bar eggs.foo ham.blue ham.green ham.red
1        0        1        0         0       1
2        0        1        1         0       0
3        1        0        0         1       0
4        1        0        0         0       1
</code></pre>

This is not exactly the case you describe, but it may be useful too.
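If you would rather not depend on ade4, you can produce the same layout in base R. The helper below (<code>indicator_columns</code>) is a hypothetical name used only for this sketch. It is not an ade4 function. It assumes every column should be expanded, and because it calls <code>factor()</code>, it also drops unused levels:

```r
# Hypothetical helper (not part of ade4): one 0/1 column per level of every
# column in df, named "variable.level" like acm.disjonctif's output.
indicator_columns <- function(df) {
  pieces <- lapply(names(df), function(v) {
    # Character columns become factors; levels sort alphabetically,
    # and factor() drops any unused levels of an existing factor
    f <- factor(df[[v]])
    # outer() compares each row's level code with every level index,
    # giving an n x k logical matrix; "* 1L" turns it into 0/1 integers
    m <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1L
    colnames(m) <- paste(v, levels(f), sep = ".")
    m
  })
  as.data.frame(do.call(cbind, pieces))
}

df_fac <- data.frame(eggs = c("foo", "foo", "bar", "bar"),
                     ham = c("red", "blue", "green", "red"))
indicator_columns(df_fac)
#   eggs.bar eggs.foo ham.blue ham.green ham.red
# 1        0        1        0         0       1
# 2        0        1        1         0       0
# 3        1        0        0         1       0
# 4        1        0        0         0       1
```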


A quick way using the <code>reshape2</code> package:

<pre><code>require(reshape2)

&gt; dcast(df.original, ham ~ eggs, length)

Using ham as value column: use value_var to override.
  ham bar foo
1   1   0   1
2   2   0   1
3   3   1   0
4   4   1   0
</code></pre>

Note that this produces precisely the column names you want.
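When <code>length</code> is the aggregation function, <code>dcast</code> counts the rows for each ham/eggs combination. The result contains only 0s and 1s because each <code>ham</code> value occurs once. A base-R analogue of that count is a contingency table. This is a swapped-in technique, not reshape2 itself:

```r
df.original <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))

# Count rows for every (ham, eggs) combination, like dcast(..., ham ~ eggs, length)
counts <- table(ham = df.original$ham, eggs = df.original$eggs)

# Turn the table back into a data frame with ham as an ordinary column
wide <- data.frame(ham = as.numeric(rownames(counts)),
                   as.data.frame.matrix(counts), row.names = NULL)
wide
#   ham bar foo
# 1   1   0   1
# 2   2   0   1
# 3   3   1   0
# 4   4   1   0
```

If a <code>ham</code> value appeared in two rows with the same <code>eggs</code> level, the corresponding cell would be 2 rather than 1, and the same holds for the <code>dcast</code> version.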


What you want is probably a set of dummy variables, and <code>model.matrix</code> is useful for that:

<pre><code>&gt; with(df.original, data.frame(model.matrix(~eggs+0), ham))
  eggsbar eggsfoo ham
1       0       1   1
2       0       1   2
3       1       0   3
4       1       0   4
</code></pre>
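In a formula, <code>+ 0</code> and <code>- 1</code> both remove the intercept, so <code>~eggs+0</code> gives the same matrix as <code>~eggs-1</code>. The small check below also strips the <code>eggs</code> prefix from the column names. That <code>sub()</code> step is an addition for this sketch and is not part of the answer above:

```r
df.original <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))

# Two equivalent ways of dropping the intercept
plus_zero <- model.matrix(~ eggs + 0, df.original)
minus_one <- model.matrix(~ eggs - 1, df.original)
all(plus_zero == minus_one)

# Bind ham back on and remove the "eggs" prefix from the dummy columns
dummies <- data.frame(plus_zero, ham = df.original$ham)
names(dummies) <- sub("^eggs", "", names(dummies))
dummies
```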


A late entry: <code>class.ind</code> from the <code>nnet</code> package:

<pre><code>library(nnet)
with(df.original, data.frame(class.ind(eggs), ham))
  bar foo ham
1   0   1   1
2   0   1   2
3   1   0   3
4   1   0   4
</code></pre>
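As far as I can tell, <code>class.ind</code> orders its columns by factor level, as <code>model.matrix</code> does. The levels of a character vector default to alphabetical order, which is why <code>bar</code> comes before <code>foo</code>. If you need the exact column order of <code>df.desired</code> from the question, one option is to set the levels explicitly. This base-R sketch uses <code>model.matrix</code> instead of <code>class.ind</code> to avoid the <code>nnet</code> dependency. The variable names are illustrative:

```r
df.original <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))

# Fix the level order so that "foo" comes before "bar", as in df.desired
eggs_f <- factor(df.original$eggs, levels = c("foo", "bar"))

# One 0/1 column per level, in level order (no intercept)
ind <- model.matrix(~ eggs_f - 1)
colnames(ind) <- levels(eggs_f)

# row.names = NULL discards the matrix's "1".."4" row names
result <- data.frame(ind, ham = df.original$ham, row.names = NULL)
result
#   foo bar ham
# 1   1   0   1
# 2   1   0   2
# 3   0   1   3
# 4   0   1   4
```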


I just came across this old thread and thought I'd add a function that uses <code>ade4</code> to take a data frame consisting of factors and/or numeric data and return a data frame with the factors converted to dummy codes.

<pre><code>dummy &lt;- function(df) {
  # Helpers that select the numeric / factor columns. drop = FALSE keeps a
  # single selected column as a data frame instead of collapsing it to a vector.
  NUM &lt;- function(dataframe) dataframe[, sapply(dataframe, is.numeric), drop = FALSE]
  FAC &lt;- function(dataframe) dataframe[, sapply(dataframe, is.factor), drop = FALSE]

  require(ade4)
  # Numeric columns first, then one 0/1 column per level of every factor.
  DF &lt;- data.frame(NUM(df), acm.disjonctif(FAC(df)))
  return(DF)
}
</code></pre>

Let's try it.

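<pre><code># stringsAsFactors = TRUE is needed in R &gt;= 4.0, where strings are no longer
# converted to factors by default (dummy() only expands is.factor() columns).
df &lt;- data.frame(eggs = c("foo", "foo", "bar", "bar"),
                 ham = c("red", "blue", "green", "red"), x = rnorm(4),
                 stringsAsFactors = TRUE)
dummy(df)

df2 &lt;- data.frame(eggs = c("foo", "foo", "bar", "bar"),
                  ham = c("red", "blue", "green", "red"),
                  stringsAsFactors = TRUE)
dummy(df2)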
</code></pre>
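Helpers like <code>NUM</code> and <code>FAC</code> in <code>dummy()</code> need care when exactly one column matches. When a selection leaves a single column, <code>df[, cols]</code> returns a bare vector unless <code>drop = FALSE</code> is given. Here is a small illustration with a made-up data frame:

```r
d <- data.frame(x = c(0.5, 1.5, 2.5, 3.5),
                g = factor(c("a", "b", "a", "b")))

# Only one column is numeric, so the default (drop = TRUE) returns a plain vector
num_default <- d[, sapply(d, is.numeric)]
# drop = FALSE keeps the one-column result as a data frame, with its name
num_keep <- d[, sapply(d, is.numeric), drop = FALSE]

is.data.frame(num_default)  # FALSE
is.data.frame(num_keep)     # TRUE
names(num_keep)             # "x"
```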


Here is a clearer way to do it: use <code>model.matrix</code> to create the dummy (0/1) variables, then bind them back onto the original data frame.

<pre><code>df.original &lt;- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))
df.original
#   eggs ham
# 1  foo   1
# 2  foo   2
# 3  bar   3
# 4  bar   4

# Create the dummy (0/1) variables with model.matrix(); "- 1" drops the
# intercept so that every level of eggs gets its own column.
mm &lt;- model.matrix(~ eggs - 1, df.original)
mm
#   eggsbar eggsfoo
# 1       0       1
# 2       0       1
# 3       1       0
# 4       1       0
# attr(,"assign")
# [1] 1 1
# attr(,"contrasts")
# attr(,"contrasts")$eggs
# [1] "contr.treatment"

# Remove the "eggs" prefix from the column names, as the OP wanted.
colnames(mm) &lt;- sub("^eggs", "", colnames(mm))
mm
#   bar foo
# 1   0   1
# 2   0   1
# 3   1   0
# 4   1   0
# attr(,"assign")
# [1] 1 1
# attr(,"contrasts")
# attr(,"contrasts")$eggs
# [1] "contr.treatment"

# Combine the matrix with the original data frame.
result &lt;- cbind(df.original, mm)
result
#   eggs ham bar foo
# 1  foo   1   0   1
# 2  foo   2   0   1
# 3  bar   3   1   0
# 4  bar   4   1   0

# At this point, you can select the columns you want.
</code></pre>
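With more than one factor in the formula, <code>- 1</code> only gives the first factor a full set of columns. Later factors still drop their reference level. The <code>contrasts.arg</code> argument of <code>model.matrix</code> accepts contrast matrices, and <code>contrasts(f, contrasts = FALSE)</code> returns a full (identity) coding, so every level gets a column. The data frame below is made up for illustration:

```r
df2 <- data.frame(eggs = factor(c("foo", "foo", "bar", "bar")),
                  color = factor(c("red", "blue", "green", "red")),
                  ham = c(1, 2, 3, 4))

# "- 1" alone: eggs gets both levels, but color still drops its first level ("blue")
colnames(model.matrix(~ eggs + color - 1, df2))
# eggsbar, eggsfoo, colorgreen, colorred

# Supplying a full (identity) coding for every factor keeps all levels
full <- model.matrix(~ eggs + color - 1, df2,
                     contrasts.arg = lapply(df2[c("eggs", "color")],
                                            contrasts, contrasts = FALSE))
colnames(full)
# eggsbar, eggsfoo, colorblue, colorgreen, colorred
```

Each row of <code>full</code> then has exactly one 1 per factor, which makes it suitable for methods that need an all-numeric input.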


I needed a more flexible function to 'explode' factors, so I wrote one based on the <code>acm.disjonctif</code> function from the <code>ade4</code> package.
It lets you choose the exploded values (which are 0 and 1 in <code>acm.disjonctif</code>), and it only explodes factors that have 'few' levels. When at least one factor is exploded, numeric columns are kept and all other columns are dropped.

<pre><code># Function to explode factors that are considered to be categorical,
# i.e., they do not have too many levels.
# - data: The data.frame in which categorical variables will be exploded.
# - values: The values to use when an observation is not / is at a given level
#   (acm.disjonctif uses 0 and 1).
# - max_factor_level_fraction: Maximum number of levels as a fraction of column length.
#   Set to 1 to explode all factors.
# Inspired by the acm.disjonctif function in the ade4 package.
explode_factors &lt;- function(data, values = c(-0.8, 0.8), max_factor_level_fraction = 0.2) {
  exploders &lt;- colnames(data)[sapply(data, function(col) {
    is.factor(col) &amp;&amp; nlevels(col) &lt;= max_factor_level_fraction * length(col)
  })]
  if (length(exploders) &gt; 0) {
    exploded &lt;- lapply(exploders, function(exp) {
      col &lt;- data[[exp]]
      n &lt;- length(col)
      # Start with the "not this level" value everywhere...
      dummies &lt;- matrix(values[1], n, nlevels(col))
      # ...then mark cell (i, level of row i) using column-major linear indexing.
      dummies[(1:n) + n * (as.integer(col) - 1L)] &lt;- values[2]
      colnames(dummies) &lt;- paste(exp, levels(col), sep = '_')
      dummies
    })
    # Only keep numeric data.
    data &lt;- data[sapply(data, is.numeric)]
    # Add exploded values (one matrix per exploded factor, bound side by side).
    data &lt;- cbind(data, do.call(cbind, exploded))
  }
  return(data)
}
</code></pre>
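The core of <code>explode_factors</code> (and of <code>acm.disjonctif</code>) is a linear-indexing trick. R stores a matrix column by column, so cell (i, j) of an n-row matrix has linear index i + n(j - 1). Using each row's level code as j marks exactly one cell per row. This stand-alone sketch uses <code>as.integer(col)</code>, which gives the same codes as <code>unclass(col)</code> without carrying the levels attribute along:

```r
col <- factor(c("foo", "foo", "bar", "bar"))  # levels: "bar" (code 1), "foo" (code 2)
values <- c(-0.8, 0.8)                         # same defaults as explode_factors

n <- length(col)
# Fill every cell with the "not this level" value first
dummies <- matrix(values[1], n, nlevels(col))

# Linear index of cell (i, code of row i) in a column-major n x k matrix
idx <- (1:n) + n * (as.integer(col) - 1L)
idx
# [1] 5 6 3 4

# Overwrite exactly one cell per row with the "this level" value
dummies[idx] <- values[2]
colnames(dummies) <- levels(col)
dummies
```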

I have an R data frame containing a factor that I want to "expand" so that for each factor level, there is an associated column in a new data frame, which contains a 1/0 indicator. E.g., suppose I have: 

<pre><code>df.original &lt;-data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1,2,3,4))
</code></pre>

I want: 

<pre><code>df.desired &lt;- data.frame(foo = c(1,1,0,0), bar=c(0,0,1,1), ham=c(1,2,3,4))
</code></pre>

Because certain analyses require a completely numeric data frame (e.g., principal component analysis), I thought this feature might be built in. Writing a function to do this shouldn't be too hard, but I can foresee some challenges relating to column names, and if something already exists, I'd rather use that.

Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level
