I'm having a problem with the apriori function. I have a CSV whose data looks like this:
Desc,Cantidad,Valor,Fecha,Lugar,UUID
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-170,2014-09-05T15:10:24,83000,7F0C7F0B-BCFC-4FCA-8740-B36AE9932869
Descuento de TYK Dia,1,-156,2014-06-19T16:52:27,86280,1E08E51E-213A-4EE0-8FE9-492E677FF0C9
Descuento de TYK Dia,1,-139,2014-04-25T10:52:44,86280,AB802E63-2D0D-4B47-AB70-DDE007929F9F
DESCUENTO,1,-63,2014-07-04T13:53:10,83000,5B1F12BB-71DE-4734-A774-8D377757A880
REDONDEO,1,-1,2014-03-29T10:50:59,0,5B241EFA-6654-46EA-B47A-3CB76C5EA923
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
LAVADO,1,0,2014-05-27T18:18:11,44500,e5d540d6-0f98-4993-ec09-56887cd4a27d
TUA,1,0,2014-09-29T10:20:31,6500,1d8ada06-a8a1-4bd8-9356-851b5da28108
Transportación Aerea,1,0,2014-10-03T10:41:09,6500,5fc3925a-d08a-4cdc-be7e-ca02bd488d5b
OBSEQUIO LAVADO DE CARROCERIA,1,0,2014-04-07T13:45:55,91800,8148ab07-5804-4b2b-b37c-5323b394907a
Arroz Al Azafran Combos A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Frijoles Charros A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Pepsi Ch A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
FECHA DE CONSUMO 18/07/2014,1,0,2014-07-19T18:01:45,6060,0f0465aa-a75b-4f95-8e3b-43c13452cafb
CAMBIO DE ACEITE DE MOTOR,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
CAMBIO DE FILTRO DE ACEITE,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
The whole CSV file is available at https://github.com/antonio1695/BaseX/blob/master/facturas1.csv. This is what I did:
> df1 <- read.csv("facturas1.csv")
> rules <- apriori(df1,parameter=list(support=0.01,confidence=0.5))
Error in asMethod(object) : 
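column(s) 3 not logical or a factor. Discretize the columns first.
The thing is, the columns are already discrete. And if I change the data so that column 3 is swapped with column 2, it still says that column 3 is not logical or a factor, when it should now say that about column 2. Thanks!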
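As a quick diagnostic (a sketch, not part of either answer), the snippet below lists the columns that are neither factor nor logical, which are the ones the error message is about. The helper name `non_factor_cols` is hypothetical, and the data are three rows copied from the excerpt above. Note that since R 4.0.0, `read.csv()` defaults to `stringsAsFactors = FALSE`, so text columns come in as character and would be reported as well; `stringsAsFactors = TRUE` is passed here to mimic the factors shown by `str()` in the second answer.

```r
# Hypothetical helper: indices of the columns that are neither factor nor
# logical, i.e. the ones as(df, "transactions") refuses to use as-is
non_factor_cols <- function(df) {
  ok <- vapply(df, function(col) is.factor(col) || is.logical(col), logical(1))
  which(!ok)
}

# Three rows copied from the question's CSV excerpt
csv_text <- "Desc,Cantidad,Valor,Fecha,Lugar,UUID
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
REDONDEO,1,-1,2014-03-29T10:50:59,0,5B241EFA-6654-46EA-B47A-3CB76C5EA923
TUA,1,0,2014-09-29T10:20:31,6500,1d8ada06-a8a1-4bd8-9356-851b5da28108"

# stringsAsFactors = TRUE turns the text columns into factors, as R < 4.0 did by default
df_small <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(non_factor_cols(df_small))  # Cantidad 2, Valor 3, Lugar 5
```

On this short excerpt Cantidad and Lugar also parse as integers, so columns 2, 3 and 5 are reported. In the full file those two columns were read as factors (see the `str()` output in the second answer), leaving Valor as the only offending column.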
Posted on 2016-06-28 18:24:32
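After some research, I found that apriori cannot use a numeric column directly: its values first have to be binned into intervals with discretize(), and the "categories" argument chooses how many intervals to create. I'll post the code here:
I decided to use 20 intervals; the right number depends on how often values repeat within each interval.
df1$Valor <- discretize(df1$Valor, method = "frequency", categories = 20)
Hope it helps someone.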
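In more recent arules versions the number of intervals is passed as `breaks`, and `categories` is deprecated, so check `?discretize` for the version you have installed. The base-R sketch below only illustrates what equal-frequency binning means: cut points are placed at quantiles so that each interval holds roughly the same number of values. It is not arules' own implementation, and `equal_freq_bins` is a hypothetical name.

```r
# Hypothetical helper illustrating equal-frequency binning (not arules' code):
# cut points sit at the 0, 1/k, ..., 1 quantiles of x, so each of the k
# intervals receives roughly the same number of observations.
equal_freq_bins <- function(x, k) {
  cuts <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)
  # Heavily repeated values (e.g. many zeros) can yield identical cut points;
  # dropping duplicates may leave fewer than k intervals.
  cuts <- unique(cuts)
  # Left-closed intervals [a, b); include.lowest closes the last one at the max
  cut(x, breaks = cuts, include.lowest = TRUE, right = FALSE)
}

x <- 1:20
bins <- equal_freq_bins(x, 4)
print(table(bins))  # four intervals with 5 values each
```

With a column like Valor, where values such as 0 and -1 repeat many times, several quantiles can coincide, which is one reason the number of intervals needs some experimentation.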
Posted on 2016-06-27 06:00:12
 library(arules)
 df1 <- read.csv("https://raw.githubusercontent.com/antonio1695/BaseX/master/facturas1.csv")
 trans <- as(df1, "transactions")
  Error in asMethod(object) : 
  column(s) 3 not logical or a factor. Discretize the columns first.
Let's look at the data frame:
str(df1)
 'data.frame':  10510 obs. of  6 variables:
 $ Desc    : Factor w/ 3927 levels "0","00000215R0 - LIQUIDO DE FRENOS",..: 1490 1490 1490 1491 1491 1490 3209 1490 1490 2238 ...
 $ Cantidad: Factor w/ 85 levels "","1","-1","10",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Valor   : int  -3405 -3405 -170 -156 -139 -63 -1 -1 -1 0 ...
 $ Fecha   : Factor w/ 4054 levels "1294","2014-01-06T11:10:21",..: 4041 4041 3443 1794 596 2125 241 4041 4041 1215 ...
 $ Lugar   : Factor w/ 982 levels "","0","1000",..: 487 487 802 848 848 802 2 487 487 373 ...
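 $ UUID    : Factor w/ 4056 levels "0019A60D-78F8-E341-8D3E-9786201FE017",..: 1988 1988 1979 456 2711 1423 1424 1988 1988 3658 ...
Valor is numeric (int), while the other five columns are all factors. Integer values are discrete, but they are still not a factor, so Valor is the only column as() rejects, which is why the error always points at column 3. It has to be handled first, for example with discretize():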
df1$Valor <- discretize(df1$Valor)
head(df1$Valor)
 [1] [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400)
 [6] [-3405, 2400)
 Levels: [-3405, 2400) [ 2400, 8204) [ 8204,14009]
Now you can create the transactions and apply apriori:
trans <- as(df1, "transactions")
rules <- apriori(trans,parameter=list(support=0.01,confidence=0.5))
rules
 set of 84 rules
https://stackoverflow.com/questions/38001635
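The three levels printed by `discretize(df1$Valor)` in the second answer are consistent with equal-width binning: the range of Valor (-3405 to 14009, read off those levels) is split into three intervals of the same width, which matches the `"interval"` method with 3 categories that was the default in the arules version used there. Recent arules releases document `method = "frequency"` as the default instead, so it is safer to pass `method` explicitly if you need the same bins. The arithmetic can be checked in base R; the variable names below are illustrative only.

```r
# Bounds of Valor, read off the levels printed by discretize(df1$Valor)
valor_min <- -3405
valor_max <- 14009

# Equal-width binning into 3 intervals needs 4 evenly spaced break points
breaks <- seq(valor_min, valor_max, length.out = 4)

# Width of each interval: (14009 - (-3405)) / 3, about 5804.67
width <- diff(breaks)[1]

print(round(breaks))  # -3405 2400 8204 14009, matching the printed levels
```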