Q: How do I stop getting duplicate data when manipulating data with BigQuery SAFE_DIVIDE logic?

Stack Overflow user

Asked 2019-07-03 04:19:32

1 answer · 225 views · 0 followers · 0 votes

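My problem is that after adding some logic (SAFE_DIVIDE) to my BigQuery #standardSQL statement, I started receiving duplicate data. The problem appeared only after I added this line: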

SAFE_DIVIDE( u.weekly_capacity/25200, 1) AS TargetDailyHours

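If I can't solve this, I may just have to write all the logic in Data Studio, since the current workflow is Harvest -> Stitch -> BigQuery -> Data Studio.

In this query, I take the most recent version of each time entry (the time_entries table at MAX(updated_at)), left join it back to time_entries, and full join the result to the users table restricted to currently active users. I want to actually manipulate the data so that I can compute FTE actual hours worked / weekly_capacity. But whenever I write logic or use a BigQuery function, duplicates appear in the results.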

SELECT DISTINCT outer_e.hours, outer_e.id, outer_e.updated_at,
                outer_e.spent_date, outer_e.created_at,
                outer_e.client_id, outer_e.user_id AS harvest_userid,
                u.is_admin, u.first_name, u.is_active, u.id AS user_id,
                u.weekly_capacity,
                client.name AS names
                -- , SAFE_DIVIDE(u.weekly_capacity / 25200, 1) AS TargetDailyHours

FROM
  (SELECT e.id, MAX(e.updated_at) AS updated_at
   FROM `harvest-experiment.harvest.time_entries` AS e
   GROUP BY e.id LIMIT 1000
  ) AS inner_e

LEFT JOIN `harvest-experiment.harvest.time_entries` AS outer_e
ON inner_e.id = outer_e.id AND inner_e.updated_at = outer_e.updated_at

FULL JOIN (SELECT DISTINCT id, first_name, weekly_capacity, is_active, is_admin
           FROM `harvest-experiment.harvest.users`
           WHERE is_active = true
) AS u
ON outer_e.user_id = u.id

JOIN (SELECT DISTINCT id, name
      FROM `harvest-experiment.harvest.clients`) AS client
ON outer_e.client_id = client.id

For example, the weekly_capacity column in the results starts showing the same person with different weekly capacity values:

Field             Row 1                     Row 2
hours             0.22                      0.22
id                995005338                 995005338
updated_at        2019-05-07 15:14:13 UTC   2019-05-07 15:14:13 UTC
spent_date        2019-04-29 00:00:00 UTC   2019-04-29 00:00:00 UTC
created_at        2019-04-29 15:30:40 UTC   2019-04-29 15:30:40 UTC
client_id         6864491                   6864491
harvest_userid    2622223                   2622223
is_admin          false                     false
first_name        Nolan                     Nolan
is_active         true                      true
user_id           2622223                   2622223
weekly_capacity   72000                     129600
TargetDailyHours  2.857142857142857         5.142857142857143

In this result, the user Nolan appears twice for the time entry with id 995005338 (0.22 hours), and weekly_capacity changes from 129600 in row 2 to 72000 in row 1.
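The two TargetDailyHours values follow directly from the two weekly_capacity values. The sketch below redoes that arithmetic in Python. `safe_divide` is a hypothetical helper written for this illustration to imitate BigQuery's SAFE_DIVIDE, which returns NULL instead of raising an error when the divisor is zero; it is not a BigQuery API.

```python
from typing import Optional


def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Hypothetical stand-in for BigQuery SAFE_DIVIDE: None (NULL) on a zero divisor."""
    if denominator == 0:
        return None
    return numerator / denominator


# The two weekly_capacity values shown for the same user in the result above.
capacities = [72000, 129600]

# Same expression as in the question: SAFE_DIVIDE(u.weekly_capacity / 25200, 1)
target_daily_hours = [safe_divide(c / 25200, 1) for c in capacities]
print(target_daily_hours)

# A zero divisor yields None rather than an exception.
print(safe_divide(72000, 0))
```

Because the divisor in the question's expression is the constant 1, the guard never triggers and the result is simply weekly_capacity / 25200. If guarding the division itself is the goal, SAFE_DIVIDE(u.weekly_capacity, 25200) may express the intent more directly.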

Tags: sql, google-cloud-platform, google-bigquery

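1 Answer

Stack Overflow user

Answered 2019-09-10 07:39:27

The actual problem lies in the u.weekly_capacity column, which holds two or more different values for the same user. The SAFE_DIVIDE operation merely reflects this problem.

You can trace the duplicated value back to your "u" subquery: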

SELECT DISTINCT id, first_name, weekly_capacity, is_active, is_admin 
    FROM `harvest-experiment.harvest.users`
    WHERE is_active = true
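One way to confirm this is to count, per id, how many distinct weekly_capacity values exist among active users. The following is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for BigQuery. The table contents are made-up sample rows modeled on the result in the question, and is_active is stored as 1/0 here. The SELECT uses only standard SQL constructs, so a similar query should work against `harvest-experiment.harvest.users` with `is_active = true`.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER, first_name TEXT, weekly_capacity INTEGER, is_active INTEGER)"
)

# Hypothetical sample rows: one active user id appears twice with
# different weekly_capacity values, a second user appears once.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (2622223, "Nolan", 72000, 1),
        (2622223, "Nolan", 129600, 1),
        (1111111, "Sample", 126000, 1),
    ],
)

# List every active user id that has more than one distinct weekly_capacity.
DUPLICATE_CHECK = """
SELECT id,
       COUNT(*) AS row_count,
       COUNT(DISTINCT weekly_capacity) AS distinct_capacities
FROM users
WHERE is_active = 1
GROUP BY id
HAVING COUNT(DISTINCT weekly_capacity) > 1
"""
duplicates = conn.execute(DUPLICATE_CHECK).fetchall()
print(duplicates)  # [(2622223, 2, 2)]
```

Any id returned by this check will multiply the rows of every time entry joined to it, and SELECT DISTINCT cannot collapse those rows because they differ in weekly_capacity.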

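The users table contains two or more rows with the same id where is_active = true. This looks like a problem with the data itself, so to avoid duplicate rows you have to decide which value you want to keep. For example, if you only want to keep the maximum value, you can use GROUP BY: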

SELECT id, first_name, MAX(weekly_capacity) as weekly_capacity, is_active, is_admin
    FROM `harvest-experiment.harvest.users`
    WHERE is_active = true
    GROUP BY id, first_name, is_active, is_admin
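To see the effect of this change on the join, the sketch below compares the original DISTINCT subquery with the GROUP BY version. It is a simplified reproduction under stated assumptions: Python's sqlite3 module stands in for BigQuery, the rows are made-up samples modeled on the question, and a plain LEFT JOIN from time_entries replaces the FULL JOIN of the original query. The helper name `joined_rows` is introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER, first_name TEXT, weekly_capacity INTEGER, is_active INTEGER);
    CREATE TABLE time_entries (id INTEGER, user_id INTEGER, hours REAL);
    -- Hypothetical sample data: one user stored twice, one time entry.
    INSERT INTO users VALUES (2622223, 'Nolan', 72000, 1);
    INSERT INTO users VALUES (2622223, 'Nolan', 129600, 1);
    INSERT INTO time_entries VALUES (995005338, 2622223, 0.22);
    """
)


def joined_rows(users_subquery):
    """Join each time entry to the given users subquery and return all rows."""
    sql = f"""
    SELECT e.id, e.hours, u.weekly_capacity
    FROM time_entries AS e
    LEFT JOIN ({users_subquery}) AS u ON e.user_id = u.id
    ORDER BY u.weekly_capacity
    """
    return conn.execute(sql).fetchall()


# Original subquery: DISTINCT keeps both rows because weekly_capacity differs.
with_distinct = joined_rows(
    "SELECT DISTINCT id, first_name, weekly_capacity, is_active "
    "FROM users WHERE is_active = 1"
)

# Suggested subquery: one row per user, keeping the largest weekly_capacity.
with_group_by = joined_rows(
    "SELECT id, first_name, MAX(weekly_capacity) AS weekly_capacity, is_active "
    "FROM users WHERE is_active = 1 "
    "GROUP BY id, first_name, is_active"
)

print(len(with_distinct), with_distinct)
print(len(with_group_by), with_group_by)
```

With the DISTINCT subquery the single time entry comes back twice, once per weekly_capacity value. With the GROUP BY subquery it comes back once, carrying 129600. Whether MAX is the right choice depends on which of the stored values is the current one.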

In addition, if the users table contains enough information, you can use other columns to narrow the results further. For example:

...
LEFT JOIN `harvest-experiment.harvest.time_entries` AS outer_e
    ON inner_e.id = outer_e.id AND inner_e.updated_at = outer_e.updated_at
FULL JOIN ( 
    SELECT DISTINCT id, first_name, weekly_capacity, is_active, is_admin, last_updated
        FROM `harvest-experiment.harvest.users` WHERE is_active = true
) AS u
ON outer_e.user_id = u.id AND outer_e.updated_at = u.last_updated
...

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/56859528
