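I want to interpolate between the values in a column (linear interpolation in SQL) and insert the result into a new column using a query. From searching online, I suspect the LEAD analytic function may be useful. I am new to writing SQL queries, so any insight into how to achieve this would be very helpful.
A sample data set is shown below: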
Emp Test_date Value
--- --------- -----
A 1/1/2001 null
A 1/2/2001 100
A 1/3/2001 null
A 1/4/2001 80
A 1/5/2001 null
A 1/6/2001 null
A 1/7/2001 75
The idea is to obtain a fourth column with the following values:
null
100
interpolatedValue1
80
interpolatedValue2
interpolatedValue3
75
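interpolatedValue1 would be the value linearly interpolated between 100 and 80,
interpolatedValue2 would be the value linearly interpolated between 80 and 75.
interpolatedValue3 would be the value linearly interpolated between interpolatedValue2 and 75.
Here is how simple linear interpolation works:
Given two points (V1 at D1) and (V3 at D3), what is the value V2 at D2?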
(V3-V1)/(D3-D1) * (D2-D1) + V1
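As a quick sanity check of the formula, here is a minimal Python sketch (Python is used only for illustration; the function name `interpolate` is made up for this example). It assumes the D values are calendar dates and that the distance between them is measured in whole days.

```python
from datetime import date

def interpolate(v1, d1, v3, d3, d2):
    """Linear interpolation: (V3 - V1) / (D3 - D1) * (D2 - D1) + V1."""
    span_days = (d3 - d1).days      # D3 - D1
    offset_days = (d2 - d1).days    # D2 - D1
    return (v3 - v1) / span_days * offset_days + v1

# Sample rows from the question: 100 on 2001-01-02 and 80 on 2001-01-04.
print(interpolate(100, date(2001, 1, 2), 80, date(2001, 1, 4), date(2001, 1, 3)))  # 90.0
```

With the same formula, the gap between 80 (2001-01-04) and 75 (2001-01-07) gives about 78.33 for 2001-01-05 and about 76.67 for 2001-01-06.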
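Answered 2019-05-09 03:39:14
This could probably be simplified a bit, but I believe it gets the answer you want. The slightly tricky part is getting the number of days between the non-null values (i.e. the size of the gap you are filling) and then the position within that gap: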
-- CTE for sample data
with your_table (emp, test_date, value) as (
  select 'A', date '2001-01-01', null from dual
  union all select 'A', date '2001-01-02', 100 from dual
  union all select 'A', date '2001-01-03', null from dual
  union all select 'A', date '2001-01-04', 80 from dual
  union all select 'A', date '2001-01-05', null from dual
  union all select 'A', date '2001-01-06', null from dual
  union all select 'A', date '2001-01-07', 75 from dual
)
-- actual query (assumes one row per day, so row counts stand in for date differences)
select emp, test_date, value,
  coalesce(value,
    (next_value - prev_value) -- v3-v1
    / (count(*) over (partition by emp, grp) + 1) -- d3-d1
    * row_number() over (partition by emp, grp order by test_date) -- d2-d1, indirectly
    + prev_value -- v1
  ) as interpolated
from (
  select emp, test_date, value,
    -- nearest non-null value at or before the current row
    last_value(value ignore nulls)
      over (partition by emp order by test_date) as prev_value,
    -- nearest non-null value at or after the current row
    first_value(value ignore nulls)
      over (partition by emp order by test_date range between current row and unbounded following) as next_value,
    -- consecutive null rows share the same difference of row numbers
    row_number() over (partition by emp order by test_date) -
      row_number() over (partition by emp order by case when value is null then 1 else 0 end, test_date) as grp
  from your_table
)
order by test_date;

E TEST_DATE       VALUE INTERPOLATED
- ---------- ---------- ------------
A 2001-01-01
A 2001-01-02        100          100
A 2001-01-03                      90
A 2001-01-04         80           80
A 2001-01-05              78.3333333
A 2001-01-06              76.6666667
A 2001-01-07         75           75
I have used last_value and first_value rather than lead and lag, but either would work. (I suspect lead/lag might be faster on a large data set.) The grp calculation is the Tabibitosan method.
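To make the Tabibitosan step more concrete, here is a small Python sketch (illustration only; `tabibitosan_groups` is a hypothetical helper name) that reproduces the difference of two row numbers for the question's sample column. It assumes a single employee whose rows are already in date order.

```python
def tabibitosan_groups(values):
    """Return rn_by_date - rn_by_nullness for each row, as the grp expression does."""
    n = len(values)
    # Row number in date order (1-based).
    rn_by_date = list(range(1, n + 1))
    # Row number when non-null rows sort first, then nulls, each in date order.
    order = sorted(range(n), key=lambda i: (values[i] is None, i))
    rn_by_nullness = [0] * n
    for rank, i in enumerate(order, start=1):
        rn_by_nullness[i] = rank
    return [a - b for a, b in zip(rn_by_date, rn_by_nullness)]

# The question's Value column for 2001-01-01 through 2001-01-07.
print(tabibitosan_groups([None, 100, None, 80, None, None, 75]))
# [-3, 1, -2, 2, -1, -1, 4]
```

The two consecutive nulls (2001-01-05 and 2001-01-06) share the key -1, which is what lets a window partitioned by that key count the size of the gap and the position within it.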
Answered 2019-05-09 02:14:20
You can use lag(ignore nulls). You did not specify how to do the interpolation, but a linear interpolation would be:
select emp, test_date,
       coalesce(test_value,
                -- weight each neighbour by the distance to the opposite one
                ( prev_tv * (next_td - test_date) +
                  next_tv * (test_date - prev_td)
                ) / (next_td - prev_td)
               ) as imputed_value
from (select t.*,
             -- previous non-null value and the date it was recorded
             lag(test_value ignore nulls) over (partition by emp order by test_date) as prev_tv,
             lag(case when test_value is not null then test_date end ignore nulls) over (partition by emp order by test_date) as prev_td,
             -- next non-null value and the date it was recorded
             lead(test_value ignore nulls) over (partition by emp order by test_date) as next_tv,
             lead(case when test_value is not null then test_date end ignore nulls) over (partition by emp order by test_date) as next_td
      from t
     ) t;
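The neighbour-lookup idea can also be sketched outside the database. The Python below is an illustration only, not part of either answer: `fill_gaps` is a hypothetical name, and it assumes the rows belong to one employee and are sorted by date. It weights each non-null neighbour by the distance to the opposite one, which is algebraically the same as the slope formula in the question.

```python
from datetime import date

def fill_gaps(rows):
    """rows: list of (date, value or None) sorted by date; returns the filled values."""
    known = [(d, v) for d, v in rows if v is not None]
    filled = []
    for d, v in rows:
        if v is not None:
            filled.append(v)
            continue
        before = [(kd, kv) for kd, kv in known if kd < d]  # candidates for the previous non-null row
        after = [(kd, kv) for kd, kv in known if kd > d]   # candidates for the next non-null row
        if not before or not after:
            filled.append(None)  # no neighbour on one side, so leave the gap
            continue
        (prev_d, prev_v), (next_d, next_v) = before[-1], after[0]
        weighted = prev_v * (next_d - d).days + next_v * (d - prev_d).days
        filled.append(weighted / (next_d - prev_d).days)
    return filled

sample = [None, 100, None, 80, None, None, 75]
rows = [(date(2001, 1, day), v) for day, v in enumerate(sample, start=1)]
print(fill_gaps(rows))
```

For the sample data the leading null stays empty, 2001-01-03 becomes 90.0, and the last gap is filled with roughly 78.33 and 76.67.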
https://stackoverflow.com/questions/56046612