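I have some time series data made up of several time series, where each Timeseries instance has a one-to-many relationship with Point instances. Below is a simplified representation of the data.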
tables.py:
from sqlalchemy import Column, Float, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Timeseries(Base):
    __tablename__ = "timeseries"
    id = Column("id", Integer, primary_key=True)
    points = relationship("Point", back_populates="ts")

class Point(Base):
    __tablename__ = "point"
    id = Column("id", Integer, primary_key=True)
    t = Column("t", Float)
    v = Column("v", Float)
    ts_id = Column(Integer, ForeignKey("timeseries.id"))
    ts = relationship("Timeseries", back_populates="points")
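query: I'm trying to come up with a query that has the following columns: "timeseries_id", "id", "t", "v", "id_next", "t_next", "v_next". That is, I want to see each point's data alongside the data of the next point in the same time series, in chronological order, but I have been struggling to get a table that doesn't contain elements of an implicit join. (Edit: the important point is that I want to build this list entirely from query and subquery objects in sqlalchemy, because I need to use the resulting table in further joins, filters, and so on.) Here is the basic starting point of what I have (note that I have not run this code, since it is a simplified version of my actual database, but the idea is the same):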
# The point data actually in the database.
sq = (session.query(
        Timeseries.id.label("timeseries_id"),
        Point.id,
        Point.t,
        Point.v)
    .select_from(
        join(Timeseries, Point, Timeseries.id==Point.ts_id))
    .group_by('timeseries_id')
    .subquery())

# first point manually added to each list in query
sq_first = (session.query(
        Timeseries.id.label("timeseries_id"),
        sa.literal_column("-1", Integer).label("id"), # Some unused Point.id value
        sa.literal_column(-math.inf, Float).label("t"),
        sa.literal_column(-math.inf, Float).label("v"))
    .select_from(
        join(Timeseries, Point, Timeseries.id==Point.ts_id))
    .subquery())

# last point manually added to each list in query.
sq_last = (session.query(
        Timeseries.id.label("timeseries_id"),
        sa.literal_column("-2", Integer).label("id"), # Another unused Point.id value
        sa.literal_column(math.inf, Float).label("t"),
        sa.literal_column(math.inf, Float).label("v"))
    .select_from(
        join(Timeseries, Point, Timeseries.id==Point.ts_id))
    .subquery())

# Append each timeseries in `sq` table with last point
sq_points_curr = session.query(sa.union_all(sq_first, sq)).subquery()
sq_points_next = session.query(sa.union_all(sq, sq_last)).subquery()
Assuming what I've done so far is useful, this is the part where I'm stuck:
# I guess rename the columns in `sq_points_next` to append them by "_next"....
sq_points_next = (session.query(
        sq_points_curr.c.timeseries_id,
        sq_points_curr.c.id.label("id_next"),
        sq_points_curr.c.t.label("t_next"),
        sq_points_curr.c.v.label("v_next"))
    .subquery())

# ... and then perform a join along "timeseries_id" somehow to get the table I originally wanted...
sq_point_pairs = (session.query(
        Timeseries.id.label("timeseries_id"),
        "id",
        "t",
        "v",
        "id_next",
        "t_next",
        "v_next"
    ).select_from(
        sq_points, sq_points_next, sq_points.timeseries_id==sq_points_next.timeseries_id)
)
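I'm not even sure that last block would compile at this point, since it is adapted and simplified from the real code, but in any case it doesn't produce a table of adjacent time points.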
Edit (August 10, 2019)
The following simplified query from Nathan is definitely the right approach and close to working, but it raises an error with sqlite.
sq = session.query(
    Timeseries.id.label("timeseries_id"),
    Point.t.label("point_t"),
    func.lead(Point.t).over().label('point_after_t')
).select_from(
    join(Timeseries, Point, Timeseries.id == Point.ts_id)
).order_by(Timeseries.id)
print(sq.all())
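The error itself is not quoted here, so the following is only one thing worth ruling out. Window functions such as LEAD were added in SQLite 3.25.0, and what matters is the SQLite library that Python's sqlite3 module is linked against. A minimal sketch using only the standard library, with a throwaway in-memory `point` table as an illustrative assumption:

```python
import sqlite3

# Version of the SQLite C library linked into Python (not the module version).
print(sqlite3.sqlite_version)

# Window functions (LEAD, LAG, ...) were added in SQLite 3.25.0.
supports_window_functions = sqlite3.sqlite_version_info >= (3, 25, 0)
print(supports_window_functions)

rows = None
if supports_window_functions:
    # Throwaway in-memory table, used only to exercise LEAD.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE point (id INTEGER PRIMARY KEY, ts_id INTEGER, t REAL, v REAL)"
    )
    conn.executemany(
        "INSERT INTO point (ts_id, t, v) VALUES (?, ?, ?)",
        [(1, 0.0, 10.0), (1, 1.0, 11.0), (2, 0.0, 20.0)],
    )
    # For each point, fetch the next t within the same time series.
    rows = conn.execute(
        "SELECT ts_id, t, LEAD(t) OVER (PARTITION BY ts_id ORDER BY t) "
        "FROM point ORDER BY ts_id, t"
    ).fetchall()
    conn.close()
    print(rows)
```

If the flag prints False, the same LEAD statement would be expected to fail with a syntax error on that build, whatever SQLAlchemy generates.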
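Posted on 2019-03-24 03:27:05
Assuming you can get a sufficiently recent version of the sqlite3 Python module (for example, by using Anaconda), you can use the LEAD window function to achieve your goal; window functions need SQLite 3.25.0 or later. To use the result of the LEAD function in further queries, you also need a CTE. The following approach works for the schema you gave: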
sq = session.query(
    Timeseries.id.label("timeseries_id"),
    Point.id.label("point_id"),
    Point.t.label("point_t"),
    Point.v.label("point_v"),
    func.lead(Point.id).over().label('point_after_id'),
    func.lead(Point.v).over().label('point_after_v'),
    func.lead(Point.t).over().label('point_after_t')).select_from(
        join(Timeseries, Point, Timeseries.id == Point.ts_id)).order_by(Timeseries.id)

with_after = sq.cte()
session.execute(with_after.select().where(
    with_after.c.point_v < with_after.c.point_after_v)).fetchall()
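One caveat about the query above: `over()` is called with no arguments, so the window spans every row and the row order inside it is not pinned down. If the goal is "the next point within the same time series, ordered by t", a partitioned and ordered window is probably what is wanted. The sketch below is a variation under that assumption, not the answer's original code. It uses an in-memory SQLite database with made-up sample rows, needs SQLite 3.25.0 or later, and uses `subquery()` where the answer uses `cte()` (either can be joined and filtered further).

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, create_engine, func
from sqlalchemy.orm import relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()

class Timeseries(Base):
    __tablename__ = "timeseries"
    id = Column("id", Integer, primary_key=True)
    points = relationship("Point", back_populates="ts")

class Point(Base):
    __tablename__ = "point"
    id = Column("id", Integer, primary_key=True)
    t = Column("t", Float)
    v = Column("v", Float)
    ts_id = Column(Integer, ForeignKey("timeseries.id"))
    ts = relationship("Timeseries", back_populates="points")

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Made-up sample data; points are deliberately inserted out of time order.
session.add_all([
    Timeseries(id=1, points=[Point(id=1, t=2.0, v=5.0),
                             Point(id=2, t=1.0, v=3.0),
                             Point(id=3, t=3.0, v=4.0)]),
    Timeseries(id=2, points=[Point(id=4, t=1.0, v=7.0),
                             Point(id=5, t=2.0, v=9.0)]),
])
session.commit()

# "Next" means: same time series, next larger t.
window = dict(partition_by=Point.ts_id, order_by=Point.t)

pairs = session.query(
    Point.ts_id.label("timeseries_id"),
    Point.id.label("id"),
    Point.t.label("t"),
    Point.v.label("v"),
    func.lead(Point.id).over(**window).label("id_next"),
    func.lead(Point.t).over(**window).label("t_next"),
    func.lead(Point.v).over(**window).label("v_next"),
).subquery()

# The subquery behaves like a table: filter on the computed columns.
rising = (
    session.query(pairs.c.timeseries_id, pairs.c.id, pairs.c.id_next)
    .filter(pairs.c.v < pairs.c.v_next)
    .order_by(pairs.c.timeseries_id, pairs.c.t)
    .all()
)
print(rising)
```

The last point of each series gets NULL in the `*_next` columns, so it drops out of any comparison against `v_next` without needing sentinel rows.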
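Posted on 2019-03-22 05:34:39
Rather than jumping through hoops to make the query produce the paired results you're after, why not retrieve all of the points data related to a particular Timeseries row and then recombine it into the pairs you're looking for? For example: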
from operator import attrgetter

def to_dict(a, b):
    # formats a pair of points rows into a dict object
    return {
        'timeseries_id': a.ts_id,
        'id': a.id, 't': a.t, 'v': a.v,
        'id_next': b.id, 't_next': b.t, 'v_next': b.v
    }

def timeseries_pairs(session, ts_id):
    # queries the db for particular Timeseries row, and combines points pairs
    ts = session.query(Timeseries).\
        filter(Timeseries.id == ts_id).\
        first()
    ts.points.sort(key=attrgetter('t'))
    pairs = [to_dict(a, b) for a, b in zip(ts.points, ts.points[1:])]
    last = ts.points[-1]
    pairs.append({
        'timeseries_id': last.ts_id,
        'id': last.id, 't': last.t, 'v': last.v,
        'id_next': None, 't_next': None, 'v_next': None
    })
    return pairs

# pass the session and a timeseries id to return a list of points pairs
timeseries_pairs(session, 1)
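The pairing step above can also be written without touching the database at all, which makes it easy to try out. This is a minimal sketch under a few assumptions: `PointRow` is a hypothetical stand-in for Point rows, and `pair_points` is an illustrative helper, not part of the answer. It uses `itertools.zip_longest` so that the final point is paired with None and an empty series yields an empty list rather than an IndexError.

```python
from collections import namedtuple
from itertools import zip_longest
from operator import attrgetter

# Hypothetical stand-in for Point rows; real ORM objects expose the same attributes.
PointRow = namedtuple("PointRow", ["ts_id", "id", "t", "v"])

def pair_points(points):
    """Pair each point with its successor in time; the last point gets None."""
    ordered = sorted(points, key=attrgetter("t"))  # does not mutate the input
    pairs = []
    # ordered[1:] is one item shorter, so the final pair is (last, None).
    for a, b in zip_longest(ordered, ordered[1:]):
        pairs.append({
            "timeseries_id": a.ts_id,
            "id": a.id, "t": a.t, "v": a.v,
            "id_next": b.id if b is not None else None,
            "t_next": b.t if b is not None else None,
            "v_next": b.v if b is not None else None,
        })
    return pairs

sample = [PointRow(1, 10, 2.0, 5.0), PointRow(1, 11, 1.0, 3.0)]
result = pair_points(sample)
print(result)
```

Because this runs in Python, the result is a plain list and cannot be joined or filtered as a query object, which is the trade-off against the window-function approach.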
https://stackoverflow.com/questions/55234272