
You can use Cloud Composer (Airflow) and still reuse most of your existing set-up.

First, you can keep all your existing Cloud Functions and use <a href="https://cloud.google.com/functions/docs/calling/http" rel="nofollow noreferrer">HTTP triggers</a> (or another trigger type you prefer) to invoke them from Airflow. The only change you need to make is to implement a <a href="https://airflow.apache.org/docs/stable/_modules/airflow/contrib/sensors/pubsub_sensor.html" rel="nofollow noreferrer">PubSub Sensor</a> in Airflow, so that Airflow is what triggers your Cloud Functions (which lets you control orchestration from one end of your process to the other).
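As a rough illustration of the HTTP-trigger part, the sketch below builds and sends a POST request to a function using only the Python standard library. The URL pattern is the one used by 1st-gen HTTP functions; the helper names, the payload, and the way the ID token is obtained are assumptions for illustration, not part of the original set-up.

```python
import json
import urllib.error
import urllib.request


def build_function_request(region, project_id, function_name, payload, id_token=None):
    """Build a POST request for an HTTP-triggered Cloud Function.

    Assumes the 1st-gen URL pattern; adjust if your function has a different URL.
    """
    url = f"https://{region}-{project_id}.cloudfunctions.net/{function_name}"
    headers = {"Content-Type": "application/json"}
    if id_token:
        # Functions that are not public expect an ID token as a bearer token.
        headers["Authorization"] = f"Bearer {id_token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_function(request, timeout=60):
    """Send the request; return True on a 2xx response, False on an HTTP error."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False
```

The success flag returned by `call_function` is what the orchestrator would use to decide whether downstream steps may run.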

Your solution will be an Airflow <a href="https://airflow.apache.org/docs/stable/concepts.html#dags" rel="nofollow noreferrer">DAG</a> that triggers the Cloud Functions based on the PubSub messages, records in Airflow whether each function succeeded and then, if both succeeded, triggers the third Cloud Function in the same way, with an HTTP trigger or similar.
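To make the dependency structure concrete, here is a minimal sketch in plain Python (no Airflow required) of the graph such a DAG would encode. The task names and the `run_task` callback are hypothetical; in a real DAG the "wait" steps would be PubSub sensors and the "call" steps would be HTTP calls to the functions.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "wait_pubsub1": set(),
    "wait_pubsub3": set(),
    "call_function1": {"wait_pubsub1"},
    "call_function2": {"wait_pubsub3"},
    "call_function3": {"call_function1", "call_function2"},
}


def run_workflow(dependencies, run_task):
    """Run tasks in dependency order and record a status for each one.

    run_task(name) must return True on success. A task whose upstream
    did not succeed is marked "skipped" and never executed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    status = {}
    while sorter.is_active():
        for task in sorter.get_ready():
            if all(status[up] == "success" for up in dependencies[task]):
                status[task] = "success" if run_task(task) else "failed"
            else:
                status[task] = "skipped"
            sorter.done(task)
    return status
```

Calling `run_workflow(DEPENDENCIES, run_task)` returns the per-task status, which is the information the hand-built orchestration database was recording. The functions themselves no longer need to know what runs before or after them.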

A final note, which is not immediately intuitive: Airflow is not meant to run the jobs itself; it is meant to orchestrate them and manage their dependencies. Using Cloud Functions triggered by Airflow is not an anti-pattern; it is actually a best practice.

In your case, you could certainly rewrite a few things and use the BigQuery operators, since you don't do any processing and only trigger queries/jobs. The concept still holds, though: the best practice is to use Airflow to make sure things happen when you need them to and in the order you need, not to do the processing itself. (I hope that makes sense.)


As an alternative to Airflow, I would look at Argo Workflows: <a href="https://github.com/argoproj/argo" rel="nofollow noreferrer">https://github.com/argoproj/argo</a>

It doesn't have the cost overhead that Composer has, especially for smaller workloads.

I would:

Create a deployment that reads the PubSub messages published by the external tool, and deploy it to Kubernetes.

Execute a workflow based on each message. Each step in the workflow could be a Cloud Function, packaged in Docker.

(I would replace each Cloud Function with a Kubernetes job, which is then triggered by the workflow.)

It is pretty straightforward to package a Cloud Function with Docker and run it in Kubernetes.

There are prebuilt Docker images with gsutil/bq/gcloud, so you could write bash scripts that use the "bq" command-line tool to run your SQL in BigQuery.
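The answer suggests bash scripts; the same idea can be sketched in Python by shelling out to `bq`. This assumes the `bq` CLI is installed and authenticated inside the container. The helper names and the example SQL are hypothetical.

```python
import subprocess


def build_bq_command(sql, project_id=None):
    """Build the argument list for running a standard-SQL query with the bq CLI."""
    command = ["bq"]
    if project_id:
        # Global flags go before the subcommand.
        command.append(f"--project_id={project_id}")
    command += ["query", "--use_legacy_sql=false", sql]
    return command


def run_bq_query(sql, project_id=None):
    """Run the query; return True if bq exits with status 0."""
    result = subprocess.run(build_bq_command(sql, project_id), check=False)
    return result.returncode == 0
```

A workflow step could then call `run_bq_query("SELECT 1")` and use the boolean result as the step's success or failure.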

We are using PubSub and Cloud Functions in GCP to orchestrate our data workflow.

Our workflow looks something like this:

<a href="https://i.stack.imgur.com/FUswT.png" rel="nofollow noreferrer">workflow_gcp</a>

pubsub1 and pubsub3 can be triggered at different times (e.g. 1 am and 4 am). They are triggered daily by an external source (our ETL tool, Talend).

Our Cloud Functions basically execute SQL in BigQuery.

This is working well, but we had to manually create an orchestration database to log when functions start and end (to answer the question "did function X execute OK?"). The orchestration logic is also strongly coupled with our business logic, since each Cloud Function must know which functions have to be executed before it and which pubsub to trigger after it.

So we're looking for a solution that separates the orchestration logic from the business logic.

I found that Composer (Airflow) could be a solution, but:

<ul>
<li>it can't run Cloud Functions natively (and going through the API is very limited: 16 calls per 100 seconds per project)</li>
<li>we can use BigQuery inside Airflow with the BigQuery operators, but the orchestration and business logic would be strongly coupled again</li>
</ul>

So what is the best practice in our case?

Thanks for your help.

Best practice to orchestrate small Python tasks (mostly executing SQL in BigQuery)
