【GitLab CI/CD】:Cache vs artifacts

WEBJ2EE
Published 2021-01-04 14:55:42
Part of the column: WebJ2EE
Contents
1. Purpose
2. Mechanism
    2.1. Cache vs artifacts
    2.2. Good caching practices
        2.2.1. Share caches across the same branch
        2.2.2. Share caches across different branches
        2.2.3. Disable cache on specific jobs
        2.2.4. Common use cases: Cache Node.js dependencies
    2.3. Where the caches are stored
    2.4. How archiving and extracting works
    2.5. Clearing the cache manually
    2.6. artifacts

1. Purpose

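Recall from the earlier post "[GitLab CI/CD]: Some Useful Basics" that under the default Git strategy (fetch), git clean runs before every job, so any intermediate results a job produced are wiped. Most of the time that is fine. There are exceptions, though, where we need the intermediate results of an earlier job. The two most typical cases:

  • Dependencies installed locally by npm install. If every job run downloads and installs the dependencies again, execution is very slow. The cache feature of GitLab CI/CD solves this.
  • Intermediate results produced by the previous job (for example, a package or an image) that the next job must process further (otherwise every script would have to live in a single job). The artifacts feature of GitLab CI/CD solves this.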

2. Mechanism

GitLab CI/CD provides a caching mechanism that can be used to save time when your jobs are running.

Caching speeds up job execution by reusing content from a previous job. Use caching when you are developing software that depends on other libraries which are fetched via the internet during build time.

If caching is enabled, it’s shared between pipelines and jobs at the project level by default. Caches are not shared across projects.

2.1. Cache vs artifacts

If you use cache and artifacts to store the same path in your jobs, the cache might be overwritten because caches are restored before artifacts.

Don’t use caching for passing artifacts between stages, as it is designed to store runtime dependencies needed to compile the project:

  • cache: For storing project dependencies
    • Caches can increase the speed of a given job in subsequent pipelines. You can store downloaded dependencies so that they don’t have to be fetched from the internet again. Dependencies include things like npm packages, Go vendor packages, and so on. You can configure a cache to pass intermediate build results between stages, but you should use artifacts instead.
  • artifacts: Use for stage results that are passed between stages.
    • Artifacts are files that are generated by a job so they can be stored and uploaded. You can fetch and use artifacts in jobs in later stages of the same pipeline. You can’t create an artifact in a job in one stage, and use this artifact in a different job in the same stage. This data is not available in different pipelines, but can be downloaded from the UI.
    • If you download modules while building your application, you can declare them as artifacts and subsequent stage jobs can use them.
    • You can define an expiry time so artifacts are deleted after a defined time. Use dependencies to control which jobs fetch the artifacts.
    • Artifacts can also be used to make files available for download after a pipeline completes, like a build image.

Caches:

  • Are disabled if not defined globally or per job (using cache:).
  • Are available for all jobs in your .gitlab-ci.yml if enabled globally.
  • Can be used in subsequent pipelines by the same job in which the cache was created (if not defined globally).
  • Are stored where GitLab Runner is installed and uploaded to S3 if distributed cache is enabled.
  • If defined per job, are used:
    • By the same job in a subsequent pipeline.
    • By subsequent jobs in the same pipeline, if they have identical dependencies.

Artifacts:

  • Are disabled if not defined per job (using artifacts:).
  • Can only be enabled per job, not globally.
  • Are created during a pipeline and can be used by subsequent jobs in the same pipeline.
  • Are always uploaded to GitLab (known as coordinator).
  • Can have an expiration value for controlling disk usage (30 days by default).

Both artifacts and caches define their paths relative to the project directory, and can’t link to files outside it.
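
As a rough sketch of the split described above, the configuration below caches downloaded npm packages and hands the build output to a later stage as an artifact. The job names, the dist/ directory, and the npm scripts are hypothetical and not taken from the original project; only the cache, artifacts, expire_in, and dependencies keywords come from GitLab CI/CD.

```yaml
stages:
  - build
  - test

build_app:                      # hypothetical job name
  stage: build
  cache:                        # dependencies: reused by later pipelines
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .npm/
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run build             # assumed to write its output to dist/
  artifacts:                    # stage result: handed to later stages
    paths:
      - dist/
    expire_in: 1 week           # deleted after the expiry time

test_app:                       # hypothetical job name
  stage: test
  dependencies:
    - build_app                 # fetch artifacts from this job only
  script:
    - ls dist/
```

If the cache is missing, build_app only runs slower, while test_app cannot do its work without the dist/ artifact. That difference is why stage results belong in artifacts rather than in the cache.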

2.2. Good caching practices

2.2.1. Share caches across the same branch

Define a cache with the key: ${CI_COMMIT_REF_SLUG} so that jobs of each branch always use the same cache:

cache:
  key: ${CI_COMMIT_REF_SLUG}

This configuration is safe from accidentally overwriting the cache, but the first pipeline of a merge request is slow because the new branch has no cache yet. The next time a new commit is pushed to the branch, the cache is reused and jobs run faster.

To enable per-job and per-branch caching:

cache:
  key: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"

To enable per-stage and per-branch caching:

cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"

Note: Predefined environment variables

  • CI_COMMIT_REF_NAME
    • The branch or tag name for which the project is built
  • CI_COMMIT_REF_SLUG
    • $CI_COMMIT_REF_NAME lowercased, shortened to 63 bytes, and with everything except 0-9 and a-z replaced with -. No leading / trailing -. Use in URLs, host names and domain names.
  • CI_JOB_NAME
    • The name of the job as defined in .gitlab-ci.yml
  • CI_JOB_STAGE
    • The name of the stage as defined in .gitlab-ci.yml

2.2.2. Share caches across different branches

To share a cache across all branches and all jobs, use the same key for everything:

cache:
  key: one-key-to-rule-them-all

To share caches between branches, but have a unique cache for each job:

cache:
  key: ${CI_JOB_NAME}

2.2.3. Disable cache on specific jobs

If you have defined the cache globally, it means that each job uses the same definition. You can override this behavior per-job, and if you want to disable it completely, use an empty hash:

job:
  cache: {}
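
For context, a minimal sketch of a global cache with one job opting out might look like this. The job names and scripts are placeholders chosen for illustration.

```yaml
cache:                          # global definition, inherited by every job
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .npm/

install_deps:                   # hypothetical job; uses the global cache
  script:
    - npm ci --cache .npm --prefer-offline

lint_docs:                      # hypothetical job; needs no dependencies
  cache: {}                     # empty hash: no cache for this job
  script:
    - echo "nothing to restore or save here"
```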

2.2.4. Common use cases: Cache Node.js dependencies

The most common use case of caching is to avoid downloading content like dependencies or libraries repeatedly between subsequent runs of jobs. Node.js packages, PHP packages, Ruby gems, Python libraries, and others can all be cached.

By default, npm stores cache data in the home folder ~/.npm but you can’t cache things outside of the project directory. Instead, we tell npm to use ./.npm, and cache it per-branch:

build_sef:
  tags:
    - webdepartment
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - sef/.npm/
      - sef/sef_web_legacy/build/.npm/
      - sef/sef_web_modern/build/.npm/
  stage: build
  script:
    - cd sef
    -   npm ci --cache .npm --prefer-offline
    -   npm run changelog
    -   cd -
    - cd sef/sef_web_legacy/build
    -   npm ci --cache .npm --prefer-offline
    -   npx grunt buildcss
    -   npx grunt compressJS
    -   cd -
    - cd sef/sef_web_modern/build
    -   npm ci --cache .npm --prefer-offline
    -   npm run buildcss
    -   npm run buildjs
    -   cd -

2.3. Where the caches are stored

The runner is responsible for storing the cache, so it’s essential to know where it’s stored. All the cache paths defined under a job in .gitlab-ci.yml are archived in a single cache.zip file and stored in the runner’s configured cache location. By default, the archive is stored locally on the machine where the runner is installed, and the exact location depends on the type of executor.

2.4. How archiving and extracting works

This example has two jobs that belong to two consecutive stages:

stages:
  - build
  - test

before_script:
  - echo "Hello"

job A:
  stage: build
  script:
    - mkdir -p vendor/  # -p: vendor/ may already exist once a cache has been restored
    - echo "build" > vendor/hello.txt
  cache:
    key: build-cache
    paths:
      - vendor/
  after_script:
    - echo "World"

job B:
  stage: test
  script:
    - cat vendor/hello.txt
  cache:
    key: build-cache
    paths:
      - vendor/

If you have one machine with one runner installed, and all jobs for your project run on the same host:

  1. Pipeline starts.
  2. job A runs.
  3. before_script is executed.
  4. script is executed.
  5. after_script is executed.
  6. cache runs and the vendor/ directory is zipped into cache.zip. This file is then saved in the directory based on the runner’s setting and the cache: key.
  7. job B runs.
  8. The cache is extracted (if found).
  9. before_script is executed.
  10. script is executed.
  11. Pipeline finishes.

By using a single runner on a single machine, you don’t have the issue where job B might execute on a runner different from job A. This setup guarantees the cache can be reused between stages. It only works if the execution goes from the build stage to the test stage in the same runner/machine. Otherwise, the cache might not be available.
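
When several runners are registered, one way to approximate this single-runner setup is to pin both jobs to one runner with tags, the same keyword the build_sef example uses. This is only a sketch: the tag name single-host is hypothetical, and the jobs stay on one machine only if exactly one runner carries that tag.

```yaml
job A:
  stage: build
  tags:
    - single-host               # hypothetical tag carried by one runner only
  script:
    - mkdir -p vendor/
    - echo "build" > vendor/hello.txt
  cache:
    key: build-cache
    paths:
      - vendor/

job B:
  stage: test
  tags:
    - single-host               # same tag, so the same runner picks it up
  script:
    - cat vendor/hello.txt
  cache:
    key: build-cache
    paths:
      - vendor/
```

Even with pinning, the cache is still best treated as an optimization. If job B must have vendor/hello.txt, passing it as an artifact is the safer choice.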

2.5. Clearing the cache manually

If you want to avoid editing .gitlab-ci.yml, you can clear the cache via the GitLab UI:

  1. Navigate to your project’s CI/CD > Pipelines page.
  2. Click on the Clear runner caches button to clean up the cache.

2.6. artifacts

  • artifacts is used to specify a list of files and directories that are attached to the job when it succeeds, fails, or always.
  • The artifacts are sent to GitLab after the job finishes. They are available for download in the GitLab UI if the size is not larger than the maximum artifact size.
  • Job artifacts are only collected for successful jobs by default, and artifacts are restored after caches.

Example:

build_sef:
  tags:
    - webdepartment
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - sef/.npm/
      - sef/sef_web_legacy/build/.npm/
      - sef/sef_web_modern/build/.npm/
  stage: build
  script:
    - cd sef
    -   npm ci --cache .npm --prefer-offline
    -   npm run changelog
    -   cd -
    - cd sef/sef_web_legacy/build
    -   npm ci --cache .npm --prefer-offline
    -   npx grunt buildcss
    -   npx grunt compressJS
    -   cd -
    - cd sef/sef_web_modern/build
    -   npm ci --cache .npm --prefer-offline
    -   npm run buildcss
    -   npm run buildjs
    -   cd -
    - cd sef
    -   mvn -q versions:set -DnewVersion=$CI_COMMIT_TAG
    -   mvn -q versions:commit
    -   mvn -q -Dmaven.test.skip=true clean deploy
    -   cd -
  artifacts:
    paths:
      - sef/sef_web/target/sef_web.war
      - sef/sef_wing/target/sef_wing.war
      - sef/sef_muif/target/sef_muif.war
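
The collection condition and the expiry mentioned earlier could be sketched as follows. The job name, the test command, and the reports/ path are assumptions made for illustration; when and expire_in are the artifacts keywords being shown.

```yaml
unit_test:                      # hypothetical job name
  stage: test
  script:
    - npm test                  # assumed to write results into reports/
  artifacts:
    when: always                # collect on success and on failure
    expire_in: 30 days          # explicit expiry instead of the default
    paths:
      - reports/
```

Collecting with when: always is mostly useful for test reports and logs, where the failing runs are the ones worth inspecting.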

References:

Cache dependencies in GitLab CI/CD: https://docs.gitlab.com/ee/ci/caching/


Originally published on 2020-12-27; shared from the WebJ2EE WeChat official account.