A Brief Guide to Climate Data Analysis in Python (+ Code)

By: Ali Ahmadalipour (LinkedIn, Twitter)

April 2022

Link: https://colab.research.google.com/drive/1B7gFBSr0eoZ5IbsA0lY8q3XL8n-3BOn4#scrollTo=_00P5Wq6whH2


Link to a LinkedIn article I wrote explaining the details: https://www.linkedin.com/pulse/python-climate-data-analysis-tutorial-code-ali-ahmadalipour/

Basic info for running the code in a notebook environment:

  • To run a cell, press Shift + Enter
  • To restart the analysis (i.e. clean the variables and RAM, but keep the downloaded data), restart the runtime from the top menu
  • To completely start over (i.e. clean RAM and temporary storage that may contain downloaded data or any saved figures), go to Runtime > Manage sessions > TERMINATE.

Basic info about Google Colab Notebooks:

  • If the page stays idle for several minutes, the runtime may be terminated, and you'll need to connect again.
  • Google provides about 70 GB of free temporary disk space. Any data you download or any model output is by default saved temporarily in this runtime's storage. Therefore, if the runtime is terminated, you'll lose that data. If you would like to keep the data or the outputs, you can mount your Google Drive and choose any specific directory there. Here's how to connect to your Google Drive:
# from google.colab import drive
# drive.mount('/content/drive')

1. Basic (download, extract & save data, concat, groupby, select):

In this section, we will download and analyze gridded precipitation data (from CPC). The goal is to extract daily data, compute monthly totals, find the spatial average of precipitation over a given domain, plot the results, and save the outputs as netCDF files. We will work with some of the commonly used functionalities of xarray (a powerful Python library for analyzing geospatial data), such as:

  • open_mfdataset (in xarray, which opens multiple files at the same time)
  • concatenate datasets
  • groupby
  • slicing and selecting data
  • save as netcdf
# Later in the advanced section of this tutorial (section 3.2), we will be analyzing 
# the zarr data format, and the pre-installed xarray on Google Colab is not able to 
# do so. Thus, we need to install the complete version of xarray.
!pip install xarray[complete] # this may take a few seconds
Requirement already satisfied: xarray[complete] in /usr/local/lib/python3.7/dist-packages (0.18.2)
Collecting h5netcdf
Collecting pydap
Collecting fsspec
Collecting zarr
Collecting numbagg
Collecting nc-time-axis
Collecting cfgrib
Collecting rasterio
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gym 0.17.3 requires cloudpickle<1.7.0,>=1.2.0, but you have cloudpickle 2.0.0 which is incompatible.
Successfully installed Webob-1.8.7 affine-2.3.1 asciitree-0.3.3 cfgrib-0.9.10.1 click-plugins-1.1.1 cligj-0.7.2 cloudpickle-2.0.0 distributed-2.30.1 eccodes-1.4.1 fasteners-0.17.3 findlibs-0.0.2 fsspec-2022.3.0 h5netcdf-1.0.0 locket-0.2.1 nc-time-axis-1.4.0 numbagg-0.2.1 numcodecs-0.9.1 partd-1.2.0 pydap-3.2.2 rasterio-1.2.10 snuggs-1.4.7 zarr-2.11.2
import glob
import matplotlib.pyplot as plt
import urllib.request
import xarray as xr
/usr/local/lib/python3.7/dist-packages/xarray/backends/cfgrib_.py:28: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
  "Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. "
for yr in range(2011,2015): # note that in python, the end range is not inclusive. So, in this case data for 2015 is not downloaded.
    url = f'https://downloads.psl.noaa.gov/Datasets/cpc_us_precip/RT/precip.V1.0.{yr}.nc'
    savename = url.split('/')[-1]
    urllib.request.urlretrieve(url,savename)

Let's start simple: open data for two years and concatenate them to one file:

ds2011 = xr.open_dataset('precip.V1.0.2011.nc')
ds2012 = xr.open_dataset('precip.V1.0.2012.nc')
ds2011
<xarray.Dataset>
Dimensions:  (lat: 120, lon: 300, time: 365)
Coordinates:
  * lat      (lat) float32 20.12 20.38 20.62 20.88 ... 49.12 49.38 49.62 49.88
  * lon      (lon) float32 230.1 230.4 230.6 230.9 ... 304.1 304.4 304.6 304.9
  * time     (time) datetime64[ns] 2011-01-01 2011-01-02 ... 2011-12-31
Data variables:
    precip   (time, lat, lon) float32 ...
Attributes:
    title:          CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    Conventions:    COARDS
    description:    Gridded daily Precipitation
    platform:       Observations
    Comments:       Preciptation is accumulated from 12z of previous day to 1...
    history:        originally created RT starting 04/2010 by CAS from data o...
    dataset_title:  CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    References:     http://www.psl.noaa.gov/data/gridded/data.unified.daily.c...
ds2012
<xarray.Dataset>
Dimensions:  (lat: 120, lon: 300, time: 366)
Coordinates:
  * lat      (lat) float32 20.12 20.38 20.62 20.88 ... 49.12 49.38 49.62 49.88
  * lon      (lon) float32 230.1 230.4 230.6 230.9 ... 304.1 304.4 304.6 304.9
  * time     (time) datetime64[ns] 2012-01-01 2012-01-02 ... 2012-12-31
Data variables:
    precip   (time, lat, lon) float32 ...
Attributes:
    title:          CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    Conventions:    COARDS
    description:    Gridded daily Precipitation
    platform:       Observations
    Comments:       Preciptation is accumulated from 12z of previous day to 1...
    history:        originally created RT starting 04/2010 by CAS from data o...
    dataset_title:  CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    References:     http://www.psl.noaa.gov/data/gridded/data.unified.daily.c...
ds2011_2012 = xr.concat([ds2011,ds2012], dim='time')
ds2011_2012
<xarray.Dataset>
Dimensions:  (lat: 120, lon: 300, time: 731)
Coordinates:
  * lat      (lat) float32 20.12 20.38 20.62 20.88 ... 49.12 49.38 49.62 49.88
  * lon      (lon) float32 230.1 230.4 230.6 230.9 ... 304.1 304.4 304.6 304.9
  * time     (time) datetime64[ns] 2011-01-01 2011-01-02 ... 2012-12-31
Data variables:
    precip   (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
    title:          CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    Conventions:    COARDS
    description:    Gridded daily Precipitation
    platform:       Observations
    Comments:       Preciptation is accumulated from 12z of previous day to 1...
    history:        originally created RT starting 04/2010 by CAS from data o...
    dataset_title:  CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    References:     http://www.psl.noaa.gov/data/gridded/data.unified.daily.c...

Now, let's try something similar, but in a more efficient way (especially if the number of files is more than two):

ds2011_2014 = xr.open_mfdataset('precip.V1.0.*.nc', concat_dim='time', combine='nested')
# Or, you can use the following command to do the same thing:
# ds2011_2014 = xr.open_mfdataset('precip*.nc', combine='by_coords')
ds2011_2014
<xarray.Dataset>
Dimensions:  (lat: 120, lon: 300, time: 1461)
Coordinates:
  * lat      (lat) float32 20.12 20.38 20.62 20.88 ... 49.12 49.38 49.62 49.88
  * lon      (lon) float32 230.1 230.4 230.6 230.9 ... 304.1 304.4 304.6 304.9
  * time     (time) datetime64[ns] 2011-01-01 2011-01-02 ... 2014-12-31
Data variables:
    precip   (time, lat, lon) float32 dask.array<chunksize=(365, 120, 300), meta=np.ndarray>
Attributes:
    title:          CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    Conventions:    COARDS
    description:    Gridded daily Precipitation
    platform:       Observations
    Comments:       Preciptation is accumulated from 12z of previous day to 1...
    history:        originally created RT starting 04/2010 by CAS from data o...
    dataset_title:  CPC Unified Gauge-Based Analysis of Daily Precipitation o...
    References:     http://www.psl.noaa.gov/data/gridded/data.unified.daily.c...

Now let's focus on 2012 and extract the monthly precipitation sum and make a simple plot of one of the months:

# The great thing about groupby is that you do not need to worry about the leap years or 
# number of days in each month.
# In addition, xarray is label-aware and when you call the plot function, it understands that you want to
# make a spatial plot and finds the lat and lon values and the appropriate title and labels.
ds2012_mon = ds2012.groupby('time.month').sum()
ds2012_mon.precip[0,:,:].plot(cmap='jet', vmax=300)
<matplotlib.collections.QuadMesh at 0x7f32fffe3d50>

[Figure: monthly precipitation total for January 2012]

The above plot is quite simple and not of high quality (e.g. the areas outside the US boundary have no data and are all shown in dark blue, both the x and y axis limits are a bit large and can be narrowed down, the title is not exactly what we may like, etc.). We will now develop a more customized plot for all 12 months, as follows:

import calendar # We'll use this library to easily add month name to subplot titles.
# First, we will develop a land mask data array that we can use to mask out the nan values:
landmask = ds2012.precip.sum(dim='time')>0
fig = plt.figure(figsize=[12,8], facecolor='w')
plt.subplots_adjust(bottom=0.15, top=0.96, left=0.04, right=0.99, 
                    wspace=0.2, hspace=0.27) # wspace and hspace adjust the horizontal and vertical spaces, respectively.
nrows = 3
ncols = 4
for i in range(1, 13):
    plt.subplot(nrows, ncols, i)
    dataplot = ds2012_mon.precip[i-1, :, :].where(landmask) # Remember that in Python, data indices start at 0, but subplot indices start at 1.
    p = plt.pcolormesh(ds2012_mon.lon, ds2012_mon.lat, dataplot,
                   vmax = 400, vmin = 0, cmap = 'nipy_spectral_r',
                   ) 
    plt.xlim([233,295])
    plt.ylim([25,50])
    plt.title(calendar.month_name[dataplot.month.values], fontsize = 13, 
              fontweight = 'bold', color = 'b')
    plt.xticks(fontsize = 11)
    plt.yticks(fontsize = 11)
    if i % ncols == 1: # Add ylabel for the very left subplots
        plt.ylabel('Latitude', fontsize = 11, fontweight = 'bold')
    if i > ncols*(nrows-1): # Add xlabel for the bottom row subplots
        plt.xlabel('Longitude', fontsize = 11, fontweight = 'bold')

# Add a colorbar at the bottom:
cax = fig.add_axes([0.25, 0.06, 0.5, 0.018])
cb = plt.colorbar(cax=cax, orientation='horizontal', extend = 'max',)
cb.ax.tick_params(labelsize=11)
cb.set_label(label='Precipitation (mm)', color = 'k', size=14)

# Now we can save a high resolution (300dpi) version of the figure:
plt.savefig('Fig_prec_cpc_mon_2012.png', format = 'png', dpi = 300)

[Figure: monthly precipitation totals for all 12 months of 2012]

Now let's say we want to extract data for a specific boundary and look at the average condition within that area of interest. For simplicity, we can think of a rectangular box (but you can easily build any landmask as above and use it to focus only on your domain of interest). For this case, let's look at a rectangular box roughly matching the state of Kansas.

top = 40
bottom = 37
left = 258
right = 265.4
ds_sel = ds2011_2014.isel(lon=(ds2011_2014.lon >= left) & (ds2011_2014.lon <= right),
                          lat=(ds2011_2014.lat >= bottom) & (ds2011_2014.lat <= top),
                          )
ds_sel_avg = ds_sel.mean(dim=['lat','lon'])
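
As an aside, here is a sketch that is not in the original notebook: because the lat and lon coordinates in these CPC files increase monotonically, label-based slicing with .sel achieves the same selection more concisely:

# Equivalent label-based selection; valid here because lat and lon
# increase monotonically in the CPC files.
ds_sel_alt = ds2011_2014.sel(lon=slice(left, right), lat=slice(bottom, top))
ds_sel_alt_avg = ds_sel_alt.mean(dim=['lat','lon'])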

Now let's plot the cumulative daily precipitation of the selected area for each year. To make things easier, let's drop Feb 29th from any leap years in the record. Here we go:

ds_sel_avg_noleap = ds_sel_avg.sel(
    time=~((ds_sel_avg.time.dt.month == 2) & (ds_sel_avg.time.dt.day == 29)))
# Here's what the result will look like:
ds_sel_avg_noleap
<xarray.Dataset>
Dimensions:  (time: 1460)
Coordinates:
time     (time) datetime64[ns] 2011-01-01 2011-01-02 ... 2014-12-31
Data variables:
precip   (time) float32 dask.array<chunksize=(365,), meta=np.ndarray>
# Now we can easily save that output as a netcdf file using xarray:
ds_sel_avg_noleap.to_netcdf('ds_prec_Kansas_noleap_2011_2014.nc')
fig = plt.figure(figsize=[8,5], facecolor='w')
for yr in range(2011,2015):
    da_yr = ds_sel_avg_noleap.isel(time = ds_sel_avg_noleap.time.dt.year==yr).precip
    dataplot = da_yr.cumsum()
    plt.plot(dataplot, linewidth=2, label = yr)
plt.legend(fontsize=13)
plt.grid()
plt.xticks(fontsize=12) # we could also place the ticks on Jan-1, Feb-1, etc., but I'll skip that here (see the sketch at the end of this section).
plt.yticks(fontsize=12)
plt.ylabel('Precipitation (mm)', fontsize = 13, fontweight = 'bold')
plt.xlabel('Day of Year', fontsize = 13, fontweight = 'bold')
plt.xlim([0,365])
plt.ylim(bottom=0)
plt.title('Annual cumulative precipitation in Kansas', fontsize=15)
plt.tight_layout()
plt.savefig('Fig_cumsum_prec_Kansas.png', format = 'png', dpi = 300)

[Figure: annual cumulative precipitation in Kansas, 2011-2014]

We could also modify the xticklabels a bit more to show the exact month and day values (instead of the Julian day number). See if you can figure it out yourself; a sketch follows.
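
For instance, here is a minimal sketch of that idea (my addition, not from the original notebook), which places a tick at the first day of each month on the 365-day (no-leap) x-axis used above; it would go just before the savefig call:

import calendar
import numpy as np

# Day-of-year index (0-based) of the first day of each month in a 365-day year.
days_in_month = [calendar.monthrange(2011, m)[1] for m in range(1, 13)]
month_starts = np.cumsum([0] + days_in_month[:-1])
plt.xticks(ticks=month_starts,
           labels=[calendar.month_abbr[m] for m in range(1, 13)],
           fontsize=12)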

2. Intermediate (interpolate, revise coordinates, rolling average):

For this section, we will dive deeper. We will use two datasets with different resolutions and work with interpolation. The two datasets that I have considered are the CPC-Globe and gridMET minimum air temperature data. CPC-Globe has a 0.5 degree (~50 km) spatial resolution, whereas gridMET is at 1/24 degree (~4 km). Here's a summary of the main functionalities that we will practice in this section:

  • Interpolation
  • Converting 0:360 longitude axis to -180:180
  • Scaling a dataset (e.g. converting kelvin to degrees Celsius)
  • Rolling average (with any time window)
# I import the libraries again, to keep the examples separate from each other (in case someone wants to start from here).
import urllib.request
import xarray as xr
import matplotlib.pyplot as plt
import datetime
import pandas as pd

Let's download the two datasets for 2021:

# Downloading gridMET tmin (1/24 deg, ~4km):
url = 'https://www.northwestknowledge.net/metdata/data/tmmn_2021.nc'
savename = 'tmin_gridmet_2021.nc'
urllib.request.urlretrieve(url, savename)

# Downloading CPC-Globe tmin (0.5 deg, ~50km):
url = 'https://downloads.psl.noaa.gov/Datasets/cpc_global_temp/tmin.2021.nc'
savename = 'tmin_CPC_2021.nc'
urllib.request.urlretrieve(url, savename)
('tmin_CPC_2021.nc', <http.client.HTTPMessage at 0x7f32f8b0e550>)
# Now let's open the two datasets and explore them:
ds_gridmet = xr.open_dataset('tmin_gridmet_2021.nc')
ds_CPC = xr.open_dataset('tmin_CPC_2021.nc')
ds_gridmet
<xarray.Dataset>
Dimensions:          (crs: 1, day: 365, lat: 585, lon: 1386)
Coordinates:
  * lon              (lon) float64 -124.8 -124.7 -124.7 ... -67.14 -67.1 -67.06
  * lat              (lat) float64 49.4 49.36 49.32 49.28 ... 25.15 25.11 25.07
  * day              (day) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
  * crs              (crs) uint16 3
Data variables:
    air_temperature  (day, lat, lon) float32 ...
Attributes: (12/19)
    geospatial_bounds_crs:      EPSG:4326
    Conventions:                CF-1.6
    geospatial_bounds:          POLYGON((-124.7666666333333 49.40000000000000...
    geospatial_lat_min:         25.066666666666666
    geospatial_lat_max:         49.40000000000000
    geospatial_lon_min:         -124.7666666333333
    ...                         ...
    date:                       03 March 2022
    note1:                      The projection information for this file is: ...
    note2:                      Citation: Abatzoglou, J.T., 2013, Development...
    note3:                      Data in slices after last_permanent_slice (1-...
    note4:                      Data in slices after last_provisional_slice (...
    note5:                      Days correspond approximately to calendar day...
ds_CPC
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, time: 365)
Coordinates:
  * lat      (lat) float32 89.75 89.25 88.75 88.25 ... -88.75 -89.25 -89.75
  * lon      (lon) float32 0.25 0.75 1.25 1.75 2.25 ... 358.2 358.8 359.2 359.8
  * time     (time) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Data variables:
    tmin     (time, lat, lon) float32 ...
Attributes:
    Conventions:    CF-1.0
    Source:         ftp://ftp.cpc.ncep.noaa.gov/precip/wd52ws/global_temp/
    References:     https://www.psl.noaa.gov/data/gridded/data.cpc.globaltemp...
    version:        V1.0
    title:          CPC GLOBAL TEMP V1.0
    dataset_title:  CPC GLOBAL TEMP
    history:        Updated 2022-01-01 16:55:57
# Let's plot the original data for Jan 1st:
fig = plt.figure(figsize = [13,4.5])
plt.subplot(1,2,1)
ds_gridmet.air_temperature[0,:,:].plot(cmap = 'jet')
plt.title('GridMet')
plt.subplot(1,2,2)
ds_CPC.tmin[0,:,:].plot(cmap = 'jet')
plt.title('CPC-Globe')
plt.tight_layout()

[Figure: gridMET and CPC-Globe tmin on Jan 1, 2021]

Looking at the two datasets, we see that there are a few differences that should be addressed:

  • In the gridMET dataset, the "crs" coordinate can be dropped, and the "day" coordinate can be renamed to "time" to be consistent with the CPC dataset (similarly, "air_temperature" can be renamed to "tmin").
  • The gridMET data is in kelvin, but CPC is in degrees Celsius. Let's convert the gridMET data to Celsius.
  • In addition, the lon coordinate is 0:360 in one dataset and -180:180 in the other. Let's change the gridMET data to 0:360.
  • Lastly, let's interpolate the coarser resolution data (CPC-Globe) to the gridMET resolution.
ds_gridmet_revised = ds_gridmet.drop('crs').rename({'day':'time', 'air_temperature':'tmin'})
ds_gridmet_revised = ds_gridmet_revised-273.15 # Convert kelvin to Celsius
lon_revised = ds_gridmet.lon + (ds_gridmet.lon < 0)*360
ds_gridmet_revised = ds_gridmet_revised.assign_coords(lon = lon_revised)
ds_gridmet_revised
<xarray.Dataset>
Dimensions:  (lat: 585, lon: 1386, time: 365)
Coordinates:
lon      (lon) float64 235.2 235.3 235.3 235.4 ... 292.8 292.9 292.9 292.9
lat      (lat) float64 49.4 49.36 49.32 49.28 ... 25.19 25.15 25.11 25.07
time     (time) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Data variables:
tmin     (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
Attributes: (12/19)
geospatial_bounds_crs:      EPSG:4326
Conventions:                CF-1.6
geospatial_bounds:          POLYGON((-124.7666666333333 49.40000000000000...
geospatial_lat_min:         25.066666666666666
geospatial_lat_max:         49.40000000000000
geospatial_lon_min:         -124.7666666333333
...                         ...
date:                       03 March 2022
note1:                      The projection information for this file is: ...
note2:                      Citation: Abatzoglou, J.T., 2013, Development...
note3:                      Data in slices after last_permanent_slice (1-...
note4:                      Data in slices after last_provisional_slice (...
note5:                      Days correspond approximately to calendar day...
ds_CPC_interp = ds_CPC.interp(lat = ds_gridmet_revised.lat, lon = ds_gridmet_revised.lon)

Great! We did all that with just a few lines of code. Please note that in interpolation, the data bounds (i.e. the lat/lon extent) are also matched to the output boundary.
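
A related note on the API (a sketch, not part of the original notebook): interp defaults to linear interpolation, and a nearest-neighbor variant can be requested explicitly, which may be preferable when you want to preserve the original coarse-cell values:

# Nearest-neighbor regridding instead of the default linear interpolation.
ds_CPC_nearest = ds_CPC.interp(lat=ds_gridmet_revised.lat,
                               lon=ds_gridmet_revised.lon,
                               method='nearest')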

Now let's take a look at Feb 16, 2021 in Texas, where a severe cold snap occurred. I did a quick analysis of that event last year, which you can find here: https://www.linkedin.com/pulse/how-unusual-2021-texas-cold-span-ali-ahmadalipour/

To make the temporal analyses easier, we first make sure that the "time" coordinate is in proper datetime format.

ds_gridmet_revised = ds_gridmet_revised.assign_coords(
    time = pd.to_datetime(ds_gridmet_revised.time))
ds_CPC_interp = ds_CPC_interp.assign_coords(
    time = pd.to_datetime(ds_CPC_interp.time))
target_date = datetime.date(2021,2,16)
target_date = pd.to_datetime(target_date)
fig = plt.figure(figsize = [13,4.5], facecolor='w')
plt.subplot(1,2,1)
ds_gridmet_revised.sel(time=target_date).tmin.plot(cmap = 'nipy_spectral', vmin = -30, vmax = 20)
plt.title(f'GridMet tmin on {target_date.strftime("%Y-%m-%d")}')

plt.subplot(1,2,2)
ds_CPC_interp.sel(time=target_date).tmin.plot(cmap = 'nipy_spectral', vmin = -30, vmax = 20)
plt.title(f'CPC-Globe tmin on {target_date.strftime("%Y-%m-%d")}')
plt.tight_layout()

[Figure: gridMET and interpolated CPC-Globe tmin on Feb 16, 2021]

It can be seen that the interpolated data (CPC plot shown on the right) does not necessarily capture the details (especially the orographic and elevation effects) of the finer resolution data.

Now let's find the spatial mean of both datasets around Austin, TX. Again, to keep it simple, I defined an approximate rectangular boundary that we will use.

# Rough boundaries for Austin, TX:
left = 360 - 97.9
right = 360 - 97.6
top = 30.5
bottom = 30.2
ds_Austin_gridmet = ds_gridmet_revised.isel(
    lon=(ds_gridmet_revised.lon >= left) & (ds_gridmet_revised.lon <= right),
    lat=(ds_gridmet_revised.lat >= bottom) & (ds_gridmet_revised.lat <= top),
).mean(dim=['lat','lon'])
ds_Austin_CPC = ds_CPC_interp.isel(
    lon=(ds_CPC_interp.lon >= left) & (ds_CPC_interp.lon <= right),
    lat=(ds_CPC_interp.lat >= bottom) & (ds_CPC_interp.lat <= top),
).mean(dim=['lat','lon'])

Now let's plot the two timeseries, but this time in degrees Fahrenheit:

plt.figure(figsize = [12,6])
(ds_Austin_gridmet.tmin*1.8 + 32).plot(label = 'GridMet', color = 'r')
(ds_Austin_CPC.tmin*1.8 + 32).plot(label = 'CPC', color = 'b')
plt.grid(axis='y')
plt.xticks(ticks = [datetime.date(2021,x,1) for x in range(1,13)], fontsize=12)
plt.xlim([datetime.date(2021,1,1), datetime.date(2021,12,31)])
plt.yticks(fontsize=13)
plt.ylabel('Minimum air temperature (Tmin, °F)', fontsize = 12, 
           fontweight = 'bold')
plt.xlabel('')
plt.legend(fontsize=13, loc = 'upper left')
plt.title('Austin, TX', fontsize=13, fontweight = 'bold')
plt.tight_layout()

[Figure: daily tmin for Austin, TX, gridMET vs. CPC]

As you can see, the two datasets are considerably different (more than 10°F) in late February and late October. Notably, daily temperature has a lot of noise (short-term variation). To have a smoother plot, let's calculate the moving average of tmin with a 15-day window:

plt.figure(figsize = [12,6])
(ds_Austin_gridmet.tmin*1.8 + 32).rolling(time=15,center=True).mean().plot(label = 'GridMet', color = 'r')
(ds_Austin_CPC.tmin*1.8 + 32).rolling(time=15,center=True).mean().plot(label = 'CPC', color = 'b')
plt.grid()
plt.xticks(ticks = [datetime.date(2021,x,1) for x in range(1,13)], fontsize=12)
plt.xlim([datetime.date(2021,1,1), datetime.date(2021,12,31)])
plt.yticks(fontsize=13)
plt.ylabel('Minimum air temperature (Tmin, °F)', fontsize = 12, 
           fontweight = 'bold')
plt.xlabel('')
plt.legend(fontsize=13, loc = 'upper left')
plt.title('Austin, TX', fontsize=13, fontweight = 'bold')
plt.tight_layout()

[Figure: 15-day rolling mean of tmin for Austin, TX]

Awesome! We are done with the intermediate example. You should now be able to replicate similar analyses for various datasets. There are a lot of other things that can be adjusted to make the plots more interesting. You can always search for anything you'd like to do, and you will most likely find a decent answer on Stack Overflow.

3. Advanced (skip data download, anomaly calculation, timelapse animation):

3.1. Seasonal Forecast:

In this part, we will analyze seasonal forecast data on the go (without downloading and saving the data on the disk). Then, we will look at calculating monthly anomaly (departure of each month from its historical mean state).

The data that we will focus on is the NMME seasonal climate prediction, which is a global dataset at 1 degree (~100 km) spatial resolution and monthly temporal resolution, with a forecast lead time of multiple months. To make the analysis simpler, we will focus on just one model (instead of the entire ensemble of available NMME models). Let's go!

import xarray as xr
import matplotlib.pyplot as plt
import datetime
import pandas as pd
url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.GFDL-SPEAR/.FORECAST/.MONTHLY/.tref/dods'
ds = xr.open_dataset(url,engine='netcdf4',decode_times=False)
ds
<xarray.Dataset>
Dimensions:  (L: 12, M: 30, S: 17, X: 360, Y: 181, Z: 1)
Coordinates:
  * S        (S) float32 731.0 732.0 733.0 734.0 ... 744.0 745.0 746.0 747.0
  * M        (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 26.0 27.0 28.0 29.0 30.0
  * X        (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
  * L        (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5
  * Y        (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
  * Z        (Z) float32 2.0
Data variables:
    tref     (S, L, M, Z, Y, X) float32 ...
Attributes:
    Conventions:  IRIDL

The NMME coordinates are not really self-explanatory. So, here's an overview of what each coordinate stands for:

  • S: Time (number of months since 1960-Jan-1)
  • M: Ensemble member
  • X: Longitude (in 0:360 format)
  • L: Leadtime (in months; 0.5 indicating the current month, 1.5 being one month ahead, and so on)
  • Y: Latitude
  • Z: this coordinate is only found in the GFDL model's temperature forecast, and it indicates that the data is at 2 m height above ground.

To make the data more descriptive and more convenient for analysis, we need to modify it first:

ds = ds.rename({'S':'time', 'X':'lon', 'Y':'lat'})
start_date_NMME = pd.to_datetime(datetime.date(1960,1,1))
time_new = [start_date_NMME + pd.DateOffset(months = x) for x in ds.time.values]
ds = ds.assign_coords(time = time_new)
if 'Z' in ds.dims:
    ds = ds.squeeze(dim = 'Z').drop('Z')
ds
<xarray.Dataset>
Dimensions:  (L: 12, M: 30, lat: 181, lon: 360, time: 17)
Coordinates:
  * time     (time) datetime64[ns] 2020-12-01 2021-01-01 ... 2022-04-01
  * M        (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 26.0 27.0 28.0 29.0 30.0
  * lon      (lon) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
  * L        (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5
  * lat      (lat) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
Data variables:
    tref     (time, L, M, lat, lon) float32 ...
Attributes:
    Conventions:  IRIDL

The above dataset has all the available forecast data for all leadtimes. We can now select our area of interest and limit the leadtime to our use case. For this example, let's take a look at the temperature forecast for Feb 2021 that was generated at the beginning of the same month (i.e. lead 0.5):

target_date = pd.to_datetime(datetime.date(2021,2,1))
ds_sel = ds.sel(time=target_date).isel(L=0) 
# Note the different use of "sel" and "isel". For the former, you should indicate 
# the exact value to be selected, but for the latter, the index should be specified.

So far, the data has not been loaded yet (although we can see all the metadata). To make the analysis easier, we will first load the data and then continue with the rest of the analyses. For loading data, I am simply using .load(); a better way of doing so is to use Dask and do the work in parallel. I won't go into that here (partly because I tried it in Google Colab, kept getting several errors, and didn't want to spend too much time on debugging), but a rough sketch follows the output below.

ds_sel.load() # this can take a couple of minutes or so
<xarray.Dataset>
Dimensions:  (M: 30, lat: 181, lon: 360)
Coordinates:
    time     datetime64[ns] 2021-02-01
  * M        (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 26.0 27.0 28.0 29.0 30.0
  * lon      (lon) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
    L        float32 0.5
  * lat      (lat) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
Data variables:
    tref     (M, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan nan
Attributes:
    Conventions:  IRIDL
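
As an aside, a chunked Dask-based alternative to .load() might look like the sketch below (my addition; untested here given the Colab issues mentioned above, and the chunk size is illustrative):

import dask

# Re-select lazily, chunk along the ensemble dimension, and compute in parallel.
ds_sel_chunked = ds.sel(time=target_date).isel(L=0).chunk({'M': 5})
with dask.config.set(scheduler='threads'):
    ds_sel_loaded = ds_sel_chunked.compute()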

Now let's plot the data and explore it a bit:

ds_sel.tref[0,:,:].plot()
<matplotlib.collections.QuadMesh at 0x7f32fddf8950>

[Figure: tref forecast map for the first ensemble member]

Now let's calculate the standard deviation of forecasts among the ensemble members, and then plot them for the entire globe as well as only North America:

ds_std = ds_sel.std(dim='M')
plt.figure(figsize=[12,4.5])
plt.subplot(1,2,1)
ds_std.tref.plot(cmap = 'Spectral_r', vmin = 0, vmax = 4)
plt.subplot(1,2,2)
ds_std.tref.plot(cmap = 'Spectral_r', vmin = 0, vmax = 4)
plt.xlim([230,300])
plt.ylim([20,55])
plt.tight_layout()

[Figure: ensemble standard deviation of tref, global (left) and North America (right)]

Looking at the above plots, we can see that the uncertainty of temperature forecast in February 2021 is much higher across the northern latitudes (i.e. in winter) than the southern latitudes (i.e. in summer).

Now let's check if the forecasts indicated any significant cold anomaly for Feb 2021 across the Midwest US and Southern Plains. To do so, we need to load the climatology (historical mean) of GFDL forecasts and then find the anomaly by subtracting it from forecasts. In other words:

  • Anomaly = Forecast - Climatology
ds_clim = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.GFDL-SPEAR/.HINDCAST/.mc9120/.tref/dods',
                          decode_times=False)
# Again, to make things easier, we first load the climatology into RAM:
ds_clim.load()
<xarray.Dataset>
Dimensions:  (L: 12, S: 12, X: 360, Y: 181)
Coordinates:
Y        (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
S        (S) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0
X        (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
L        (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5
Data variables:
tref     (S, L, Y, X) float64 nan nan nan nan nan ... nan nan nan nan nan
Attributes:
Conventions:  IRIDL

In the above dataset, S indicates the month index (0 corresponding to January and 11 indicating December). We should adjust the coordinates of this climatology dataset and match it to the "ds_sel" that we prepared above:

ds_clim = ds_clim.rename({'X':'lon', 'Y':'lat'})
ds_clim_sel = ds_clim.isel(L = 0).sel(S = 1).drop(['S','L']) # S=1 for selecting February
ds_anom = ds_sel - ds_clim_sel
ds_anom.tref.plot(col = 'M', col_wrap = 10, vmin = -8, vmax = 8, cmap = 'bwr')
plt.xlim([230,300])
plt.ylim([20,55])
(20.0, 55.0)

[Figure: Feb 2021 tref anomaly for each of the 30 ensemble members]

As can be seen, most of the ensemble members of the GFDL-SPEAR model correctly forecasted a negative anomaly (i.e. colder-than-usual conditions) for the majority of the Midwest US and Texas. Notably, this is the lead-0 forecast, which was initialized at the beginning of February for the month of February (thus, basically a couple of weeks ahead). If we look back at lead-1 or beyond, such a strong pattern and consensus are not found.

# And here's the ensemble mean plot:
ds_anom.mean(dim='M').tref.plot(cmap='bwr', vmax=5, vmin=-5)
plt.xlim([230,300])
plt.ylim([20,55])
(20.0, 55.0)

[Figure: ensemble-mean Feb 2021 tref anomaly]

I was hoping to use two more libraries in this section:

  • dask --> for parallel computing
  • cartopy --> for customized mapping with geographic projection

However, I ran into several errors trying to use them on Google Colab, so I decided to exclude them from the advanced section. But if you really want to reach an advanced proficiency level, I highly encourage you to try dask and cartopy; a minimal cartopy sketch follows.
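
For reference, here is a minimal cartopy sketch (assuming cartopy installs cleanly in your environment; this is my addition, not part of the notebook above) that plots the ensemble-mean anomaly with coastlines on a geographic projection:

import cartopy.crs as ccrs
import matplotlib.pyplot as plt

ax = plt.axes(projection=ccrs.PlateCarree())
ds_anom.mean(dim='M').tref.plot(ax=ax, transform=ccrs.PlateCarree(),
                                cmap='bwr', vmin=-5, vmax=5)
ax.coastlines()  # draw coastlines on top of the anomaly field
ax.set_extent([230, 300, 20, 55], crs=ccrs.PlateCarree())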

3.2. Climate change assessment using CMIP6 data:

I thought it would be really useful to have a quick exercise in analyzing climate change data. In this part, we will explore CMIP6 air temperature projections from one model. We will look at monthly data for the historical period of 1980-2010 and future SSP585 projections for 2070-2100.

!pip install gcsfs # this will take a few seconds. We need it to extract CMIP6 data from Google Cloud Storage.

# We will be opening the zarr data format, which is a relatively new data structure 
# that is practical for geospatial datasets. The pre-installed xarray on Google
# Colab does not allow this. So, we need to install the complete version of xarray.
!pip install xarray[complete] # (adding this again in case someone wants to start from this part)
Collecting gcsfs
  Downloading gcsfs-2022.3.0-py2.py3-none-any.whl (25 kB)
Collecting aiohttp<4
  Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Requirement already satisfied: google-auth-oauthlib in /usr/local/lib/python3.7/dist-packages (from gcsfs) (0.4.6)
Requirement already satisfied: decorator>4.1.2 in /usr/local/lib/python3.7/dist-packages (from gcsfs) (4.4.2)
Requirement already satisfied: google-cloud-storage in /usr/local/lib/python3.7/dist-packages (from gcsfs) (1.18.1)

...

from matplotlib import pyplot as plt
import pandas as pd
import xarray as xr
import gcsfs
import datetime
import os
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()

   activity_id institution_id     source_id       experiment_id member_id table_id variable_id grid_label                                             zstore  dcpp_init_year   version
0   HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1     Amon          ps         gn  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706
1   HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1     Amon        rsds         gn  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706
2   HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1     Amon        rlus         gn  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706
3   HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1     Amon        rlds         gn  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706
4   HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1     Amon         psl         gn  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706

len(df)
523774

The df table includes information for all climate change model runs that are available on Google Cloud Storage. As of March 2022, there are over half a million different model outputs (e.g. different models, variables, timescales, scenarios, ensemble members, etc.). Let's narrow down the outputs to only those of monthly air temperature in the historical and ssp585 scenarios:

df_ssp585 = df.query("activity_id=='ScenarioMIP' & table_id == 'Amon' & " +\
    "variable_id == 'tas' & experiment_id == 'ssp585' & member_id == 'r1i1p1f1'")
print('Length of df_ssp585:', len(df_ssp585))
df_ssp585.head(3)
Length of df_ssp585: 35

       activity_id institution_id    source_id experiment_id member_id table_id variable_id grid_label                                             zstore  dcpp_init_year   version
866    ScenarioMIP      NOAA-GFDL     GFDL-CM4        ssp585  r1i1p1f1     Amon         tas        gr1  gs://cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM...             NaN  20180701
19008  ScenarioMIP      NOAA-GFDL    GFDL-ESM4        ssp585  r1i1p1f1     Amon         tas        gr1  gs://cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ES...             NaN  20180701
66586  ScenarioMIP            BCC  BCC-CSM2-MR        ssp585  r1i1p1f1     Amon         tas         gn  gs://cmip6/CMIP6/ScenarioMIP/BCC/BCC-CSM2-MR/s...             NaN  20190314

df_historical = df.query("activity_id == 'CMIP' & table_id == 'Amon' & " +\
    "variable_id == 'tas' & experiment_id == 'historical' & member_id == 'r1i1p1f1'")
print('Length of df_historical:', len(df_historical))
df_historical.head(3)
Length of df_historical: 55

      activity_id institution_id     source_id experiment_id member_id table_id variable_id grid_label                                             zstore  dcpp_init_year   version
8074         CMIP      NOAA-GFDL      GFDL-CM4    historical  r1i1p1f1     Amon         tas        gr1  gs://cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/histo...             NaN  20180701
29231        CMIP           IPSL  IPSL-CM6A-LR    historical  r1i1p1f1     Amon         tas         gr  gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor...             NaN  20180803
32351        CMIP      NASA-GISS   GISS-E2-1-G    historical  r1i1p1f1     Amon         tas         gn  gs://cmip6/CMIP6/CMIP/NASA-GISS/GISS-E2-1-G/hi...             NaN  20180827


Ok, much better. In summary, our CMIP6 search is narrowed down to 55 historical runs and 35 ssp585 future monthly temperature "forecasts". In a decent climate change study, we should first evaluate the models over the historical period and then select a subset of models that perform well over the region of interest (a sketch of that idea follows). However, since this is just an exercise, I will only choose one model that has data in both the historical and future periods.
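
As a rough illustration of what that evaluation step might look like (the observational dataset ds_obs and its prior regridding to the model grid are assumptions, not taken from this notebook), one could rank models by a simple RMSE score over the region and period of interest:

import numpy as np

# Hypothetical skill score: RMSE of a model's historical tas against an
# observational dataset ds_obs that was regridded to the model grid beforehand.
def rmse_score(ds_model, ds_obs):
    diff = ds_model.tas - ds_obs.tas  # xarray aligns time/lat/lon by label
    return float(np.sqrt((diff ** 2).mean()))

# Lower RMSE means closer agreement with observations; the best-scoring
# models would then be retained for the future-scenario analysis.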

model = 'GFDL-CM4'
zstore_hist = df_historical.query(f"source_id == '{model}'").zstore.values[0]
zstore_ssp585 = df_ssp585.query(f"source_id == '{model}'").zstore.values[0]
gcs = gcsfs.GCSFileSystem(token='anon')
mapper = gcs.get_mapper(zstore_hist)
ds_hist = xr.open_zarr(mapper, consolidated = True)
mapper = gcs.get_mapper(zstore_ssp585)
ds_ssp585 = xr.open_zarr(mapper, consolidated = True)
ds_hist
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 180, lon: 288, time: 1980)
Coordinates:
  * bnds       (bnds) float64 1.0 2.0
    height     float64 ...
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(180, 2), meta=np.ndarray>
  * lon        (lon) float64 0.625 1.875 3.125 4.375 ... 355.6 356.9 358.1 359.4
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(288, 2), meta=np.ndarray>
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    time_bnds  (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
Data variables:
    tas        (time, lat, lon) float32 dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
Attributes: (12/49)
    Conventions:            CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  36500.0
    comment:                <null ref>
    ...                     ...
    variable_id:            tas
    variant_info:           N/A
    variant_label:          r1i1p1f1
    status:                 2019-08-07;created;by nhn2@columbia.edu
    netcdf_tracking_ids:    hdl:21.14100/e4193a02-6405-49b6-8ad3-65def741a4dd...
    version_id:             v20180701

Looking at the metadata of the two datasets, the data periods extend beyond our interest (historical: 1850-2014, ssp585: 2015-2100). Therefore, we can select our period of interest and then load that subset of the data for more convenient analysis. But before that, note that the time coordinate is in "object" format, which we need to convert to datetime to easily analyze the timeseries.

print('hist date range  :', ds_hist.time[0].values, ' , ', ds_hist.time[-1].values)
print('ssp585 date range:', ds_ssp585.time[0].values, ' , ', ds_ssp585.time[-1].values)
代码语言:javascript
复制
hist date range  : 1850-01-16 12:00:00  ,  2014-12-16 12:00:00
ssp585 date range: 2015-01-16 12:00:00  ,  2100-12-16 12:00:00
代码语言:javascript
复制
start_time = pd.to_datetime(datetime.date(1850,1,15)) # I chose the 15th for all dates to keep things simple.
time_new_hist = [start_time + pd.DateOffset(months = x) for x in range(len(ds_hist.time))]

start_time = pd.to_datetime(datetime.date(2015,1,15))
time_new_ssp585 = [start_time + pd.DateOffset(months = x) for x in range(len(ds_ssp585.time))]
代码语言:javascript
复制
ds_hist = ds_hist.assign_coords(time = time_new_hist)
ds_ssp585 = ds_ssp585.assign_coords(time = time_new_ssp585)
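As a side note, if you'd rather not rebuild the time axis by hand, xarray's CFTimeIndex offers a direct conversion (a sketch; it would have to run on the raw dataset, i.e. before the manual reassignment above, and it warns when a non-standard model calendar does not map exactly onto the standard calendar):

代码语言:javascript
复制
# Alternative sketch: convert the cftime index directly (run before the
# assign_coords step above). Emits a warning for non-standard calendars.
# ds_hist = ds_hist.assign_coords(time=ds_hist.indexes['time'].to_datetimeindex())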
代码语言:javascript
复制
start_date = pd.to_datetime(datetime.date(1980,1,1))
end_date = pd.to_datetime(datetime.date(2010,12,31))
ds_hist_sel = ds_hist.isel(time=(ds_hist.time >= start_date) & (ds_hist.time <= end_date))

start_date = pd.to_datetime(datetime.date(2070,1,1))
end_date = pd.to_datetime(datetime.date(2099,12,31))
ds_ssp585_sel = ds_ssp585.isel(time=(ds_ssp585.time >= start_date) & (ds_ssp585.time <= end_date))
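Since the time coordinate is now a proper datetime index, the same subsets can also be taken with label-based slicing, which some may find more readable (an equivalent sketch):

代码语言:javascript
复制
# Equivalent label-based selection (same result as the boolean isel masks above):
ds_hist_sel = ds_hist.sel(time=slice('1980-01-01', '2010-12-31'))
ds_ssp585_sel = ds_ssp585.sel(time=slice('2070-01-01', '2099-12-31'))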
代码语言:javascript
复制
ds_hist_sel.load()
ds_ssp585_sel.load()
代码语言:javascript
复制
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 180, lon: 288, time: 360)
Coordinates:
bnds       (bnds) float64 1.0 2.0
height     float64 2.0
lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
lat_bnds   (lat, bnds) float64 -90.0 -89.0 -89.0 -88.0 ... 89.0 89.0 90.0
lon        (lon) float64 0.625 1.875 3.125 4.375 ... 355.6 356.9 358.1 359.4
lon_bnds   (lon, bnds) float64 0.0 1.25 1.25 2.5 ... 357.5 358.8 358.8 360.0
time       (time) datetime64[ns] 2070-01-15 2070-02-15 ... 2099-12-15
time_bnds  (time, bnds) object 2070-01-01 00:00:00 ... 2100-01-01 00:00:00
Data variables:
tas        (time, lat, lon) float32 242.4 242.4 242.4 ... 269.6 269.6 269.6
Attributes: (12/49)
Conventions:            CF-1.7 CMIP-6.0 UGRID-1.0
activity_id:            ScenarioMIP
branch_method:          standard
branch_time_in_child:   0.0
branch_time_in_parent:  60225.0
comment:                <null ref>
...                     ...
variable_id:            tas
variant_info:           N/A
variant_label:          r1i1p1f1
status:                 2019-08-13;created;by nhn2@columbia.edu
netcdf_tracking_ids:    hdl:21.14100/991bc4a4-20d5-4a58-a451-6b3ea33116be
version_id:             v20180701

Great! Now let's take a look at the average monthly temperature change over the globe in the distant future:

代码语言:javascript
复制
tas_avg_hist = ds_hist_sel.groupby('time.month').mean()      # 30-year monthly climatology, historical
tas_avg_ssp585 = ds_ssp585_sel.groupby('time.month').mean()  # 30-year monthly climatology, future
代码语言:javascript
复制
tas_30yr_diff = tas_avg_ssp585 - tas_avg_hist
代码语言:javascript
复制
tas_30yr_diff.tas.plot(col = 'month', col_wrap = 6, vmax = 10, vmin = 0, cmap = 'hot_r')
代码语言:javascript
复制
<xarray.plot.facetgrid.FacetGrid at 0x7f32e9cd8b10>

[Figure: 12-panel facet plot of mean temperature change for each calendar month, ssp585 (2070-2099) vs. historical (1980-2010)]

And here's a plot for the annual mean change:

代码语言:javascript
复制
tas_30yr_diff.mean('month').tas.plot(figsize=[8,5], cmap = 'hot_r', 
                                     vmin = 0, vmax = 10)
plt.title('Mean temperature change: 2070-2100 vs. 1980-2010')
plt.tight_layout()

[Figure: global map of annual mean temperature change, 2070-2100 vs. 1980-2010]

The above plots show that under an extreme scenario (ssp585), the annual mean temperature in northern latitudes can be more than 10°C warmer than in the historical period, while mid-latitudes can be ~5-7°C warmer. Please note that this is the result of a single GCM initialization at coarse resolution. In practice, these outputs should first be bias-corrected and downscaled before further analysis.
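To give a flavor of the simplest possible approach, here is a delta-change sketch (obs_clim is a hypothetical observed monthly climatology regridded to the model grid; it is not defined in this notebook, and real studies use more rigorous methods such as quantile mapping):

代码语言:javascript
复制
# Delta-change sketch (hypothetical: obs_clim is NOT defined in this notebook).
# The model's own change signal is added onto an observed climatology, which
# removes the model's mean bias while preserving its projected change.
# delta = tas_avg_ssp585.tas - tas_avg_hist.tas  # model change signal, by calendar month
# tas_future_corrected = obs_clim + delta        # observed baseline + model delta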

Now let's work with timeseries data (instead of 30-year means) and look at the temperature change for each individual year.

代码语言:javascript
复制
tas_avg_hist_yr = tas_avg_hist.mean('month')  # historical annual-mean climatology (lat, lon)
代码语言:javascript
复制
tas_change_yr = ds_ssp585_sel.groupby('time.year').mean('time')  # annual mean of each future year
代码语言:javascript
复制
tas_change_yr = tas_change_yr - tas_avg_hist_yr  # the (lat, lon) climatology broadcasts across years

Creating a timelapse:

Now let's make a timelapse video of the annual temperature change. For a smoother video, we'll plot the 5-year rolling average.

代码语言:javascript
复制
tas_change_yr_rolling5 = tas_change_yr.rolling(year=5,center=True).mean().dropna('year').tas
# Make a directory to save all the figures there:
if not os.path.exists('./Figures_ssp585/'):
    os.makedirs('./Figures_ssp585/')

for i in range(len(tas_change_yr_rolling5)):
    dataplot = tas_change_yr_rolling5[i,:,:]
    # Convert 0:360 to -180:180 :
    dataplot = dataplot.assign_coords(lon = dataplot.lon - (dataplot.lon > 180)*360)
    dataplot = dataplot.sortby('lon', ascending=True)

    fig = plt.figure(figsize=[9,5], facecolor='w')
    # Adjust plot area (I found these values by trial and error until I got what I wanted)
    plt.subplots_adjust(left=0.075, right=0.895, bottom=0.1, top=0.93)
    plt.pcolormesh(dataplot.lon, dataplot.lat, dataplot, cmap='plasma', vmin=0, vmax=12)
    plt.title(f'Near-surface air temperature change: {model} ssp585, {dataplot.year.values} vs. 1980-2010',
              fontsize = 14)
    plt.ylabel('Latitude', fontsize = 12)
    plt.xlabel('Longitude', fontsize = 12)
    # Add colorbar:
    cax = fig.add_axes([0.91, 0.12, 0.02, 0.8])
    cb = plt.colorbar(cax=cax, orientation='vertical', extend = 'max')
    cb.ax.tick_params(labelsize=11)
    cb.set_label(label='Temperature Change (°C)', color = 'k', size=13)
    # Save and close figure:
    plt.savefig(f'./Figures_ssp585/Fig_tasChange_{dataplot.year.values}.png', 
                format = 'png', dpi=200)
    plt.close()

Great! Now that we have saved all the figures, we can use different libraries to generate a timelapse (/animation) from them. I will use OpenCV here, which is an amazing library for image processing; you can also leverage it for other applications in climate data analysis (e.g. spatial smoothing, applying filters or weighted kernels that emphasize edges for feature detection/extraction, or data augmentation in computer vision applications). OpenCV deserves a whole tutorial of its own, and I'm not going to go into detail here. But again, if you intend to become an advanced user, I highly recommend working with OpenCV.

代码语言:javascript
复制
import cv2
import glob
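As a quick aside before building the video: here is a minimal sketch of OpenCV as a spatial smoother, applying a Gaussian blur to one of the temperature-change fields computed above (the 5x5 kernel and sigma of 1.5 are arbitrary illustrative choices, not part of the original analysis):

代码语言:javascript
复制
import cv2

# Sketch: Gaussian smoothing of one 2D field. OpenCV prefers float32 input;
# the kernel size (5, 5) and sigma 1.5 are arbitrary illustrative choices.
field = tas_change_yr_rolling5[0, :, :].values.astype('float32')
smoothed = cv2.GaussianBlur(field, (5, 5), 1.5)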
代码语言:javascript
复制
files = glob.glob('./Figures_ssp585/Fig_tasChange*.png')
files.sort()  # sort by filename so the frames are in chronological order
代码语言:javascript
复制
img_array = []
for filename in files:
    img = cv2.imread(filename)  # reads the PNG as a BGR array, which VideoWriter expects
    height, width, layers = img.shape
    size = (width, height)
    img_array.append(img)

fps = 4  # frames per second
out = cv2.VideoWriter('Vid_tasChange_ssp585.mp4', cv2.VideoWriter_fourcc(*'MP4V'), fps, size)
for img in img_array:
    out.write(img)
out.release()
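If you prefer a GIF, imageio offers a similarly compact alternative (a sketch; note that imageio expects RGB whereas cv2.imread returns BGR, so here the PNGs are read with imageio directly):

代码语言:javascript
复制
# Alternative sketch with imageio (produces a GIF instead of an MP4):
# import imageio
# frames = [imageio.imread(f) for f in files]
# imageio.mimsave('Vid_tasChange_ssp585.gif', frames, duration=0.25)  # 0.25 s/frame = 4 fps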

There we go! We were able to load CMIP6 data directly from Google Cloud Storage, analyze the data, generate figures, and then make a timelapse animation. Remember that after you close the Colab notebook (or if your session is terminated for any reason), you will lose all the results. So, if you want to keep any of these results, either download them to your local computer or save/copy the outputs to your own Google Drive; for example (a sketch, assuming your Drive is mounted at the default /content/drive):
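代码语言:javascript
复制
# Sketch: copy the video to a Drive folder so it survives the session
# (assumes Google Drive is mounted at /content/drive).
# import shutil
# shutil.copy('Vid_tasChange_ssp585.mp4', '/content/drive/MyDrive/')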

Generating a simple timeseries plot:

We can define a region of interest and explore the spatial mean of the temperature change. For this example, we'll focus on the Northwest US (e.g. the Cascades):

代码语言:javascript
复制
# Bounding box for the US Pacific Northwest. Longitudes use the dataset's
# 0-360 convention, so 236-240 corresponds to 124W-120W:
left = 236
right = 240
bottom = 42
top = 49
代码语言:javascript
复制
tas_NW_yr_hist = ds_hist_sel.isel(lat = (ds_hist_sel.lat>=bottom) & (ds_hist_sel.lat<=top),
                   lon = (ds_hist_sel.lon>=left) & (ds_hist_sel.lon<=right),
                   ).mean(['lat','lon']).drop(['bnds', 'height', 'time_bnds'])
tas_NW_yr_ssp585 = ds_ssp585_sel.isel(lat = (ds_ssp585_sel.lat>=bottom) & (ds_ssp585_sel.lat<=top),
                   lon = (ds_ssp585_sel.lon>=left) & (ds_ssp585_sel.lon<=right),
                   ).mean(['lat','lon']).drop(['bnds', 'height', 'time_bnds'])
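By the way, since lat and lon are sorted in ascending order, the same subset can also be taken with label-based slices (an equivalent sketch, minus the dropped bounds variables):

代码语言:javascript
复制
# Equivalent label-based selection (sketch):
# tas_NW_yr_hist = ds_hist_sel.sel(lat=slice(bottom, top),
#                                  lon=slice(left, right)).mean(['lat', 'lon'])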
代码语言:javascript
复制
plt.figure(figsize=[8,5],)
(tas_NW_yr_hist.groupby('time.year').mean().tas-273.15).plot(
    label='historical', color='b', linewidth=2)
(tas_NW_yr_ssp585.groupby('time.year').mean().tas-273.15).plot(
    label='ssp585', color='r', linewidth=2)
plt.grid()
plt.xlim([1980,2100])
plt.legend(fontsize=13)
plt.xticks(fontsize=12)
plt.yticks(fontsize=13)
plt.ylabel('Average Annual Temperature (°C)', fontsize=13, fontweight='bold')
plt.xlabel('Year', fontsize=13, fontweight='bold')
plt.title(f'Average Annual air temperature in the Cascades: {model} ssp585',
          fontsize=15)

plt.tight_layout()

That concludes the advanced section of this tutorial. I hope you found it useful. Suggestions and feedback are appreciated.

4. Advanced+ (Machine Learning):

I shared a case study more than a year ago in which I used climate data to predict wildfire frequency in California. It is a relatively simple study and should be a good exercise for developing a machine learning prediction model. I have shared all the code and explained the process at this link: https://www.linkedin.com/pulse/ai-climate-data-predicting-fire-frequency-california-ali-ahmadalipour/
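To make the general recipe concrete, here is a minimal scikit-learn sketch. The data below is synthetic and purely illustrative; it is not the dataset from that article:

代码语言:javascript
复制
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, purely for illustration (NOT the article's dataset):
rng = np.random.default_rng(0)
df_fire = pd.DataFrame({'precip': rng.gamma(2, 30, 500),   # monthly precipitation
                        'tmax': rng.normal(30, 5, 500),    # max temperature
                        'wind': rng.gamma(2, 2, 500)})     # wind speed
df_fire['n_fires'] = (0.5*df_fire.tmax - 0.05*df_fire.precip
                      + df_fire.wind + rng.normal(0, 2, 500)).clip(lower=0)

# Standard train/test split and a random forest regressor:
X_train, X_test, y_train, y_test = train_test_split(
    df_fire[['precip', 'tmax', 'wind']], df_fire['n_fires'], test_size=0.3)
model_rf = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)
print('R2 on held-out data:', round(model_rf.score(X_test, y_test), 2))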
