Recipes¶
Once installed, this package should make the command line tool cdb_query
visible on the user’s PATH. This is typically the
case for common Python installations.
The command cdb_query CMIP5
contains several commands:
- cdb_query CMIP5 ask searches the CMIP5 archive and produces a file with pointers to the data.
- cdb_query CMIP5 validate finds all the simulations that have all of the requested years available. This command outputs a file that points to data for every month of the requested period.
- cdb_query CMIP5 download_files and cdb_query CMIP5 download_opendap read the output of cdb_query CMIP5 validate as an input and return a single path per file. This makes it easy to retrieve data from simple scripts.
Hint
The variable descriptions (time_frequency, realm, cmor_table, ...) for CMIP5 can be found in the files http://cmip-pcmdi.llnl.gov/cmip5/docs/standard_output.pdf and http://cmip-pcmdi.llnl.gov/cmip5/docs/standard_output.xls.
1. Retrieving surface temperature for ONDJF (CMIP5)¶
Hint
Don’t forget to use the extensive command-line help: cdb_query -h
, cdb_query CMIP5 -h
, etc.
Discovering the data¶
The script is run using:
$ cdb_query CMIP5 ask \
--ask_month=1,2,10,11,12 \
--ask_var=tas:day-atmos-day,orog:fx-atmos-fx \
--ask_experiment=amip:1979-2004 \
--model=CanAM4 --model=CCSM4 --model=GISS-E2-R --model=MRI-CGCM3 \
--num_procs=10 \
tas_ONDJF_pointers.nc
This is a list of simulations that COULD satisfy the query:
NCAR,CCSM4,r7i1p1,amip
NCAR,CCSM4,r2i1p1,amip
NCAR,CCSM4,r1i1p1,amip
NCAR,CCSM4,r4i1p1,amip
NCAR,CCSM4,r3i1p1,amip
NCAR,CCSM4,r5i1p1,amip
CCCMA,CanAM4,r2i1p1,amip
CCCMA,CanAM4,r1i1p1,amip
CCCMA,CanAM4,r4i1p1,amip
CCCMA,CanAM4,r3i1p1,amip
MRI,MRI-CGCM3,r4i1p2,amip
MRI,MRI-CGCM3,r2i1p1,amip
MRI,MRI-CGCM3,r1i1p1,amip
MRI,MRI-CGCM3,r3i1p1,amip
NASA-GISS,GISS-E2-R,r6i1p1,amip
NASA-GISS,GISS-E2-R,r6i1p3,amip
cdb_query will now attempt to confirm that these simulations have all the requested variables.
This can take some time. Please abort if there are not enough simulations for your needs.
Obtaining the tentative list of simulations should be very quick (a few seconds to a minute), but confirming that these simulations have all the requested
variables should take a few minutes, depending on your connection to the ESGF node. The command returns a self-descriptive netCDF4 file
with pointers to the data. The num_procs
flag substantially speeds up the discovery at a very small cost to the user.
Hint
Try looking at the resulting netCDF file using ncdump
:
$ ncdump -h tas_ONDJF_pointers.nc
As you can see, it generates a hierarchical netCDF4 file. cdb_query CMIP5 list_fields
offers a tool to query this file.
Querying the discovered data¶
For example, if we want to know how many different simulations were made available, we would run:
$ cdb_query CMIP5 list_fields -f institute -f model -f ensemble tas_ONDJF_pointers.nc
CCCMA,CanAM4,r0i0p0
CCCMA,CanAM4,r1i1p1
CCCMA,CanAM4,r2i1p1
CCCMA,CanAM4,r3i1p1
CCCMA,CanAM4,r4i1p1
MRI,MRI-CGCM3,r0i0p0
MRI,MRI-CGCM3,r1i1p1
MRI,MRI-CGCM3,r2i1p1
MRI,MRI-CGCM3,r3i1p1
NCAR,CCSM4,r0i0p0
NCAR,CCSM4,r1i1p1
NCAR,CCSM4,r2i1p1
NCAR,CCSM4,r3i1p1
NCAR,CCSM4,r4i1p1
NCAR,CCSM4,r5i1p1
NCAR,CCSM4,r7i1p1
This test was run on July 2nd, 2016 and these results represent the data presented by the ESGF node on that day. The r0i0p0 ensemble name is the ensemble associated with fixed (time_frequency=fx) variables and its presence suggests that these three institutes have provided the requested orog variable. These results also indicate that models CanAM4, MRI-CGCM3 and CCSM4 provided 4, 3 and 6 simulations, respectively.
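The per-model counts quoted above can be tallied directly from the list_fields output. A minimal sketch; count_sims is a hypothetical helper (not part of cdb_query) that assumes the institute,model,ensemble line format shown above:

```shell
# count_sims: tally ensembles per model from `list_fields` output
# (institute,model,ensemble lines), excluding the fx-only r0i0p0 entry.
count_sims () {
    grep -v ',r0i0p0$' | cut -d',' -f2 | sort | uniq -c \
        | awk '{print $2": "$1" simulations"}'
}
# Usage:
# cdb_query CMIP5 list_fields -f institute -f model -f ensemble \
#     tas_ONDJF_pointers.nc | count_sims
```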
If this list of models is satisfactory, we next check the paths:
$ cdb_query CMIP5 list_fields -f path tas_ONDJF_pointers.nc
...
http://esgf-data1.ceda.ac.uk/thredds/fileServer/esg_dataroot/cmip5/output1/NCAR/CCSM4/amip/fx/atmos/fx/r0i0p0/v20130312/orog/orog_fx_CCSM4_amip_
r0i0p0.nc|SHA256|87b29a7d2731e6b028d81b07edbe84c3f06e1321986401482f8c5d76d5361516|2b43ce02-7124-40bd-8ae4-0961e399e9ec
http://esgf-data1.diasjp.net/thredds/fileServer/esg_dataroot/cmip5/output1/MRI/MRI-CGCM3/amip/day/atmos/day/r1i1p1/v20120701/tas/tas_day_MRI-CGCM3_am
ip_r1i1p1_19790101-19881231.nc|SHA256|804f3325a2b0e29bad14e5773a7216c2893a5200fba62ffa83db992b9765b283|e95dd229-1e36-4e8a-9e50-8797e2a136a2
...
We consider the first path. It consists of two parts. The first part begins with http://esgf-data1.ceda.ac.uk/...
and ends at the first vertical line. This is a wget link. The remaining parts, separated by vertical lines, are the checksum type, the checksum and the tracking id, respectively.
The checksum is as published on the ESGF website. The file found at the other end of the wget link can be
expected to have the same checksum.
The string that precedes /thredds/...
in the wget link is the data node. Here, we have two data nodes:
http://esgf-data1.ceda.ac.uk
and http://esgf-data1.diasjp.net
. Retrieving two files from two different data nodes at the same time should
not slow down either transfer.
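These pieces can be pulled apart with standard shell tools. A minimal sketch, using the first path from the listing above and assuming the wget-link, checksum type, checksum, tracking-id layout just described:

```shell
# The first path from the listing above, reassembled on one line:
path='http://esgf-data1.ceda.ac.uk/thredds/fileServer/esg_dataroot/cmip5/output1/NCAR/CCSM4/amip/fx/atmos/fx/r0i0p0/v20130312/orog/orog_fx_CCSM4_amip_r0i0p0.nc|SHA256|87b29a7d2731e6b028d81b07edbe84c3f06e1321986401482f8c5d76d5361516|2b43ce02-7124-40bd-8ae4-0961e399e9ec'

url=$(printf '%s' "$path" | cut -d'|' -f1)            # the wget link
checksum_type=$(printf '%s' "$path" | cut -d'|' -f2)
checksum=$(printf '%s' "$path" | cut -d'|' -f3)
tracking_id=$(printf '%s' "$path" | cut -d'|' -f4)
data_node=${url%%/thredds*}                           # everything before /thredds/

echo "$data_node"       # http://esgf-data1.ceda.ac.uk
echo "$checksum_type"   # SHA256
```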
Hint
The command cdb_query CMIP5 ask
does not guarantee that the simulations found satisfy ALL the requested criteria.
Validating the set of simulations¶
Warning
From now on it is assumed that the user has properly registered with the ESGF.
Using the --openid
option combined with an ESGF account takes care of this.
The first time this function is used, it might fail and ask you to register your kind of user.
This has to be done only once.
To narrow down our results to the simulations that satisfy ALL the requested criteria, we can use
$ cdb_query CMIP5 validate \
--openid=$OPENID \
--Xdata_node=http://esgf2.dkrz.de \
--num_procs=10 \
tas_ONDJF_pointers.nc \
tas_ONDJF_pointers.validate.nc
Here, we are removing data node http://esgf2.dkrz.de
from the validation because on this data node, data sits on a tape archive and
it can be very slow to recover it.
The output now has a time axis for each variable (except fx variables). It links every time index to a time index in a UNIQUE file (remote or local).
Try looking at the resulting netCDF file using ncdump
:
$ ncdump -h tas_ONDJF_pointers.validate.nc
Again, this file can be queried for simulations:
$ cdb_query CMIP5 list_fields -f institute -f model -f ensemble tas_ONDJF_pointers.validate.nc
CCCMA,CanAM4,r0i0p0
CCCMA,CanAM4,r1i1p1
CCCMA,CanAM4,r2i1p1
CCCMA,CanAM4,r3i1p1
CCCMA,CanAM4,r4i1p1
MRI,MRI-CGCM3,r0i0p0
MRI,MRI-CGCM3,r1i1p1
MRI,MRI-CGCM3,r2i1p1
MRI,MRI-CGCM3,r3i1p1
NCAR,CCSM4,r0i0p0
NCAR,CCSM4,r1i1p1
NCAR,CCSM4,r2i1p1
NCAR,CCSM4,r3i1p1
NCAR,CCSM4,r4i1p1
NCAR,CCSM4,r5i1p1
NCAR,CCSM4,r7i1p1
Retrieving the data: wget¶
cdb_query CMIP5 includes built-in functionality for retrieving the paths. It is used as follows:
$ cdb_query CMIP5 download_files \
--download_all_files \
--openid=$OPENID \
--out_download_dir=./in/CMIP5/ \
tas_ONDJF_pointers.validate.nc \
tas_ONDJF_pointers.validate.downloaded.nc
It downloads the paths listed in tas_ONDJF_pointers.validate.nc
to ./in/CMIP5/
and records the soft links to the local data in tas_ONDJF_pointers.validate.downloaded.nc.
Warning
The retrieved files are structured with the CMIP5 DRS. It is good practice not to change this directory structure.
If the structure is kept then cdb_query CMIP5 ask
will recognize the retrieved files as local if they were
retrieved to a directory listed in the Search_path
.
The downloaded paths are now discoverable by cdb_query CMIP5 ask
.
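Since every path records its published checksum, a downloaded file can be checked for corruption. A minimal sketch, assuming GNU coreutils' sha256sum is available; verify_sha256 is a hypothetical helper, and the usage line elides the DRS directory structure:

```shell
# verify_sha256: compare a local file's SHA256 against the checksum
# published in the corresponding ESGF path.
verify_sha256 () {
    local file=$1 expected=$2 actual
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file"
    fi
}
# Usage (checksum taken from the path listing above):
# verify_sha256 ./in/CMIP5/.../orog_fx_CCSM4_amip_r0i0p0.nc \
#     87b29a7d2731e6b028d81b07edbe84c3f06e1321986401482f8c5d76d5361516
```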
Retrieving the data: OPeNDAP¶
cdb_query CMIP5 includes built-in functionality for retrieving a subset of the data.
To retrieve the first month of daily data:
$ cdb_query CMIP5 download_opendap \
--year=1979 \
--month=1 \
--openid=$OPENID \
tas_ONDJF_pointers.validate.nc \
tas_ONDJF_pointers.validate.197901.retrieved.nc
The file tas_ONDJF_pointers.validate.197901.retrieved.nc
should now contain the first thirty days for all experiments! To check the daily
surface temperature in the amip experiment from simulation NCAR,CCSM4,r1i1p1, extract it with ncks and view it with ncview (if installed):
$ ncks -G : -g /NCAR/CCSM4/amip/day/atmos/day/r1i1p1/tas \
tas_ONDJF_pointers.validate.197901.retrieved.nc \
tas_ONDJF_pointers.validate.197901.retrieved.NCAR_CCSM4_r1i1p1.nc
$ ncview tas_ONDJF_pointers.validate.197901.retrieved.NCAR_CCSM4_r1i1p1.nc
Note
The ncks
command can be slow. For some unknown reason, NCO version 4.5.3 and earlier with netCDF version 4.3.3.1 and earlier
do not seem optimized for highly hierarchical files. At the moment, there are no indications that more recent versions have fixed
this issue.
BASH script¶
This recipe is summarized in the following BASH script. The --password_from_pipe
option is introduced to fully automate the script:
#!/bin/bash
OPENID='your openid'
# Single quotes are necessary here:
PASSWORD='your ESGF password'
#Discover data:
cdb_query CMIP5 ask --ask_month=1,2,10,11,12 \
--ask_var=tas:day-atmos-day,orog:fx-atmos-fx \
--ask_experiment=amip:1979-2004 \
--model=CanAM4 --model=CCSM4 --model=GISS-E2-R --model=MRI-CGCM3 \
--num_procs=10 \
tas_ONDJF_pointers.nc
#List simulations:
cdb_query CMIP5 list_fields -f institute \
-f model \
-f ensemble \
tas_ONDJF_pointers.nc
#Validate simulations:
#Exclude data_node http://esgf2.dkrz.de because it is on a tape archive (slow)
#If you do not exclude it, it will likely be excluded because of its slow
#response time.
#
#The first time this function is used, it might fail and ask you to register your kind of user.
#This has to be done only once.
echo $PASSWORD | cdb_query CMIP5 validate \
--openid=$OPENID \
--password_from_pipe \
--num_procs=10 \
--Xdata_node=http://esgf2.dkrz.de \
tas_ONDJF_pointers.nc \
tas_ONDJF_pointers.validate.nc
#List simulations:
cdb_query CMIP5 list_fields -f institute \
-f model \
-f ensemble \
tas_ONDJF_pointers.validate.nc
#CHOOSE:
# *1* Retrieve files:
#echo $PASSWORD | cdb_query CMIP5 download_files \
# --download_all_files \
# --openid=$OPENID \
# --password_from_pipe \
# --out_download_dir=./in/CMIP5/ \
# tas_ONDJF_pointers.validate.nc \
# tas_ONDJF_pointers.validate.downloaded.nc
# *2* Retrieve to netCDF:
#Retrieve the first month:
echo $PASSWORD | cdb_query CMIP5 download_opendap --year=1979 --month=1 \
--openid=$OPENID \
--password_from_pipe \
tas_ONDJF_pointers.validate.nc \
tas_ONDJF_pointers.validate.197901.retrieved.nc
#Pick one simulation:
#Note: this can be VERY slow!
ncks -G :8 -g /NCAR/CCSM4/amip/day/atmos/day/r1i1p1/tas \
tas_ONDJF_pointers.validate.197901.retrieved.nc \
tas_ONDJF_pointers.validate.197901.retrieved.NCAR_CCSM4_r1i1p1.nc
#Remove soft_links information:
ncks -L 0 -O -x -g soft_links \
tas_ONDJF_pointers.validate.197901.retrieved.NCAR_CCSM4_r1i1p1.nc \
tas_ONDJF_pointers.validate.197901.retrieved.NCAR_CCSM4_r1i1p1.nc
#Look at it:
#When done, look at it. A good tool for that is ncview:
# ncview tas_ONDJF_pointers.validate.197901.retrieved.NCAR_CCSM4_r1i1p1.nc
#Convert hierarchical file to files on filesystem (much faster than ncks):
#Identity reduction simply copies the data to disk
cdb_query CMIP5 reduce \
'' \
--out_destination=./out/CMIP5/ \
tas_ONDJF_pointers.validate.197901.retrieved.nc \
tas_ONDJF_pointers.validate.197901.retrieved.converted.nc
#The files can be found in ./out/CMIP5/:
#find ./out/CMIP5/ -name '*.nc'
2. Speeding things up¶
The ask
and validate
steps can be slow.
They can be sped up by querying the archive simulation by simulation AND by doing so asynchronously.
Asynchronous discovery¶
The ask and validate commands provide a basic multi-processing implementation:
$ cdb_query CMIP5 ask --num_procs=5 \
... \
tas_ONDJF_pointers.nc
$ cdb_query CMIP5 validate --num_procs=5 \
tas_ONDJF_pointers.nc \
tas_ONDJF_pointers.validate.nc
This command uses 5 processes and queries the archive simulation by simulation.
Asynchronous downloads¶
The download_files and download_opendap commands also provide a basic multi-processing implementation. By default, data from different data nodes is retrieved in parallel. One can allow more than one simultaneous download per data node:
$ cdb_query CMIP5 download_files --num_dl=5 ...
$ cdb_query CMIP5 download_opendap --num_dl=5 ...
These commands now allow 5 simultaneous downloads per data node.
3. Retrieving precipitation for JJAS over France (CORDEX)¶
Specifying the discovery¶
This relies on the idea that all queries are for an experiment list and a variable list. The CORDEX project, however, has another important component that one might want to query: its domain. The first step is thus to find what domains are available:
$ cdb_query CORDEX ask --ask_experiment=historical:1979-2004 \
--ask_var=pr:day \
--ask_month=6,7,8,9 \
--list_only_field=domain \
pr_JJAS_France_pointers.nc
MNA-44
EAS-44
SAM-44
MNA-22
WAS-44i
ANT-44
EUR-44
CAM-44
EUR-11
ARC-44
AFR-44
WAS-44
NAM-44
Here the --list_only_field=domain
option lists all available domains. The result is an (unsorted) list of domain
identifiers. The European domains (EUR-11
and EUR-44
) are what we want. For the sake of this recipe,
we are going to limit our discovery to the higher resolution data, EUR-11.
Discovering the data¶
The script is run using:
$ cdb_query CORDEX ask --ask_experiment=historical:1979-2004 \
--ask_var=pr:day \
--ask_month=6,7,8,9 \
--domain=EUR-11 \
--driving_model=ICHEC-EC-EARTH \
--num_procs=10 \
pr_JJAS_France_pointers.nc
This is a list of simulations that COULD satisfy the query:
EUR-11,DMI,ICHEC-EC-EARTH,HIRHAM5,v1,r3i1p1,historical
EUR-11,CLMcom,ICHEC-EC-EARTH,CCLM4-8-17,v1,r12i1p1,historical
EUR-11,KNMI,ICHEC-EC-EARTH,RACMO22E,v1,r1i1p1,historical
EUR-11,SMHI,ICHEC-EC-EARTH,RCA4,v1,r12i1p1,historical
cdb_query will now attempt to confirm that these simulations have all the requested variables.
This can take some time. Please abort if there are not enough simulations for your needs.
Obtaining the tentative list of simulations can take a few minutes, and confirming that these simulations have all the requested
variables takes a few more, depending on your connection to the ESGF IPSL node. It returns a self-descriptive netCDF file
with pointers to the data. Try looking at the resulting netCDF file using ncdump
:
$ ncdump -h pr_JJAS_France_pointers.nc
As you can see, it generates a hierarchical netCDF4 file. cdb_query CORDEX list_fields
offers a tool to query this file.
Querying the discovered data¶
For example, if we want to know how many different simulations were made available, we would run:
$ cdb_query CORDEX list_fields -f domain -f driving_model -f institute \
-f rcm_model -f rcm_version -f ensemble pr_JJAS_France_pointers.nc
EUR-11,ICHEC-EC-EARTH,CLMcom,CCLM4-8-17,v1,r12i1p1
EUR-11,ICHEC-EC-EARTH,DMI,HIRHAM5,v1,r3i1p1
EUR-11,ICHEC-EC-EARTH,KNMI,RACMO22E,v1,r1i1p1
EUR-11,ICHEC-EC-EARTH,SMHI,RCA4,v1,r12i1p1
This test was run on June 23, 2016 and these results represent the data presented by the ESGF on that day.
If this list of models is satisfactory, we next check the paths:
$ cdb_query CORDEX list_fields -f path pr_JJAS_France_pointers.nc
http://cordexesg.dmi.dk/thredds/dodsC/cordex_general/cordex/output/EUR-11/DMI/ICHEC-EC-EARTH/historical/r3i1p1/DMI-HIRHAM5/v1/day/pr/v20131119/pr_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_DMI-HIRHAM5_v1_day_19510101-19551231.nc|SHA256|d172a848bfaa24db89c5f550046c8dfc789e61f5b81c6d9ea21709c70b17eff7|d2d75739-4023-446a-a834-c111daf6d970
...
We consider the first path. It consists of two parts. The first part begins with http://cordexesg.dmi.dk/...
and
ends at the first vertical line. This is an OPeNDAP link. The second part, to the right of the first vertical line, gives the checksum type, the checksum and the tracking id.
Hint
The command cdb_query CORDEX ask
does not guarantee that the simulations found satisfy ALL the requested criteria.
Validating the simulations¶
Warning
From now on it is assumed that the user has properly registered with the ESGF.
Using the --openid
option combined with an ESGF account takes care of this.
The first time this function is used, it might fail and ask you to register your kind of user.
This has to be done only once.
To narrow down our results to the simulations that satisfy ALL the requested criteria, we can use
$ cdb_query CORDEX validate \
--openid=$OPENID \
--num_procs=10 \
pr_JJAS_France_pointers.nc \
pr_JJAS_France_pointers.validate.nc
The output now has a time axis for each variable (except fx variables). It links every time index to a time index in a UNIQUE file (remote or local).
Try looking at the resulting netCDF file using ncdump
:
$ ncdump -h pr_JJAS_France_pointers.validate.nc
Again, this file can be queried for simulations:
$ cdb_query CORDEX list_fields -f domain -f driving_model -f institute \
-f rcm_model -f rcm_version -f ensemble pr_JJAS_France_pointers.validate.nc
EUR-11,ICHEC-EC-EARTH,CLMcom,CCLM4-8-17,v1,r12i1p1
EUR-11,ICHEC-EC-EARTH,DMI,HIRHAM5,v1,r3i1p1
EUR-11,ICHEC-EC-EARTH,KNMI,RACMO22E,v1,r1i1p1
EUR-11,ICHEC-EC-EARTH,SMHI,RCA4,v1,r12i1p1
We can see that no simulations were excluded. This means that they had ALL the variables for ALL the months of ALL the years for the historical experiment.
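When validate does exclude simulations, the two listings can be compared directly. A sketch under the assumption that each list_fields output was saved to a text file; dropped_sims is a hypothetical helper, not part of cdb_query:

```shell
# dropped_sims: print simulations present in the first listing (from ask)
# but absent from the second (from validate).
dropped_sims () {
    sort "$1" > /tmp/ask_sims.$$
    sort "$2" > /tmp/validated_sims.$$
    comm -23 /tmp/ask_sims.$$ /tmp/validated_sims.$$
    rm -f /tmp/ask_sims.$$ /tmp/validated_sims.$$
}
# Usage:
# cdb_query CORDEX list_fields -f domain -f driving_model -f institute \
#     -f rcm_model -f rcm_version -f ensemble pr_JJAS_France_pointers.nc > ask.txt
# cdb_query CORDEX list_fields -f domain -f driving_model -f institute \
#     -f rcm_model -f rcm_version -f ensemble pr_JJAS_France_pointers.validate.nc > validated.txt
# dropped_sims ask.txt validated.txt
```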
Retrieving the data: wget¶
cdb_query CORDEX includes built-in functionality for retrieving the paths. It is used as follows:
$ cdb_query CORDEX download_files --out_download_dir=./in/CMIP5/ \
--openid=$OPENID \
--download_all_files \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.files.nc
It downloads the paths listed in pr_JJAS_France_pointers.validate.nc
and creates a new
soft links file pr_JJAS_France_pointers.validate.files.nc
with the downloaded paths registered.
Warning
The retrieved files are structured with the CORDEX DRS. It is good practice not to change this directory structure.
If the structure is kept then cdb_query CORDEX ask
will recognize the retrieved files as local if they were
retrieved to a directory listed in the --Search_path
.
The downloaded paths are now discoverable by cdb_query CORDEX ask
.
Retrieving the data: OPeNDAP¶
We retrieve the first month:
$ cdb_query CORDEX download_opendap --year=1979 --month=6 \
--openid=$OPENID \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.197906.retrieved.nc
This step took about 4 minutes from the University of Toronto on June 23, 2016. Next, we extract precipitation for the simulation with the EUR-11 domain:
$ ncks -G :9 -g /EUR-11/DMI/ICHEC-EC-EARTH/historical/r3i1p1/HIRHAM5/v1/day/pr \
pr_JJAS_France_pointers.validate.197906.retrieved.nc \
pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11.nc
$ ncview pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11.nc
Hint
This file contains a soft_links
subgroup that contains full traceability information for the accompanying data.
This data is projected onto a rotated pole grid, making it difficult to zoom in on France by using slices along dimensions. Several tools can be used to zoom in even with a rotated pole grid. With CDO, one would do:
$ cdo -f nc -sellonlatbox,-5.0,10.0,40.0,53.0 -selgrid,curvilinear,gaussian,lonlat \
pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11.nc \
pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11_France.nc
Alternatively, bundled with cdb_query
there is a simple tool that can accomplish this:
$ nc4sl subset --lonlatbox -5.0 10.0 40.0 53.0 \
pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11.nc \
pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11_France.nc
We can verify that our subsetting worked:
$ ncview pr_JJAS_France_pointers.validate.197906.retrieved.EUR-11_France.nc
Subsetting the data BEFORE the OPENDAP retrieval¶
We can subset the soft link file before using download_opendap
and cdb_query
will only download
the requested data:
$ nc4sl subset --lonlatbox -5.0 10.0 40.0 53.0 \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.France.nc
or, using reduce_soft_links
:
$ cdb_query CORDEX reduce_soft_links \
--num_procs=10 \
'nc4sl subset --lonlatbox -5.0 10.0 40.0 53.0' \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.France.nc
In the second method, the subsetting can be performed asynchronously (--num_procs=10).
Finally, we retrieve the subsetted data:
$ cdb_query CORDEX download_opendap --year=1979 --month=6 \
--openid=$OPENID \
pr_JJAS_France_pointers.validate.France.nc \
pr_JJAS_France_pointers.validate.France.197906.retrieved.nc
This step took about 3m40s from the University of Toronto. It retrieves all models but only over France. We can then check the variables:
$ ncks -G :9 -g /EUR-11/DMI/ICHEC-EC-EARTH/historical/r3i1p1/HIRHAM5/v1/day/pr \
pr_JJAS_France_pointers.validate.France.197906.retrieved.nc \
pr_JJAS_France_pointers.validate.France.197906.retrieved.EUR-11.nc
$ ncview pr_JJAS_France_pointers.validate.France.197906.retrieved.EUR-11.nc
This should show precipitation over France in June 1979.
The amount of time required for the download is not substantially improved for a single month, but it is for longer retrievals:
$ time cdb_query CORDEX download_opendap --month=6 \
--openid=$OPENID \
pr_JJAS_France_pointers.validate.France.nc \
pr_JJAS_France_pointers.validate.France.June.retrieved.nc
real 25m28.268s
user 14m25.368s
sys 3m18.299s
$ time cdb_query CORDEX download_opendap --month=6 \
--openid=$OPENID \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.June.retrieved.nc
real 43m45.656s
user 21m59.345s
sys 8m53.251s
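As a rough arithmetic check, the two wall-clock times above can be compared in the shell (values rounded to whole seconds):

```shell
subset=$(( 25*60 + 28 ))   # 25m28s over France only   -> 1528 s
full=$(( 43*60 + 46 ))     # ~43m46s without subsetting -> 2626 s
awk -v a="$subset" -v b="$full" \
    'BEGIN { printf "subsetting saved %.0f%% of the download time\n", 100*(1 - a/b) }'
```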
BASH script¶
This recipe is summarized in the following BASH script:
#!/bin/bash
#Change to set number of processes to use:
NUM_PROCS=10
#Specify your OPENID
OPENID='your openid'
# Single quotes are necessary here:
PASSWORD='your ESGF password'
#Discover data:
cdb_query CORDEX ask --ask_experiment=historical:1979-2004 \
--ask_var=pr:day \
--domain=EUR-11 \
--num_procs=${NUM_PROCS} \
pr_JJAS_France_pointers.nc
#List simulations:
cdb_query CORDEX list_fields -f domain -f driving_model -f institute \
-f rcm_model -f rcm_version -f ensemble pr_JJAS_France_pointers.nc
#Validate simulations:
#Exclude data_node http://esgf2.dkrz.de because it is on a tape archive (slow)
#If you do not exclude it, it will likely be excluded because of its slow
#response time.
#
#The first time this function is used, it might fail and ask you to register your kind of user.
#This has to be done only once.
echo $PASSWORD | cdb_query CORDEX validate \
--openid=$OPENID \
--password_from_pipe \
--num_procs=${NUM_PROCS} \
--Xdata_node=http://esgf2.dkrz.de \
pr_JJAS_France_pointers.nc \
pr_JJAS_France_pointers.validate.nc
#CHOOSE:
# *1* Retrieve files:
#echo $PASSWORD | cdb_query CORDEX download_files \
# --out_download_dir=./in/CMIP5/ \
# --openid=$OPENID \
# --download_all_files \
# --password_from_pipe \
# pr_JJAS_France_pointers.validate.nc \
# pr_JJAS_France_pointers.validate.files.nc
# *2* Retrieve to netCDF:
#Retrieve one month:
echo $PASSWORD | cdb_query CORDEX download_opendap --year=1979 --month=6 \
--openid=$OPENID \
--password_from_pipe \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.197906.retrieved.nc
#Convert to filesystem:
cdb_query CORDEX reduce --out_destination=./out/CORDEX/ '' \
pr_JJAS_France_pointers.validate.197906.retrieved.nc \
pr_JJAS_France_pointers.validate.197906.retrieved.converted.nc
#Subset France on soft_links:
cdb_query CORDEX reduce_soft_links \
--num_procs=${NUM_PROCS} \
'nc4sl subset --lonlatbox -5.0 10.0 40.0 53.0' \
pr_JJAS_France_pointers.validate.nc \
pr_JJAS_France_pointers.validate.France.nc
#We then retrieve the whole time series over France:
echo $PASSWORD | cdb_query CORDEX download_opendap \
--openid=$OPENID \
--password_from_pipe \
pr_JJAS_France_pointers.validate.France.nc \
pr_JJAS_France_pointers.validate.France.retrieved.nc
#Convert to filesystem:
cdb_query CORDEX reduce --out_destination=./out_France/CORDEX/ \
--num_procs=${NUM_PROCS} \
'' \
pr_JJAS_France_pointers.validate.France.retrieved.nc \
pr_JJAS_France_pointers.validate.France.retrieved.converted.nc
4. Operator chaining¶
The real purpose of cdb_query
is to perform all of the steps asynchronously.
The ask, validate, reduce_soft_links and download_opendap operations can be
chained and applied to each simulation.
CMIP5¶
In CMIP5, simulations are (institute, model, ensemble) triples. Chaining operators will first
determine the simulation triples and then chain the operators for every triple. Advanced options allow
one to bypass this default setting; this will be covered in later recipes. This is the internal mechanics,
but for the user it is fairly transparent (except when there is an error message).
With operator chaining, recipe 1 could be written:
$ OPENID="your openid"
$ cdb_query CMIP5 ask validate record_validate download_opendap reduce \
--ask_month=1,2,10,11,12 \
--ask_var=tas:day-atmos-day,orog:fx-atmos-fx \
--ask_experiment=amip:1979-2004 \
--Xdata_node=http://esgf2.dkrz.de \
--openid=$OPENID \
--year=1979 --month=1 \
--out_destination=./out/CMIP5/ \
--num_procs=10 \
'' \
tas_ONDJF_pointers.validate.197901.retrieved.converted.nc
It does the following:
- Finds ONDJF tas and the fixed variable orog for amip.
- Excludes (--Xdata_node=http://esgf2.dkrz.de) data node http://esgf2.dkrz.de because it is a tape archive and tends to be slow.
- Retrieves credentials (--openid=$OPENID). It will prompt for your password.
- Records the result (record_validate) of validate to tas_ONDJF_pointers.validate.197901.retrieved.converted.nc.validate.
- Does this using 10 processes (--num_procs=10).
- Downloads only January 1979 (--year=1979 --month=1).
- Converts (the empty script '' passed to reduce) the data to the CMIP5 DRS in directory ./out/CMIP5/.
CORDEX¶
In CORDEX, simulations are (domain, driving_model, institute, rcm_model, rcm_version, ensemble) sextuples. Chaining operators will first
determine the simulation sextuples and then chain the operators for every sextuple. Advanced options allow
one to bypass this default setting; this will be covered in later recipes. This is the internal mechanics,
but for the user it is fairly transparent (except when there is an error message).
With operator chaining, recipe 3 could be written:
$ OPENID="your openid"
$ cdb_query CORDEX ask validate record_validate reduce_soft_links download_opendap reduce \
--ask_experiment=historical:1979-2004 --ask_var=pr:day --ask_month=6,7,8,9 \
--openid=$OPENID \
--year=1979 --month=6 \
--domain=EUR-11 \
--out_destination=./out_France/CORDEX/ \
--Xdata_node=http://esgf2.dkrz.de \
--num_procs=10 \
--reduce_soft_links_script='nc4sl subset --lonlatbox -5.0 10.0 40.0 53.0' \
'' \
pr_JJAS_France_pointers.validate.France.retrieved.converted.nc
It does the following:
- Finds JJAS pr for historical.
- Excludes (--Xdata_node=http://esgf2.dkrz.de) data node http://esgf2.dkrz.de because it is a tape archive and tends to be slow.
- Retrieves certificates (--openid=$OPENID). It will prompt for your password.
- Records the result (record_validate) of validate to pr_JJAS_France_pointers.validate.France.retrieved.converted.nc.validate.
- Subsets a longitude-latitude box around France before download (--reduce_soft_links_script='nc4sl subset --lonlatbox -5.0 10.0 40.0 53.0').
- Does this using 10 processes (--num_procs=10).
- Downloads only June 1979 (--year=1979 --month=6).
- Converts (the empty script '' passed to reduce) the data to the CORDEX DRS in directory ./out_France/CORDEX/.
Note
From now on, recipes will be presented as chained operators.
5. Remap MAM precipitation and temperature to US (CMIP5)¶
Discovering and analyzing a sample¶
With operator chaining, in a BASH script:
#!/bin/bash
#Create new cdo grid:
cat > newgrid_atmos.cdo <<EndOfGrid
gridtype = lonlat
gridsize = 55296
xname = lon
xlongname = longitude
xunits = degrees_east
yname = lat
ylongname = latitude
yunits = degrees_north
xsize = 288
ysize = 192
xfirst = 0
xinc = 1.25
yfirst = -90
yinc = 0.94240837696
EndOfGrid
OPENID='your openid'
# Single quotes are necessary here:
PASSWORD='your ESGF password'
#latlon box -124.78 -66.95 24.74 49.34 is continental us
echo $PASSWORD | cdb_query CMIP5 ask validate record_validate reduce_soft_links download_opendap reduce \
--ask_month=3,4,5 \
--ask_var=tas:mon-atmos-Amon,pr:mon-atmos-Amon \
--ask_experiment=historical:1950-2005,rcp85:2006-2050 \
--related_experiments \
--Xdata_node=http://esgf2.dkrz.de \
--openid=$OPENID \
--password_from_pipe \
--out_destination=./out_sample/CMIP5/ \
--num_procs=10 \
--year=2000 --month=3 \
--reduce_soft_links_script='nc4sl subset --lonlatbox -150.0 -50.0 20.0 55.0' \
'cdo -f nc \
-sellonlatbox,-124.78,-66.95,24.74,49.34 \
-remapbil,newgrid_atmos.cdo \
-selgrid,lonlat,curvilinear,gaussian,unstructured ' \
us_pr_tas_MAM_pointers.validate.200003.retrieved.converted.nc
It does the following:
- Finds MAM pr and tas for 1950 to 2050: historical, followed by rcp85.
- Drops simulation triples (institute, model, ensemble) that are not found in both historical and rcp85 for ALL requested years.
- Excludes (--Xdata_node=http://esgf2.dkrz.de) data node http://esgf2.dkrz.de because it is a tape archive and tends to be slow.
- Retrieves certificates (--openid=$OPENID). The password is read from the pipe (--password_from_pipe).
- Records the result (record_validate) of validate to us_pr_tas_MAM_pointers.validate.200003.retrieved.converted.nc.validate.
- Selects a slightly larger area than the continental US for download (--reduce_soft_links_script='nc4sl subset --lonlatbox -150.0 -50.0 20.0 55.0').
- Downloads only March 2000 (--year=2000 --month=3).
- Uses a bilinear remapping and focuses on the continental US ('cdo ... ').
- Does this using 10 processes (--num_procs=10).
- Converts the data to the CMIP5 DRS in directory ./out_sample/CMIP5/.
- Writes a full description of the downloaded data (pointers to it) in file us_pr_tas_MAM_pointers.validate.200003.retrieved.converted.nc.
Scaling up to the whole dataset¶
If the data looks OK, then one can use the validate file to bypass the ask
and validate
steps:
#!/bin/bash
OPENID="your openid"
PASSWORD="your ESGF password"
#latlon box -124.78 -66.95 24.74 49.34 is continental us
echo $PASSWORD | cdb_query CMIP5 reduce_soft_links download_opendap reduce \
--openid=$OPENID \
--password_from_pipe \
--out_destination=./out/CMIP5/ \
--num_procs=10 \
--reduce_soft_links_script='nc4sl subset --lonlatbox -150.0 -50.0 20.0 55.0' \
'cdo -f nc \
-sellonlatbox,-124.78,-66.95,24.74,49.34 \
-remapbil,newgrid_atmos.cdo \
-selgrid,lonlat,curvilinear,gaussian,unstructured ' \
us_pr_tas_MAM_pointers.validate.200003.retrieved.converted.nc.validate \
us_pr_tas_MAM_pointers.validate.retrieved.converted.nc
This will download all the data!
Hint
It is good practice to first download a small subset to ensure that everything outputs as expected. Because we record the validate step, this two-part process comes at a very small price.
6. Retrieving DJF monthly atmospheric data over a latitude band (CMIP5)¶
The following BASH script retrieves several variables over a latitude band:
#!/bin/bash
#This script discovers and retrieves the geopotential height (zg), meridional wind (va) and
#atmospheric temperature (ta) at the monthly frequency (mon) from the atmospheric realm (atmos)
#and from monthly atmospheric mean CMOR table (Amon) for years 1979 to 2005 of experiment
#historical and years 2006 to 2015 for experiment rcp85.
#
#A ramdisk (/dev/shm/) swap directory is used (--swap_dir option)
#Data node http://esgf2.dkrz.de is excluded because it is a tape archive
#(and therefore too slow for the type of multiple concurrent requests that are required)
#
#The data is reduced to a latitude band (55.0 to 65.0) using the
#--reduce_soft_links_script='ncrcat -d lat,55.0,65.0' option and the reduce_soft_links command.
#The results are stored in:
# 1) a validate file (${OUT_FILE}.validate),
# 2) a directory tree under ${OUT_DIR} and
# 3) a pointer file ${OUT_FILE} that can be used in a further reduce step.
#Use 5 processors:
NUM_PROCS=5
OPENID='your openid'
# Single quotes are necessary here:
PASSWORD='your ESGF password'
SWAP_DIR="/dev/shm/lat_band/"
OUT_FILE="DJF_lat_band.nc"
OUT_DIR="out_lat_band/"
#Create swap directory:
mkdir ${SWAP_DIR}
echo $PASSWORD | cdb_query CMIP5 ask validate record_validate reduce_soft_links download_opendap reduce \
--openid=$OPENID \
--password_from_pipe \
--swap_dir=${SWAP_DIR} \
--num_procs=$NUM_PROCS \
--ask_experiment=historical:1979-2005,rcp85:2006-2015 \
--ask_var=zg:mon-atmos-Amon,va:mon-atmos-Amon,ta:mon-atmos-Amon \
--ask_month=1,2,12 \
--related_experiments \
--Xdata_node=http://esgf2.dkrz.de \
--reduce_soft_links_script='ncrcat -d lat,55.0,65.0' \
'' \
--out_destination=${OUT_DIR} \
${OUT_FILE}
#Remove swap directory:
rm -r ${SWAP_DIR}
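One caveat in the script above: mkdir ${SWAP_DIR} fails if the directory already exists, and a fixed name risks collisions between concurrent runs. A safer variant, sketched here with mktemp -d (shown against the system temporary directory; substitute /dev/shm where available, as in the recipe):

```shell
# Create a unique swap directory atomically:
SWAP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/lat_band.XXXXXX")
echo "swap directory: ${SWAP_DIR}"
# ... run cdb_query with --swap_dir=${SWAP_DIR} here ...
# Clean up when done:
rmdir "${SWAP_DIR}"
```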