How to add new datasets
The EUREC\(^4\)A-Intake catalog is the dataset collection of the EUREC\(^4\)A and ATOMIC field campaigns. It is a collection of YAML files that contain references to the dataset storage locations.
Datasets should be added by following these steps:
Via the command line
Clone the EUREC\(^4\)A-Intake catalog repository with
git clone git@github.com:eurec4a/eurec4a-intake.git
You might be asked to set up an SSH key for your GitHub account.
Create a new branch
git checkout -b <my_new_dataset>
Add new catalog entry
The catalog contains two types of entries:
references to sub-catalogs
references to a dataset
The surface flux dataset from the research vessel Meteor is accessible via
cat.Meteor.surface_fluxes
and is saved in the file Meteor/main.yaml.
The radar dataset, which is more complex and contains several subsets, is accessible via
cat.Meteor.LIMRAD94.low_res
cat.Meteor.LIMRAD94.high_res
For the LIMRAD94 radar subset, a sub-catalog reference has been created in Meteor/main.yaml. The final reference to the dataset is added in Meteor/LIMRAD94.yaml.
Depending on the complexity of the dataset, an entry can be added directly to the main.yaml file of the respective platform/simulation
vi <platform>/main.yaml
or to a (new) instrument-specific file
vi <platform>/<instrument>.yaml
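The file layout described above can be sketched in a few lines of Python. `entry_file` is a hypothetical helper written for illustration only; it is not part of the catalog or of intake. It assumes that access paths have either two parts (an entry directly in the platform's main.yaml) or three parts (an entry in an instrument-specific sub-catalog file).

```python
def entry_file(access_path: str) -> str:
    """Return the YAML file expected to hold the entry for a dotted access path.

    Hypothetical helper mirroring the layout described in this how-to:
      <platform>.<dataset>              -> <platform>/main.yaml
      <platform>.<instrument>.<dataset> -> <platform>/<instrument>.yaml
    """
    parts = access_path.split(".")
    if len(parts) == 2:
        platform, _dataset = parts
        return f"{platform}/main.yaml"
    if len(parts) == 3:
        platform, instrument, _dataset = parts
        return f"{platform}/{instrument}.yaml"
    raise ValueError(f"expected 2 or 3 dot-separated parts, got {len(parts)}")


if __name__ == "__main__":
    print(entry_file("Meteor.surface_fluxes"))
    print(entry_file("Meteor.LIMRAD94.low_res"))
```

For the three-part case, the platform's main.yaml additionally needs the sub-catalog reference that points to the instrument file.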
The reference has the following format:
sources:
  <dataset_name>:
    description: <short description of the dataset>
    driver: opendap
    args:
      auth: null
      urlpath: <url_to_dataset>
      chunks: {}
      engine: netcdf4
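As a rough sketch, the entry format above can also be filled in programmatically. `dataset_entry` is a hypothetical helper, not part of the catalog tooling, and the nesting of `auth`, `urlpath`, `chunks` and `engine` under `args` is an assumption based on the order of the fields in the template. The description is written as a double-quoted string so that characters such as `:` do not break the YAML.

```python
import json


def dataset_entry(name: str, description: str, urlpath: str) -> str:
    """Return YAML text for a single OPeNDAP dataset entry (illustrative only)."""
    lines = [
        "sources:",
        f"  {name}:",
        # json.dumps yields a double-quoted string, which YAML also accepts
        f"    description: {json.dumps(description)}",
        "    driver: opendap",
        "    args:",
        "      auth: null",
        f"      urlpath: {urlpath}",
        "      chunks: {}",
        "      engine: netcdf4",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The dataset name, description and URL below are placeholders.
    print(dataset_entry(
        "my_new_dataset",
        "Example: short description of the dataset",
        "https://example.org/thredds/dodsC/my_new_dataset.nc",
    ))
```

When adding to an existing file, only the lines below `sources:` are needed, because that key is already present.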
In case your dataset has been published on AERIS, the THREDDS link to your dataset can be determined by finding your dataset at https://observations.ipsl.fr/aeris/eurec4a-data/. To retrieve the THREDDS link, replace https://observations.ipsl.fr/aeris/eurec4a-data/ with https://observations.ipsl.fr/thredds/dodsC/EUREC4A/. You can check whether the link is correct and the data are correctly formatted by opening it, e.g. directly with xarray.open_dataset() or in Panoply by opening a Remote Dataset.
A sub-catalog reference can be created with
sources:
  <instrument/model_name>:
    args:
      path: "{{CATALOG_DIR}}/<instrument/model_name>.yaml"
    description: '<Instrument/model description>'
    driver: yaml_file_cat
    metadata: {}
Please also check out entries already present in the EUREC\(^4\)A-Intake catalog.
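The AERIS-to-THREDDS replacement described in this step can be sketched as a small prefix swap. The function names are hypothetical, and the file path in the usage example is a made-up placeholder rather than a real dataset. `check_opendap_link` assumes that xarray and a netCDF backend with OPeNDAP support are installed; it is imported lazily so that the URL conversion works without it.

```python
AERIS_PREFIX = "https://observations.ipsl.fr/aeris/eurec4a-data/"
THREDDS_PREFIX = "https://observations.ipsl.fr/thredds/dodsC/EUREC4A/"


def aeris_to_thredds(url: str) -> str:
    """Swap the AERIS data-browser prefix for the THREDDS OPeNDAP prefix."""
    if not url.startswith(AERIS_PREFIX):
        raise ValueError(f"URL does not start with {AERIS_PREFIX}")
    return THREDDS_PREFIX + url[len(AERIS_PREFIX):]


def check_opendap_link(url: str):
    """Try to open the link with xarray; raises if it cannot be read."""
    import xarray as xr  # third-party, only needed for this check

    return xr.open_dataset(url)


if __name__ == "__main__":
    # Placeholder path for illustration only.
    aeris_url = AERIS_PREFIX + "PLATFORM/example_file.nc"
    print(aeris_to_thredds(aeris_url))
```

Calling `check_opendap_link(aeris_to_thredds(aeris_url))` on a real dataset URL and inspecting the returned dataset is one way to confirm that the link resolves before adding it as `urlpath`.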
Finally, these changes need to be staged and committed.
git add -p
git commit -m "<Adding my_new_dataset>"
Push branch to GitHub
git push --set-upstream origin <my_new_dataset>
Create pull request on GitHub
A pull request can be started on the GitHub webpage. After the pull request has been submitted, the review process will start. To accelerate the process, please make sure all tests for your pull request succeed. The status of the tests is shown at the bottom of your pull request.
Via the GitHub web interface
Visit the EUREC\(^4\)A-Intake catalog repository
Go to eurec4a/eurec4a-intake
Log in with your GitHub credentials
Select the platform/simulation/product for which a new dataset entry shall be added
Edit main.yaml and add a reference to the dataset if it is simple and does not contain different subsets (e.g. resolutions, frequencies, sensors, dimensions):
sources:
  <dataset_name>:
    description: <short description of the dataset>
    driver: opendap
    args:
      auth: null
      urlpath: <url_to_dataset>
      chunks: {}
      engine: netcdf4
(see "Add new catalog entry" in the command-line instructions above for more details)
Save changes and create pull request
Check whether the automatic tests that run on your edits succeed. You can access the tests, e.g., via the Actions tab.
A reviewer will look at the pull request and will discuss any additional steps necessary to merge the changes to the main repository.