Climate datasets are generally categorized by data source into one of three groups: Observations, Model output, and Reanalysis. In recent years, artificial intelligence and machine learning (AI/ML) have also been gaining ground.

We have summarized information for each of these categories, including links to different datasets, important terms, and other resources. You can find this information by following the links below.

Please note: the information on this website is collected through the voluntary efforts of researchers like you. To suggest edits to any page, use the Google Doc linked at the bottom of that page. If you cannot access Google Docs and would still like to contribute, please email us your suggestions or contributions.

https://docs.google.com/document/d/1yOchECe3Iz9l56mmMz5Uc7GZ88-iwXPRoSc2UTTq76c/edit?usp=sharing 

Climate Models

Simulated data from climate models or Earth System Models (ESMs). These are often called projections, because they ‘project’ the climate under a possible future scenario, or an ensemble of possible future weather conditions consistent with that scenario. Use this when you’re projecting future climate scenarios or exploring “what if” questions (such as how the climate might respond to increased greenhouse gases). Models simulate physical processes and help predict long-term changes, but they rely on assumptions and approximations, so they must be validated against observations. Climate models are not expected to match the details of day-to-day weather; instead, they are designed to represent the “typical” weather conditions that might occur.

Observations

Real-world measurements. Use this when you want to study past and present real-world climate conditions. Observations are the most direct and accurate representation of what is happening, but they always have gaps, especially in remote areas or over long time scales. Observations always have a starting date (before which they were not collected), and observational platforms or sensors are often replaced after a certain period of operation. Climate observations are therefore often carefully calibrated so that records of essential climate variables remain continuous when one instrument or technology is replaced by another. Sometimes observations measure related quantities, or proxies, rather than the essential climate variables themselves. This is especially true of paleoclimate data, which are used to reconstruct the climate from before humans started measuring it! Observations are also used to validate models and to understand current trends.

Reanalysis

A combination of observational data and model output. Reanalysis takes observations and feeds them into a weather or climate model, using the model’s physics to fill in gaps and create a consistent, gridded dataset over time. Reanalyses are produced for historical periods and are often extended in ‘near real-time’ (as close to the present as technically possible). They are especially useful for studying long-term trends and climate dynamics when observations are incomplete. Use this when you need a complete, consistent historical record of climate variables over time and space. Closely related to reanalyses are initialized predictions, which, like weather forecasts, use today’s observed state to anticipate what is to come tomorrow (or next year, or next decade).

Artificial Intelligence and Machine Learning

These tools use computer-aided interpretation of Observations, Climate Models, and Reanalysis to perform related or entirely new tasks. Surrogate emulators produce output designed to mimic that of climate models, but they are faster and more specialized than full Earth System Models. Some emulators focus on extreme events, on global mean quantities, or on a single component of the Earth system (e.g., atmosphere, ocean, land, or ice sheets). Developing these tools often requires specially benchmarked or AI-ready datasets. Some AI tools can interpret natural language, so they can help answer climate questions posed in ordinary language and assist with editing scientific documents.