AI/ML models use a range of datasets, including “AI-ready” and “AI/ML benchmark” datasets designed to facilitate model training and evaluation. Trained AI/ML models that emulate the Earth’s systems or climate models are often referred to as “emulators.”
What is an AI-ready dataset?
AI-ready datasets are designed to reduce the barriers and burden of data preparation for developers of AI/ML-based models and emulators, including non-domain experts, across a range of data modalities (e.g., language, in situ, time series).
- Resource: https://www.esipfed.org/checklist-ai-ready-data/
What is an AI/ML “benchmark” dataset?
A benchmark dataset is one that AI/ML model developers and researchers use to compare model performance on a common footing and to facilitate adoption within AI/ML frameworks. In the computer science communities, AI/ML benchmark datasets typically have clearly defined tasks and baseline results. In climate science, where AI/ML applications are recent and developing rapidly, benchmark datasets may be defined more loosely and may lack baseline scores.
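As a rough illustration of what "clearly defined tasks and baseline results" make possible, the sketch below scores a candidate model against a baseline on the same held-out targets. The metric (RMSE), the skill score, the function names, and all numbers are assumptions chosen for illustration; they are not taken from any particular benchmark.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def compare_to_baseline(targets, baseline_pred, candidate_pred):
    """Score a candidate against a baseline on the same held-out targets."""
    baseline_rmse = rmse(baseline_pred, targets)
    candidate_rmse = rmse(candidate_pred, targets)
    # Skill > 0 means the candidate has lower error than the baseline.
    # (Assumes the baseline error is nonzero.)
    skill = 1.0 - candidate_rmse / baseline_rmse
    return {
        "baseline_rmse": baseline_rmse,
        "candidate_rmse": candidate_rmse,
        "skill": skill,
    }


# Made-up values for illustration only (not real climate data).
targets = [14.0, 15.0, 16.0, 15.0]
baseline_pred = [15.0, 15.0, 15.0, 15.0]   # hypothetical constant baseline
candidate_pred = [14.5, 15.0, 15.5, 15.0]  # hypothetical model output

print(compare_to_baseline(targets, baseline_pred, candidate_pred))
```

With these made-up numbers the candidate halves the baseline error, giving a skill of about 0.5. A benchmark without published baseline scores leaves the first argument of that comparison undefined, which is why results on such datasets are harder to compare across studies.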
Listing of AI/ML climate benchmark (or AI-ready) datasets
- Climate change attribution, detection, and extremes
- Climate model emulation and subgrid parameterizations
- Downscaling and In Situ
- Multi-modal, language, or broader
What is an emulator?
Emulators are AI/ML-based models that leverage Observations, Climate Models, Reanalysis, AI-ready, and/or AI/ML benchmark datasets to perform a range of tasks (e.g., prediction, downscaling, gap filling).
Types of emulators:
- Surrogate (model) emulators produce output designed to mimic that of climate models, but they are faster and more specialized than full Earth System Models. Some emulators focus on a single sphere or component of the Earth system (e.g., atmosphere, ocean, land, ice sheets).
- Process-based or observationally based emulators aim to represent Earth system processes, rather than emulating another physics-based or dynamically based model.
- Natural language models and emulators can help answer climate questions in ordinary language (link to climate chatbot) and help edit scientific documents.
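The surrogate idea in the list above can be sketched in a few lines: run a slower model at a handful of inputs, fit a cheap statistical model to those runs, and then query the cheap model instead. Everything here is hypothetical; `toy_simulator` is a made-up stand-in for an expensive climate model, not real physics, and a straight-line least-squares fit stands in for whatever AI/ML method a real emulator would use.

```python
def toy_simulator(forcing):
    """Hypothetical stand-in for an expensive model run (not real physics)."""
    return 0.8 * forcing + 0.05 * forcing ** 2


def fit_linear_emulator(inputs, outputs):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns a function that emulates the simulator at new inputs.
    """
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def emulate(x):
        return slope * x + intercept

    return emulate


# "Training" runs of the simulator at a few sampled inputs.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [toy_simulator(x) for x in train_x]
emulate = fit_linear_emulator(train_x, train_y)

# Compare emulator and simulator at an input not used for fitting.
x_new = 2.5
print("simulator:", toy_simulator(x_new))
print("emulator: ", emulate(x_new))
print("abs error:", abs(emulate(x_new) - toy_simulator(x_new)))
```

The emulator is cheap to evaluate but only approximates the simulator, so checking its error at inputs held out from fitting, as in the last lines, is part of using one responsibly.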
Other AI/ML-based tools
CiteTrue - Free Citation Checker
"CiteTrue is an AI-powered citation verification tool and citation checker that helps researchers and students ensure their citations are authentic and accurate. Our free citation verifier searches through vast authoritative academic databases to verify and check citations, flagging any that appear to be fake or AI-generated. This citation verification tool provides comprehensive citation checking for academic integrity."