This website curates some of the existing benchmark datasets for machine learning in natural disaster management. Each entry gives a brief description of the dataset, including the machine learning task it is benchmarked for and its application. You can use our search bar to look for datasets, or contact us to add your own benchmark dataset to this curation.
DroughtED is a dataset for drought forecasting that frames the problem as multiclass ordinal classification. It contains 180 daily meteorological observations with geospatial location metadata for 3,108 US counties.
SpaceNet 8 is a flood-disaster dataset for building detection, road network extraction, and flood detection. It covers 850 km^2, including 32k buildings and 1,300 km of roads. It is introduced for multiclass segmentation and binary classification.
xBD is a general-disaster dataset for change detection and damage assessment of buildings. It contains 850,736 building polygons across 22,068 images and 45,361.79 km^2. It is introduced for multiclass (ordinal) classification.
Hephaestus is a volcano-disaster dataset for semantic segmentation of ground deformation. It contains 19,919 labeled and 110,573 unlabeled Sentinel-1 interferograms. It is introduced for binary classification.
FIgLib is a dataset for real-time wildfire smoke detection. It contains 24,800 labeled wildfire smoke images of Southern California and is introduced for binary classification.
EarthNet2021 is a dataset for Earth surface forecasting, extreme summer prediction, and seasonal cycle prediction. It contains more than 32,000 samples comprising Sentinel-2 Level-2A imagery and daily climatic conditions. This dataset is introduced for video prediction.
CrisisBench is a general-disaster dataset for informativeness detection and categorization of humanitarian tasks post-disaster. It contains 166,098 tweets for informativeness and 141,533 tweets for humanitarian classification. It is introduced for binary and multiclass classification.
Next Day Wildfire Spread is a dataset for wildfire spread prediction. It contains images for 18,545 fire events with snapshots at time t and t+1 day. It is introduced for image segmentation.
RescueNet is a dataset for hurricane damage assessment. It contains 4,494 post-disaster images collected after Hurricane Michael and is introduced for semantic segmentation.
ISBDA (Instance Segmentation in Building Damage Assessment) is a dataset for hurricane and tornado building damage assessment. It contains 1,030 images from 10 videos of disaster aftermaths (84 min total duration) and 2,961 damaged part instances. It is introduced for image segmentation and multiclass (ordinal) classification.
HumAID is a general-disaster dataset for categorization of humanitarian tasks post-disaster. It contains 77,196 annotated tweets and is introduced for multiclass classification.
FEMA and NOAA is a dataset for hurricane-damaged building detection. It contains vector (FEMA) and image (NOAA) data and is introduced for image segmentation and multiclass (ordinal) classification.
FloodNet is a dataset for post-flood scene understanding, i.e., flood detection and distinguishing between different water bodies and floodwater. It contains about 11,000 question-image pairs for VQA and 3,200 images. It is introduced for image classification, semantic segmentation, and visual question answering.
Incidents Dataset is a dataset for general disaster detection. It contains 1,144,148 images (with 446,684 images as positives) and is introduced for multiclass classification.
Incidents1M is an extension of the Incidents Dataset for general disaster detection. It contains 1,787,154 images and is introduced for multiclass multilabel classification.
Video Dataset of Incidents (VIDI) is a general-disaster dataset for video incident classification. It contains 4,534 video clips of 43 incident categories and is introduced for multiclass classification.
Disaster Image Retrieval from Social Media (DIRSM) and Flood Detection in Satellite Images (FDSI) are datasets for flood detection. DIRSM contains 6,600 Flickr images and FDSI's development set contains 462 image patches. They are both introduced for binary classification.
Flood Classification from Social Multimedia (FCSM) and Flood Detection from Satellite Imagery (FDSI) are datasets for road passability assessment, flood detection, and flood classification. FCSM contains 7,387 tweets (development set) and 3,683 images and features (test set); FDSI contains 1,438 image patches (development set) and 226 image patches (test set). They are both introduced for multilabel classification.
MEDIC is a general-disaster dataset for disaster type detection, informativeness classification, categorization of humanitarian tasks, and damage severity assessment. It contains 71,198 images and is introduced for multitask learning.
ClimateNet Dataset is a dataset for atmospheric river and tropical cyclone damage assessment. It contains 219 image samples, and is introduced for semantic segmentation and multilabel classification.
This is an image dataset for flood extent detection. It is a 4.11 GB download and is introduced for image segmentation.
Image-sentiment dataset is a general-disaster dataset for sentiment analysis. It contains 4,003 annotated disaster-related images and is introduced for multiclass multilabel classification.
CrisisMMD is a general-disaster dataset for informativeness classification, categorization of humanitarian tasks, and damage severity assessment. It contains 18,082 images and 16,058 tweets and is introduced for binary classification, multiclass classification and multiclass (ordinal) classification.
This is a general-disaster dataset for damage severity assessment. It contains around 25,000 images of natural disasters from social media and Google Images and is introduced for multiclass (ordinal) classification.
This is a general-disaster dataset for damage assessment. It contains 10,875 images, 5,879 captioned images, and 19,031 text samples. It is introduced for multiclass classification.
This dataset is a combination of AIDR-DT and DMD data for general-disaster type detection. It contains 17,511 images and is introduced for multiclass classification.
This dataset is a combination of DAD, CrisisMMD, AIDR-Info, and DMD data for general-disaster informativeness classification. It contains 59,716 images and is introduced for binary classification.
This dataset is a combination of CrisisMMD and DMD data for general-disaster categorization of humanitarian tasks. It contains 16,769 images and is introduced for multiclass classification.
This dataset is a combination of DAD, CrisisMMD, and DMD data for general-disaster damage severity assessment. It contains 34,896 images and is introduced for multiclass (ordinal) classification.