Benchmark Datasets for Machine Learning for Natural Disasters

This website curates some of the existing benchmark datasets for machine learning for natural disaster management. For each dataset we give a brief description, including the machine learning task it is benchmarked for and its application. You can use our search bar to look for datasets, or contact us to add your own benchmark dataset to this curation.

DroughtED: A dataset and methodology for drought forecasting spanning multiple climate zones

Numerical | Drought | Prevention

DroughtED is a dataset for drought forecasting that frames the problem as multiclass ordinal classification. It contains 180 daily meteorological observations with geospatial location metadata for 3,108 US counties.


SpaceNet 8 - The Detection of Flooded Roads and Buildings

Image | Flood | Preparedness

SpaceNet 8 is a flood-disaster dataset for building detection, road network extraction and flood detection. It covers 850 km^2, including 32k buildings and 1,300 km of roads. It is introduced for multiclass segmentation and binary classification.


Creating xBD: A dataset for assessing building damage from satellite imagery

Image | General | Response

xBD is a general-disaster dataset for change detection and damage assessment of buildings. It contains 850,736 building polygons across 22,068 images covering 45,361.79 km^2. It is introduced for multiclass (ordinal) classification.


Hephaestus: A large scale multitask dataset towards InSAR understanding

Image | Volcano | Preparedness

Hephaestus is a volcano-disaster dataset for semantic segmentation of ground deformation. It contains 19,919 labeled and 110,573 unlabeled Sentinel-1 interferograms. It is introduced for binary classification.


FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection

Image | Wildfire | Preparedness

FIgLib is a dataset for real-time wildfire smoke detection. It contains 24,800 labeled wildfire smoke images of Southern California and is introduced for binary classification.


EarthNet2021: A novel large-scale dataset and challenge for forecasting localized climate impacts

Video | Extreme Weather | Prevention

EarthNet2021 is a dataset for Earth surface forecasting, extreme summer prediction, and seasonal cycle prediction. It contains more than 32,000 samples, each pairing Sentinel-2 Level-2A imagery with daily climatic conditions. It is introduced for video prediction.


CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing

Text | General | Response

CrisisBench is a general-disaster dataset for informativeness detection and post-disaster categorization of humanitarian tasks. It contains 166,098 tweets for informativeness classification and 141,533 tweets for humanitarian classification. It is introduced for binary and multiclass classification.


Next Day Wildfire Spread: A Machine Learning Data Set to Predict Wildfire Spreading from Remote-Sensing Data

Image | Wildfire | Preparedness

Next Day Wildfire Spread is a dataset for wildfire spread prediction. It contains images of 18,545 fire events, each with snapshots at time t and at t + 1 day. It is introduced for image segmentation.


RescueNet: A High Resolution UAV Semantic Segmentation Benchmark Dataset for Natural Disaster Damage Assessment

Image | Hurricane | Response

RescueNet is a dataset for hurricane damage assessment. It contains 4,494 post-disaster images collected after Hurricane Michael and is introduced for semantic segmentation.


MSNet: A Multilevel Instance Segmentation Network for Natural Disaster Damage Assessment in Aerial Videos

Video | Hurricane and Tornado | Response

ISBDA (Instance Segmentation in Building Damage Assessment) is a dataset for hurricane and tornado building damage assessment. It contains 1,030 images from 10 videos of disaster aftermaths (84 min total duration) and 2,961 damaged part instances. It is introduced for image segmentation and multiclass (ordinal) classification.


HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

Text | General | Response

HumAID is a general-disaster dataset for post-disaster categorization of humanitarian tasks. It contains 77,196 annotated tweets and is introduced for multiclass classification.


Benchmark Dataset for Automatic Damaged Building Detection from Post-Hurricane Remotely Sensed Imagery

Multimodal (Vector data and Image) | Hurricane | Response

FEMA and NOAA is a dataset for detecting hurricane-damaged buildings. It contains vector data (FEMA) and image data (NOAA) and is introduced for image segmentation and multiclass (ordinal) classification.


FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding

Image | Hurricane | Response

FloodNet is a dataset for post-flood scene understanding, i.e., flood detection and distinguishing flood water from other water bodies. It contains 3,200 images and about 11,000 question-image pairs for VQA. It is introduced for image classification, semantic segmentation, and visual question answering.


Detecting Natural Disasters, Damage, and Incidents in the Wild

Image | General | Preparedness

Incidents Dataset is a dataset for general disaster detection. It contains 1,144,148 images (446,684 of them positives) and is introduced for multiclass classification.


Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents

Image | General | Preparedness

Incidents1M is an extension of the Incidents Dataset for general disaster detection. It contains 1,787,154 images and is introduced for multiclass multilabel classification.


VIDI: A Video Dataset of Incidents

Video | General | Preparedness

Video Dataset of Incidents (VIDI) is a general-disaster dataset for video incident classification. It contains 4,534 video clips of 43 incident categories and is introduced for multiclass classification.


The Multimedia Satellite Task at MediaEval 2017: Emergency Response for Flooding Events

Image | Flood | Preparedness

Disaster Image Retrieval from Social Media (DIRSM) and Flood Detection in Satellite Images (FDSI) are datasets for flood detection. DIRSM contains 6,600 Flickr images, and the FDSI development set contains 462 image patches. Both are introduced for binary classification.


The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events

Image | Flood | Preparedness

Flood Classification from Social Multimedia (FCSM) and Flood Detection from Satellite Imagery (FDSI) are datasets for road passability assessment, flood detection and flood classification. FCSM contains 7,387 tweets in its development set and 3,683 images and features in its test set; FDSI contains 1,438 image patches in its development set and 226 image patches in its test set. Both are introduced for multilabel classification.


MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification

Image | General | Preparedness

MEDIC is a general-disaster dataset for disaster type detection, informativeness classification, categorization of humanitarian tasks, and damage severity assessment. It contains 71,198 images and is introduced for multitask learning.


ClimateNet: Bringing the power of Deep Learning to weather and climate sciences via open datasets and architectures

Image | Atmospheric Rivers and Tropical Cyclones | Response

ClimateNet Dataset is a dataset for atmospheric river and tropical cyclone damage assessment. It contains 219 image samples and is introduced for semantic segmentation and multilabel classification.


Curating flood extent data and leveraging citizen science for benchmarking machine learning solutions

Image | Flood | Preparedness

This is an image dataset for flood extent detection. It is a 4.11 GB download and is introduced for image segmentation.


Visual Sentiment Analysis from Disaster Images in Social Media

Image | General | Recovery

Image-sentiment dataset is a general-disaster dataset for sentiment analysis. It contains 4,003 annotated disaster-related images and is introduced for multiclass multilabel classification.


CrisisMMD: Multimodal Twitter Datasets from Natural Disasters

Multimodal (Image and Text) | General | Response

CrisisMMD is a general-disaster dataset for informativeness classification, categorization of humanitarian tasks, and damage severity assessment. It contains 18,082 images and 16,058 tweets and is introduced for binary classification, multiclass classification, and multiclass (ordinal) classification.


Damage Assessment from Social Media Imagery Data During Disasters

Image | General | Response

This is a general-disaster dataset for damage severity assessment. It contains around 25,000 images of natural disasters from social media and Google Images and is introduced for multiclass (ordinal) classification.


Damage Identification in Social Media Posts using Multimodal Deep Learning

Multimodal (Image, Captioned Image, and Text) | General | Response

This is a general-disaster dataset for damage assessment. It contains 10,875 images, 5,879 captioned images, and 19,031 text samples. It is introduced for multiclass classification.


Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response

This paper introduces four datasets, one for each of four ML tasks.

Combination of AIDR-DT and DMD

Image | General | Preparedness

This dataset is a combination of AIDR-DT and DMD data for general-disaster type detection. It contains 17,511 images and is introduced for multiclass classification.


Combination of DAD, CrisisMMD, AIDR-Info and DMD

Image | General | Response

This dataset is a combination of DAD, CrisisMMD, AIDR-Info and DMD data for general-disaster informativeness classification. It contains 59,716 images and is introduced for binary classification.


Combination of CrisisMMD and DMD

Image | General | Response

This dataset is a combination of CrisisMMD and DMD data for general-disaster categorization of humanitarian tasks. It contains 16,769 images and is introduced for multiclass classification.


Combination of DAD, CrisisMMD and DMD

Image | General | Response

This dataset is a combination of DAD, CrisisMMD and DMD data for general-disaster damage severity assessment. It contains 34,896 images and is introduced for multiclass (ordinal) classification.