Google's AI research team recently released Groundsource, a project that addresses the lack of historical data for rapid-onset natural disasters. Its first outcome is an open-source dataset of 2.6 million records of historical urban flash floods across 150 countries.
Hydro-Meteorological Data Gap
Early warning systems require large amounts of historical data to train and validate their machine learning models. However, hydro-meteorological disasters such as flash floods lack global, standardized observation networks.
- Flash Floods and their Impact: According to the World Meteorological Organization (WMO), flash floods cause approximately 85% of flood-related fatalities, resulting in over 5,000 deaths each year.
- The limitations of existing data: Satellite-based databases, such as the Global Flood Database and the Dartmouth Flood Observatory, are limited by cloud cover, satellite revisit times, and an emphasis on long-lasting events.
- The scale of the deficit: The Global Disaster Alert and Coordination System (GDACS) lists approximately 10,000 major events, a volume that is not sufficient to train global-scale prediction models.
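For a rough sense of scale, the raw counts can be compared directly. This treats GDACS events and Groundsource records as simple counts and ignores differences in how each source defines an event, so it is only an order-of-magnitude illustration:

```latex
\frac{2.6 \times 10^{6}\ \text{Groundsource records}}{\approx 1.0 \times 10^{4}\ \text{GDACS events}} \approx 260
```

In other words, the new dataset holds roughly two orders of magnitude more records than the GDACS event list.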
Groundsource Methodology
Google's research team built a pipeline that processes decades of localized news reports to construct a historical record.
- Semantic Parsing Using Gemini: The pipeline uses Gemini LLMs to extract entities from unstructured, multilingual text, identifying specific hazard events and classifying their severity.
- Geospatial Mapping: The textual description of each flood event is combined with Google Maps APIs, which associate precise geo-coordinates and polygonal boundaries with every incident.
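The two steps above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Google's implementation: it assumes the LLM has already been prompted to return one JSON object per article with made-up fields (`is_flash_flood`, `date`, `location`, `severity`), and it replaces the Google Maps geocoding call with a hard-coded lookup table (`FAKE_GEOCODER`) so the example runs offline. The severity labels and the place name "Example Town" are likewise invented for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity labels; the real Groundsource schema is not described here.
ALLOWED_SEVERITIES = {"minor", "moderate", "severe"}

# Stand-in for a Google Maps geocoding request: place name -> (lat, lon).
# A real pipeline would call the Maps API here; these values are made up.
FAKE_GEOCODER = {
    "Example Town": (10.0, 20.0),
}


@dataclass
class FloodEvent:
    date: str                   # ISO 8601 date string, e.g. "2020-01-01"
    location: str               # place name as extracted from the article
    severity: str               # one of ALLOWED_SEVERITIES
    lat: Optional[float] = None
    lon: Optional[float] = None


def parse_llm_output(raw: str) -> Optional[FloodEvent]:
    """Validate the JSON an LLM returned for one article; None if unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model did not return valid JSON
    if not isinstance(data, dict) or not data.get("is_flash_flood"):
        return None  # wrong shape, or not a flash flood report
    severity = str(data.get("severity", "")).lower()
    if severity not in ALLOWED_SEVERITIES:
        return None  # reject labels outside the fixed vocabulary
    date, location = data.get("date"), data.get("location")
    if not date or not location:
        return None  # an event without time or place is not usable
    return FloodEvent(date=date, location=location, severity=severity)


def geocode(event: FloodEvent) -> FloodEvent:
    """Attach coordinates using the stand-in geocoder (hypothetical)."""
    coords = FAKE_GEOCODER.get(event.location)
    if coords is not None:
        event.lat, event.lon = coords
    return event


if __name__ == "__main__":
    raw = ('{"is_flash_flood": true, "date": "2020-01-01", '
           '"location": "Example Town", "severity": "Severe"}')
    event = parse_llm_output(raw)
    if event is not None:
        print(geocode(event))
```

Validating the model's output against a fixed schema before geocoding is one way to keep malformed or hallucinated fields out of the final dataset; the specific checks here are illustrative.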
This pipeline converts qualitative journalistic reporting into a highly structured, machine-readable dataset.
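To illustrate what a machine-readable record with a polygonal boundary might look like, the sketch below serializes one event as a GeoJSON Feature (RFC 7946). The article does not state the dataset's actual file format or field names, so GeoJSON and the `date`/`severity` properties are assumptions chosen for illustration. Note that GeoJSON orders coordinates as longitude, then latitude, and requires a polygon ring to be closed with at least four positions.

```python
import json
from typing import List, Tuple


def to_geojson_feature(event_id: str, date: str, severity: str,
                       ring: List[Tuple[float, float]]) -> dict:
    """Build a GeoJSON Feature for one flood event (illustrative schema).

    `ring` is a list of (lon, lat) pairs outlining the affected area.
    GeoJSON requires a closed ring, so the first point is appended
    to the end when the caller has not already done so.
    """
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least 3 points")
    coords = [list(p) for p in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))  # close the ring
    if len(coords) < 4:
        raise ValueError("a closed ring needs at least 4 positions")
    return {
        "type": "Feature",
        "id": event_id,
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": {"date": date, "severity": severity},
    }


def to_feature_collection(features: List[dict]) -> str:
    """Serialize a list of Features as a GeoJSON FeatureCollection string."""
    return json.dumps({"type": "FeatureCollection", "features": features})
```

A standard format like this lets common GIS tools and libraries read the records without custom parsing, which is the practical payoff of turning news text into structured data.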
Flash Flood Forecasting
Google's Flood Forecasting Initiative has historically concentrated on riverine flooding, which develops slowly and is easier to monitor. Because flash floods occur so rapidly, they require a distinct approach to prediction.
The research team used the 2.6 million Groundsource records to train a new AI model that predicts the risk of urban flash flooding up to 24 hours in advance. Studies show that even 12 hours of warning can help reduce the damage caused by flash floods. These forecasts are now live on Google's Flood Hub. The dataset is open source so that the wider data science community can train its own localized prediction models.
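For readers who want to build a localized model from event records, one common first step (not necessarily what Google did) is to turn events into per-location, per-day labels and compute a historical base rate, which any forecast model should then beat. The sketch below assumes simple `(date, lat, lon)` records and a hypothetical 0.1-degree grid; both the record shape and the grid size are illustrative choices, not part of the published dataset description.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def to_cell(lat: float, lon: float, size_deg: float = 0.1) -> Cell:
    """Snap a coordinate to a grid-cell index (hypothetical 0.1-degree grid)."""
    # math.floor of a division avoids the float surprises of `//` with 0.1.
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def base_rates(events: Iterable[Tuple[date, float, float]],
               start: date, end: date) -> Dict[Cell, float]:
    """Fraction of days in [start, end] on which each cell had >= 1 event.

    Several events in the same cell on the same day count once.
    Cells with no events are omitted (implicit rate 0.0).
    """
    if end < start:
        raise ValueError("end must not be before start")
    n_days = (end - start).days + 1
    flood_days: Dict[Cell, Set[date]] = defaultdict(set)
    for day, lat, lon in events:
        if start <= day <= end:
            flood_days[to_cell(lat, lon)].add(day)
    return {cell: len(days) / n_days for cell, days in flood_days.items()}
```

The per-cell, per-day framing mirrors a daily risk forecast: a model's predicted probability for a cell can be compared against that cell's base rate to check whether it adds skill over climatology.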
What you need to know
- LLM-Driven Data Pipeline: Groundsource uses Gemini to semantically parse unstructured, multilingual news articles and extract historical disaster data.
- Massive Dataset Generation: The pipeline produced a dataset of 2.6 million historical urban flash flood records from more than 150 countries.
- Beyond the Sensors: This NLP-based approach addresses the historical 'data desert', bypassing the physical constraints of remote sensing (such as cloud cover and satellite revisit times) and the limited volume of traditional databases such as GDACS.
- Geospatial Integration: Google Maps APIs map the extracted natural-language descriptions to precise geographic coordinates and polygonal boundaries for each event.
- Predictive Model Deployment: The dataset was used to train a model that predicts urban flash flood risk up to 24 hours in advance. This AI model now powers forecasts on Google's Flood Hub platform.
Check out the Dataset, Pre-Print Paper and Technical details.


