Google AI Launches Groundsource: New Methodology That Uses Gemini Model To Transform Unstructured News Into Actionable Historical Data

The Google AI Research team recently released Groundsource, a methodology that uses the Gemini model to turn unstructured news reports into structured historical data. The project addresses the lack of historical data for rapid-onset natural disasters. Its first output is an open-source dataset of 2.6 million historical urban flash flood events across more than 150 countries.

Hydro-Meteorological Data Gap

Early warning systems need large amounts of historical data to train and validate machine learning models. Hydro-meteorological hazards such as flash floods lack global, standardized observation networks.

Impact of flash floods: According to the World Meteorological Organization, flash floods cause approximately 85% of flood-related fatalities and more than 5,000 deaths each year.
Limitations of existing data: Satellite-based databases, such as the Global Flood Database and the Dartmouth Flood Observatory, are constrained by cloud cover, satellite revisit times, and a focus on long-lasting events.
Scale of the deficit: The Global Disaster Alert and Coordination System (GDACS) lists approximately 10,000 major events. That volume is not sufficient to train global-scale prediction models.

Groundsource Methodology

Google’s research team built a pipeline that processes decades of localized news reports to create a historical baseline.

Semantic Parsing: Gemini LLMs are used for entity extraction. The pipeline reads unstructured multilingual text to identify specific hazard events and classify their severity.
Geospatial Mapping: The textual description of each flood event is combined with Google Maps APIs, so that precise geo-coordinates and polygonal boundaries can be associated with every incident.

This pipeline converts qualitative journalistic reporting into a highly structured, machine-readable dataset.
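The two stages above can be pictured as a small function that takes an article, asks an extractor for event fields, and then asks a geocoder for coordinates. The sketch below is illustrative only: the `FloodEvent` fields, the `build_event` helper, and the two stand-in functions are hypothetical names chosen for this example, not the schema or code used by Groundsource. In a real pipeline the stand-ins would be replaced by an LLM call and a geocoding service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FloodEvent:
    """Hypothetical structured record; field names are assumptions."""
    date: str                       # ISO date reported in the article
    place_name: str                 # location text as written in the article
    severity: str                   # e.g. "minor", "moderate", "severe"
    lat: Optional[float] = None
    lon: Optional[float] = None
    polygon: Optional[list] = None  # list of (lat, lon) boundary vertices


def build_event(article_text: str,
                extract: Callable[[str], Optional[dict]],
                geocode: Callable[[str], Optional[tuple]]) -> Optional[FloodEvent]:
    """Turn one article into one structured record, or None if no event is found."""
    fields = extract(article_text)          # stage 1: semantic parsing
    if fields is None:
        return None
    event = FloodEvent(**fields)
    location = geocode(event.place_name)    # stage 2: geospatial mapping
    if location is not None:
        event.lat, event.lon, event.polygon = location
    return event


def fake_extract(text: str) -> Optional[dict]:
    """Stand-in for an LLM extraction call (assumption, keyword match only)."""
    if "flash flood" not in text.lower():
        return None
    return {"date": "2019-07-14", "place_name": "Exampleville", "severity": "severe"}


def fake_geocode(place_name: str) -> Optional[tuple]:
    """Stand-in for a geocoding service (assumption, fixed lookup table)."""
    table = {
        "Exampleville": (12.5, 45.25,
                         [(12.4, 45.2), (12.6, 45.2), (12.6, 45.3), (12.4, 45.3)]),
    }
    return table.get(place_name)


event = build_event("A flash flood hit Exampleville overnight.",
                    fake_extract, fake_geocode)
print(event)
```

Passing the extractor and geocoder in as arguments keeps the record-building logic separate from whichever model or mapping service actually does the work.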

https://research.google/blog/introducing-groundsource-turning-news-reports-into-data-with-gemini/

Flash Flood Forecasting

Google’s Flood Forecasting Initiative has historically concentrated on riverine flooding, which develops slowly and is easier to monitor. Flash floods develop much faster and therefore require a distinct approach to prediction.

The research team used the 2.6 million records in the Groundsource dataset to train a new AI model that predicts the risk of urban flash flooding up to 24 hours in advance. Studies show that even 12 hours of warning can help reduce the damage caused by flash flooding. These forecasts are now live on Google’s Flood Hub. The dataset is open source so that the wider data science community can train localized prediction models.
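For readers who want to train a localized model, one possible first step is to turn event records into binary labels for a chosen region and lead time. The sketch below assumes each record carries a timestamp and a point location; the dictionary keys `time`, `lat`, and `lon`, the bounding-box convention, and the helper names are hypothetical and may not match the published dataset. It is not the labeling scheme used by Google's model.

```python
from datetime import datetime, timedelta


def in_bbox(lat, lon, bbox):
    """bbox is (south, west, north, east) in degrees (assumed convention)."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def label_forecast_times(events, bbox, forecast_times, lead_hours=24):
    """Return 1 for each forecast time that is followed by an event inside
    bbox within the next `lead_hours` hours, otherwise 0."""
    local_times = [e["time"] for e in events
                   if in_bbox(e["lat"], e["lon"], bbox)]
    horizon = timedelta(hours=lead_hours)
    return [int(any(t < ev <= t + horizon for ev in local_times))
            for t in forecast_times]


# Toy records: one event inside the region, one far outside it.
events = [
    {"time": datetime(2020, 5, 2, 6), "lat": 10.0, "lon": 20.0},
    {"time": datetime(2020, 5, 2, 6), "lat": 50.0, "lon": 20.0},
]
region = (9.0, 19.0, 11.0, 21.0)
times = [datetime(2020, 5, 1, 0), datetime(2020, 5, 1, 12), datetime(2020, 5, 2, 12)]

labels = label_forecast_times(events, region, times)
print(labels)
```

These labels could then be paired with weather features at each forecast time. A real setup would also need to account for reporting gaps, since a missing news report does not prove that no flood occurred.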

What you need to know

LLM-Driven Data Pipeline: Groundsource uses Gemini for semantic parsing, extracting structured historical disaster data from unstructured, multilingual news articles.
Massive Dataset Generation: The pipeline produced a dataset of 2.6 million historical records of urban flash floods in more than 150 countries.
Bypassing Sensor Limits: This NLP-based approach addresses the historical ‘data desert,’ bypassing the physical constraints of remote sensing (such as cloud cover or satellite revisit times) and the limited volume of existing traditional databases like GDACS.
Geospatial Integration: Google Maps APIs are used to map the extracted natural language descriptions to precise geographic coordinates and polygonal boundaries for each event.
Predictive Model Deployment: The dataset has been used to train a model that predicts urban flash flood risk up to 24 hours ahead. This AI model is currently in use on Google’s Flood Hub platform.


Michal Sutter is a data scientist with a master’s degree in data science from the University of Padova. With a strong foundation in statistics, machine learning, and data engineering, he excels at transforming large datasets into actionable insights.
