"Early Experience Prototyping a Science Data Service for Environmental Data" Deb Agarwal, Berkeley Water Center
and Catharine van Ingen, Microsoft
12:00 p.m. on Wednesday, September 20 in 540 Cory Hall, UC Berkeley. Part of the
CITRIS Research Exchange at UC Berkeley. The complete schedule for the fall semester is online at RE-fall2006.
Abstract: Recognition of the importance of data access as a necessary prerequisite
to scientific analysis has sparked development of data archives incorporating
data from a variety of sources. This trend has dramatically improved the
availability of data and completeness of data sets in many scientific
disciplines. This data, when combined with locally collected field observations
including sensor data and model results, has the potential to enable new science
analyses. At the same time, there is an increasing desire to do science at
scales larger than a single site or watershed and over times measured in years
rather than seasons.
At the Berkeley Water Center, we are using data from the Oak Ridge National
Laboratory AmeriFlux carbon flux measurement towers to develop and prototype a
new server for use by a collaborating group to jointly analyze data across
sites. Working with data and metadata from the AmeriFlux data repository, we
are developing a scientific data server. This prototype server provides a
framework to allow easy data download, quality checking, cleaning, and storage.
The server also includes scientifically important metadata such as site biome or
climate along with the actual data. The prototype is designed to allow data from
other related data sets to be included as needed.
Our goal is to facilitate scientific investigations and enable
serendipitous science: a carbon researcher should be able to very simply mine
the data to explore temporal or spatial data correlation between measurements
and across sites. We expect to integrate some of the routine data processing
steps and calculations that are often done repeatedly and manually by each
investigator using the same data set. We expect to connect the results to
visualization tools that are already commonly used by this community. Our intent
is to reduce the barrier currently faced by these scientists when analyzing
AmeriFlux data, without forcing familiar desktop analysis tools to be
abandoned.
This work is a joint effort by the
Berkeley Water Center and Microsoft Research.
Speaker Bio: Deb Agarwal
Deb Agarwal is a researcher with the Berkeley Water Center and is
Distributed Systems Department Head at the Lawrence Berkeley National
Laboratory, where she has worked since 1994. Her current projects involve
research, development and deployment of computing technologies to support
collaborative scientific research in a variety of domains, including providing
appropriate controls for securing and sharing access to information and
computational resources. Dr. Agarwal holds a Ph.D. in electrical and computer
engineering from UC Santa Barbara and a B.S. in Mechanical Engineering from
Purdue. Further details are available at <http://dsd.lbl.gov/~deba/>.
Speaker Bio: Catharine van Ingen
Catharine van Ingen is an architect in the Microsoft
Research Silicon Valley E-Science group. Her research focus is the application
of commercial data management technologies to enable new insights in
environmental science by cooperating scientists. She has been with Microsoft
since 1997. Dr. van Ingen holds a Ph.D. in Civil and Environmental Engineering
from the California Institute of Technology, an M.S. in Civil Engineering from
UC Berkeley, and a B.S. in Civil and Environmental Engineering from UC Irvine.
Her home page is http://research.microsoft.com/~vaningen/.