Astronomers Envision Linking World Data Archives
Virtual observatories could change the sociology of astronomy. But to bring them about, financial, political, technical, and sociological challenges must first be met.
"The National Virtual Observatory is neither national, nor virtual, nor an observatory," says George Djorgovski, a Caltech astronomy professor who is involved in developing the NVO, one of a half dozen or so such projects worldwide. The emerging virtual observatories aim to handle the humongous datasets that are proliferating in astronomy and will have built-in software tools for people to mine and query data across archives. Eventually, the various projects will be knit together so that ground- and space-based astronomy archives from around the world are linked and accessible to all.
In its 2000 decadal survey, the US astronomy community named the NVO its highest priority in the $100 million-or-less category. Last November, NSF awarded the NVO $10 million over five years. At about the same time, the European Commission granted 5 million euros (roughly $4.5 million) over three years to the Astrophysical Virtual Observatory, a consortium in Europe. AstroGrid in the UK, an AVO partner, has another £5 million ($7.3 million). Smaller projects are also getting started in Australia, Germany, India, and Japan. To win more money and broaden support within the astronomy community, the teams have to prove the virtual observatory concept is a good investment and will lead to new science.
The first steps are to set rules for describing and manipulating both raw and processed data and to decide what sorts of computational software to incorporate. The virtual observatories will consist of hardware and software frameworks that can cope with both spectroscopic data and images in all wavelengths. The various projects will develop mutually compatible systems, says AVO leader Peter Quinn of the European Southern Observatory (ESO). "Everybody realizes that it's a global virtual observatory. It's not in any way, shape, or form bounded by boundaries."
"The virtual observatory concept is a very ambitious project and a great challenge to astrophysicists, mathematicians, statisticians, and specialists in all areas of computer science to work together," says Wolfgang Voges of the Max Planck Institute for Extraterrestrial Physics in Garching. Voges heads up efforts for the German Astrophysical Virtual Observatory, which is intended to support astronomy research in Germany and coordinate that country's participation in creating a global virtual observatory. "There will be a fresh wind blowing through the graveyard of old and unused data. And there should be no newly formed graveyards in the future if we succeed in finding a globally accepted data format," he says.
Data is doubling every year or two in astronomy. But, says Djorgovski, "our understanding of the universe doesn't double at the same rate. There is a bottleneck somewhere. The old ways of dealing with data don't work anymore."
"Even though the data is public, it's hard to get to it," says Alex Szalay of Johns Hopkins University, who oversees the Sloan Digital Sky Survey archive and is one of the leaders of the NVO, whose participants represent 17 institutions. "Every set of data has quirks. It's there. It's stored. It's archived. But it's not friendly. We can't shove it around at will. It's too much. We have to do analysis closer to the data." With the virtual observatories, he adds, researchers could search and analyze data from their desktops without having to download unwieldy datasets, which would stay in their home archives.
Sloan, currently the most prolific astronomy experiment, will produce 40 terabytes of data in five years as it maps a quarter of the sky and measures distances to more than a million galaxies and quasars. That figure will be dwarfed by future surveys. The Large Synoptic Survey Telescope, for example, is slated to make a movie of the visible sky every week starting in 2008, racking up about 5 terabytes each night. Says Szalay, "It's the CCD [charge-coupled device] cameras. It's the pixels. The data avalanche is not because telescopes are bigger."
When it comes to data avalanches, astronomy is not alone: The problem is shared by high-energy physics, genomics, neuroscience, and geophysics, to name a few (see the article on page 42). "There is synergy between us and the Grid crowd," says Szalay. "But there are differences. In particle physics, they tend to analyze individual events one by one; they don't want to cross-correlate. In astronomy, by contrast, a hierarchical data flow is not possible because different telescopes take different data in different formats."
Grid technology is just one of the developments paving the way for virtual observatories. In astronomy, unlike most fields, data headers have been standardized for more than two decades; space missions, at least, keep excellent, searchable records; and many archive centers link data from multiple experiments. And, last August, using Astrovirtel, a fledgling European virtual observatory and precursor to the AVO, scientists determined that a newly discovered Kuiper Belt object, 2001 KX76, has a diameter of about 1200 km, making it the largest minor planet yet spotted in the Solar System.
"I think the global virtual observatory is a fantastic idea," says UK Astronomer Royal Martin Rees of the University of Cambridge. "More than 99% of rare events--gamma-ray bursts, stars swallowed by black holes, and all different kinds of variable stars--are completely missed now."
The data glut is not the only driver for the virtual observatories. Statistical studies and correlations across wavelengths are both on the rise, and both stand to gain from combing through and comparing data. For example, only after astronomers took pictures in the infrared did they realize that most spiral galaxies have bars. "Our physical intuition about how these galaxies were made was influenced by the wavelengths of the images we took," says Quinn. "The universe doesn't know about our arbitrary way of dividing things by wavelength," adds Djorgovski. "That's the exciting part for me. This IT [information technology] revolution is enabling us to tackle the sky in a whole new organized way. It will enable science we couldn't do otherwise."
Not all astronomers embrace the idea of virtual observatories, however. Common concerns--even among supporters--include that too much money will be sunk into software; that archives at ground-based observatories should be brought up to snuff before thinking about linking them; and that users won't be able to trust the processed data they call up. Some people also worry that funders might view the virtual observatories as a replacement for actual facilities and that the know-how needed to use telescopes will disappear.
One of the more outspoken critics is Bob Fosbury of the European Space Agency--itself an AVO member--who says, "A virtual observatory is a bad move for Europe. The danger here is that if the European AVO is funded and the observing projects are not, Europe ends up supporting the derivative proposals while others do the cutting-edge science. Archive science is very valuable for supporting a scientific investigation. But archive science takes the eye off the ball. The ball is doing a primary investigation." What's more, says Fosbury, "we are likely to breed a generation of young people who sift through data without knowing about instruments. That's dangerous."
"The whole idea behind a national virtual observatory is to tie in different datasets," says Anne Kinney, who, as head of NASA's astronomy and physics division, will be key in obtaining funding for the NVO from that agency. "NASA says it's good that NSF is giving the first money, because they are behind on archiving," she says, referring to the National Optical Astronomy Observatory (NOAO) and most other NSF-funded ground-based facilities. "I think NSF should go back a step and give money for archiving--do right by science."
The astronomy community is split over archiving. "People give it lip service," says John Huchra of the Harvard-Smithsonian Center for Astrophysics. "But when it comes to choosing between archiving and a new instrument, they're not willing to pay for archiving." The notable exception to the typically lousy records--often just backup tapes--at ground-based telescopes is ESO's Very Large Telescope in Chile, which allots time for calibration and keeps data records like a space-based experiment. The National Radio Astronomy Observatory is stepping up its archiving, and NOAO is just getting started; both are partners in the NVO. Meanwhile, at private telescopes such as the Kecks, the money problem is compounded by a culture of keeping data under wraps. Indeed, the emergence of virtual observatories is lending urgency to ongoing discussions about proprietary data (see Physics Today, November 1998, page 52).
As for controlling data quality, Szalay says, "we've spent days and days already arguing about this. We don't want to be a police force." Adds Djorgovski, "The free market idea is the way it works. If some dataset is really no good, it will quickly become known. People don't operate in a vacuum. Part of the skill is to make judgments."
"The NVO will entirely change the sociology of astronomy," predicts Szalay. Among the changes he expects are a blurring of the now separate cultures of radio, optical, x-ray, and gamma-ray astronomers and a democratization of astronomy, in that researchers from poorly funded colleges or countries would have easier access to fresh data. The virtual observatory, adds NOAO Associate Director for Science Stephen Strom, "will lead to two extreme changes. There will be more teamwork to create surveys and use them. Ironically, I also think it will change the way an individual looks at the world, because you can really pose questions at your desktop." The virtual observatory is essential, adds Strom, "but it will come about piecewise, and not without struggle, until people begin to clamor for making everything work together."
Virtual observatory enthusiasts want to start now with datasets that are already in good shape, and they hope that others will want to make their data compliant when they see the power of linking and querying archives. "The biggest challenge is psychological--getting people to want to join," says Caltech's Roy Williams, computer systems architect for the NVO. "You can think of it as leveraging previous and future science investments."
In two years, says Williams, "I hope the NVO will be at a stage where 10 to 20 collections are federated, and that one might have a dozen or so interesting projects going. In five years, the framework will have been used enough, with enough different datasets and computing tools, that it will start to be part of the general infrastructure for astronomers. And in 10 years, nobody will mention it, people will take it for granted."
© 2002 American Institute of Physics