The creation of large data repositories has been a key historical indicator of social and intellectual development—and indeed perhaps one of the defining characteristics of the whole progress of civilization.

And through our work on Wolfram|Alpha—with its insatiable appetite for systematic data—we have gained a uniquely broad view of the many great data repositories that exist in the world today.

Some of these repositories are maintained by national or international agencies, some by companies and other organizations, and some by individuals. A few of the repositories are quite new, but many date back 40 or more years, and some well over a century. But there is one thing in common across essentially every great data repository: a core of diligent and committed people who have carefully shepherded its development.

Curiously, though, few of these people have ever met their counterparts in other domains of data. And in our work on Wolfram|Alpha we are almost certainly the first group ever to have had the pleasure of getting to know such a broad range of leaders of great data repositories.

And one of the things that we have discovered is that there is much in common in both the methods used and the issues faced by these data repositories. So as part of our contribution to the worldwide data community we have decided to sponsor a data summit to bring together for the first time the leaders of today’s great data repositories.

The Wolfram Data Summit 2010 will be held in Washington, DC on September 9–10.

We have invited leaders of data repositories in all areas—socioeconomic, scientific, financial, medical, geographic, commercial, lexicographic, cultural, biographical, mathematical, and others. And we already know that many data repositories will be represented, including for example the BBC, Bowker, CABI, CDC, comScore, CRC, DataONE, Encyclopedia of Life, FBI, Federal Reserve Bank, Gale, IMF, Internet Archive, Moody’s, NASA, NCBI, NIST, NREL, NSF, U.S. Office of Management and Budget, Open Library, OpenStreetMap, ProQuest, Protein Data Bank, Smithsonian Institution, Sunlight Foundation, Thomson Reuters, UNESCO, UNICEF, US Census, US Department of Transportation, US Department of Education, World Bank, and World Conservation Monitoring Centre—as well as many others.

There is quite a lot to discuss at the Data Summit. Experiences and best practices in data curation. How data should be combined, validated, and standardized. How things from automated sensors to crowdsourcing affect data collection. How governmental and organizational data policies are and should be evolving. What can be done with data that is not yet in digital form. How privacy and commercial issues affect data dissemination. And much, much more.

This is a unique time in the history of data: as scientific and analytical methods become more and more prominent and successful in the world at large, so larger and larger numbers of important decisions are being made on the basis of data, by both organizations and individuals. And as computers, the web, and now mobile devices have become ubiquitous, data can be disseminated vastly more widely than ever before.

It is a difficult matter, though, to do this in a way that is immediately useful to a broad range of people. And that is part of what we are trying to achieve by making knowledge—and data—computable in Wolfram|Alpha.

And in fact, in doing this, we see something else too: that if data can be made uniformly computable, it routinely becomes possible to derive completely new facts and knowledge by combining very different kinds of data—thereby generating vastly more value than could be obtained from any data repository on its own.

It is truly impressive how much data has been carefully collected and organized over the course of many years in the world’s great data repositories. And today this data is poised to become dramatically more relevant and significant in the daily lives of people around the world.

Our hope is that our Data Summit this September will help highlight the great achievements of the worldwide data community to date, and will serve as a catalyst in the next phase of the community’s development.

I myself have been a lifelong enthusiast of systematic data—as well as being directly responsible over the course of several decades for the collection of large amounts of mathematical and computational data. For me, the great data repositories are wonders of the modern world—pure yet tangible instantiations of what our civilization has achieved in many different areas.

And I look forward to the progress that we can make with our Data Summit this September—as well as to hearing all those fascinating tales from the front lines of the world of data.

Note: This year, the Data Summit is a free invitation-only event, but we are keen for all relevant individuals to attend, so we encourage applications for invitations from qualified people.

10 Comments

Excellent idea! Your site is amazing, and I have been pleased to spread the word among friends and family. I wish you success with the data summit and your continuing work on wolframalpha.com. I believe your efforts are beginning to produce what many people hoped the web could do when it first became a viable network experience.

Posted by Gerald McDaniel June 7, 2010 at 12:28 pm

Great idea! This site is a great opportunity for people to delve deeper into knowledge or simply learn something new day by day, and this event is great because it brings so many people together. I like a lot of these things.

Congratulations!

Posted by Francisco Gonzalez June 8, 2010 at 4:04 pm

This is a wonderful idea. During my career, I have had many occasions to use data either from multiple departments of the same country or from multiple countries. These groups are often reluctant to share knowledge with other providers, so this could be a major step forward for data sharing. Is there any chance that groups outside the invitee list, for example the volunteer curators, could watch presentations over the Internet? Thanks.

Posted by Seth Greenblatt, Ph.D. June 8, 2010 at 6:01 pm

    Hi Seth,

    We do anticipate live streaming portions of the Wolfram Data Summit. Please continue to visit the blog for event details and updates. Thank you!

    Posted by The PR Team June 11, 2010 at 8:10 am

WA is certainly amazing.

One has to wonder whether the curated data may in some fashion set various aspects of our reality in stone. Many of the issues in our world stem directly from both too much and too little data: how the data is used, who uses it, how it is biased, and the biases of the person interpreting it. How individuals or organizations interpret these data points has a dramatic impact on the multitude of decisions made within our reality.

Not only is the parallel glut and lack of data an issue, but so is the inherent complexity of all the infrastructure, communities, organizations, technologies, psychologies, systems, nationalities, etc., that support the data and its delivery.

What ability do errors in data sets have to shape the future when they are permuted, rehashed, or reinjected into existing or new data sets, policy decisions, or consciousness as a whole?

It is my belief that we already have numerous AI systems interoperating between Google, Wikipedia, all major supercomputing facilities, NASA, JPL, stock markets, etc. If not, it can certainly be conceived. With enough data, one is likely to be able to predict or affect the behavior of the entire planet or any market available to man. I hope that it will be put to good use for the benefit of all.

Posted by Seann Dorand June 8, 2010 at 6:42 pm

This is a true service to all of humankind in the real sense… and I hope all of us are ready for this and that it isn’t misused… The best part about the whole concept is “And in fact, in doing this, we see something else too: that if data can be made uniformly computable, it routinely becomes possible to derive completely new facts and knowledge by combining very different kinds of data—thereby generating vastly more value than could be obtained from any data repository on its own.” Man, we might be solving so many unsolved mysteries…
Great going… this project is going to be a part of the history, present and future of the human race… Cheers!!!

Posted by Vineet Shah June 11, 2010 at 12:22 am

Wolfram|Alpha,
I really enjoy the information provided by W|A, especially the physics information. However, I have noticed one thing. W|A gives “mass neutrino” as 0, but a recent observation (at the OPERA experiment in collaboration with CERN) of neutrino oscillation shows that, according to what we know, neutrinos must have mass — otherwise they couldn’t oscillate. The current consensus is that neutrino mass is small, nonzero, and undetermined, but there are upper limits to its mass. I greatly admire W|A’s commitment to up-to-date information, and I knew the team would not want this to go un-updated, so I thought I would bring it to your attention.
Thanks so much for all your hard work!

Posted by CoolCat June 11, 2010 at 2:10 pm

    (1×10⁻³ to 2) eV/c²

    Posted by twowolves September 9, 2010 at 3:26 pm

I love the graphic for the “History of Systematic Data and the Development of Computable Knowledge.” I would pay good money for a poster version. Any chance of posters (or the image file) being made available?

Posted by Stephen Francoeur October 6, 2010 at 2:09 pm

    Stephen, we have run out of posters for now, and we are in the process of making some edits to the poster. We’ll let you know when more posters are available. Thank you!

    Posted by The Wolfram|Alpha Team October 6, 2010 at 4:06 pm