A Computational Knowledge Engine for the Web
In a nutshell, Wolfram and his team have built what he calls a "computational knowledge engine" for the Web. OK, so what does that really mean? Basically it means that you can ask it factual questions and it computes answers for you.
It doesn't simply return documents that (might) contain the answers, like Google does, and it isn't just a giant database of knowledge, like the Wikipedia. It doesn't simply parse natural language and then use that to retrieve documents, like Powerset, for example.
Instead, Wolfram Alpha actually computes the answers to a wide range of questions that have factual answers, such as "What is the location of Timbuktu?", "How many protons are in a hydrogen atom?", "What was the average rainfall in Boston last year?", "What is the 307th digit of Pi?", "Where is the ISS?" or "When was GOOG worth more than $300?"
Think about that for a minute. It computes the answers. Wolfram Alpha doesn't simply contain huge amounts of manually entered pairs of questions and answers, nor does it search for answers in a database of facts. Instead, it understands and then computes answers to certain kinds of questions.
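To make the distinction concrete, here is a minimal sketch (my own illustration, not Wolfram Alpha's actual method) of computing a digit of Pi on demand instead of looking it up. It uses Machin's formula, pi = 16 arctan(1/5) - 4 arctan(1/239), with plain integer arithmetic. The function names, the ten guard digits, and the convention of counting digits after the decimal point are all assumptions made for this example.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Approximate scale * arctan(1/x) with the alternating Taylor series,
    using only integer arithmetic."""
    term = scale // x          # first term: scale / x
    total = term
    x_squared = x * x
    n = 1
    sign = -1
    while term:
        term //= x_squared     # scale / x^(n+2)
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_digits(count: int) -> str:
    """Return the first `count` decimal digits of pi after the decimal point."""
    guard = 10                 # extra digits to absorb truncation error (assumed ample)
    scale = 10 ** (count + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    digits = str(pi_scaled)    # "3" followed by the decimals
    return digits[1:count + 1]


def nth_digit_of_pi(n: int) -> int:
    """Return the n-th digit after the decimal point (1-based)."""
    return int(pi_digits(n)[-1])


print(pi_digits(20))
print(nth_digit_of_pi(307))
```

Nothing here stores the answer in advance. The few lines of algorithm can produce whichever digit is asked for, which is the property the questions above are meant to illustrate.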
How Does it Work?
Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge.
For example, it contains formal models of much of what we know about science -- massive amounts of data about various physical laws and properties, as well as data about the physical world.
Based on this you can ask it scientific questions and it can compute the answers for you, even if it has not been programmed explicitly to answer each question you might ask it.
But science is just one of the domains it knows about -- it also knows about technology, geography, weather, cooking, business, travel, people, music, and more.
It also has a natural language interface for asking it questions. This interface allows you to ask questions in plain language, or even in various forms of abbreviated notation, and then provides detailed answers.
The vision seems to be to create a system which can do for formal knowledge (all the formally definable systems, heuristics, algorithms, rules, methods, theorems, and facts in the world) what search engines have done for informal knowledge (all the text and documents in various forms of media).
How Smart is it and Will it Take Over the World?
Wolfram Alpha is like plugging into a vast electronic brain. It provides extremely impressive and thorough answers to a wide range of questions asked in many different ways, and it computes answers rather than merely looking them up in a big database.
In this respect it is vastly smarter than (and different from) Google. Google simply retrieves documents based on keyword searches. Google doesn't understand the question or the answer, and doesn't compute answers based on models of various fields of human knowledge.
But as intelligent as it seems, Wolfram Alpha is not HAL 9000, and it wasn't intended to be. It doesn't have a sense of self or opinions or feelings. It's not artificial intelligence in the sense of being a simulation of a human mind. Instead, it is a system that has been engineered to provide really rich knowledge about human knowledge -- it's a very powerful calculator that doesn't just work for math problems -- it works for many other kinds of questions that have unambiguous (computable) answers.
There is no risk of Wolfram Alpha becoming too smart, or taking over the world. It's good at answering factual questions; it's a computing machine, a tool -- not a mind.
One of the most surprising aspects of this project is that Wolfram has been able to keep it secret for so long. I say this because it is a monumental effort (and achievement) and almost absurdly ambitious. The project involves more than a hundred people working in stealth to create a vast system of reusable, computable knowledge, from terabytes of raw data, statistics, algorithms, data feeds, and expertise. But he appears to have done it, and kept it quiet for a long time while it was being developed.
Computation Versus Lookup
For those who are more scientifically inclined, Stephen showed me many interesting examples -- for example, Wolfram Alpha was able to solve novel numeric sequencing problems, calculus problems, and could answer questions about the human genome too. It was also able to compute answers to questions about many other kinds of topics (cooking, people, economics, etc.). Some commenters on this article have mentioned that in some cases Google appears to be able to answer questions, or at least the answers appear at the top of Google's results. So what is the Big Deal? The Big Deal is that Wolfram Alpha doesn't merely look up the answers like Google does, it computes them using at least some level of domain understanding and reasoning, plus vast amounts of data about the topic being asked about.
Computation is in many cases a better alternative to lookup. For example, you could solve math problems using lookup -- that is what a multiplication table is after all. For a small multiplication table, lookup might even be almost as computationally inexpensive as computing the answers. But imagine trying to create a lookup table of all answers to all possible multiplication problems -- an infinite multiplication table. That is a clear case where lookup is no longer a better option compared to computation.
The ability to compute the answer on a case by case basis, only when asked, is clearly more efficient than trying to enumerate and store an infinitely large multiplication table. The computation approach only requires a finite amount of data storage -- just enough to store the algorithms for solving general multiplication problems -- whereas the lookup table approach requires an infinite amount of storage -- it requires actually storing, in advance, the products of all pairs of numbers.
(Note: If we really want to store the products of ALL pairs of numbers, it turns out this is impossible to accomplish, because there are infinitely many numbers. It would require an infinite amount of time to simply generate the data, and an infinite amount of storage to store it. In fact, the products of the real numbers between 0 and 1 could not even be enumerated, let alone stored, because the real numbers are uncountable: there are more real numbers than integers (see the work of Georg Cantor on this). The same storage problem holds even if we are speaking of integers -- it would require an infinite amount of storage to store all their multiplication products, although they at least could be enumerated, given infinite time.)
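The multiplication-table analogy can be sketched in a few lines. This is only a toy illustration; the names `build_table` and `multiply` and the table size of 12 are assumptions chosen for the example.

```python
from itertools import product


def build_table(limit: int) -> dict[tuple[int, int], int]:
    """Lookup approach: precompute and store every product a*b
    for 0 <= a, b < limit. Storage grows as limit squared."""
    return {(a, b): a * b for a, b in product(range(limit), repeat=2)}


def multiply(a: int, b: int) -> int:
    """Computation approach: store only the rule and apply it when asked."""
    return a * b


table = build_table(12)
print(len(table))              # 144 stored answers
print(table[(7, 8)])           # 56, found by lookup
print(table.get((123, 456)))   # None: this pair was never stored
print(multiply(123, 456))      # 56088, computed on demand
```

The table can only answer the questions someone thought to store in advance, while the function answers any pair of integers with a fixed, small amount of code.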
Using the above analogy, we can see why a computational system like Wolfram Alpha is ultimately a more efficient way to compute the answers to many kinds of factual questions than a lookup system like Google. Even though Google is becoming increasingly comprehensive as more information comes on-line and gets indexed, it will never know EVERYTHING. Google is effectively just a lookup table of everything that has been written and published on the Web, that Google has found. But not everything has been published yet, and furthermore Google's index is also incomplete, and always will be.
Therefore Google does and always will contain gaps. It cannot possibly index the answer to every question that matters or will matter in the future -- it doesn't contain all the questions or all the answers. If nobody has ever published a particular question-answer pair onto some Web page, then Google will not be able to index it, and won't be able to help you find the answer to that question -- UNLESS Google also is able to compute the answer like Wolfram Alpha does (an area that Google is probably working on, but most likely not to as sophisticated a level as Wolfram's Mathematica engine enables).
While Google only provides answers that are found on some Web page (or at least in some data set they index), a computational knowledge engine like Wolfram Alpha can provide answers to questions it has never seen before -- provided however that it at least knows the necessary algorithms for answering such questions, and it at least has sufficient data to compute the answers using these algorithms. This is a "big if" of course.
Wolfram Alpha substitutes computation for storage. It is simply more compact to store general algorithms for computing the answers to various types of potential factual questions, than to store all possible answers to all possible factual questions. In the end making this tradeoff in favor of computation wins, at least for subject domains where the space of possible factual questions and answers is large. A computational engine is simply more compact and extensible than a database of all questions and answers.
This tradeoff, as Mills Davis points out in the comments to this article, is also referred to as the tradeoff between time and space in computation. For very difficult computations, it may take a long time to compute the answer. If the answer were already stored in a database, of course, retrieving it would be faster and more efficient. Therefore, a hybrid approach would be for a system like Wolfram Alpha to store all the answers to any questions that have already been asked of it, so that they can be provided by simple lookup in the future, rather than recalculated each time. There may also already be databases of precomputed answers to very hard problems, such as finding very large prime numbers for example. These should also be stored in the system for simple lookup, rather than having to be recomputed. I think that Wolfram Alpha is probably taking this approach. For many questions it doesn't make sense to store all the answers in advance, but certainly for some questions it is more efficient to store the answers, when you already know them, and just look them up.
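The hybrid approach described above is what programmers usually call caching or memoization. Here is a minimal sketch using Python's standard `functools.lru_cache`. I have no knowledge of how Wolfram Alpha actually caches results; the primality test is just a stand-in for any expensive computation, and `is_prime` is a hypothetical helper written for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_prime(n: int) -> bool:
    """Expensive computation (trial division), remembered after the first call."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


mersenne = 2**31 - 1               # 2147483647, a known Mersenne prime

first = is_prime(mersenne)         # computed: about 46,000 trial divisions
second = is_prime(mersenne)        # answered from the cache by lookup

info = is_prime.cache_info()
print(first, second)
print(info.hits, info.misses)      # one lookup hit, one computed miss
```

The first question pays the cost in time, and every repeat of it pays only in space, which is the time-versus-space tradeoff in miniature.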
Where Google is a system for FINDING things that we as a civilization collectively publish, Wolfram Alpha is for COMPUTING answers to questions about what we as a civilization collectively know. It's the next step in the distribution of knowledge and intelligence around the world -- a new leap in the intelligence of our collective "Global Brain." And like any big next-step, Wolfram Alpha works in a new way -- it computes answers instead of just looking them up.
Wolfram Alpha, at its heart, is quite different from a brute force statistical search engine like Google. And it is not going to replace Google -- it is not a general search engine: You would probably not use Wolfram Alpha to shop for a new car, find blog posts about a topic, or to choose a resort for your honeymoon. It is not a system that will understand the nuances of what you consider to be the perfect romantic getaway, for example -- there is still no substitute for manual human-guided search for that. Where it appears to excel is when you want facts about something, or when you need to compute a factual answer to some set of questions about factual data.
I think the folks at Google will be surprised by Wolfram Alpha, and they will probably want to own it, but not because it risks cutting into their core search engine traffic. Instead, it will be because it opens up an entirely new field of potential traffic around questions, answers and computations that you can't do on Google today.
The services that are probably going to be most threatened by a service like Wolfram Alpha are the Wikipedia, Metaweb's Freebase, True Knowledge, and any natural language search engines (such as Microsoft's upcoming search engine, based perhaps in part on Powerset's technology among others), and other services that are trying to build comprehensive factual knowledge bases.
As a side-note, my own service, Twine.com, is NOT trying to do what Wolfram Alpha is trying to do, fortunately. Instead, Twine uses the Semantic Web to help people filter the Web, organize knowledge, and track their interests. It's a very different goal. And I'm glad, because I would not want to be competing with Wolfram Alpha. It's a force to be reckoned with.
Relationship to the Semantic Web
During our discussion, after I tried and failed to poke holes in his natural language parser for a while, we turned to the question of just what this thing is, and how it relates to other approaches like the Semantic Web.
The first question was could (or even should) Wolfram Alpha be built using the Semantic Web in some manner, rather than (or as well as) the Mathematica engine it is currently built on. Is anything missed by not building it with the Semantic Web's languages (RDF, OWL, SPARQL, etc.)?
The answer is that there is no reason that one MUST use the Semantic Web stack to build something like Wolfram Alpha. In fact, in my opinion it would be far too difficult to try to explicitly represent everything Wolfram Alpha knows and can compute using OWL ontologies and the reasoning that they enable. It is just too wide a range of human knowledge and giant OWL ontologies are too difficult to build and curate.
It would of course at some point be beneficial to integrate with the Semantic Web so that the knowledge in Wolfram Alpha could be accessed, linked with, and reasoned with, by other semantic applications on the Web, and perhaps to make it easier to pull knowledge in from outside as well. Wolfram Alpha could probably play better with other Web services in the future by providing RDF and OWL representations of its knowledge, via a SPARQL query interface -- the basic open standards of the Semantic Web. However for the internal knowledge representation and reasoning that takes place in Wolfram Alpha, OWL and RDF are not required and it appears Wolfram has found a more pragmatic and efficient representation of his own.
I don't think he needs the Semantic Web INSIDE his engine, at least; it seems to be doing just fine without it. This view is in fact not different from the current mainstream approach to the Semantic Web -- as one commenter on this article pointed out, "what you do in your database is your business" -- the power of the Semantic Web is really for knowledge linking and exchange -- for linking data and reasoning across different databases. As Wolfram Alpha connects with the rest of the "linked data Web," Wolfram Alpha could benefit from providing access to its knowledge via OWL, RDF and SPARQL. But that's off in the future.
It is important to note that just like OpenCyc (which has taken decades to build up a very broad knowledge base of common sense knowledge and reasoning heuristics), Wolfram Alpha is also a centrally hand-curated system. Somehow, perhaps just secretly but over a long period of time, or perhaps due to some new formulation or methodology for rapid knowledge-entry, Wolfram and his team have figured out a way to make the process of building up a broad knowledge base about the world practical where all others who have tried this have found it takes far longer than expected. The task is gargantuan -- there is just so much diverse knowledge in the world. Representing even a small area of it formally turns out to be extremely difficult and time-consuming.
It has generally not been considered feasible for any one group to hand-curate all knowledge about every subject. The centralized hand-curation of Wolfram Alpha is certainly more controllable, manageable and efficient for a project of this scale and complexity. It avoids problems of data quality and data-consistency. But it's also a potential bottleneck and most certainly a cost-center. Yet it appears to be a tradeoff that Wolfram can afford to make, and one worth making as well, from what I could see. I don't yet know how Wolfram has managed to assemble his knowledge base in less than a very long time, or even how much knowledge he and his team have really added, but at first glance it seems to be a large amount. I look forward to learning more about this aspect of the project.
Building Blocks for Knowledge Computing
Wolfram Alpha is almost more of an engineering accomplishment than a scientific one -- Wolfram has broken down the set of factual questions we might ask, and the computational models and data necessary for answering them, into basic building blocks -- a kind of basic language for knowledge computing if you will. Then, with these building blocks in hand his system is able to compute with them -- to break down questions into the basic building blocks and computations necessary to answer them, and then to actually build up computations and compute the answers on the fly.
Wolfram's team manually entered, and in some cases automatically pulled in, masses of raw factual data about various fields of knowledge, plus models and algorithms for doing computations with the data. By building all of this in a modular fashion on top of the Mathematica engine, they have built a system that is able to actually do computations over vast data sets representing real-world knowledge. More importantly, it enables anyone to easily construct their own computations -- simply by asking questions.
The scientific and philosophical underpinnings of Wolfram Alpha are similar to those of the cellular automata systems he describes in his book, "A New Kind of Science" (NKS). Just as with cellular automata (such as the famous "Game of Life" algorithm that many have seen on screensavers), a set of simple rules and data can be used to generate surprisingly diverse, even lifelike patterns. One of the observations of NKS is that incredibly rich, even unpredictable patterns, can be generated from tiny sets of simple rules and data, when they are applied to their own output over and over again.
In fact, cellular automata, by using just a few simple repetitive rules, can compute anything any computer or computer program can compute, in theory at least. But actually using such systems to build real computers or useful programs (such as Web browsers) has never been practical because they are so low-level it would not be efficient (it would be like trying to build a giant computer, starting from the atomic level).
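For readers who have not seen one, here is a minimal sketch of an elementary (one-dimensional, two-state) cellular automaton of the kind studied in NKS. The rule number encodes the new state for each of the eight possible three-cell neighborhoods. Rule 30 is used here because it produces an irregular pattern from a single live cell; Rule 110 is the elementary rule that has been proven capable of universal computation. The function names, the grid width, and the wraparound edges are assumptions made for this example.

```python
def step(cells: list[int], rule: int) -> list[int]:
    """Apply one update of an elementary cellular automaton.
    Each cell looks only at itself and its two neighbors (edges wrap around)."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right   # neighborhood as a 3-bit number
        new_cells.append((rule >> index) & 1)         # look up that bit of the rule
    return new_cells


def run(rule: int, width: int, steps: int) -> list[list[int]]:
    """Start from a single live cell in the middle and apply the rule repeatedly."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows


for row in run(rule=30, width=31, steps=15):
    print("".join("#" if cell else "." for cell in row))
```

The whole "program" is the single number 30 plus one starting cell, and the rule is simply applied to its own output over and over again.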
The simplicity and elegance of cellular automata proves that anything that may be computed -- and potentially anything that may exist in nature -- can be generated from very simple building blocks and rules that interact locally with one another. There is no top-down control, there is no overarching model. Instead, from a bunch of low-level parts that interact only with other nearby parts, complex global behaviors emerge that, for example, can simulate physical systems such as fluid flow, optics, population dynamics in nature, voting behaviors, and perhaps even the very nature of space-time. This is the main point of the NKS book in fact, and Wolfram draws numerous examples from nature and cellular automata to make his case.
But with all its focus on recombining simple bits of information according to simple rules, cellular automata is not a reductionist approach to science -- in fact, it is much more focused on synthesizing complex emergent behaviors from simple elements than in reducing complexity back to simple units. The highly synthetic philosophy behind NKS is the paradigm shift at the basis of Wolfram Alpha's approach too. It is a system that is very much "bottom-up" in orientation. This is not to say that Wolfram Alpha IS a cellular automaton itself -- but rather that it is similarly based on fundamental rules and data that are recombined to form highly sophisticated structures.
Wolfram has created a set of building blocks for working with formal knowledge to generate useful computations, and in turn, by putting these computations together you can answer even more sophisticated questions and so on. It's a system for synthesizing sophisticated computations from simple computations. Of course anyone who understands computer programming will recognize this as the very essence of good software design. But the key is that instead of forcing users to write programs to do this in Mathematica, Wolfram Alpha enables them to simply ask questions in natural language and then automatically assembles the programs to compute the answers they need.
Wolfram Alpha perhaps represents what may be a new approach to creating an "intelligent machine" that does away with much of the manual labor of explicitly building top-down expert systems about fields of knowledge (the traditional AI approach, such as that taken by the Cyc project), while simultaneously avoiding the complexities of trying to do anything reasonable with the messy distributed knowledge on the Web (the open-standards Semantic Web approach). It's simpler than top-down AI and easier than the original vision of the Semantic Web.
Generally if someone had proposed doing this to me, I would have said it was not practical. But Wolfram seems to have figured out a way to do it. The proof is that he's done it. It works. I've seen it myself.
Of course, questions abound. It remains to be seen just how smart Wolfram Alpha really is, or can be. How easily extensible is it? Will it get increasingly hard to add and maintain knowledge as more is added to it? Will it ever make mistakes? What forms of knowledge will it be able to handle in the future?
I think Wolfram would agree that it is probably never going to be able to give relationship or career advice, for example, because that is "fuzzy" -- there is often no single right answer to such questions. And I don't know how comprehensive it is, or how it will be able to keep up with all the new knowledge in the world (the knowledge in the system is exclusively added by Wolfram's team right now, which is a labor intensive process). But Wolfram is an ambitious guy. He seems confident that he has figured out how to add new knowledge to the system at a fairly rapid pace, and he seems to be planning to make the system extremely broad.
And there is the question of bias, which we addressed as well. Is there any risk of bias in the answers the system gives because all the knowledge is entered by Wolfram's team? Those who enter the knowledge and design the formal models in the system are in a position to define the way the system thinks -- both the questions and the answers it can handle. Wolfram believes that by focusing on factual knowledge -- things like those you might find in the Wikipedia or textbooks or reports -- the bias problem can be avoided. At least he is focusing the system on questions that do have only one answer -- not questions for which there might be many different opinions. Everyone generally agrees for example that the closing price of GOOG on a certain date is a particular dollar amount. It is not debatable. These are the kinds of questions the system addresses.
But even for some supposedly factual questions, there are potential biases in the answers one might come up with, depending on the data sources and paradigms used to compute them. Thus the choice of data sources has to be made carefully to try to reflect as non-biased a view as possible. Wolfram's strategy is to rely on widely accepted data sources like well-known scientific models, public data about factual things like the weather, geography and the stock market published by reputable organizations and government agencies, etc. But of course even this is a particular worldview and reflects certain implicit or explicit assumptions about what data sources are authoritative.
This is a system that reflects one perspective -- that of Wolfram and his team -- which probably is a close approximation of the mainstream consensus scientific worldview of our modern civilization. It is a tool -- a tool for answering questions about the world today, based on what we generally agree that we know about it. Still, this is potentially murky philosophical territory, at least for some kinds of questions. Consider global warming -- not all scientists even agree it is taking place, let alone what it signifies or where the trends are headed. Similarly in economics, based on certain assumptions and measurements we are either experiencing only mild inflation right now, or significant inflation. There is not necessarily one right answer -- there are valid alternative perspectives.
I agree with Wolfram that bias in the data choices will not be a problem, at least for a while. But even scientists don't always agree on the answers to factual questions, or what models to use to describe the world -- and this disagreement is essential to progress in science in fact. If there is only one "right" answer to any question there could never be progress, or even different points of view. Fortunately, Wolfram is designing his system to link to alternative questions and answers at least, and even to sources for more information about the answers (such as the Wikipedia for example). In this way he can provide unambiguous factual answers, yet also connect to more information and points of view about them at the same time. This is important.
It is ironic that a system like Wolfram Alpha, which is designed to answer questions factually, will probably bring up a broad range of questions that don't themselves have unambiguous factual answers -- questions about philosophy, perspective, and even public policy in the future (if it becomes very widely used). It is a system that has the potential to touch our lives as deeply as Google. Yet how widely it will be used is an open question too.
The system is beautiful, and the user interface is already quite simple and clean. In addition, answers include computationally generated diagrams and graphs -- not just text. It looks really cool. But it is also designed by and for people with IQs somewhere in the altitude of Wolfram's -- some work will need to be done dumbing it down a few hundred IQ points so as to not overwhelm the average consumer with answers that are so comprehensive that they require a graduate degree to fully understand.
It also remains to be seen how much the average consumer thirsts for answers to factual questions. I do think all consumers have a need for this kind of intelligence once in a while, but perhaps not as often as they need something like Google. But I am sure that academics, researchers, students, government employees, journalists and a broad range of professionals in all fields definitely need a tool like this and will use it every day.
I think there is more potential to this system than Stephen has revealed so far. I think he has bigger ambitions for it in the long-term future. I believe it has the potential to be THE online service for computing factual answers. THE system for factual knowledge on the Web. More than that, it may eventually have the potential to learn and even to make new discoveries. We'll have to wait and see where Wolfram takes it.
Maybe Wolfram Alpha could even do a better job of retrieving documents than Google, for certain kinds of questions -- by first understanding what you really want, then computing the answer, and then giving you links to documents that relate to the answer. But even if it is never applied to document retrieval, I think it has the potential to play a leading role in all our daily lives -- it could function like a kind of expert assistant, with all the facts and computational power in the world at our fingertips.
I would expect that Wolfram Alpha will open up various APIs in the future and then we'll begin to see some interesting new, intelligent applications begin to emerge based on its underlying capabilities and what it knows already.
In May, Wolfram plans to open up what I believe will be a first version of Wolfram Alpha. Anyone interested in a smarter Web will find it quite interesting, I think. Meanwhile, I look forward to learning more about this project as Stephen reveals more in months to come.
One thing is certain: Wolfram Alpha is quite impressive and Stephen Wolfram deserves all the congratulations he is soon going to get.