I had an interesting conversation with a colleague the other day about the value and price of big data.
He posited that as more people and companies figure out the value of big data, the price of buying datasets will go up, creating a market opportunity to start snapping up datasets now.
I disagreed with him.
I argued that the value of data is relative and supply is inelastic, meaning data is going to continue to be generated regardless of its market price – and much of it will be available free. Therefore, price will approach zero in the long run.
So who’s right?
The Price of Free
There is a clear trend toward Open Data. As government agencies and NGOs continue to release datasets, the amount of information available about the world grows. Anyone can access and download the U.S. Census data, for example, or download any of the 378,529 raw and geospatial datasets available at Data.gov.
Whenever there is a free alternative on the market, it inevitably puts pressure on pricing. While the free product will not serve the needs of the entire market, it will satisfy a chunk of it.
Marginal Cost of Zero
Most products have a marginal cost per unit sold. It costs more money to produce ten cars than it does one. In the software world, it costs more to support ten thousand users than it does one. In order to make a profit, you price your product above marginal cost and hope to do enough volume to recover your startup and fixed costs.
Datasets, on the other hand, have zero marginal cost. If I own a particular set of data, it costs me the same whether I sell one copy of it or one million copies. Let's say I can sell one copy of a particular dataset for $10,000, or 2,000 copies for $100 each ($200,000 in total). Given that my cost is the same in either scenario, the math becomes pretty simple.
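The comparison above can be sketched as a quick calculation. The $10,000 and $100 price points come from the example; the $50,000 fixed cost is an assumed placeholder, since the article doesn't specify one:

```python
def profit(price, copies_sold, fixed_cost=50_000, marginal_cost=0.0):
    """Profit on a dataset: with zero marginal cost, every extra copy
    sold is pure revenue, so volume dominates once the market is large.
    fixed_cost is a hypothetical placeholder, not a figure from the text."""
    revenue = price * copies_sold
    return revenue - fixed_cost - marginal_cost * copies_sold

exclusive = profit(10_000, 1)    # one exclusive sale: 10,000 - 50,000 = -40,000
volume = profit(100, 2_000)      # 2,000 cheap copies: 200,000 - 50,000 = 150,000
print(exclusive, volume)
```

Because the marginal-cost term is zero, the only question is which price-volume pair maximizes revenue, which is what pushes sellers of undifferentiated data toward low prices and high volume.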
We see this happening in the digital goods world already. Seth Godin believes that if it costs zero to produce, then the price should be zero. And indeed, many eBooks on Amazon.com can be downloaded for $0.99 or even free.
The evolution of data markets will inevitably change the dynamics of how datasets are exchanged, mainly because they will increase supply and reduce scarcity. Today, if I want to acquire a dataset about a certain segment of Facebook users, I have to buy it directly from whoever is collecting it. In the near future, I'll be able to go to a data exchange like Datamarket.com and choose from multiple options.
Also, in many cases data collection is getting cheaper. If someone has to drive a jeep to a remote village in Africa to collect survey data, that requires more resources than conducting a survey via SMS. And if the same analysis or outcome can be achieved with the cheaper option, it will replace the old method.
Choice reduces price.
Slightly Aged and Still Unrefined
Most of us would never buy a barrel of unrefined oil. Unless you own a refinery, or want to store it in your garage until the market price goes up and then sell it, unrefined oil has zero value to you. Gasoline has value to you because you can put it in your car and use it as fuel.
In fact, a barrel of unrefined oil sitting in your garage is actually costing you money to store. So the longer you hold on to it, the more it’s going to cost you.
The same holds true for big data. Just as a car depreciates when you drive it off the lot, data ages quickly and becomes less valuable over time. Storing and maintaining it can quickly become expensive.
Only those that own a data refinery will find value in raw data.
So then why are companies able to charge for data?
While selling big data may be in vogue, it's not new. Privacy exposés about Facebook and companies like Acxiom may be bringing the market for data to the surface, but direct marketing companies have been accumulating data about people for years.
But here's the thing: the companies that are selling data about people specialize in creating unique and valuable datasets. It's a nontrivial exercise to compile a composite profile of someone. In some cases, they're probably acquiring their raw data very cheaply and are able to charge a premium for the value they've added in the process.
For example, the Economist Intelligence Unit (EIU), Infochimps and Factual are in the business of selling data. Infochimps.com, for instance, sells access to 11,451 datasets (as of 12/10/12), yet all but three are available for free. The three they charge for offer some level of unique value, such as the ability to return census data by IP address.
On the other hand, if you want to purchase a copy of All China Province Data from the Economist Intelligence Unit, plan on spending $21,900 per year.
So what gives?
You Get What You Pay For
There is a difference between data and information.
In the cases above, neither the EIU nor Infochimps is simply hawking spreadsheets. Each has gone through some level of effort to cleanse the data and has added a layer of value on top of the raw dataset.
In the case of the EIU dataset, you're paying for a fair amount of research, data collection, cleansing and contextualization, as well as the completeness and freshness of the data.
Data subscriptions are also an interesting and promising business model. Data ages quickly, but if you’re continually updating it as things change, then there is chargeable value in that.
It’s in the way that you use it
Let's face it, open data is cool. But only a fraction of what's available will be used in any meaningful, or valuable, way. The real value of data isn't in the data itself. It's what you do with it, and how you do it.
For example, raw data that’s being collected by the National Data Buoy Center isn’t directly valuable to a fisherman off the coast of Maine, but an app on his mobile phone that predicts currents and storms is.
While the potential use cases for the data are interesting, the real value at scale will come through organizations that create those applications and take them to market.
Companies that create value added applications are in the information market, not the data market.
Scarcity drives prices up
The other factors that will determine pricing are scarcity and proprietary control. The EIU dataset selling for $21,900 per year is a niche, specialized set. There simply aren't many other places you can go to get that level of data. And odds are, the handful of companies that need that data also have deep enough pockets to pay for it.
Proprietary data is what companies or governments are collecting on their own about people or things. This type of information can be very valuable if it can deliver insight into monetizable actions.
Some companies will make a killing off of selling their proprietary data; Facebook is potentially one of them. But such companies will be few. It's going to be increasingly tough to create and manage truly proprietary data.
Buy, Sell, Hold?
Without question, the data market will shape up to be an interesting one. Most raw data is going to be cheap or free. Refined or enriched data will drive its own market price based on the value it brings to the people that can use it.
Deep and proprietary data about things and people will be valuable, as well as targeted data sets that help organizations solve unique problems. But I think by and large, the value created will be at the contextualization and application layers.
The more complete, cleansed, refined, contextualized and scarce the data, the more it will be worth. Land-grabbing data and trying to resell it is simply not a model that will make money in the long run.