The Big Data Landgrab, Are You Missing Out?

I had an interesting conversation with a colleague the other day about the value and price of big data.

He posited that as more people and companies figure out the value of big data, the price of buying datasets will go up, and thus a market opportunity exists to start snapping up datasets.


I disagreed with him.

I argued that the value of data is relative and supply is inelastic, meaning data is going to continue to be generated regardless of its market price – and much of it will be available free. Therefore, price will approach zero in the long run.

So who’s right?

The Price of Free

There is a clear trend toward Open Data. As government agencies and NGOs continue to release datasets, the amount of information available about the world grows. Anyone can access and download the U.S. Census data, for example, or download any of the 378,529 raw and geospatial datasets available at

Whenever there is a free alternative on the market, it inevitably puts pressure on pricing. While the free product will not serve the needs of the entire market, it will satisfy a chunk of it.

Marginal Cost of Zero

Most products have a marginal cost per unit sold. It costs more money to produce ten cars than it does one. In the software world, it costs more to support ten thousand users than it does one. In order to make a profit, you price your product above marginal cost and hope to do enough volume to recover your startup and fixed costs.

Datasets, on the other hand, have zero marginal cost. If I own a particular set of data, it costs me the same whether I sell one copy of it or one million copies. Let’s say I can sell one copy of a particular dataset for $10,000 or 2,000 copies for $100 each. Given that my cost is the same in either scenario, the math becomes pretty simple: $200,000 in revenue beats $10,000.
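To spell that comparison out, here is a rough sketch in Python. The prices and volumes are the ones from the example; the $5,000 fixed cost is a number I made up purely for illustration, and the `profit` helper is hypothetical.

```python
def profit(price: float, copies: int, fixed_cost: float,
           marginal_cost: float = 0.0) -> float:
    """Revenue minus the one-time fixed cost and any per-copy cost."""
    return price * copies - fixed_cost - marginal_cost * copies


# Assumed (made-up) cost of assembling the dataset once.
# It is identical in both scenarios because the marginal cost is zero.
FIXED_COST = 5_000.0

exclusive = profit(price=10_000, copies=1, fixed_cost=FIXED_COST)
volume = profit(price=100, copies=2_000, fixed_cost=FIXED_COST)

print(f"1 copy at $10,000:    ${exclusive:,.0f}")
print(f"2,000 copies at $100: ${volume:,.0f}")
```

Whatever fixed cost you plug in, it cancels out of the comparison, so with zero marginal cost the only question is which price and volume combination brings in more revenue.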

We see this happening in the digital goods world already. Seth Godin believes that if it costs zero to produce, then the price should be zero. And indeed, many eBooks can be downloaded for $0.99 or even free.

Data Markets

The evolution of data markets will inevitably change the dynamics of how datasets are exchanged, namely because they will increase supply and reduce scarcity. Today, if I want to acquire a dataset about a certain segment of Facebook users, I have to buy it directly from whoever is collecting it. In the near future, I’ll be able to go to a data exchange and have multiple options to choose from.

Also, in many cases data collection is getting cheaper. If someone has to drive a jeep to a remote village in Africa to collect survey data, that requires more resources than conducting a survey via SMS. If the same analysis or outcome can be achieved by the cheaper option, it will replace the old method.

Choice reduces price.

Slightly Aged and Still Unrefined

Most of us would never buy a barrel of unrefined oil. Unless you own a refinery, or you want to store it in your garage until the market price goes up and then sell it, the unrefined oil will have zero value to you. Gas has value to you because you can put it in your car and use it as fuel.

In fact, a barrel of unrefined oil sitting in your garage is actually costing you money to store. So the longer you hold on to it, the more it’s going to cost you.

The same holds true for big data. Just as a car depreciates when you drive it off the lot, data ages quickly and becomes less valuable over time. Storing it and maintaining it can quickly become expensive.

Only those who own a data refinery will find value in raw data.

So then why are companies able to charge for data?

While the selling of big data may be the new vogue, it’s not new. Privacy exposés on Facebook and guys like Acxiom may be bringing the market for data to the surface, but direct marketing companies have been accumulating data about people for years.

But here’s the thing: the companies that are selling data about people specialize in creating unique and valuable datasets. It’s a nontrivial exercise to compile a composite profile of someone. In some cases, they’re probably sourcing their data very cheaply and are able to charge a premium for what they’ve added in the process.

For example, the Economist Intelligence Unit (EIU), Infochimps and Factual are in the business of selling data. Taking a look at Infochimps, they sell access to 11,451 data sets (as of 12/10/12). However, all but three are available for free. The three they charge for offer some level of unique value, such as the ability to return census data by IP address.

On the other hand, if you want to purchase a copy of All China Province Data from the Economist Intelligence Unit, plan on spending $21,900 per year.

So what gives?

You Get What You Pay For

There is a difference between data and information.

In the cases above, neither EIU nor Infochimps is simply hawking spreadsheets. They’ve gone through some level of effort to cleanse the data and have added a layer of value on top of the raw dataset.

In the case of the EIU dataset, you’re paying for a fair amount of research, data collection, cleansing and contextualization, as well as for the completeness and freshness of the data.

Data subscriptions are also an interesting and promising business model. Data ages quickly, but if you’re continually updating it as things change, then there is chargeable value in that.

It’s in the way that you use it

Let’s face it, open data is cool. But only a fraction of what’s available will be used in any meaningful, or valuable, way. The real value of data isn’t in the data itself. It’s what you do with it, and how you do it.

For example, raw data that’s being collected by the National Data Buoy Center isn’t directly valuable to a fisherman off the coast of Maine, but an app on his mobile phone that predicts currents and storms is.

While the potential use cases for the data are interesting, the real value at scale will come through organizations that create those applications and take them to market.

Companies that create value-added applications are in the information market, not the data market.

Scarcity drives prices up

The other factors that will determine pricing are scarcity and proprietary ownership. The EIU dataset selling for $21,900 per year is a niche-focused and specialized set. There simply aren’t many other places you can go to get that level of data. And odds are, the handful of companies that have the need for that data also have deep enough pockets to pay for it.

Proprietary data is what companies or governments are collecting on their own about people or things. This type of information can be very valuable if it can deliver insight into monetizable actions.

Some companies will make a killing off of selling their proprietary data, Facebook potentially being one of them. But these will be few in number. It’s going to be increasingly tough to create and manage truly proprietary data.

Buy, Sell, Hold?

Without question, the data market will shape up to be an interesting one. Most raw data is going to be cheap or free. Refined or enriched data will drive its own market price based on the value it brings to the people that can use it.

Deep and proprietary data about things and people will be valuable, as well as targeted data sets that help organizations solve unique problems. But I think by and large, the value created will be at the contextualization and application layers.

The more complete, cleansed, refined, contextualized and scarce the data – the more it will be worth. Land grabbing data and trying to resell it is simply not a model that will make any money in the long run.

4 thoughts on “The Big Data Landgrab, Are You Missing Out?”

  1. Agree, Chris. In fact, even InfoChimps would agree with you. Six months ago, they were in the business of selling datasets. Now, when you visit their website, they are really no longer in the business of selling data, although they still do a bit of that. They are now in the business of helping companies set up data analysis engines in the cloud. They realized that land grabbing a bunch of data is really not all that valuable.
    What will be interesting to me in the coming years will be to watch whole industries to see who shifts to letting go of their data and who doesn’t. Will retailers start pooling all of their transactional data together into a Data Market so they can all make better decisions about merchandising, availability, upselling, cross-selling, etc.? Will insurance companies start pooling their data on claims to make better decisions together as an industry about risks in order to offer more choices or options to the uninsured or under-insured?
    My sense is that many of these companies and industries will begin to see that it is not just about their internal data, so they will begin to share data in verticals to enrich the customer experience and make smarter choices as a company.  Then they will realize that it is not just about their vertical, but there is a whole set of data external to the company that will enrich their data even more to bring new insights and business models or even whole new companies into the world to help solve problems. 
    I think the real value of Social Media, specifically Twitter and Facebook, long term will not be so much the ROI of Social Media and selling more stuff or “listening” to customers.  But instead the true value of these channels over the long term will be that they started to open the eyes of us all about the value of external data and how it can help companies layer external data on to internal data to learn more and make better decisions.

  2. Some really great points in your article, Chris. Agreeing with Rob’s comment: Infochimps still provides a data marketplace, but it is becoming a perk more than the main event. In creating that marketplace we pioneered and gained a lot of experience in a complex Big Data analytics stack. Now we’re packaging it and making it available as an “analytics platform” because, in my honest opinion, the market is still a little too young for a truly open data marketplace to fully shine. Companies are still trying to wrangle their data. They want visibility of the data they own before they can even begin to bring context to it. They want to be able to handle the real-time data flows of live customer transactions or off-the-wire tweets. We’re helping companies by abstracting away the complexity of a Big Data stack, while letting them get all the benefits of the applications you can build and insights you can generate, and it’s providing way more value than just providing a bunch of random data sets.

    Eventually, the data marketplace concept will return to the market as a whole. As data gets manageable, the issue will be context and richness, not visibility. Until then, however, we’ll help companies get to visibility and business value, and we can provide some initial context by taking data, either our own or someone else’s (like Experian, Gnip, etc.), and helping directly augment data with that context. Giving you the raw data set to then figure out what to do with is much more challenging for the customer.

    Regarding free versus paid data: it’s relatively easy to create a data storefront or catalog. It’s much more complex to allow people to upload or sell their own data. Cleanliness of data and detailed enough documentation are big issues in a crowd-sourced marketplace. Also, transaction sizes are typically small for the kind of data that most people want to sell. Larger-scale data companies like Experian or Epsilon have massive or highly specific/curated data sets that are much more expensive, and they create direct relationships with their customers. Overall, a complex market to navigate.

  3. Rob, I agree with your point about social being the gateway to understanding the value of data rather than being the end game itself. The idea of companies pooling their data to make smarter choices is an interesting one. It seems, though, that this would eliminate (or at least minimize) the competitive advantages that data can bring to an organization. My sense is that the first ones to figure it out aren’t going to be keen on sharing much. Obama, as an example, is apprehensive about sharing his prized data even within his own party.

    Although, if your hypothesis holds true then there will be a huge opportunity for data intermediaries…

  4. Tim, thank you for the very insightful comment. It’s interesting to see the different strata of data providers that you laid out. I also love where you guys are going in the marketplace. It’s an exciting space and you guys are leading the way.