To many people, “big data” is a ridiculous idea, not because they don’t understand it but because they are already quite comfortable managing, curating and acting on huge swathes of data. Many of us, however, see big data as a serious problem, possibly because we are too naïve, too lazy or too limited by our own imaginations to solve it.
Depending on who you believe, the phrase has been around since the mid-1990s. So what has actually changed since then? Surprisingly little. Let’s start by clarifying what we mean by “big data”. IBM and SAS describe it in terms of dimensions. The intrinsic qualities are: volume, variety, complexity, velocity, variability and veracity. Volume describes the sheer size of the data, and the growth enabled by high-volume storage. The challenge here is determining what is relevant, and how to mine it and act on it.
Variety and complexity cover the types of data and their various sources and formats: text, code, audio, video, other non-numeric data and so on. They also cover data governance, and the issues presented by trying to normalise data for decision-making. Velocity is about the rate of data production and demand. In environments where data is created in near real-time (think smart metering or website analytics), the challenge is reacting to it in a timely fashion.
Variability is a little less intuitive. It refers to trends, seasonality and inconsistencies in data flows during peak periods, which make the data troublesome to manage. Finally we come to veracity, and for me it is the most important dimension. IBM estimate that one in three business leaders are not confident in the truthfulness of the information on which they base decisions. So we can have a scenario where we are dutifully dealing with the ideals of big data, yet a third of what we are handling is seen as untrustworthy.
We all already have access to business information that fits some or all of these criteria. We already make decisions based on it, and yet we continue to be profitable without unfurling the Big Data banner. That raises the question: why are there so many big data initiatives being pushed? Look at business information today and what do you see? Large volumes of data, some of it structured, but the vast majority profoundly unstructured.
We then compound the problem of unstructured data with a desire for ever more information, from our extended environment and beyond. Often I hear “we need more information”, “why aren’t we collecting that data?” or “get me that information and then I can make a decision”. What all of this lacks are the whys and the whats: why do you need it, why should you trust it, what is it, and what are you going to do with it, assuming you do get it?
There are plenty of vendors that will help you with big data, using their ever-faster processors, smarter technology, better analysis and expert engines to let you correlate and manage it. What is lacking is the maturity of the people in the organisations who will be using these systems. This is not to say that management teams do not know what they want; usually far from it. However, the end result usually falls short of the goal or expectation they started with. Obsessing over ‘Big Data’ is highly likely to produce exactly that kind of shortfall.
George Santayana said, “Those who cannot remember the past are condemned to repeat it”, so let us take a slight detour into history. At the turn of the century we had the dot-com bubble and Y2K. All of the things that people had learned about running a business were set aside. What did we see? Some businesses succeeded, more failed, and a large number were left with a big hole in the bottom line because of the money they had spent. Are we going to see the same with ‘Big Data’? The likelihood is that we are, unless senior managers think carefully and use their cumulative experience to guide their approach.
Don’t get me wrong: big data is a valid approach, and most businesses need it to inform their product, service or market intelligence, for example. As businesses we have been doing this for years. What has changed is the volume of data we now hold, coupled with the ease of data creation and collection. So we have to find new ways to use this data, and more importantly we have to make those uses meaningful. This is the area where vendors and technology cannot help.
People are the key asset in making big data work, but they are also its biggest problem. Why? Because we all create data, consume it and correlate it, but we never actually go back and tidy up or maintain it. Fundamentally, we are all a little bit lazy. Is it possible that “Big Data” is as much a response to messy housekeeping as it is to a striving for agility and ordered decision-making?
Asset and data management is not a sexy subject, but it is where any data-driven project needs to start. It is an area that is significantly underfunded and unloved by most organisations, and are we surprised? No. Why would it be funded? It does not directly make or save money for the organisation. Without it, though, it is impossible to put any building blocks in place. The following are areas that asset and data management affect, and as they are important you really ought to be considering them:
Remember that big data is not driven by technology or systems; it is driven by need, and by what you are trying to achieve. This has to provide real business benefit, and it will dictate the data you actually require. If you can resolve the above areas there will be a lot less data in the first place, making the whole exercise timelier and providing a competitive edge. Fundamentally you still have to ask: does having more data actually make decisions any easier?
Big Data is likely to be a programme that we all run within our organisations. To achieve it, start small, define the scope and stick to it. What a business actually needs in order to achieve Big Data is ‘big data’ discipline and governance. Once that is in place, the quality of the data can be improved and Big Data becomes business-as-usual data.
So who put the ‘I’ in Big Data: the person who creates the data, or the person who wants to use it?