We have all heard the horror stories surrounding data breaches at household name companies in various industries. Our shopping preferences, financial habits and even personal health records are being looted at will. Data breaches threaten millions of people, cost billions in lost revenue and goodwill, and represent such a serious threat to a company’s prosperity that several industries have been launched to combat the plague.
So far, our efforts have been focused on preventing the intrusion. We have built giant walls to keep the barbarians from entry but have left the treasury room unsecured. We are focused on stopping an intrusion rather than protecting the data itself. This paper will focus on how to use the BI tools we already have to help protect that data and do it as close to the source as possible.
Locking the “data doors”
We’ve seen the enemy’s ability to pick locks, no matter how sophisticated the lock. All technical security systems will fail. Why? Because the enemy is relentless in trying to break them. Studies show that, on average, breaking in and stealing information takes hours, but detection takes 7 months. The most secure system is one that you don’t even realize is in place.
Several years ago, I designed a login system (back in the DOS days) that did nothing but return the command prompt “>” if you entered the password incorrectly. It even mimicked some basic DOS commands (e.g. CLS to clear the screen). The hacker believed they were in the DOS command shell without realizing they were trapped in a security loop. They tried to break an OS and an application that weren’t there rather than trying to defeat a security system. It’s all about using mindset, distraction, and illusion, just as good magicians do.
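To make the idea concrete, here is a minimal sketch of such a decoy in Python. It is not the original DOS-era system: the class name, the canned replies, and the command table are all hypothetical, chosen only to illustrate the "fake prompt" loop described above. A wrong password gets nothing but a bare prompt, and every command typed afterwards is logged and answered from a lookup table, never executed.

```python
import hmac

PROMPT = ">"  # the bare prompt the decoy always falls back to

# Hypothetical canned replies for a few DOS-like commands.
# Nothing here touches a real shell or file system.
FAKE_COMMANDS = {
    "CLS": "\n" * 25,         # pretend to clear a 25-line screen
    "DIR": "File not found",  # pretend the directory is empty
}


class DecoyLogin:
    """Illustrative decoy: a failed login looks like a successful shell."""

    def __init__(self, real_password: str):
        self._real_password = real_password
        self.trapped = False  # True once someone has failed the password
        self.log = []         # everything the intruder types, for later review

    def submit_password(self, attempt: str) -> str:
        # Constant-time comparison so timing does not leak the password.
        if hmac.compare_digest(attempt.encode(), self._real_password.encode()):
            return "LOGIN OK"
        # No error message: just show a prompt, as if login had succeeded.
        self.trapped = True
        return PROMPT

    def handle(self, line: str) -> str:
        """Answer a command typed at the fake prompt without running anything."""
        self.log.append(line)
        command = line.strip().upper()
        if not command:
            return PROMPT
        reply = FAKE_COMMANDS.get(command, "Bad command or file name")
        return f"{reply}\n{PROMPT}"


decoy = DecoyLogin("s3cret")        # hypothetical password
print(decoy.submit_password("guess"))
print(decoy.handle("dir"))
```

The design choice worth noting is that the failure path gives no signal at all. The intruder spends time attacking a shell that does not exist, while the log quietly records what they were looking for.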
Why now? Big Data and how it’s used
All of this has come about because we can now work with very large amounts of data. The amount of data generated every minute of every day is staggering (Figure 1). New tools, like Hadoop, are designed to allow the fastest, most economical, and most ubiquitous access possible. This opens new vistas in information and intelligence, but the problem is that it works too well. These tools are designed to provide unfettered access, not to be secure.
We, as an industry, have become complacent in our concerns around data security. Mobile apps, as well as increasingly smaller storage media (Snowden’s stolen documents will fit on a micro-SD card the size of a fingernail), make theft of critical information easy and widespread. Still, we compartmentalize security concerns to the routers and networks, assuming that technology will protect us.
Figure 1. Amount of data traversing the internet every minute (Source: Intel)
How to use BI to help minimize breaches
- Masking, Encrypting and Obfuscation – Either use the masking tools that come as part of an ETL package, or create a table of masked values that tokenizes the actual data, and report on the masked values instead.
- Reduce the circle of influence – By placing security as close to the data as possible, you minimize the risk of human-caused leaks. The fewer people who know the actual restricted values, the smaller the risk of an outside force co-opting your staff.
- Understand your data – There is no substitute for knowledge about the data. You should know whether it is important enough to be masked, how it is stored, and how it is accessed. It’s not economically feasible to mask everything, nor is it conducive to performance. Knowing your data at an atomic level will allow you to group it into secure clusters and enhance performance.
- Build a compelling business case for a BI solution – The cost per breach at the record level in the US is now over $200. A simple spreadsheet that multiplies this figure by the number of records stolen in any reported breach would provide a compelling ROI for this approach.
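The "table of masked values" idea from the first bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production design: `TokenVault`, `mask_rows`, the `tok_` prefix, and the sample rows are all hypothetical, and a real deployment would keep the mapping in a separately secured store rather than in memory.

```python
import secrets


class TokenVault:
    """Hypothetical lookup table mapping sensitive values to random tokens."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always masks the same way,
        # which keeps joins and group-bys in reports working.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            while token in self._value_by_token:  # extremely unlikely collision
                token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the real value.
        return self._value_by_token[token]


def mask_rows(rows, sensitive_columns, vault):
    """Return copies of the rows with the sensitive columns replaced by tokens."""
    return [
        {
            column: vault.tokenize(value) if column in sensitive_columns else value
            for column, value in row.items()
        }
        for row in rows
    ]


# Made-up sample data; the identifiers are deliberately fake.
vault = TokenVault()
source_rows = [
    {"customer_id": "000-00-0001", "purchase_total": 120},
    {"customer_id": "000-00-0001", "purchase_total": 35},
    {"customer_id": "000-00-0002", "purchase_total": 80},
]
report_rows = mask_rows(source_rows, {"customer_id"}, vault)
```

Because the tokens are random rather than derived from the data, someone who steals the reporting table learns nothing about the real identifiers, yet the report can still count purchases per customer since equal values share a token.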
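The business-case arithmetic from the last bullet can also be written out. The sketch below uses the $200 per record mentioned above as a floor; the record count and the solution cost are invented purely for illustration, and the calculation makes the simplifying assumption that masking would avoid the entire loss.

```python
COST_PER_RECORD_USD = 200  # floor taken from the text ("over $200" per record)


def breach_exposure(records_at_risk: int,
                    cost_per_record: int = COST_PER_RECORD_USD) -> int:
    """Estimated cost in USD if every record at risk were stolen."""
    return records_at_risk * cost_per_record


def simple_roi(avoided_loss: float, solution_cost: float) -> float:
    """Classic ROI: net gain divided by what the solution cost."""
    return (avoided_loss - solution_cost) / solution_cost


# Hypothetical inputs, not figures from any real breach or product.
records = 100_000
solution_cost = 500_000

exposure = breach_exposure(records)
roi = simple_roi(exposure, solution_cost)

print(f"Exposure: ${exposure:,}")  # Exposure: $20,000,000
print(f"ROI: {roi:.0%}")           # ROI: 3900%
```

Swapping in the record count from any reported breach, and a realistic estimate of how much of the loss masking would actually prevent, turns this into the spreadsheet the bullet describes.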
The barbarians are at the gate and they will get in. Make sure that when they breach the walls and open the vault, all they see are useless bit strings. As any good home alarm company will tell you, the criminals will soon move on to greener pastures.