The accurate creation, movement and storage of data is now big business for the corporations that guarantee the integrity, velocity, veracity and validity of the information they store, retrieve and present to their customers. Historically, this information was stored on printed pages, or as magnetic variations on reels of tape, and archived in secure, environmentally controlled long-term storage facilities; today's storage densities let us retain more data (relevant and irrelevant alike) on hard disk for far longer.
This online, all-the-time storage and retrieval allows huge volumes of data to be accessed near-instantly and then processed. Manipulation and analysis of the information yields insight into patterns and behaviour, and generates metadata (data about data) that is itself stored away for later near-instant analysis.
The granularity and detail of the stored information raises regulatory-compliance issues: what data is stored about whom, how its accuracy can be challenged in the future, and how its long-term storage is secured.
The regulatory trend is to require ever more business records to be retained for longer periods, to aid in litigation and to defend legal positions. Gathering information on customers and the market has become key to the success of many organisations and, in some cases, the saleable commodity they bring to market. Securing the big data in such projects is essential: holding the information is valuable, but losing it could do devastating damage to the business's reputation.
While the data collected and stored may be anonymised, correlating different data sets with reasonable analysis may still identify individuals and their habits, creating a data protection problem both while the data is at rest on a hard drive and while it is in motion on the wire.
The amount of data is staggering: EMC calculated that in 2013 the open internet held 4.4 zettabytes (ZB), that is, 4.4 trillion gigabytes. The trend indicates that this figure doubles approximately every two years, a growth rate reminiscent of Moore's law. Extrapolating the trend suggests that by 2020 the data stored on the internet will exceed 44 zettabytes. Such numbers can be difficult to comprehend, simply because they are so far outside everyday experience. A zettabyte is 1,000,000,000,000,000,000,000 (10^21) bytes. To help picture this: there are roughly 100 billion stars in an average galaxy and roughly 100 billion galaxies in the observable universe, so by 2020 there will be more bytes of storage on the internet than stars in the observable universe. That is quite a lot!
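The extrapolation above can be sketched in a few lines of code. This is a minimal illustration, not EMC's own model: it assumes simple exponential growth from the quoted 4.4 ZB baseline in 2013 with a two-year doubling period, and the function name is our own.

```python
# Sketch of the growth extrapolation quoted above:
# 4.4 ZB in 2013, doubling roughly every two years.
def projected_storage_zb(year, base_year=2013, base_zb=4.4, doubling_years=2):
    """Return projected internet storage in zettabytes for a given year,
    assuming simple exponential (doubling) growth."""
    return base_zb * 2 ** ((year - base_year) / doubling_years)

print(round(projected_storage_zb(2020), 1))  # 49.8 -- the same ballpark as the ~44 ZB estimate
```

A strict two-year doubling actually lands slightly above the 44 ZB figure for 2020, which is consistent with "approximately every two years" being a rounded rule of thumb rather than an exact growth rate.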
The Live Internet Stats Project monitors more than a billion internet services, giving real-time statistics on their usage. The information really does put all this data into perspective. Head over to the project's page for both live data and a snapshot of a second in the life of the internet.
Axial Systems provide solutions that touch every part of the big data (r)evolution – observing, reporting, optimising and securing big data – your data.