01. Log analysis for pattern identification/process analysis.
02. Massive storage and parallel processing
03. Data mashup to extract intelligence from data
Health & Life Sciences
01. Health-insurance fraud detection
02. Campaign management
03. Brand & reputation management
04. Patient care and service quality management
05. Gene mapping and analytics
06. Drug discovery
Communication, Media & Technology
01. Real-time calls analysis
02. Network performance management
03. Social graph analysis
04. Mobile user usage analysis
Governance
01. Compliance and regulatory analysis
02. Threat detection, crime prediction
03. Smart cities and e-governance
04. Energy management
Big Data technologies provide a means of utilizing all available data through an integrated system.
A big data management architecture should be able to incorporate all possible data sources while keeping the Total Cost of Ownership (TCO) low.
You can choose either open-source frameworks or packaged licensed products to take full advantage of the functionality of the various components in the stack. There are four main big data layers, discussed below:
Data Sources Layer
The data sources layer operates at a different scale from traditional systems: many companies now work in the multi-terabyte and even petabyte range.
It incorporates structured, semi-structured, and unstructured data captured from transaction, interaction, and observation systems such as Facebook and Twitter.
This wide variety of data, arriving in huge volumes at high velocity, has to be seamlessly merged and consolidated so that the analytics engines, as well as the visualization tools, can operate on it as one single big data set.
Acquire/Ingestion Layer
The responsibility of this layer is to separate relevant information from the noise in the humongous data set arriving at different data access points.
This layer should be able to validate, cleanse, transform, reduce, and integrate the data into the big data tech stack for further processing.
Once the relevant information is captured, it is sent to the manage layer, where the Hadoop Distributed File System (HDFS) stores it across multiple commodity servers.
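The validate-and-cleanse step described above can be sketched in plain Python. This is a minimal single-process illustration only; the record fields (user_id, event) and the rules are invented for the example, not part of any specific ingestion framework:

```python
def validate(record):
    # Keep only records that carry the fields downstream jobs need.
    return "user_id" in record and "event" in record

def cleanse(record):
    # Normalize noisy free-text fields before storage.
    record["event"] = record["event"].strip().lower()
    return record

raw_feed = [
    {"user_id": 1, "event": "  CLICK "},
    {"event": "view"},                      # missing user_id: rejected as noise
    {"user_id": 2, "event": "Purchase"},
]

# Validate, cleanse, and pass clean records on toward the manage layer.
ingested = [cleanse(r) for r in raw_feed if validate(r)]
```

In a real deployment each of these steps would run as a distributed job over the incoming streams, but the shape of the logic is the same.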
Manage Layer
This layer is supported by the storage layer: robust and inexpensive physical infrastructure is fundamental to the operation and scalability of a big data architecture.
The layer also provides the tools and query languages to access NoSQL databases using the HDFS storage file system sitting on top of the Hadoop physical infrastructure layer.
Data is no longer stored on a monolithic server where SQL functions are applied to crunch it.
Redundancy is built into this infrastructure for the simple reason that we are dealing with large volumes of data from different sources.
The key building block of the Hadoop platform's management layer is MapReduce, a programming model that executes a set of functions against a large amount of data in batch mode.
The map function performs the distributed computation, while the reduce function combines the partial results to produce a final answer.
A classic MapReduce example is determining how many times each word appears in a document.
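The word-count example can be sketched in plain Python. This is a single-process illustration of the map and reduce phases, not a distributed Hadoop job; on a cluster, the map output would be partitioned and shuffled across many machines before the reduce phase runs:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each distinct word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

doc = "big data needs big storage and big compute"
word_counts = reduce_phase(map_phase(doc))
```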
Analyze & Visualize Layer
This layer provides data discovery mechanisms over the huge volume of data.
Such volumes call for fast search engines with iterative and cognitive approaches, and search engine results can be presented in various forms using “new age” visualization tools and methods.
Real-time analysis can leverage NoSQL stores (for example, Cassandra, MongoDB, and others) to analyze data produced by web-facing apps.
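The kind of rollup a NoSQL aggregation would run over live application data can be sketched in plain Python. The event documents below are invented for illustration and shown as plain dicts, standing in for records a web-facing app might write to a document store such as MongoDB:

```python
from collections import defaultdict

# Illustrative event documents, as a web-facing app might write them
# to a document store (fields "page" and "ms" are assumptions).
events = [
    {"page": "/home", "ms": 120},
    {"page": "/cart", "ms": 340},
    {"page": "/home", "ms": 95},
]

# Group by page and compute average latency -- the kind of rollup a
# NoSQL aggregation pipeline would perform over live data.
totals, hits = defaultdict(int), defaultdict(int)
for e in events:
    totals[e["page"]] += e["ms"]
    hits[e["page"]] += 1
avg_latency = {page: totals[page] / hits[page] for page in totals}
```

In a real document store, this grouping would be expressed in the store's own aggregation language and executed server-side rather than in application code.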
Before we get to the definition and introduction of big data, we need to understand why big data technology is required at all when we already have high-performance, reliable relational database management systems (RDBMS).
Why Big Data
In relational databases, data is stored in a structured format using data modeling techniques such as entity-relationship modeling, star schemas, or snowflake schemas.
Initially this was just transactional data; as the data grew over time, organizations started analyzing it using data marts and data warehouses.
Business intelligence built on top of data marts and data warehouses is a key driver for CxOs to make forecasts, define budgets, and identify new drivers of market growth.
Until the internet era, business intelligence analysis was done on enterprise data alone. In the internet era, however, data existing outside the enterprise became a key input to strategic decisions.
Things started getting more complex in terms of the variety, velocity and volume of data with the advent of social networking sites and search engines such as Google, Yahoo, and Bing.
Businesses need a pragmatic approach to capturing this information in order to survive and gain a competitive advantage over other vendors.
That means collecting data generated from a variety of sources, such as images, streaming video, social media feeds, text files, documents, and sensor data, so they can respond and innovate quickly around customer needs.
The solution to this problem is big data; however, the unstructured or semi-structured nature of the data, and the velocity at which it is created, are the real challenges for big data systems.
Big Data Definition
Let us go through the definition below to understand what big data is.
Big data is a term that describes the large volume of data, both structured and unstructured, that a business generates on a day-to-day basis.
However, it’s not the amount of data that’s important; the idea of big data is, bluntly, how to use that data to maximize sales and minimize costs in order to increase the profit margin.
Organizations are discovering that important predictions can be made by sorting through and analyzing Big Data.
Data is the new oil. There are thousands of companies whose entire business is collecting data: no manufacturing plant, no supply chain strategies, just data collection.
Big Data in Action – Examples of Big Data Analytics
The American retail company Walmart collects 2.5 petabytes of unstructured data from 1 million customers every hour, which is equivalent to 167 times the books in America’s Library of Congress.
With tons of unstructured data being generated every hour, Walmart is improving its operational efficiency by leveraging big data analytics.
One of the finest applications Walmart has built is the Savings Catcher application, which alerts a customer whenever a neighboring competitor reduces the price of an item the customer has already bought.
The application then sends the customer a gift voucher to compensate for the price difference. It runs on top of the tons and tons of data Walmart collects every hour.
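The matching logic behind such a price-tracking application can be sketched in a few lines of Python. This is an assumed, simplified reconstruction, not Walmart's actual implementation; item names and prices are invented:

```python
def savings_vouchers(purchases, competitor_prices):
    # For each item the customer bought, issue a voucher for the
    # difference if a nearby competitor now advertises a lower price.
    vouchers = {}
    for item, paid in purchases.items():
        competitor = competitor_prices.get(item)
        if competitor is not None and competitor < paid:
            vouchers[item] = round(paid - competitor, 2)
    return vouchers

purchases = {"cereal": 4.99, "milk": 2.49}         # what the customer paid
competitor_prices = {"cereal": 4.49, "milk": 2.59} # latest local ad prices
vouchers = savings_vouchers(purchases, competitor_prices)
```

At Walmart's scale, the hard part is not this comparison but continuously ingesting and matching competitor price feeds against millions of receipts.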
The universe of big data is full of customer reviews and feedback from people talking about particular products through communication channels such as Facebook, Twitter, and product review forums.
It is important for organizations to understand and analyze what customers say about their goods and/or services to ensure customer satisfaction.
By sorting through and analyzing big data, organizations can make important predictions, such as gauging customer sentiment, which gives them a clear picture of what they need to do to outperform their competitors.
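At its simplest, sentiment analysis can be sketched as keyword scoring, as below. Real systems use trained language models rather than word lists; the word lists and reviews here are invented for illustration:

```python
# Tiny illustrative lexicons; production systems learn these from data.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "broken", "terrible"}

def sentiment(review):
    # Naive keyword scoring: count positive vs. negative words.
    words = set(review.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

labels = [sentiment(r) for r in [
    "love this product and the excellent build",
    "terrible support and a broken charger",
]]
```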
Therefore, big data can be analyzed for insights that lead to better decisions and strategic business moves.