We are living in a digital era in which data is becoming big, learning is becoming deep, and the volume of data is increasing exponentially [9]. Artificial Intelligence, Cloud Computing, and Big Data are the talk of the town in the 21st century, and Big Data is the gold mine of this century [52]. In the years to come, I may get a call from a psychological clinic telling me that, based on my tweets from the last month, I am suffering from depression. I may travel in a driverless car, and my every move and keystroke may be tracked. My daughter may consult an online course selection system that recommends the course best suited to her, based on her academic records and on psychometric tests taken online. She may enroll in that course and receive instant materials, feedback, and guidance from the system, along with links that help her understand the topics better. This may be the scenario of the near future.
From learning analytics to business intelligence, from the health sector to sensor data, the reign of big data continues. McKinsey predicted Big Data to be the next frontier for research and innovation [9]. Big data analytics may be categorized as predictive, prescriptive, and descriptive analytics [51]. The important task is to search for pearls in the sea of big data. Big data is data that cannot be processed on a single present-day machine or stored in a traditional database system. It is generated from social media, sensor networks, banking transactions, satellite imaging, biomedical projects, business data, genome data, and so on. By 2020, the number of internet users is expected to reach 5 billion, up from the current 2 billion, with 50 billion connected devices [18]. The predicted volume of data is 44 times the present volume. 90% of the existing data was generated in the last two years [33]. In 2012, the American administration invested 200 million dollars in Big Data research [18].
Most of the generated data is semi-structured or unstructured, so structured databases cannot handle it. Big data therefore comes with various challenges: its volume is huge, it is generated at high speed, and it is heterogeneous in nature. To appreciate the rapid growth of unstructured data, one may look at the use of YouTube, Facebook, Twitter, Instagram, Google+, etc. On YouTube, 100 hours of video are uploaded per minute; Facebook users upload 100 TB of data daily; Twitter users publish 175 million tweets daily; 40 million photos per day are uploaded to Instagram; Google+ creates 1 billion accounts per day; and the list goes on. One of the biggest challenges is how to accumulate this huge volume of generated data. Laney defined big data by 3V characteristics [9]: Volume (the data is huge), Velocity (the data arrives at a high rate), and Variety (the data comes in different formats). Big data is now characterized by 12 Vs: Volume, Velocity, Variety, Veracity, Value, Validity, Viscosity, Visualization, Virality, Volatility, Variability, and Visibility.
To process Big Data, Apache Hadoop is a well-established platform. It implements Google’s MapReduce computational paradigm, which divides an application into pieces and processes the pieces in parallel [18]. Hadoop is thus a powerful programming framework that can process huge volumes of data in parallel across clusters in an effective, fault-tolerant way. In the MapReduce framework, Map() and Reduce() are the two main functions. In Hadoop's distributed computing paradigm, there is one master module, called the JobTracker, and many slave modules, called TaskTrackers [33]. The Map() function processes the huge input in parallel and turns it into (key, value) pairs, and the Reduce() function merges the values that share a key. Analytics may then be applied to that output to find interesting and valuable information for decision-makers.
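The Map() and Reduce() roles described above can be illustrated with a minimal word-count sketch. It runs in a single process and does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical and chosen only for illustration. The explicit grouping step is an assumption about how (key, value) pairs reach the reducer, modeled here with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, standing in for the step between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: merge all values that share a key into one result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each split would be mapped on a different node in
    # parallel; here the splits are processed sequentially (illustration only).
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input splits.
    splits = ["big data is big", "data is everywhere"]
    print(word_count(splits))
```

Because each call to map_phase depends only on its own split, and each call to reduce_phase depends only on the values of one key, both steps could in principle be distributed across machines, which is the property the MapReduce paradigm relies on.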
MapReduce scales across large numbers of computing systems and can process zettabytes of data using batch processing. The Hadoop ecosystem has a variety of components, namely HBase, Pig, Hive, HCatalog, Oozie, ZooKeeper, and Kafka, alongside the MapReduce paradigm and the Hadoop Distributed File System (HDFS), which are used extensively for Big Data [9]. Hadoop has limitations as well. For efficiency, data is replicated in multiple locations, which makes Big Data even bigger. Hadoop is a complicated system with very limited SQL support. Privacy and security are further concerns for Big Data. So, Big Data comes with big challenges and opportunities [52].
The rest of the paper is organized as follows: Section II presents the literature review, Section III describes Big Data in various fields, Section IV presents the Big Data methodology, Section V describes the Learning Analytics Model, and Section VI presents the conclusion.