Monday, June 15, 2015

BIG DATA


The traditional approach was to extract data from distributed systems, store it in a centralized data warehouse, and build analytics on top of that warehouse. These central warehouses were relational database management systems, used mainly to process structured data such as financial transactions, shipping data and employee information. However, technological advances on several fronts have made it possible to generate large volumes of data, both structured and unstructured, at high frequency and from a wide variety of sources. This led to a tipping point at which the traditional architecture was no longer scalable or efficient enough to store, process and retrieve such large data sets.

Definition
Big Data is the term used to describe the exponential growth, availability and usage of structured and unstructured data. It is becoming a global phenomenon across all sectors, including government, business and society, because large volumes of data are now available together with efficient and affordable processing technologies, which leads to more accurate, fact-based analyses. The more accurate the analyses, the better the understanding of business problems, opportunities and threats, and the better the decisions made in pursuit of operational excellence.
The relevance of Big Data has grown because of the 3Vs of data: Volume, Velocity and Variety. Volume refers to the large amount of data being generated through a variety of sources. This data can be structured, such as financial transactions, invoices and personal details, or unstructured, such as Twitter/Facebook messages, reviews, videos and images. Velocity indicates the speed at which new data is being generated; social media messages and sensor data are examples of large volumes of data generated at high velocity. Variety refers to the different types of data available for storage, retrieval and analysis.

Difference between Big Data and Traditional Data
Traditional data includes documents, financial records, stock records and personal files, which are mostly structured and mostly human generated. Big Data, by contrast, mainly refers to the large volume of data generated through a variety of sources: not just humans but also machines and processes. This data can be social media content, sensor data, RFID data or scientific research data, with record counts in the millions or billions and a correspondingly large storage size.

Why Big Data Matters
Over the last few years, businesses have witnessed rapid growth in data, whether in retail, logistics, finance or healthcare. Many factors contribute to this growth. The total cost of application ownership is falling as application providers offer cloud- and subscription-based solutions. The Internet is becoming accessible to more and more end customers thanks to affordable smartphones and higher bandwidth (4G). Social media is becoming part of the mainstream of daily life, business and politics. More and more objects are being connected to the Internet (popularly known as the Internet of Things), generating data at each stage of the supply chain or value chain of a business. Data is generated not only by humans but also by machines, in the form of RFID feeds, medical records, sensor data, scientific data, service logs, web content, maps, GPS logs and so on. Some data is generated so fast that there is not even time to store it before analytics must be applied to it.
This exponential growth of structured and unstructured data, arriving at ever higher speed, is forcing businesses to explore it rather than ignore it. To remain competitive, businesses have to leverage the data available inside and outside the organization and use it in the decision-making process, whether the goal is to understand customers, suppliers, employees or internal and external processes.

How to get a handle on analyzing Big Data
Getting a handle on analyzing Big Data starts with an analysis of the company's business environment and strategies, and with an understanding of the sources of data and their relevance to the business. It is also important to know how data-driven companies and other analytical competitors are exploiting Big Data to achieve strategic advantage. Understanding Big Data analytics involves not only the technical aspects of Big Data infrastructure such as Hadoop, but also the logical aspects of analytics, such as data modeling, data mining and their application in the business to make better decisions. Literature reviews and research alone are not sufficient to give a company confidence that Big Data analytics is the way forward. Instead, a company can start a pilot project that focuses on one strategic aspect of the business where Big Data analytics could add value to the decision-making process.

Hadoop and MapReduce
Hadoop is an open source software framework that enables the processing of large data sets across distributed clusters of commodity servers. Its main advantage is scalability: a cluster can grow from a single server to thousands of servers that process data in parallel. Hadoop has two main components: a) HDFS and b) MapReduce. HDFS, the Hadoop Distributed File System, spans all the nodes of a Hadoop cluster. Unlike an RDBMS, it does not impose a schema, so it can easily store different types of structured and unstructured data. Hadoop is also fault tolerant: data is replicated across nodes, so if one node fails, processing can continue using the copies held on other nodes.
MapReduce is at the core of Hadoop. It is a highly scalable, cluster-based data processing technology that consists of two main tasks, a) Map and b) Reduce, executed in that order. The “map” task takes a set of data from the source and converts it into key/value pairs. The “reduce” task then takes the key/value pairs output by the “map” task and combines (reduces) those tuples (rows) into a smaller number of tuples. For example, suppose there is a requirement to collect Twitter data about a newly released song for sentiment analysis. The chosen phrases are “I love it”, “I like it” and “I hate it”. The map task looks for these keys in each data set stored on a node of HDFS and computes a value (count) for each one; the reduce task then combines the results from all nodes into the final totals (e.g. the final count for each key). In this way, a vast volume of data can be processed by scaling out the cluster.
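
The sentiment-count example can be sketched in plain Java to show how the two steps fit together. This is only an in-memory illustration and does not use the Hadoop API: the class name SentimentCountSketch, the methods map, reduce and run, and the sample tweets are all assumptions made for this sketch, and each inner list stands in for the data held on one node.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: mimics the map and reduce steps in memory.
// No Hadoop classes are used; every name here is hypothetical.
class SentimentCountSketch {

    // The phrases chosen in the example above.
    static final List<String> PHRASES =
        Arrays.asList("I love it", "I like it", "I hate it");

    // "Map" step: for one partition of tweets, emit a (phrase, 1) pair per match.
    static List<Map.Entry<String, Integer>> map(List<String> tweets) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String tweet : tweets) {
            for (String phrase : PHRASES) {
                if (tweet.contains(phrase)) {
                    pairs.add(new AbstractMap.SimpleEntry<>(phrase, 1));
                }
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the values of all pairs that share the same key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step on every partition, then reduces the combined output.
    static Map<String, Integer> run(List<List<String>> partitions) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<String> partition : partitions) {
            all.addAll(map(partition));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        // Two partitions, standing in for data stored on two nodes.
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("I love it, great song", "I hate it"),
            Arrays.asList("I love it so much", "meh", "I like it"));
        System.out.println(run(partitions));
    }
}
```

In a real cluster the map calls would run in parallel on the nodes that hold the data, and the framework would group the pairs by key before the reduce step; here both steps run sequentially in one process.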

Conclusion
Data is becoming the energy of the 21st century, a very important resource for every organization. The most important challenge will be to implement a sustainable, affordable and stable solution that can bring insights and knowledge out of the mountains of data generated by humans and machines every second. Big Data solutions like Hadoop are increasingly being adopted by industries to exploit their structured and unstructured data, creating value through the cycle of data, information, insight, knowledge and intelligence, and developing strategic capability as analytical competitors. Despite the challenges, more and more organizations are expected to join this bandwagon in the near future.


Reference: A Blog from Javra Software Nepal.

Saturday, May 30, 2015

A Few Examples for Understanding Java

1. Write a Java method removeDuplicates that removes all duplicates in a given list.

 Solution:

import java.util.ArrayList;

public class DuplicateTest {

    public static void main(String args[])
    {
        ArrayList<String> list=new ArrayList<String>();
        list.add("good");
        list.add("better");
        list.add("best");
        list.add("best");
        list.add("first");
        list.add("last");
        list.add("last");
        list.add("last");
        list.add("good");
       
        System.out.printf("List Before Duplicate Removal:%s",list);
       
        removeDuplicatesMethod(list);
       
        System.out.printf("\nList After Duplicate Removal:%s",list);
    }

    private static void removeDuplicatesMethod(ArrayList<String> list) {
        // Compare each element with every later element and drop the repeats.
        for(int i=0;i<list.size();i++)
        {
            for(int k=i+1;k<list.size();k++)
            {
                if(list.get(i).equals(list.get(k)))
                {
                    list.remove(k);
                    // Later elements shift left after a removal, so re-check index k.
                    k--;
                }
            }
        }
    }
   
   
}
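
The nested loops above compare every pair of elements, which takes quadratic time on long lists. As a possible alternative, here is a minimal sketch that relies on java.util.LinkedHashSet, which keeps insertion order and ignores repeated elements. The class name DedupSketch is made up for this example, and the sketch assumes the list is modifiable and its elements implement equals and hashCode consistently (as String does).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class, not part of the original solution.
class DedupSketch {

    // Removes duplicates in place, keeping the first occurrence of each element.
    static <T> void removeDuplicates(List<T> list) {
        // A LinkedHashSet remembers insertion order and drops repeated elements.
        LinkedHashSet<T> seen = new LinkedHashSet<>(list);
        list.clear();
        list.addAll(seen);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList(
            "good", "better", "best", "best", "first", "last", "last", "last", "good"));
        removeDuplicates(list);
        System.out.println(list); // prints [good, better, best, first, last]
    }
}
```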

2.  Write a Java method testForSum which determines whether a given array of integers contains three entries whose sum is equal to a given integer.
Solution:

public class TestSumImplementation {
   
    public static void main(String args[])
    {
      
        int[] testdata={5, 1, 23, 21, 17, 2, 3, 9, 12};
        int sum=5;
      
        boolean result= testForSum(testdata,sum);
        System.out.print(result);
    }

    private static boolean testForSum(int[] testdata, int sum) {
        // Try every combination of three distinct positions i < j < k.
        for(int i=0;i<testdata.length-2;i++)
        {
            for(int j=i+1;j<testdata.length-1;j++)
            {
                for(int k=j+1;k<testdata.length;k++ )
                {
                    if(testdata[i]+testdata[j]+testdata[k]==sum) return true;
                }
            }
        }
        // No triple adds up to the requested sum.
        return false;
    }
  
   
}
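
The triple loop above checks every combination, which is cubic in the array length. One common alternative, sketched here under a few assumptions, is to sort a copy of the array and then, for each first element, move two pointers inward from both ends, which brings the work down to quadratic time. The class name ThreeSumSketch is made up for this example, and the sketch adds the three values as long to avoid int overflow, so it may differ from the original on extreme inputs.

```java
import java.util.Arrays;

// Hypothetical alternative to the triple loop; not the original solution.
class ThreeSumSketch {

    // Returns true if three entries at distinct positions add up to sum.
    static boolean testForSum(int[] data, int sum) {
        // Work on a sorted copy so the caller's array is left untouched.
        int[] sorted = data.clone();
        Arrays.sort(sorted);
        for (int i = 0; i < sorted.length - 2; i++) {
            int lo = i + 1;
            int hi = sorted.length - 1;
            while (lo < hi) {
                // Use long arithmetic so large values cannot overflow.
                long total = (long) sorted[i] + sorted[lo] + sorted[hi];
                if (total == sum) {
                    return true;
                }
                if (total < sum) {
                    lo++;   // need a larger total
                } else {
                    hi--;   // need a smaller total
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] testdata = {5, 1, 23, 21, 17, 2, 3, 9, 12};
        System.out.println(testForSum(testdata, 5)); // prints false
        System.out.println(testForSum(testdata, 6)); // prints true (1 + 2 + 3)
    }
}
```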

 3. Create your own linked list

Solution:

public class ListNode {
    Object data;
    ListNode nextNode;
  
    public ListNode(Object object)
    {
        data=object;
    }

    public ListNode(Object object,ListNode Node)
    {
        data=object;
        nextNode=Node;
    }
  
    Object getObject()
    {
        return data;
    }
  
    ListNode getNext()
    {
        return nextNode;
    }
}


public class LinkList {
    private ListNode firstNode;
    private ListNode lastNode;
    private String name;
   
    public  LinkList()
    {
        this("list");
    }

    public  LinkList(String listName)
    {
        name=listName;
    }
   
    // Adds the object at the front of the list, so the newest element comes first.
    public void add(Object obj)
    {
        if(isEmpty())
            firstNode=lastNode=new ListNode(obj);
        else
            firstNode=new ListNode(obj,firstNode);
    }
   
    public boolean find(Object obj)
    {
        // Walk every node, starting with the first one.
        ListNode current=firstNode;
        while(current!=null)
        {
            // Compare by value with equals(), not by reference with ==.
            if(obj==null ? current.data==null : obj.equals(current.data))
                return true;
            current=current.nextNode;
        }
        return false;
    }
   
   
    public boolean isEmpty()
    {
         return firstNode==null;
    }
   
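    public String toString()
    {
        if(isEmpty())
            return "Empty List";
        else
        {
            // Visit every node from first to last, separating entries with commas.
            ListNode current=firstNode;
            String result="[";
            while(current.nextNode!=null)
            {
                result+=current.data+",";
                current=current.nextNode;
            }
            // current is now the last node, which is printed without a trailing comma.
            result+=current.data+"]";
            return result;
        }
    }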
   
    public static void main(String args[])
    {
        LinkList list=new LinkList();
       
        list.add("Straight");
        System.out.println(list.toString());
        list.add("Bent");
        System.out.println(list.toString());
        list.add("Equals");
        System.out.println(list.toString());
       
        list.add("Well");
        System.out.println(list.toString());
        list.add("Storm");
        System.out.println(list.toString());
        //System.out.println(list.toString());
        System.out.printf("\nSearch result of well:%s",list.find("Well"));
        System.out.printf("\nSearch result of Strength:%s",list.find("Strength"));
       
       
       
    }
}

4. Permutation:

import java.util.ArrayList;
import java.util.LinkedList;

public class Permutation {
    public static void main(String[] args) {
        for(int[] a :permutationsOf(new int[]{1,2,3,4})){
            for(int b:a){
                System.out.print(b+",");
            }
            System.out.println();
           
        }
       
    }
    static ArrayList<int[]> permutationsOf(int[] arr) {
        ArrayList<int[]> result = new ArrayList<int[]>();
        // Base case: an array with zero or one element has exactly one permutation.
        if (arr.length <= 1) {
            result.add(arr);
            return result;
        } else {
            // Permute everything after the first element, then insert the
            // first element at every position of each smaller permutation.
            int first=arr[0];
            int[] rest=new int[arr.length-1];
            for(int i=1;i<arr.length;i++){
                rest[i-1]=arr[i];
            }
            ArrayList<int[]> simpler=permutationsOf(rest);
            for(int[] permutation:simpler){
                ArrayList<int[]> additions=insertAtAllPositions(first,permutation);
                result.addAll(additions);
            }
            return result;
        }
    }

    private static ArrayList<int[]> insertAtAllPositions(int first, int[] permutation) {
        ArrayList<int[]> res=new ArrayList<int[]>();
        // For each position i, copy the permutation and insert "first" at index i.
        for(int i=0;i<permutation.length+1;i++){
            LinkedList<Integer> ll=new LinkedList<Integer>();
            for(int j=0;j<permutation.length;j++){
                ll.add(permutation[j]);
            }
            ll.add(i, first);
            // Copy the list back into a plain int array.
            int[] r=new int[ll.size()];
            int count=0;
            for(int z:ll){
                r[count++]=z;
            }
            res.add(r);
        }
        return res;
    }


}

5. Object Restriction:

public class RestrictInstance {
   
        private static final int limit_ =5; //Set this to whatever you want to restrict
        private static int count =0;
        private RestrictInstance(){}
        public static synchronized RestrictInstance getInstance(){
            if(count<limit_){
                RestrictInstance myClass= new RestrictInstance();
                count++;
                return myClass;
            }
            return null;
        }
       
        public static void main(String args[])
        {
            RestrictInstance a= RestrictInstance.getInstance();
            RestrictInstance a1= RestrictInstance.getInstance();
            RestrictInstance a2= RestrictInstance.getInstance();
            RestrictInstance a3= RestrictInstance.getInstance();
            RestrictInstance a4= RestrictInstance.getInstance();
            RestrictInstance a5= RestrictInstance.getInstance();
           
            System.out.println(a.toString());
            System.out.println(a3.toString());
            System.out.println(a4.toString());
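            // a5 is null because the limit of 5 instances has been reached,
            // so the call below would throw a NullPointerException.
            //System.out.println(a5.toString());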
        }
}

Solution to Problem 1
package prob1;

public class MyStringList {
      private int size = 0;
      private final int INIT_ARR_SIZE = 2;
      private String[] strArray;
     
      public MyStringList(){
            strArray = new String[INIT_ARR_SIZE];
      }
      public void add(String s) {
            if(size >= strArray.length) resize();
            strArray[size++]=s;
      }
      public boolean find(String s) {
            for(int i = 0; i < size; ++i) {
                  if(s.equals(strArray[i])) return true;
            }
            return false;
      }
      public String get(int i) {
            if(i < 0 || i >= size) return null;
            return strArray[i];
      }
      private void resize() {
            System.out.println("Resizing from size "+strArray.length +" to "+2*strArray.length);
            String[] temp = new String[2*strArray.length];
            for(int i = 0; i < strArray.length; ++i) {
                  temp[i] = strArray[i];
            }
            strArray = temp;
      }
      public static void main(String[] args) {
            MyStringList list = new MyStringList();
     
            for(int i = 0; i < 64; ++i) {
                  list.add("a"+i);
            }
            System.out.println("looking for a3 "+list.find("a3"));
            System.out.println("value at position 43 "+list.get(43));
      }
}

Solution to Problem 2
package prob2;

import java.util.HashMap;
import java.util.Map;

public class Employee {
      private String firstName;
      private String lastName;
      // Maps a pay date (as a string) to the salary paid on that date.
      private Map<String, Double> salaryRecord=new HashMap<String, Double>();
     
      public void addEntry(String date, double salary) {
            salaryRecord.put(date,salary);
      }
      public void printPaymentAmount(String date) {
            // get() returns null when there is no entry for the given date.
            Double salaryObject = salaryRecord.get(date);
            if(salaryObject == null){
                  System.out.println(firstName+" "+lastName+" did not receive a paycheck on "+date);
            }
            else {
                  System.out.println(firstName+" "+lastName+" was paid "+salaryObject.doubleValue()+" on "+date);
            }
      }
      public void printAveragePaycheck() {
            double accum = 0.0;
            int count = 0;
            // Add up every recorded salary and count the entries.
            for(Double sal : salaryRecord.values()) {
                  accum += sal.doubleValue();
                  ++count;
            }
            System.out.println("Average paycheck for "+firstName+" "+lastName+" was "+accum/count);
      }
     
      public static void main(String[] args) {
            Employee e = new Employee();
            e.setFirstName("Jim");
            e.setLastName("Jones");
            for(int i = 0; i < 12; ++i) {
                  // Use months 1 to 12 so that every key is a valid date.
                  e.addEntry((i+1)+"/15/2006", 3070+5*i);
            }
            e.printPaymentAmount("3/15/2006");
            e.printPaymentAmount("5/15/2005");
            e.printAveragePaycheck();
           
      }
      public String getFirstName() {
            return firstName;
      }
      public void setFirstName(String firstName) {
            this.firstName = firstName;
      }
      public String getLastName() {
            return lastName;
      }
      public void setLastName(String lastName) {
            this.lastName = lastName;
      }

}