Hadoop



Hadoop is a framework for storing and analysing big data. Hadoop has 2 core components.
1) HDFS - Hadoop Distributed File System (storing big data).
2) MapReduce (processing big data).


HDFS: It is basically a file-storage system used to store big files across a cluster of machines, following a master-slave architecture. HDFS is not worthwhile for storing lots of small files. Suppose a file of 1 TB is stored: it is split into blocks (128 MB by default) that are spread across the different slave nodes. Each block is also replicated (three copies by default), so the chance of losing data is reduced.
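
As a rough back-of-the-envelope sketch (assuming the default 128 MB block size and a replication factor of 3, both of which are configurable), the numbers for a 1 TB file work out like this:

import math

file_size_mb = 1 * 1024 * 1024   # a 1 TB file, expressed in MB
block_size_mb = 128              # default HDFS block size (configurable)
replication = 3                  # default replication factor (configurable)

blocks = math.ceil(file_size_mb / block_size_mb)
total_block_copies = blocks * replication
raw_storage_tb = total_block_copies * block_size_mb / (1024 * 1024)

print(blocks)               # 8192 logical blocks
print(total_block_copies)   # 24576 block copies across the cluster
print(raw_storage_tb)       # 3.0 TB of raw disk used for 1 TB of data

So a 1 TB file costs roughly 3 TB of raw disk; that is the trade-off HDFS makes for fault tolerance.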


MapReduce: It is the process of analysing the data stored in HDFS.
The following steps are involved in MapReduce (at a basic level); a small sketch of all four steps follows this list.
1) Splitting: First of all, the big input is split into small parts. Suppose I have data of 3 lines; it is split into 3 parts, each part having one line.
2) Mapping: Then the mapping of data takes place. It is the process of producing key-value pairs.
Suppose a part contains "car car bus"; after mapping we have car --> 1, car --> 1, bus --> 1.
3) Shuffling: After mapping, the shuffling of data takes place. If car appears in different splits, shuffling brings all the key-value pairs for car together.
4) Reducing: Let's understand reducing with a simple example. Suppose after shuffling we have car --> 1 and car --> 2;
after reducing we have car --> 3.
Similarly we get the reduced output from the different slaves,
and the MapReduce job is complete.
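
Here is a minimal, single-machine sketch of that word-count flow in Python. It only imitates the four steps above in one process; it does not use the real Hadoop API, and all names in it are illustrative.

from collections import defaultdict

text = "car car bus\nbus car truck\ntruck bus car"

# 1) Splitting: one split per line of input
splits = text.split("\n")

# 2) Mapping: emit a (word, 1) pair for every word in every split
mapped = [(word, 1) for line in splits for word in line.split()]

# 3) Shuffling: group the values that belong to the same key
shuffled = defaultdict(list)
for key, value in mapped:
    shuffled[key].append(value)

# 4) Reducing: sum the values for each key
reduced = {key: sum(values) for key, values in shuffled.items()}

print(reduced)   # {'car': 4, 'bus': 3, 'truck': 2}

In a real cluster the mapping runs in parallel on the nodes that hold the splits, and shuffling moves data over the network to the reducers, but the per-key logic is the same.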


Different key points of Hadoop:
Data is inserted into HDFS with tools such as Flume and Sqoop.
YARN (Yet Another Resource Negotiator) is the main resource-management component of Hadoop; it schedules and runs the processing.
Some more tools used with Hadoop are Pig, Hive, Spark, etc.



A Hadoop cluster can run in 3 modes.
1) Standalone Hadoop cluster: no daemons run in this mode and no distributed file system is used.
2) Pseudo-distributed Hadoop cluster: all the Hadoop daemons run on one local machine and the distributed file system is used (a sample configuration is shown after this list).
3) Multi-node Hadoop cluster: it is the mode in which the daemons run on a cluster of machines.
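
For the pseudo-distributed mode, the Hadoop single-node setup guide uses a minimal configuration along these lines (the localhost port and the replication value are the typical values from that guide; adjust them for your own install):

<!-- etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

dfs.replication is set to 1 here because a single machine has nowhere else to keep the extra replicas.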


Remaining topics

What are the daemons of Hadoop 1?
What are the daemons of Hadoop 2?
Hadoop 1 vs Hadoop 2
Architecture
Concept of racks
Reading a file
Writing a file
Modes of operation

