Single-node Hadoop cluster using Cloudera Manager

  1. Launch an EC2 instance with at least 8 GB of RAM, preferably 16 GB (m4.large / m4.xlarge).
  2. Add the Cloudera yum repository by creating /etc/yum.repos.d/cloudera.repo with the content below:
    • Choose the appropriate version as per your requirement and OS (RHEL6 / RHEL7); the example below is for version 5.8.2 on RHEL/CentOS 6.
    • [cloudera-cdh5]
      # Packages for Cloudera Manager for Hadoop, Version 5.8.2, on RedHat or CentOS 6 x86_64
      name=Cloudera Manager for Hadoop, Version 5.8.2
      baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.8.2/
      gpgkey=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
      gpgcheck=1
  3. Clean the yum cache: sudo yum clean all
  4. Java installation
    • sudo yum install oracle-j2sdk1.7-1*
  5. MySQL installation
    • sudo yum install mysql-server
    • sudo service mysqld start
    • sudo /usr/bin/mysql_secure_installation (set the root password when prompted)
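  6. Create cmdb, hivedb, huedb and any other databases required by the services you need. Run the statements below in the mysql client, replacing <<hostname>> with the host name of the instance:
      create database cmdb;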
      create user 'cmuser' identified by 'myPass@123';
      grant all on cmdb.* to 'cmuser' identified by 'myPass@123';
      grant all on cmdb.* to 'cmuser'@'localhost' identified by 'myPass@123';
      grant all on cmdb.* to 'cmuser'@'%' identified by 'myPass@123';
      grant all on cmdb.* to 'cmuser'@'<<hostname>>' identified by 'myPass@123';
      
      create database hivedb;
      create user 'hive' identified by 'myPass@123';
      grant all on hivedb.* to 'hive' identified by 'myPass@123';
      grant all on hivedb.* to 'hive'@'localhost' identified by 'myPass@123';
      grant all on hivedb.* to 'hive'@'%' identified by 'myPass@123';
      grant all on hivedb.* to 'hive'@'<<hostname>>' identified by 'myPass@123';
  7. Install cloudera-manager-server
      sudo yum install cloudera-manager-server
  8. Install cloudera-manager-agent
      sudo yum install cloudera-manager-agent
  9. Edit db.properties as shown below: sudo vi /etc/cloudera-scm-server/db.properties
      # Copyright (c) 2012 Cloudera, Inc. All rights reserved.
      #
      # This file describes the database connection.
      #
      # The database type
      # Currently 'mysql', 'postgresql' and 'oracle' are valid databases.
      com.cloudera.cmf.db.type=mysql
      # The database host
      # If a non standard port is needed, use 'hostname:port'
      com.cloudera.cmf.db.host=localhost
      # The database name
      com.cloudera.cmf.db.name=cmdb
      
      # The database user
      com.cloudera.cmf.db.user=cmuser
      
      # The database user's password
      com.cloudera.cmf.db.password=myPass@123

       

  10. Install the MySQL JDBC connector: sudo yum install mysql-connector-java
  11. Start the scm-server
      sudo service cloudera-scm-server start
  12. Start the scm-agent
      sudo service cloudera-scm-agent start
  13. Open http://<<hostname/ipaddress>>:7180/ in a browser (if the page does not open, check the inbound rules or disable the firewall).
  14. Follow the steps in the installation wizard and complete the installation.
  15. Below are some commands that may help to set proper ownership and permissions on the directories if something goes wrong during installation (run them from /var/lib).
      chown -R yarn:yarn hadoop-yarn
      chown -R mapred:mapred hadoop-mapreduce
      chown -R hdfs:hdfs hadoop-hdfs
      chown -R httpfs:httpfs hadoop-httpfs
      chown -R kms:kms hadoop-kms
      chmod 770 /var/lib/hadoop-yarn
      chmod 770 /var/lib/hadoop-hdfs
      chmod 770 spark/
      chmod 770 oozie/
      chmod 770 hive/
      chmod 770 impala
      chmod 770 hbase

       

  16. These steps have been verified; please let me know in the comments if you face any issues or need any help.
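
Step 6 only shows the statements for cmdb and hivedb. As a rough sketch, the shell function below prints the same set of statements for any other service database, so they can be reviewed and then piped into mysql. The function name service_db_sql and the huedb/hue pair mentioned afterwards are illustrative assumptions, not something prescribed by Cloudera.

```shell
#!/bin/sh
# Print the CREATE/GRANT statements from step 6 for one service database.
# Hypothetical helper; usage: service_db_sql <database> <user> <password> <hostname>
service_db_sql() {
    db=$1; user=$2; pass=$3; host=$4
    printf "create database %s;\n" "$db"
    printf "create user '%s' identified by '%s';\n" "$user" "$pass"
    # Same grants as in step 6: no host, then localhost, any host, this host.
    printf "grant all on %s.* to '%s' identified by '%s';\n" "$db" "$user" "$pass"
    for h in localhost % "$host"; do
        printf "grant all on %s.* to '%s'@'%s' identified by '%s';\n" \
            "$db" "$user" "$h" "$pass"
    done
}
```

For example, service_db_sql huedb hue 'myPass@123' "$(hostname -f)" prints six statements for a Hue database. Once the output looks right, pipe it into mysql -u root -p.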

 

 


TigerVNC server in Linux distributions

Installation:

  • sudo yum install tigervnc-server

Configure for Single/Multiple users:

  • Edit /etc/sysconfig/vncservers (sudo vi /etc/sysconfig/vncservers) and add the following line:
  • VNCSERVERS="<<display_number>>:<<user_name>>"
  • Log in as each user and set a VNC password using the command: vncpasswd
  • Then, while logged in as that user, start the server using the command: vncserver :<<display_number>>
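
For several users, the VNCSERVERS value takes space-separated display:user pairs. The helper below is a minimal sketch that builds that line. Each display listens on TCP port 5900 plus the display number, so it also prints the port to allow in the inbound rules. The function names and the users alice and bob are hypothetical.

```shell
#!/bin/sh
# Build the VNCSERVERS line from "display:user" pairs.
# Usage: vncservers_line 1:alice 2:bob
vncservers_line() {
    # "$*" joins all arguments with a single space.
    printf 'VNCSERVERS="%s"\n' "$*"
}

# Print the TCP port a VNC display listens on (5900 + display number).
# Usage: vnc_port 2
vnc_port() {
    echo $((5900 + $1))
}
```

Calling vncservers_line 1:alice 2:bob prints the line to paste into /etc/sysconfig/vncservers, and vnc_port 2 prints the port for display :2.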

List all running VNC servers:

  • vncserver -list

Kill VNC display:

  • vncserver -kill :2
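
To stop every running display instead of killing them one by one, the output of vncserver -list can be filtered for display numbers. This sketch assumes the list prints one display per line starting with a colon (for example ":2" followed by the process ID), so check the output on your system first. The function names are hypothetical, and nothing is killed unless the function is called with run.

```shell
#!/bin/sh
# Extract display numbers (":1", ":2", ...) from `vncserver -list` output on stdin.
vnc_displays() {
    awk '/^:[0-9]+/ { print $1 }'
}

# Print (default) or execute a kill command for every listed display.
# Usage: vncserver -list | kill_all_vnc        # dry run, prints the commands
#        vncserver -list | kill_all_vnc run    # actually kills the displays
kill_all_vnc() {
    mode=${1:-}
    vnc_displays | while read -r d; do
        if [ "$mode" = "run" ]; then
            vncserver -kill "$d" < /dev/null
        else
            echo "vncserver -kill $d"
        fi
    done
}
```

The dry run is the default so that the commands can be reviewed before anything is stopped.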

 

Multiple python versions – Anaconda create virtual environment

In the last post, Install Anaconda on Linux, we saw how to install Anaconda on Linux.

Now let's see how to create multiple Python versions on the same machine using Anaconda and use them without any compatibility issues.

Creating a new python version:

  • conda create --name <<env_name>> python=<<python_version>>
  • E.g.: conda create --name py36 python=3.6
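
When several versions are needed, the same command can be generated for each of them. The snippet below is a small sketch that follows the py36 naming used in the example (the version with the dot removed). The function names are hypothetical, and it only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Derive an env name such as "py36" from a version such as "3.6".
env_name_for() {
    printf 'py%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Print one "conda create" command per requested Python version.
# Usage: conda_create_cmds 2.7 3.6
conda_create_cmds() {
    for v in "$@"; do
        echo "conda create --name $(env_name_for "$v") python=$v"
    done
}
```

Running conda_create_cmds 2.7 3.6 prints two commands, one for py27 and one for py36, which can then be run one after the other.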

Using particular Python version:

There are multiple ways to use the desired Python version.

  1. source activate <<env_name>>
    • This activates the desired Python env; any Python packages you install affect only the activated env.
    • Also, any Python code you execute will use the activated version.
    • To exit this virtual env, type: source deactivate
  2. You can invoke it directly from the env's install directory like below:
    • /opt/anaconda/envs/py36/bin/python (this invokes Python 3.6; environments created with --name are placed under the envs directory of the Anaconda install)
  3. Add it to the PATH environment variable in your bashrc:
    • export PATH=/opt/anaconda/envs/py36/bin:$PATH
    • After adding it to ~/.bashrc, type: source ~/.bashrc
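
One thing to watch with the PATH approach is that every source ~/.bashrc adds the same directory again. A minimal sketch of a guard that can go into ~/.bashrc instead of the plain export is shown below. The function name prepend_path is hypothetical; pass it the bin directory of the env you want first on PATH.

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already there.
# Usage: prepend_path <env_bin_directory>
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;       # not present, put it in front
    esac
    export PATH
}
```

Calling it twice with the same directory leaves PATH unchanged the second time.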

List installed Python versions (environments):

  • conda env list (conda list only shows the packages installed in the active environment)