$ cd ~
$ hadoop version
Reinstall Hadoop if it is not version 3.3.6. Remember to delete the existing hadoop3 directory before beginning your setup.
$ pyspark --version
Reinstall Spark if PySpark is not version 3.5.0 (built for Hadoop 3.3.6 and later). Remember to delete the existing spark directory before beginning your setup.
$ scala -version
$ ll ~/kafka/libs | grep kafka
Reinstall Kafka if it was not built for your installed Scala version, e.g. 2.13.x, as shown in the file names kafka_(Scala-version)-(Kafka-version).*. Remember to delete the existing kafka directory before beginning your setup.
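For reference, on a matching installation the listing might contain jar names like these (version numbers are illustrative and will vary with your install):
kafka_2.13-3.6.1.jar
kafka-clients-3.6.1.jar
Here 2.13 is the Scala version and 3.6.1 is the Kafka version, i.e. the kafka_(Scala-version)-(Kafka-version) pattern above.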
$ cd ~
$ wget https://archive.apache.org/dist/hbase/2.5.7/hbase-2.5.7-bin.tar.gz
$ tar -xvzf hbase-2.5.7-bin.tar.gz
$ mv hbase-2.5.7 hbase
Find the current stable release of HBase that is compatible with your version of Hadoop in the HBase reference guide (https://hbase.apache.org/book.html#hadoop); a list of releases is available on the Apache HBase download page (https://hbase.apache.org/downloads.html). For example, you may adopt HBase 2.5.x for the installed Hadoop 3.3.6.
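As an optional sanity check, you can confirm the unpacked release before configuring anything (assuming the archive was extracted to ~/hbase as above):
$ ~/hbase/bin/hbase version
The first line of output should report HBase 2.5.7.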
$ cd ~/kafka
$ bin/kafka-server-stop.sh
$ bin/zookeeper-server-stop.sh
$ jps
Attention! Please wait at least 30 seconds after issuing each command. Responses might be slow; use the jps command to confirm that the QuorumPeerMain and Kafka services have terminated.
export HBASE_HOME=/home/hduser/hbase
export PATH=$HBASE_HOME/bin:$PATH
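These two exports are typically appended to ~/.bashrc so they persist across terminal sessions (an assumption; adjust if your setup uses a different profile file):
$ echo 'export HBASE_HOME=/home/hduser/hbase' >> ~/.bashrc
$ echo 'export PATH=$HBASE_HOME/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ echo $HBASE_HOME
The last command should print /home/hduser/hbase.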
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HBASE_HOME=/home/hduser/hbase
export HBASE_CLASSPATH=${HBASE_HOME}/lib
export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
export HBASE_MANAGES_ZK=false
If you plan to run the Kafka service alongside HBase, change HBASE_MANAGES_ZK from true to false in hbase-env.sh, so that HBase uses the ZooKeeper instance shipped with Kafka instead of starting its own. Both the ZooKeeper and Kafka servers must then be started before HBase (see the steps below).
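The settings above belong in ~/hbase/conf/hbase-env.sh. After editing, a quick grep (path assumed from this guide) confirms they are in effect:
$ grep -E 'JAVA_HOME|HBASE_MANAGES_ZK' ~/hbase/conf/hbase-env.sh | grep -v '^#'
You should see your export lines uncommented, with HBASE_MANAGES_ZK=false.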
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
  </property>
</configuration>
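The properties above belong in ~/hbase/conf/hbase-site.xml. Note that hbase.rootdir must point at the same URI as fs.defaultFS in your Hadoop configuration (assumed here to be hdfs://localhost:9000 for a single-node setup); you can cross-check with:
$ grep -A1 'fs.defaultFS' ~/hadoop3/etc/hadoop/core-site.xml
If your NameNode listens on a different host or port, adjust hbase.rootdir to match.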
$ cd ~/kafka
$ bin/zookeeper-server-start.sh config/zookeeper.properties &
$ bin/kafka-server-start.sh config/server.properties &
Attention! Please wait at least 30 seconds after issuing each command. Responses may be slow to start following your recent configuration. Verify that ZooKeeper and Kafka are running with the jps command, i.e. you should see the QuorumPeerMain and Kafka processes, before proceeding to the next step.
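For example, the jps output might look like this at this point (PIDs are illustrative; any Hadoop daemons already running will also appear):
$ jps
12001 QuorumPeerMain
12345 Kafka
12500 Jps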
$ cd ~
$ ~/hbase/bin/start-hbase.sh
$ jps
Verify that HBase is running - you should see the HMaster and HRegionServer processes in the jps output. Note that you may ignore SLF4J errors during HBase startup, as HBase will still log to plain-text files on the local file system. When you want to stop HBase, type the following commands
$ cd ~
$ ~/hbase/bin/stop-hbase.sh
$ jps
You may choose to clear all the HBase data, after both HMaster and HRegionServer have stopped, using the following commands
$ hdfs dfs -ls /
$ hdfs dfs -rm -r /hbase
$ hdfs dfs -ls /
Delete the log4j-slf4j-impl-2.17.2.jar file (optional)
$ rm ~/hive/lib/log4j-slf4j-impl-2.17.2.jar
We delete the file log4j-slf4j-impl-2.17.2.jar because a similar SLF4J binding is also present in the Hadoop directory, and the duplicate binding occasionally causes errors.
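If you want to inspect the competing SLF4J bindings before deleting anything, a find across the installations used in this guide series will list them (directories that do not exist are silently skipped):
$ find ~/hive/lib ~/hbase/lib ~/hadoop3/share -name '*slf4j*' 2>/dev/null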
$ cd ~
$ ~/hbase/bin/hbase shell
hbase(main):001:0> help
hbase(main):002:0> status
hbase(main):003:0> list
hbase(main):004:0> exit
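On a healthy single-node setup, the status command should report something like the following (figures will vary):
hbase(main):002:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load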
hbase> create 'linkshare', 'link'
hbase> disable 'linkshare'
hbase> alter 'linkshare', 'statistics'
hbase> enable 'linkshare'
Note that to alter a table (e.g. change a column family, add a column family, etc.) after it has been created, you need to first disable the table to prevent clients from accessing it during the alter operation.
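You can confirm a table's state at any time with is_enabled and is_disabled, which is handy before and after an alter:
hbase> is_enabled 'linkshare'
hbase> is_disabled 'linkshare'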
hbase> describe 'linkshare'
hbase> put 'linkshare', 'org.hbase.www', 'link:title', 'Apache HBase'
hbase> put 'linkshare', 'org.hadoop.www', 'link:title', 'Apache Hadoop'
hbase> put 'linkshare', 'com.oreilly.www', 'link:title', 'O\'Reilly.com'
hbase> incr 'linkshare', 'org.hbase.www', 'statistics:share', 1
hbase> incr 'linkshare', 'org.hbase.www', 'statistics:like', 1
hbase> incr 'linkshare', 'org.hbase.www', 'statistics:share', 1
hbase> get_counter 'linkshare', 'org.hbase.www', 'statistics:share'
hbase> get 'linkshare', 'org.hbase.www'
The get command also accepts an optional dictionary of parameters to specify the column(s), timestamp, timerange, and version of the cell values to be retrieved. e.g.
hbase> get 'linkshare', 'org.hbase.www', 'link:title'
hbase> get 'linkshare', 'org.hbase.www', 'link:title', 'statistics:share'
hbase> get 'linkshare', 'org.hbase.www', ['link:title', 'statistics:share']
hbase> get 'linkshare', 'org.hbase.www', {TIMERANGE => [1399887705673, 1400133976734]}
hbase> get 'linkshare', 'org.hbase.www', {COLUMN => 'statistics:share', VERSIONS => 2}
hbase> scan 'linkshare'
hbase> scan 'linkshare', {COLUMNS => ['link:title'], STARTROW => 'org.hbase.www'}
hbase> scan 'linkshare', {COLUMNS => ['link:title'], STARTROW => 'org'}
hbase> import org.apache.hadoop.hbase.util.Bytes
hbase> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
hbase> import org.apache.hadoop.hbase.filter.BinaryComparator
hbase> import org.apache.hadoop.hbase.filter.CompareFilter
Create a filter that limits the results to rows where the statistics:like counter column value is greater than or equal to 10
hbase> likeFilter = SingleColumnValueFilter.new(Bytes.toBytes('statistics'), Bytes.toBytes('like'), CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL'), BinaryComparator.new(Bytes.toBytes(10)))
Set a flag for the filter to skip any rows without a value in this column
hbase> likeFilter.setFilterIfMissing(true)
Run a scan with the configured filter
hbase> scan 'linkshare', { FILTER => likeFilter }
hbase> create 't1', 'cf1'
hbase> create 'emp', 'personal data', 'professional data'
hbase> list
hbase> describe 'emp'
hbase> create_namespace 'ns1'
hbase> create 'ns1:t1', {NAME=>'cf1', VERSIONS=>5}
hbase> list
Create a table named ‘t2’ in the namespace ‘ns1’ with two column families ‘cf1’ and ‘cf2’
hbase> create 'ns1:t2', 'cf1', 'cf2'
hbase> list
hbase> describe 'ns1:t2'
hbase> alter 'ns1:t2', 'cf3', 'cf4', 'cf5'
hbase> describe 'ns1:t2'
hbase> alter 'ns1:t2', NAME=>'cf3', METHOD=>'delete'
Delete the column family ‘cf4’ of the table ‘ns1:t2’
hbase> alter 'ns1:t2', 'delete'=>'cf4'
hbase> describe 'ns1:t2'
hbase> alter 'emp', {NAME=>'personal data', VERSIONS=>5}
hbase> put 'ns1:t2', 'key1', 'cf1:name', 'John'
hbase> put 'ns1:t2', 'key1', 'cf1:id', 19191919
hbase> put 'ns1:t2', 'key1', 'cf2:city', 'London'
hbase> put 'ns1:t2', 'key1', 'cf2:country', 'UK'
hbase> scan 'ns1:t2'
hbase> put 'emp', '1001', 'personal data:name', 'Thor'
hbase> put 'emp', '1001', 'personal data:city', 'Kuala Lumpur'
hbase> put 'emp', '1001', 'professional data:designation', 'manager'
hbase> put 'emp', '1001', 'professional data:email', 'thor@mail.abc.com'
hbase> scan 'emp'
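Because ‘personal data’ was altered above to keep up to 5 versions, overwriting a cell and requesting multiple VERSIONS in a get will show its history (the new value here is illustrative):
hbase> put 'emp', '1001', 'personal data:city', 'Penang'
hbase> get 'emp', '1001', {COLUMN => 'personal data:city', VERSIONS => 3}
Both 'Penang' and the earlier 'Kuala Lumpur' are returned, each with its own timestamp.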
hbase> t = get_table 'ns1:t2'
hbase> t.scan
hbase> t = get_table 'ns1:t1'
hbase> t.put 'key2', 'cf1:city', 'KL'
hbase> t.put 'key2', 'cf1:id', 87654321
hbase> t.put 'key2', 'cf1:name', 'Minnie'
hbase> scan 'ns1:t1'
hbase> put 'ns1:t2', 'key1', 'cf1:name', 'Jack'
hbase> t.put 'key1', 'cf1:city', 'Manchester'
hbase> t.scan
hbase> get 'ns1:t1', 'key1'
hbase> get 'ns1:t1', 'key2', {COLUMN=>'cf1:city'}
hbase> scan 'ns1:t1', {COLUMNS => ['cf1:name', 'cf1:city']}
hbase> t.scan
hbase> t.count
hbase> delete 'ns1:t1', 'key1', 'cf1:city'
hbase> scan 'ns1:t1'
hbase> deleteall 'ns1:t1', 'key1'
hbase> scan 'ns1:t1'
hbase> list
hbase> drop 't1'
Note that this first drop fails because ‘t1’ is still enabled; a table must be disabled before it can be dropped.
hbase> disable 't1'
hbase> drop 't1'
hbase> list
$ su - hduser
$ sudo service ssh start
$ cd ~/hadoop3
$ sbin/start-dfs.sh
$ jps
You should observe at least four (4) services, including both NameNode and DataNode, as stated in step 12 of G7. Otherwise, you may need to reformat the HDFS NameNode:
$ cd ~/hadoop3
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hduser
$ sbin/start-yarn.sh
$ jps
You should observe at least six (6) services in total; refer to steps 12 and 13 of G7.
$ cd ~/kafka
$ bin/kafka-server-stop.sh
$ bin/zookeeper-server-stop.sh
$ bin/zookeeper-server-start.sh config/zookeeper.properties &
$ bin/kafka-server-start.sh config/server.properties &
Please wait at least 30 seconds after issuing each command. Note that responses may be slow. You should observe at least eight (8) services in total, including both QuorumPeerMain and Kafka, as stated in step 6 of K2.
$ cd ~
$ $HBASE_HOME/bin/start-hbase.sh
$ jps
You should observe at least ten (10) services in total, including both HMaster and HRegionServer, as stated in step 7 of K2.
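With the full stack running, the jps output might look like this (PIDs are illustrative):
$ jps
10011 NameNode
10123 DataNode
10304 SecondaryNameNode
10511 ResourceManager
10634 NodeManager
11001 QuorumPeerMain
11245 Kafka
11502 HMaster
11687 HRegionServer
11800 Jps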
$ cd ~
$ $HBASE_HOME/bin/stop-hbase.sh
$ jps
The two HBase services, HMaster and HRegionServer, should now be terminated.
$ cd ~/kafka
$ bin/kafka-server-stop.sh
$ bin/zookeeper-server-stop.sh
Attention! Please wait at least 30 seconds after issuing each command. Responses may be slow. The two services, QuorumPeerMain and Kafka, should now be terminated.
$ cd ~/hadoop3
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh
$ top
Press Ctrl-C to terminate the top command.
exit