L. HappyBase Installation and Configuration
- HappyBase is a developer-friendly Python library to interact with Apache HBase. Read more on HappyBase at URL https://happybase.readthedocs.io/en/latest/
- To explore the HappyBase API, read more at URL https://happybase.readthedocs.io/en/latest/user.html , and it should provides you with enough information to get you started
L1. Install HappyBase
- Login as hduser, and we may access Hbase using Python. We use the HappyBase’ Python library, which accesses Hbase using a Thrift gateway.
- Install HappyBase, and verify the installed packages with the following commands
$ cd ~ $ pip3 install happybase $ python -c 'import happybase'
- Ensure the following services started sequentially, and check them with command jps
- SSH
- HDFS
- YARN
- Kafka
- HBase
Note that a total of ten (10) services can be observed eventually.
- Start up the Thrift service at port 9090 to run at the background
$ cd ~ $ hbase thrift start -p 9090 &
Find the pid for the Thrift service, and stop the Thrift service using the identified pid
$ ps -aux | grep 9090 hduser 9441 1.5 3.1 2928716 152960 pts/2 Sl 07:46 0:05 $ kill 9441
Or,
$ jps 6849 NodeManager 9441 ThriftServer 7233 QuorumPeerMain 8321 HMaster 6227 DataNode 6451 SecondaryNameNode 6053 NameNode 9735 Jps 8539 HRegionServer 6700 ResourceManager 7613 Kafka $ kill 9441
L2. Using HappyBase
- Open another session in terminal, and login as hduser. Then, launch the python command-line interpreter
$ python
When you want to exit the python command-line interpreter, type
>> exit()
- Creating connection to the HBase by importing HappyBase. Then, create connection to localhost
>>> import happybase >>> connection = happybase.Connection(port=9090)
- List all the available HBase tables
>>> print(connection.tables())
- Display all the rows in one of the HBase tables
>>> t = connection.table('linkshare') >>> rs = t.scan() >>> for r in rs: ... print(r)
L3. Creating Tables in HBase
- Open another session in terminal, and login as hduser. Then, launch the python command-line interpreter
$ python
- Create a new HBase table with a single column-family
>>> connection.create_table('mytable', {'cf1':dict()})
If you encounter the following exception,
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message=’TSocket read 0 bytes’)
Run the statement to create a connection to the localhost again and then the statement to create the table once again. List all the available HBase tables to confirm that the table has been successfully created.
- Create a HBase table with 3 column families as described below
- cf1 : maximum 10 columns
- cf2 : maximum 2 columns with the data blocks resident in memory after they are read
- cf3 : no setting of columns/attributes.
>>> connection.create_table('mytable2', {'cf1':dict(max_versions=10),'cf2':dict(max_versions=2, block_cache_enabled=False), 'cf3':dict()})
List all the available HBase tables to confirm that the table has been successfully created.
- Create a tables with table namespace
>>> connection2 = happybase.Connection(host='localhost', port=9090, table_prefix='prj_pfx') >>> connection2.create_table('mytable3', {'cf1':dict(max_versions=5),'cf2':dict(max_versions=2)})
List all the available HBase tables to confirm that the table has been successfully created.
- Invoke the HBase shell to check the tables that were created using HappyBase
$ cd ~/hbase $ bin/hbase shell
- List all the HBase tables
hbase(main):001:0> list
- Show the description of the newly created tables
hbase(main):002:0> describe 'mytable' hbase(main):003:0> describe 'mytable2' hbase(main):004:0> describe 'prj_pfx_mytable3'
L4. Inserting Values into the HBase Table
- Identify a table instance, i.e. mytable, for data insertion after the required connection
>>> c = happybase.Connection(host='localhost', port=9090) >>> t = c.table('mytable') # get a Table instance
- Insert values into mytable cell by cell
>>> t.put('k1', {'cf1:c1':'v1'}) >>> t.put('k1', {'cf1:c2':'v2'}) >>> t.put('k1', {'cf1:c3':'v3'}) >>> t.put('k2', {'cf1:c1':'vv1', 'cf1:c2':'vv2', 'cf1:c3':'vv3'})
- Insert values into table with the namespace prefix
>>> c = happybase.Connection(port=9090) >>> t3 = c.table('prj_pfx_mytable3') >>> t3.put('rk1', {'cf1:country':'UK'}) >>> t3.put('rk1', {'cf1:city':'London'}) >>> t3.put('rk1', {'cf1:industry':'Manufacturing'}) >>> t3.put('rk1', {'cf2:department':'Production', 'cf2:title':'Senior Manager'}) >>> t3.put('rk2', {'cf1:country':'Malaysia', 'cf1:city':'KL', 'cf1:industry':'Software Development', 'cf2:department':'QA', 'cf2:title':'Test Engineer'})
If you encounter a socket.error: [Errno 32] Broken pipe error, make sure that you have obtained the Table instance. If the error still exists, then kill the Thrift process and re-start it.
L5. Retrieving Values from the HBase Tables
- Retrieve values by individual data fields of row-key
>>> row = t3.row(b'rk1')
>>> print(row[b'cf1:industry'])
>>> print(row[b'cf1:country'])
>>> print(row[b'cf1:city'])
>>> print(row[b'cf2:department'])
>>> print(row[b'cf2:title'])
>>> row = t3.row(b'rk2')
>>> print(row[b'cf1:country'])
>>> print(row[b'cf1:city'])
>>> print(row[b'cf1:industry'])
>>> print(row[b'cf2:department'])
>>> print(row[b'cf2:title'])
- Retrieve all the data fields of row-key ‘rk2’
>>> print("data for row-key 'rk2':\n", row)
- Process multiple rows
>>> m_rows = t3.rows(['rk1', 'rk2']) >>> for r in m_rows: ... print("data in current row: ", r)
- Extract the column-families of each row as a dictionary
>>> for r in m_rows: ... r_as_dict = dict(r[1]) ... print("\nr: ", r) ... print("\tAs dict: ", r_as_dict)
- Extract all the cities from the rows
>>> city_list = [] >>> for r in m_rows: ... r_as_dict = dict(r[1]) ... city_list.append(r_as_dict[b'cf1:city']) ... >>> print(city_list)
- Fine-grained selections - multiple cell retrieval
>>> c = happybase.Connection(port=9090) >>> t3 = c.table('prj_pfx_mytable3') >>> row = t3.row(b'rk1', columns=[b'cf1:country', b'cf2:title']) >>> print(row) >>> print(row[b'cf1:country']) >>> print(row[b'cf2:title'])
- Fine-grained selections - all columns belonging to a column-family
>>> c = happybase.Connection(port=9090) >>> t3 = c.table('prj_pfx_mytable3') >>> row = t3.row(b'rk1', columns=[b'cf1']) >>> print(row)
- Retrieve with timestamps
>>> c = happybase.Connection(port=9090) >>> t3 = c.table('prj_pfx_mytable3') >>> row = t3.row(b'rk1', columns=[b'cf1:country'], include_timestamp=True) >>> value, timestamp = row[b'cf1:country'] >>> print(value) >>> print(timestamp)
- Retrieve cells based on versions
>>> cells = t3.cells(b'rk1', b'cf1:country', versions=3, include_timestamp=True) >>> for value, timestamp in cells: ... print("Cell data at {}:{}".format(timestamp, value))
- Iterating through rows and extracting data fields
>>> for key, data in t3.scan(): ... print("\nCompany record key: ", key.decode()) ... print("\tCompany info:") ... print("\t\tLocation: {}, {}".format(data[b'cf1:city'].decode(), data[b'cf1:country'].decode())) ... print("\t\tIndustry: {}".format(data[b'cf1:industry'].decode())) ... print("\tContact person info:") ... print("\t\tTitle & Department: {}, {}".format(data[b'cf2:title'].decode(), data[b'cf2:department'].decode())) ...
- Select range of rows starting from row_start
>>> for key, data in t3.scan(row_start=b'rk2'): ... print(key, data)
- Select range of rows up to row_stop
>>> for key, data in t3.scan(row_stop=b'rk2'): ... print(key, data)
- Select range based on specified key range
>>> for key, data in t3.scan(row_start=b'rk1', row_stop=b'rk3'): ... print(key, data) ...
Have a look from hbase shell
hbase> scan 'prj_pfx_mytable3'
- Select range based on key prefix
>>> t3.put('kr4', {'cf1:country':'Australia', 'cf1:city':'Sydney', 'cf1:industry':'Tourism', 'cf2:department':'Human Resource', 'cf2:title':'Manager'}) >>> for key, data in t3.scan(row_prefix=b'k'): ... print(key, data) ...
L6. Deleting Values from the HBase Tables
- Delete a row
>>> t3.delete(b'rk2')
- Delete specific cells of a row
>>> t3.delete(b'rk1', columns=[b'cf1:city', b'cf2:title'])