Don't forget to create account on our site to get access to more material made only for free registered user.  


1.       Open the HBase shell. The HBase shell is mainly used for administrative tasks.

$ hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012



2.       Create a table in the HBase shell. You do not have to declare column names in advance: only the column family is named when the table is created, and HBase does not enforce column names or data types, which is why it is called a schema-less database.

hbase(main):001:0> create 'users', 'info'

0 row(s) in 0.1200 seconds



Name of the table: users (an HBase table contains rows and columns).

Column family: info (columns in HBase are organized into groups called column families). Column families also determine how the data is physically stored in HDFS, so every table must have at least one column family. Column families can be altered after the table is created.


3.       List all the tables in HBase:

hbase(main):002:0> list



1 row(s) in 0.0220 seconds


4.       Describe an individual table. The output shows two properties of the table:

1.       NAME: the table name

2.       FAMILIES: the properties of its column families

hbase(main):003:0> describe 'users'


{NAME => 'users', FAMILIES => [{NAME => 'info', true



=> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

1 row(s) in 0.0330 seconds


Now let's look at the HBase Java client API.


5.       Open a connection to the HBase users table with the line of code below. Because no configuration is provided, the default configuration is used.


HTableInterface usersTable = new HTable("users");




6.       How to pass the configuration: HBase client applications need only one piece of configuration to access HBase: the ZooKeeper quorum address. Set it on the configuration object before creating the table handle, so that the handle is built with it.

Configuration myConf = HBaseConfiguration.create();

myConf.set("hbase.zookeeper.quorum", "serverip");

HTableInterface usersTable = new HTable(myConf, "users");


For now, all you need to know is that the Java client picks up configuration parameters either from the hbase-site.xml file on its classpath or from values you set explicitly on the configuration object. When you leave the configuration completely unspecified, as in step 5, the default configuration is read and localhost is used for the ZooKeeper quorum address.
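The lookup described above can be illustrated with a small, self-contained sketch. It does not use the HBase client library: LayeredConf and its methods are hypothetical names invented for this illustration, the host name zk1.example.com is made up, and the precedence order (explicit value, then site file, then default) is an assumption of the sketch rather than something stated in this tutorial.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a client configuration; NOT the HBase Configuration class.
final class LayeredConf {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, String> siteFile = new HashMap<>(); // values that would come from hbase-site.xml
    private final Map<String, String> explicit = new HashMap<>(); // values set in code

    LayeredConf() {
        // Default mentioned in the text: localhost is used when nothing is specified.
        defaults.put("hbase.zookeeper.quorum", "localhost");
    }

    void loadSite(String key, String value) { siteFile.put(key, value); }

    void set(String key, String value) { explicit.put(key, value); }

    // Assumed precedence: explicit setting, then site file, then built-in default.
    String get(String key) {
        if (explicit.containsKey(key)) return explicit.get(key);
        if (siteFile.containsKey(key)) return siteFile.get(key);
        return defaults.get(key);
    }

    public static void main(String[] args) {
        LayeredConf conf = new LayeredConf();
        System.out.println(conf.get("hbase.zookeeper.quorum")); // localhost

        conf.loadSite("hbase.zookeeper.quorum", "zk1.example.com");
        System.out.println(conf.get("hbase.zookeeper.quorum")); // zk1.example.com

        conf.set("hbase.zookeeper.quorum", "serverip");
        System.out.println(conf.get("hbase.zookeeper.quorum")); // serverip
    }
}
```

The point of the layering is that the same client code can run unchanged against a local setup and a real cluster; only the site file or the explicit setting differs.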


7.       HBase connection pooling: As with JDBC, applications usually keep a pool of connections, created either at startup or lazily on first use, because creating a new connection for every request adds network overhead.

HTablePool pool = new HTablePool();

HTableInterface usersTable = pool.getTable("users");

... // work with the table

usersTable.close(); // hands the table back to the pool



Closing the table when you’re finished with it allows the underlying connection resources to be returned to the pool.
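To make the borrow-and-return idea concrete, here is a minimal pooling sketch written against the Java standard library only. SimplePool and PooledHandle are hypothetical classes invented for this example; they are not part of HBase and only mimic the pattern in which closing a handle gives it back to the pool instead of destroying it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical handle standing in for a pooled table; closing it returns it to its pool.
final class PooledHandle implements AutoCloseable {
    final int id;
    private final SimplePool owner;

    PooledHandle(int id, SimplePool owner) {
        this.id = id;
        this.owner = owner;
    }

    @Override
    public void close() {
        owner.giveBack(this);
    }
}

// Hypothetical pool: hands out idle handles and creates new ones only when none are idle.
final class SimplePool {
    private final Deque<PooledHandle> idle = new ArrayDeque<>();
    private int created = 0;

    synchronized PooledHandle borrow() {
        PooledHandle handle = idle.pollFirst();
        return (handle != null) ? handle : new PooledHandle(++created, this);
    }

    synchronized void giveBack(PooledHandle handle) {
        idle.addFirst(handle);
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        SimplePool pool = new SimplePool();
        try (PooledHandle handle = pool.borrow()) {
            System.out.println("working with handle " + handle.id);
        } // close() runs here and returns the handle to the pool
        try (PooledHandle handle = pool.borrow()) {
            System.out.println("reused handle " + handle.id);
        }
        System.out.println("handles created: " + pool.createdCount()); // handles created: 1
    }
}
```

Because the second borrow reuses the handle released by the first, only one handle is ever created, which is the saving that pooling is meant to provide.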


8.       RowKey: Every row in an HBase table has a unique identifier called its rowkey. It is the equivalent of a primary key in an RDBMS and is distinct across the whole table. Every interaction with the table starts with the rowkey.


9.       The main commands you will use to interact with HBase tables are:

Get, Put, Delete, Scan, and Increment


10.   To put (insert) data into a table, you always need a rowkey, as shown below.

Put p = new Put(Bytes.toBytes("John Smith"));


All data in HBase is stored as raw byte arrays, which is why the rowkey is converted with Bytes.toBytes. The Put instance created here can now be filled with data and inserted into the HBase users table.
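The conversion to byte arrays can be tried out without an HBase installation. The sketch below uses only the Java standard library; ByteEncoding is a hypothetical helper written for this example, and it assumes UTF-8 for strings and 8-byte big-endian encoding for long values, which is how the Bytes utility is understood to behave.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper that mirrors the idea of Bytes.toBytes using only the JDK.
final class ByteEncoding {
    // A string becomes its UTF-8 bytes.
    static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // A long becomes 8 bytes, most significant byte first.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static String asString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static long asLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("John Smith");
        System.out.println(rowKey.length);                 // 10
        System.out.println(asString(rowKey));              // John Smith
        System.out.println(Arrays.toString(toBytes(1L)));  // [0, 0, 0, 0, 0, 0, 0, 1]
        System.out.println(asLong(toBytes(42L)));          // 42
    }
}
```

Since the store sees only bytes, the application is responsible for decoding each value with the same type it used when writing it.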


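11.   Now add more information about "John Smith":

Put p = new Put(Bytes.toBytes("John Smith userID"));

p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("John Smith"));

// The email value was redacted on the source page; "<email address>" is a placeholder and the "email" qualifier is inferred.
p.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("<email address>"));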





12.   HBase coordinates: HBase uses coordinates to locate a piece of data within a table. The rowkey is the first coordinate. The following three coordinates define the location of a cell:

1.       RowKey

2.       Column Family (Group of columns)

3.       Column Qualifier (Name of the columns or column itself e.g. Name, Email, Address)

Coordinates of the John Smith name cell:

["John Smith userID", "info", "name"]
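One way to picture these coordinates is as a map nested three levels deep. The sketch below is a simplified in-memory model using only the Java standard library; CellCoordinates is a hypothetical class, and it uses strings instead of byte arrays purely for readability.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: rowkey -> column family -> column qualifier -> value.
final class CellCoordinates {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Store a value at the cell addressed by the three coordinates.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Return the value at the given coordinates, or null if the cell does not exist.
    String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return (columns == null) ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        CellCoordinates users = new CellCoordinates();
        users.put("John Smith userID", "info", "name", "John Smith");
        System.out.println(users.get("John Smith userID", "info", "name"));  // John Smith
        System.out.println(users.get("John Smith userID", "info", "email")); // null
    }
}
```

Every lookup starts at the outermost map, which reflects the statement that every interaction begins with the rowkey.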

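13.   Writing data to an HBase table: build the Put, add at least one cell to it, and send it to the table.

HTableInterface usersTable = pool.getTable("users");

Put p = new Put(Bytes.toBytes("John Smith userID"));

p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("John Smith"));

usersTable.put(p); // sends the Put to the users table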





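14.   Changing existing data: create a Put object, give it some data at the appropriate coordinates, and send it to the table.

Put p = new Put(Bytes.toBytes("John Smith userID"));

// Next: p.add(family, qualifier, newValue) at the coordinates of the cell to change, then usersTable.put(p).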






15.   Understanding the HBase write path:

a.       Whether you use Put to record a new row in HBase or to modify an existing row, the internal process is the same.

b.      HBase receives the command and persists the change, or throws an exception if the write fails.

c.       When a write is made, by default, it goes into two places:

a.       the write-ahead log (WAL), also referred to as the HLog

b.      and the MemStore

d.      HBase records the write in both places by default to maintain data durability. Only after the change is written to and confirmed in both places is the write considered complete.

e.      The MemStore is a write buffer where HBase accumulates data in memory before a permanent write.

f.        Its contents are flushed to disk to form an HFile when the MemStore fills up.

g.       It doesn’t write to an existing HFile but instead forms a new file on every flush.

h.      The HFile is the underlying storage format for HBase.

i.         HFiles belong to a column family and a column family can have multiple HFiles.

j.        But a single HFile can’t have data for multiple column families.

k.       There is one MemStore per column family. (The size of the MemStore is defined by the system-wide property in hbase-site.xml called hbase.hregion.memstore.flush.size)

HBase write path. Every write to HBase requires confirmation from both the WAL and the MemStore. The two steps ensure that every write to HBase happens as fast as possible while maintaining durability. The MemStore is flushed to a new HFile when it fills up.
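The write path described in this step can be sketched as a toy model. The code below uses only the Java standard library and is not how HBase is implemented; WritePathSketch is a hypothetical class, and the flush threshold counts entries instead of bytes to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of one column family's write path.
final class WritePathSketch {
    final List<String> wal = new ArrayList<>();                      // append-only log of every change
    TreeMap<String, String> memStore = new TreeMap<>();              // sorted in-memory write buffer
    final List<TreeMap<String, String>> hFiles = new ArrayList<>();  // one new "file" per flush
    private final int flushThreshold;                                // entry count standing in for a byte size

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // A write goes to the WAL and to the MemStore before it is considered complete.
    void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);
        memStore.put(rowKey, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    // Flushing never touches an existing file; it always creates a new one.
    void flush() {
        hFiles.add(memStore);
        memStore = new TreeMap<>();
    }

    public static void main(String[] args) {
        WritePathSketch family = new WritePathSketch(2);
        family.put("row1", "a");
        family.put("row2", "b"); // MemStore reaches the threshold and is flushed
        family.put("row3", "c");
        System.out.println("WAL entries: " + family.wal.size());           // WAL entries: 3
        System.out.println("HFiles: " + family.hFiles.size());             // HFiles: 1
        System.out.println("MemStore entries: " + family.memStore.size()); // MemStore entries: 1
    }
}
```

Keeping flushed files immutable means a write only ever appends to the log and updates memory, which is part of what keeps writes fast in this design.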


16.    Failures during write: Failures are common in large distributed systems, and HBase is no exception.

Imagine that the server hosting a MemStore that has not yet been flushed crashes. You’ll lose the data that was in memory but not yet persisted. HBase safeguards against that by writing to the WAL before the write completes. Every server that’s part of the HBase cluster keeps a WAL to record changes as they happen. The WAL is a file on the underlying file system. A write isn’t considered successful until the new WAL entry is successfully written. This guarantee makes HBase as durable as the file system backing it. Most of the time, HBase is backed by the Hadoop Distributed Filesystem (HDFS).

If HBase goes down, the data that was not yet flushed from the MemStore to the HFile can be recovered by replaying the WAL. You don’t have to do this manually. It’s all handled under the hood by HBase as a part of the recovery process. There is a single WAL per HBase server, shared by all tables (and their column families) served from that server.

As you can imagine, skipping the WAL during writes can help improve write performance. There’s one less thing to do, right? We don’t recommend disabling the WAL unless you’re willing to lose data when things fail. In case you want to experiment, you can disable the WAL like this:

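Put p = new Put(Bytes.toBytes("John Smith userID"));

p.setWriteToWAL(false); // this Put will not be recorded in the write-ahead log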



NOTE Not writing to the WAL comes at the cost of increased risk of losing data in case of RegionServer failure. Disable the WAL, and HBase can’t recover your data in the face of failure. Any writes that haven’t flushed to disk will be lost.
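The trade-off in this step can be demonstrated with a toy simulation. The code below uses only the Java standard library; WalRecoverySketch is a hypothetical class, and the crash is simulated simply by discarding the in-memory map and replaying whatever was logged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of a server with a WAL and an unflushed MemStore.
final class WalRecoverySketch {
    private final List<String[]> wal = new ArrayList<>();        // stands in for the log file; survives a crash
    private TreeMap<String, String> memStore = new TreeMap<>();  // in memory only; lost on a crash

    // Write a value, optionally skipping the WAL as a client can choose to do.
    void put(String rowKey, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {rowKey, value});
        }
        memStore.put(rowKey, value);
    }

    // Simulate a crash (memory is wiped) followed by recovery (the WAL is replayed in order).
    void crashAndRecover() {
        memStore = new TreeMap<>();
        for (String[] entry : wal) {
            memStore.put(entry[0], entry[1]);
        }
    }

    String get(String rowKey) {
        return memStore.get(rowKey);
    }

    public static void main(String[] args) {
        WalRecoverySketch server = new WalRecoverySketch();
        server.put("logged-row", "kept", true);
        server.put("unlogged-row", "gone", false);
        server.crashAndRecover();
        System.out.println(server.get("logged-row"));   // kept
        System.out.println(server.get("unlogged-row")); // null
    }
}
```

The unlogged write is faster because it skips one step, but after the simulated crash there is nothing to replay for it, which is exactly the risk the note above describes.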
