CHAPTER 6
Stargate is the name of the HBase REST API, which makes data available to read and write over HTTP. Stargate exposes data from URLs that match the table structure (e.g., /access-logs/rk1 would get the row with key rk1 from the access-logs table).
The HTTP verbs GET, PUT, POST and DELETE are used to work with data as resources, which gives Stargate a nicely RESTful interface. You can work with rows and cells in JSON, but the downside of the API is that all data is represented as Base64 strings, encoded from the raw byte arrays in HBase. That makes the API awkward if you just want to browse with a REST client like Postman or cURL.
Like the Thrift API, Stargate is a separate service, which you can start with hbase-daemon.sh start rest. By default, it listens on port 8080 (and is already running on the hbase-succinctly Docker image).
In production you can run Stargate on the Region Servers, but if you want to use it as your primary interface for HBase, you should consider load balancing with a separate server.
Tip: I've written a blog post that walks through creating a load-balancing reverse proxy with Nginx to front Stargate.
You can use any framework with an HTTP client to talk to Stargate. In this chapter we'll use cURL to see the raw HTTP data, and a .NET client library to work at a higher level of abstraction.
Code Listing 48 shows a GET request (the default verb in cURL) to the root Stargate URL. The response is the same as a list command in the HBase Shell:
Code Listing 48: Listing tables with cURL
$ curl http://127.0.0.1:8080
access-logs
social-usage
An HTTP GET request is equivalent to a get command in the HBase shell. You can add an Accept header to specify the format you want in the response, but there are some restrictions on the amount of data Stargate will serve.
If you try to request a whole table, e.g. http://127.0.0.1:8080/access-logs, you'll get an error response, with status code 405, meaning the method isn’t allowed. You can't fetch a whole table in HBase, and the 405 is Stargate's implementation of that.
You can fetch a whole row by adding the row key after the table name, but if you have any HTTP-unfriendly characters in your row format, you'll need to escape them in the URL. You also can't get a plain-text representation of a row; you need to make the request with a data-focused format like JSON.
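The escaping can be scripted. Below is a minimal POSIX shell sketch. encode_rowkey and stargate_url are hypothetical helpers, not part of HBase or cURL. The sketch assumes ASCII row keys and escapes everything outside the RFC 3986 unreserved characters, which is stricter than necessary but safe. The optional column argument (a family such as t, or family:qualifier such as t:1106) is appended as-is, so it is assumed to be URL-safe already.

```shell
#!/bin/sh
# Hypothetical helpers for building Stargate URLs (not part of HBase).

# Percent-encode a row key for use in a URL path segment.
# Assumes ASCII input; anything outside [A-Za-z0-9._~-] is escaped.
encode_rowkey() {
  key=$1
  out=''
  while [ -n "$key" ]; do
    rest=${key#?}            # everything after the first character
    c=${key%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;  # e.g. | becomes %7C
    esac
    key=$rest
  done
  printf '%s\n' "$out"
}

# stargate_url BASE TABLE ROWKEY [COLUMN]
# COLUMN may be a family ("t") or family:qualifier ("t:1106").
stargate_url() {
  url="$1/$2/$(encode_rowkey "$3")"
  if [ -n "${4:-}" ]; then
    url="$url/$4"
  fi
  printf '%s\n' "$url"
}

# Example (needs a running Stargate):
#   curl -H "Accept: application/json" \
#     "$(stargate_url http://127.0.0.1:8080 access-logs 'elton|jericho|201511')"
```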
Code Listing 49 shows a whole row read from Stargate. Note that the pipe character in the row key has been escaped as %7C, and the data values in the response—row key, column qualifiers and cell values—are all Base64-encoded strings (I've applied formatting to the response; there's no whitespace returned by Stargate).
Code Listing 49: Reading a row with cURL
$ curl -H "Accept: application/json" http://127.0.0.1:8080/access-logs/elton%7Cjericho%7C201511
{
  "Row": [{
    "key": "ZWx0b258amVyaWNob3wyMDE1MTE=",
    "Cell": [{
      "column": "dDoxMTA2",
      "timestamp": 1447228701460,
      "$": "MTIw"
    }, {
      "column": "dDoxMTA3",
      "timestamp": 1447228695240,
      "$": "NjUw"
    }]
  }]
}
The response is a JSON object that contains an array of Row objects. Each row has a key field and an array of Cell objects. Each cell has a column name, a cell value (stored in the $ field), and a timestamp. Timestamps are the only values HBase stores with an interpretation, so they are not Base64-encoded. They are long integers holding the UNIX time, in milliseconds, when the cell was written.
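Converting a timestamp only takes an integer division and the date command. This is a minimal sketch, assuming GNU date, because the -d @seconds form isn't available in BSD or macOS date. ts_to_utc is a hypothetical helper name.

```shell
#!/bin/sh
# Convert an HBase cell timestamp (milliseconds since the UNIX epoch)
# to an ISO 8601 UTC string. Assumes GNU date for "-d @seconds".
ts_to_utc() {
  seconds=$(( $1 / 1000 ))   # drop the millisecond part
  date -u -d "@$seconds" '+%Y-%m-%dT%H:%M:%SZ'
}

# ts_to_utc 1447228701460 prints 2015-11-11T07:58:21Z
```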
All the other values are Base64 strings, which means you need to decode the fields in a GET response. Table 6 shows the decoded values for one cell in the response:
| Field | Encoded value | Decoded value |
|---|---|---|
| Row.key | ZWx0b258amVyaWNob3wyMDE1MTE= | elton\|jericho\|201511 |
| Cell.column | dDoxMTA2 | t:1106 |
| Cell.$ | MTIw | 120 |
Table 6: Encoded and Decoded Stargate Values
Note that the column value in the cell is the full name (column family plus qualifier, separated by a colon), and the numeric cell value is actually a string.
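Decoding doesn't need special tooling. The following sketch uses the standard base64 utility to reproduce Table 6, assuming a version that accepts -d (as GNU coreutils does). It then splits a decoded column name at the first colon, because family names can't contain colons but qualifiers can. The helper names are hypothetical.

```shell
#!/bin/sh
# Decode one Base64 field from a Stargate JSON response.
b64d() {
  printf '%s' "$1" | base64 -d
}

# A decoded column is "family:qualifier"; split it at the first colon.
column_family()    { printf '%s\n' "${1%%:*}"; }
column_qualifier() { printf '%s\n' "${1#*:}"; }

# b64d dDoxMTA2 prints t:1106; column_qualifier t:1106 prints 1106
```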
You can also fetch a single column family in a row from Stargate (with the URL format /{table}/{row-key}/{column-family}), or a single cell value (with /{table}/{row-key}/{column-family}:{qualifier}). Single cells can be fetched in plain text, in which case Stargate decodes the value in the response, as in Code Listing 50:
Code Listing 50: Fetching a single cell value
$ curl http://127.0.0.1:8080/access-logs/elton%7Cjericho%7C201511/t:1106
120
The semantics of a PUT request with Stargate are much the same as a put command in the HBase Shell. You specify the desired end state in your request, and HBase does the rest—creating the row or column if it doesn't exist, and setting the value.
You have to use a data format to make updates through Stargate; otherwise you'll get a 415 error, “Unsupported Media Type.” Using JSON, you mirror the format from a GET response, so you can send multiple cell values for multiple rows in a single request.
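Because the request body mirrors the GET format, every key, column and value must be Base64-encoded before you send it. The following POSIX shell sketch builds such a body for one row. put_body and b64e are hypothetical helper names. Leaving out the timestamp field assumes you want Stargate to assign the write time.

```shell
#!/bin/sh
# Base64-encode a string without line wrapping.
b64e() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical helper: build a JSON body that sets cells in one row.
# Usage: put_body ROWKEY FAMILY:QUALIFIER=VALUE...
put_body() {
  key=$(b64e "$1")
  shift
  cells=''
  for pair in "$@"; do
    column=${pair%%=*}   # text before the first '='
    value=${pair#*=}     # text after the first '='
    # No "timestamp" field: assumes the server should assign one.
    cell=$(printf '{"column":"%s","$":"%s"}' "$(b64e "$column")" "$(b64e "$value")")
    cells="$cells${cells:+,}$cell"
  done
  printf '{"Row":[{"key":"%s","Cell":[%s]}]}\n' "$key" "$cells"
}

# Example (needs a running Stargate; -d @- reads the body from stdin):
#   put_body 'elton|jericho|201511' t:1106=130 t:1107=660 |
#     curl -X PUT -H "Content-Type: application/json" -d @- \
#       'http://127.0.0.1:8080/access-logs/FAKE-KEY'
```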
The URL format still requires a row key as well as a table, but with a PUT the key in the URL gets ignored in favor of the key(s) in the request data. Code Listing 51 shows a PUT request that updates the two cells in my row:
Code Listing 51: Updating cells with Stargate
$ curl -X PUT -H "Content-Type: application/json" -d '{
  "Row": [{
    "key": "ZWx0b258amVyaWNob3wyMDE1MTE=",
    "Cell": [{
      "column": "dDoxMTA2",
      "timestamp": 1447228701460,
      "$": "MTMw"
    }, {
      "column": "dDoxMTA3",
      "timestamp": 1447228695240,
      "$": "NjYw"
    }]
  }]
}' 'http://127.0.0.1:8080/access-logs/FAKE-KEY'
Tip: Stargate is very handy for ad-hoc requests, but working with Base64 can be difficult. I've blogged about a couple of simple tools that make it easier.
That PUT request increments the numeric values that are actually strings in my row. The incr command to atomically increment counter columns isn't available through Stargate, so if you need to increment values, then you have to read them first and then PUT the update.
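The read-then-PUT step is mostly Base64 arithmetic. This POSIX shell sketch takes a cell value as it appears in a JSON response, plus a delta, and prints the encoded value to send back. increment_b64 is a hypothetical helper. The pattern is not atomic: if another client writes the cell between your read and your PUT, one of the updates is lost, which is exactly what the HBase incr command avoids.

```shell
#!/bin/sh
# Hypothetical helper: increment a Base64-encoded decimal cell value.
# Usage: increment_b64 ENCODED_VALUE DELTA
# Not atomic: a concurrent write between the read and the PUT is lost.
increment_b64() {
  current=$(printf '%s' "$1" | base64 -d)          # e.g. MTIw -> 120
  printf '%s' "$(( current + $2 ))" | base64 | tr -d '\n'
  printf '\n'
}

# increment_b64 MTIw 10 prints MTMw (120 -> 130, as in Code Listing 51)
```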
You can do more with cURL, like sending DELETE requests to remove data, and creating scanners to fetch multiple rows, but the syntax gets cumbersome. Using a wrapper around the RESTful API in Stargate is a better option.
NuGet is the Package Manager for .NET apps, and there are a couple of open-source packages that are wrappers for accessing Stargate. Microsoft has a package specifically for HBase clusters running on the Azure cloud, and there's a third-party package from authors “The Tribe” for working with Stargate generically.
The Tribe's package does a good job of abstracting the internals of Stargate and lets you work with HBase data intuitively. It's IoC-aware (using Autofac by default), so you can easily tweak the HTTP setup and build a data-access layer, which you can mock out for testing.
To add that package and its dependencies to your .NET app, you can use the NuGet Package Manager Console command in Code Listing 52:
Code Listing 52: Adding a NuGet Reference for Stargate
Install-Package "HBase.Stargate.Client.Autofac"
In the GitHub repository for this book, there's a .NET console app that uses The Tribe's client to connect to Stargate, running in the hbase-succinctly Docker container.
To set up the Stargate client, you need to configure the server URL and build the container. Code Listing 53 shows how to do that with Autofac:
Code Listing 53: Configuring the Stargate client
var builder = new ContainerBuilder();
builder.RegisterModule(new StargateModule(new StargateOptions
{
    ServerUrl = "http://127.0.0.1:8080"
}));
var container = builder.Build();
var stargate = container.Resolve<IStargate>();
The StargateOptions object contains the Stargate (or proxy) URL, and the StargateModule contains all the other container registrations. The IStargate interface you get from the container provides access to all the Stargate client operations with a neat abstraction, and all data items are exposed as plain strings (the client handles the Base64 encoding and decoding).
The Stargate client has two methods for reading data. The simplest is ReadValue(), which fetches a specific cell value, passing it the table name, row key, column family, and qualifier. This is functionally equivalent to a GET request with a URL containing the table name, row key, and column name, which returns a single cell value encoded as a string, as in Code Listing 54:
Code Listing 54: Reading a cell with IStargate
var value = stargate.ReadValue("access-logs", "elton|jericho|201511", "t", "1106");
//value is "120"
Alternatively, you can fetch a CellSet object using the FindCells() method, which returns a collection of rows and cells. This is like issuing a GET request for a full row, or a column family in a row. The CellSet is an enumerable collection, which you can query with LINQ as in Code Listing 55:
Code Listing 55: Finding cells with IStargate
var cellSet = stargate.FindCells("access-logs", "elton|jericho|201511");
var value = cellSet
    .First(x => x.Identifier.CellDescriptor.Qualifier == "1106")
    .Value;
//value is "120"
Note that the FindCells() call makes the GET request to Stargate, which returns all the data in the CellSet, and the LINQ query runs over the CellSet in memory on the .NET client.
The ReadValue() call will return quickly because it's fetching a single piece of data, and the FindCells() call will be quick for Stargate to serve, but could take longer for the client to receive if there are many cells that contain a lot of data.
An alternative way to fetch data from Stargate is to create a row scanner, which is like a server-side cursor that you can use to read multiple rows. The scanner can optionally have a filter to limit the number of cells that get returned.
There are two parts to scanning rows with Stargate. Firstly, you create the scanner, which runs on the server. Stargate gives you a reference to the scanner, which you can use to fetch rows. In the .NET client, you use a ScannerOptions object to specify the start and end rows for the scan.
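Code Listing 56 shows how to create a scanner to retrieve all the access logs for one user for one system from October 2015 onwards. The stop row is exclusive, and elton|jericho|x works as an open-ended upper bound because x sorts after any digit in the period part of the key: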
Code Listing 56: Creating a Scanner with IStargate
var options = new ScannerOptions
{
    TableName = "access-logs",
    StartRow = "elton|jericho|201510",
    StopRow = "elton|jericho|x",
};
var scanner = stargate.CreateScanner(options);
When Stargate returns, the scanner is running on the server and you can loop through rows from the start up to the end row key, using the MoveNext() method to fetch the next set of cells from Stargate, as in Code Listing 57:
Code Listing 57: Iterating Through a Scanner with IStargate
var totalUsage = 0;
while (scanner.MoveNext())
{
    var cells = scanner.Current;
    foreach (var cell in cells)
    {
        totalUsage += int.Parse(cell.Value);
    }
}
//totalUsage is 850
Note: The cells returned when you iterate through a scanner could be from multiple rows, so don't assume MoveNext (or the equivalent in other clients) moves on to the next row—it moves on to the next set of cells, which could be from many rows.
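Under the .NET client, a scanner is a REST resource in its own right. As I understand the HBase REST API, the lifecycle works like this:

1. You create a scanner with a PUT or POST of a scanner definition to /{table}/scanner.
2. Stargate replies 201 Created, with the scanner's URL in a Location header.
3. Each GET on that URL returns the next batch of cells, and you get 204 No Content once the scan is exhausted.
4. A DELETE on the URL releases the scanner.

The sketch below only builds the definition. It assumes the JSON field names startRow, endRow and batch, with row keys Base64-encoded like everything else. scanner_body is a hypothetical helper.

```shell
#!/bin/sh
# Base64-encode a string without line wrapping.
b64e() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical helper: JSON definition for a Stargate row scanner.
# Usage: scanner_body START_ROW STOP_ROW BATCH_SIZE
# The stop row is exclusive; both keys are sent Base64-encoded.
scanner_body() {
  printf '{"startRow":"%s","endRow":"%s","batch":%d}\n' \
    "$(b64e "$1")" "$(b64e "$2")" "$3"
}

# Example (needs a running Stargate; then GET the returned Location):
#   curl -i -X PUT -H "Content-Type: application/json" \
#     -d "$(scanner_body 'elton|jericho|201510' 'elton|jericho|x' 10)" \
#     http://127.0.0.1:8080/access-logs/scanner
```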
Scanning by row key is the fastest way to read data from Stargate, but if you need to additionally restrict the results by the data in the columns (the qualifiers or the cell values), you can specify a filter when you create a scanner.
The filter also runs server-side (Stargate passes it to HBase as part of the scan), so it's faster than fetching whole rows and extracting certain columns in the client, but it does use additional resources on the cluster.
There are various filter types, and they are well documented for the Stargate API.
In the access-logs table, the period in the row key specifies the year and month of the usage record, and the column qualifier contains the day and hour.
In Code Listing 58, I add a column prefix filter to the scanner, which filters the results so that only cells with column qualifiers starting with the provided prefix are returned. In this case, only cells from the 11th day of each month will be included in the results:
Code Listing 58: Creating a Filtered Scanner with IStargate
var options = new ScannerOptions
{
    TableName = "access-logs",
    StartRow = "elton|jericho|201510",
    StopRow = "elton|jericho|x",
    Filter = new ColumnPrefixFilter("11")
};
var scanner = stargate.CreateScanner(options);
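In a raw request, the filter travels inside the scanner definition. As far as I know, Stargate expects the filter field to be a JSON document serialized as a string. That document has a type name and, for ColumnPrefixFilter, the Base64-encoded prefix in a value field. Treat that layout, and the hypothetical filtered_scanner_body helper, as assumptions to check against your HBase version.

```shell
#!/bin/sh
# Base64-encode a string without line wrapping.
b64e() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical helper: scanner definition with a column prefix filter.
# Usage: filtered_scanner_body START_ROW STOP_ROW PREFIX
filtered_scanner_body() {
  # The filter is itself JSON, embedded as an escaped string.
  filter=$(printf '{\\"type\\":\\"ColumnPrefixFilter\\",\\"value\\":\\"%s\\"}' "$(b64e "$3")")
  printf '{"startRow":"%s","endRow":"%s","filter":"%s"}\n' \
    "$(b64e "$1")" "$(b64e "$2")" "$filter"
}
```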
The IStargate interface has two methods for writing data. The simplest is WriteValue(), which is the equivalent of a PUT request for a specific cell in a row. Stargate creates the row and/or column if needed, and sets the value, as in Code Listing 59:
Code Listing 59: Updating a Cell Value with IStargate
stargate.WriteValue("100", "access-logs", "elton|jericho|201510", "t", "2908");
//cell value is now "100"
A more complex and flexible method is WriteCells(), which takes a CellSet object and can update multiple values with a single API call. That mix of values can include updates and inserts for different rows, but all the rows must be in the same table.
Code Listing 60 shows an update to an existing cell value, and an insertion of a new row in a single call to WriteCells():
Code Listing 60: Updating and Inserting Cells with IStargate
var update = new Cell(new Identifier
{
    Row = "elton|jericho|201510",
    CellDescriptor = new HBaseCellDescriptor
    {
        Column = "t",
        Qualifier = "2908"
    }
}, "120");
var insert = new Cell(new Identifier
{
    Row = "elijah|jericho|201511",
    CellDescriptor = new HBaseCellDescriptor
    {
        Column = "t",
        Qualifier = "1117"
    }
}, "360");
var cells = new CellSet(new Cell[] { update, insert });
cells.Table = "access-logs";
stargate.WriteCells(cells);
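At the REST level, a multi-row write like this is a single PUT whose body holds one Row entry per row key. The following POSIX shell sketch builds that body from tab-separated lines. cells_body is a hypothetical helper, and the sketch assumes non-empty fields with no tabs inside them.

```shell
#!/bin/sh
# Base64-encode a string without line wrapping.
b64e() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical helper: read "rowkey<TAB>family:qualifier<TAB>value" lines
# on stdin and emit one JSON body with one Row entry per line.
cells_body() {
  tab=$(printf '\t')
  rows=''
  while IFS=$tab read -r key column value; do
    row=$(printf '{"key":"%s","Cell":[{"column":"%s","$":"%s"}]}' \
      "$(b64e "$key")" "$(b64e "$column")" "$(b64e "$value")")
    rows="$rows${rows:+,}$row"
  done
  printf '{"Row":[%s]}\n' "$rows"
}

# Example (needs a running Stargate), mirroring Code Listing 60:
#   printf 'elton|jericho|201510\tt:2908\t120\nelijah|jericho|201511\tt:1117\t360\n' |
#     cells_body |
#     curl -X PUT -H "Content-Type: application/json" -d @- \
#       'http://127.0.0.1:8080/access-logs/FAKE-KEY'
```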
The Stargate client is stateless, so there is no caching of data locally unless you retain cells in memory yourself. Every call to the Stargate client to read or write data results in a REST call to Stargate. The Stargate API is stateless too, with one exception. An open scanner is held in memory by the Stargate server that created it, so if you load balance Stargate, all the requests for one scanner need to reach the same server.
In this chapter we looked at Stargate, the REST API that HBase provides. It exposes data from rows as resources, with a URL format that describes the path to the data, always including the table name and row key, and optionally a column family and qualifier.
The API supports different data formats, including JSON and Google's Protocol Buffers, and for simple reading and writing, you can use plain text (although the feature set is more limited). Stargate passes the cURL test—if you can use an API with cURL, then it has a usable RESTful design.
As Stargate provides a standardized approach to accessing data, it can be easily wrapped into a client library, and we covered one option with a .NET NuGet package. Stargate provides much of the functionality of other clients (including DDL functions that we haven't had room for, like creating tables), but one feature not supported at the time of writing is incrementing counter columns.
Now we have a good understanding of what you can store in HBase and the client options for accessing it. In the next chapter, we'll step back and look at the architecture of HBase.