Starting Hive-Client Programmatically With Scala
Learn about using Scala with Hive for programmatic access to Hadoop data.
Hive defines a simple SQL-like query language, called HiveQL (HQL), for querying and managing large datasets. It is easy to use if you are familiar with SQL. Hive also allows programmers who are familiar with the MapReduce framework to plug in custom mappers and reducers to perform more sophisticated analyses.
In this blog, we will learn how to create a Hive client with Scala to execute basic HQL commands. First, create a Scala project that uses Scala 2.12.
Now, add the following settings to your build.sbt file:
name := "hive_cli_client"
version := "1.0"
scalaVersion := "2.12.2"
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.2.1" excludeAll ExclusionRule(organization = "org.pentaho")
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3"
libraryDependencies += "org.apache.httpcomponents" % "httpclient" % "4.3.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
libraryDependencies += "org.apache.hive" % "hive-service" % "1.2.1"
libraryDependencies += "org.apache.hive" % "hive-cli" % "1.2.1"
libraryDependencies += "org.scalatest" % "scalatest_2.12" % "3.0.3"
In my case, I am using Hive 2.1.1, although the dependencies above are pinned to 1.2.1; use the version that matches your environment. Let the dependencies resolve. Now, add a Scala class named HiveClient to your project:
package cli

import java.io.IOException

import scala.util.Try

import org.apache.hadoop.hive.cli.CliSessionState
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.Driver
import org.apache.hadoop.hive.ql.session.SessionState

/**
  * Hive meta API client for testing purposes
  *
  * @author Anubhav
  */
class HiveClient {

  val hiveConf = new HiveConf(classOf[HiveClient])

  /**
    * Get the Hive QL driver used to execute DDL or DML.
    * A new driver and CLI session are created on every call.
    *
    * @return a Driver bound to hiveConf
    */
  private def getDriver: Driver = {
    val driver = new Driver(hiveConf)
    SessionState.start(new CliSessionState(hiveConf))
    driver
  }

  /**
    * @param hql the statement to execute
    * @throws IOException if Hive returns a non-zero response code
    * @return the response code (0 on success)
    */
  def executeHQL(hql: String): Int = {
    // Driver.run can throw (for example CommandNeedRetryException), so capture it first
    val result = Try(getDriver.run(hql)).toEither
    val response = result match {
      case Right(r) => r
      // Keep the original exception as the cause so its stack trace is not lost
      case Left(exception) => throw new Exception(exception.getMessage, exception)
    }
    val responseCode = response.getResponseCode
    if (responseCode != 0) {
      val err: String = response.getErrorMessage
      throw new IOException("Failed to execute hql [" + hql + "], error message is: " + err)
    }
    responseCode
  }
}
It has one public method, executeHQL, that calls the private method getDriver to obtain a Hive Driver instance and runs the HQL with it. On success the method returns the response code, which is 0; a non-zero response code raises an IOException.
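The two failure paths described above (an exception thrown while running the statement, and a non-zero response code) can be tried out without a Hive installation. The following is only a minimal sketch under stated assumptions: `StubResponse` is a hypothetical stand-in for Hive's response object, `run` is a placeholder for whatever actually executes the statement, and, unlike the class above, both failures are reported as `IOException`.

```scala
import java.io.IOException
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for Hive's response object, so the sketch runs without Hive.
final case class StubResponse(responseCode: Int, errorMessage: String)

object ResponseHandling {
  // `run` is an assumed placeholder for the call that executes the statement.
  def execute(hql: String, run: String => StubResponse): Int =
    Try(run(hql)) match {
      case Failure(e) =>
        // Keep the original exception as the cause so its stack trace survives.
        throw new IOException(s"Failed to execute hql [$hql]", e)
      case Success(r) if r.responseCode != 0 =>
        // A non-zero code means the statement was rejected or failed.
        throw new IOException(s"Failed to execute hql [$hql], error message is: ${r.errorMessage}")
      case Success(r) =>
        r.responseCode
    }
}
```

Passing the runner as a function keeps the error handling separate from Hive itself, which makes it possible to unit test the handling logic with a fake runner.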
Now, write the test case to test this Hive client:
import cli.HiveClient
import org.scalatest.FunSuite

class HiveClientTest extends FunSuite {

  val hiveClient = new HiveClient

  test("testing for the hql query") {
    assert(hiveClient.executeHQL("DROP TABLE IF EXISTS DEMO") == 0)
    assert(hiveClient.executeHQL("CREATE TABLE IF NOT EXISTS DEMO(id int)") == 0)
    assert(hiveClient.executeHQL("INSERT INTO DEMO VALUES(1)") == 0)
    assert(hiveClient.executeHQL("SELECT * FROM DEMO") == 0)
    assert(hiveClient.executeHQL("SELECT COUNT(*) FROM DEMO") == 0)
  }
}
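The test issues its statements one after another. If you want to run a longer list of statements and learn which one failed, a small helper along these lines may be useful. This is a sketch, not part of the original client: `HqlScript` and `runAll` are hypothetical names, and the executor is passed in as a plain `String => Int` function so that the helper does not depend on Hive.

```scala
import scala.util.{Failure, Success, Try}

object HqlScript {
  // Runs the statements in order with the supplied executor and stops at the first failure.
  // Returns Right(number of statements run) or Left((failing statement, reason)).
  def runAll(statements: Seq[String], execute: String => Int): Either[(String, String), Int] =
    statements.foldLeft[Either[(String, String), Int]](Right(0)) {
      case (Right(n), hql) =>
        Try(execute(hql)) match {
          case Success(0)    => Right(n + 1)
          case Success(code) => Left((hql, s"response code $code"))
          case Failure(e)    => Left((hql, String.valueOf(e.getMessage)))
        }
      // Once a statement has failed, skip the remaining ones.
      case (failed, _) => failed
    }
}
```

With the client from this article, the call would look like `HqlScript.runAll(Seq("DROP TABLE IF EXISTS DEMO", "CREATE TABLE IF NOT EXISTS DEMO(id int)"), hiveClient.executeHQL)`.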
Now, run these test cases, for example with sbt test.
Published at DZone with permission of Anubhav Tarar, DZone MVB. See the original article here.