Hive User-Defined Functions

There are many useful functions available to add to Hive. You can also write your own in Java, Scala, or Python. This article references the main third-party open source collections.

When you start using Hive, you may miss some of the functions you are used to from Oracle, MySQL, or elsewhere. Or you might just want a profanity filter. Whatever the case, you can browse the list below for a large selection of UDF libraries. You can also use the pointers listed to write your own.

The Brickhouse Collection of UDFs from Klout includes functions for collapsing multiple rows into one, generating top K lists, a distributed cache, bloom counters, JSON functions, and HBase tools.

The Facebook UDF Collection (HIVE-1545) includes functions for unescaping strings, finding an element in an array, and finding the maximum across a set of columns.

There are also a number of smaller UDF collections for various purposes that can be added to Hive.

Roll Your Own

If you want to add your own Hive UDF, it's best to read the guide from Apache Hive and follow this helpful Hive UDF Workshop. Here is a nice tutorial.
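As a rough illustration of the Python route, here is a minimal sketch of the profanity filter mentioned earlier, written as a script for Hive's TRANSFORM clause rather than as a compiled Java UDF. Hive streams rows to such a script as tab-separated lines on standard input and reads tab-separated lines back. The word list, the function names, and the `--stream` flag are all assumptions made up for this example.

```python
import sys

# Hypothetical word list for illustration only; a real filter would load its own.
BLOCKED_WORDS = {"darn", "heck"}


def mask_profanity(text, blocked=BLOCKED_WORDS):
    """Replace each blocked word (case-insensitive, whole token) with asterisks."""
    masked = []
    for token in text.split(" "):
        # Compare on the token with surrounding punctuation stripped.
        core = token.strip(".,!?;:\"'")
        if core.lower() in blocked:
            token = token.replace(core, "*" * len(core))
        masked.append(token)
    return " ".join(masked)


def transform_line(line):
    """Apply the filter to every tab-separated column of one input row."""
    columns = line.rstrip("\n").split("\t")
    return "\t".join(mask_profanity(col) for col in columns)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive's TRANSFORM clause sends each row as one tab-separated line
    # and expects one tab-separated line back per output row.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


# Stream only when asked to (hypothetical --stream flag), so the helpers
# can be imported and tested without touching stdin.
if __name__ == "__main__" and "--stream" in sys.argv:
    main()
```

Assuming the script is saved as mask_words.py (a made-up name), it would be shipped to the cluster with ADD FILE and called with something like SELECT TRANSFORM(id, comment) USING 'python mask_words.py --stream' AS (id, comment) FROM a table of your own. Streaming through an external process is slower than a compiled UDF, so this approach suits quick experiments more than heavy production queries.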

If you want to write something a bit different from a Hive UDF for your functions, there is also a database-independent hybrid procedural SQL language, HPL/SQL, supported on Hive as of Hive 2.0. It works with Hadoop, NoSQL databases, and SQL databases like MySQL, and is mostly compatible with Oracle PL/SQL. It looks pretty interesting.

Opinions expressed by DZone contributors are their own.
