
How to Use UDFs in Spark Without Registering Them


This article shows, with a short code snippet, how to use a UDF in Spark without registering it.


Here, we will demonstrate the use of a UDF with a small example.

Use Case: We need to transform the value of an existing column of a DataFrame/Dataset by adding a prefix or suffix to it, writing the result into a new column.

// Code snippet: how to create and use a UDF in Spark without registering it.

Scala

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Wrap an anonymous function in udf(); no registration is needed
// because the UDF is only used through the DataFrame API.
val rowKeyGenerator = udf((n: String) => {
  val r = scala.util.Random
  val randomNB = r.nextInt(100).toString()
  val deviceNew = randomNB.concat(n)
  deviceNew
}, StringType)

// "Name" is a column of type String in the source DataFrame.
val ds2 = dfFromFile.withColumn("NameNewValue", rowKeyGenerator(col("Name")))
ds2.show()
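For comparison, the same transformation can be written without a UDF at all, using only native column expressions (rand, floor, concat). This is a sketch assuming the same dfFromFile DataFrame as above; ds3 is a hypothetical name:

```scala
import org.apache.spark.sql.functions._

// Native-expression equivalent: rand() * 100 floored and cast to string,
// then concatenated with the "Name" column. Because no UDF is involved,
// Catalyst can see into and optimize the whole expression.
val ds3 = dfFromFile.withColumn(
  "NameNewValue",
  concat(floor(rand() * 100).cast("string"), col("Name"))
)
ds3.show()
```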



Note: The return type can be changed from String to any other supported type, as required. While developing, make sure the UDF handles null input, as unhandled nulls are a common cause of errors. A UDF is a black box to the Spark engine, whereas built-in functions that take a Column argument and return a Column are transparent to it and can be optimized. For performance reasons, it is always recommended to prefer Spark's native APIs and expressions over UDFs.
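To illustrate the null-handling advice above, here is a plain-Scala sketch of the key-generation logic that would be passed to udf(). Note that the original lambda would throw a NullPointerException on a null "Name" value, because String.concat(null) fails. The object and method names (RowKeyLogic, generateKey) are hypothetical, not from the original snippet:

```scala
object RowKeyLogic {
  // Null-safe version of the logic inside rowKeyGenerator:
  // propagate null instead of throwing a NullPointerException.
  def generateKey(n: String): String = {
    if (n == null) null
    else scala.util.Random.nextInt(100).toString.concat(n)
  }

  def main(args: Array[String]): Unit = {
    println(generateKey("device01")) // random two-digit-or-less prefix + "device01"
    println(generateKey(null))       // null
  }
}
```

Wrapping this function with udf(generateKey _, StringType) yields a UDF that returns null for null input instead of failing the task.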

Topics:
spark, spark dataframe, spark query, spark sql, spark sql tutorial, sparksql, udf integration

Opinions expressed by DZone contributors are their own.
