Pig Latin Introduction – Examples, Pig Data Types | RCV Academy

Pig Latin Introduction – Examples, Pig Data Types

In the following post, we will learn about Pig Latin and Pig Data types in detail.

Pig Latin Overview

Pig Latin provides a platform to non-java programmer where each processing step results in a new data set or relation.

For example, X = load ’emp’; Here “X” is the name of relation or new data set which is fed from loading the data set “emp”,”X” which is the name of relation is not a variable however it seems to act like a variable.

Once the assignment is done to a given relation say “X”, it is permanent. We can reuse the relation name in other steps as well but it is not advisable to do so because of better script readability purpose.

For Example:

X = load 'emp';
X = filter X by sal > 10000.0;
X = foreach X generate Ename;

Here at each step, the reassignment is not done for “X”, rather a new data set is getting created at each step.

Pig Latin also has a concept of fields or columns. In the above example “sal” and “Ename” is termed as field or column.

It is also important to know that keywords in Apache Pig Latin are not case sensitive.

For example, LOAD is equivalent to load. But the relations and column names are case sensitive.  For example, X = load ’emp’; is not equivalent to x = load ’emp’;

For multi-line comments in the Apache pig scripts, we use “/* … */” and for single-line comment we use “–“.

Pig Data Types

Pig Scalar Data Types

  • Int (signed 32 bit integer)
  • Long (signed 64 bit integer)
  • Float (32 bit floating point)
  • Double (64 bit floating point)
  • Chararray (Character array(String) in UTF-8
  • Bytearray (Binary object)

Pig Complex Data Types

Map

A map is a collection of key-value pairs.

Key-value pairs are separated by the pound sign #. “Key” must be a chararray datatype and should be a unique value while as “value” can be of any datatype.

For example:

[1#Honda, 2#Toyota, 3#Suzuki], [name#Mak, phone#99845,age#29].

Tuple

A tuple is similar to a row in SQL with the fields resembling SQL columns. In other

In other words, we can say that tuples are an ordered set of fields formed by grouping scalar data types. Two consecutive tuples need not have to contain the same number of fields.

For example:

(mak, 29, 4000.0)

Bag

A bag is formed by the collection of tuples. A bag can have duplicate tuples.

If Pig tries to access a field that does not exist, a null value is substituted.

For Example:

({(a),(b)},{},{(c),(d)},{ (mak, 29, 4000.0)}), (BigData, {Hadoop, Mapreduce, Pig, Hive})

NULLS

A null data element in Apache Pig is just same as the SQL null data element. The null value in Apache Pig means the value is unknown.