
How to add numbers with Pig

Introduction

We’re going to start with a very simple Pig script that reads a file containing two numbers per line, separated by a tab. The script loads each line, assigns the two numbers to the fields x and y, and then adds them together.

Create the Sample Input File

cd
vi pig-practice01.txt

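Paste the following into pig-practice01.txt. The two numbers on each line must be separated by a single tab character, which is the default field delimiter for PigStorage.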

5	1
6	4
3	2
1	1
9	2
3	8
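
Pasting into an editor can silently turn tabs into spaces, which would make Pig load null values instead of numbers. As an alternative to vi, here is a minimal sketch that writes the same six rows with explicit tab characters and then counts how many lines look like "number, tab, number". It assumes a POSIX shell with printf and awk available, and it overwrites pig-practice01.txt in the current directory.

```shell
# Write the six sample rows, using \t so the separator is a real tab.
printf '5\t1\n6\t4\n3\t2\n1\t1\n9\t2\n3\t8\n' > pig-practice01.txt

# Count the lines made of exactly two tab-separated whole numbers.
awk -F'\t' 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { ok++ } END { print ok + 0 }' pig-practice01.txt
```

If the file is well formed, the count printed by awk matches the number of rows in the file (6 for this sample).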

Create the Input and Output Directories in HDFS

We’re going to create two directories in HDFS: one for the input to our first Pig script and one for its output. Because the paths are relative, they are created under your HDFS home directory.

hadoop fs -mkdir pig01-input
hadoop fs -mkdir pig01-output

Put Data File into HDFS

hadoop fs -put pig-practice01.txt pig01-input

Now, let’s check that our file was copied from the local file system to HDFS correctly.

hadoop fs -ls pig01-input
hadoop fs -cat pig01-input/pig-practice01.txt

Write the Pig Latin Script

vi practice01.pig

Paste the following code into practice01.pig.

/*
Add 2 numbers together
*/

-- Load the practice file from HDFS
A = LOAD 'pig01-input/pig-practice01.txt' USING PigStorage() AS (x:int, y:int);

-- Add x and y
B = FOREACH A GENERATE x + y;

-- Store the output in HDFS
STORE B INTO 'pig01-output/results' USING PigStorage();

Run the Pig Script

pig practice01.pig

View the Results

hadoop fs -ls pig01-output/results

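The results are stored in the part-* file. For a map-only job like this one, the file is typically named part-m-00000; use the name shown by the ls command above.

hadoop fs -cat pig01-output/results/part-m-00000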
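
To cross-check the job's output without Hadoop, the same addition can be done locally. The sketch below feeds the sample rows to awk and adds the two columns, mirroring what FOREACH A GENERATE x + y computes for each row. The file name expected-sums.txt is a hypothetical name chosen for this example, and the rows are recreated inline so the snippet does not depend on any other file.

```shell
# Recreate the tab-separated sample rows inline and add the two columns,
# writing one sum per line to a local file (expected-sums.txt is a made-up name).
printf '5\t1\n6\t4\n3\t2\n1\t1\n9\t2\n3\t8\n' | awk -F'\t' '{ print $1 + $2 }' > expected-sums.txt

# Show the expected sums: 6, 10, 5, 2, 11, 11 (one per line).
cat expected-sums.txt
```

If the Pig job ran as described, the contents of the part file in pig01-output/results should match these values line for line.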
