HiveQL Data Manipulation – Load, Insert, Export Data and Create Table

It is important to note that HiveQL data manipulation doesn’t offer any row-level insert, update or delete operation. Therefore, data can be inserted into hive tables using either “bulk” load operations or writing the files into correct directories by other methods.

HiveQL Load Data into Managed Tables

Loading data from input file (Schema on Read)

Hive>LOAD DATA LOCAL INPATH   '/home/hduser/sampledata/users.txt'

OVERWRITE INTO TABLE users;

‘LOCAL’ indicates the source data is on local file system
Local data will be copied into the final destination (HDFS file system) by Hive
If ‘Local’ is not specified, the file is assumed to be on HDFS
Hive does not do any data transformation while loading the data

Loading data into partition requires PARTITION clause

Hive>LOAD DATA LOCAL PATH  '/home/hduser/sampledata/Employees.txt' OVERWRITE INTO TABLES Employees;

PARTITION (country  = 'India', city ='Delhi');

HDFS directory is created according to the partition values

Loading data from HDFS directory

Hive>LOAD DATA INPATH  '/usr/hadoop/data' OVERWRITE INTO TABLES aliens;

All the files in the directory are copied into Hive
‘OVERWRITE’ causes table to be purged and filled
Leaving out ‘OVERWRITE’ adds data to existing folder (old data will exist under its name and new one under a different name)

HiveQL Insert Data into Hive Tables from Queries

Hive> INSERT OVERWRITE TABLE Employee
Partition (country= ‘IN’,state=’KA’)
SELECT * FROM emp_stage ese
WHERE ese.country=’IN’ AND ese.state=’KA’;

Create table and load them from Hive Queries

Hive> CREATE TABLE Employees

AS SELECT eno,ename,sal,address

FROM emp

WHERE country=’IN’;

Exporting Data out of Hive

If LOCAL keyword is used, Hive will write the data to local directory

Hive>INSERT OVERWRITE LOCAL DIRECTORY   '/home/hadoop/data'

SELECT name, age

FROM aliens

WHERE date_sighted >'2014-09-15'