It is important to note that HiveQL data manipulation doesn’t offer any row-level insert, update or delete operation. Therefore, data can be inserted into hive tables using either “bulk” load operations or writing the files into correct directories by other methods.
HiveQL Load Data into Managed Tables
Loading data from input file (Schema on Read)
Hive>LOAD DATA LOCAL INPATH '/home/hduser/sampledata/users.txt' OVERWRITE INTO TABLE users;
- ‘LOCAL’ indicates the source data is on local file system
- Local data will be copied into the final destination (HDFS file system) by Hive
- If ‘Local’ is not specified, the file is assumed to be on HDFS
- Hive does not do any data transformation while loading the data
Loading data into partition requires PARTITION clause
Hive>LOAD DATA LOCAL PATH '/home/hduser/sampledata/Employees.txt' OVERWRITE INTO TABLES Employees; PARTITION (country = 'India', city ='Delhi');
HDFS directory is created according to the partition values
Loading data from HDFS directory
Hive>LOAD DATA INPATH '/usr/hadoop/data' OVERWRITE INTO TABLES aliens;
- All the files in the directory are copied into Hive
- ‘OVERWRITE’ causes table to be purged and filled
- Leaving out ‘OVERWRITE’ adds data to existing folder (old data will exist under its name and new one under a different name)
HiveQL Insert Data into Hive Tables from Queries
Hive> INSERT OVERWRITE TABLE Employee Partition (country= ‘IN’,state=’KA’) SELECT * FROM emp_stage ese WHERE ese.country=’IN’ AND ese.state=’KA’;
Create table and load them from Hive Queries
Hive> CREATE TABLE Employees AS SELECT eno,ename,sal,address FROM emp WHERE country=’IN’;
Exporting Data out of Hive
If LOCAL keyword is used, Hive will write the data to local directory
Hive>INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/data' SELECT name, age FROM aliens WHERE date_sighted >'2014-09-15'