Hello Karthikeyan,
This worked perfectly - many thanks!
If possible, I would like to add two more steps onto this.
Firstly, in the table I have imported into HDFS which is called Shifts , I have two columns called Shift Type and Total Cost. Could I create a second table called 'Measures' by taking these to columns and creating a second table with a corresponding column called Shift Type and three columns which take the Average, Median, Minimum and Maximum of Total Cost?
When this has been done I would then like to move the copy the Shifts and Measures tables into Hive on the same server which I can then read into my Analytics Platform - Qliksense.
I hope this is possible,
Best Wishes,
Tim
Hi Karthikeyan,
Thanks for this solution. Unfortunately, it is failing at the first step with an error :
.....Cannot create PoolableConnectionFactory (could not establish connection to jdbc:hive2://localhost:10000/default;auth=noSasl;:null). No Flowfileto route to failure:....
Can you assist please?
Many many thanks,
Tim
Hi,
Thanks for your quick reply. The services were running correctly.
I have just thought that as part of another support thread I was advised to update the authentication parameters within the hive-site.xml file as I had to connect to the hive data from a separate analytics platform. Would this have an effect ? The parameters are now :
Property |
Value |
Database Connection URL |
jdbc:hive2://localhost:10000/default; |
Database User |
tim |
Password |
password |
Property |
Value |
Database Connection URL |
jdbc:hive2://localhost:10000/default; -n tim -p password |
Hi,
Thanks for your reply. This is now working although it seems to fail on the last step :
HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:62 cannot recognize input near ','',' '"Total_Cost"' in value row constructor; it is possible that retrying the operation will succeed, so routing to retry: org.apache....................
I also have another question. In the 'Insert in to Hive Table' step, does the table that it is inserting into have to already exist ? or will it append to an existing table, overwrite an existing table or create a table is it doesn't exist?
Thanks you for your help
HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:62 cannot recognize input near ','',' '"Total_Cost"' in value row constructor; it is possible that retrying the operation will succeed, so routing to retry: org.apache.................... |
Make sure that your Insert query is properly constructed according to your Measures table. Use ‘ ‘ (single quotations) for String column types in hive.
For eg.
Replacement Value in the Constructing queries processor - insert into Measures values('${csv.2:getDelimitedField(1)}',${csv.2:getDelimitedField(2)},${csv.2:getDelimitedField(3)},${csv.2:getDelimitedField(4)},${csv.2:getDelimitedField(5)})
Query Generated - insert into Measures values('"Type4"',600.0,600.0,600.0,600.0)
You can view the Query Generated from the “Constructing queries” processor using Data Provenance. |
I also have another question. In the 'Insert in to Hive Table' step, does the table that it is inserting into have to already exist ? or will it append to an existing table, overwrite an existing table or create a table is it doesn't exist? |
The “Insert in to Hive Table” processor just insert the records in to the Hive table. If the table Measures is not present in the Hive server, we need to create the table manually.
If the table exists, It will not overwrite or create the table, It will just append the records in the existing table. |