We are using DataDirect Hive driver version 5.1.4.000136. In Hive, we created a table like following:
CREATE EXTERNAL TABLE `party`(
`message_name` string COMMENT 'from deserializer',
`message_triggering_service` string COMMENT 'from deserializer',
`message_date_time` string COMMENT 'from deserializer',
`message_id` string COMMENT 'from deserializer',
`party_id` string COMMENT 'from deserializer',
`party_type_id` string COMMENT 'from deserializer',
`salutation` string COMMENT 'from deserializer',
`first_name` string COMMENT 'from deserializer',
`middle_name` string COMMENT 'from deserializer',
`last_name` string COMMENT 'from deserializer',
`dob` string COMMENT 'from deserializer',
`gender` string COMMENT 'from deserializer',
`marital_status_id` string COMMENT 'from deserializer',
`organization_name` string COMMENT 'from deserializer',
`organization_type_id` string COMMENT 'from deserializer',
`organization_contact_person` string COMMENT 'from deserializer',
`acquired_channel_id` string COMMENT 'from deserializer',
`internal_customer_flag` string COMMENT 'from deserializer',
`deceased_flag` string COMMENT 'from deserializer',
`deceased_date` string COMMENT 'from deserializer',
`party_status_id` string COMMENT 'from deserializer',
`party_status_chg_rsn_id` string COMMENT 'from deserializer',
`migrated_customer_flag` string COMMENT 'from deserializer',
`partyaddresses` array<struct<addresstype:struct<party_address_id:string,address_type_id:string>>> COMMENT 'from deserializer',
`addresses` array<struct<addresse:struct<address_id:string,address_line1:string,address_line2:string,address_line3:string,address_line4:string,street:string,city:string,post_code:string,county:string,country:string>>> COMMENT 'from deserializer',
`partyspecialneeds` array<struct<specialneed:struct<special_need_id:string>>> COMMENT 'from deserializer',
`party_joining_reasons` array<struct<party_joining_reason:struct<joining_reason_id:string>>> COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
'column.xpath.ACQUIRED_CHANNEL_ID'='/MStart/PARTY/ACQUIRED_CHANNEL_ID/text()',
'column.xpath.ADDRESSES'='/MStart/PARTY/ADDRESSES/ADDRESSE',
'column.xpath.DECEASED_DATE'='/MStart/PARTY/DECEASED_DATE/text()',
'column.xpath.DECEASED_FLAG'='/MStart/PARTY/DECEASED_FLAG/text()',
'column.xpath.DOB'='/MStart/PARTY/DOB/text()',
'column.xpath.FIRST_NAME'='/MStart/PARTY/FIRST_NAME/text()',
'column.xpath.GENDER'='/MStart/PARTY/GENDER/text()',
'column.xpath.INTERNAL_CUSTOMER_FLAG'='/MStart/PARTY/INTERNAL_CUSTOMER_FLAG/text()',
'column.xpath.LAST_NAME'='/MStart/PARTY/LAST_NAME/text()',
'column.xpath.MARITAL_STATUS_ID'='/MStart/PARTY/MARITAL_STATUS_ID/text()',
'column.xpath.MESSAGE_DATE_TIME'='/MStart/Message_Header/MESSAGE_DATE_TIME/text()',
'column.xpath.MESSAGE_ID'='/MStart/Message_Header/MESSAGE_ID/text()',
'column.xpath.MESSAGE_NAME'='/MStart/Message_Header/MESSAGE_NAME/text()',
'column.xpath.MESSAGE_TRIGGERING_SERVICE'='/MStart/Message_Header/MESSAGE_TRIGGERING_SERVICE/text()',
'column.xpath.MIDDLE_NAME'='/MStart/PARTY/MIDDLE_NAME/text()',
'column.xpath.MIGRATED_CUSTOMER_FLAG'='/MStart/PARTY/MIGRATED_CUSTOMER_FLAG/text()',
'column.xpath.ORGANIZATION_CONTACT_PERSON'='/MStart/PARTY/ORGANIZATION_CONTACT_PERSON/text()',
'column.xpath.ORGANIZATION_NAME'='/MStart/PARTY/ORGANIZATION_NAME/text()',
'column.xpath.ORGANIZATION_TYPE_ID'='/MStart/PARTY/ORGANIZATION_TYPE_ID/text()',
'column.xpath.PARTYADDRESSES'='/MStart/PARTY/PARTYADDRESSES/ADDRESSTYPE',
'column.xpath.PARTYSPECIALNEEDS'='/MStart/PARTY/PARTYSPECIALNEEDS/SPECIALNEED',
'column.xpath.PARTY_ID'='/MStart/PARTY/PARTY_ID/text()',
'column.xpath.PARTY_JOINING_REASONS'='/MStart/PARTY/PARTY_JOINING_REASONS/PARTY_JOINING_REASON',
'column.xpath.PARTY_STATUS_CHG_RSN_ID'='/MStart/PARTY/PARTY_STATUS_CHG_RSN_ID/text()',
'column.xpath.PARTY_STATUS_ID'='/MStart/PARTY/PARTY_STATUS_ID/text()',
'column.xpath.PARTY_TYPE_ID'='/MStart/PARTY/PARTY_TYPE_ID/text()',
'column.xpath.SALUTATION'='/MStart/PARTY/SALUTATION/text()')
STORED AS INPUTFORMAT
'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'adl://rbdvinsightsdls01.azuredatalakestore.net/root/CRM/1000.10.8.b/Party/Process/InProgress'
TBLPROPERTIES (
'xmlinput.end'='</MStart>',
'xmlinput.start'='<MStart')
Note we are using the serde downloaded from https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources
We also did the "add jar ....".
Table was created successfully, and we were able to see it using "describe table" command.
However in our java code,
rs = Connection.getMetaData().getTables()
returns all the tables, except the one created above.
If table was created without the ROW FORMATSERDE clause, the table will be returned.
If table was created with serde like "parquet.hive.serde.ParquetHiveSerDe", "org.apache.hive.hcatalog.data.JsonSerDe", "org.apache.hadoop.hive.serde2.avro.AvroSerDe", the table will be returned.
So looks like only the "com.ibm.spss.hive.serde2.xml.XmlSerDe" will have the issue. Is there any special steps to be performed in order for the Hive drive to recognize it?
Thanks,
Xuelin