Oracle 12cR2 and Apache Hive integration
- Posted by redglue
- On December 11, 2016
- bigdata, hive, oracle12cr2
#UKOUG16 is over. Some great things happened and a lot of very good content was delivered but, as expected, the agenda was full of sessions about what is “new” in Oracle 12c Release 2.
Oracle 12c Release 2 looks like a very solid release with some great improvements, and we will describe them in future posts as soon as we can evaluate them and make sure that they work as advertised. Just remember that, right now, 12cR2 is only available in the Oracle Cloud.
Here @redglue, we build Data Lakes (or Oceans, or Repositories, or whatever you want to call them) to support data warehousing systems, and most of those data warehouses are built on top of Oracle Databases. In the future, massive data in companies will live under the Hadoop HDFS umbrella (or in the Cloud, on services like Amazon S3, using HDFS as a “temporary” cache engine) and will be handled by a different “Big Data” stack, with engines like Hive or Spark SQL delivering a SQL experience that helps the interaction between the Lakes and data warehousing.
Relational database vendors like Oracle have shown us that they really care about this move and have realized that the integration between the RDBMS and the Big Data stack is hugely important.
Below is a shot of one of the Oracle 12cR2 Big Data features (presented by Dominic Giles), the full integration between Oracle External Tables and some Apache Hive features:
Still, if you dig a little into the Oracle 12cR2 documentation that is available right now, you will find some interesting stuff:
- DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE
- Views: ALL/DBA/USER_HIVE_TABLES, ALL/DBA/USER_HIVE_DATABASES and ALL/DBA/USER_HIVE_COLUMNS
Docs: https://docs.oracle.com/cd/E55905_01/doc.40/e55814/bigsqlref.htm#BIGUG76649
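To give an idea of how these pieces might fit together, here is a minimal Python sketch that only builds the SQL and PL/SQL text; it does not connect to any database. The parameter names passed to DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE and the view columns are taken from the Big Data SQL documentation as we remember it, so treat them as assumptions and check them against the docs linked above. The cluster, database and table names are hypothetical.

```python
def plsql_literal(value: str) -> str:
    """Quote a Python string as a PL/SQL string literal (doubles single quotes)."""
    return "'" + value.replace("'", "''") + "'"


# Assumed column names of the ALL_HIVE_TABLES view; verify against the docs.
HIVE_TABLES_QUERY = (
    "SELECT cluster_id, database_name, table_name FROM all_hive_tables"
)


def build_extddl_block(cluster_id: str, hive_db: str,
                       hive_table: str, oracle_table: str) -> str:
    """Build an anonymous PL/SQL block that prints the generated external table DDL.

    The parameter names below are assumptions based on the Big Data SQL docs.
    """
    return "\n".join([
        "DECLARE",
        "  ddl_text VARCHAR2(4000);",
        "BEGIN",
        "  DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE(",
        f"    cluster_id      => {plsql_literal(cluster_id)},",
        f"    db_name         => {plsql_literal(hive_db)},",
        f"    hive_table_name => {plsql_literal(hive_table)},",
        "    hive_partition  => FALSE,",
        f"    table_name      => {plsql_literal(oracle_table)},",
        "    perform_ddl     => FALSE,  -- only return the DDL, do not run it",
        "    text_of_ddl     => ddl_text",
        "  );",
        "  DBMS_OUTPUT.PUT_LINE(ddl_text);",
        "END;",
    ])


# Hypothetical cluster, Hive database and table names, for illustration only.
block = build_extddl_block("hadoop1", "default", "sales_hive", "sales_ext")
print(HIVE_TABLES_QUERY)
print(block)
```

Keeping perform_ddl set to FALSE means the generated CREATE TABLE text can be reviewed before anything is created on the Oracle side.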
The idea is to query from Oracle and access Apache Hive (which ultimately sits on top of Hadoop HDFS) and also to take advantage of Apache Hive features like Partition Pruning (yes, Hive has partition pruning and the concept is similar to Oracle's), Partition Maintenance and the Hive Cost-Based Optimizer.
There are of course some things to clarify, mainly the supported Hive versions (maybe 2.x is the only one supported). Another note is that you will be able to use whatever engine you want under Hive (the MapReduce engine is deprecated in Hive 2.x), so you can use Spark or Tez.
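If partition pruning is new to you, the concept can be shown with a toy Python sketch. This is not how Hive or Oracle implement it; the table, its partition key (a day) and the rows are all made up for illustration. The point is simply that a filter on the partition key lets the engine skip whole partitions without reading them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table partitioned by day: partition key -> rows in that partition.
sales_by_day: Dict[str, List[dict]] = {
    "2016-12-09": [{"item": "a", "amount": 10}],
    "2016-12-10": [{"item": "b", "amount": 20}, {"item": "c", "amount": 5}],
    "2016-12-11": [{"item": "d", "amount": 7}],
}


def scan_with_pruning(partitions: Dict[str, List[dict]],
                      wanted_days: Set[str]) -> Tuple[List[str], List[dict]]:
    """Read only the partitions whose key matches the filter on the partition column."""
    scanned: List[str] = []
    rows: List[dict] = []
    for day, part_rows in partitions.items():
        if day not in wanted_days:
            continue  # pruned: this partition is never read
        scanned.append(day)
        rows.extend(part_rows)
    return scanned, rows


# Equivalent in spirit to: SELECT * FROM sales WHERE day = '2016-12-10'
scanned, rows = scan_with_pruning(sales_by_day, {"2016-12-10"})
print(scanned, len(rows))
```

Only one of the three partitions is touched here, which is where the saving comes from on large tables.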
So, the future of data movement is coming and it speaks SQL.

