CHAPTER 8
There are three additional components of HDInsight that have not yet been covered in detail because they are more advanced features. They are likely to change and be extended as the platform evolves. Current details can be found on the documentation page that lists which version of Hadoop HDInsight currently uses.[31]
Oozie is a workflow engine that runs sequences of Hadoop jobs, such as MapReduce and Pig jobs, with each workflow defined as a directed graph of actions.
Oozie is exposed through REST APIs and the .NET SDKs.
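As a rough illustration of what the REST route involves, the Oozie web services API accepts a job submission as an HTTP POST to `/oozie/v1/jobs` whose body is a Hadoop-style XML configuration. The sketch below only builds that body and sends nothing. The helper name `build_oozie_job_config`, the user name, and the workflow path are hypothetical placeholders, and how the endpoint is reached on an HDInsight cluster is not covered here.

```python
import xml.etree.ElementTree as ET


def build_oozie_job_config(properties):
    """Return a Hadoop-style <configuration> XML document as a string.

    Each key/value pair becomes one <property> element with a <name>
    and a <value> child, which is the shape Oozie expects in the body
    of a job submission request.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: both the user and the workflow path are placeholders.
payload = build_oozie_job_config({
    "user.name": "admin",
    "oozie.wf.application.path": "/example/workflows/pig-then-mapreduce",
})
print(payload)
```

The `oozie.wf.application.path` property points at the directory that holds the workflow definition, so the payload itself stays small and the sequence of jobs lives in the workflow file.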
For full details, see the official documentation.[32]
Sqoop is a tool used to transfer data between Hadoop and relational databases using a JDBC driver. It is a command-line tool that imports the results of a single query, or the contents of a single table, from a relational source into Hadoop as files or into Hive as tables. The reverse process, an export, populates relational tables directly from Hadoop files.
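To make the two directions concrete, the following sketch assembles argument lists for Sqoop's `import` and `export` subcommands without running them. The helper names, the JDBC URL, the table names, and the directories are all hypothetical, and credentials (for example `--username`) are left out for brevity, so treat it as an outline of the command shape and not as a tested HDInsight recipe.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir=None,
                      hive_import=False, num_mappers=1):
    """Build a 'sqoop import' argument list (relational table -> Hadoop)."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--num-mappers", str(num_mappers)]
    if target_dir is not None:
        # Write the imported rows as files under this directory.
        args += ["--target-dir", target_dir]
    if hive_import:
        # Load the imported data into a Hive table as well.
        args.append("--hive-import")
    return args


def sqoop_export_args(jdbc_url, table, export_dir):
    """Build a 'sqoop export' argument list (Hadoop files -> relational table)."""
    return ["sqoop", "export",
            "--connect", jdbc_url,
            "--table", table,
            "--export-dir", export_dir]


# Hypothetical connection string and names, for illustration only.
url = "jdbc:sqlserver://dbserver.example.com:1433;databaseName=sales"

import_cmd = sqoop_import_args(url, "orders", target_dir="/example/input/orders")
hive_cmd = sqoop_import_args(url, "orders", hive_import=True)
export_cmd = sqoop_export_args(url, "order_totals", "/example/output/sales")

for cmd in (import_cmd, hive_cmd, export_cmd):
    print(shlex.join(cmd))  # quoted form suitable for pasting into a shell
```

Building the arguments as a list keeps the JDBC URL, which contains a semicolon, from being split by the shell. The list could be passed to `subprocess.run` on a machine where Sqoop is installed.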
For full details, see the official documentation.[33]

Ambari is a framework for monitoring, managing, and provisioning clusters. It is still an incubator project as far as Apache is concerned.
Its exact role in HDInsight is unclear, given the existing and planned capabilities of the Azure Management console, although it may be more relevant to the on-premises version when that arrives.
Like Oozie, it is exposed through REST APIs and the .NET SDKs.
For full details, see the official documentation.[34]