Pivotal Knowledge Base

HiveServer2 Active Directory Authentication Guide

Environment

  • PHD 2.x
  • PHD 3.x

 

Purpose

This article demonstrates, by example, how to configure HiveServer2 to authenticate against Active Directory.

 

Overview

Setting up HiveServer2 to authenticate against Microsoft Active Directory Domain Services (AD DS) involves the following high-level steps:

  1. Prerequisites
    • Installation and basic configuration of HiveServer2
    • Check AD DS connectivity and functionality from your PHD cluster
  2. Modify HiveServer2 security configurations related to AD DS authentication
  3. Start (or restart) HiveServer2 service
  4. Connect to HiveServer2 via beeline and validate AD DS authentication

 

Example

To illustrate how to set up HiveServer2 with AD authentication, we use the following environment:

  1. A single-node Pivotal HD virtual machine (version 2.0.1). This is where we install and configure HiveServer2 and interact with it via beeline. The hostname of this node is pivhdsne.localdomain.
  2. A Windows Server 2008 R2 machine with Microsoft Active Directory properly installed and configured. The hostname of this node is dc1-corp-2k8.corp.gepivotal.com. We created a domain called corp.gepivotal.com and the following users for testing (shown in Figure 1):
    • jsmith@corp.gepivotal.com, under Organizational Unit CapitalAmerica
    • dmiller@corp.gepivotal.com, under Users

Figure 1. Two AD Users for Testing

When a user attempts to log in to HiveServer2 via beeline, the actual authentication exchange takes place between the Pivotal HD single-node VM and the Windows Server 2008 R2 server running the AD service.

 

1. Prerequisites

  1. Installation and initial configuration of HiveServer2. Please refer to this knowledge base article for detailed instructions. Below are the basic HiveServer2 settings in /etc/gphd/hive/conf/hive-site.xml:
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10001</value>
        <description>TCP port number to listen on, default 10000</description>
    </property>
    
    <property>
        <name>hive.support.concurrency</name>
        <description>Whether Hive supports concurrency or not. A Zookeeper instance must be up and running for the default Hive lock manager to support read-write locks.
        </description>
        <value>true</value>
    </property>
    
    <property>
        <name>hive.zookeeper.quorum</name>
        <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
        <value>pivhdsne.localdomain</value>
    </property>
    
    <property>
        <name>ipc.client.connection.maxidletime</name>
        <value>10000</value>
    </property>
  2. Check AD DS connectivity and functionality from your PHD cluster
    # First, check AD DS connectivity
    # make sure you can ping AD DS server
    $ ping -c 4 dc1-corp-2k8.corp.gepivotal.com
    
    
    # make sure DNS resolution is working, 192.168.9.133 is our DNS
    $ dig @192.168.9.133 dc1-corp-2k8.corp.gepivotal.com
    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6 <<>> @192.168.9.133 dc1-corp-2k8.corp.gepivotal.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- ...
  3. If the LDAP bind test against the AD DS server succeeds, pay special attention to the “search result” section of the output, which should show “0 Success”.
  4. If the LDAP bind test fails, fix the errors before you proceed.
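
The LDAP bind test referenced in steps 3 and 4 can be run with the OpenLDAP client tool ldapsearch. The following is a minimal sketch, not necessarily the exact command used for this article. It assumes that ldapsearch is installed on the PHD node, that the search base follows the default mapping from the domain name (corp.gepivotal.com becomes dc=corp,dc=gepivotal,dc=com), and that AD accepts a simple bind with a name of the form username@domain. The function names domain_to_base_dn and ad_bind_test are hypothetical.

```shell
# Hypothetical helper: derive an LDAP search base from a DNS domain name.
# Assumption: the AD domain uses the default mapping
# (corp.gepivotal.com -> dc=corp,dc=gepivotal,dc=com).
domain_to_base_dn() {
    printf 'dc=%s\n' "$1" | sed 's/\./,dc=/g'
}

# Hypothetical helper: simple bind as user@domain, then look up that user's entry.
# $1 = AD DS host, $2 = AD domain, $3 = user name (without the @domain part)
# -x selects simple authentication, -W prompts for the bind password,
# -D is the bind identity, -b is the search base.
ad_bind_test() {
    ldapsearch -x \
        -H "ldap://$1" \
        -D "$3@$2" \
        -W \
        -b "$(domain_to_base_dn "$2")" \
        "(sAMAccountName=$3)"
}
```

For example, running ad_bind_test dc1-corp-2k8.corp.gepivotal.com corp.gepivotal.com jsmith prompts for the password of jsmith@corp.gepivotal.com. On a successful bind, the "search result" section at the end of the output contains the line "result: 0 Success".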

2. Detailed Steps

  1. Add the following section in /etc/gphd/hive/conf/hive-site.xml
    <property>
        <name>hive.server2.authentication</name>
        <value>LDAP</value>
    </property>
    
    <property>
        <name>hive.server2.authentication.ldap.url</name>
        <value>ldap://dc1-corp-2k8.corp.gepivotal.com</value>
    </property>
    
  2. Start/restart HiveServer2
    [pivhdsne:~]$ id
    uid=500(gpadmin) gid=500(gpadmin) groups=500(gpadmin),501(hadoop)
    [pivhdsne:~]$ sudo service hive-server2 start
    starting hive-server2, logging to /var/log/gphd/hive/hive-server2.log [ OK ]
  3. Connect to HiveServer2 via beeline and validate AD DS authentication.
    [pivhdsne:~]$ id
    uid=500(gpadmin) gid=500(gpadmin) groups=500(gpadmin),501(hadoop)
    
    [pivhdsne:~]$ beeline
    Beeline version 0.12.0-gphd-3.0.0.0 by Apache Hive
    beeline> !connect jdbc:hive2://pivhdsne.localdomain:10001/
    scan complete in 1ms
    Connecting to jdbc:hive2://pivhdsne.localdomain:10001/
    Enter username for jdbc:hive2://pivhdsne.localdomain:10001/: jsmith@corp.gepivotal.com
    Enter password for jdbc:hive2://pivhdsne.localdomain:10001/: ********
    Connected to: Hive (version 0.12.0-gphd-3.0.0.0)
    Driver: Hive (version 0.12.0-gphd-3.0.0.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://pivhdsne.localdomain:10001/> show tables;
    +---------------------------+
    |         tab_name          |
    +---------------------------+
    | date_dim_hive             |
    | email_addresses_dim_hive  |
    +---------------------------+
    2 rows selected (2.09 seconds)
    0: jdbc:hive2://pivhdsne.localdomain:10001/> use retail_demo;
    No rows affected (0.089 seconds)
    0: jdbc:hive2://pivhdsne.localdomain:10001/> show tables;
    +-----------------------+
    |       tab_name        |
    +-----------------------+
    | order_lineitems_hive  |
    | products_dim_hive     |
    +-----------------------+
    2 rows selected (0.186 seconds)
    0: jdbc:hive2://pivhdsne.localdomain:10001/> select count(*) from order_lineitems_hive;
    +----------+
    |   _c0    |
    +----------+
    | 1024158  |
    +----------+
    1 row selected (28.165 seconds)
    0: jdbc:hive2://pivhdsne.localdomain:10001/> !list
    1 active connection:
     #0  open     jdbc:hive2://pivhdsne.localdomain:10001/
    0: jdbc:hive2://pivhdsne.localdomain:10001/> !closeall
    Closing: org.apache.hive.jdbc.HiveConnection
    beeline> !list
    No active connections
    beeline> !connect jdbc:hive2://pivhdsne.localdomain:10001/
    scan complete in 2ms
    Connecting to jdbc:hive2://pivhdsne.localdomain:10001/
    Enter username for jdbc:hive2://pivhdsne.localdomain:10001/: dmiller@corp.gepivotal.com
    Enter password for jdbc:hive2://pivhdsne.localdomain:10001/: ********
    Connected to: Hive (version 0.12.0-gphd-3.0.0.0)
    Driver: Hive (version 0.12.0-gphd-3.0.0.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://pivhdsne.localdomain:10001/> show tables;
    +---------------------------+
    |         tab_name          |
    +---------------------------+
    | date_dim_hive             |
    | email_addresses_dim_hive  |
    +---------------------------+
    2 rows selected (1.499 seconds)
    0: jdbc:hive2://pivhdsne.localdomain:10001/> !list
    1 active connection:
     #0  open     jdbc:hive2://pivhdsne.localdomain:10001/
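
The interactive session above can also be scripted for a quick login check. This is a minimal sketch, assuming your Beeline build accepts the -u (JDBC URL), -n (user name), -p (password), and -e (statement) options. The function names hs2_jdbc_url and hs2_login_check are hypothetical.

```shell
# Hypothetical helper: build the HiveServer2 JDBC URL used in this article.
# $1 = HiveServer2 host, $2 = Thrift port (hive.server2.thrift.port)
hs2_jdbc_url() {
    printf 'jdbc:hive2://%s:%s/\n' "$1" "$2"
}

# Hypothetical helper: log in as an AD user and run one statement.
# $1 = host, $2 = port, $3 = AD user (username@domain), $4 = password
# Caution: a password passed on the command line is visible in the process list.
hs2_login_check() {
    beeline -u "$(hs2_jdbc_url "$1" "$2")" -n "$3" -p "$4" -e 'show tables;'
}
```

For example, hs2_login_check pivhdsne.localdomain 10001 jsmith@corp.gepivotal.com followed by the user's password attempts the same connection as the first !connect in the transcript and runs show tables.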
    
    

Special Notes

  1. Configurations related to Active Directory in hive-site.xml. This page lists all possible settings relevant to Authentication/Security for HiveServer2. In our testing, we observed that setting the following two parameters together results in AD authentication failure:

    • hive.server2.authentication.ldap.Domain
    • hive.server2.authentication.ldap.baseDN

    In Hive 0.12, the error message returned is not very informative:

    Error: Invalid URL: jdbc:hive2://<HOST>:<PORT>/ (state=08S01,code=0)

    Thus, we recommend using only the following settings when configuring HiveServer2 with AD authentication:

    • hive.server2.authentication
    • hive.server2.authentication.ldap.url
    • hive.server2.authentication.ldap.Domain (optional)

    If you set hive.server2.authentication.ldap.Domain in hive-site.xml, you can simply use your AD username in beeline to connect and authenticate to HiveServer2. Otherwise, you need to specify the fully qualified user name in the form username@domain. Our example above illustrates the latter case.

  2. Multi-user permission denied error when executing SQL statements. After successfully authenticating against AD, you are logged in to the beeline interactive shell. In subsequent sessions, you might encounter the error reported in this JIRA: https://issues.apache.org/jira/browse/HIVE-6602. A simple workaround is to manually set the permissions of the HDFS directory /tmp/hive-{hive.username} to 777, for example with hdfs dfs -chmod 777 /tmp/hive-{hive.username}.
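
When troubleshooting the settings discussed in note 1, it can help to print which LDAP-related properties are actually present in hive-site.xml. The following is a minimal sketch that uses only awk. It assumes that each <name> and <value> element sits on its own line, as in the examples in this article. The function names get_hive_property and show_ldap_settings are hypothetical.

```shell
# Hypothetical helper: print the <value> of one property in a Hadoop-style XML file.
# Assumption: <name> and <value> are on separate lines, as in this article's examples.
# $1 = path to hive-site.xml, $2 = property name
get_hive_property() {
    awk -v name="$2" '
        # Remember that we are inside the property with the requested name.
        index($0, "<name>" name "</name>") { found = 1; next }
        # Print the first <value> seen inside that property, then stop.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
        # Leaving a property without a value resets the state.
        /<\/property>/ { found = 0 }
    ' "$1"
}

# Print the three recommended settings; an empty value means "not set".
# $1 = path to hive-site.xml
show_ldap_settings() {
    for prop in hive.server2.authentication \
                hive.server2.authentication.ldap.url \
                hive.server2.authentication.ldap.Domain; do
        printf '%s=%s\n' "$prop" "$(get_hive_property "$1" "$prop")"
    done
}
```

For example, show_ldap_settings /etc/gphd/hive/conf/hive-site.xml prints one name=value line per setting, which makes it easy to confirm whether hive.server2.authentication.ldap.Domain is set before deciding which user name form to use in beeline.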
