|Pivotal HD / HDP||All supported versions|
|OS||All supported versions|
This KB article shows how to set up a Sqoop metastore. The Sqoop metastore stores Sqoop job information in a central place, which helps collaboration between Sqoop users and developers. For example, user A can create a job to load some specific data, and any other user can then access the same job from any node in the cluster and run it again. This is very convenient when using Sqoop in Oozie workflows.
At a high level, the steps are as follows:
- Choose a server to host the Sqoop metastore. It is best to choose a master or administrative server
- Set up the Sqoop metastore
- Update the service configuration to access the metastore automatically
- Start the Sqoop metastore
Step 1: Choose the right server
It is strongly recommended to choose a master or administrative server. Slave nodes are not recommended because they are expected to be under heavy load and to fail at some point. Colocating the Sqoop metastore with the Ambari server is acceptable.
Step 2: Set up Sqoop metastore
Here you need to decide which user will run the metastore. It is recommended to run the metastore as the sqoop user; running it as root is strongly discouraged. Once you have decided which user will run the metastore, the next step is to create the user and its home directory (if needed), and a folder to store the database (DB) information.
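A minimal sketch of this preparation, assuming a Linux host where useradd is available and assuming the folder /home/sqoop/meta-store implied by the example path used later in this article. The function names ensure_user and ensure_dir are invented for illustration and are not part of Sqoop.

```shell
# Sketch only: both values are illustrative defaults, adjust to your site.
METASTORE_USER="${METASTORE_USER:-sqoop}"
METASTORE_DIR="${METASTORE_DIR:-/home/sqoop/meta-store}"

# Create the account (with a home directory) only if it does not exist yet.
# $1 = user name
ensure_user() {
  id "$1" >/dev/null 2>&1 || useradd -m "$1"
}

# Create the database folder and hand it over to the metastore user.
# $1 = owner, $2 = directory
ensure_dir() {
  mkdir -p "$2" && chown "$1" "$2"
}

# Run as root on the metastore host (left commented out on purpose):
#   ensure_user "$METASTORE_USER" && ensure_dir "$METASTORE_USER" "$METASTORE_DIR"
```

On many distributions the sqoop account already exists after the package is installed, in which case only the folder is created.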
The next step is to configure the metastore details in sqoop-site.xml. The first property to set is sqoop.metastore.server.location, the path where the metastore database files are stored, for example: /home/sqoop/meta-store/shared.db
The other property to set is sqoop.metastore.server.port; we can leave the default, 16000.
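As a sketch of what the server-side entries could look like, the shell function below prints the two property blocks for the values discussed above. The function name metastore_server_props is made up for illustration; the property names and example values are the ones given in this article. The output is meant to be pasted inside the configuration element of sqoop-site.xml on the metastore host only.

```shell
# Hypothetical helper (not part of Sqoop): prints the server-side properties.
# $1 = database location, $2 = port (defaults to 16000 when omitted)
metastore_server_props() {
  cat <<EOF
<property>
  <name>sqoop.metastore.server.location</name>
  <value>$1</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>${2:-16000}</value>
</property>
EOF
}

# Print the fragment for the example values used in this article.
metastore_server_props /home/sqoop/meta-store/shared.db 16000
```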
For the client properties, we need to set the following properties:
The auto-connect URL is a connect string for an HSQL DB with the following format:
Where hostname_fqdn is the fully qualified domain name of the host chosen in step 1, and port is the port set in the previous step (16000 by default). An example for this is shown here:
We can leave the username and password at their defaults.
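The client property list and the URL example are not reproduced in this copy of the article, so the following is a sketch under stated assumptions. The property names sqoop.metastore.client.enable.autoconnect and sqoop.metastore.client.autoconnect.url are taken from the stock Sqoop 1.x sqoop-site.xml template, and the URL shape jdbc:hsqldb:hsql://hostname_fqdn:port/sqoop follows the Sqoop user guide. The helper names and the host master01.example.com are placeholders.

```shell
# Hypothetical helpers (not part of Sqoop).
# $1 = metastore host FQDN, $2 = port (defaults to 16000 when omitted)
metastore_url() {
  printf 'jdbc:hsqldb:hsql://%s:%s/sqoop\n' "$1" "${2:-16000}"
}

# Prints the client-side properties. Username and password are left at
# their defaults, so they are not listed here.
metastore_client_props() {
  cat <<EOF
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>$(metastore_url "$1" "$2")</value>
</property>
EOF
}

# Placeholder host name; prints jdbc:hsqldb:hsql://master01.example.com:16000/sqoop
metastore_url master01.example.com
```

The same URL can also be passed explicitly on the command line with the --meta-connect option of sqoop job, which is handy for checking the connection before auto-connect is configured.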
Step 3: Update service configuration
It is not possible to use Ambari to configure these settings; the configuration files have to be updated manually.
Log on to another node in the cluster and update the properties for client access:
Do not set up the server properties on client nodes. The properties sqoop.metastore.server.location and sqoop.metastore.server.port should be set only on the node running the Sqoop metastore.
Copy this new sqoop-site.xml file to all other nodes except the Sqoop metastore server.
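One way to distribute the file is sketched below. The function only prints one scp command per node and skips the metastore server, so the list can be reviewed before anything is copied. The function name, the host names, and the path /etc/sqoop/conf/sqoop-site.xml are assumptions; use the configuration directory and node list of your own cluster.

```shell
# Hypothetical helper: prints one scp command per node, skipping the
# metastore host so its server-side settings are not overwritten.
# $1 = metastore host, $2 = path to sqoop-site.xml, remaining args = cluster nodes
print_copy_commands() {
  metastore_host="$1"
  conf="$2"
  shift 2
  for node in "$@"; do
    if [ "$node" != "$metastore_host" ]; then
      echo "scp $conf $node:$conf"
    fi
  done
}

# Placeholder host names; prints two scp lines (worker01 and worker02).
print_copy_commands master01.example.com /etc/sqoop/conf/sqoop-site.xml \
  master01.example.com worker01.example.com worker02.example.com
```

Once the printed commands look right, they can be executed by piping the output to sh.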
Step 4: Start Sqoop metastore
Now we can run sudo -u sqoop sqoop-metastore to test that the server comes up successfully. Once the server comes up, it stays attached to the terminal as a foreground process. This is undesirable for a server process, so the next step is to start the server and leave it running in the background. There are many valid ways to achieve this; the one we recommend is the following:
- Log on as the user who will run the metastore: su - sqoop
- Change to the metastore folder
- Start the server process, redirect stdout and stderr to a file and leave it in the background: nohup sqoop-metastore &>> shared.db.out &
- If at any point you want to shut down the metastore gracefully, use sqoop-metastore --shutdown as the user running the process