Unable to Delete Falcon Feed/Process from the Ambari WebUI or the Command Line

Environment

 Product      Version
 Pivotal HDP  2.4.2.0
 Ambari       2.2.2.0
 Falcon       0.6.1.2.4

Purpose

This article summarizes the different types of entities in Falcon, describes how these entities are deleted, and explains what happens when they are not deleted in the proper order. It also details the error encountered and the manual removal process to use when a user deletes entities in the wrong order.

How Falcon Works

Hadoop operators can use the Falcon web UI or the command-line interface (CLI) to create data pipelines, which consist of cluster storage location definitions, dataset feeds, and processing logic.

Each pipeline consists of XML pipeline specifications, called entities. These entities act together to provide a dynamic flow of information to load, clean, and process data.
 
There are three types of entities:
  • Cluster: Defines where data and processes are stored.
  • Feed: Defines the datasets to be cleaned and processed.
  • Process: Consumes feeds, invokes processing logic, and produces further feeds. A process defines the configuration of the Oozie workflow, specifies when and how often the workflow should run, and allows for late data handling.
Each entity is defined separately and then linked to the others to form a data pipeline. Falcon provides predefined policies for data replication, retention, and late data handling. These sample policies are easily customized to suit your needs.
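
For context, a pipeline is typically assembled by submitting and scheduling these entities with the Falcon CLI. The sketch below uses placeholder file and entity names (cluster.xml, feed.xml, process.xml, TestFeed, SampleProcess); substitute your own:

    # Submit each entity definition; the cluster must come first because
    # feeds and processes declare it as a dependency
    falcon entity -type cluster -file cluster.xml -submit
    falcon entity -type feed    -file feed.xml    -submit
    falcon entity -type process -file process.xml -submit

    # Schedule the feed and process so Falcon creates their Oozie workflows
    falcon entity -type feed    -name TestFeed      -schedule
    falcon entity -type process -name SampleProcess -schedule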
 
Deletion of Falcon Entities

A delete operation on an entity removes any scheduled activity from the workflow engine, in addition to removing the entity from the Falcon configuration store. The operation succeeds only if no other entities depend on the entity being deleted.
 
Entities must be deleted in a particular order. Removal should proceed as follows:
  1. Delete Process Entity
  2. Delete Feed Entity
  3. Delete Cluster Entity

This removal order must be followed; if a parent entity is dropped before its children, the child entities can become impossible to delete, as described in this article. A correct teardown sequence is sketched below.
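
For example, a correct teardown of the pipeline sketched earlier (entity names are placeholders) would be:

    # 1. Delete the process entity first -- nothing depends on it
    falcon entity -type process -name SampleProcess -delete

    # 2. Then delete the feed entity
    falcon entity -type feed -name TestFeed -delete

    # 3. Finally, delete the cluster entity
    falcon entity -type cluster -name testcluster -delete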

Cause

When deleting a Falcon data pipeline, all of the process entities should be removed first, then all of the feed entities, and finally the cluster entity.

In this scenario, the user accidentally deleted the cluster entity first from the command line. The entity was dropped successfully, but when the child entities (process/feed) were removed, the following error occurred:

2017-01-09 19:04:54,481 ERROR - [xxxx@xxxx - b838d3ea-60f6-4334-b16b-de4652fb8a99:falcon:DELETE//entities/delete/feed/TestFeed] ~ Dependent entity (CLUSTER) testcluster was not found in configuration store. (EntityGraph:66)
2017-01-09 19:04:54,481 ERROR - [xxxx@xxxx - b838d3ea-60f6-4334-b16b-de4652fb8a99:falcon:DELETE//entities/delete/feed/TestFeed] ~ Unable to reach workflow engine for deletion or deletion failed (AbstractEntityManager:266)
java.lang.NullPointerException

During deletion, Falcon checks for all entities on which the entity being deleted (TestFeed) depends. Because the cluster entity must still exist while the feed is being deleted, the lookup for the already-removed cluster fails with the NullPointerException above.
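
For illustration only, the failing sequence corresponds to commands like these (entity names taken from the log above; the exact invocations are assumed standard Falcon CLI usage):

    # Cluster entity deleted first by mistake -- this succeeds
    falcon entity -type cluster -name testcluster -delete

    # Subsequent feed deletion fails: the dependent cluster is no longer
    # in the configuration store, and Falcon throws the NPE shown above
    falcon entity -type feed -name TestFeed -delete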

Resolution

  1. Create a cluster entity pointing to the original location. This can be done by following this article.
  2. Delete the process entity or entities.
  3. Delete the feed entity. If it fails again, proceed to the next step.
  4. Delete the feed XML from /hadoop/falcon/store/FEED/<feed_name>/<feed_xml>.
  5. Restart Falcon.
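
A minimal CLI sketch of this recovery, assuming the entity names from the error above and a recreated cluster definition saved as cluster.xml (both placeholders; substitute your own names and paths):

    # 1. Re-submit a cluster entity pointing to the original location
    falcon entity -type cluster -file cluster.xml -submit

    # 2. Delete the process entity (repeat for each process)
    falcon entity -type process -name SampleProcess -delete

    # 3. Retry deleting the feed entity
    falcon entity -type feed -name TestFeed -delete

    # 4. If step 3 still fails, remove the feed XML from the Falcon
    #    configuration store on the Falcon server host
    rm -f /hadoop/falcon/store/FEED/TestFeed/<feed_xml>

    # 5. Restart Falcon (for example, from the Ambari web UI)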

Additional Information

Hortonworks Falcon documentation
