This article summarizes the different types of entities in Falcon, the process for deleting them, and what happens when they are deleted out of order. It also covers the error encountered and the manual removal procedure for recovering when a wrong deletion order has been followed.
How Falcon Works:
Hadoop operators can use the Falcon web UI or the command-line interface (CLI) to create data pipelines, which consist of cluster storage location definitions, dataset feeds, and processing logic.
- Cluster: Defines where data and processes are stored.
- Feed: Defines the datasets to be cleaned and processed.
- Process: Consumes feeds, invokes processing logic, and produces further feeds. A process defines the configuration of the Oozie workflow and specifies when and how often the workflow should run. It also allows for late data handling.
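These entities are defined as XML files and registered with Falcon through the CLI. The commands below are a minimal sketch of building such a pipeline; the file names and entity names (cluster.xml, TestFeed, TestProcess, etc.) are placeholders for illustration, not taken from the article:

```shell
# Register the entity definitions in dependency order:
# cluster first, then feeds, then processes.
falcon entity -type cluster -submit -file cluster.xml
falcon entity -type feed    -submit -file feed.xml
falcon entity -type process -submit -file process.xml

# Schedule the feed and process so the corresponding
# Oozie coordinators are created on the workflow engine.
falcon entity -type feed    -name TestFeed    -schedule
falcon entity -type process -name TestProcess -schedule
```

Note that submission order is the reverse of the deletion order discussed below: a feed cannot be submitted before the cluster it references exists.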
Deleting an entity removes any scheduled activity on the workflow engine and removes the entity from the Falcon configuration store. A delete operation succeeds only if no other entities depend on the entity being deleted. Entities must therefore be removed in the following order:
- Delete Process Entity
- Delete Feed Entity
- Delete Cluster Entity
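The correct teardown sequence maps to the following CLI calls; the entity names used here are hypothetical examples, assumed for illustration:

```shell
# Correct deletion order: leaf entities first, cluster last.
# Names below are placeholders; substitute your own entities.
falcon entity -type process -name TestProcess -delete
falcon entity -type feed    -name TestFeed    -delete
falcon entity -type cluster -name testcluster -delete
```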
This removal order must be followed; dropping a parent entity before its children leaves the child entities orphaned and undeletable, as described in this article.
In this scenario, the user accidentally deleted the cluster entity first from the command line. The cluster entity was dropped successfully, but removal of the child entities (process/feed) then failed with the following error:
2017-01-09 19:04:54,481 ERROR - [xxxx@xxxx - b838d3ea-60f6-4334-b16b-de4652fb8a99:falcon:DELETE//entities/delete/feed/TestFeed] ~ Dependent entity (CLUSTER) testcluster was not found in configuration store. (EntityGraph:66)
2017-01-09 19:04:54,481 ERROR - [xxxx@qxxx - b838d3ea-60f6-4334-b16b-de4652fb8a99:falcon:DELETE//entities/delete/feed/TestFeed] ~ Unable to reach workflow engine for deletion or deletion failed (AbstractEntityManager:266)
In the backend, during deletion, the Falcon code looks up the entities that the TestFeed entity references. Because the cluster entity must still be present in the configuration store while the feed is being deleted, the operation fails with the above error. To recover, perform the following steps:
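The dependency lookup that fails here can also be run by hand from the CLI. A sketch, using the feed name from the log above:

```shell
# List the entities related to TestFeed in the dependency graph.
# In a healthy setup this shows the cluster the feed references;
# in the broken state above, this is the lookup that fails because
# the cluster is no longer in the configuration store.
falcon entity -type feed -name TestFeed -dependency
```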
- Create a cluster entity pointing to the original location. This can be done by following this article.
- Delete process entity/entities.
- Delete the feed entity; if it fails again, proceed to the next step.
- Delete the feed XML from /hadoop/falcon/store/FEED/<feed_name>/<feed_xml>.
- Restart Falcon.
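The recovery steps above can be sketched as a single sequence. Entity names are the hypothetical examples used earlier, and the store path placeholders are left exactly as in the article:

```shell
# 1. Re-submit a cluster entity pointing at the original locations.
#    (testcluster.xml is a placeholder definition file.)
falcon entity -type cluster -submit -file testcluster.xml

# 2. Delete the dependent process entity/entities, then the feed.
falcon entity -type process -name TestProcess -delete
falcon entity -type feed    -name TestFeed    -delete

# 3. If the feed delete still fails, remove its XML from the
#    on-disk configuration store and restart Falcon.
#    Fill in <feed_name>/<feed_xml> for your environment.
# rm /hadoop/falcon/store/FEED/<feed_name>/<feed_xml>
# <falcon_home>/bin/falcon-stop && <falcon_home>/bin/falcon-start
```

After the restart, verify the cleanup with `falcon entity -type feed -list` before removing the re-created cluster entity last.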