Pivotal Knowledge Base


Orphaned Tasks on Cloud Foundry cause inaccurate App Usage Data and Application Instance (AI) count

Environment

Pivotal Cloud Foundry (PCF) 1.11.0-1.11.10

Note: The defect has been fixed in PCF 1.11.11. If you were ever on PCF 1.11.0 - 1.11.10, you may have orphaned tasks.

Symptom

Task count and Max Concurrent AI count look abnormally high for a given month.

Cause

Pivotal began counting tasks as AIs in PCF 1.11. We discovered an issue where, if you ran a task and deleted the parent app while the task was still running, Cloud Controller failed to emit a stop event for the task. As a result, the App Usage Service continues to treat the task as running. CAPI has since released a fix so that stop events are emitted when the parent app or org is deleted while a task is still running.

However, it is possible you have orphaned tasks if you were ever on PCF 1.11.0 - 1.11.10.  

A task is orphaned if:

  1. Its parent app is deleted while the task is still running
  2. It has a start event
  3. It does not have an end event
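
The three conditions above can be written as a small predicate. The following is a minimal plain-Ruby sketch, not code from the App Usage Service: the `Event` struct, the "STARTED" and "STOPPED" state strings, and the `orphaned_task_guids` helper are hypothetical names chosen for illustration.

```ruby
require "set"

# Hypothetical stand-in for a usage-service task event (illustration only).
Event = Struct.new(:task_guid, :parent_app_guid, :state)

# Returns the guids of tasks that have a start event, no stop event,
# and whose parent app is in the list of deleted app guids.
def orphaned_task_guids(events, deleted_app_guids)
  deleted = deleted_app_guids.to_set
  events.group_by(&:task_guid).select do |_guid, task_events|
    states = task_events.map(&:state)
    states.include?("STARTED") &&
      !states.include?("STOPPED") &&
      deleted.include?(task_events.first.parent_app_guid)
  end.keys
end

events = [
  Event.new("task-1", "app-a", "STARTED"), # parent deleted, never stopped
  Event.new("task-2", "app-a", "STARTED"),
  Event.new("task-2", "app-a", "STOPPED"), # stopped normally
  Event.new("task-3", "app-b", "STARTED")  # still running, parent exists
]

p orphaned_task_guids(events, ["app-a"])
```

With the sample data, only the task that started under a deleted app and never stopped is reported.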

Impact

Resolution 

How to identify orphaned tasks on PCF 

Assumptions:

  • Customer has permissions to ssh onto the usage service container
  • Customer has permissions to write to the usage service database
  • Customer has basic understanding of rails console
  • Customer has the cf CLI installed
  • Customer has irb installed

Steps:

1. `cf target` your deployment.

2. SSH onto the usage service container.

3. `cd app`

4. Start the rails console

script/rails_console_from_container

5. Run a query to find the task guids that only have a single task event. 

task_guids = TaskEvent.group(:task_guid).count.map { |k, v| v == 1 ? k : nil }.compact

6. Now we want to find the parent app guids for the task guids in this set. 

parent_app_guids = task_guids.map { |guid| TaskEvent.find_by(task_guid: guid).parent_app_guid }.uniq

7. Next, check each parent app guid against the Cloud Controller to see whether the app still exists. Copy the value of the parent_app_guids variable somewhere, exit the rails console, and start irb by typing `irb`. Define the parent_app_guids variable in irb with the value you copied, then run the following command, which keeps every guid whose lookup returns an error:

lost_app_guids = parent_app_guids.select do |guid|
   system("cf curl /v3/apps/#{guid} | grep errors")
end
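
Note that `grep errors` matches any error response, including an expired login, so a stricter check may be worth considering. The sketch below parses the response body instead. It assumes the V3 API reports a missing app as an `errors` array whose entries carry the title `CF-ResourceNotFound`; confirm that against the output of your own deployment. The `app_deleted?` helper and the sample bodies are hypothetical.

```ruby
require "json"

# Hypothetical helper: decides from a `cf curl /v3/apps/GUID` response body
# whether the app is gone. Assumes "not found" errors carry the title
# "CF-ResourceNotFound"; verify this against your own deployment's output.
def app_deleted?(response_body)
  parsed = JSON.parse(response_body)
  errors = parsed.is_a?(Hash) ? parsed.fetch("errors", []) : []
  errors.any? { |error| error["title"] == "CF-ResourceNotFound" }
rescue JSON::ParserError
  # Unparseable output (for example a CLI failure message) is not proof of deletion.
  false
end

# Sample bodies for illustration only; these are not captured logs.
not_found = '{"errors":[{"code":10010,"title":"CF-ResourceNotFound","detail":"App not found"}]}'
existing  = '{"guid":"example-guid","name":"example-app"}'

puts app_deleted?(not_found)
puts app_deleted?(existing)

# In irb this could replace the grep-based check:
#   lost_app_guids = parent_app_guids.select { |guid| app_deleted?(`cf curl /v3/apps/#{guid}`) }
```

Treating unparseable or unrelated errors as "not deleted" errs on the side of leaving a task alone rather than stopping one that is still legitimately running.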

8. We now have a list of guids for parent apps that no longer exist. Go back into the rails console (using `script/rails_console_from_container`), define lost_app_guids with the value from step 7, and find the task guids associated with them.

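task_guids = lost_app_guids.flat_map do |guid|
  # Keep only tasks under the deleted app that have a single event (no stop event)
  TaskEvent.where(parent_app_guid: guid).group(:task_guid).count.select { |_, count| count == 1 }.keys
end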

If you have orphaned tasks, you will get something like this:

["861d58fe-fe8e-47b3-89aa-db133c3c7a53", "2889aa5c-349a-43d1-a67b-a7c0e3a86cde", "8fa24eb1-a9ab-4161-93dc-157d1e564302"]

If you do NOT have orphaned tasks, you will get no results.

[]

How to stop orphaned tasks

Note: Please reach out to support if you have discovered orphaned tasks. We will help you resolve the issue. The instructions below create an artificial “STOPPED” task event for each orphaned task, dated the current day. AI and Max Concurrent counts for past months will not be updated; they will still include the orphaned tasks.

1. Determine which org is linked to each orphaned task_guid.

orgs_to_task_mapping = Hash.new { |hash, org_guid| hash[org_guid] = [] }
task_guids.each do |guid|
  # Look up the org through the task's own event so every orphaned task is included
  event = TaskEvent.find_by(task_guid: guid)
  orgs_to_task_mapping[event.org_guid] << guid
end

You will get something like this (org guid and associated tasks):

{"89f8f365-6a5c-469c-8b38-1c6e498a3499"=>["861d58fe-fe8e-47b3-89aa-db133c3c7a53", 
"2889aa5c-349a-43d1-a67b-a7c0e3a86cde", "8fa24eb1-a9ab-4161-93dc-157d1e564302",
"145a2a4e-afdd-4b6b-8fa9-0901dddca20c", "6783c845-f7c7-47b2-b5c0-5d079751dd52"], 
"6730a444-95ba-4a67-b364-914d9add8104"=>["01d43814-9f7c-4dea-a7e8-4061d3eeda53"]} 

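The mapping in step 1 is a group-by on org guid. As an illustration only, the same shape can be built and summarized in plain Ruby outside the rails console; the `Event` struct, the `tasks_by_org` helper, and the sample guids are hypothetical.

```ruby
# Hypothetical stand-in for the fields of a task event used in step 1.
Event = Struct.new(:org_guid, :task_guid)

# Groups task guids under the org guid they belong to.
def tasks_by_org(events)
  events.each_with_object(Hash.new { |hash, org| hash[org] = [] }) do |event, mapping|
    mapping[event.org_guid] << event.task_guid
  end
end

events = [
  Event.new("org-1", "task-1"),
  Event.new("org-1", "task-2"),
  Event.new("org-2", "task-3")
]

mapping = tasks_by_org(events)
mapping.each { |org, tasks| puts "#{org}: #{tasks.size} orphaned task(s)" }
```

A per-org count like this gives a quick sense of which orgs have inflated usage figures.
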
2. Now we want to create a stop event for each orphaned task.

task_guids_to_stop = ["PASTE value from task_guids FROM step 8"] 
task_guids_to_stop.each do |guid|
  start_event = TaskEvent.find_by_task_guid(guid)
  if start_event.started?
    stop_event = TaskEvent.create!(
      guid: SecureRandom.uuid,
      state: TaskEvent::STOPPED,
      org_guid: start_event.org_guid,
      space_guid: start_event.space_guid,
      space_name: start_event.space_name,
      parent_app_name: start_event.parent_app_name,
      parent_app_guid: start_event.parent_app_guid,
      task_name: start_event.task_name,
      task_guid: start_event.task_guid,
      memory_in_mb_per_instance: start_event.memory_in_mb_per_instance,
      occurred_at: Time.current
    )
    DailyTaskSummaryUpdater.new(stop_event).update
  end
end
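
Before writing to the database, it can help to review what each stop event would contain. The sketch below is plain Ruby and only mirrors the attribute copying in step 2. `StartEvent`, `stop_event_attributes`, and the "STOPPED" string are hypothetical stand-ins for the `TaskEvent` model and its `TaskEvent::STOPPED` constant.

```ruby
require "securerandom"

# Hypothetical stand-in for a start TaskEvent (illustration only).
StartEvent = Struct.new(
  :org_guid, :space_guid, :space_name, :parent_app_name, :parent_app_guid,
  :task_name, :task_guid, :memory_in_mb_per_instance,
  keyword_init: true
)

# Builds the attributes a stop event would carry: everything is copied from
# the start event except the guid, state, and timestamp.
def stop_event_attributes(start_event, now: Time.now)
  start_event.to_h.merge(
    guid: SecureRandom.uuid,
    state: "STOPPED", # stands in for TaskEvent::STOPPED
    occurred_at: now
  )
end

start_event = StartEvent.new(
  org_guid: "org-1", space_guid: "space-1", space_name: "dev",
  parent_app_name: "example-app", parent_app_guid: "app-a",
  task_name: "migrate", task_guid: "task-1", memory_in_mb_per_instance: 256
)

stop_event_attributes(start_event).each { |key, value| puts "#{key}: #{value}" }
```

Printing the attributes first works as a dry run: nothing is persisted, and the values can be compared with the start event before running the real `create!` call.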

 
