Pivotal Knowledge Base

Smoke Test errand fails during PAS Tile Upgrade with "setting quota limit for projid 2: function not implemented" error

Environment

  • Pivotal Cloud Foundry (PCF) 1.12
  • Pivotal Cloud Foundry (PCF) 2.0

Symptom

After upgrading from PCF 1.11 or earlier to PCF 1.12 or 2.0, the smoke test errand fails.

GrootFS is enabled in the Elastic Runtime or Isolation Segment tile.

Error Message:

2018-03-25T03:50:19.62+0000 [STG/0] OUT Uploading complete  
2018-03-25T03:50:19.64+0000 [STG/0] OUT Stopping instance c2d33c93-b9e6-4b7a-ad24-c0dea4c07aa2  
2018-03-25T03:50:19.64+0000 [STG/0] OUT Destroying container  
2018-03-25T03:50:19.84+0000 [CELL/0] OUT Creating container  
2018-03-25T03:50:19.98+0000 [CELL/0] ERR Failed to create container  
2018-03-25T03:50:20.01+0000 [CELL/0] OUT Destroying container  
2018-03-25T03:50:20.01+0000 [CELL/0] OUT Successfully destroyed container  

"cf push" of any app will also fail with the above errors during staging. 

Cause

Detailed Verification Steps 

The BOSH commands below assume BOSH CLI v2.

1. Identify the staging task GUID and Diego cell where the app container is staged.

In a terminal window, ssh from the Ops Manager VM to one of the diego_brain VMs and run the following command:

cfdot task-events | jq '.data[] | select(.state == "Running") |
{cell_id: .cell_id, task_guid: .task_guid}'

In another terminal window, push an app with cf push.

While the app is being pushed, events from the staging task appear in the terminal where the cfdot command is running. Sample output:


{
"cell_id": "45828e88-3cab-4d4c-8e56-b1c2d3749574",
"task_guid": "83bfed52-d7a0-47fb-8b0e-4ff52593f7d8"
}
{
"cell_id": "45828e88-3cab-4d4c-8e56-b1c2d3749574",
"task_guid": "83bfed52-d7a0-47fb-8b0e-4ff52593f7d8"
}

Note the cell_id and task_guid from the above output. 
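
Because the event stream can print the same pair more than once, as in the sample above, a small filter can reduce it to distinct pairs. The following is a minimal sketch and not part of the original procedure: unique_task_pairs is a hypothetical helper name, and it assumes the pretty-printed jq output shown above, with one key per line.

```shell
# Hypothetical helper (not from the original article).
# Reads pretty-printed {cell_id, task_guid} objects on stdin and prints
# each distinct "cell_id task_guid" pair once, in first-seen order.
unique_task_pairs() {
  awk -F'"' '
    $2 == "cell_id"   { cell = $4 }
    $2 == "task_guid" {
      pair = cell " " $4
      if (!(pair in seen)) { seen[pair] = 1; print pair }
    }
  '
}

# Example usage against saved output (task-events.json is a hypothetical file name):
#   unique_task_pairs < task-events.json
```

The helper only reformats output that is already on screen; the cell_id and task_guid values are used unchanged in the later steps.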

Steps 2 and 3 are optional. They only verify that the Diego auctioneer received the same request for the staging task. To skip them, go directly to step 4.

2. Identify the active auctioneer by running the following command:

$ cfdot locks
{"key":"auctioneer","owner":"410c45b4-4e05-46a7-b83c-14a0b6ab5a02","type":"lock","type_code":1} <---
{"key":"bbs","owner":"5fcaa38d-8380-4a64-b177-79b6fe9df3a0","type":"lock","type_code":1}
{"key":"routing_api_lock","owner":"b7d474a0-08ed-45c6-a285-2b7bc0d4fb48","type":"lock","type_code":1}
{"key":"tps_watcher","owner":"5c534cac-8323-4185-6dd0-0c60ecca647c","type":"lock","type_code":1}

From the above example, 410c45b4-4e05-46a7-b83c-14a0b6ab5a02 is the identifier for the Diego brain VM that holds the lock for the auctioneer. To get the exact name of the VM, use the following command:

bosh -e <env> -d <cf deployment> vms | grep 410c45b4-4e05-46a7-b83c-14a0b6ab5a02
diego_brain/410c45b4-4e05-46a7-b83c-14a0b6ab5a02                    
running az1 10.193.71.39 vm-644093a1-bf0c-499b-9662-4e4c2b84b565 small
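
To avoid copying the owner ID by hand, the lock list can be filtered for the auctioneer entry. This is a minimal sketch under stated assumptions: auctioneer_owner is a hypothetical helper name, and it assumes the compact one-object-per-line format shown in the cfdot locks sample above (the "<---" marker in that sample is an annotation, not command output).

```shell
# Hypothetical helper (not from the original article).
# Reads "cfdot locks" output on stdin and prints the owner ID of the
# auctioneer lock. Assumes compact JSON, one lock per line, as in the sample.
auctioneer_owner() {
  grep '"key":"auctioneer"' | sed -n 's/.*"owner":"\([^"]*\)".*/\1/p'
}

# Example usage on a diego_brain VM:
#   cfdot locks | auctioneer_owner
```

The printed ID is the value to pass to grep in the bosh vms command above, which is run where the BOSH CLI is available.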

3. Use bosh ssh to connect to the Diego brain VM that holds the auctioneer lock:

$ bosh -e <env> -d <cf deployment> ssh diego_brain/410c45b4-4e05-46a7-b83c-14a0b6ab5a02

Once connected, check the auctioneer logs on the active diego_brain VM to verify that the staging task GUID and cell GUID match the values noted in step 1:

$ sudo -i
# grep 83bfed52-d7a0-47fb-8b0e-4ff52593f7d8 /var/vcap/sys/log/auctioneer/auctioneer.stdout.log
{"timestamp":"1521996416.215080261","source":"auctioneer",
"message":"auctioneer.request.task-auction-handler.create.submitted",
"log_level":1,"data":{"method":"POST","request":"/v1/tasks","session":"1888.1.1",
"tasks":["83bfed52-d7a0-47fb-8b0e-4ff52593f7d8"]}}
{"timestamp":"1521996416.229309082","source":"auctioneer",
"message":"auctioneer.auction.task-added-to-cell","log_level":1,
"data":{"cell-guid":"45828e88-3cab-4d4c-8e56-b1c2d3749574","session":"1889",
"task-guid":"83bfed52-d7a0-47fb-8b0e-4ff52593f7d8"}}

4. Use bosh ssh to connect to the Diego cell, then check the garden and rep logs:

$ bosh -e <env> -d <cf deployment> ssh diego_cell/45828e88-3cab-4d4c-8e56-b1c2d3749574

5. Run the following command to check for relevant errors in the rep logs:

$ sudo -i
# grep 83bfed52-d7a0-47fb-8b0e-4ff52593f7d8 /var/vcap/sys/log/rep/rep.stdout.log | grep -i fail
--snip--
{"timestamp":"1521996416.491130590","source":"rep",
"message":"rep.executing-container-operation.task-processor.run-container.containerstore-create.node-create.failed-to-create-container-in-garden",
"log_level":2,"data":{"container-guid":"83bfed52-d7a0-47fb-8b0e-4ff52593f7d8","container-state":"reserved",
"error":"running image plugin create: making image: creating image: applying disk limits:
apply disk limit: \u003cnil\u003e: setting quota to /var/vcap/data/grootfs/store/unprivileged/images/83bfed52-d7a0-47fb-8b0e-4ff52593f7d8:
setting quota limit for projid 2: function not implemented: exit status 1\n: exit status 1",
"guid":"83bfed52-d7a0-47fb-8b0e-4ff52593f7d8","session":"2984.1.3.2.1"}}
--snip--

6. Run the following command to check for relevant errors in the garden logs: 

# grep 83bfed52-d7a0-47fb-8b0e-4ff52593f7d8 /var/vcap/sys/log/garden/garden.stdout.log
--snip--
{"timestamp":"1521996416.403869152","source":"guardian", "message":"guardian.create.create-failed-cleaningup.destroy.finished",
"log_level":1,"data":{"cause":"running image plugin create: making image: creating image: applying disk limits: apply disk limit:
\u003cnil\u003e: setting quota to /var/vcap/data/grootfs/store/unprivileged/images/83bfed52-d7a0-47fb-8b0e-4ff52593f7d8:
setting quota limit for projid 2: function not implemented: exit status 1\n: exit status 1",
"handle":"83bfed52-d7a0-47fb-8b0e-4ff52593f7d8","session":"3333.3.1"}}
--snip--
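
The rep and garden checks in steps 5 and 6 both look for the same error text, so they can be reduced to a yes/no test. This is a minimal sketch, not part of the original procedure: has_quota_error is a hypothetical helper name, and the pattern is taken from the error strings shown in the log samples above.

```shell
# Hypothetical helper (not from the original article).
# Reads log lines on stdin; exits 0 if any line contains the quota error
# signature shown in the rep and garden samples, non-zero otherwise.
has_quota_error() {
  grep -q 'setting quota limit for projid [0-9]*: function not implemented'
}

# Example usage on the Diego cell, as root (task GUID from step 1):
#   grep 83bfed52-d7a0-47fb-8b0e-4ff52593f7d8 /var/vcap/sys/log/rep/rep.stdout.log \
#     | has_quota_error && echo "quota error found in rep log"
```

The same pipeline can be pointed at /var/vcap/sys/log/garden/garden.stdout.log to repeat the check for the garden log.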

Resolution

Note: If you see this issue and apply the resolution below, contact Pivotal Support so the issue can be logged with Product Engineering. Collect the following information when opening the ticket:

a) Copy this script to the Diego cell and run it as root, then copy os-report.tgz to the /var/vcap/sys/log/ directory.
b) Run the following command to download the diego_cell logs and upload to the ticket:

bosh -e <env> -d <cf deployment> logs diego_cell/<instance_id>

1. Disable the GrootFS plugin in the Elastic Runtime UI. Go to Application Containers, uncheck "Enable the GrootFS container image plugin for Garden RunC" and click Apply Changes. The smoke test errand should then pass.

2. After disabling this feature, we recommend recreating all Diego cells. An easy way to do that is to select the Recreate all VMs checkbox in the Ops Manager UI, found in the Director tile under Director Config.
