Pivotal Knowledge Base

How to get Remaining Memory on Diego Cells?

Environment

Product Version
Pivotal Cloud Foundry® 1.7
Pivotal Elastic Runtime 1.7
JMX Bridge is a Java Management Extensions (JMX) tool for Elastic Runtime  

Purpose 

This article explains how to determine the amount of memory remaining on Diego Cells. Knowing the remaining memory helps you decide whether the cells have sufficient resources for the applications being pushed.

Cause

When you run `cf push <APP>`, it fails with an Insufficient Resources error. Use the procedures in this article to check whether the remaining memory is enough for the application being pushed.

Procedure

The following procedures show how to get the remaining memory on a Diego Cell, which helps you determine whether the application being pushed will have sufficient memory.

Using JMX Bridge

To get a specific size in megabytes using JMX Bridge, follow this procedure:

  1. Configure the PCF Elastic Runtime firehose to include Diego metrics by setting the OpenTSDB Firehose Nozzle instance count to 1. You may skip this step if you already did this during the JMX Bridge installation.
  2. Connect to JMX Bridge using a JMX client. The IP and port can be obtained using this procedure. Use the credentials that you supplied when you deployed JMX Bridge.
  3. Go to org.cloudfoundry -> diego-cell_partition -> IP -> attributes. Check the following attributes for memory statistics:
  • opentsdb.nozzle.rep.CapacityRemainingMemory - the remaining amount of memory available for this cell to allocate to containers (in megabytes)
  • opentsdb.nozzle.rep.CapacityTotalMemory - the total amount of memory available for this cell to allocate to containers (in megabytes)
  • opentsdb.nozzle.rep.CapacityRemainingContainers - the remaining number of containers this cell can host
  • opentsdb.nozzle.rep.CapacityTotalContainers - the total number of containers this cell can host
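
The two memory attributes can be combined into a single utilization figure. The following is a minimal sketch, assuming you have read CapacityRemainingMemory and CapacityTotalMemory from your JMX client by hand. The function name cell_memory_used_pct is hypothetical and is not part of JMX Bridge, and the numbers are sample values only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of a cell's container memory already allocated.
# $1 = CapacityRemainingMemory (MB), $2 = CapacityTotalMemory (MB)
cell_memory_used_pct() {
  local remaining=$1
  local total=$2
  if [ "$total" -le 0 ]; then
    echo "total must be positive" >&2
    return 1
  fi
  # Integer arithmetic, so the result is rounded down.
  echo $(( (total - remaining) * 100 / total ))
}

# Sample values for illustration: 7344 MB remaining out of 16048 MB.
cell_memory_used_pct 7344 16048
```

A cell that reports a high percentage here has little room left for new containers, which is one possible reason for an Insufficient Resources error.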

Using curl

If you do not have the JMX Bridge installed, you can manually pull capacity stats from each of your Diego Cells.  While this is OK for short term troubleshooting, we do not suggest doing this for long term metric capture or monitoring solutions.  Those solutions should use JMX or the firehose.

To pull the stats with curl simply run:

curl http://<cell-ip>:1800/state

This will return a JSON blob that looks like this.

{
  "RootFSProviders": {
    "docker": {
      "type": "arbitrary"
    },
    "preloaded": {
      "set": {
        "cflinuxfs2": {}
      },
      "type": "fixed_set"
    }
  },
  "AvailableResources": {
    "MemoryMB": 7344,
    "DiskMB": 93630,
    "Containers": 232
  },
  "TotalResources": {
    "MemoryMB": 16048,
    "DiskMB": 113086,
    "Containers": 250
  },
  "LRPs": [
    {
      "process_guid": "9bd58747-729d-41cb-a81a-4627676e6603-b05d77a8-7a4e-47d7-9584-044960f08278",
      "index": 0,
      "domain": "cf-apps",
      "MemoryMB": 64,
      "DiskMB": 1024,
      "RootFs": "",
      "VolumeDrivers": null
    },
    ...
  ],
  "Tasks": [],
  "StartingContainerCount": 0,
  "Zone": "9ed9effa594aa3c4753c",
  "Evacuating": false,
  "VolumeDrivers": []
}

The blocks of interest for capacity purposes are TotalResources and AvailableResources.
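
To relate AvailableResources to a push, you can estimate how many instances of an application a single cell could still accept. The sketch below is a rough approximation only: it looks at memory and container count, ignores disk, and does not reproduce Diego's actual placement logic. The function name instances_that_fit and the 1024 MB instance size are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many instances of an app could this cell still accept?
# $1 = AvailableResources.MemoryMB
# $2 = AvailableResources.Containers
# $3 = memory requested per app instance (MB)
instances_that_fit() {
  local avail_mb=$1
  local avail_containers=$2
  local app_mb=$3
  if [ "$app_mb" -le 0 ]; then
    echo "instance memory must be positive" >&2
    return 1
  fi
  local by_memory=$(( avail_mb / app_mb ))
  # The cell is limited by whichever resource runs out first.
  if [ "$by_memory" -lt "$avail_containers" ]; then
    echo "$by_memory"
  else
    echo "$avail_containers"
  fi
}

# MemoryMB and Containers are taken from the sample output in this article;
# 1024 MB per instance is an assumed application size.
instances_that_fit 7344 232 1024
```

If jq is installed, the first argument can be read directly from a cell with `curl http://<cell-ip>:1800/state | jq '.AvailableResources.MemoryMB'`.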

Comments

  • NBCU Open Platform

    Thanks Lou-ann.
    It is a very valuable piece of information. I have two follow-up questions on this.

    1. What does the process_guid of the LRP denote? How can I map it to an application, for example by app guid?
    2. In case of Insufficient Resources, how can I make sure my apps run on a better-provisioned Diego cell that has sufficient resources? It seems that cf restage APP_NAME is not working.

    Thanks,
    Chandan

  • Lou-ann

    Hi Chandan,

    For #2, I believe some pointers discussed in this KB will be able to address this. https://discuss.zendesk.com/hc/en-us/articles/221251847-Starting-or-staging-an-application-results-in-an-InsufficientResources-error

    Regards,
    Lou-ann

  • Daniel Lynch

    Regarding question #1, here is how to check which app a process_guid refers to.

    Given the process guid
    "process_guid": "66f31610-dfc4-4f60-bf43-22c3b8355aab-9f2af79e-fcd5-4cd0-a6e9-da2705080fca"

    "66f31610-dfc4-4f60-bf43-22c3b8355aab" is the app guid, so we can get the app name and info using cf curl:
    cf curl /v2/apps/66f31610-dfc4-4f60-bf43-22c3b8355aab

    .
    .
    "entity": {
    "name": "apps-manager-js-venerable",
    .
    .
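
The split described in this comment can be scripted. This is a minimal sketch, assuming the process_guid always has the form of two 36-character guids joined by a hyphen, with the app guid first, as in the example. The function name app_guid_from_process_guid is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the app guid portion of an LRP process_guid.
# Assumes the value is two 36-character guids joined by "-" (73 characters),
# with the app guid first.
app_guid_from_process_guid() {
  local process_guid=$1
  if [ "${#process_guid}" -ne 73 ]; then
    echo "unexpected process_guid length: ${#process_guid}" >&2
    return 1
  fi
  # Keep only the first 36 characters, which form the app guid.
  printf '%s\n' "$process_guid" | cut -c1-36
}

app_guid_from_process_guid "66f31610-dfc4-4f60-bf43-22c3b8355aab-9f2af79e-fcd5-4cd0-a6e9-da2705080fca"
```

The printed guid can then be used as the last path segment of a `cf curl /v2/apps/...` request to look up the application.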

  • Brian OConnell

    The Diego cell's rep process moved to secure TLS communication in PCF 1.10, so getting a cell's state is now a bit more complicated.

    To report the Diego cells' state, the best way is to run:

    bosh ssh diego_brain/0
    sudo -i
    cd /var/vcap/jobs/auctioneer/config/certs/rep

    Then, for each Diego cell, run the following command, replacing diego-cell-X with diego-cell-0, diego-cell-1, diego-cell-2, and so on:

    curl -k --cert client.crt --key client.key https://diego-cell-X.node.cf.internal:1801/state | python -m json.tool

  • Todd Robbins

    Here is a simpler command for querying all cells in 1.10 and 1.11:

    bosh ssh diego_brain/0
    sudo -i
    cd /var/vcap/jobs/auctioneer/config/certs/rep

    for i in $(cfdot cells | jq -r '.rep_url'); do curl -k --cert client.crt --key client.key "$i/state" | jq '.AvailableResources, .TotalResources'; done
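
When querying many cells, a summary is often easier to read than the raw output. Each application instance has to fit on a single cell, so the largest per-cell value matters as much as the total. The sketch below assumes you have already reduced each cell's state to one remaining-memory number per line. The function name summarize_remaining_memory and the sample numbers are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one remaining-memory value (MB) per line on stdin
# and prints the cell count, the total, and the largest single-cell value.
summarize_remaining_memory() {
  awk '
    NF {
      n++
      total += $1
      if (n == 1 || $1 > max) max = $1
    }
    END {
      printf "cells=%d total_mb=%d largest_cell_mb=%d\n", n, total, max
    }
  '
}

# Sample input for three cells (illustrative numbers only).
printf '7344\n2048\n512\n' | summarize_remaining_memory
```

One way to produce the input, assuming jq is available, is to use the filter '.AvailableResources.MemoryMB' in the loop shown in this comment and pipe the whole loop into the function.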
