Automate NetApp ONTAP Storage Management with Private and Secure API Governance

Automation, frequently leveraging REST APIs, is a common approach for configuring and maintaining solutions like F5 BIG-IP appliances or NetApp ONTAP storage clusters. This article proposes using F5 Distributed Cloud (XC) HTTPS load balancers, coupled with the API Security module, to make remote ONTAP API access available exclusively to enterprise operations centers. That access is neither visible nor reachable from the general Internet. Beyond access control, the API features of interest include:
  • automatic discovery of active API endpoints
  • WAF security layered upon the traffic
  • a positive security model, whereby only conforming API activity is allowed to reach ONTAP solutions

The NetApp deployments governed by storage administrators can be fully hybrid in nature. They can span remote on-premises ONTAP clusters, both physical and virtual appliances, as well as public cloud-based offerings from hyperscalers like AWS, Azure and Google.

The Distributed Cloud and ONTAP Testbed

The following diagram demonstrates the overall use case. F5 Distributed Cloud (XC) offers points of presence in approximately 30 metropolitan networks worldwide, and the aggregate bandwidth of the interconnections totals more than 14 Tbps. Customer Edge (CE) nodes can be deployed within enterprise locations that are also equipped with NetApp ONTAP storage, whether on-premises or in any of the major hyperscalers. XC then enables secure, private communication between those sites.

A representative test bed was created using two ONTAP deployment types:
  • an on-premises deployment in a facility in Redmond, Washington
  • a cloud-based Cloud Volumes ONTAP (CVO) instance in the AWS us-east-2 region, located in Columbus, Ohio

The on-premises site used virtualized ONTAP on an ESXi 7.x hypervisor, along with the companion NetApp Deploy virtual machine. As the name implies, Deploy is used to instantiate an ONTAP cluster. Primary and secondary operations centers, from which automation could control the ONTAP deployments with secure REST API calls, were set up in San Jose and Ottawa, Canada.

Limiting access exclusively to the operations centers relied on Distributed Cloud’s HTTPS load balancer capabilities. Traditional load balancers typically expose a “virtual server” on one side of a network appliance, with private origin pool members on the other side. XC, by contrast, is a distributed load balancer. Its public side can be published in global DNS, and IP anycast attracts incoming transactions to the nearest global point of presence. The public face of the load balancer is thus distributed across many physical locations, and the enterprise decides which locations those are.

The origin pool may consist of one or many servers, in this case ONTAP clusters, at one or more locations. In this particular use case, the HTTPS load balancer did not use the global points of presence, called Regional Edge (RE) nodes. Instead, it was implemented exclusively on the CE located at the operations center. The following diagram reflects the focused setup of secured API calls between an operations center and an ONTAP cluster.

Some key points to consider:
  • The names used to reach remote ONTAP services are entirely private DNS domain names.
  • They are projected exclusively out of the “inside” interface of the San Jose CE site, for use by operations staff and hosts. Each service name maps to the local inside-interface IP address of the CE node, so the services are reachable from nowhere else.
  • Each service maps to a different, color-coded origin pool: one for the Deploy service (green) and one for the ONTAP appliance itself (orange).
  • The San Jose-based load balancers deliver operations traffic to the configured origin pool members in Redmond, and the traffic flows across the high-speed F5 fabric.
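One way to sanity-check this design from an operations host is to confirm that each service name resolves only to private (for example RFC 1918) addresses, such as the CE inside interface. The following is a minimal Python sketch using only the standard library. It assumes the host uses the same resolver as operations staff. The function names `all_private`, `resolve` and `resolves_privately` are hypothetical.

```python
from __future__ import annotations

import ipaddress
import socket


def all_private(addresses: list[str]) -> bool:
    """Return True only if there is at least one address and every address is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


def resolve(name: str) -> list[str]:
    """Resolve a DNS name to its distinct IP addresses via the local resolver."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    # info[4] is the sockaddr tuple; its first element is the IP address string
    return sorted({info[4][0] for info in infos})


def resolves_privately(name: str) -> bool:
    """Hypothetical check: the service name must map only to private addresses."""
    try:
        return all_private(resolve(name))
    except socket.gaierror:
        # The name does not resolve at all from this host
        return False
```

From a San Jose operations host, `resolves_privately("netapp04.busdevf5.io")` would be expected to return True. From anywhere else, the name should not resolve at all, so the check returns False.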

API Discovery with F5 Distributed Cloud

NetApp provides extensive documentation of the API calls supported for ONTAP workflows, an example of which can be found here. A brief, high-level example is to contact the Deploy instance from the San Jose operations center and ask which ONTAP clusters are configured at the remote Redmond site:

C:\Users\steve>nslookup netapp04.busdevf5.io
Name:    netapp04.busdevf5.io
Address:  10.150.98.3 <----- Inside interface of local San Jose CE node


C:\Users\steve>curl -k -X GET "https://netapp04.busdevf5.io/api/v3/clusters" -H "accept:application/hal+json" --user admin:De**********
{
  "num_records": 1,
  "records": [
    {
      "id": "de67e558-7c8c-11ee-836d-000c29fa32ee",
      "name": "f5netappclusterE"
    }
  ]
}

The output shows that the cluster is named “f5netappclusterE”. Its id value can be used as a key to drill further down with both monitoring and configuration commands for this deployment.
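A script can extract that id from the Deploy response and use it to build the next request. This minimal Python sketch parses the JSON shown above. The per-cluster path `/api/v3/clusters/{id}` is an assumption about the Deploy API's resource layout and should be checked against NetApp's Deploy API documentation. The helper names are hypothetical.

```python
from __future__ import annotations

import json

DEPLOY_BASE = "https://netapp04.busdevf5.io/api/v3"

# Response body from the GET /api/v3/clusters call above
sample = '''{
  "num_records": 1,
  "records": [
    {"id": "de67e558-7c8c-11ee-836d-000c29fa32ee", "name": "f5netappclusterE"}
  ]
}'''


def cluster_ids_by_name(body: str) -> dict[str, str]:
    """Map each cluster name in a /api/v3/clusters response to its id."""
    data = json.loads(body)
    return {rec["name"]: rec["id"] for rec in data.get("records", [])}


def cluster_url(cluster_id: str) -> str:
    # Assumed per-cluster resource path; verify against the Deploy API reference
    return f"{DEPLOY_BASE}/clusters/{cluster_id}"


ids = cluster_ids_by_name(sample)
print(cluster_url(ids["f5netappclusterE"]))
```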

REST API commands can also be directed at the ONTAP appliance itself, through the load balancer domain name netapp05.busdevf5.io. As simple examples, the following commands inquire about the NFS protocol service configuration, including export policies:

curl -k -X GET "https://netapp05.busdevf5.io/api/protocols/nfs/services" --user admin:De****

curl -k -X GET "https://netapp05.busdevf5.io/api/protocols/nfs/export-policies" --user admin:De****
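The same calls can be issued from a script instead of curl. Below is a minimal Python sketch using only the standard library. It builds each request with HTTP Basic authentication and the HAL `Accept` header. The `ADMIN_PASSWORD` environment variable name is hypothetical. Unlike `curl -k`, which skips certificate verification, this sketch keeps TLS verification on. It therefore assumes the CE's certificate chain is trusted by the host or supplied through `cafile`.

```python
from __future__ import annotations

import base64
import json
import os
import ssl
import urllib.request

ONTAP_BASE = "https://netapp05.busdevf5.io"


def ontap_request(path: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for an ONTAP REST path with HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"{ONTAP_BASE}{path}", method="GET")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/hal+json")
    return req


def ontap_get(path: str, cafile: str | None = None) -> dict:
    """Send the request over verified TLS and decode the JSON response body."""
    ctx = ssl.create_default_context(cafile=cafile)
    req = ontap_request(path, "admin", os.environ["ADMIN_PASSWORD"])
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.load(resp)


# Example (requires reachability from the operations center):
# services = ontap_get("/api/protocols/nfs/services")
# policies = ontap_get("/api/protocols/nfs/export-policies")
```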

Volumes may be configured and monitored with commands such as the following. The first requests a list of all volumes. The second drills into one particular volume, “RAG_Secure_Documents”, using the UUID value returned by the first command. Output is trimmed for brevity to the potentially interesting fields:

curl -k -X GET "https://netapp05.busdevf5.io/api/storage/volumes" -H "accept:application/hal+json"  --user admin:De****

{
  "records": [
    {
      "uuid": "0d9190b3-187d-11ef-ba6d-00a0b8d77b39",
      "name": "RAG_Source_Documents_2024",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/0d9190b3-187d-11ef-ba6d-00a0b8d77b39"
        }
      }
    },
    {
      "uuid": "35002d11-187d-11ef-ba6d-00a0b8d77b39",
      "name": "Vectors",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/35002d11-187d-11ef-ba6d-00a0b8d77b39"
        }
      }
    },
    {
      "uuid": "af65215a-e717-11ee-86e2-00a0b8d77b39",
      "name": "RAG_Secure_Documents",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/af65215a-e717-11ee-86e2-00a0b8d77b39"
        }
      }
    }
  ]
}
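The drill-down from the volume list to a single volume can be scripted as a name lookup. This minimal Python sketch searches the list response for a volume by name and returns its self link. If the link is absent, it falls back to building the path from the UUID. The function name `volume_href` is hypothetical, and the sample body mirrors the trimmed output above.

```python
from __future__ import annotations

import json

# Trimmed list response from GET /api/storage/volumes
sample = '''{
  "records": [
    {"uuid": "0d9190b3-187d-11ef-ba6d-00a0b8d77b39", "name": "RAG_Source_Documents_2024",
     "_links": {"self": {"href": "/api/storage/volumes/0d9190b3-187d-11ef-ba6d-00a0b8d77b39"}}},
    {"uuid": "35002d11-187d-11ef-ba6d-00a0b8d77b39", "name": "Vectors",
     "_links": {"self": {"href": "/api/storage/volumes/35002d11-187d-11ef-ba6d-00a0b8d77b39"}}},
    {"uuid": "af65215a-e717-11ee-86e2-00a0b8d77b39", "name": "RAG_Secure_Documents",
     "_links": {"self": {"href": "/api/storage/volumes/af65215a-e717-11ee-86e2-00a0b8d77b39"}}}
  ]
}'''


def volume_href(body: str, name: str) -> str:
    """Return the self link of the named volume, or build it from the UUID."""
    for rec in json.loads(body).get("records", []):
        if rec.get("name") == name:
            href = rec.get("_links", {}).get("self", {}).get("href")
            return href or f"/api/storage/volumes/{rec['uuid']}"
    raise KeyError(f"volume {name!r} not found")


print(volume_href(sample, "RAG_Secure_Documents"))
```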

curl -k -X GET "https://netapp05.busdevf5.io/api/storage/volumes/af65215a-e717-11ee-86e2-00a0b8d77b39" -H "accept:application/hal+json"  --user admin:De*****

{
  "create_time": "2024-03-21T00:12:20+00:00",
  "language": "c.utf_8",
  "name": "RAG_Secure_Documents",
  "size": 1130258432,
  "state": "online",
  "style": "flexvol",
  "aggregates": [
    {
      "name": "f5netappclusterE_01_VM_DISK_1",
      "uuid": "642cdad1-5405-4b9b-a889-08050bfa96d7"
    }
  ],
  "svm": {
    "name": "svm0",
    "uuid": "71e04cf8-7c90-11ee-bb71-00a0b8d77b39",
    "_links": {
      "self": {
        "href": "/api/svm/svms/71e04cf8-7c90-11ee-bb71-00a0b8d77b39"
      }
    }
  },
  "space": {
    "size": 1130258432,
    "available": 1072046080,
    "used": 1699840
  }
}
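A monitoring script might turn the `space` fields into a utilization figure. This minimal Python sketch computes `used` as a percentage of `size`. Note that `available` plus `used` need not equal `size`. In the output above they fall short by roughly 5% of `size`, possibly a snapshot reserve, so this ratio is only one view of capacity. The function name is hypothetical.

```python
from __future__ import annotations

import json

# Trimmed detail response for one volume
detail = '''{
  "name": "RAG_Secure_Documents",
  "state": "online",
  "space": {"size": 1130258432, "available": 1072046080, "used": 1699840}
}'''


def percent_used(body: str) -> float:
    """Return space.used as a percentage of space.size for one volume record."""
    space = json.loads(body)["space"]
    return 100.0 * space["used"] / space["size"]


pct = percent_used(detail)
print(f"{pct:.2f}% used")
```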

Distributed Cloud enriches the extensive set of NetApp APIs by providing a secure, true multi-site and multi-cloud approach to private connectivity. There is no need for separate VPN or remote access solutions for each cloud. Operators no longer need the skills to troubleshoot several cloud access approaches. Instead, they work with a single, consistent connectivity approach: a turnkey platform for private reachability. The following section drills deeper into API-specific security features.

API Security for Remote NetApp Control Plane Tasks

F5 Distributed Cloud is an inline solution that proxies traffic through the configured load balancers, so it sees every transaction in both directions. As seen in the discovered traffic pane below, 1,400 transactions were proxied over the last six hours. They covered five different API endpoints terminating on the Redmond ONTAP appliance.

Interestingly, the out-of-the-box WAF ruleset is set to maximum risk aversion, so the “curl” user agent alone prompts a high threat level. This can be addressed with a single click in the Security Analytics pane. There, the matching WAF events can be added to a WAF exclusion rule, either permanently or temporarily for up to seven days. Note also the option, highlighted in yellow above, to download the API specification. It will be extremely useful and is covered shortly.

By clicking on any of the API endpoints, the operator can view a set of probability distribution function (PDF) curves. These show long-term performance, such as latency boundaries or the prevalence of errors. The following image provides an example of the metrics tracked automatically for a sample API endpoint.
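To illustrate what such a curve summarizes, here is a small Python sketch. It turns latency samples into an empirical distribution (a normalized histogram) and reads off a nearest-rank percentile. The sample values and the 50 ms bin width are invented for illustration. This is not how Distributed Cloud computes its curves.

```python
from __future__ import annotations

from collections import Counter


def empirical_pdf(samples_ms: list[float], bin_ms: float) -> dict[float, float]:
    """Normalized histogram: fraction of samples falling into each latency bin."""
    counts = Counter(int(s // bin_ms) * bin_ms for s in samples_ms)
    total = len(samples_ms)
    return {start: n / total for start, n in sorted(counts.items())}


def percentile(samples_ms: list[float], q: float) -> float:
    """Nearest-rank percentile, for q in (0, 100]."""
    ordered = sorted(samples_ms)
    rank = max(1, -(-len(ordered) * q // 100))  # ceiling of n * q / 100
    return ordered[int(rank) - 1]


# Hypothetical latency samples for one API endpoint, in milliseconds
latencies = [42.0, 48.5, 51.0, 55.2, 60.1, 61.7, 75.0, 98.3, 120.4, 240.9]
pdf = empirical_pdf(latencies, bin_ms=50.0)
print(pdf)
print("p90 latency:", percentile(latencies, 90))
```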

As mentioned, the curl user agent causes the stringent WAF ruleset chosen for the HTTPS load balancers to raise a “report” event, as opposed to a “block” event. The following shows the Security Analytics pane, where a security event is raised along with the rationale for the report. In this case, the rationale is simply the presence of the curl user agent. With one click, an exception to the WAF rules can be created to silence these events.
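Conceptually, such an exception is a narrowly scoped match with an optional expiry. The Python sketch below models that idea only. The `WafExclusion` class, its fields and the signature identifier are hypothetical and do not mirror Distributed Cloud's actual configuration schema. The seven-day ceiling reflects the temporary-duration limit described earlier.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TEMPORARY = timedelta(days=7)


@dataclass(frozen=True)
class WafExclusion:
    """Hypothetical model of a WAF exclusion scoped to a domain, path prefix and signatures."""
    domain: str
    path_prefix: str
    signature_ids: frozenset[str]
    expires_at: datetime | None = None  # None means the exclusion is permanent

    @classmethod
    def temporary(cls, domain: str, path_prefix: str, sig_ids: set[str],
                  created: datetime, duration: timedelta) -> WafExclusion:
        if not timedelta(0) < duration <= MAX_TEMPORARY:
            raise ValueError("temporary exclusions must last more than zero and at most seven days")
        return cls(domain, path_prefix, frozenset(sig_ids), created + duration)

    def suppresses(self, domain: str, path: str, sig_id: str, now: datetime) -> bool:
        """True if a WAF event with these attributes should be silenced at time `now`."""
        if self.expires_at is not None and now >= self.expires_at:
            return False
        return (domain == self.domain
                and path.startswith(self.path_prefix)
                and sig_id in self.signature_ids)


# Example: silence a (hypothetical) curl user-agent signature for seven days
rule = WafExclusion.temporary(
    "netapp05.busdevf5.io", "/api/", {"hypothetical-curl-ua-signature"},
    datetime.now(timezone.utc), timedelta(days=7),
)
```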

Implement a Positive Security Model for ONTAP APIs

One of the more interesting uses of Distributed Cloud is preserving uptime for ONTAP solutions by precluding accidental API commands that might impair service. For instance, API calls are often bundled together in scripts so that automation can quickly set up appliances, or take a detailed configuration inventory of them. Distributed Cloud may be allowed to observe traffic for a set period of time, perhaps 48 hours. The resulting discovered API traffic is then used to create a known-good schema of the expected traffic.
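The learning phase can be pictured as building an inventory of (method, path) pairs during the observation window. The inventory also records the JSON types seen for each request field. This is a simplified Python sketch of that idea, not Distributed Cloud's discovery algorithm. The observation tuples and function names are hypothetical. Unlike a real discovery engine, it does not collapse identifiers such as volume UUIDs into path templates.

```python
from __future__ import annotations

from collections import defaultdict


def json_type(value) -> str:
    """Map a decoded JSON value to its OpenAPI type name."""
    if isinstance(value, bool):  # bool is a subclass of int in Python, so test it first
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def learn_inventory(observed) -> dict[tuple[str, str], dict[str, set[str]]]:
    """Build {(METHOD, path): {field: set of JSON types}} from (method, path, body) tuples."""
    inventory: dict[tuple[str, str], dict[str, set[str]]] = {}
    for method, path, body in observed:
        fields = inventory.setdefault((method.upper(), path), defaultdict(set))
        for name, value in (body or {}).items():
            fields[name].add(json_type(value))
    return inventory
```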

The saved file format is defined by the OpenAPI Specification (OAS); such files have historically often been called Swagger files. This automatic Swagger generation feature of XC allows an enterprise to immediately re-load the saved file as the definition of acceptable traffic. Future traffic that violates the specification can be flagged or even stopped in its tracks. Violations cover not just the endpoint but also the actual request and response parameter values. Because this is a real-time inline solution, it can detect, for example, a variable expected to be a floating-point number that in fact carries a string or a JSON array.
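The type check described above can be illustrated with a tiny validator for a subset of OpenAPI schema keywords (`type`, `properties`, `required`, `items`). A real deployment would rely on the platform's own validation or on a full JSON Schema library. This Python sketch only shows the principle. It uses a hypothetical learned schema for a volume request body.

```python
from __future__ import annotations


def type_ok(value, expected: str) -> bool:
    """Check one decoded JSON value against an OpenAPI primitive type name."""
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    checks = {"string": str, "boolean": bool, "array": list, "object": dict}
    return isinstance(value, checks[expected]) if expected in checks else False


def violations(value, schema: dict, where: str = "$") -> list[str]:
    """Return human-readable schema violations; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected and not type_ok(value, expected):
        return [f"{where}: expected {expected}, got {type(value).__name__}"]
    found: list[str] = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                found.append(f"{where}.{name}: missing required field")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                found.extend(violations(value[name], sub, f"{where}.{name}"))
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            found.extend(violations(item, schema["items"], f"{where}[{i}]"))
    return found


# Hypothetical learned schema for a request body
schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}, "size": {"type": "number"}},
}
print(violations({"name": "Vectors", "size": 1.5e9}, schema))   # conforming
print(violations({"name": "Vectors", "size": "huge"}, schema))  # string where a number is expected
```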

When violations are found, the operator has two options. The first is a “fall-through” approach, whereby the traffic is allowed but flagged as “Shadow API” traffic. The second, for ironclad deployments, is to simply block all violations, in keeping with the strictest interpretation of a positive security model.
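The two enforcement choices can be expressed as a small decision function. In this hypothetical Python sketch, requests that match the learned inventory are forwarded. Non-matching requests are either forwarded and flagged as shadow traffic, or rejected. The 403 status for rejected requests is an assumption made for illustration.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    FALL_THROUGH = "fall-through"  # allow the request but flag it as Shadow API traffic
    BLOCK = "block"                # strictest positive security model


def decide(method: str, path: str, known: set[tuple[str, str]],
           mode: Mode) -> tuple[str, int | None]:
    """Return (action, HTTP status to send back, or None to forward to the origin)."""
    if (method.upper(), path) in known:
        return ("allow", None)
    if mode is Mode.FALL_THROUGH:
        return ("flag-shadow", None)
    return ("block", 403)
```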

After downloading the API specification file, as highlighted in an earlier screenshot from the API discovery pane, the Swagger file can be analyzed using a JSON-capable viewer, such as the one here.

The full specification discovered automatically by Distributed Cloud can be reviewed, including expected fields and their corresponding data types, again in both the request and response directions. Interestingly, some API calls to /api/storage/volumes inadvertently included a trailing slash. These were correctly recorded as a separate endpoint.
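Besides a JSON viewer, a few lines of Python can summarize the downloaded file. This sketch assumes the file is OAS 3.x JSON with the standard top-level `paths` object. The script name `list_ops.py` is hypothetical. Path keys are compared literally, so `/api/storage/volumes` and `/api/storage/volumes/` are listed as two separate endpoints, consistent with what the discovery recorded.

```python
from __future__ import annotations

import json
import sys

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs declared under the spec's 'paths' object, sorted by path."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:  # skip non-operation keys such as 'parameters'
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_ops.py downloaded_spec.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for method, path in list_operations(json.load(fh)):
            print(f"{method:7} {path}")
```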

At this point, the enterprise has a choice. If full blocking of non-matching API activity is warranted, the “API Validation” menu allows this option. In many cases, however, a more nuanced fall-through response is required. Take, for instance, a CI/CD pipeline where application updates are rolled out frequently, perhaps weekly, but the corresponding API documentation lags by a few days. If the Swagger file is updated only after that gap, the risk of strict blocking simply breaking applications is quite real.

To accommodate application changes, traffic to undocumented API endpoints can be allowed but flagged; this is “Shadow” API traffic. The following image shows how it is brought to the operator's attention. In this scenario, traffic involving a NetApp Storage Virtual Machine (SVM), the fundamental unit of multi-tenancy, is proxied even though the related API endpoints fall outside the API definition used by Distributed Cloud. The image uses the graph depiction of APIs, as opposed to the tabular format, and highlights that this is shadow traffic.

After clicking on the shadow API entry, an operator might open a ticket to investigate the activity. After some consideration, the operator might choose either of these paths:

  • Rate limit users sending to this API endpoint, perhaps allowing 1 request in any given 10-second interval. This avoids breaking an application while still limiting connectivity, because excessive traffic receives HTTP 429 Too Many Requests.
  • Immediately and permanently close access to the API endpoint by sending HTTP 403 Forbidden responses to any future clients.
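The two options can be sketched as a per-client policy object. This minimal Python sketch uses a sliding-window log, so that no client gets more than one request into any 10-second interval. The class name, the use of a client IP as the key and the in-memory storage are assumptions. They do not describe how Distributed Cloud implements rate limiting.

```python
from __future__ import annotations

from collections import deque


class ShadowEndpointPolicy:
    """Hypothetical per-client sliding-window limiter with an optional hard block."""

    def __init__(self, limit: int = 1, window_s: float = 10.0, blocked: bool = False):
        self.limit = limit
        self.window_s = window_s
        self.blocked = blocked
        self._logs: dict[str, deque[float]] = {}

    def check(self, client: str, now: float) -> int | None:
        """Return 403 or 429 to reject the request, or None to forward it."""
        if self.blocked:
            return 403  # Forbidden: endpoint permanently closed
        log = self._logs.setdefault(client, deque())
        # Drop timestamps that have left the current window
        while log and now - log[0] >= self.window_s:
            log.popleft()
        if len(log) >= self.limit:
            return 429  # Too Many Requests; rejected requests are not recorded
        log.append(now)
        return None
```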

This approach offers a simple way to control which ONTAP modifications can be made, or which configuration details can be retrieved, beyond RBAC on the appliance itself. Simply “learn” an API definition over time, then block or throttle traffic outside those boundaries going forward.

Summary

This article demonstrated tactical use cases for securing ONTAP API transactions, whether on-premises or in public cloud form factors. The security comes from private communications and deep API-layer visibility and controls. Possibilities exist beyond this starting point. Consider layering in Distributed Cloud service policies, such as GEO-IP rulesets. Suppose that, for regulatory reasons, an enterprise chooses to limit which controls non-EU operations centers have over EU-housed ONTAP clusters, while still allowing European operations unfettered control. GEO-IP may help.
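For an EU-housed cluster, the regulatory scenario reduces to a simple per-request decision. The Python sketch below assumes that a GEO-IP lookup upstream has already resolved the client's ISO 3166-1 country code. It also assumes that "certain controls" means state-changing HTTP methods. Both the policy and the function name are hypothetical and are not a Distributed Cloud service policy definition.

```python
from __future__ import annotations

# ISO 3166-1 alpha-2 codes of the 27 EU member states
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})
READ_ONLY_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def geo_allows(country_code: str, method: str) -> bool:
    """EU origins get full control of EU-housed clusters; others are limited to read-only calls."""
    if country_code.upper() in EU_COUNTRIES:
        return True
    return method.upper() in READ_ONLY_METHODS
```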

The article also discussed HTTPS distributed load balancers, which project service availability only where it should exist and are tightly coupled to remote hybrid origin pools, both on premises and in the cloud. Rich API security and control plane features, such as rate limiting or imposing a learned OpenAPI Specification upon critical storage control traffic, make for an interesting approach to governing ONTAP appliances.

Published Jun 26, 2024
Version 1.0