Ensuring Secure Data Interoperability in Government Agencies: Challenges and Solutions

Today’s security practitioners must deliberately balance the ever-evolving risks of protecting data from adversarial actors against the benefits of sharing information with allies, which yields better situational awareness and, ultimately, better outcomes. In other words, while avoiding data breaches is still necessary, it is no longer sufficient; in today’s interconnected but siloed world, data sharing is also required. Enabling seamless, secure data exchange across multiple platforms and agencies is therefore vital for operational efficiency and mission success.

As adversaries become increasingly sophisticated, a premeditated and purposeful approach is needed to meet these goals simultaneously. This blog highlights the critical considerations around secure data interoperability and outlines a technical framework for implementing robust data management and security practices in your agency.

 

A framework for approaching secure data interoperability 

The first step is understanding what secure data interoperability really means. It’s not solely about protecting data; it’s also about ensuring that data can flow safely and efficiently across various agencies and platforms so that each can consume it effectively. We approach these goals with a secure data interoperability framework divided into two primary, interlinked concerns.

The first concern is focused on the intelligent ingestion and curation of data, which, at the next level of detail, further breaks down into the Plan, Design, and Deliver phases. The second concern is focused on the consumption of data in an effective and secure manner. This concern maps to runtime data-delivery capabilities and is expressed using the OODA loop—Observe, Orient, Decide, Act.

The two interlinked concerns can be thought of as analogous to continuous integration/continuous deployment (CI/CD) in software development, but applied here to data. The “left side” of the data framework is responsible for ingesting and curating data, landing the delivered data in one or more secure stores. The “right side” then responds to requests to consume data from those stores in support of orient, decide, and then act -- all while ensuring the access granted is risk-appropriate and the data delivery is secure.

Success using this strategy is predicated on a left side that is grounded in a thoughtful data architecture -- one that defines consistent semantics with well-specified syntax and is inclusive of a rich metadata vocabulary. Further, a robust data ecosystem requires that the left-side data infrastructure embody this knowledge via a set of published artifacts, including a comprehensive data glossary, data dictionary, and data catalog, all accessible by the right side of the data framework. The right side can leverage this catalog to effectively identify, locate, and consume these well-defined data elements. Metadata annotations are also key for run-time systems—the annotations are used both to contextualize any retrieved data and to manage access, ensuring that the data is shared securely across multiple disparate data consumers, which we will expand upon next.

 

Implementation of a secure data management infrastructure 

As mentioned earlier, enabling data interoperability begins with planning for data management. In other words, deliberate forethought around the framework for the “left side” of the pipeline -- the plan, design, and deliver stages -- is crucial for enabling the delivery stages of the data pipeline to shepherd data securely and interoperably from its creation to its utilization.

In the plan stage, stakeholders in the data ecosystem should start by collaborating to develop a common data architecture and then formalize that communal agreement in the form of a set of core collateral artifacts, much in the same way code libraries and containers are the core artifacts in a software pipeline. These core artifacts will be the enablers for effective data interoperability by providing a machine-consumable and well-defined shared data vocabulary. 

The specific artifacts include a data glossary, a data dictionary, and a data catalog. The data glossary is the root of the data ontology; it defines the semantics and syntax of fundamental data building blocks—effectively, the “atoms” of the data ecosystem—by providing the words for the shared data language used across all data ecosystem participants. The data dictionary builds on the glossary by enumerating the pre-defined collections of these data atoms, similar to the sentences in a language; this would map to the syslog inventory or database table schema in traditional security appliances. Finally, the data catalog acts as a comprehensive inventory of where specific data dictionary elements exist within the ecosystem and how a data consumer can access that data. 
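To make these artifacts a bit more concrete, here is a minimal, hypothetical Python sketch of how a glossary term, a dictionary record, and a catalog entry might be represented in machine-consumable form. The class and field names are illustrative only, not a prescribed schema; a real agency would agree on its own ontology during the plan stage.

```python
from dataclasses import dataclass

# Hypothetical, minimal representations of the three core artifacts.
# Field names are illustrative -- agencies would define their own.

@dataclass
class GlossaryTerm:
    """A single 'atom' of the shared data vocabulary."""
    name: str               # e.g., "source_ip"
    semantics: str          # human-readable meaning
    syntax: str             # e.g., "IPv4 dotted quad"
    classification: str     # e.g., "CUI"

@dataclass
class DictionaryRecord:
    """A pre-defined collection of glossary terms (a 'sentence')."""
    name: str               # e.g., "firewall_log_event"
    terms: list[str]        # names of GlossaryTerm atoms
    description: str = ""

@dataclass
class CatalogEntry:
    """Where a dictionary record lives and how a consumer can reach it."""
    record: str             # DictionaryRecord name
    location: str           # e.g., an API endpoint or store URI
    access_method: str      # e.g., "REST", "S3", "Kafka topic"
    steward: str = "unknown"

# Example: publishing one term, one record, and one catalog entry
glossary = [GlossaryTerm("source_ip", "IP address of the sending host",
                         "IPv4 dotted quad", "CUI")]
dictionary = [DictionaryRecord("firewall_log_event",
                               ["source_ip", "dest_ip", "timestamp"],
                               "Normalized perimeter firewall event")]
catalog = [CatalogEntry("firewall_log_event",
                        "https://data.example.gov/firewall/events",
                        "REST", "Network Operations")]
```

Because these artifacts are ordinary, structured data, they can be versioned and published alongside the data stores they describe, which is what allows the right side of the framework to discover and consume data programmatically.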

A formal specification of the agreed-upon ontology and constructs identified during the planning phase enables the data needs of the orient and decide phases—efficient discovery and effective consumption—to be met. In this manner, these artifacts provide the linkage that enables the integration of the left-side data management framework with the right-side data discovery and consumption framework.

 

Technologies and best practices for secure run-time data interoperability 

Of course, the solution requires more than just understanding the data and how to communicate it effectively; it also requires security around the ingestion and consumption of data. Solid security policies are grounded in an access control framework -- in colloquial terms, “who is allowed to perform what action, on what data.” A mature access control framework allows for risk-informed, context-aware, transaction-granular decisions, so that the configured policies can convey the sophisticated requirements of today’s increasingly nuanced data sharing environments. Technologies such as device fingerprinting, User and Entity Behavior Analytics (UEBA), and workload behavior analysis should be leveraged to make more confident and more granular authenticity decisions about the data requestor. By applying a Zero Trust mindset, access control policies for data can mature from a simple binary yes/no authentication decision to a more refined “confidence score” metric. As applied to our data sharing context, policy can specify that more sensitive data requires higher authenticity confidence scores. An adaptive system could also present additional authentication challenges—sometimes referred to as “step-up authentication”—such as requiring another MFA factor when a requestor’s confidence score is insufficient for the class of data being accessed.
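As a rough illustration of what a confidence-scored, step-up-capable decision might look like, consider the Python sketch below. The thresholds, class names, and function are hypothetical; real thresholds would come from agency policy and a real policy engine, not hard-coded values.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    STEP_UP = "step-up authentication required"
    DENY = "deny"

# Hypothetical policy: minimum authenticity confidence (0.0-1.0)
# required for each data sensitivity class.
MIN_CONFIDENCE = {
    "public": 0.2,
    "cui": 0.6,
    "secret": 0.9,
}

def authorize(confidence_score: float, data_class: str,
              step_up_available: bool = True) -> Decision:
    """Risk-informed, transaction-granular access decision."""
    required = MIN_CONFIDENCE.get(data_class)
    if required is None:
        return Decision.DENY          # unknown class: fail closed
    if confidence_score >= required:
        return Decision.ALLOW
    if step_up_available:
        return Decision.STEP_UP       # e.g., prompt for another MFA factor
    return Decision.DENY

# A requestor with a 0.7 confidence score requesting "secret" data is
# challenged to step up rather than flatly denied.
print(authorize(0.7, "secret"))       # Decision.STEP_UP
```

The point is the shape of the decision, not the numbers: access becomes a graded, adaptive outcome rather than a one-time yes/no gate.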

There are other modern technologies that can also play a vital role in enforcing run-time authorization decisions, determining if and how a specific class of data client is allowed to access a particular piece of data. One useful pattern is to embed metadata in ephemeral authentication artifacts, such as JSON Web Tokens (JWTs) or Secure Production Identity Framework for Everyone (SPIFFE) annotations, in order to efficiently communicate relevant security context. For example, a JWT could embed the system’s current confidence score of a human client’s authenticity, or a SPIFFE identifier could embed information about classes of data that must be masked or reduced in resolution before sending.
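The sketch below shows the general idea of carrying security context in an ephemeral token, using only the Python standard library to build a minimal HS256 JWT. The custom claim names (“auth_confidence”, “mask_classes”) are illustrative, not a standard; in production you would use a vetted JWT/SPIFFE library and an agreed-upon claim vocabulary rather than rolling your own signing code.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(subject: str, confidence: float, mask_classes: list,
               key: bytes) -> str:
    """Build a minimal HS256 JWT carrying custom security-context claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": subject,
        "iat": int(time.time()),
        "exp": int(time.time()) + 300,   # short-lived, ephemeral artifact
        "auth_confidence": confidence,   # current authenticity confidence score
        "mask_classes": mask_classes,    # data classes to mask or down-res
    }
    signing_input = (f"{b64url(json.dumps(header).encode())}."
                     f"{b64url(json.dumps(payload).encode())}")
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = make_token("analyst@example.gov", 0.82, ["pii", "geoint"], b"demo-key")
print(token)
```

Downstream enforcement points can then read those claims to decide, per transaction, what to release and what to transform before release.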

From the lens of interoperability, inline data transformations—ETL pipelines—also play a crucial role. Building upon the foundational data architecture artifacts of the “left side,” the “right side” can create inline data transformation pipelines that normalize and enrich data on the fly. Example use cases for transformations include adding context (for example, geolocation or the data-producing platform) to incoming data, or adjusting the syntax of a timestamp as part of consumption by legacy software. The ETL pipelines can also tie into security requirements; the earlier example of authorization-aware data masking or lowering the resolution of data is one such use case, where data transformation is used to meet managed secure interoperability needs.
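A minimal sketch of such an inline pipeline is shown below. The record fields, the “pii” mask class, and the transformation functions are hypothetical and only meant to show how enrichment, normalization, and authorization-aware masking can be composed on the fly, driven by claims like those in the token example above.

```python
from datetime import datetime, timezone

def enrich(record: dict, platform: str) -> dict:
    """Add producing-platform context to an incoming record."""
    record["producing_platform"] = platform
    return record

def normalize_timestamp(record: dict) -> dict:
    """Convert an epoch-seconds timestamp to ISO 8601 for a legacy consumer."""
    ts = record.get("timestamp")
    if isinstance(ts, (int, float)):
        record["timestamp"] = datetime.fromtimestamp(
            ts, tz=timezone.utc).isoformat()
    return record

def mask_for(record: dict, mask_classes: list) -> dict:
    """Authorization-aware masking driven by the requestor's token claims."""
    if "pii" in mask_classes and "operator" in record:
        record["operator"] = "***"
    return record

def pipeline(record: dict, platform: str, mask_classes: list) -> dict:
    """Compose the inline transformations applied at consumption time."""
    for step in (lambda r: enrich(r, platform),
                 normalize_timestamp,
                 lambda r: mask_for(r, mask_classes)):
        record = step(record)
    return record

raw = {"source_ip": "10.1.2.3", "operator": "jdoe", "timestamp": 1735000000}
print(pipeline(raw, platform="sensor-07", mask_classes=["pii"]))
```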

In short, while specific technologies will continue to evolve, secure data access policies should always be considered in a context- and risk-aware manner. Embracing the "Zero Trust" model—namely, assuming no trust by default and verifying every request; applying confidence-based identity scores to gauge risk; and using environmental and historical context to assess the integrity of requests—is the basis for managing security risk. In addition, solutions should exploit programmability in the configuration (that is, “policy as code”) not only for an adaptive security posture, but also for performing the data transformations required for managed data sharing.
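As a final, hypothetical sketch of “policy as code,” the fragment below expresses access rules as ordinary data that can be version-controlled, reviewed, and tested like any other artifact, then evaluated at run time. The attribute names (data_class, device_trusted) and the rule format are illustrative assumptions, not a specific product’s policy language.

```python
# Hypothetical policy-as-code fragment: first matching rule wins.
POLICY = [
    {"match": {"data_class": "secret", "device_trusted": False},
     "effect": "deny"},
    {"match": {"data_class": "secret"},
     "effect": "allow", "transform": ["mask:pii"]},
    {"match": {"data_class": "cui"},
     "effect": "allow", "transform": []},
]

def evaluate(request: dict) -> dict:
    """Return the first rule whose 'match' attributes all appear in the request."""
    for rule in POLICY:
        if all(request.get(k) == v for k, v in rule["match"].items()):
            return {"effect": rule["effect"],
                    "transform": rule.get("transform", [])}
    return {"effect": "deny", "transform": []}   # default: fail closed

print(evaluate({"data_class": "secret", "device_trusted": True}))
# -> {'effect': 'allow', 'transform': ['mask:pii']}
```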

 

Future readiness requires integrating securely interoperable data practices 

Secure data interoperability is not just a technical requirement but a strategic necessity for government agencies. As security professionals, your role in defending against increasingly sophisticated attackers while ensuring seamless and secure data exchange cannot be overstated. Organizations can enhance their capabilities by adhering to a structured framework, implementing robust data management pipelines, and leveraging advanced technologies. We should not lose sight of the fact that the goal is not only to protect data but also to enable secure, efficient, and effective data sharing across a constellation of peer and partner entities. This capability is core to our continuing mission of refining our data security strategies to both safeguard the national security information we are entrusted with from adversaries and leverage it with allies.

Learn more about how F5 public-sector cybersecurity solutions can help your agency.

Published Dec 23, 2024