How to Implement Robust Security Guardrails Using Policy as Code
Welcome to the third post in our IDP Best Practices series, where we explore how to implement policy as code with Pulumi CrossGuard to create deployment guardrails that make self-service infrastructure both powerful and safe.
Platform engineering presents a fundamental tension: we want to enable developer velocity while maintaining security and compliance. Every platform team faces the same question: how do you give teams the freedom to deploy infrastructure quickly without compromising on safety, security, or organizational standards? The answer isn’t to choose between speed and safety, but rather to embrace automated guardrails powered by policy as code that make both possible simultaneously.
This post is part of our IDP Best Practices series:
- How to Build an Internal Developer Platform: Strategy, Best Practices, and Self-Service Infrastructure
- Build Golden Paths with Infrastructure Components and Templates
- Deployment Guardrails with Policy as Code (you are here)
- Day 2 Platform Operations: Automating Drift Detection and Remediation
- Extend Your IDP for AI Applications: GPUs, Models, and Cost Controls
- Next-Gen IDPs: How to Modernize Legacy Infrastructure with Pulumi
The Platform Engineering Challenge: Speed vs. Safety
Let me tell you a story that perfectly captures the platform engineering dilemma. Statsig, a fast-growing feature flag platform processing an incredible 2 trillion events daily, had one infrastructure engineer named Jason who handled all infrastructure requests. Everything worked smoothly until Jason went on parental leave. Suddenly, the entire engineering organization ground to a halt. Infrastructure requests piled up, deployments slowed to a crawl, and the team faced a stark realization: their entire infrastructure capability depended on a single person.
This crisis forced them to confront a fundamental question that every growing engineering team eventually faces: How do you enable self-service infrastructure without sacrificing security, compliance, or operational stability? The answer wasn’t to hire more Jasons or to lock down infrastructure even tighter. Instead, they discovered that deployment guardrails with policy as code could transform their infrastructure conversation from “talk to the infrastructure person” to “ship with confidence, knowing guardrails will catch any issues.”
Think of it like building a system of roads with guardrails. When you create well-designed infrastructure components with proper safety barriers, teams can drive fast and confidently, knowing they’re protected from going off the cliff. The guardrails don’t slow them down; they enable speed by removing fear.
Understanding Platform Engineering Layers
Before we dive into implementing guardrails, we need to understand the architecture of a modern platform and where policies fit within it. Think of your platform as having three distinct layers, each with different security and access requirements.
At the foundation, you have your Layer 1: Foundational Infrastructure, which includes security controls, shared networking, and identity management systems like OIDC and IAM. Platform teams typically manage this layer with strict access controls because it forms the security backbone of your entire infrastructure.
Building on that foundation is Layer 2: Shared Infrastructure, which encompasses VPCs, compute platforms, and load balancers. These are standardized components that teams can use but with some room for customization based on specific needs.
The most dynamic layer is Layer 3: The Workloads Layer, where deployable artifacts, pipelines, and operability tools live. This is where the rubber meets the road, where most self-service activity happens and, consequently, where guardrails become absolutely critical. Developers need maximum freedom at this layer to innovate and ship quickly, but this freedom also introduces the most risk. It’s precisely here that policy-driven guardrails prove their worth, enabling that freedom while automatically preventing dangerous configurations.
What Are Deployment Guardrails?
Deployment guardrails are automated policies that act as your infrastructure’s safety net. Rather than relying on manual reviews or hoping developers remember all the security requirements, guardrails automatically prevent misconfigurations before they reach production. They enforce security standards without human intervention, guide developers toward best practices through immediate feedback, and enable safe self-service by catching issues at the earliest possible moment.
A helpful analogy is to think of guardrails like type checking in programming languages. Just as TypeScript doesn’t restrict your ability to write JavaScript but rather catches type errors before runtime, deployment guardrails don’t limit your infrastructure creativity. They simply ensure you’re following secure patterns and catch potentially dangerous configurations before they cause problems in production.
Introducing Pulumi CrossGuard: Policy as Code
Pulumi CrossGuard is Pulumi’s policy as code framework that brings the same engineering rigor to compliance and security that you apply to your application code. Instead of maintaining policy documents in wikis or relying on manual reviews, you can write policies in familiar programming languages like Python, TypeScript, or Go. These policies then enforce themselves across all your cloud resources and providers, running at different stages of the deployment lifecycle and integrating seamlessly with your CI/CD pipelines for automated enforcement.
Key Policy Types
CrossGuard supports two fundamental types of policies, each serving different validation needs:
Resource Policies: Validate individual resources
from pulumi_policy import ReportViolation, ResourceValidationArgs

def restrict_dangerous_ports(args: ResourceValidationArgs, report_violation: ReportViolation):
    if args.resource_type == "aws:lb/targetGroup:TargetGroup":
        port = args.props.get("port")
        dangerous_ports = ["22", "23", "3389"]
        if str(port) in dangerous_ports:
            report_violation("Dangerous port detected. Avoid using SSH, Telnet, or RDP ports.")
Stack Policies: Validate relationships across resources
from pulumi_policy import ReportViolation, StackValidationArgs

def validate_microservice_encryption(args: StackValidationArgs, report_violation: ReportViolation):
    microservice_components = []
    s3_buckets = []
    for resource in args.resources:
        if resource.resource_type == "custom:infrastructure:Microservice":
            microservice_components.append(resource)
        elif resource.resource_type == "aws:s3/bucket:Bucket":
            s3_buckets.append(resource)
    if microservice_components and s3_buckets:
        for bucket in s3_buckets:
            encryption = bucket.props.get("serverSideEncryptionConfiguration")
            if not encryption:
                report_violation("S3 bucket must have encryption enabled when used with microservice components.")
Building Practical Guardrails: Real-World Examples
Let’s walk through building guardrails for a microservice platform, starting with the component from our previous workshop.
Example 1: Port Security Policy
Problem: Developers might accidentally expose services on dangerous ports like SSH (22) or RDP (3389).
Solution: A policy that blocks dangerous port configurations.
import pulumi_policy as policy

def restrict_dangerous_ports_validation(args, report_violation):
    """Prevent services from running on dangerous ports."""
    if args.resource_type == "aws:lb/targetGroup:TargetGroup":
        port = args.props.get("port")
        dangerous_ports = ["22", "23", "3389", "5432", "3306"]
        if str(port) in dangerous_ports:
            report_violation(
                f"Port {port} is not allowed. This port is commonly used for "
                "administrative services and should not be exposed via load balancer."
            )

restrict_dangerous_ports = policy.ResourceValidationPolicy(
    name="restrict-dangerous-ports",
    description="Prevent services from using dangerous ports",
    validate=restrict_dangerous_ports_validation,
    enforcement_level=policy.EnforcementLevel.MANDATORY,
)
Example 2: Resource Limits Policy
Problem: Teams might request oversized resources, leading to cost overruns.
Solution: A policy that enforces reasonable resource limits with advisory warnings.
def limit_memory_validation(args, report_violation):
    """Limit memory allocation for microservices."""
    if args.resource_type == "custom:infrastructure:Microservice":
        # Extract memory from component tags or properties
        memory_str = args.props.get("memory", "512Mi")
        # Parse memory value (assuming format like "2Gi", "1024Mi")
        if memory_str.endswith("Gi"):
            memory_gb = float(memory_str[:-2])
            if memory_gb > 1:
                report_violation(
                    f"Memory allocation of {memory_str} exceeds recommended limit of 1Gi "
                    "for microservices. Consider optimizing your application or "
                    "contact the platform team for exceptions."
                )

limit_memory = policy.ResourceValidationPolicy(
    name="limit-microservice-memory",
    description="Enforce reasonable memory limits for microservices",
    validate=limit_memory_validation,
    enforcement_level=policy.EnforcementLevel.ADVISORY,
)
Example 3: Cross-Resource Security Policy
Problem: When microservices use S3 buckets, encryption should be mandatory.
Solution: A stack policy that validates encryption across related resources.
def microservice_s3_encryption_validation(args, report_violation):
    """Ensure S3 buckets used with microservices have encryption enabled."""
    microservice_resources = []
    s3_buckets = []
    # Collect relevant resources
    for resource in args.resources:
        if resource.resource_type == "custom:infrastructure:Microservice":
            microservice_resources.append(resource)
        elif resource.resource_type == "aws:s3/bucket:Bucket":
            s3_buckets.append(resource)
    # If we have both microservices and S3 buckets, check encryption
    if microservice_resources and s3_buckets:
        for bucket in s3_buckets:
            sse_config = bucket.props.get("serverSideEncryptionConfiguration")
            if not sse_config:
                report_violation(
                    f"S3 bucket '{bucket.name}' must have server-side encryption "
                    "enabled when used with microservice components."
                )

microservice_s3_encryption = policy.StackValidationPolicy(
    name="microservice-s3-encryption",
    description="Ensure S3 buckets used with microservices are encrypted",
    validate=microservice_s3_encryption_validation,
    enforcement_level=policy.EnforcementLevel.MANDATORY,
)
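To actually run these three policies, they need to be registered in a policy pack. Here is a minimal sketch of a pack entry point, assuming the policies above live in the same module; the pack name is illustrative:

from pulumi_policy import PolicyPack

# Register the example policies as a single policy pack. The pack runs with
# `pulumi preview --policy-pack ./policies` or, once published, automatically
# through your organization's policy groups.
PolicyPack(
    name="microservice-platform-guardrails",  # illustrative name
    policies=[
        restrict_dangerous_ports,
        limit_memory,
        microservice_s3_encryption,
    ],
)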
Policy Enforcement Models
Pulumi CrossGuard supports multiple enforcement models to fit different workflows, and understanding when to use each model is crucial for effective policy implementation.
The Preventative Model
The most common approach is the preventative model, where policies run before resources are created. This gives you fast feedback and prevents bad deployments from ever reaching the cloud. When you run pulumi preview, policies are evaluated immediately, and if violations are found during pulumi up, the deployment is blocked. This model works particularly well for validating port configurations, resource limits, and other settings that you can check without needing to see the actual deployed resource.
# Policies run automatically during preview and deployment
pulumi preview # Policies evaluated here
pulumi up # Policies block deployment if violations found
The Detective Model
Sometimes you need to validate the actual cloud state, including auto-generated values like ARNs or IP addresses. That’s where the detective model comes in. These policies run after resources are created, making them perfect for compliance scanning and drift detection. While they can’t prevent initial deployment of non-compliant resources, they excel at catching configuration drift and validating properties that only exist after deployment.
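For instance, a stack validation policy can inspect values that only exist once the cloud provider has created the resource, such as an EC2 instance's public IP. A minimal sketch, reusing the import pulumi_policy as policy alias from the earlier examples and assuming the aws:ec2/instance:Instance type token with its publicIp output:

def no_public_ips_validation(args, report_violation):
    """Flag EC2 instances that received a public IP after deployment."""
    for resource in args.resources:
        if resource.resource_type == "aws:ec2/instance:Instance":
            # publicIp is only known once AWS has actually created the instance.
            if resource.props.get("publicIp"):
                report_violation(
                    f"Instance '{resource.name}' has a public IP assigned; "
                    "workloads should sit behind a load balancer instead."
                )

no_public_ips = policy.StackValidationPolicy(
    name="no-public-instance-ips",
    description="Detect instances exposed via public IP after deployment",
    validate=no_public_ips_validation,
    enforcement_level=policy.EnforcementLevel.ADVISORY,
)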
CI/CD Integration
The third model integrates policies directly into your deployment pipeline. This ensures consistent enforcement across all teams and creates natural deployment gates. For example, you might configure GitHub Actions to run policy validation on every pull request, blocking merges if violations are found. This approach combines the best of both worlds: early feedback during development and guaranteed enforcement before production.
# GitHub Actions example
- name: Run Policy Validation
  run: |
    pulumi preview --policy-pack ./policies
    if [ $? -ne 0 ]; then
      echo "Policy violations found. Deployment blocked."
      exit 1
    fi
Policy Remediation: Beyond Detection
Modern policy frameworks don’t just detect violations; they can automatically fix them:
def auto_tag_resources(args):
    """Automatically add required tags to resources."""
    if args.resource_type == "aws:s3/bucket:Bucket":
        tags = args.props.get("tags") or {}
        if "Department" not in tags:
            # Instead of just reporting, fix the issue
            tags["Department"] = "Engineering"
            return {**args.props, "tags": tags}  # Return modified properties
    return None  # No changes needed

auto_tag_policy = policy.ResourceValidationPolicy(
    name="auto-tag-resources",
    description="Automatically add required tags",
    remediate=auto_tag_resources,
    enforcement_level=policy.EnforcementLevel.REMEDIATE,
)
Server-Side Policy Enforcement
For enterprise deployments, Pulumi provides server-side policy enforcement that ensures policies can’t be bypassed. The process starts by publishing your policies to your Pulumi organization with pulumi policy publish ./my-policies. Once published, you can create policy groups that combine multiple policies with specific enforcement levels, targeting particular stacks or environments while configuring exceptions for special cases. The beauty of this approach is that policies run automatically without requiring CLI flags, providing consistent governance across your entire organization without relying on developers to remember to include policy packs in their commands.
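At the command line, that workflow looks roughly like the sketch below; the organization and pack names are placeholders, and policy groups can also be managed from the Pulumi Cloud console:

# Publish the policy pack to your Pulumi organization
pulumi policy publish ./my-policies

# Enable the latest version of the pack for the organization
# (org and pack names below are illustrative)
pulumi policy enable my-org/microservice-platform-guardrails latest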
Compliance-Ready Policies
While custom policies address your specific organizational needs, compliance requirements often follow industry standards. Pulumi provides hundreds of pre-built policies for common compliance frameworks:
import { PolicyPack } from "@pulumi/policy";
import { policyManager } from "@pulumi/compliance-ready-policies/policy-manager";

new PolicyPack("aws-compliance-ready-policies-typescript", {
    policies: [
        ...policyManager.selectPolicies({
            vendors: ["aws"],
            services: ["ec2", "s3"],
            severities: ["medium", "high", "critical"],
            topics: ["encryption"],
            frameworks: ["pcidss"],
        }, "mandatory"),
    ],
});
This automatically includes policies for major compliance frameworks like PCI DSS for payment card industry standards, SOC 2 for security and compliance controls, ISO 27001 for information security management, and CIS Benchmarks for security configuration standards.
Best Practices for Policy Implementation
After implementing policies at dozens of organizations, we’ve learned that successful policy adoption follows predictable patterns. The key is to adopt a product mindset: think of your policies as a product serving your development teams, not as bureaucratic rules imposed from above. Start small and iterate based on real feedback rather than trying to implement a comprehensive policy framework from day one.
Start Small and Iterate
Begin with just two or three critical policies that address your most pressing risks. Use advisory enforcement initially, which warns developers about violations but doesn’t block deployments. This gives your team time to understand and adapt to the policies. Only after gathering feedback and refining the policies should you graduate to mandatory enforcement.
Provide Clear, Actionable Error Messages
Nothing frustrates developers more than cryptic policy violations. Your error messages should explain not just what’s wrong, but how to fix it:
def good_error_message(args, report_violation):
    # `violation_detected` stands in for your policy's actual check
    if violation_detected:
        report_violation(
            "S3 bucket encryption is required for compliance. "
            "Add serverSideEncryptionConfiguration to fix this. "
            "See: https://docs.company.com/s3-encryption"
        )
Embrace Progressive Enforcement
Think of enforcement levels as a dial, not a switch. Start with advisory mode to warn about issues, move to mandatory to block deployments, and eventually implement remediation to automatically fix common issues. This progression gives teams time to adapt while gradually raising the security bar.
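One lightweight way to implement the dial is to define the enforcement level in a single place and reference it from your policies, so turning it up later is a one-line change. A minimal sketch, reusing the validator from Example 1; the POLICY_PHASE environment variable is purely a convention invented here, not a Pulumi feature:

import os
import pulumi_policy as policy

# Single knob for the pack: advisory while teams adapt, mandatory once
# POLICY_PHASE=enforce is set in CI (an invented convention for this sketch).
current_level = (
    policy.EnforcementLevel.MANDATORY
    if os.getenv("POLICY_PHASE") == "enforce"
    else policy.EnforcementLevel.ADVISORY
)

restrict_dangerous_ports = policy.ResourceValidationPolicy(
    name="restrict-dangerous-ports",
    description="Prevent services from using dangerous ports",
    validate=restrict_dangerous_ports_validation,
    enforcement_level=current_level,
)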
Test Your Policies Thoroughly
Policies are code, and like any code, they need testing. Write unit tests for your policies to ensure they catch violations correctly and don’t produce false positives:
# Test policies with unit tests by calling the validator directly
from types import SimpleNamespace

def test_dangerous_port_policy():
    # Stand-in for the real args object: only the fields the validator reads
    args = SimpleNamespace(resource_type="aws:lb/targetGroup:TargetGroup", props={"port": 22})
    violations = []
    restrict_dangerous_ports_validation(args, violations.append)
    assert len(violations) == 1
    assert "not allowed" in violations[0].lower()
Document Escape Hatches
No matter how well-designed your policies are, there will always be edge cases. Rather than forcing teams to work around policies in dangerous ways, provide clear, documented escape hatches. Create processes for requesting exceptions, handling emergency exemptions, and appealing for policy changes. As one platform engineer put it, you need “guardrails outside the guardrails” because platform adoption will never be 100%, and that’s okay. This transparency builds trust and ensures policies remain effective enablers rather than bureaucratic obstacles to circumvent.
Real-World Success: Statsig’s Transformation
Remember Statsig’s infrastructure crisis when Jason went on parental leave? Let’s look at how they transformed their platform using the guardrails approach we’ve been discussing. After implementing policy-driven guardrails, their infrastructure story completely changed.
As Tyrone Wong, Infrastructure Engineer at Statsig, explains: “We had a single infra owner. Parental leave forced us to build self-service, and that’s when everything clicked. Developers open a PR, Pulumi previews the change right in the PR, and CI blocks risky changes. That’s how you move fast safely. The magic was turning ‘talk to the infra person’ into ‘ship with guardrails.’”
The results speak for themselves. Deployment times dropped from 1.5 weeks to mere minutes. The infrastructure ticket queue that once haunted their Slack channels disappeared entirely. Security actually improved through automated policy enforcement, eliminating the human error factor. Most importantly, developer satisfaction soared as teams gained the autonomy to ship infrastructure changes without friction or fear.
When asked if developers were actually using the self-service platform, Tyrone’s response was telling: “Yeah, people are actually using it because there’s changes they wanted to get done. Previously it became this giant effort with lots of meetings, people being concerned, and Google docs about what the changes would be. Now people can just put together a PR, we have our discussions there, see if anything gets flagged, and move forward.”
Building Your Policy Strategy
Implementing deployment guardrails isn’t a big-bang transformation; it’s a journey that unfolds in phases. Based on patterns we’ve seen across successful implementations, the first week or two should focus on assessment. Start by understanding your current state, auditing existing infrastructure patterns to identify what teams are actually deploying. Look for common misconfigurations that have caused incidents or near-misses, and document your security and compliance requirements as specific, enforceable rules. Most importantly, survey your developers to understand their pain points with the current infrastructure process, as this assessment forms the foundation for policies that solve real problems rather than creating new ones.
In weeks three and four, build your foundation by implementing three to five core policies that address your most critical risks. Set up CI/CD integration so policies run automatically on every pull request, starting with advisory enforcement to gather feedback without blocking deployments. Create clear documentation and runbooks that explain not just what the policies do, but why they exist and how to work with them.
By the second month, you’re ready to expand. Add compliance-specific policies for regulatory requirements and implement server-side enforcement to ensure policies can’t be bypassed. Create formal processes for policy exemptions and exceptions, and begin measuring policy effectiveness through metrics like violation rates and remediation times.
Remember that policy implementation is never “done.” Continuously monitor violation patterns to identify areas where policies might be too strict or too lenient. Refine policies based on developer feedback and incident data, add automated remediation for common violations to reduce manual fixes, and gradually expand coverage to new services and teams using lessons learned from early adopters.
Measuring Policy Success
You can’t improve what you don’t measure, and policy effectiveness is no exception. Successful teams track metrics across three critical dimensions to validate their policy strategy.
From a security perspective, start by monitoring your policy violation rate: the percentage of deployments that trigger policy warnings or blocks. This tells you whether your policies are catching real issues or creating unnecessary friction. Track time to remediation to understand how quickly teams fix violations when they occur. Most importantly, measure the reduction in security incidents caused by misconfigurations. If your policies aren’t reducing incidents, they’re not addressing the right risks.
Security is only half the equation, though. The developer experience dimension is equally important. Monitor deployment velocity to ensure policies aren’t slowing teams down. Regular developer satisfaction surveys reveal whether teams see policies as helpful guardrails or annoying obstacles. Track support ticket volume; successful policy implementation should reduce infrastructure-related requests as teams become more self-sufficient.
Finally, don’t forget to measure the operational health of your policy system itself. Policy coverage shows what percentage of your resources are protected by policies, while the exemption rate reveals whether policies are practical or if teams constantly need exceptions. Track remediation automation to see how many violations are fixed automatically versus requiring manual intervention. Together, these metrics paint a complete picture of your policy program’s health and help you continuously improve your approach.
Common Pitfalls and How to Avoid Them
Even with the best intentions, policy implementations can go wrong. Through years of helping organizations implement policy as code, we’ve identified four critical pitfalls that can derail your efforts.
The first and most common trap is over-engineering policies. It’s tempting to create sophisticated policies that handle every edge case, but overly complex policies are hard to understand and maintain. Teams won’t trust what they don’t understand. Instead, start with simple, clear policies that address specific risks. Use descriptive naming and provide examples of both compliant and non-compliant configurations. Remember, a simple policy that everyone follows is better than a perfect policy that everyone circumvents.
Equally dangerous is falling into “policy theater”: implementing policies because you think you should, not because they address real risks. This creates friction without improving security. Every policy should trace back to an actual incident, security requirement, or compliance need. If you can’t explain why a policy exists with a concrete example, it probably shouldn’t exist.
Poor developer experience is another common failure point. Nothing kills policy adoption faster than cryptic error messages and unclear remediation steps. When a developer hits a policy violation at 5 PM on a Friday, they need clear guidance on how to fix it. Invest heavily in documentation, helpful error messages, and examples. Consider your policies’ error messages as part of your product’s user experience.
Finally, inadequate testing can erode trust quickly. Policies that break legitimate use cases will be worked around or disabled. Before enforcing any policy, test it thoroughly with real scenarios from your production workloads. Run policies in advisory mode first to catch false positives, and have teams attempt real deployments with policies enabled to ensure they don’t block valid configurations.
The Future of Policy as Code
As infrastructure becomes increasingly complex and distributed, policy as code is evolving beyond simple rule enforcement to become an intelligent layer that understands context and intent. Three major trends are shaping this evolution.
First, we’re seeing the emergence of AI-enhanced policies with smart detection systems that use machine learning to identify anomalies that rule-based policies might miss. These systems learn from your infrastructure patterns and can provide contextual recommendations that adapt based on actual usage. Even more exciting is predictive enforcement: imagine policies that can identify risky patterns before they become violations, guiding developers away from problems they haven’t encountered yet.
Second, the multi-cloud reality is driving the need for universal governance. Organizations need policies that work seamlessly across AWS, Azure, and GCP, with federated enforcement that maintains consistency across multiple cloud accounts and regions. Compliance automation is also maturing, with systems that automatically collect evidence for audits and generate compliance reports without manual intervention.
Finally, policy enforcement is moving closer to where developers actually work. IDE integration will soon provide real-time policy feedback as you write infrastructure code, catching issues before you even attempt to deploy. Self-service exemption workflows will let developers request and receive policy exceptions through automated approval processes. Perhaps most intriguingly, we’re seeing the development of learning policies: systems that improve and adapt based on developer feedback and usage patterns, becoming more helpful over time rather than more restrictive.
Conclusion: Enabling Safe Self-Service at Scale
We started this post with a fundamental tension in platform engineering: the need for both speed and safety. Through the lens of Statsig’s transformation and the technical deep-dive into Pulumi CrossGuard, we’ve seen that this isn’t actually a tension that needs resolving. It’s a false dichotomy that policy as code eliminates entirely.
The key insight from successful platform teams like Statsig is that guardrails don’t restrict freedom; they enable it. When developers know that automated policies will catch dangerous configurations, they gain the confidence to move fast and experiment. When platform teams know that policies automatically enforce security and compliance standards, they can focus on building better platforms instead of reviewing every change. This is the magic of policy as code: it transforms infrastructure governance from a bottleneck into an accelerator.
But perhaps the most important lesson is that policy as code isn’t about saying “no” to developers. It’s about intelligent automation that makes the secure path the path of least resistance. It’s about catching mistakes before they become incidents. It’s about encoding your organization’s hard-won knowledge into systems that help every developer benefit from that experience.
As you embark on your own journey to implement deployment guardrails, remember that perfection isn’t the goal; progress is. Start small, iterate based on feedback, and gradually expand your coverage. Your developers will thank you for the clarity and confidence that comes with well-designed guardrails, and your security team will sleep better knowing that policies are enforced automatically and consistently.
The path from manual reviews to automated guardrails is well-traveled and well-documented. Our complete policy examples provide real-world implementations you can adapt to your needs, while the CrossGuard documentation offers deep technical details for advanced use cases. If you’re on AWS, AWSGuard’s pre-built policies offer immediate value, and our compliance-ready policy catalog addresses specific regulatory requirements.
The future of infrastructure management isn’t about choosing between developer autonomy and operational control. It’s about using policy as code to achieve both, creating platforms that are simultaneously powerful and safe, flexible and compliant, fast and secure.
In our next post, we’ll explore Day 2 Platform Operations, diving into how to maintain infrastructure compliance after deployment and automatically remediate configuration drift. Because getting to production is just the beginning; keeping your infrastructure secure and compliant over time is where the real challenge lies.