When addressing a vast, monolithic account housing a blend of private and public assets, how can you rectify the absence of preventative design principles?

First we will look at how to organize different IAM roles, then go through every possible way of exposing resources to the Internet, and finally see how to stop each one.
Organizing IAM Roles
Think about your IAM roles as privilege rings in x86, where ring 0 is the most privileged and ring 3 the least.
In AWS, the rings map to SCPs and different types of IAM roles:
- Ring 0 is SCPs.
- Ring 1 can make public-facing resources.
- Ring 2 needs to e.g. create load balancers, but none should be Internet-facing.
- Ring 3 needs to e.g. create EC2 instances, but none should be Internet-facing.
With an SCP banning ec2:CreateInternetGateway, rings 1 through 3 cannot create Internet-facing assets no matter what, as ring 0 prevents them – but without this, we need to play IAM games resembling whack-a-mole.
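For accounts where such a ring 0 guardrail is viable, a minimal Terraform sketch of the SCP could look like the following. The policy name and the `target_id` variable (an OU or account ID) are assumptions for illustration, not something the setup above prescribes.

```hcl
# Hypothetical input: the OU or account ID the SCP should be attached to.
variable "target_id" {
  type        = string
  description = "Organizational unit or account ID to attach the SCP to (assumed)"
}

# Ring 0: deny creating Internet gateways, regardless of what IAM allows.
data "aws_iam_policy_document" "deny_igw" {
  statement {
    sid       = "DenyCreateInternetGateway"
    effect    = "Deny"
    actions   = ["ec2:CreateInternetGateway"]
    resources = ["*"]
  }
}

resource "aws_organizations_policy" "deny_igw" {
  name    = "deny-create-internet-gateway" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.deny_igw.json
}

resource "aws_organizations_policy_attachment" "deny_igw" {
  policy_id = aws_organizations_policy.deny_igw.id
  target_id = var.target_id
}
```

This has to be applied from the organization's management account (or a delegated administrator), and it does nothing about Internet gateways that already exist, which is why a monolithic account with existing public assets still needs the IAM approach described next.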

Ideally, a CI/CD system with a web interface would hold these IAM permissions, making it unnecessary to give engineers raw access to them. But the following sections are relevant either way.
Next we will dive deep into one use-case, then list all use-cases, then summarize how to solve them.
Supporting Ring 3 with RunInstances
For Ring 3, we should be able to allow the role to call ec2:RunInstances, under the condition that ec2:AssociatePublicIpAddress is false.
Except that is insufficient: if the given subnet has "map-public-ip-on-launch" set to true, the EC2 instance will still get a public IP.
So we need to allowlist the subnets that have that attribute set to false, by adding the condition that ec2:SubnetID equals an ID on the allowlist.
Maintaining such a list of opaque IDs would be tedious, so in Terraform we can use a data source to grab them:
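```hcl
# Subnets that do NOT auto-assign public IPs, i.e. the private ones.
data "aws_subnets" "private" {
  filter {
    name   = "map-public-ip-on-launch"
    values = ["false"] # filter values are strings
  }
}
```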
Except that is insufficient, as it does not read from every region.[1]
So, we need to write our own custom Terraform provider to iterate through all regions.
Finally, we can create an aws_iam_policy_document[2] similar to the following abbreviated[3] policy:
```hcl
# Data source from the custom provider described above: returns the IDs and
# ARNs of private subnets across all listed regions.
data "private_subnets" "subnets" {
  regions = var.regions_in_use
}

data "aws_iam_policy_document" "allow_private_subnets" {
  # Allow launching only into allowlisted (private) subnets.
  statement {
    sid = "AllowSubnetConds"
    actions = [
      "ec2:RunInstances",
    ]
    effect = "Allow"
    resources = [
      "arn:aws:ec2:*:*:subnet/*",
    ]
    condition {
      test     = "StringEquals"
      variable = "ec2:SubnetID"
      values   = data.private_subnets.subnets.ids
    }
  }

  # Allow network interfaces only without a public IP, and only in
  # allowlisted subnets.
  statement {
    sid = "AllowENIConds"
    actions = [
      "ec2:RunInstances",
    ]
    effect = "Allow"
    resources = [
      "arn:aws:ec2:*:*:network-interface/*",
    ]
    condition {
      test     = "Bool"
      variable = "ec2:AssociatePublicIpAddress"
      values = [
        "false",
      ]
    }
    condition {
      test     = "StringEquals"
      variable = "ec2:Subnet"
      values   = data.private_subnets.subnets.arns
    }
  }
}
```
Now, when someone in Ring 1 or 2 creates a new subnet, they can also apply this dynamic policy so Ring 3 can automatically launch instances in it.
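Applying the policy could be sketched as below. This assumes the `aws_iam_policy_document.allow_private_subnets` data source defined earlier and an existing Ring 3 role; the role name and policy name are hypothetical placeholders.

```hcl
# Hypothetical name of an existing Ring 3 role; adjust to your account.
variable "ring3_role_name" {
  type    = string
  default = "ring3-compute"
}

# Render the dynamic policy document into a managed policy.
resource "aws_iam_policy" "allow_private_subnets" {
  name   = "ring3-allow-private-subnets" # hypothetical name
  policy = data.aws_iam_policy_document.allow_private_subnets.json
}

# Attach it to the Ring 3 role so new private subnets become usable
# on the next apply.
resource "aws_iam_role_policy_attachment" "ring3_private_subnets" {
  role       = var.ring3_role_name
  policy_arn = aws_iam_policy.allow_private_subnets.arn
}
```

Because the subnet list is embedded in the policy, every new subnet needs another `terraform apply`, and a long allowlist may eventually run into IAM policy size limits.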
This is one mole we just whacked. There are many others. This is What Bad Looks Like.
How It Happens
Let’s see if we can list every possible mole that can pop up.
Creation of load balancer in public subnet:

Creation of instance in public subnet:

EC2 instance added to a target group of a load balancer

Lambda gets put as target
…TODO…
IP gets put as target
…TODO…
EIP gets associated with ENI
…TODO…
EIP gets associated with EC2
…TODO…
This is not much of an issue, but we can prevent it ‘for free’ anyway: creation of a public instance in a private subnet using ec2:AssociatePublicIpAddress

How To Stop It
Banning IAM Actions
- Cannot create Load Balancers
- Cannot create EIPs
- Cannot modify target groups
- Cannot associate EIPs
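As a rough sketch, the unconditional bans above could be expressed as a single Deny statement for a role such as Ring 3. The mapping from each bullet to specific IAM actions is an interpretation, not an exhaustive list, and the document name is hypothetical.

```hcl
data "aws_iam_policy_document" "deny_public_exposure" {
  statement {
    sid    = "DenyExposureActions"
    effect = "Deny"
    actions = [
      # Cannot create load balancers
      "elasticloadbalancing:CreateLoadBalancer",
      # Cannot create EIPs
      "ec2:AllocateAddress",
      # Cannot modify target groups
      "elasticloadbalancing:RegisterTargets",
      "elasticloadbalancing:ModifyTargetGroup",
      "elasticloadbalancing:ModifyTargetGroupAttributes",
      # Cannot associate EIPs
      "ec2:AssociateAddress",
    ]
    resources = ["*"]
  }
}
```

An explicit Deny wins over any Allow attached to the same role, so this can sit alongside broader permissions. It would not fit Ring 2, which still needs to create load balancers.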
Banning IAM Actions Conditionally
- Cannot touch public subnets
- Cannot ec2:RunInstances with …
- Cannot modify target groups of public load balancers
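Two of these conditional bans might be sketched as follows. The first statement uses the ec2:AssociatePublicIpAddress condition key discussed earlier. The second assumes public subnets carry a `Tier = public` tag, which is an invented convention for illustration only.

```hcl
data "aws_iam_policy_document" "deny_public_conditionally" {
  # Deny launches whose network interface requests a public IP.
  statement {
    sid       = "DenyPublicIpOnLaunch"
    effect    = "Deny"
    actions   = ["ec2:RunInstances"]
    resources = ["arn:aws:ec2:*:*:network-interface/*"]

    condition {
      test     = "Bool"
      variable = "ec2:AssociatePublicIpAddress"
      values   = ["true"]
    }
  }

  # Deny launches into subnets tagged as public.
  # ASSUMPTION: the "Tier" tag key and "public" value are hypothetical.
  statement {
    sid       = "DenyTaggedPublicSubnets"
    effect    = "Deny"
    actions   = ["ec2:RunInstances"]
    resources = ["arn:aws:ec2:*:*:subnet/*"]

    condition {
      test     = "StringEquals"
      variable = "aws:ResourceTag/Tier"
      values   = ["public"]
    }
  }
}
```

A tag-based deny only holds if the lower rings cannot create, change, or remove that tag on subnets, so tagging permissions need to be restricted as well.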
Footnotes
- Even if it did, "aws_subnets" is one of the better data sources. E.g. aws_lb and aws_lbs do not have filters, and the latter fails if there are none.
- If going with an allowlist approach, you also need statements that cover all the other resource types (image, instance, security-group, volume etc.)