[Rough Draft] Preventing Accidental Internet-Exposure of AWS Resources (Part 3: Tech Debt Mountain)

When you inherit a vast, monolithic account that houses a mix of private and public assets, how do you make up for the absence of preventative design principles?

First we will look at how to organize different IAM roles, then go through every possible way of exposing resources to the Internet, and finally cover how to stop each of those possibilities.

Organizing IAM Roles

Think about your IAM roles as privilege rings in x86, where a lower ring number means more privilege:

In AWS, rings 1 through 3 can be different types of IAM roles:

Ring 0 is SCPs
Ring 1 can make public-facing resources
Ring 2 needs to e.g. create load balancers, but none should be Internet-facing.
Ring 3 needs to e.g. create EC2 instances, but none should be Internet-facing.

With an SCP banning ec2:CreateInternetGateway, rings 1 through 3 cannot create Internet-facing assets no matter what, because ring 0 prevents them. Without that SCP, we need to play IAM games resembling whack-a-mole.
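As a minimal sketch of that ring 0 guardrail, assuming the account belongs to an AWS Organization and Terraform runs with permissions in the management account, the SCP might look like the following. The resource names and the target_account_id variable are hypothetical, not taken from any real setup.

```hcl
# Hypothetical input: the account (or OU) ID the SCP should apply to.
variable "target_account_id" {
  description = "Account or OU ID to attach the SCP to (assumption for this sketch)"
  type        = string
}

# Ring 0: deny creation of Internet gateways for every principal in the target.
data "aws_iam_policy_document" "deny_internet_gateways" {
  statement {
    sid       = "DenyCreateInternetGateway"
    effect    = "Deny"
    actions   = ["ec2:CreateInternetGateway"]
    resources = ["*"]
  }
}

resource "aws_organizations_policy" "deny_internet_gateways" {
  name    = "deny-internet-gateways"
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.deny_internet_gateways.json
}

resource "aws_organizations_policy_attachment" "deny_internet_gateways" {
  policy_id = aws_organizations_policy.deny_internet_gateways.id
  target_id = var.target_account_id
}
```

This only helps in accounts that do not already contain an Internet gateway, which is exactly why it is not an option for the monolithic account discussed here.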

Ideally you would have a web interface or a CI/CD system with access to these IAM permissions, so that engineers never need raw access to them. But the following sections are relevant either way.

Next we will dive deep into one use-case, then list all use-cases, then summarize how to solve them.

Supporting Ring 3 with RunInstances

For Ring 3, we should be able to allow the role to call ec2:RunInstances, under the condition that ec2:AssociatePublicIpAddress is false.

Except that is insufficient: if the subnet given has "map-public-ip-on-launch" set to true, the EC2 will get a public IP.

So we need to allowlist the subnets that have that attribute set to false, by adding the condition that ec2:SubnetID equals an ID on the allowlist.

Maintaining such a list of opaque IDs would be tedious, so in Terraform we can use a data source to grab them:

data "aws_subnets" "private" {
  # Subnets that do NOT auto-assign public IPs on launch
  filter {
    name   = "map-public-ip-on-launch"
    values = ["false"]
  }
}

Except that is insufficient, as the data source only reads from the single region its provider is configured for.¹

So, we need to write our own custom Terraform provider to iterate through all regions.
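For comparison, here is a rough sketch of what the stock AWS provider requires without a custom provider: one aliased provider block and one data source per region, all written out statically. The two regions and all names below are illustrative assumptions.

```hcl
# One aliased provider per region in use (regions here are only examples).
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu_west_1"
  region = "eu-west-1"
}

# The same data source has to be repeated for every provider alias.
data "aws_subnets" "private_us_east_1" {
  provider = aws.us_east_1

  filter {
    name   = "map-public-ip-on-launch"
    values = ["false"]
  }
}

data "aws_subnets" "private_eu_west_1" {
  provider = aws.eu_west_1

  filter {
    name   = "map-public-ip-on-launch"
    values = ["false"]
  }
}

locals {
  # Combined allowlist of private subnet IDs across the listed regions.
  private_subnet_ids = concat(
    data.aws_subnets.private_us_east_1.ids,
    data.aws_subnets.private_eu_west_1.ids,
  )
}
```

Every new region means another pair of blocks, which is the boilerplate a custom provider that loops over regions avoids.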

Finally, we can create an aws_iam_policy_document² similar to the following abbreviated³ policy:

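# Data source from the custom provider: returns the IDs and ARNs of
# private subnets across every region listed in var.regions_in_use.
data "private_subnets" "subnets" {
  regions = var.regions_in_use
}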

data "aws_iam_policy_document" "allow_private_subnets" {
  statement {
    sid = "AllowSubnetConds"

    actions = [
      "ec2:RunInstances",
    ]

    effect = "Allow"

    resources = [
      "arn:aws:ec2:*:*:subnet/*"
    ]

    condition {
      test     = "StringEquals"
      variable = "ec2:SubnetID"

      values = data.private_subnets.subnets.ids
    }
  }

  statement {
    sid = "AllowENIConds"

    actions = [
      "ec2:RunInstances",
    ]

    effect = "Allow"

    resources = [
      "arn:aws:ec2:*:*:network-interface/*",
    ]

    condition {
      test     = "Bool"
      variable = "ec2:AssociatePublicIpAddress"

      values = [
        "false",
      ]
    }

    condition {
      test     = "StringEquals"
      variable = "ec2:Subnet"

      values = data.private_subnets.subnets.arns
    }
  }
}

Now, when someone in Ring 1 or 2 creates a new subnet, they can also apply this dynamic policy so Ring 3 can automatically launch instances in it.
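To make "apply this dynamic policy" concrete, here is a minimal sketch that turns the policy document above into a managed policy and attaches it to a Ring 3 role. The policy name and the ring3_role_name variable are hypothetical.

```hcl
# Hypothetical input: name of the existing IAM role used by Ring 3.
variable "ring3_role_name" {
  description = "Name of the Ring 3 IAM role (assumption for this sketch)"
  type        = string
}

# Managed policy rendered from the allowlist document defined above.
resource "aws_iam_policy" "ring3_run_instances" {
  name   = "ring3-run-instances-private-subnets"
  policy = data.aws_iam_policy_document.allow_private_subnets.json
}

# Re-applying after a subnet is created refreshes the allowlist for Ring 3.
resource "aws_iam_role_policy_attachment" "ring3_run_instances" {
  role       = var.ring3_role_name
  policy_arn = aws_iam_policy.ring3_run_instances.arn
}
```

Because every subnet ID and ARN is written into the policy, an account with many subnets may run into the 6,144-character limit on managed policies.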

This is one mole we just whacked. There are many others. This is What Bad Looks Like.

How It Happens

Let’s see if we can list every possible mole that can pop up.

Creation of load balancer in public subnet

Creation of instance in public subnet

EC2 instance added to the target group of a load balancer

Lambda gets put as target

…TODO…

IP gets put as target

…TODO…

EIP gets associated with ENI

…TODO…

EIP gets associated with EC2

…TODO…

This is not much of an issue, but we can prevent it ‘for free’ anyway: Creation of public instance in private subnet using ec2:AssociatePublicIpAddress

How To Stop It

Banning IAM Actions

Cannot create Load Balancers
Cannot create EIPs
Cannot modify target groups
Cannot associate EIPs
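The four outright bans above could be sketched as a single Deny statement for the lower rings. Mapping each bullet to specific IAM actions is my assumption, and the list is probably not exhaustive.

```hcl
data "aws_iam_policy_document" "deny_exposure_actions" {
  statement {
    sid    = "DenyExposureActions"
    effect = "Deny"

    actions = [
      # Cannot create load balancers
      "elasticloadbalancing:CreateLoadBalancer",
      # Cannot create EIPs
      "ec2:AllocateAddress",
      # Cannot modify target groups or their registered targets
      "elasticloadbalancing:ModifyTargetGroup",
      "elasticloadbalancing:ModifyTargetGroupAttributes",
      "elasticloadbalancing:RegisterTargets",
      "elasticloadbalancing:DeregisterTargets",
      # Cannot associate EIPs
      "ec2:AssociateAddress",
    ]

    resources = ["*"]
  }
}
```

A role that legitimately needs one of these actions, such as Ring 2 creating internal load balancers, would have to use the conditional bans instead.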

Banning IAM Actions Conditionally

Cannot touch public subnets
Cannot ec2:RunInstances with
Cannot modify target groups of public load balancers
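Two of these conditional bans can be sketched as Deny statements, written here as a denylist counterpart to the allowlist policy from the Ring 3 section. This assumes the RunInstances bullet refers to requests that set ec2:AssociatePublicIpAddress to true. The public_subnet_ids variable is hypothetical, and the load balancer bullet is not covered.

```hcl
# Hypothetical input: IDs of the subnets considered public.
variable "public_subnet_ids" {
  description = "IDs of public subnets to keep lower rings out of (assumption)"
  type        = list(string)
}

data "aws_iam_policy_document" "deny_public_launches" {
  # Deny launches that explicitly request a public IP on the network interface.
  statement {
    sid       = "DenyPublicIpOnLaunch"
    effect    = "Deny"
    actions   = ["ec2:RunInstances"]
    resources = ["arn:aws:ec2:*:*:network-interface/*"]

    condition {
      test     = "Bool"
      variable = "ec2:AssociatePublicIpAddress"
      values   = ["true"]
    }
  }

  # Deny launches into any subnet on the public denylist.
  statement {
    sid       = "DenyPublicSubnets"
    effect    = "Deny"
    actions   = ["ec2:RunInstances"]
    resources = ["arn:aws:ec2:*:*:subnet/*"]

    condition {
      test     = "StringEquals"
      variable = "ec2:SubnetID"
      values   = var.public_subnet_ids
    }
  }
}
```

A denylist like this fails open: a newly created public subnet is usable until someone adds it to the list, whereas the allowlist approach fails closed.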

Footnotes

¹ Even if it did, "aws_subnets" is one of the better data sources. E.g. aws_lb and aws_lbs do not have filters, and the latter fails if there are none.
² See this gist for a denylist example.
³ If going with an allowlist approach, you also need statements that cover all the other resource types (image, instance, security-group, volume, etc.).