[Rough Draft] Preventing Accidental Internet-Exposure of AWS Resources (Part 2: Handling All Services)

Private VPC accounts are a great first step, but they are only a partial solution: many AWS services do not require a VPC to make a resource publicly reachable over the network.

There are hundreds of AWS services – nobody knows them all – so what do you do?

Handling All Those Services

Each AWS service has its own nuances and possible security problems. The primary way to mitigate this risk is to implement an allowlist strategy for services, which limits the number of services you need to know in depth.

Once you have this allowlist, you can cross-reference each service with the Mitigations Per Service matrix below, and ensure you have the proper mitigations in place.

Implementing the Allowlist

When a customer requests an AWS account¹, the request form should ask them a set of questions, including “Which AWS services and regions will you be using?”

This can be fed into whatever account creation automation you have. As a concrete example, let’s say the answers end up translated into Terraform local variables:

locals {
  services = ["dynamodb", "ec2", "lambda", "s3"]
  regions  = ["us-east-2"]
}

These are then passed into a configuration module, an SCP module, or both, for the subaccount, in order to establish a security baseline.
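As a rough sketch of what the SCP side might look like, the allowlist can be turned into two Deny statements: one for every action outside the listed services, and one for every request outside the listed regions. The data source label, resource label, and policy name below are hypothetical, and a real baseline would likely need to exempt whichever global or supporting services (IAM and STS, for example) your accounts depend on.

```hcl
# Hypothetical sketch: turn the allowlist locals into a Service Control Policy.
locals {
  services = ["dynamodb", "ec2", "lambda", "s3"]
  regions  = ["us-east-2"]
}

data "aws_iam_policy_document" "allowlist" {
  # Deny every action that does not belong to an allowlisted service.
  statement {
    sid         = "DenyServicesNotOnAllowlist"
    effect      = "Deny"
    not_actions = [for service in local.services : "${service}:*"]
    resources   = ["*"]
  }

  # Deny every request made to a region that is not allowlisted.
  statement {
    sid       = "DenyRegionsNotOnAllowlist"
    effect    = "Deny"
    actions   = ["*"]
    resources = ["*"]

    condition {
      test     = "StringNotEquals"
      variable = "aws:RequestedRegion"
      values   = local.regions
    }
  }
}

resource "aws_organizations_policy" "allowlist" {
  name    = "service-and-region-allowlist" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.allowlist.json
}
```

The policy still has to be attached to the subaccount (for example with aws_organizations_policy_attachment). Since SCPs only restrict and never grant, the subaccount roles still need their own IAM permissions.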

Service-Specific Mitigations

Rather than simply giving the subaccount admin IAM role "lambda:*" and calling it a day, you should go further and ask service-specific questions:

“Will all Lambda functions be deployed in one of our VPCs?”
“Will all Lambda functions require AWS_IAM authentication?”
“Can you use our [company-specific] Terraform module for creating these Lambda functions?”

If for some reason the customer needs publicly accessible Lambda function URLs, the answers can end up translated to:

module "project_x_account" {
  source = "../../modules/subaccounts/scps"

  services = local.services
  regions = local.regions

  deny_auth_type_not_iam_lambda = false
}

The deny_auth_type_not_iam_lambda argument turns off a mitigation that is on by default in the module:

    {
      include = var.deny_auth_type_not_iam_lambda,
      effect = "Deny"
      actions = [
        "lambda:CreateFunctionUrlConfig",
        "lambda:UpdateFunctionUrlConfig",
      ]
      resources = ["arn:aws:lambda:*:*:function:*"]
      conditions = [
        {
          test     = "StringNotEquals"
          variable = "lambda:FunctionUrlAuthType"

          values = [
            "AWS_IAM",
          ]
        },
      ]
    },
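One way a module could honor the include flag is to filter the candidate statements before rendering the policy document. This is only a sketch under assumptions: the local names, the data source label, and the exact object shape are hypothetical and inferred from the excerpt above, not taken from the actual module.

```hcl
variable "deny_auth_type_not_iam_lambda" {
  description = "Deny Lambda function URLs whose auth type is not AWS_IAM."
  type        = bool
  default     = true # the mitigation is on unless a caller opts out
}

locals {
  # Every candidate statement carries an include flag (assumed shape).
  candidate_statements = [
    {
      include   = var.deny_auth_type_not_iam_lambda
      effect    = "Deny"
      actions   = ["lambda:CreateFunctionUrlConfig", "lambda:UpdateFunctionUrlConfig"]
      resources = ["arn:aws:lambda:*:*:function:*"]
      conditions = [
        {
          test     = "StringNotEquals"
          variable = "lambda:FunctionUrlAuthType"
          values   = ["AWS_IAM"]
        },
      ]
    },
  ]

  # Keep only the statements whose flag is true.
  enabled_statements = [for s in local.candidate_statements : s if s.include]
}

data "aws_iam_policy_document" "scp" {
  dynamic "statement" {
    for_each = local.enabled_statements
    content {
      effect    = statement.value.effect
      actions   = statement.value.actions
      resources = statement.value.resources

      dynamic "condition" {
        for_each = statement.value.conditions
        content {
          test     = condition.value.test
          variable = condition.value.variable
          values   = condition.value.values
        }
      }
    }
  }
}
```

Defaulting the variable to true keeps the account secure by default: a caller has to set the flag to false explicitly to drop the Deny statement.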

Mitigation Types

For preventing public network access, there are 4 possible account-wide mitigation types per service:

Don’t have public subnets
Don’t have an Internet Gateway (IGW)
Use a condition key
Use a condition key in combination with resource type limits

It is important to note we are at the mercy of AWS as to which mitigations are available/applicable for each specific service.

Ideally, every service with resources created outside of a VPC would have a condition key (similar to lambda:FunctionUrlAuthType above) that we could use in a Deny statement, which would give us solid security invariants. Unfortunately, this is not the case for most services.

No Public Subnets

As long as there are no public subnets in an account, the service cannot have Internet-facing resources.

ELB v1 / v2 is an example.

No IGWs

As long as there are no IGWs in an account, the service cannot have Internet-facing resources.

Global Accelerator is an example.
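A minimal sketch of enforcing "no IGWs" with an SCP follows. It assumes the account starts out without any Internet Gateway (for example, default VPCs have already been removed). The labels and the policy name are hypothetical.

```hcl
# Hypothetical sketch: keep an account that has no IGW from ever getting one.
data "aws_iam_policy_document" "deny_igw" {
  statement {
    sid    = "DenyInternetGateways"
    effect = "Deny"
    actions = [
      "ec2:CreateInternetGateway",
      "ec2:AttachInternetGateway",
    ]
    resources = ["*"]
  }
}

resource "aws_organizations_policy" "deny_igw" {
  name    = "deny-internet-gateways" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.deny_igw.json
}
```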

Condition Key

The service exposes a specific IAM condition key that identifies the public configuration, so requests that would set it can be denied.

Lambda is an example.

Condition Key + Resource Type Limits

A condition key must be used, and the types of resources to which that condition key is applied must also be limited.

API Gateway is an example.

No-Mitigations-Available Services

These are services where none of the 4 mitigation types above reliably prevents public network access.

EKS is an example.

The Solution for Services with No Mitigations

If your customers need eks:CreateCluster, then you need to rely on a trusted IAM role.

(As well as alerting, but that is reactive.)
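One possible reading of "rely on a trusted IAM role" is an SCP that denies eks:CreateCluster to every principal except a single provisioning role. The sketch below assumes that reading; the role name trusted-eks-provisioner and the labels are hypothetical.

```hcl
# Hypothetical sketch: only one trusted role may create EKS clusters.
data "aws_iam_policy_document" "eks_create_cluster" {
  statement {
    sid       = "DenyCreateClusterExceptTrustedRole"
    effect    = "Deny"
    actions   = ["eks:CreateCluster"]
    resources = ["*"]

    # The Deny applies to every principal whose ARN does not match the trusted role.
    condition {
      test     = "ArnNotLike"
      variable = "aws:PrincipalArn"
      values   = ["arn:aws:iam::*:role/trusted-eks-provisioner"] # hypothetical role name
    }
  }
}

resource "aws_organizations_policy" "eks_create_cluster" {
  name    = "eks-create-cluster-trusted-role-only" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.eks_create_cluster.json
}
```

This moves the risk into whoever can assume the trusted role, so the role's trust policy deserves the same scrutiny.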

Mitigations Per Service

Service	No Public Subnets	No IGW	Condition Key	Condition Key + Resource Type Limits	Need To Ban
API Gateway	N/A	N/A	Partial	Yes	No
Athena	N/A	N/A	N/A	N/A	No
CloudFront	?	?	?	?	?
DynamoDB	N/A	N/A	N/A	N/A	No
ECS	?	?	?	?	?
EC2	Yes-ish	Yes	Partial	N/A	No
EKS	Partial	Partial	No	N/A	Yes
ElastiCache	N/A	N/A	N/A	N/A	No
ELB v1 / v2	Yes	Yes	No	N/A	No
EMR	?	?	?	?	?
Global Accelerator	No	Yes	No	No	No
Lambda	N/A	N/A	Yes	N/A	No
Lightsail	?	?	?	?	?
Neptune	?	?	?	?	?
RDS	Yes-ish	Yes	No	N/A	No
Redshift	Yes-ish	Yes	No	N/A	No
S3	N/A	N/A	N/A	N/A	No
SNS	N/A	N/A	N/A	N/A	No
SQS	N/A	N/A	N/A	N/A	No

TBD:

CloudFront
ECS
EMR
Lightsail
Neptune

API Gateway

Summary: Either ban the whole service or have a limited allow solely for private REST APIs.

Only REST APIs can be private; HTTP and WebSocket APIs cannot.

So you have to limit the allowed resources to REST APIs, and ensure "apigateway:Request/EndpointType" is PRIVATE:

"Resource": [
"arn:aws:apigateway:us-east-1::/restapis",
"arn:aws:apigateway:us-east-1::/restapis/??????????"
],
...
"ForAllValues:StringEqualsIfExists": {
  "apigateway:Request/EndpointType": "PRIVATE",
  "apigateway:Resource/EndpointType": "PRIVATE"
}

Note: This takes the cake for the weirdest and most error-prone IAM example I have ever seen. If you can turn this into a Deny statement somehow, I’ll give you a prize.

CloudFront

???

ECS

???

ELB v1 / v2

Load balancers with the scheme “internet-facing” can only exist in public subnets; this is enforced at creation time.

TODO: What would happen if I changed the route table?

EKS

Summary: Ban the eks:CreateCluster action

eks:CreateCluster creates a public Kubernetes API endpoint in another AWS account you do not own.

Although, by default, the API requires an authorized token to perform sensitive actions², it can still be reached from the Internet and does not need an IGW.

Sidenote About the EKS Cluster Role

Before creating a cluster, you must have a cluster IAM role with the AmazonEKSClusterPolicy AWS managed policy attached.

This is an overpermissive one-size-fits-all policy.

You can use a combination of NotAction and Deny to limit your EKS cluster role to what is actually needed.

One of the more dangerous permissions is CreateLoadBalancer.

If you already are limited to private subnets or have no IGWs, then it cannot create public-facing load balancers. As an alternative however, in a pinch you can bring your own load balancer³ to EKS, rather than have the AWS Load Balancer Controller create them.

This will enable you to limit operators of the EKS cluster to Ring 3 style access.
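As a sketch of the Deny half of that idea, an inline policy on the cluster role can explicitly deny CreateLoadBalancer, since an explicit Deny overrides the Allow granted by the managed policy. This is not the full NotAction approach: working out the complete list of actions the role actually needs is left open here. The variable and resource labels are hypothetical.

```hcl
variable "eks_cluster_role_name" {
  description = "Name of the existing EKS cluster IAM role (hypothetical input)."
  type        = string
}

data "aws_iam_policy_document" "deny_create_load_balancer" {
  # An explicit Deny wins over the Allow in the attached managed policy.
  statement {
    sid       = "DenyCreateLoadBalancer"
    effect    = "Deny"
    actions   = ["elasticloadbalancing:CreateLoadBalancer"]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "deny_create_load_balancer" {
  name   = "deny-create-load-balancer" # hypothetical name
  role   = var.eks_cluster_role_name
  policy = data.aws_iam_policy_document.deny_create_load_balancer.json
}
```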

ElastiCache

All ElastiCache instances are private and designed to be used internally to your VPC, so unless you do something like use an EC2 instance as a NAT instance, there is no concern.

EMR

???

Global Accelerator

For Global Accelerator, you still need a symbolic IGW in the VPC.

Lambda

Summary: There is a FunctionUrlAuthType condition key.

Block

"StringEquals": {
  "lambda:FunctionUrlAuthType": "NONE"
}

or, more strictly,

"StringNotEquals": {
    "lambda:FunctionUrlAuthType": "AWS_IAM"
}

Otherwise, “the function URL endpoint will be public unless you implement your own authorization logic in your function.”

Note: Requiring that a Lambda function live in a customer-owned VPC only affects egress, not ingress, so it is irrelevant here.

Lightsail

???

Neptune

???

RDS

Summary: No IGW, no problem.

Same as EC2.

Redshift

Summary: No IGW, no problem.

The ElasticIp argument in both CreateCluster and ModifyCluster says, “The cluster must be provisioned in EC2-VPC [as opposed to EC2-Classic, I assume] and publicly-accessible through an Internet gateway.”

Based on this, I am concluding that having no IGW (or no public subnets) would make a Redshift cluster unreachable from the Internet. I have not tested this, however, since it would be too expensive. EC2 does not require an instance to be in a public subnet in order to assign an EIP to it, so it would be odd for Redshift to require it; however, the RDS documentation says something similar.

As further evidence, the instructions for making a private cluster public state: “Note: An Elastic IP address is required. If you do not choose one, an address will be randomly assigned to you.”

Only the ModifyCluster documentation states: “Only clusters in VPCs can be set to be publicly available.” The CreateCluster documentation does not state this.

No condition keys exist for e.g. Encrypted or ElasticIP or PubliclyAccessible.

Footnotes

1. For sandbox accounts, this may not be realistic, so you may need to manually maintain a deny list of dangerous services. Granted, you should be seamlessly re-creating sandbox accounts (or, less ideally, nuking them) regularly and only have public data in them.
2. By default only the system:public-info-viewer cluster role provides access to a set of endpoints for the system:unauthenticated group. These endpoints (e.g. /healthz, /livez, /readyz, and /version) are used by Network Load Balancers to perform health checks.
3. TODO: Find the link about this.