While having Private VPC accounts is a great first step, it is only a partial solution, since many AWS services do not require a VPC to make a resource publicly reachable over the network.
There are hundreds of AWS services – nobody knows them all – so what do you do?
Handling All Those Services
Each AWS service has its own nuances and possible security problems. The primary way to mitigate this risk is to implement an allowlist strategy for services, which limits the number of services you need to know in depth.
Once you have this allowlist, you can cross-reference each service with the Mitigations Per Service matrix below, and ensure you have the proper mitigations in place.
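One way to keep the allowlist honest is to have the automation reject any service that has not yet been reviewed against the matrix. The following is a minimal Terraform sketch, not a definitive implementation. The `services` input variable is hypothetical, and the reviewed list is an assumption: it holds the IAM service prefixes for the matrix rows currently marked "No" under Need To Ban.

```hcl
variable "services" {
  description = "IAM service prefixes the customer asked for (hypothetical input)."
  type        = list(string)

  validation {
    # setsubtract() leaves only the requested services that are NOT in the
    # reviewed list, so an empty result means every service was reviewed.
    condition = length(setsubtract(var.services, [
      "apigateway",
      "athena",
      "dynamodb",
      "ec2",
      "elasticache",
      "elasticloadbalancing",
      "globalaccelerator",
      "lambda",
      "rds",
      "redshift",
      "s3",
      "sns",
      "sqs",
    ])) == 0

    error_message = "At least one requested service has not been reviewed for public network access."
  }
}
```

With this in place, a request for an unreviewed service such as EMR fails at plan time instead of silently widening the account's permissions.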
Implementing the Allowlist
When a customer requests an AWS account[1], the request form should ask them a set of questions, including “Which AWS services and regions will you be using?”
The answers can be fed into whatever account creation automation you have. As a concrete example, let’s say they end up translated into Terraform local values:
locals {
  services = ["dynamodb", "ec2", "lambda", "s3"]
  regions  = ["us-east-2"]
}
These values are then passed into a configuration module, an SCP module, or both, for the subaccount, in order to establish a security baseline.
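To make the hand-off concrete, here is a rough sketch of what an SCP module might do with those two lists. It is not the author's actual module. The resource names and statement IDs are made up for illustration, and the locals are repeated so the block is self-contained.

```hcl
locals {
  services = ["dynamodb", "ec2", "lambda", "s3"]
  regions  = ["us-east-2"]

  # "dynamodb" becomes "dynamodb:*", and so on.
  allowed_actions = [for s in local.services : "${s}:*"]
}

data "aws_iam_policy_document" "allowlist" {
  # Deny every action that does not belong to an allowlisted service.
  statement {
    sid         = "DenyServicesNotOnAllowlist"
    effect      = "Deny"
    not_actions = local.allowed_actions
    resources   = ["*"]
  }

  # Deny the allowlisted services outside the approved regions.
  statement {
    sid       = "DenyRegionsNotOnAllowlist"
    effect    = "Deny"
    actions   = local.allowed_actions
    resources = ["*"]

    condition {
      test     = "StringNotEquals"
      variable = "aws:RequestedRegion"
      values   = local.regions
    }
  }
}

resource "aws_organizations_policy" "allowlist" {
  name    = "service-and-region-allowlist" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.allowlist.json
}
```

A real baseline would likely need exemptions, since global services and some global actions do not behave well under a strict region condition, and the account's own automation usually needs services such as IAM and STS.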
Service-Specific Mitigations
Rather than simply giving the subaccount admin IAM role "lambda:*" and calling it a day, you should go further and ask service-specific questions:
- “Will all Lambda functions be deployed in one of our VPCs?”
- “Will all Lambda functions require AWS_IAM authentication?”
- “Can you use our [company-specific] Terraform module for creating these Lambda functions?”
If for some reason the customer needs publicly accessible Lambda function URLs, the answers can end up translated to:
module "project_x_account" {
  source   = "../../modules/subaccounts/scps"
  services = local.services
  regions  = local.regions

  deny_auth_type_not_iam_lambda = false
}
Setting the deny_auth_type_not_iam_lambda argument to false turns off a mitigation that is on by default in the module:
{
  include = var.deny_auth_type_not_iam_lambda
  effect  = "Deny"
  actions = [
    "lambda:CreateFunctionUrlConfig",
    "lambda:UpdateFunctionUrlConfig",
  ]
  # Lambda function ARNs use a colon, not a slash, before the function name.
  resources = ["arn:aws:lambda:*:*:function:*"]
  conditions = [
    {
      test     = "StringNotEquals"
      variable = "lambda:FunctionUrlAuthType"
      values = [
        "AWS_IAM",
      ]
    },
  ]
},
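The include flag implies that the module filters its candidate statements before rendering the policy. The module's internals are not shown here, so the following is only a guess at how that could work, using a filtered list and dynamic blocks. The local and data source names are hypothetical.

```hcl
variable "deny_auth_type_not_iam_lambda" {
  description = "Deny Lambda function URLs whose auth type is not AWS_IAM."
  type        = bool
  default     = true
}

locals {
  candidate_statements = [
    {
      include   = var.deny_auth_type_not_iam_lambda
      effect    = "Deny"
      actions   = ["lambda:CreateFunctionUrlConfig", "lambda:UpdateFunctionUrlConfig"]
      resources = ["arn:aws:lambda:*:*:function:*"]
      conditions = [
        {
          test     = "StringNotEquals"
          variable = "lambda:FunctionUrlAuthType"
          values   = ["AWS_IAM"]
        },
      ]
    },
  ]

  # Keep only the statements whose toggle is switched on.
  statements = [for s in local.candidate_statements : s if s.include]
}

data "aws_iam_policy_document" "scp" {
  dynamic "statement" {
    for_each = local.statements
    content {
      effect    = statement.value.effect
      actions   = statement.value.actions
      resources = statement.value.resources

      dynamic "condition" {
        for_each = statement.value.conditions
        content {
          test     = condition.value.test
          variable = condition.value.variable
          values   = condition.value.values
        }
      }
    }
  }
}
```

Defaulting the toggle to true keeps the mitigation on unless a customer explicitly opts out. A real module would carry many more candidate statements, so the rendered policy is never empty.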
Mitigation Types
For preventing public network access, there are 4 possible account-wide mitigation types per service:
- Don’t have public subnets
- Don’t have an Internet Gateway (IGW)
- Use a condition key
- Use a condition key in combination with resource type limits
It is important to note we are at the mercy of AWS as to which mitigations are available/applicable for each specific service.
Ideally, every service with resources created outside of a VPC would have a condition key (similar to lambda:FunctionUrlAuthType above) we could block and then we would have solid security invariants. Unfortunately, this is not the case for most services.
No Public Subnets
As long as there are no public subnets in an account, the service cannot have Internet-facing resources.
ELB v1 / v2 is an example.
No IGWs
As long as there are no IGWs in an account, the service cannot have Internet-facing resources.
Global Accelerator is an example.
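The no-IGW invariant can be held in place with an SCP. The sketch below is a minimal example, not a complete policy. It assumes the account starts with no IGWs, since an SCP only blocks new API calls and does not remove existing gateways. The data source name is hypothetical.

```hcl
data "aws_iam_policy_document" "no_igw" {
  statement {
    sid    = "DenyInternetGateways"
    effect = "Deny"
    actions = [
      "ec2:CreateInternetGateway",
      "ec2:AttachInternetGateway",
      # Creating a default VPC also creates and attaches an IGW.
      "ec2:CreateDefaultVpc",
    ]
    resources = ["*"]
  }
}
```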
Condition Key
The service exposes a specific IAM condition key, so requests with an undesired value for that key can be denied.
Lambda is an example.
Condition Key + Resource Type Limits
A condition key must be used, and the statement must also be limited to the resource types to which that condition key applies.
API Gateway is an example.
No-Mitigations-Available Services
These are services where the 4 above mitigation types are completely useless for preventing public network access.
EKS is an example.
The Solution for Services with No Mitigations
If your customers need eks:CreateCluster, then you need to rely on a trusted IAM role.
(As well as alerting, but that is reactive.)
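A common way to express "only a trusted role" in an SCP is a Deny with an exception on aws:PrincipalArn. This is a sketch under assumptions: the role name trusted-eks-provisioner is made up, and your own naming and path conventions will differ.

```hcl
data "aws_iam_policy_document" "eks_create_cluster" {
  statement {
    sid       = "DenyEksCreateClusterExceptTrustedRole"
    effect    = "Deny"
    actions   = ["eks:CreateCluster"]
    resources = ["*"]

    # The deny applies to every principal except the trusted role.
    condition {
      test     = "ArnNotLike"
      variable = "aws:PrincipalArn"
      values   = ["arn:aws:iam::*:role/trusted-eks-provisioner"] # hypothetical role
    }
  }
}
```

This only narrows who can create a cluster. It does nothing about the public endpoint itself, which is why the trusted role and alerting still matter.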
Mitigations Per Service
| Service | No Public Subnets | No IGW | Condition Key | Condition Key + Resource Type Limits | Need To Ban |
|---|---|---|---|---|---|
| API Gateway | N/A | N/A | Partial | Yes | No |
| Athena | N/A | N/A | N/A | N/A | No |
| CloudFront | ? | ? | ? | ? | ? |
| DynamoDB | N/A | N/A | N/A | N/A | No |
| ECS | ? | ? | ? | ? | ? |
| EC2 | Yes-ish | Yes | Partial | N/A | No |
| EKS | Partial | Partial | No | N/A | Yes |
| ElastiCache | N/A | N/A | N/A | N/A | No |
| ELB v1 / v2 | Yes | Yes | No | N/A | No |
| EMR | ? | ? | ? | ? | ? |
| Global Accelerator | No | Yes | No | No | No |
| Lambda | N/A | N/A | Yes | N/A | No |
| Lightsail | ? | ? | ? | ? | ? |
| Neptune | ? | ? | ? | ? | ? |
| RDS | Yes-ish | Yes | No | N/A | No |
| Redshift | Yes-ish | Yes | No | N/A | No |
| S3 | N/A | N/A | N/A | N/A | No |
| SNS | N/A | N/A | N/A | N/A | No |
| SQS | N/A | N/A | N/A | N/A | No |
TBD:
- CloudFront
- ECS
- EMR
- Lightsail
- Neptune
API Gateway
Summary: Either ban the whole service or have a limited allow solely for private REST APIs.
Only REST APIs can be private; HTTP and WebSocket APIs cannot be.
So you have to limit the allowed resources to REST APIs, and ensure "apigateway:Request/EndpointType" is PRIVATE:
"Resource": [
  "arn:aws:apigateway:us-east-1::/restapis",
  "arn:aws:apigateway:us-east-1::/restapis/??????????"
],
...
"ForAllValues:StringEqualsIfExists": {
  "apigateway:Request/EndpointType": "PRIVATE",
  "apigateway:Resource/EndpointType": "PRIVATE"
}
Note: This takes the cake for the weirdest and most error-prone IAM example I have ever seen. If you can turn this into a Deny statement somehow, I’ll give you a prize.
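For context, the fragment above could sit inside a full Allow statement along these lines. The action list is an assumption on my part: the write verbs I would expect to need for creating and updating REST APIs. The data source name and sid are hypothetical, so treat this as a starting point to test rather than a vetted policy.

```hcl
data "aws_iam_policy_document" "private_rest_apis_only" {
  statement {
    sid    = "AllowPrivateRestApisOnly"
    effect = "Allow"

    # Assumed set of write verbs; adjust to what your users actually need.
    actions = [
      "apigateway:PATCH",
      "apigateway:POST",
      "apigateway:PUT",
    ]

    # Ten "?" wildcards match a single REST API ID and nothing deeper.
    resources = [
      "arn:aws:apigateway:us-east-1::/restapis",
      "arn:aws:apigateway:us-east-1::/restapis/??????????",
    ]

    condition {
      test     = "ForAllValues:StringEqualsIfExists"
      variable = "apigateway:Request/EndpointType"
      values   = ["PRIVATE"]
    }

    condition {
      test     = "ForAllValues:StringEqualsIfExists"
      variable = "apigateway:Resource/EndpointType"
      values   = ["PRIVATE"]
    }
  }
}
```

Because this is an Allow, it only helps if nothing else grants broader apigateway permissions to the same principal.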
CloudFront
???
ECS
???
ELB v1 / v2
Load balancers with the scheme “internet-facing” can only exist in public subnets; this is enforced at creation time.
TODO: What would happen if I changed the route table?
EKS
Summary: Ban the eks:CreateCluster action
eks:CreateCluster creates a public Kubernetes API endpoint in an AWS account you do not own.
Although, by default, the API requires an authorized token to perform sensitive actions[2], it can still be reached from the Internet and does not need an IGW.
Sidenote About the EKS Cluster Role
Before creating a cluster, you must have a cluster IAM role with the AmazonEKSClusterPolicy AWS managed policy attached.
This is an overpermissive one-size-fits-all policy.
You can use a combination of NotAction and Deny to limit your EKS cluster role to what is actually needed.
One of the more dangerous permissions is CreateLoadBalancer.
If you are already limited to private subnets or have no IGWs, then it cannot create public-facing load balancers. As an alternative, in a pinch you can bring your own load balancer[3] to EKS, rather than have the AWS Load Balancer Controller create them.
This will enable you to limit operators of the EKS cluster to Ring 3 style access.
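As one illustration of trimming the cluster role, an explicit Deny attached to the role overrides the Allow that comes from the managed policy. This sketch uses a plain Deny rather than the NotAction combination mentioned above, and the role name eks-cluster-role is hypothetical.

```hcl
resource "aws_iam_role_policy" "eks_cluster_no_load_balancers" {
  name = "deny-create-load-balancer"
  role = "eks-cluster-role" # hypothetical: the name of your EKS cluster role

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "DenyCreateLoadBalancer"
        Effect   = "Deny"
        Action   = "elasticloadbalancing:CreateLoadBalancer"
        Resource = "*"
      },
    ]
  })
}
```

This pairs with the bring-your-own load balancer approach: the cluster can still register targets with a load balancer you created, but it cannot create new ones.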
ElastiCache
All ElastiCache instances are private and designed to be used internally to your VPC, so unless you do something like using an EC2 instance as a NAT instance, there is no concern.
EMR
???
Global Accelerator
For Global Accelerator, you still need an IGW attached to the VPC, even though it is only symbolic.
Lambda
Summary: There is a FunctionUrlAuthType condition key.
Block
"StringEquals": {
  "lambda:FunctionUrlAuthType": "NONE"
}
or, more strictly,
"StringNotEquals": {
  "lambda:FunctionUrlAuthType": "AWS_IAM"
}
This prevents function URLs with the auth type NONE, for which “the function URL endpoint will be public unless you implement your own authorization logic in your function.”
Note: Requiring that a Lambda function live in a customer-owned VPC only affects egress, not ingress, so it is irrelevant here.
Lightsail
???
Neptune
???
RDS
Summary: No IGW, no problem.
Same as EC2.
Redshift
Summary: No IGW, no problem.
The ElasticIp argument in both CreateCluster and ModifyCluster says, “The cluster must be provisioned in EC2-VPC [as opposed to EC2-Classic, I assume] and publicly-accessible through an Internet gateway.”
Based on this, I am concluding that having no IGW / no public subnets would make it so that you cannot publicly access a Redshift cluster. I am not testing this, however, since it would be too expensive. EC2 does not require an instance to be in a public subnet to assign an EIP to it, so it would be odd for Redshift to; however, the RDS documentation says something similar.
As further evidence, the instructions for turning a private cluster public state: “Note: An Elastic IP address is required. If you do not choose one, an address will be randomly assigned to you.”
Only the ModifyCluster documentation states: “Only clusters in VPCs can be set to be publicly available.” The CreateCluster documentation does not.
No condition keys exist for e.g. Encrypted or ElasticIP or PubliclyAccessible.
Footnotes
[1] For sandbox accounts, this may not be realistic, so you may need to manually maintain a deny list of dangerous services. Granted, you should be seamlessly re-creating sandbox accounts (or, less ideally, nuking) regularly and only have public data in them.
[2] By default only the system:public-info-viewer cluster role provides access to a set of endpoints for the system:unauthenticated group. These endpoints (e.g. /healthz, /livez, /readyz, and /version) are used by Network Load Balancers to perform health checks.
[3] TODO: Find the link about this.