Managing AWS at scale; multi-account strategy

This blog post will talk about some of the reasons behind having an account strategy, the challenges, and some thoughts and suggestions on how you can implement this successfully.

Background

When starting with AWS, a single account usually goes a long way, as long as there are few teams and a small number of projects up and running. As your AWS usage expands, sharing accounts between several teams becomes more and more cumbersome and introduces a variety of management and security challenges. If you grow very large, you can also run into per-account resource limits.

To mitigate a lot of these issues, AWS recommends utilizing a multi-account strategy.

Why should I care about a multi-account strategy?

If you are a growing business, or you are migrating to AWS and plan to run multiple teams and solutions on the platform, this is something you need to consider for multiple reasons. Here are some of the most important ones, in my opinion.

  • You want to give your teams autonomy and the full ability to manage and own their products.
  • You want an architecture that grows with added teams and products and is not blocked by gated actions, such as the need to request resources from central teams.
  • You care about security and want to find a solution that gives developers the freedom to invent and explore without creating unnecessary risk to the company or your solution.

If any of the points above are something you want in your company and on your AWS journey, you should spend time thinking about how to implement this successfully.

Team autonomy

By having isolated workloads, you can give the teams managing these solutions much wider permissions. In most cases, they will have full permission to do what they see fit. With an isolated account, you no longer need to remove access to certain resources or worry about people accessing other teams' resources without their knowledge, both of which can be issues when sharing accounts.

Growth

By giving teams full access to their infrastructure and accounts, you no longer need a central team acting as a blocker when people need to get things up and running. This means you can expand your development teams without scaling central teams, as infrastructure will be managed and owned by each team.

When your organization is growing, you can instead add central support teams that provide value through AWS platform and architectural expertise. They help and educate the teams on how to solve their challenges, rather than acting as gatekeepers when infrastructure is created.

Security

By limiting teams to their own accounts, and thereby not giving them access to all of your services, you already check some security boxes:

  • Teams have no access to resources outside their domain.
  • A misconfigured permission in one account does not translate into access to resources in other accounts.
  • Account-to-account communication between services acts as a barrier, since you have to configure, per service or group, who can access which API or service.
  • By separating workloads, you can enforce different audit and security requirements on an account basis, rather than having to be aware of all applications running in shared accounts.

On top of this, AWS has several tools aimed at centralized observability and security that support and promote multi-account strategies, such as AWS Security Hub, AWS Config, and Amazon GuardDuty.

How should I separate workloads?

This is where it starts to get a bit tricky. My preference is to have a clear domain architecture and to implement those domains as accounts. This creates clear boundaries and enables you to enforce limitations and ownership that reflect the domains. It is also a good opportunity to enforce a clearer architecture if you don't already have one in place.

As this is most likely a bit unique to each organization, a good approach is to name and create accounts after the purpose of the services inside them, rather than after team names, for example. A good account name and usage should survive reorganizations and still make clear what is running inside and what value it provides.

On the other hand, as this task can be difficult to get right, I would rather create too many AWS accounts than too few. It is not an issue to have a single account for a single solution.

Working with organizations, some core concepts

When you are working with multiple accounts, AWS has its AWS Organizations service, which consists of several concepts and tools to manage accounts.

Organizational units (OU)

An OU acts as an account group. Each OU can consist of underlying OUs or accounts, making up a tree structure of your organization.

These groups can be used to apply certain policies and actions to all the accounts in them. Policies and actions attached to an OU are inherited by the OUs and accounts beneath it, so you can configure some policies at the top level and, if required, add another layer of policies further down the organization. This has its caveats, so make sure to read up on the documentation.
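
To illustrate how policies flow down the tree, here is a minimal Python sketch. It is a toy model, not the AWS Organizations API: the `Node` class, the OU and account names, and the policy names are all made up for the example, and it only collects the policies attached along the path from the root to an account. Real SCP evaluation has more rules than this, which is one of the caveats mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An OU or an account in a simplified organization tree (hypothetical model)."""
    name: str
    policies: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add_child(self, name: str, policies: Optional[List[str]] = None) -> "Node":
        # Create a child OU or account that points back to this node.
        return Node(name, policies or [], parent=self)


def inherited_policies(node: Node) -> List[str]:
    """Collect every policy attached on the path from the root down to `node`."""
    chain = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    result: List[str] = []
    for item in reversed(chain):  # root first, the node itself last
        result.extend(item.policies)
    return result


# Hypothetical organization: Root -> Workloads -> Prod -> one account.
root = Node("Root", ["deny-leaving-organization"])
workloads = root.add_child("Workloads", ["restrict-regions"])
prod = workloads.add_child("Prod", ["deny-disabling-guardduty"])
account = prod.add_child("orders-prod")

print(inherited_policies(account))
```

The account at the bottom ends up governed by all three policies, even though none of them is attached to the account itself.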

Service control policies (SCPs)

"Service control policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization." (AWS Docs)

I like to view SCPs as "global" IAM boundaries that enable you to set organization-wide policies limiting certain services, regions, and other settings you don't wish to be fiddled with. Policies can be used both for limiting what services you are allowed to use, and also to protect certain created resources. SCPs can be applied to the whole organization as well as just certain OUs.

To be a bit more practical, this is an example of an SCP that denies actions in all regions except us-east-1. The NotAction list exempts global services from the restriction. Only iam:* is listed here, so add the other global services you want to exempt.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OnlyInUsEast",
      "Effect": "Deny",
      "NotAction": [
        "iam:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1"
          ]
        }
      }
    }
  ]
}
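
If you manage several variants of this policy, it can help to generate it rather than hand-edit JSON. The sketch below builds the same kind of policy from a list of allowed regions and exempt actions, and includes a deliberately simplified check of which requests the Deny statement would match. The function names `build_region_scp` and `is_denied` are my own, and the matching logic is an approximation for illustration, not how AWS evaluates policies internally.

```python
import json
from fnmatch import fnmatchcase


def build_region_scp(allowed_regions, exempt_actions):
    """Return an SCP document that denies non-exempt actions outside allowed_regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": list(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


def is_denied(policy, action, region):
    """Simplified check: does a Deny statement in the policy match this request?"""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Deny":
            continue
        # Actions are compared case-insensitively, with * as a wildcard.
        exempt = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in statement["NotAction"]
        )
        allowed = statement["Condition"]["StringNotEquals"]["aws:RequestedRegion"]
        if not exempt and region not in allowed:
            return True
    return False


policy = build_region_scp(["us-east-1"], ["iam:*"])
print(json.dumps(policy, indent=2))

print(is_denied(policy, "ec2:RunInstances", "eu-west-1"))  # True: wrong region
print(is_denied(policy, "ec2:RunInstances", "us-east-1"))  # False: allowed region
print(is_denied(policy, "iam:CreateRole", "eu-west-1"))    # False: exempt action
```

Keeping the region and exemption lists as parameters makes it easy to review what each OU is allowed to do before the policy is attached.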

StackSets

A StackSet lets you deploy the same CloudFormation stack to multiple accounts and OUs. This is a very powerful way of distributing roles, resources, and other configurations that you want to manage centrally across your accounts.

A StackSet can target a certain OU and automatically deploy its stack when a new account is added to that OU. If you like, the stack can also be removed when an account leaves the OU, creating a pattern where all accounts assigned to a certain OU receive the same resources.
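
As a rough sketch of what this looks like with boto3, the helpers below build the parameters for the CloudFormation `create_stack_set` and `create_stack_instances` calls with automatic deployment enabled. The helper names, the stack set name, the template, and the OU ID are made up for the example. I am also assuming the calls are made from the organization's management account with trusted access for StackSets enabled.

```python
import json


def stack_set_request(name, template_body, retain_on_removal=False):
    """Parameters for create_stack_set using service-managed permissions."""
    return {
        "StackSetName": name,
        "TemplateBody": template_body,
        "PermissionModel": "SERVICE_MANAGED",
        "AutoDeployment": {
            # Deploy automatically to accounts added to the target OUs.
            "Enabled": True,
            # False means the stack is deleted when an account leaves the OU.
            "RetainStacksOnAccountRemoval": retain_on_removal,
        },
        # Only required when the template creates named IAM resources.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def stack_instances_request(name, ou_ids, regions):
    """Parameters for create_stack_instances targeting whole OUs."""
    return {
        "StackSetName": name,
        "DeploymentTargets": {"OrganizationalUnitIds": list(ou_ids)},
        "Regions": list(regions),
    }


def deploy(name, template_body, ou_ids, regions):
    """Create the stack set and roll it out (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3 installed

    cfn = boto3.client("cloudformation")
    cfn.create_stack_set(**stack_set_request(name, template_body))
    cfn.create_stack_instances(**stack_instances_request(name, ou_ids, regions))


# Hypothetical template and OU ID, for illustration only.
template = json.dumps({"Resources": {"AuditTopic": {"Type": "AWS::SNS::Topic"}}})
set_params = stack_set_request("baseline-audit", template)
instance_params = stack_instances_request(
    "baseline-audit", ["ou-abcd-11111111"], ["us-east-1"]
)
print(json.dumps(instance_params, indent=2))
```

Calling `deploy(...)` with the same arguments would perform the actual rollout. It is left out of the example run because it needs credentials for a real organization.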

AWS SSO

Since you are going down the route of having a lot of accounts, a way of managing access to them is key. AWS provides AWS IAM Identity Center (the successor to AWS Single Sign-On) to help you with this task.

With it, you can either use the built-in identity store and login, or integrate with an external identity provider such as Azure AD or Auth0, using SAML for sign-in and SCIM for user and group provisioning.

On top of giving your users a nice way of switching between and accessing accounts, each of these integrations enables you to manage users and connect groups and permission sets to certain accounts. This can be automated via infrastructure as code, which takes away a lot of manual tasks when it comes to account and permission management.
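
To give an idea of what that automation can look like, here is a small sketch that expands "this group gets this permission set in these accounts" into one request per account, shaped for the `create_account_assignment` call of the boto3 `sso-admin` client. The function name, ARNs, group ID, and account IDs are placeholders I made up. In practice you would more likely express the same thing in your infrastructure-as-code tool of choice.

```python
def account_assignments(instance_arn, permission_set_arn, group_id, account_ids):
    """Build one create_account_assignment request per target account."""
    return [
        {
            "InstanceArn": instance_arn,
            "TargetId": account_id,
            "TargetType": "AWS_ACCOUNT",
            "PermissionSetArn": permission_set_arn,
            "PrincipalType": "GROUP",
            "PrincipalId": group_id,
        }
        for account_id in account_ids
    ]


# Placeholder identifiers, for illustration only.
requests = account_assignments(
    instance_arn="arn:aws:sso:::instance/ssoins-EXAMPLE",
    permission_set_arn="arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLE",
    group_id="example-group-id",
    account_ids=["111111111111", "222222222222"],
)

for request in requests:
    # With credentials in place, each request could be sent with:
    #   boto3.client("sso-admin").create_account_assignment(**request)
    print(request["PrincipalId"], "->", request["TargetId"])
```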

AWS Control Tower

AWS tries to make this journey a bit easier with a service called AWS Control Tower. It gives you a landing zone for managing accounts and sets up some sane defaults and account strategies for you to follow.

Unfortunately, my personal experience with AWS Control Tower is far from good. After spending quite a bit of time with the product in late 2021, I was underwhelmed by how it performed, by its level of abstraction, and by how hard it was to customize to your needs. I have heard that a lot of time has been spent improving the product since then, and hopefully Control Tower can be a really solid choice for managing organizations in the future.

So how do I get started?

As stated earlier in this post, I find that it helps to have a clear domain architecture for your software.

Start by thinking about how it would make sense to separate workloads on functionality and see if you can translate that into well-defined domains. If you are struggling with this, a service-by-service approach can be a good alternative. Do not be afraid of creating accounts.

It is also a good idea to start by creating the top-level OUs, the larger groups of services and resources you want to separate. A common pattern is, for example, to use Infrastructure, Security, Sandbox, and Workloads as top-level OUs. AWS also recommends that you then separate production from non-production workloads, which makes it easier to add guardrails and to distribute functionality across the different types of accounts. The AWS whitepaper can guide you on how to get started here.
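
One way to reason about such a layout before creating anything is to write it down as data. The sketch below describes the four top-level OUs mentioned above as a nested dictionary and lists the OU paths parents first, which is the order they would have to be created in. Placing the Prod and NonProd OUs under Workloads only is my own assumption for the example, as are the names `OU_LAYOUT` and `ou_paths`.

```python
# Hypothetical layout: top-level OUs from the text, with a production and
# non-production split under Workloads (an assumption for this example).
OU_LAYOUT = {
    "Infrastructure": {},
    "Security": {},
    "Sandbox": {},
    "Workloads": {"Prod": {}, "NonProd": {}},
}


def ou_paths(layout, prefix="Root"):
    """Yield every OU path, parents before children."""
    for name, children in layout.items():
        path = f"{prefix}/{name}"
        yield path
        yield from ou_paths(children, path)


for path in ou_paths(OU_LAYOUT):
    print(path)
```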

When you feel ready, a good first step can be to start creating personal sandbox accounts for the developers, so they can play around with services without having any impact on the production workloads. When you are comfortable with this, move on to the workloads.

Once you have a structure in place, continue by adjusting your SCPs, creating StackSets for distributed resources, and experimenting with the centralized services, hardening your setup and making your AWS journey more and more complete.


Elva is a serverless-first consulting company that can help you transform or begin your AWS journey for the future