Azure Resource Group structure — Measure twice, cut once


Everyone is familiar with the adage “Measure twice, cut once”. What you may not realise is that this isn’t only good advice for putting up shelves, it also applies when you’re building infrastructure in Azure. The point here is that it can pay to think twice about your Azure subscription’s structure and where things fit before jumping straight into the juicy, delicious fun that is creating resources.

Recently I found myself tasked with creating the Azure infrastructure that would host an application in its entirety, namely:

  • A frontend that consisted of some static content and a CDN
  • An API running in Azure Kubernetes Service with a separate database
  • All the usual supporting and underlying pieces to glue the above together

This being my first major foray into Azure, I hit the books to try and scoop up as much wisdom as possible before I started. Right off the bat I was pretty struck by the importance of resource group structure and naming inside a subscription and how much the decisions I made in these areas would influence the maintainability and quality of my final product. Azure gives you three main mechanisms for resource organisation:

  • The resource names themselves
  • “Resource Groups” that act as containers to group resources together
  • “Resource tags” that represent a kind of per-resource user-defined metadata

Here I’ll only be discussing resource naming and resource grouping, as these are the most difficult to correct post-implementation.

There’s a great reference already in the documentation that relates to resource naming, which you can find here. Great! So that’s the naming sorted, this cloud stuff is easy! So what do the official docs reveal to us about resource group structure? Errrrr, not that much, it turns out… I sort of get it: Resource Groups are an incredibly individual thing, and the way they’re used will differ greatly from one organisation to the next. That isn’t helpful in a practical sense, though, so after a trawl through the Internet and a bit of a ponder, here are some possibilities for how they can be used to organise resources:

By Application

Usually, if you go with this approach, you’ll be hosting multiple applications inside the same Azure subscription. Say you host two applications, “customer-portal” and “employee-portal”. In that case, taking into account the advice from Microsoft (linked above) regarding naming conventions, you may end up with two resource groups, something like:

rg-nonprod-customer-001
rg-nonprod-employee-001

Inside these groups, you’d have all of the components that are specific to that particular application. This didn’t really work for my use-case and besides that, it all appears to fall down quite quickly if any elements are shared between applications, like a database server or networking components.
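If you were scripting this, the per-application groups could be created with the Azure CLI. Here’s a minimal sketch; the naming helper is my own interpretation of the `rg-<environment>-<workload>-<instance>` style from the linked guidance, and the location is an assumption. The `az` command is printed rather than executed so the sketch is safe to dry-run:

```shell
#!/bin/sh
# Hypothetical helper composing names in the rg-<env>-<workload>-<nnn>
# style (my reading of the naming guidance linked above).
rg_name() {
  env="$1"; app="$2"; seq="$3"
  printf 'rg-%s-%s-%03d\n' "$env" "$app" "$seq"
}

# One group per application; echoed rather than run in this sketch.
for app in customer employee; do
  name="$(rg_name nonprod "$app" 1)"
  echo "az group create --name $name --location westeurope"
done
```

Keeping the composition in one helper means the convention only lives in one place if you later need to change it.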

By Resource Type

This approach allows you to be very granular, providing a resource group for each kind of resource you host. It can be useful in siloed workplaces, where it aids in managing permissions for different teams in Azure.

Providing access to all the resources that “Bob” from “Networks” would need is easy: you can set an IAM role assignment on rg-nonprod-net-001 and forget about it. Again, I wasn’t really sold on this method, with sprawl being an obvious issue. It doesn’t take much imagination to see how this sort of structure could get out of hand. Over time, adding more resource varieties to your infrastructure will inflate your resource group count with no clear advantage.
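The “set it and forget it” role assignment for Bob can be sketched with the Azure CLI. The subscription ID and user principal are placeholders (my assumptions), and the command is printed rather than executed; “Network Contributor” is a real built-in role:

```shell
#!/bin/sh
# Placeholder subscription and group (illustrative values only).
sub_id="00000000-0000-0000-0000-000000000000"
rg="rg-nonprod-net-001"

# Role assignments are scoped by resource ID; a resource group's scope
# is just /subscriptions/<sub>/resourceGroups/<name>.
scope="/subscriptions/${sub_id}/resourceGroups/${rg}"

# Echoed rather than run in this sketch.
echo az role assignment create \
  --assignee "bob@example.com" \
  --role "Network Contributor" \
  --scope "$scope"
```

Because the scope is the group itself, any resource later added to the group is automatically covered by the assignment.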

By Purpose

This is similar to option 1 but allows for slightly more wiggle room. It really only works if you have a separate subscription for each application you host, or if you only intend to host a single application. In this case, you could host all resources responsible for delivering the frontend in one resource group and all of the backend components in another. Components not specifically related to providing application services would be split into other “supporting” groups. For example:

rg-nonprod-frontend-001
rg-nonprod-backend-001
rg-nonprod-mgmt-001

In this case, rg-nonprod-frontend-001 would contain your Web App and CDN, rg-nonprod-backend-001 your APIs and database, and rg-nonprod-mgmt-001 a bastion host and a VNet.

By Lifecycle

This involves grouping together resources or components that share the same lifecycle. I’ll admit this is rather close to grouping resources by purpose, but it does allow you to be a little more abstract when needed. Typically, the hardest part is defining what the lifecycle of each resource is. To make this determination, it’s often easiest to consider the usefulness of a single resource on its own. As always, this depends heavily on how your applications are built, but as a simple example:

An SQL database intended to hold customer data isn’t massively useful on its own without the API that allows it to be consumed, which in turn isn’t useful without the SPA hosted in a Storage Account that allows users to interact with it. We’d consider that these resources share the same lifecycle and should generally always be created or destroyed together. On the other hand, we can look at a resource like an Azure Container Registry, that is useful on its own without the support of any other resource. This would be a candidate to be kept separate from the aforementioned DB, Storage Account and API.

A Hybrid Approach

This could be a combination of any of the schemes already proposed above. The key concept is a softening of the rules when it comes to placing resources. It allows you greater flexibility and goes part of the way towards fixing the issue presented in the first approach relating to shared or orphaned resources. As an example, you may have two resource groups for the two different applications you host, alongside a third, network-focussed resource group that houses shared networking components, like the underlying VNet itself or a firewall service. Sticking to the original convention of scoping things to their environment, this may simply look like:

rg-nonprod-customer-001
rg-nonprod-employee-001
rg-nonprod-net-001

This gives us ways to handle most resource access scenarios too, providing “Bob” from “Networks” oversight over the network resource group is easy, as is allowing only the relevant application teams access to their resources.

So, what did I choose? In the end, I went for a hybrid approach that combined approaches 1 and 3 from the above list… and how did it work out? Well… pretty badly actually. The initial expectation was that we’d only be running and supporting a single application in our subscription. However, as the project evolved, we had to host a second and a third. At this point, having generic groups split by purpose (frontend, backend etc…) wasn’t that clear or helpful, as we were in fact hosting multiple backends and frontends for different applications.

Even before this, I think the number of resource groups we’d ended up creating using this approach was becoming problematic. If I were to do this again, I’d be tempted by a hybrid scheme again, but would maintain more generic groups for application components as highlighted in the “By Lifecycle” or “By Application” sections above. Alongside these, I’d still split what could be considered “core” infrastructure (networking and such) out of the application groups and into groups of its own. This helps with permission assignment and really ringfences application component ownership to the appropriate teams. Oh well, maybe next time!

If you’ve made mistakes in your resource organisation, fear not! Azure does provide functionality to migrate resources between Resource Groups (this even works across subscriptions!). A word of warning, though, particularly if you’re managing your infrastructure with IaC, as I was with Terraform: the resource group forms part of the resource ID that Terraform uses to track the resource in its state file, so a fair amount of state file surgery would be required for a large-scale Resource Group migration. This remains on my radar as a piece of work for the future, but there’s no denying it would have been easier if I’d got this right the first time.
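To make the “state file surgery” concrete, here’s a rough sketch of moving a single storage account into a different group, assuming it’s managed by the azurerm Terraform provider. The subscription ID, account name, and Terraform address are all illustrative, and the commands are echoed rather than executed:

```shell
#!/bin/sh
# Illustrative IDs only — substitute your own.
sub_id="00000000-0000-0000-0000-000000000000"
old_id="/subscriptions/${sub_id}/resourceGroups/rg-nonprod-frontend-001/providers/Microsoft.Storage/storageAccounts/stnonprodweb001"
new_rg="rg-nonprod-customer-001"
new_id="/subscriptions/${sub_id}/resourceGroups/${new_rg}/providers/Microsoft.Storage/storageAccounts/stnonprodweb001"

# 1. Move the resource between groups on the Azure side.
echo az resource move --destination-group "$new_rg" --ids "$old_id"

# 2. The move changes the resource ID, so Terraform's state entry is now
#    stale. Remove it and re-import under the new ID (after updating
#    resource_group_name in the configuration to match).
echo terraform state rm azurerm_storage_account.web
echo terraform import azurerm_storage_account.web "$new_id"
```

Note that `terraform state mv` alone wouldn’t help here — it renames addresses within the state but doesn’t change the underlying resource ID, which is why the remove-and-import dance is needed.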

