⏩ How to Speed Up Existing Azure Infrastructure Migration to Terraform? Discover our Time-Efficient Solution - Bid Farewell to Manual Configuration! 🚀

Ewa Kowalska, Backend Developer @ Exlabs

Terraform supports importing infrastructure into its state out of the box, but it’s up to the user to provide the proper configuration code for each resource that should be managed. Translating an existing Azure infrastructure into Terraform configuration can be challenging and laborious, and not only for a beginner in Azure Cloud Services – sometimes there are over a hundred resources that need to be imported!

Luckily, Azure provides a tool that facilitates that effort – aztfexport – which significantly accelerated the import process in our case. Although the tool has some limitations to overcome, the migration was eventually completed with success 🎉

Legacy Infrastructure

The following diagram shows the infrastructure we dealt with. It consists of several high-level Azure resources: 4 Function Apps, an App Service, a Service Bus, a Cosmos DB, an Application Insights instance and a Key Vault.

As it turned out later, that infrastructure translates into 102 Terraform resources – a significant number to process even with the help of the aztfexport tool, not to mention approaching it manually.

Aztfexport limitations

 

The aztfexport tool generates configuration code along with a Terraform state file that reflects the current state of the infrastructure, so, in theory, it can be managed by Terraform right away. However, the tool does not aim at reproducibility of the infrastructure – reaching that reproducibility required additional adjustments to the generated code.

The snippet below shows the code generated for the Application Insights instance along with its alert rule, configured to track failure anomalies. It illustrates some of the issues we encountered and needed to resolve. For the sake of the example, some sensitive values were replaced with dummy ones.

resource "azurerm_resource_group" "res-0" {
  location = "northeurope"
  name     = "resource-group-name"
}
resource "azurerm_monitor_smart_detector_alert_rule" "res-219" {
  detector_type       = "FailureAnomaliesDetector"
  frequency           = "PT1M"
  name                = "alert-rule"
  resource_group_name = "resource-group-name"
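  # note: the id of the Application Insights (res-220) appears here only as plain text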
  scope_resource_ids  = ["id-of-res-220-application-insights-in-plain-text"]
  severity            = "Sev3"
  action_group {
    ids = ["action-group-resource-id"]
  }
  depends_on = [
    azurerm_resource_group.res-0,
  ]
}
resource "azurerm_application_insights" "res-220" {
  application_type    = "web"
  location            = "northeurope"
  name                = "application-insights"
  resource_group_name = "resource-group-name"
  sampling_percentage = 0
  workspace_id        = "some-resource-id"
  depends_on = [
    azurerm_resource_group.res-0,
  ]
}
As you may notice, the dependencies between resources are not sufficient. Some are there – both resources refer to the res-0 resource group through the depends_on clause. However, the code of the dependent alert rule (res-219) precedes the code of the Application Insights it should refer to (res-220), and no reference to it is present – neither a depends_on clause pointing to the Application Insights nor a usage of its attributes. Instead, the scope_resource_ids attribute contains the current id of the Application Insights as plain text. When attempting to reproduce the resources from this configuration, the newly created Application Insights would get a different id, the hard-coded value would no longer be valid, and the process would fail.
 
Another inconvenience was that all the code was flattened into a single file and made no use of modules. Resource naming was also hard to maintain – every resource was named in the convention res-0, res-1 and so on. In such a form, the configuration does not support scaling the infrastructure and is hard to understand. Moreover, once resources are renamed or modules introduced, the generated state file becomes unusable – it cannot be modified by hand and instead has to be altered with terraform state commands, moving each resource one by one, which is a problem when dealing with a significant number of them.
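As a side note, if the generated state were kept, Terraform 1.1 and later offer moved blocks as a declarative alternative to running terraform state mv for each rename. A minimal sketch, with the monitoring module name being a hypothetical example:

moved {
  # old address as generated by aztfexport
  from = azurerm_application_insights.res-220
  # new address after renaming and moving the resource into a module
  to   = module.monitoring.azurerm_application_insights.this
}

In our case we chose to rebuild the state with terraform import instead, as described below.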
 

Adopted approach

  1. Generate a configuration for the desired resource group. It should be written to a separate directory and not pushed to a remote repository right away, as the output contains sensitive data and secrets.
  2. Recreate necessary dependencies and remove sensitive data. Pick the configuration of a high-level resource you want to track (such as a Function App), then include the configuration of all resources it depends on (e.g. Storage Account, Service Plan). In the meantime, hide exposed sensitive data behind Terraform variables (see the first sketch after this list). This step often involves checking the Azure Portal to determine which resource property is being referenced – for example, given a connection string in plain text, decide whether it is the database’s primary or secondary connection string.
  3. Organise connected resources into modules and rename them. For example, group the resources of a single Function App into a module (see the module sketch after this list). Within a separate module, resource names can be more straightforward and shorter than in a single file, where you need to differentiate the resources of several Function Apps.
  4. Manually import each resource into the Terraform state with the terraform import command. This was the most laborious step. Luckily, aztfexport outputs a mapping of the generated resource names to their Azure resource ids, which speeds up the import process – while you need to figure out the new resource address, the id is already provided.
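To illustrate step 2, here is a minimal sketch of hiding an exposed secret behind a Terraform variable. The variable names and the Key Vault secret resource are hypothetical examples chosen for illustration, not part of our actual configuration:

variable "service_bus_connection_string" {
  type        = string
  sensitive   = true
  description = "Connection string of the Service Bus namespace, supplied at plan/apply time instead of being hard-coded"
}

variable "key_vault_id" {
  type        = string
  description = "Id of the Key Vault holding the secret"
}

# the plain-text value emitted by aztfexport is replaced with a reference to the variable
resource "azurerm_key_vault_secret" "service_bus_connection" {
  name         = "service-bus-connection"
  value        = var.service_bus_connection_string
  key_vault_id = var.key_vault_id
}

The value itself can then be supplied through a .tfvars file kept out of version control, or through an environment variable such as TF_VAR_service_bus_connection_string.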
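To illustrate steps 3 and 4, here is a sketch of how the Application Insights resources from the earlier example could be grouped into a module. The module name and file layout are assumptions made for this example, not our exact structure:

# modules/monitoring/variables.tf
variable "resource_group_name" {
  type        = string
  description = "Resource group the monitoring resources belong to"
}

variable "location" {
  type        = string
  description = "Azure region of the monitoring resources"
}

variable "workspace_id" {
  type        = string
  description = "Id of the Log Analytics workspace backing the Application Insights"
}

variable "action_group_id" {
  type        = string
  description = "Id of the action group notified by the alert rule"
}

# modules/monitoring/main.tf
resource "azurerm_application_insights" "this" {
  name                = "application-insights"
  resource_group_name = var.resource_group_name
  location            = var.location
  application_type    = "web"
  sampling_percentage = 0
  workspace_id        = var.workspace_id
}

resource "azurerm_monitor_smart_detector_alert_rule" "this" {
  name                = "alert-rule"
  resource_group_name = var.resource_group_name
  severity            = "Sev3"
  scope_resource_ids  = [azurerm_application_insights.this.id]
  frequency           = "PT1M"
  detector_type       = "FailureAnomaliesDetector"
  action_group {
    ids = [var.action_group_id]
  }
}

# root main.tf
module "monitoring" {
  source              = "./modules/monitoring"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  workspace_id        = "some-resource-id"
  action_group_id     = "action-group-resource-id"
}

With such a structure, the address passed to terraform import becomes module.monitoring.azurerm_application_insights.this, and the matching Azure resource id can be copied from the mapping file generated by aztfexport. In our case it was mainly the Function App resources that were grouped into modules; the refactored Application Insights example below is shown without a module wrapper, but the naming and referencing pattern is the same.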
We repeated that approach for every resource group containing resources that needed to be managed. Applied to the example mentioned before, it results in the following form:

resource "azurerm_resource_group" "this" {
  location = "northeurope"
  name     = "resource-group-name"
}
resource "azurerm_application_insights" "this" {
  name                = "application-insights"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  application_type    = "web"
  sampling_percentage = 0
  workspace_id        = "some-resource-id"
}
resource "azurerm_monitor_smart_detector_alert_rule" "this" {
  name                = "alert-rule"
  resource_group_name = azurerm_resource_group.this.name
  severity            = "Sev3"
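  # implicit dependency – referencing the attribute lets Terraform create the Application Insights first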
  scope_resource_ids  = [azurerm_application_insights.this.id]
  frequency           = "PT1M"
  detector_type       = "FailureAnomaliesDetector"
  action_group {
    ids = ["action-group-resource-id"]
  }
}
 
Now the resources are kept in the correct order and the depends_on clauses are replaced by references to attributes (compare the scope_resource_ids property of the alert rule in both examples).
 

Verification method

 
To verify that the adjusted configuration exactly matches the infrastructure, we agreed on a single criterion: the plan produced by Terraform against the existing infrastructure must indicate no changes to apply.
 

Minor failures

 
In general, the verification method was a reasonable approach.
 
There is one exception where it failed – it turned out that aztfexport also produces configuration for default resources that the Azure provider creates automatically as part of creating higher-level resources. We stumbled upon such a case with a custom hostname binding resource of a Function App, which represented the default hostname. The problem surfaced only when reproducing the infrastructure from the configuration: terraform apply failed because it attempted to duplicate an already existing resource, while terraform plan had not indicated any problem.
To avoid such failures, the final configuration should be tested by recreating the infrastructure from scratch, for example in a separate subscription.
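One way to run such a rehearsal, sketched under the assumption that the subscription id is passed in as a variable, is to point the azurerm provider at a separate test subscription and apply the configuration there:

variable "subscription_id" {
  type        = string
  description = "Target subscription id; a separate test subscription when rehearsing the full re-creation"
}

provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}

Applying against the test subscription surfaces problems like the duplicated hostname binding before they can affect the real environment.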
 

🎉 Success

 
In the end, the import process was successful! The Terraform plan was applied with no changes, and further modifications to the infrastructure are no longer performed manually.