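Enabling VM diagnostics in Azure is a real pain. I have it working with ARM templates, the Azure PowerShell SDK, and the Azure CLI. But for several days now I have been trying to enable VM diagnostics for both Windows and Linux using Terraform and the azurerm_virtual_machine_extension resource, and it still does not work. Argh!
Here is what I have so far (I tweaked it a bit to simplify it for this post, so hopefully my manual edits did not break anything):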
resource "azurerm_virtual_machine_extension" "vm-linux" {
  count                      = "${local.is_windows_vm == "false" ? 1 : 0}"
  depends_on                 = ["azurerm_virtual_machine_data_disk_attachment.vm"]
  name                       = "LinuxDiagnostic"
  location                   = "${var.location}"
  resource_group_name        = "${var.resource_group_name}"
  virtual_machine_name       = "${local.vm_name}"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "LinuxDiagnostic"
  type_handler_version       = "3.0"
  auto_upgrade_minor_version = "true"

  # The JSON file referenced below was created by running "az vm diagnostics get-default-config", and adding/verifying the "__DIAGNOSTIC_STORAGE_ACCOUNT__" and "__VM_RESOURCE_ID__" placeholders.
  settings = <<SETTINGS
{
  "ladCfg": "${base64encode(replace(replace(file("${path.module}/.diag-settings/linux_diag_config.json"), "__DIAGNOSTIC_STORAGE_ACCOUNT__", "${module.vm_storage_account.name}"), "__VM_RESOURCE_ID__", "${local.metricsresourceid}"))}",
  "storageAccount": "${module.vm_storage_account.name}"
}
SETTINGS

  # SAS token below: Do not include the leading question mark, as per https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/diagnostics-linux.
  protected_settings = <<SETTINGS
{
  "storageAccountName": "${module.vm_storage_account.name}",
  "storageAccountSasToken": "${replace(data.azurerm_storage_account_sas.current.sas, "/^\\?/", "")}",
  "storageAccountEndPoint": "https://core.windows.net/"
}
SETTINGS
}
resource "azurerm_virtual_machine_extension" "vm-win" {
  count                      = "${local.is_windows_vm == "true" ? 1 : 0}"
  depends_on                 = ["azurerm_virtual_machine_data_disk_attachment.vm"]
  name                       = "Microsoft.Insights.VMDiagnosticsSettings"
  location                   = "${var.location}"
  resource_group_name        = "${var.resource_group_name}"
  virtual_machine_name       = "${local.vm_name}"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "IaaSDiagnostics"
  type_handler_version       = "1.9"
  auto_upgrade_minor_version = "true"

  # The JSON file referenced below was created by running "az vm diagnostics get-default-config --is-windows-os", and adding/verifying the "__DIAGNOSTIC_STORAGE_ACCOUNT__" and "__VM_RESOURCE_ID__" placeholders.
  settings = <<SETTINGS
{
  "wadCfg": "${base64encode(replace(replace(file("${path.module}/.diag-settings/windows_diag_config.json"), "__DIAGNOSTIC_STORAGE_ACCOUNT__", "${module.vm_storage_account.name}"), "__VM_RESOURCE_ID__", "${local.metricsresourceid}"))}",
  "storageAccount": "${module.vm_storage_account.name}"
}
SETTINGS

  protected_settings = <<SETTINGS
{
  "storageAccountName": "${module.vm_storage_account.name}",
  "storageAccountSasToken": "${data.azurerm_storage_account_sas.current.sas}",
  "storageAccountEndPoint": "https://core.windows.net/"
}
SETTINGS
}
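Note that for both Linux and Windows I load the diagnostics details from JSON files in the code base, as described in the comments. These are the default configurations provided by Azure, so they should be valid.
When I deploy these, the Linux VM extension deploys successfully, but in the Azure portal the extension shows "Problems detected in generated mdsd configuration". If I look at the VM's "Diagnostic settings", it says "Encountered an error: TypeError: Object doesn't support property or method 'diagnosticMonitorConfiguration'". The Windows VM extension fails to deploy entirely because it "Failed to read configuration". If I view the extension in the portal, it shows the following error: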
"code": "ComponentStatus//failed/-3",
"level": "Error",
"displayStatus": "Provisioning failed",
"message": "Error starting the diagnostics extension"
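If I look at the "Diagnostics settings" pane, it just hangs with a never-ending "..." animation.
However, if I look at the "terraform apply" output for both VM extensions, the decoded settings look exactly as expected, with the placeholders from the config files substituted correctly.
Any suggestions on how to get this working?
Thanks in advance!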
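Posted on 2019-12-20 01:16:52
So far I have Windows diagnostics working 100% in our environment. The AzureRM API seems to be very picky about the configuration it is sent. We had been enabling diagnostics with PowerShell, and the same xmlCfg we used in PowerShell did not work with Terraform. This is what works for us so far (the settings/protected_settings key names are case sensitive: xmlCfg works, xmlcfg does not):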
main.tf
#########################################################
# VM Extensions - Windows In-Guest Monitoring/Diagnostics
#########################################################
resource "azurerm_virtual_machine_extension" "InGuestDiagnostics" {
  name                       = var.compute["InGuestDiagnostics"]["name"]
  location                   = azurerm_resource_group.VMResourceGroup.location
  resource_group_name        = azurerm_resource_group.VMResourceGroup.name
  virtual_machine_name       = azurerm_virtual_machine.Compute.name
  publisher                  = var.compute["InGuestDiagnostics"]["publisher"]
  type                       = var.compute["InGuestDiagnostics"]["type"]
  type_handler_version       = var.compute["InGuestDiagnostics"]["type_handler_version"]
  auto_upgrade_minor_version = var.compute["InGuestDiagnostics"]["auto_upgrade_minor_version"]

  settings = <<SETTINGS
{
  "xmlCfg": "${base64encode(templatefile("${path.module}/templates/wadcfgxml.tmpl", { vmid = azurerm_virtual_machine.Compute.id }))}",
  "storageAccount": "${data.azurerm_storage_account.InGuestDiagStorageAccount.name}"
}
SETTINGS

  protected_settings = <<PROTECTEDSETTINGS
{
  "storageAccountName": "${data.azurerm_storage_account.InGuestDiagStorageAccount.name}",
  "storageAccountKey": "${data.azurerm_storage_account.InGuestDiagStorageAccount.primary_access_key}",
  "storageAccountEndPoint": "https://core.windows.net"
}
PROTECTEDSETTINGS
}
tfvars
InGuestDiagnostics = {
  name                       = "WindowsDiagnostics"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "IaaSDiagnostics"
  type_handler_version       = "1.16"
  auto_upgrade_minor_version = "true"
}
wadcfgxml.tmpl (I removed some of the performance counters for brevity)
<WadCfg>
  <DiagnosticMonitorConfiguration overallQuotaInMB="5120">
    <DiagnosticInfrastructureLogs scheduledTransferLogLevelFilter="Error"/>
    <Metrics resourceId="${vmid}">
      <MetricAggregation scheduledTransferPeriod="PT1H"/>
      <MetricAggregation scheduledTransferPeriod="PT1M"/>
    </Metrics>
    <PerformanceCounters scheduledTransferPeriod="PT1M">
      <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% Processor Time" sampleRate="PT60S" unit="Percent" />
      <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% Privileged Time" sampleRate="PT60S" unit="Percent" />
      <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% User Time" sampleRate="PT60S" unit="Percent" />
      <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\Processor Frequency" sampleRate="PT60S" unit="Count" />
      <PerformanceCounterConfiguration counterSpecifier="\System\Processes" sampleRate="PT60S" unit="Count" />
      <PerformanceCounterConfiguration counterSpecifier="\SQLServer:SQL Statistics\SQL Re-Compilations/sec" sampleRate="PT60S" unit="Count" />
    </PerformanceCounters>
    <WindowsEventLog scheduledTransferPeriod="PT1M">
      <DataSource name="Application!*[System[(Level = 1 or Level = 2)]]"/>
      <DataSource name="Security!*[System[(Level = 1 or Level = 2)]]"/>
      <DataSource name="System!*[System[(Level = 1 or Level = 2)]]"/>
    </WindowsEventLog>
  </DiagnosticMonitorConfiguration>
</WadCfg>
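Since the key names are case sensitive, it may help to keep their spelling in one visible place. The following is a minimal sketch, assuming Terraform 0.12 or later, that builds the same Windows settings document with jsonencode instead of a heredoc. The local names, the storage account name, and the VM resource ID are made-up placeholders, not part of the configuration above.

```hcl
locals {
  # Hypothetical VM resource ID, standing in for azurerm_virtual_machine.Compute.id.
  example_vm_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm"

  # jsonencode emits the keys exactly as written, so "xmlCfg" keeps its casing.
  example_wad_settings = jsonencode({
    xmlCfg         = base64encode(templatefile("${path.module}/templates/wadcfgxml.tmpl", { vmid = local.example_vm_id }))
    storageAccount = "examplestorageacct" # placeholder storage account name
  })
}
```

The extension resource could then use settings = local.example_wad_settings in place of the heredoc.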
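I finally got Linux in-guest diagnostics (LAD) working as well. A few notable points: unlike Windows diagnostics, the settings need to be passed as plain JSON, without base64 encoding. LAD also seems to require a SAS token for the storage account. The usual caveats still apply: the AzureRM API is picky about the configuration, and the setting names are case sensitive. Here is what has worked for me so far: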
# Locals
locals {
  env = var.workspace[terraform.workspace]

  # Use a set/static time to avoid TF from recreating the SAS token every apply, which would then cause it to
  # modify/recreate anything that uses it. Not ideal, but the token is for a VERY long time, so it will do for now
  sas_begintime = "2019-11-22T00:00:00Z"
  sas_endtime   = timeadd(local.sas_begintime, "873600h")
}
#########################################################
# VM Extensions - In-Guest Diagnostics
#########################################################
# We need a SAS token for the In-Guest Metrics
data "azurerm_storage_account_sas" "inguestdiagnostics" {
  count             = (contains(keys(local.env), "InGuestDiagnostics") ? 1 : 0)
  connection_string = data.azurerm_storage_account.BootDiagStorageAccount.primary_connection_string
  https_only        = true

  resource_types {
    service   = true
    container = true
    object    = true
  }

  services {
    blob  = true
    queue = true
    table = true
    file  = true
  }

  start  = local.sas_begintime
  expiry = local.sas_endtime

  permissions {
    read    = true
    write   = true
    delete  = true
    list    = true
    add     = true
    create  = true
    update  = true
    process = true
  }
}
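The Linux extension documentation cited in the question asks for the SAS token without its leading question mark. As an alternative to the regex-based replace used in this thread, here is a small sketch assuming a Terraform release that provides the trimprefix function. The token is a made-up example value, not real output, and the local and output names are hypothetical.

```hcl
locals {
  # Made-up example token; in practice this would come from the data source's sas attribute.
  example_raw_sas = "?sv=2017-07-29&ss=bqtf&srt=sco&sp=rwdlacup&sig=EXAMPLE"

  # trimprefix removes the given prefix only when the string actually starts with it.
  example_lad_sas = trimprefix(local.example_raw_sas, "?")
}

output "example_lad_sas" {
  value = local.example_lad_sas
}
```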
resource "azurerm_virtual_machine_extension" "inguestdiagnostics" {
  for_each                   = contains(keys(local.env), "InGuestDiagnostics") ? local.env["InGuestDiagnostics"] : {}
  depends_on                 = [azurerm_virtual_machine_extension.dependencyagent]
  name                       = each.value["name"]
  location                   = azurerm_resource_group.resourcegroup.location
  resource_group_name        = azurerm_resource_group.resourcegroup.name
  virtual_machine_name       = azurerm_virtual_machine.compute["${each.key}"].name
  publisher                  = each.value["publisher"]
  type                       = each.value["type"]
  type_handler_version       = each.value["type_handler_version"]
  auto_upgrade_minor_version = each.value["auto_upgrade_minor_version"]

  settings = templatefile("${path.module}/templates/ladcfg2json.tmpl", {
    vmid               = azurerm_virtual_machine.compute["${each.key}"].id
    storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name
  })

  protected_settings = <<PROTECTEDSETTINGS
{
  "storageAccountName": "${data.azurerm_storage_account.BootDiagStorageAccount.name}",
  "storageAccountSasToken": "${replace(data.azurerm_storage_account_sas.inguestdiagnostics.0.sas, "/^\\?/", "")}"
}
PROTECTEDSETTINGS
}
# These variations didn't work for me:
# "ladCfg": "${templatefile("${path.module}/templates/ladcfgjson.tmpl", { vmid = azurerm_virtual_machine.compute["${each.key}"].id, storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name })}",
# - This one gets you Error: "settings" contains an invalid JSON: invalid character '\n' in string literal or Error: "settings" contains an invalid JSON: invalid character 'S' after object key:value pair
# "ladCfg": "${replace(data.local_file.ladcfgjson["${each.key}"].content, "/\\n/", "")}",
# - This one gets you Error: "settings" contains an invalid JSON: invalid character 'S' after object key:value pair
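Both errors are consistent with a JSON document being pasted inside a JSON string: raw newlines are not allowed in a string literal, and the first inner double quote ends the string early. One way around this, sketched here under the assumption of Terraform 0.12 or later, is to decode the file into a value and re-encode the whole settings object in one pass. The file name ladcfg.json, the local names, and the storage account name are hypothetical.

```hcl
locals {
  # Parse the JSON file into a Terraform value instead of treating it as text.
  # ladcfg.json is a hypothetical file that holds only the ladCfg object.
  example_lad_cfg = jsondecode(file("${path.module}/templates/ladcfg.json"))

  # Re-encode everything together so nested quotes and newlines are handled
  # by the encoder rather than by string interpolation.
  example_lad_settings = jsonencode({
    StorageAccount = "examplestorageacct" # placeholder storage account name
    ladCfg         = local.example_lad_cfg
  })
}
```

A plain file() call does no placeholder substitution, so a file that contains template variables such as the VM ID would need templatefile in its place.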
tfvars
workspace = {
  TerraformWorkSpaceName = {
    compute = {
      # Add additional key/objects for additional Compute
      computer01 = {
        name = "computer01"
      }
    }
    InGuestDiagnostics = {
      # Add additional key/objects for each Compute you want to install the InGuestDiagnostics on
      computer01 = {
        name                       = "LinuxDiagnostic"
        publisher                  = "Microsoft.Azure.Diagnostics"
        type                       = "LinuxDiagnostic"
        type_handler_version       = "3.0"
        auto_upgrade_minor_version = "true"
      }
    }
  }
}
I could not get the template file to work without wrapping the whole thing in jsonencode.
ladcfg2json.tmpl
${jsonencode({
  "StorageAccount": "${storageAccountName}",
  "ladCfg": {
    "sampleRateInSeconds": 15,
    "diagnosticMonitorConfiguration": {
      "metrics": {
        "metricAggregation": [
          {
            "scheduledTransferPeriod": "PT1M"
          },
          {
            "scheduledTransferPeriod": "PT1H"
          }
        ],
        "resourceId": "${vmid}"
      },
      "eventVolume": "Medium",
      "performanceCounters": {
        "sinks": "",
        "performanceCounterConfiguration": [
          {
            "counterSpecifier": "/builtin/processor/percentiowaittime",
            "condition": "IsAggregate=TRUE",
            "sampleRate": "PT15S",
            "annotation": [
              {
                "locale": "en-us",
                "displayName": "CPU IO wait time"
              }
            ],
            "unit": "Percent",
            "class": "processor",
            "counter": "percentiowaittime",
            "type": "builtin"
          }
        ]
      },
      "syslogEvents": {
        "syslogEventConfiguration": {
          "LOG_LOCAL0": "LOG_DEBUG"
        }
      }
    }
  }
})}
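Because the template has to render to valid JSON, it can be useful to check the rendered result before it reaches the extension. A minimal sketch, assuming Terraform 0.12 or later: jsondecode returns an error if the rendered text is not valid JSON. The output name and both input values are placeholders chosen for illustration.

```hcl
output "example_lad_settings_check" {
  # jsondecode fails with an error if the template does not render to valid JSON,
  # so a bad template is caught before the extension is deployed.
  value = jsondecode(templatefile("${path.module}/templates/ladcfg2json.tmpl", {
    vmid               = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm"
    storageAccountName = "examplestorageacct"
  }))
}
```

The same expression can also be evaluated interactively in terraform console.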
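I hope this helps.
Posted on 2020-02-20 19:32:35
Since this question was asked more than a year ago, this is mostly for people like me who are trying this for the first time. We only use Linux VMs, so this advice applies to Linux only.
Here is a redacted version of my configuration (replace everything in all caps with your own settings):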
resource "azurerm_virtual_machine_extension" "vm_linux_diagnostics" {
  count                      = "1"
  name                       = "NAME"
  resource_group_name        = "YOUR RESOURCE GROUP NAME"
  location                   = "YOUR LOCATION"
  virtual_machine_name       = "TARGET MACHINE NAME"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "LinuxDiagnostic"
  type_handler_version       = "3.0"
  auto_upgrade_minor_version = "true"

  settings = <<SETTINGS
{
  "StorageAccount": "tfnpfsnhsuk",
  "ladCfg": {
    "sampleRateInSeconds": 15,
    "diagnosticMonitorConfiguration": {
      "metrics": {
        "metricAggregation": [
          {
            "scheduledTransferPeriod": "PT1M"
          },
          {
            "scheduledTransferPeriod": "PT1H"
          }
        ],
        "resourceId": "VM ID"
      },
      "eventVolume": "Medium",
      "performanceCounters": {
        "sinks": "",
        .... MORE METRICS - THAT YOU REQUIRE
      }
    }
  }
}
SETTINGS

  protected_settings = <<PROTECTED_SETTINGS
{
  "storageAccountName": "YOUR_ACCOUNT_NAME",
  "storageAccountSasToken": "YOUR SAS TOKEN"
}
PROTECTED_SETTINGS

  tags = "YOUR TAG"
}
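The all-caps placeholders can also be supplied through input variables rather than edited in place. This is only a sketch under assumptions: the variable and local names are hypothetical, and it assumes Terraform 0.12 or later. It also shows tags as a map, since the azurerm provider expects a map of strings there rather than a single string.

```hcl
variable "diag_storage_account_name" {
  type        = string
  description = "Name of the storage account that receives the diagnostics data"
}

variable "diag_sas_token" {
  type        = string
  description = "SAS token for that account, without the leading question mark"
}

locals {
  # Protected settings assembled from the variables instead of hard-coded text.
  example_protected_settings = jsonencode({
    storageAccountName     = var.diag_storage_account_name
    storageAccountSasToken = var.diag_sas_token
  })

  # Tags are a map of strings; the key and value here are placeholders.
  example_tags = {
    environment = "example"
  }
}
```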
Posted on 2020-06-11 22:03:27
I just got a similar problem resolved:
Trying to add LinuxDiagnostic Azure VM Extension through terraform and getting errors
This includes getting the SAS token and reading the JSON file.
https://stackoverflow.com/questions/53558919