Tagging EC2 EBS Volumes in Auto Scaling Groups

Tagging becomes a huge part of your life when you’re in the public cloud. Metadata is thrown around like hotcakes, and why not? At cloudstep.io we preach the ways of the DevOps gods, especially infrastructure as code for repeatable and standardised deployments. This way everything is uniform and everything gets a TAG!

I ran into an issue recently where I would build an EC2 instance and capture the operating system into an AMI as part of a CloudFormation stack. This AMI was then used in a launch configuration and subsequent auto scaling group. The original EC2 instance had every tag needed across all the components that make up the virtual machine, including:

  • EBS root volume
  • EBS data volumes
  • Elastic Network Interfaces (ENI)
  • EC2 Instance itself

When the auto scaling group deployed new instances, all the user-level tags I’d applied had been stripped from the volumes and ENIs. This caused a few issues:

  1. EBS volumes couldn’t be tagged for billing.
  2. EBS volumes couldn’t be snapshotted by tag-based policies in Data Lifecycle Manager (an example policy is sketched below).
  3. Objects didn’t have a ‘Name’ tag, which made it hard to tell in the console which virtual machine instance an object belonged to.
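
To illustrate why point 2 matters, here’s a minimal sketch of a tag-based snapshot policy in Data Lifecycle Manager using boto3. The role ARN and tag values are placeholders; a volume that lost its tags would simply never match the TargetTags filter and so would never be snapshotted.

[code language="python"]
import boto3

dlm = boto3.client('dlm')

# Snapshot every volume carrying Backup=daily; untagged volumes are silently skipped
dlm.create_lifecycle_policy(
    ExecutionRoleArn='arn:aws:iam::111122223333:role/AWSDataLifecycleManagerDefaultRole',
    Description='Daily snapshots of tagged volumes',
    State='ENABLED',
    PolicyDetails={
        'ResourceTypes': ['VOLUME'],
        'TargetTags': [{'Key': 'Backup', 'Value': 'daily'}],
        'Schedules': [{
            'Name': 'DailySnapshots',
            'CreateRule': {'Interval': 24, 'IntervalUnit': 'HOURS', 'Times': ['03:00']},
            'RetainRule': {'Count': 7}
        }]
    }
)
[/code]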

I came up with two methods to add the tags back, which I’ll share with you. The tags needed to be applied at launch, whenever the auto scaling group added a server. The methods were:

  1. The auto scaling group has a Launch Configuration where the ‘User data’ field runs a script block at startup.
  2. Trigger a Lambda function, via a CloudWatch Events rule, whenever CloudTrail logs an instance launch API event.

Tagging with the User Data property and PowerShell

User data is simply:

When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts. You can pass two types of user data to Amazon EC2: shell scripts and cloud-init directives.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html

[code language="powershell"]
Try {
    # Use the instance metadata service to discover which instance the script is running on
    $InstanceId = (Invoke-WebRequest -UseBasicParsing 'http://169.254.169.254/latest/meta-data/instance-id').Content
    $AvailabilityZone = (Invoke-WebRequest -UseBasicParsing 'http://169.254.169.254/latest/meta-data/placement/availability-zone').Content
    $Region = $AvailabilityZone.Substring(0, $AvailabilityZone.Length - 1)
    $mac = (Invoke-WebRequest -UseBasicParsing 'http://169.254.169.254/latest/meta-data/network/interfaces/macs/').Content
    $URL = 'http://169.254.169.254/latest/meta-data/network/interfaces/macs/' + $mac.TrimEnd('/') + '/interface-id'
    $eni = (Invoke-WebRequest -UseBasicParsing $URL).Content

    # Get the list of volumes and tags attached to this instance
    $BlockDeviceMappings = (Get-EC2Instance -Region $Region -Instance $InstanceId).Instances.BlockDeviceMappings
    $Tags = (Get-EC2Instance -Region $Region -Instance $InstanceId).Instances.Tag
}
Catch {
    Write-Host 'Could not access the AWS API, are your credentials loaded?' -ForegroundColor Yellow
}

# Copy the instance tags to each attached EBS volume
$BlockDeviceMappings | ForEach-Object -Process {
    $volumeid = $_.Ebs.VolumeId # Volume id for this block device mapping
    $Tags | ForEach-Object -Process {
        If ($_.Key -notlike 'aws:*') { # aws:* tags are reserved and cannot be set manually
            New-EC2Tag -Region $Region -Resources $volumeid -Tags @{ Key = $_.Key; Value = $_.Value }
        }
    }
}

# Copy the instance tags to the network interface (assumes a single ENI)
$Tags | ForEach-Object -Process {
    If ($_.Key -notlike 'aws:*') {
        New-EC2Tag -Region $Region -Resources $eni -Tags @{ Key = $_.Key; Value = $_.Value }
    }
}
[/code]
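
A couple of notes on using this as user data: on Windows instances the script block needs to be wrapped in <powershell></powershell> tags so that EC2Launch/EC2Config executes it at boot, and the instance profile attached to the launch configuration needs IAM permissions for ec2:DescribeInstances and ec2:CreateTags for the cmdlets above to succeed.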

This script block is great and works a treat with newly created instances from Amazon Marketplace AMIs, e.g. a vanilla Windows Server 2019 template. The launch configuration applies the script as part of the cfn-init function at startup. Unfortunately, I’d already used cfn-init as part of the original image customisation and capture, so cfn-init would not re-run and this script block never executed. Back to the drawing board in my scenario.

Tagging with CloudWatch and Lambda Function

The second solution was to create a Lambda function and trigger it using an Amazon CloudWatch Events rule. The instance ID is parsed out of the JSON event payload that CloudWatch passes to the Lambda function.
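
For reference, here’s a minimal sketch of how that trigger could be wired up with boto3. The rule name, target ID and Lambda ARN are placeholders; you could equally build the rule in the console or CloudFormation. Note the Lambda’s execution role also needs ec2:DescribeInstances and ec2:CreateTags permissions, and CloudWatch Events needs permission to invoke the function.

[code language="python"]
import json
import boto3

events = boto3.client('events')

# Match CloudTrail RunInstances API calls recorded in this region
pattern = {
    'source': ['aws.ec2'],
    'detail-type': ['AWS API Call via CloudTrail'],
    'detail': {
        'eventSource': ['ec2.amazonaws.com'],
        'eventName': ['RunInstances']
    }
}

events.put_rule(Name='tag-on-launch', EventPattern=json.dumps(pattern), State='ENABLED')

# Point the rule at the tagging Lambda (the ARN below is a placeholder)
events.put_targets(
    Rule='tag-on-launch',
    Targets=[{'Id': 'TagOnLaunchLambda',
              'Arn': 'arn:aws:lambda:ap-southeast-2:111122223333:function:TagOnLaunch'}]
)
[/code]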

Here is the Lambda function, written in Python 2.7, which leverages the boto3 and json modules.

[code language="python"]
from __future__ import print_function
import json
import boto3

def lambda_handler(event, context):
    print('Received event: ' + json.dumps(event, indent=2))
    ids = []
    try:
        ec2 = boto3.resource('ec2')
        # Pull the instance IDs out of the CloudTrail RunInstances event
        items = event['detail']['responseElements']['instancesSet']['items']
        for item in items:
            ids.append(item['instanceId'])
        base = ec2.instances.filter(InstanceIds=ids)
        for instance in base:
            ec2tags = instance.tags
            # aws:* tags are reserved and cannot be applied manually
            tags = [n for n in ec2tags if not n['Key'].startswith('aws:')]
            print(' original tags:', ec2tags)
            print(' applying tags:', tags)
            for volume in instance.volumes.all():
                print(' volume:', volume)
                if volume.tags != ec2tags:
                    volume.create_tags(DryRun=False, Tags=tags)
            for eni in instance.network_interfaces:
                print(' eni:', eni)
                eni.create_tags(DryRun=False, Tags=tags)
        return True
    except Exception as e:
        print('Something went wrong: ' + str(e))
        return False
[/code]

The OVF package is invalid and cannot be deployed – In the trenches with the AWS Discovery Connector

I was working with a customer recently who had trouble deploying the AWS Discovery Connector to their VMware environment. AWS offer this appliance as an OVA file. For those who aren’t aware, OVA (Open Virtualisation Archive) is an open standard used to describe virtual infrastructure to be deployed on a hypervisor of your choice. Typically speaking, these files are hashed to ensure that their contents are not changed or modified in transit (prior to being deployed within your own environment).

At the time of writing, AWS offer this appliance hashed in two flavours… MD5 or SHA256. All sounds quite reasonable, right?

  • Download the OVA with a hash of your choice
  • Deploy to VMware.
  • Profit???

Wrong! I was surprised to receive an email from my customer stating that their deployment had failed with the error in the title of this post: “The OVF package is invalid and cannot be deployed.”

There’s a small clue here…

The Solution

My immediate response was to fire up Google and do some reading. Surely someone had blogged about this before? After all… I am no VMware expert. I finally arrived at the VMware knowledge base, where I began sifting through supported ciphers for ESX/ESXi and vCenter. The findings were quite interesting; they’re summarised below:

  • If your VMware cluster consists of hosts running ESX/ESXi 4.1 or earlier (hopefully no one) – MD5 is supported
  • If your VMware cluster consists of hosts running ESX/ESXi 5.x or 6.0 – SHA1 is supported
  • If your VMware cluster consists of hosts running ESX/ESXi 6.5 or greater – SHA256 is supported

In this particular case, the customer had multiple environments with a mix of 5.5 and 6.0 physical hosts. As I was short on time, I had no real way of telling whether the MD5-hashed image would deploy on a newer environment. I also don’t have a VMware development environment to test this approach on (by design).

After a few more minutes of googling, I was rewarded with another VMware knowledge base article. VMware provide a small utility called “OVFTool.” This application’s sole purpose in life is to convert OVA files (you guessed it), ensuring that they are hashed with a supported cipher of your choice. In my scenario, the file was re-written using the supported SHA1 cipher. All of this was triggered from a Windows command line by executing:

ovftool.exe --shaAlgorithm=SHA1 <source image.ova> <destination image.ova>

After this I was able to successfully deploy the AWS Discovery Connector OVA as expected using my freshly minted image.
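
As an aside, if you want to confirm which algorithm an OVA is using before converting it, remember that an OVA is just a tar archive containing a .mf manifest alongside the OVF descriptor and disks. A quick Python sketch (the file name is a placeholder):

[code language="python"]
import tarfile

# The .mf manifest lists one digest per file,
# e.g. "SHA256(disk1.vmdk)= 3a7b..." or "SHA1(appliance.ovf)= 9f2c..."
with tarfile.open('aws-discovery-connector.ova') as ova:
    manifest = next(m for m in ova.getmembers() if m.name.endswith('.mf'))
    for line in ova.extractfile(manifest).read().decode().splitlines():
        algorithm = line.split('(', 1)[0]
        print(algorithm, '->', line)
[/code]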

You can grab a copy of the tool – here

You can read more about VMware supported ciphers – here

Finally, I should call out that this solution is not specific to deploying the AWS Discovery Connector. Consider this approach if you are experiencing similar symptoms deploying another OVA based appliance in your VMware environment.

AWS ECS CloudFormation Fails – Unable to assume the service linked role.

I ran into an interesting issue when building a new ECS Cluster using CloudFormation. The CloudFormation stack would fail on Type: AWS::ECS::Service with the error:

Unable to assume the service linked role. Please verify that the ECS service linked role exists. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: beadf3d5-3406-11e9-828d-b16cd52796ef)

Okay Google, what’s this service linked role thingy?

A service-linked role is a unique type of IAM role that is linked directly to Amazon ECS. Service-linked roles are predefined by Amazon ECS and include all the permissions that the service requires to call other AWS services on your behalf.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using-service-linked-roles.html

The first few times I ran my stack, I assumed this was an IAM role that I needed to assign to the AWS::ECS::Service to perform tasks, much like an IamInstanceProfile on a Type: AWS::EC2::Instance. Reviewing the available properties for Type: AWS::ECS::Service, there was a Role definition:

  • Cluster
  • DeploymentConfiguration
  • DesiredCount
  • HealthCheckGracePeriodSeconds
  • LaunchType
  • LoadBalancers
  • NetworkConfiguration
  • PlacementConstraints
  • PlacementStrategies
  • PlatformVersion
  • Role
  • SchedulingStrategy
  • ServiceName
  • ServiceRegistries
  • TaskDefinition

Role - The name or ARN of an AWS Identity and Access Management (IAM) role that allows your Amazon ECS container agent to make calls to your load balancer.

I had some well-defined Type: AWS::IAM::Role objects in my YAML for the ECS execution and task roles, but none of them helped with the service-linked role issue, no matter how far I took the IAM policies.

Solution

To cut a long story and much googling short, the issue had nothing to do with my IAM policies. Rather, the very first ECS cluster you create in the console using the getting started wizard creates the service-linked role in the backend. If, unlike me, you had read the full article about service-linked roles, you would have seen:

when you create a new cluster (for example, with the Amazon ECS first run, the cluster creation wizard, or the AWS CLI or SDKs), or create or update a service in the AWS Management Console, Amazon ECS creates the service-linked role for you, if it does not already exist.

No mention in the above statement of CloudFormation. As per usual, I jumped straight into a CloudFormation template without a test drive of the service, and this time my attempt at being clever gave me a few moments of madness.

The easiest fix is to open up the AWS CLI and run the following against your account once, then jump back into CloudFormation for YAML fun:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com  

Resulting output:

{
    "Role": {
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17", 
            "Statement": [
                {
                    "Action": [
                        "sts:AssumeRole"
                    ], 
                    "Effect": "Allow", 
                    "Principal": {
                        "Service": [
                            "ecs.amazonaws.com"
                        ]
                    }
                }
            ]
        }, 
        "RoleId": "AROAIXGB2WBYGCXSPXY4O", 
        "CreateDate": "2019-02-19T05:55:58Z", 
        "RoleName": "AWSServiceRoleForECS", 
        "Path": "/aws-service-role/ecs.amazonaws.com/", 
        "Arn": "arn:aws:iam::112233445566:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS"
    }
}
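
If you’d rather script it than use the CLI, a minimal boto3 sketch of the same call looks like this. Catching the “already exists” error keeps it safe to run more than once:

[code language="python"]
import boto3

iam = boto3.client('iam')

try:
    role = iam.create_service_linked_role(AWSServiceName='ecs.amazonaws.com')
    print('Created: ' + role['Role']['Arn'])
except iam.exceptions.InvalidInputException:
    # Raised when AWSServiceRoleForECS has already been taken in this account
    print('Service-linked role already exists, nothing to do')
[/code]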

Job done. It all seemed so simple in retrospect.

IPv6 – slowly but surely

I first blogged about IPv6 and the reasons for its slow adoption way back in 2014. A lot can change in the world of ICT over the course of five years but, interestingly, I believe the reasons for slow adoption have remained somewhat constant. I’ve updated my post to include some new thoughts.

The first time I recall there being a lot of hype about IPv6 was way back in the early 2000s. Ever since then, the topic seems to get attention every once in a while and then disappears into insignificance alongside more exciting IT news.

The problem with IPv4 is that there are only about 3.7 billion public IPv4 addresses. Whilst this may initially sound like a lot, take a moment to think about how many devices you currently have that connect to the Internet. Globally we have already experienced a rapid uptake of Internet-connected smartphones, and the hype surrounding the Internet of Things (IoT) promises to connect an even larger array of devices to the Internet. With a global population (according to http://www.worldometers.info/world-population/) of approx. 7.7 billion people, we just don’t have enough addresses to go around.

Back in the early 2000s there was limited hardware and software support for IPv6. Now that widespread IPv6 support exists, why is it that we haven’t all switched?

Like most things in the world, it’s often determined by the capacity to monetise a change. Surprisingly, not all carriers and ISPs are on board, and some are reluctant to spend money to drive the switch. APNIC have stats (https://stats.labs.apnic.net/ipv6/) suggesting Australia is currently sitting at 14% uptake, lagging behind other developed countries.

Network address translation (NAT) and Classless Inter-Domain Routing (CIDR) have made it much easier to live with IPv4. NAT, used on firewalls and routers, lets many nodes in a network sit behind a single public IP address. CIDR, sometimes referred to as supernetting, is a way to allocate and specify Internet addresses in a much more flexible manner than the original system of Internet Protocol (IP) address classes. As a result, the number of available Internet addresses has been greatly increased, and service providers can conserve addresses by divvying up pieces of a full range of IP addresses to multiple customers.

Unsurprisingly, enterprise adoption in Australia is slow; perceived risk comes into play. It is plausible that many companies view the introduction of IPv6 as somewhat unnecessary and potentially risky, in terms of both the effort required to implement it and the loss of productivity during implementation. Most corporations are simply not feeling any pain with IPv4, so it’s not on their short-term radar as being of any level of criticality to their business. When considering IPv6 implementation from a business perspective, the successful adoption of new technologies is typically accompanied by some form of reward or competitive advantage associated with early adoption. The potential for financial reward is often what drives significant change.

To IPv6’s detriment, from the layperson’s perspective it has little to distinguish itself from IPv4 in terms of services and service costs. Many of IPv4’s shortcomings have been addressed. Financial incentives to commence widespread deployment just don’t exist.

We have all heard the doom and gloom stories associated with the impending end of IPv4. Surely this should be reason enough for accelerated implementation of IPv6? Why isn’t everyone rushing to implement IPv6 and mitigate future risk? The predicted situation where exhaustion of IPv4 addresses causes a rapid escalation in costs to consumers hasn’t really happened yet, and so it has failed to be a significant factor in encouraging further deployment of IPv6 across the Internet.

Another factor to consider is backward compatibility. IPv4 hosts are unable to address IP packets directly to an IPv6 host, and vice versa. This means it is not realistic to just switch a network over from IPv4 to IPv6. When implementing IPv6, a significant period of dual-stack IPv4 and IPv6 coexistence needs to take place, where IPv6 is turned on and run in parallel with the existing IPv4 network. Again, from an enterprise perspective, I suspect this just sounds like two networks instead of one, and double the administrative overhead, for most IT decision makers.

Networks need to provide continued support for IPv4 for as long as there are significant levels of IPv4 only networks and services still deployed. Many IT decision makers would rather spend their budget elsewhere and ignore the issue for another year.

Only once the majority of the Internet supports a dual-stack environment can networks start to turn off their continued support for IPv4. Therefore, while there is no particular competitive advantage to be gained by early adoption of IPv6, the collective Internet-wide decommissioning of IPv4 is likely to be determined by the late adopters.

So what should I do?

It’s important to understand where you are now and arm yourself with enough information to plan accordingly.

  • Check if your ISP is currently supporting IPv6 by visiting a website like http://testmyipv6.com/. There is a dual stack test which will let you know if you are using IPv4 alongside IPv6. (A quick scripted check is sketched after this list.)
  • Understand whether the networking equipment you have in place supports IPv6.
  • Understand whether all your existing networked devices (everything that consumes an IP address) support IPv6.
  • Ensure that all new device acquisitions fully support IPv6.
  • Understand whether the services you consume support IPv6. (If you are making use of public cloud providers, understand whether the services you consume support IPv6 or have a road map to it.)
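
For the scripted check mentioned above, here’s a minimal Python sketch that tests whether your machine can resolve and reach a host over IPv6. The hostname is just an example; any dual-stacked site will do.

[code language="python"]
import socket

host = 'www.google.com'  # example dual-stacked site

try:
    # Ask only for AAAA results; raises socket.gaierror if none exist
    addrinfo = socket.getaddrinfo(host, 443, socket.AF_INET6, socket.SOCK_STREAM)
    address = addrinfo[0][4]
    with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
        s.settimeout(5)
        s.connect(address)
    print('IPv6 connectivity looks good: ' + address[0])
except (socket.gaierror, OSError) as e:
    print('No usable IPv6 path: ' + str(e))
[/code]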

Whilst there is no official switch-off date for IPv4, the reality is that IPv6 isn’t going away, and as IT decision makers we can’t postpone planning for its implementation indefinitely. Take the time now to understand where your organisation is at. Make your transition to IPv6 a success story!

Planning a move to the cloud with the AWS Application Discovery Service

Here at cloudstep, we love to help our customers achieve their goals. We believe that the cloud is a tool in the toolbox, and we can use that multi-faceted tool to help our customers realise success. Planning for success starts with goals, and goals come in many different shapes and sizes.

For any given solution, a customer’s goal may be focused on achieving financial or competitive advantage. Alternatively, they may be looking to realise operational efficiency by improving a day-to-day process using automation and orchestration. No matter your goal, you need a solid plan to ensure success. More often than not, that starts with validating that you have a sound understanding of the current-state environment, which will enable you to move forward towards achieving your goals.

Today I want to talk about a capability provided as part of the Migration Hub offering in AWS: the Application Discovery Service. This is a tool that we regularly use and encounter when meeting with customers. The core idea behind this (aptly named) capability is to help you discover critical details about your environment. This includes performance metrics and resource utilisation data, which can be used for cost modelling (in our case, cloudstep.io). The tooling can also gather detailed network metrics to help you better understand the integrations and interfaces between applications in your environment. All of this data is at your disposal once you have decided which deployment model you would like to utilise.

AWS offer both an agentless discovery service and an agent-based discovery service. Ordinarily, we use the agentless discovery service. This is a great approach for organisations that operate entirely virtualised VMware infrastructure. It allows you to quickly inventory each of the VMs that reside within your vCenter, without the requirement of installing an agent on each guest VM. Choosing this path means the agentless discovery service will query the VMware vCenter for performance metrics (irrespective of which OS the guest is running). It can’t actually reach inside the virtual machine, so it is dependent on having a compatible version of VMware Tools running inside each VM.

If you have a mixture of physical and virtual servers in your fleet, or you run another hypervisor (such as Hyper-V), you may need to consider the agent-based deployment model. This approach is generally considered more labour-intensive to get up and running, due to the requirement to get hands-on with each server. There are also some constraints around which OSs it can fetch data from, so be mindful of this. You may even find that the best approach is to run a mix of the two deployment models. The outcome of both approaches is a series of performance metrics shipped outbound over HTTPS to an S3 bucket. This bucket can then be queried by the AWS Migration Hub service. Alternatively, you can export the data and analyse it using tooling of your choice.

For the remainder of the article, I will focus on our experience with the Agentless discovery approach. As I mentioned earlier, this is our preferred approach because it takes about an hour to get up and running and it generally produces more than enough quality data. In our experience, this provides an excellent baseline for commencing our cloudstep.io cost modelling engagement.

The AWS Agentless discovery connector operates as a VMware appliance within your vCenter environment. AWS provide a pre-canned OVA file which is around 2GB in size. You simply deploy this, the same way you would with any other open virtualisation archive. If you run multiple vCenters for different physical locations, you will need to deploy multiple instances of the appliance to service each stack.

If you experience issues deploying the OVA image within VMware, review my other blog – here

Deploying these appliances in enterprise environments often presents unique challenges, and in our experience this is where customers tend to have issues. Sometimes they deploy the appliances to management networks which don’t provide DHCP, so they need to manually bind IP addresses, or there may be firewall rules which prevent connections from an access-layer switch to perform the configuration process. The appliance does offer a terminal console (sudo setup.rb) where you can configure foundation services such as IP configuration and DNS servers.

Another consideration is “How will my appliance get outbound access to the internet?” After all, its sole purpose is to ship data outbound using HTTPS to an AWS S3 bucket via the Migration Hub. From a firewalling perspective this is usually quite nice, as outbound TCP 443 generally doesn’t warrant a discussion with your security team. However, should your security team raise concerns about corporate data being shipped off to the internet, AWS provide a detailed article on exactly what information is collected – here.

A final consideration is proxy servers. If you utilise upstream proxy servers to police internet access, consider any rules you may need to define here. Typically, the appliance will run headless in a “SYSTEM” context, so you may need to allow it unauthenticated outbound internet access. Take a moment to think through any pitfalls you may encounter, and also consider how you intend to interface with the appliance.

Once you have deployed your shiny new VM, you can fire up a web browser and configure it using the native web interface (http://<appliance IP address>). There are two things you will need:

  1. Read-only credentials to the vCenter you will inventory.
  2. AWS IAM Credentials to authenticate to the Migration Hub service.

Once you have completed the wizard, you will be greeted with a summary screen that presents instance-specific configuration, such as the appliance’s AWS connector ID.

The final step is to start the data collection. You can action this by making an API call using the AWS CLI:

aws discovery start-data-collection-by-agent-ids --agent-ids <connector ID>

Alternatively, you can navigate to the Migration Hub console and manually approve the data collection process. If you have more than one appliance, you will have multiple connector IDs registered here. You can validate that these line up by browsing to the appliance web interface, where it will list its respective connector ID. The service polls the vCenter environment every 60 minutes, so it is reasonable to expect your data to be queryable within the AWS Migration Hub within an hour or two, assuming everything is functioning as expected. Alternatively, you can export the collected data to a CSV to commence your migration analysis.
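
For completeness, the same steps can be scripted with boto3. A minimal sketch, assuming your credentials point at the region your Migration Hub data lives in:

[code language="python"]
import boto3

# Region is an assumption; use your Migration Hub home region
discovery = boto3.client('discovery', region_name='us-west-2')

# List the registered connectors/agents and their health
agents = discovery.describe_agents()['agentsInfo']
for agent in agents:
    print(agent['agentId'], agent['health'])

# Kick off collection for every registered connector
ids = [agent['agentId'] for agent in agents]
discovery.start_data_collection_by_agent_ids(agentIds=ids)

# Later on: export everything collected so far as CSV
export = discovery.start_export_task(exportDataFormat=['CSV'])
print('Export task: ' + export['exportId'])
[/code]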

In this blog I have explored the Application Discovery Service, a capability provided by AWS’ Migration Hub. We have talked through common pitfalls that customers often experience when working with the agentless discovery service, in an effort to simplify the deployment process. The data collected provides powerful insights into your environment, which is crucial to success when planning a cloud migration. Should you need further assistance, do not hesitate to reach out to the team at cloudstep.io. We’d love to hear from you and to help you on the road to success.

To the cloud!