How To SSH Into Private Servers Via A Bastion Without Copying SSH Keys

This is a documentation of the process of accessing private EC2 instances through a bastion server created in the public subnet, as a follow-up to the article on setting up a proper cloud infrastructure with basic security for applications on AWS.

Motivation

Once in a while, we need to communicate with the production servers to do checks. The proper way is to set up a bastion server in the public subnet and ssh into the private instances through it. However, setting the bastion server up with the proper configurations can be time consuming to get right.

On top of that, once we are done, there is a financial incentive to shut the bastion server down to save cost. This translates to more time spent spinning it up and tearing it down.

Since we might not do this often, we tend to forget how to set up or shut down the bastion server properly. This translates to more time spent debugging during each attempt should any step be missed along the way.

Hence, it will be nice to have these processes recorded down in code.

Terraform Setup For Bastion Server

The terraform files to set up the bastion server are shown below. This is a complete copy of the snippet in the article on setting up a standard AWS VPC using terraform. The explanation is there, so I will not be covering it here.

# bastion
resource "aws_security_group" "bastion" {
  name = "${var.project_name}${var.env}-bastion"
  description = "For bastion server ${var.env}"
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

resource "aws_security_group_rule" "ssh-bastion-world" {
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  # Please restrict your ingress to only necessary IPs and ports.
  # Opening to 0.0.0.0/0 can lead to security vulnerabilities
  # You may want to set a fixed ip address if you have a static ip
  security_group_id = aws_security_group.bastion.id
  cidr_blocks = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "ssh-bastion-web_server" {
  type = "egress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  security_group_id = aws_security_group.bastion.id
  source_security_group_id = aws_security_group.web_server.id
}

resource "aws_instance" "bastion" {
  ami = "ami-061eb2b23f9f8839c"
  associate_public_ip_address = true
  instance_type = "t2.nano"
  subnet_id = aws_subnet.public-ap-southeast-1a.id
  vpc_security_group_ids = ["${aws_security_group.bastion.id}"]
  key_name = aws_key_pair.main.key_name

  tags = {
    Name = "bastion-${var.project_name}${var.env}"
  }
}

resource "aws_key_pair" "main" {
  key_name = "${var.project_name}-${var.env}"
  public_key = "ssh-rsa something"
}

output "bastion_public_ip" {
  value = aws_instance.bastion.public_ip
}

Retrieving AWS EC2 Client

I will be using Ruby to carry out the ssh process because I am really bad with shell script 🙁

These snippets are translated from a rake task, which is what I use for my projects. There may be errors here and there, so do understand the process rather than just copying it wholesale!

require 'aws-sdk'

aws_access_key_id = `aws --profile #{aws_profile} configure get aws_access_key_id`.chomp
aws_secret_access_key = `aws --profile #{aws_profile} configure get aws_secret_access_key`.chomp

ec2_client = Aws::EC2::Client.new(
  region: region,
  access_key_id: aws_access_key_id,
  secret_access_key: aws_secret_access_key
)

First, initialize an instance of the EC2 client. That requires the correct access key id and secret access key, which can be easily retrieved by running the aws configure command in the shell.

Lines 3 and 4 do this, given the desired aws_profile, and store the values in the respective ruby variables for use in the rest of the script.

Note that, among the various ways to execute shell commands in Ruby, the back ticks (`) should be used here, as they are the one that returns the output we need. The chomp method removes the line break that comes along with the output from the shell.
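
To illustrate the difference, here is a minimal Ruby comparison using a hypothetical profile name, my-profile; only the back ticks capture the command's standard output as a string.

# Back ticks return the command's stdout as a string (with a trailing newline),
# which chomp then strips.
access_key = `aws --profile my-profile configure get aws_access_key_id`.chomp

# system, in contrast, only returns true/false (or nil) and prints the output
# straight to the terminal, so the value cannot be captured this way.
success = system('aws --profile my-profile configure get aws_access_key_id')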

Retrieving The Bastion Server Instance

results = ec2_client.describe_instances(
  filters: [
    {
      name: 'instance.group-name',
      values: ["#{project_name}#{env}-bastion"]
    }
  ]
)

raise 'There are more than 1 reservations. Please check!' if results.reservations.count > 1

raise 'There are no reservations. Please check!' if results.reservations.count.zero?

instances = results.reservations.first.instances

raise 'There are more than 1 bastion servers. Please check!' if instances.count > 1

bastion = instances.first

Next, we retrieve the bastion instance via the describe_instances method, as shown on line 1.

On lines 4 and 5, we narrow down the instances to search for using the instance.group-name filter. This filter refers to the security group that we attached to the bastion instance (see the terraform files).

The next few lines handle the unexpected scenarios of having multiple reservations or instances. I do not know the difference between these 2 entities off the top of my head (a reservation groups the instances launched by a single launch request), but I guess that distinction is trivial to our mission here.

Eventually, we will have access to the bastion instance.

Retrieving The Private Web Server Instance

results = ec2_client.describe_instances(
  filters: [
    {
      name: 'instance.group-name',
      values: ["#{project_name}#{env}-web-servers"]
    }
  ]
)

raise 'There are no reservations' if results.reservations.count.zero?

private_ip_addresses = results.reservations.map do |reservation|
  reservation.instances.map(&:private_ip_address)
end.flatten

raise 'There are no private_ip_addresses.' if private_ip_addresses.count.zero?

instance_ip = private_ip_addresses.first

Next, we retrieve the web server instances using the same method, filtering on their security group’s name. Our target here is the private ip address of any one of the instances. Adjust accordingly if there is a particular private instance you are trying to access; one way of doing so is sketched below.
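
For example, if the target instance carries a Name tag, the same client can filter on that tag instead. A minimal sketch, assuming a hypothetical tag value of web-server-1:

results = ec2_client.describe_instances(
  filters: [
    {
      # 'tag:Name' filters on the value of the instance's Name tag
      name: 'tag:Name',
      values: ['web-server-1'] # hypothetical tag value
    }
  ]
)

instance_ip = results.reservations.first.instances.first.private_ip_address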

SSH Into Private Web Server Via Bastion Server

Here comes the main event, the ssh operation.

sh 'ssh-add -D'
sh "ssh-add -K #{Rails.root}/#{project_name}-#{env}"
sh('ssh ' \
'-tt ' \
'-A ' \
"-i #{Rails.root}/#{project_name}-#{env} " \
"ec2-user@#{bastion.public_ip_address} " \
'-o StrictHostKeyChecking=no ' \
"-o 'UserKnownHostsFile /dev/null' " \
"\"ssh ec2-user@#{instance_ip} " \
"-o 'UserKnownHostsFile /dev/null' " \
'-o StrictHostKeyChecking=no"')
sh 'ssh-add -D'

We use the sh utility method in Ruby (provided by rake) to execute the shell commands. We are not going to use the output of the commands here, so back ticks are not necessary.

In this session, line 1 clears all ssh identities present, if any, with the -D option. Note that even though each sh call spawns a new shell, ssh-add talks to the same ssh agent as the current terminal (through the inherited SSH_AUTH_SOCK socket), so identities added earlier may still be loaded; clearing them ensures that only the key added in the next line gets forwarded. I am keeping it here for hygiene’s sake.

Line 2 adds the RSA identity of the key pair that was used to create the bastion instance to the ssh agent in the current session.

This step is extremely pivotal. It allows us to forward our ssh agent along with the required RSA identity to the bastion server. The bastion server will subsequently be able to authenticate with the web server due to the forwarded identity.

And realise this: all of this is done without the bastion server ever possessing the ssh keys at all! This is immensely beneficial on the security side of things because the bastion server, as a server in the public subnet exposed to the Internet, is a point of vulnerability for your private instances. It would pose a security risk if attackers were able to access the private instances using ssh keys stored on the bastion server. But since the keys are not there, we can make sure Gandalf sees to them.

The command from line 3 onwards is the actual main ssh command.

Line 4 forces pseudo-terminal allocation so that we can interact with the web server once the connection is established. Doubling the t option forces the allocation even if ssh has no local tty.

Line 5 forwards the ssh agent through the tunnel. Since we have added the RSA identity to our ssh agent in this session, the authentication key is forwarded in the process, without making a copy on the bastion server itself.

Line 6 points to the identity file required to access the bastion server from our local machine. In my case this is an extra step that is not strictly necessary: I have set up the web and bastion servers to use the same key pair, and that key has already been added to the ssh agent in line 2, so ssh can pick it up from the agent without an explicit identity file.

Line 7 states the user and public ip address of the bastion server that we ssh into.

Line 8 prevents ssh from asking for our confirmation of the host key before carrying out the operation.

Line 9 prevents our bastion’s ip address from being registered as a known host in our ssh known_hosts file. As these bastion servers are meant to be shut down after use, their ip addresses will be different each time we spin them up. Hence, this option prevents the unnecessary and unmonitored bloating of our known_hosts file.

Line 10 is the command to run after we have successfully sshed into the bastion server. In this case, we are running a subsequent ssh command to get into the web server instance via the private ip address that we found earlier. Note that this command is wrapped in quotes.

To reiterate a few pointers here:

  • This subsequent ssh command does not require any identity file due to the RSA identity that is forwarded
  • The bastion server can access the private instances due to the setup of the security groups
  • The shell interaction session between the bastion server and the web instance is available to use on our local machine due to the -tt option mentioned in line 4.

Lines 11 and 12 serve the same purpose as lines 8 and 9, but this time for the bastion server’s ssh operation into the web server.

Line 13, for hygiene’s sake, clears the RSA identity from the ssh agent.
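
Since the script raises on several unexpected scenarios, a small improvement is to wrap the sequence so the identity is always cleared from the agent even if something in between fails. A minimal sketch of the same commands, not a drop-in replacement for the original rake task:

key_path = "#{Rails.root}/#{project_name}-#{env}"

begin
  sh 'ssh-add -D'
  sh "ssh-add -K #{key_path}"
  sh "ssh -tt -A -i #{key_path} ec2-user@#{bastion.public_ip_address} " \
     "-o StrictHostKeyChecking=no -o 'UserKnownHostsFile /dev/null' " \
     "\"ssh ec2-user@#{instance_ip} -o StrictHostKeyChecking=no -o 'UserKnownHostsFile /dev/null'\""
ensure
  # always clear the forwarded identity, even if the ssh command above fails
  sh 'ssh-add -D'
end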

Conclusion

There you have it: an ssh session via a bastion server, without copying the ssh keys onto it, to ensure minimum vulnerability and maximum security!

AWS Lambda and API Gateway Integration With Terraform To Collect Emails

This is a documentation on creating a service that collects emails. It runs on serverless technology utilizing AWS lambda and API Gateway. It is also made easy to deploy to the cloud with infrastructure as code via Terraform in the form of a plug-and-play methodology.

Motivation

Often, I have to make static websites that are not exactly completely static because they require a backend to collect emails. While 3rd party services like Mailchimp and SendGrid have their own SDKs to support easy integration for email collection, we have to worry about hitting the limits of their packages and plans. This translates to stress for developers, as we have to find a solution quickly and properly. If this happens on a weekend or a Friday, which somehow is always the case as more people are surfing the net then, the intensity is amplified.

For a new website, it is very hard to gauge the traffic and thus the plan required for the 3rd party service. This poses difficulties when budgeting for the project. Underutilizing the service also translates to unnecessary cost. The best kind of plan for such a website is a pay-as-you-go model, in my opinion, and that can be achieved by integrating with cloud providers like AWS.

Technology Stack

AWS Lambda

Enter AWS Lambda, where you only pay for what you use. You do not need to fork out money at the start of your project; instead, you just pay for how much you use, relieving you of the worry of wasting money on resources you are not using. In fact, cost only becomes an issue once you are hitting 1 million potential users signing up with their email every month, because AWS Lambda comes with 1 million free requests every month before they start charging. That is highly unlikely for a new website, which means you now have a backend for your static website for free.

API Gateway

For the serverless function, that is AWS Lambda, to be reachable from the Internet via an API, we need API Gateway. It exposes the serverless function to the World Wide Web through an HTTPS endpoint, running over the encrypted transport layer security (TLS) protocol to uphold security by default. This allows your websites to use the serverless function via API calls.

Terraform

To set up the infrastructure, the usual way is to navigate the AWS management console, deploy the required AWS resources and link them up. This can be a challenge if you are not familiar with the required configurations. Not only will this translate to a loss of precious time debugging these issues, time which developers could otherwise have spent with their loved ones to challenge the meme below, but it will also lead to frustration.

While frustration is part and parcel of life as a programmer, we can also avoid it with our knowledge of code. Here is where Terraform enters the fray. It is an Infrastructure as Code tool where you write the configuration of the infrastructure once and can deploy it multiple times, without having to go through the whole forest of the AWS console each time. This means you do not need to remember every single step, and you do not need to deal with surprise bugs because you forgot one of them or, worse, had a spelling error.

Programming is like magic. You write very specific instructions in arcane languages to invoke commands, and if you get it even a little bit wrong you risk unleashing demons and destroying everything.

— Diana Carrier (@artemis_134) June 23, 2018

Since the infrastructure blueprint is in code, we can leverage version control features with git and work together to improve the code base along the way, without fear of not being able to roll back to the previous working configuration.

Terraform Files

I will start off with the terraform files required to set up the infrastructure to deploy the code, beginning with the place to store our emails.

The database – AWS DynamoDB

I will store the emails collected in AWS’s own noSQL database DynamoDB. This is a fast, simply structured and schemaless storage which fits my use case very aptly.

It allows fast and simultaneous writes at high speed, so there is no fear of race conditions from spikes in signup volume during a PR event promoting the product and getting people to leave their emails on the website.

Since it is schemaless, we can easily add new user details that we would like to collect on top of their emails along the way, without having to migrate or fiddle with the structure of the database. With proper metaprogramming, you do not need to touch the backend code either, leaving only the frontend work of adding the new text fields for data collection.

To be fair, we could also use a traditional relational database management system (RDBMS) for this project. It speaks SQL, which is a language most, if not all, developers who have ever touched a database would know. There is no real need for fancy noSQL in this simple project. In addition, the chances of leveraging the scaling advantage of noSQL over SQL databases are low, because you will need a lot of traffic for that to become a worry, and for a new website that is highly unlikely to happen.

However, highly influenced by cost, I am still sticking with DynamoDB in this case. To set up AWS RDS to host a managed relational database, the cheapest MySQL instance already goes for around 20 USD a month, as compared to the pay-as-you-go model that DynamoDB employs. On top of that, DynamoDB has a generous amount of free usage and storage under its free tier, and this free tier does not last for only the first 12 months after signup but forever, unlike its RDS counterpart. We probably will NOT incur any cost using DynamoDB unless your marketing is brilliant for your new website.

resource "aws_dynamodb_table" "main" {
  name = "${var.project_name}-dynamodb_table"
  billing_mode = "PROVISIONED"
  read_capacity = var.dynamodb-read_capacity
  write_capacity = var.dynamodb-write_capacity
  hash_key = "email"

  attribute {
    name = "email"
    type = "S"
  }
}

Provisioning the database is the simplest part. I am using Terraform variables to set the number of read and write capacity units required, as well as the table name, for robustness’ sake.

I have set the billing mode to “provisioned” for simplicity’s sake. After all, I am not expecting any insane burst of traffic for a site that is not popular. Even if that happens, maybe due to some incredible promotion at some hugely popular event, I do not expect the load to require me to scale the read and write capacities of the database. It is going to be a quick write of a few bytes.

On top of that, provisioned capacity means less configuration is needed for the permissions to autoscale the capacities of the database. It can take some time to configure that, and since it is outside the topic of this article, I will stick to the “provisioned” billing mode.

The hash_key, or “partition key” in other definitions, is analogous to the primary key in a SQL database table. It requires specific details under the attribute block. You can specify the range_key, or “sort key”, here if you require one, and remember to add an attribute block to describe it as well.

Other attributes that are neither the partition key nor the sort key do not need an attribute block in this file. You can simply write them to the database and they will register. After all, this is a schemaless database.
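
To illustrate the schemaless point with the Ruby SDK, here is a sketch that writes an item with a hypothetical extra first_name attribute to the table defined above; no change to the Terraform file is needed for it.

require 'aws-sdk-dynamodb'

ddb = Aws::DynamoDB::Client.new(region: 'eu-west-1')

# Only the hash key 'email' is declared in Terraform; 'first_name' is a
# hypothetical extra attribute that DynamoDB simply accepts on write.
ddb.put_item(
  table_name: 'todo-project-dynamodb_table',
  item: {
    'email' => 'test@test.com',
    'first_name' => 'Vic'
  }
)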

On top of that, it is a fully managed database, so it comes with all the goodies like backup and version maintenance to spare developers from all these chores.

The backend – AWS Lambda

Next is the lambda function. It is written in JavaScript using Node.js. The file below is the configuration file to set up the infrastructure required. Let’s dive into it.

resource "aws_lambda_function" "main" {
  filename = var.zipfile_name
  function_name = "${var.project_name}"
  role = aws_iam_role.main.arn
  handler = "index.handler"

  source_code_hash = "${filebase64sha256("${var.zipfile_name}")}"

  runtime = "nodejs12.x"
}

resource "aws_iam_role" "main" {
  name = "${var.project_name}-iam_lambda"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_policy" "main" {
  name = "main"
  path = "/"
  description = "IAM policy for lambda to write to dynamodb table and logging"

  policy = templatefile("${path.module}/lambda_policy.tmpl", { dynamodb_arn = aws_dynamodb_table.main.arn })
}

resource "aws_iam_role_policy_attachment" "main" {
  role = "${aws_iam_role.main.name}"
  policy_arn = "${aws_iam_policy.main.arn}"
}

resource "aws_lambda_permission" "main" {
  statement_id = "AllowExecutionFromAPIGateway"
  action = "lambda:InvokeFunction"
  function_name = aws_lambda_function.main.function_name
  principal = "apigateway.amazonaws.com"

  source_arn = "${aws_api_gateway_rest_api.main.execution_arn}/*/*/*"
}

The backend code is uploaded as a zip file, with its base64 hash (source_code_hash) used to detect changes. The code will need to be compressed and zipped before this action; we will see how to automate this process later.

This lambda function will need the permissions to write to the dynamoDB table. This is done using

  • aws_iam_role to establish trust between the 2 AWS services
  • aws_iam_policy to give permission for the lambda function to access the database resource and perform the PutItem action. Details of the policy are interpolated via a template file, which we will go through later
  • aws_iam_role_policy_attachment to bind the aws_iam_role to the aws_iam_policy on the lambda function
  • aws_lambda_permission to allow API Gateway to integrate with the lambda function and invoke it

The template file for the aws_iam_policy is shown below. It lists the actions that the lambda function is permitted to perform on the specified dynamodb table. It also contains the permissions for the lambda function to push its logs to AWS Cloudwatch. By the way, these logging permissions are the default permissions for a lambda function, and this template adds the DynamoDB permissions on top of them. Note the dynamodb_arn variable that is interpolated, which justifies the use of the template file instead of hardcoding the whole policy in the main terraform file, for robustness’ sake.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "dynamodb:PutItem",
      "Resource": "${dynamodb_arn}",
      "Effect": "Allow"
    },
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}

The API layer – AWS API Gateway

The API Gateway is required to expose the lambda function for consumption by servers and websites via a URL endpoint. The endpoint will be served over HTTPS, which requires some extra configuration as documented below.

resource "aws_api_gateway_rest_api" "main" {
  name = var.project_name
}

resource "aws_api_gateway_resource" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id = aws_api_gateway_rest_api.main.root_resource_id
  path_part = "email"
}

resource "aws_api_gateway_integration" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = aws_api_gateway_method.main.http_method
  integration_http_method = aws_api_gateway_method.main.http_method
  type = "AWS_PROXY"
  uri = aws_lambda_function.main.invoke_arn
}

resource "aws_api_gateway_integration_response" "main" {
  depends_on = [aws_api_gateway_integration.main]

  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = aws_api_gateway_method.main.http_method
  status_code = aws_api_gateway_method_response.main.status_code
}

resource "aws_api_gateway_method" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = "POST"
  authorization = "NONE"
}

resource "aws_api_gateway_deployment" "main" {
  depends_on = [
    "aws_api_gateway_integration_response.main",
    "aws_api_gateway_method_response.main",
  ]
  rest_api_id = aws_api_gateway_rest_api.main.id
}

resource "aws_api_gateway_method_settings" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name = aws_api_gateway_stage.main.stage_name
  
  # settings not working when specifying the single method
  # refer to: https://github.com/hashicorp/terraform/issues/15119
  method_path = "*/*"

  settings {
    throttling_rate_limit = 5
    throttling_burst_limit = 10
  }
}

resource "aws_api_gateway_stage" "main" {
  stage_name = var.stage
  rest_api_id = aws_api_gateway_rest_api.main.id
  deployment_id = aws_api_gateway_deployment.main.id
}

resource "aws_api_gateway_method_response" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = aws_api_gateway_method.main.http_method
  status_code = "200"
}

output "endpoint" {
  value = "${aws_api_gateway_stage.main.invoke_url}${aws_api_gateway_resource.main.path}"
}

So let’s break it down.

The aws_api_gateway_rest_api represents the project in its entirety.

The aws_api_gateway_resource refers to each api route of this project, and there is only 1 in this case.

I have set up only 1 stage environment with aws_api_gateway_stage for this project, using a Terraform variable. You can set up different stages to differentiate the staging and production environments.

The aws_api_gateway_stage is associated with an aws_api_gateway_method_settings that sets the throttling rate of the API to prevent spam and overloading. For the method_path property, the wildcard route is used to apply the settings to all routes instead of the only API route that was created. It is trivial in this case, but the reason for picking this “easy” route is simply a bug: if I were to specify the exact route, which is in the form of {resource_path}/{http_method}, the throttling settings would not propagate. It was documented here on github but was not properly resolved. Leaving it as is for now.

The aws_api_gateway_deployment configures the deployment of the API. Note the depends_on attribute that was assigned. This explicit dependency is critical to ensure the deployment is called into effect after all the necessary resources have been provisioned.

The aws_api_gateway_integration configuration sets the integration to lambda proxy using the POST HTTP method without any authorization, as specified by the aws_api_gateway_method configuration. Lambda proxy allows us to handle the request like we would in a typical backend web application framework. The full request object is passed to the lambda function, and the API Gateway plays no part in mapping any of the request parameters. The API Gateway mapping has great potential to integrate interfaces properly, but for our use case it is not necessary. I find this article doing a great job in explaining the API Gateway features with easy-to-consume information and summaries, like a gameshark guide book written by the half-blood prince. Do take a look to understand AWS API Gateway better.

The aws_api_gateway_integration_response is responsible for handling the response from the lambda function. This is where we can make changes to the headers returned from the lambda function using the response_parameters property, which is not used in this case. This is also the place to map and transform the response data from the backend to fit the desired data structure using the response_templates property.

The aws_api_gateway_method_response is where we can filter what response headers and data from aws_api_gateway_integration_response to pass on to the caller.

The transformation and mapping of the headers and data from the backend (i.e. the lambda function) in aws_api_gateway_integration_response, and the filtering of headers and data before passing them to the frontend in aws_api_gateway_method_response, are not needed in this sample application. It is just good knowledge to have. There are 2 reasons why we do not need them here.

First, in a bit, we will go through the frontend, which will make an API call that is a simple request. A simple request does not require a preflight request, the API call made by browsers prior to the actual API call, because simple requests are deemed safe as they only use standard CORS-safelisted request headers. In the event that one does need a preflight request, because one is not making a simple request, we will need to set up another API route that transforms the headers returned from the backend and passes the relevant headers on to the frontend for this preflight request. This will allow the frontend website to overcome the CORS policy enabled by default in modern browsers. It also means configuring a new set of aws_api_gateway_method, aws_api_gateway_integration, aws_api_gateway_integration_response and aws_api_gateway_method_response just for this preflight request. Things can get complicated here, so I will leave it out of this article. If you still need to implement CORS, this gist (https://gist.github.com/keeth/6bf8b67c82f9a085e03ecbb289a859d6) is a good reference.

Second, we are using lambda proxy integration, so the full response from the lambda will be passed to the front end and mapped automatically, provided the response from the lambda code is properly formatted. Refer to this documentation for more details on it.

Lastly, the output block will print the value of the endpoint of the API for us to integrate into our frontend.

The Admin Stuff

This file contains the details that we need to set up terraform and the variables we are using. The provider’s region attribute here is hardcoded, which should ideally not be the case; I have yet to figure out how to make this dynamic and robust. The names with the todo- prefix should be changed to fit the project.

We are using an S3 bucket as the Terraform backend to hold the state of the infrastructure provisioned by Terraform. Creation of the bucket will be automated via a script that we will go through in the section on deployment.

provider "aws" {
  version = "~> 2.24"
  region = "eu-west-1"
}

terraform {
  required_version = "~> 0.12.0"
  backend "s3" {
    bucket = "todo-project-tfstate"
    key = "terraform.tfstate"
    region = "eu-west-1"
  }
}

variable "project_name" {
  type = string
  default = "todo-project"
}

variable "region" {
  type = string
  default = "eu-west-1"
}

variable "stage" {
  type = string
  default = "todo-stage"
}

variable "zipfile_name" {
  type = string
  default = "todo-project.zip"
}

variable "dynamodb-read_capacity" {
  type = number
  default = 1
}

variable "dynamodb-write_capacity" {
  type = number
  default = 1
}

The Application

Here is the application code, written in Node.js. It is a simple write to DynamoDB with basic error handling. It takes in only 1 parameter, the email. This code can definitely be improved by allowing more parameters to be written to the database in a dynamic way, so that the same code base can be used for a site that collects the first and last name of the user, as well as another site that collects the user’s date of birth. I will leave that as a future personal quest.

// Load the AWS SDK for Node.js
const AWS = require('aws-sdk');

// Set the region 
AWS.config.update({region: 'eu-west-1'});

// Create the DynamoDB service object
const ddb = new AWS.DynamoDB({apiVersion: '2012-08-10'});

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2));
  const params = {
    TableName: 'todo-project-dynamodb_table',
    Item: {
      'email' : {S: JSON.parse(event.body).email}
    }
  };

  // Call DynamoDB to add the item to the table and await the result
  try {
    const result = await ddb.putItem(params).promise();
    console.log("Result", result);
    const response = {
      statusCode: 204,
      headers: {
        "Access-Control-Allow-Origin" : "*",
      },
    };
    return response;
  } catch(err) {
    console.log(err);
    const response = {
      statusCode: 500,
      headers: {
        "Access-Control-Allow-Origin" : "*",
      },
      body: JSON.stringify({ error: err.message }),
    };
    return response;
  }
};

A thing to note here is the need to return the Access-Control-Allow-Origin header in the response. The response also has to follow a particular, but straightforward and common, format for the lambda proxy integration with API Gateway to work. This maps the response properly to the API Gateway method response, and it is returned to the frontend websites in a way that overcomes the CORS policy implemented by modern browsers.

Deployment

I will be using 3 Ruby scripts for the deployment-related tasks, namely init.rb, apply.rb and destroy.rb, together with a helper service object, get_aws_profile.rb.

Let’s take a look at them.

get_aws_profile.rb

# get_aws_profile.rb

class GetAwsProfile
  def self.call
    aws_profile = "todo-aws_profile"

    begin
      aws_access_key_id = `aws --profile #{aws_profile} configure get aws_access_key_id`.chomp
      abort('') if aws_access_key_id.empty?

      aws_secret_access_key = `aws --profile #{aws_profile} configure get aws_secret_access_key`.chomp
      abort('') if aws_secret_access_key.empty?
    rescue Errno::ENOENT => e
      abort("Make sure you have aws cli installed. Refer to https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html for more information.")
    end

    p "AWS_ACCESS_KEY_ID = #{aws_access_key_id}"
    p "AWS_SECRET_ACCESS_KEY = #{aws_secret_access_key}"

    [aws_profile, aws_access_key_id, aws_secret_access_key]
  end
end

This is a helper method that gets the aws_access_key_id and the aws_secret_access_key for use in the scripts. Note that it uses the aws cli to obtain the keys, so the cli has to be installed on your local machine prior to running the scripts. It also assumes you are using a named profile to hold your credentials.

I don’t really like this setup since it requires these prerequisites. But well, that can be solved in the future.
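
One possible direction, sketched below and not part of the original scripts, is to fall back to the standard AWS environment variables when the aws cli or the named profile is not available; the class name here is hypothetical.

# Hypothetical variant of GetAwsProfile that prefers environment variables
class GetAwsCredentials
  def self.call
    key = ENV['AWS_ACCESS_KEY_ID']
    secret = ENV['AWS_SECRET_ACCESS_KEY']
    return [key, secret] if key && secret

    # fall back to the aws cli named profile, as before
    aws_profile = 'todo-aws_profile'
    key = `aws --profile #{aws_profile} configure get aws_access_key_id`.chomp
    secret = `aws --profile #{aws_profile} configure get aws_secret_access_key`.chomp
    [key, secret]
  end
end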

init.rb

The first script to run is init.rb.

init.rb creates the S3 bucket to be used as the Terraform backend. Line 20 checks for the presence of this bucket and raises an exception if the bucket does not exist. The rescue block, if triggered, then creates the missing bucket.

The initialization process on terraform is run via its docker image.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'pry'
  gem 'aws-sdk-s3', '~> 1'
end

require './get_aws_profile.rb'

aws_profile, aws_access_key_id, aws_secret_access_key = GetAwsProfile.call

s3_client = Aws::S3::Client.new(
  access_key_id: aws_access_key_id,
  secret_access_key: aws_secret_access_key,
  region: 'eu-west-1'
)

begin
  s3_client.head_bucket({
    bucket: 'todo-project-tfstate',
    use_accelerate_endpoint: false
  })
rescue StandardError
  s3_client.create_bucket(
    bucket: 'todo-project-tfstate',
    create_bucket_configuration: {
      location_constraint: 'eu-west-1'
    }
  )
end

response = `docker run \
  --rm \
  --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
  --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  -it \
  hashicorp/terraform:0.12.12 \
  init`

puts response

apply.rb

Once initialized, the next script to run is apply.rb.

Prior to applying the Terraform infrastructure, the backend code is packaged into a zip file. After the apply, the zip file is deleted for housekeeping.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'pry'
  gem 'rubyzip', '>= 1.0.0'
end

require './get_aws_profile.rb'
require 'zip'

aws_profile, aws_access_key_id, aws_secret_access_key = GetAwsProfile.call

folder = Dir.pwd
input_filenames = ['index.js']
zipfile_name = File.join(Dir.pwd, 'todo-project.zip')

File.delete(zipfile_name) if File.exist?(zipfile_name)

Zip::File.open(zipfile_name, Zip::File::CREATE) do |zipfile|
  input_filenames.each do |filename|
    zipfile.add(filename, File.join(folder, filename))
  end
end

response = `docker run \
  --rm \
  --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
  --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  -it \
  hashicorp/terraform:0.12.12 \
  apply -auto-approve`

puts response

File.delete(zipfile_name) if File.exist?(zipfile_name)

With this, the API is now deployed and can be called from any website. We will go through a sample frontend integration in a bit.
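
To quickly verify the deployment, a small Ruby snippet along these lines can POST a test email to the endpoint printed by the Terraform output (the URL below is just a placeholder):

require 'net/http'
require 'json'
require 'uri'

# placeholder endpoint; use the value printed by the terraform "endpoint" output
uri = URI('https://todo-endpoint.execute-api.eu-west-1.amazonaws.com/todo-stage/email')

response = Net::HTTP.post(uri, { email: 'test@test.com' }.to_json, 'Content-Type' => 'application/json')
puts response.code # expect 204 on success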

destroy.rb

Once you are done with the project or are in the process of debugging, the destroy script will remove all the resources deployed. It will also remove the S3 backend that was created outside of Terraform.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'pry'
  gem 'aws-sdk-s3', '~> 1'
end

require './get_aws_profile.rb'

aws_profile, aws_access_key_id, aws_secret_access_key = GetAwsProfile.call

response = `docker run \
  --rm \
  --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
  --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  -it \
  hashicorp/terraform:0.12.12 \
  destroy -auto-approve`

puts response

s3_client = Aws::S3::Client.new(
  access_key_id: aws_access_key_id,
  secret_access_key: aws_secret_access_key,
  region: 'eu-west-1'
)

begin
  s3_client.head_bucket({
    bucket: 'todo-project-tfstate',
    use_accelerate_endpoint: false
  })

  s3_client.delete_object({
    bucket:  'todo-project-tfstate',
    key: 'terraform.tfstate', 
  })
  s3_client.delete_bucket(bucket: 'todo-project-tfstate')
rescue StandardError
  puts "todo-project-tfstate S3 bucket already destroyed."
end

Sample Frontend Integration

<!DOCTYPE html>
<html>
<head>
  <script
  src="https://code.jquery.com/jquery-3.4.1.min.js"
  integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo="
  crossorigin="anonymous"></script>
</head>
<body>

  <h2>HTML Forms</h2>

  <form id="form">
    <label for="email">First name:</label><br>
    <input type="text" id="email" name="email" value="test@test.com"><br>
    <input type="submit" value="Submit">
  </form>

  <script type="text/javascript">
    $( "#form" ).submit(function(event) {
      event.preventDefault();

      $.ajax({
        type: "POST",
        url: "https://todo-endpoint.execute-api.eu-west-1.amazonaws.com/todo-stage/email",
        data: JSON.stringify({
          email: $('#email').val()
        }),
        success: function(data, textStatus, jqXHR) {
          debugger
        },
        error: function(jqXHR, textStatus, errorThrown) {
          debugger
        }
      });
    });
  </script>

</body>
</html>

Above is a simple HTML web page that has the email prefilled for demonstration purposes. The form will submit via jquery.ajax() using default settings so as not to trigger the need for a preflight request.

You will see that the email is added to the DynamoDB table, and the logs of the lambda function are recorded in AWS Cloudwatch.

Conclusion

This exercise helped me understand how Lambda integrates with API Gateway, as well as the immense potential of the latter as a robust middleware. In addition, I got to understand preflight requests and CORS better, as well as the jquery.ajax() function.

The project is saved in this repository for future reference.

How To Setup A Standard AWS VPC With Terraform

This is a documentation on how to setup the standard virtual private network (VPC) in AWS with the basic security configurations using Terraform.

In general, I classify the basics as having the servers and databases in the private subnets, and having a bastion server for remote access. There is definitely much room to improve on this setup, and certainly much more in realms beyond my knowledge. However, as a start, this is, at the very least, essential for a production environment.

Personally, I have an AWS Certified Solutions Architect (Associate) certificate to my name, but like most of the engineering university graduates out there who have forgotten how to do dy/dx or what the hell L’Hôpital’s rule is, I have all but forgotten the exact steps to recreate such an environment.

AWS Associate Solutions Architect | vic-l

As a saving grace 😅, I should say that I do know how to set it up, just that I do not have it at the tip of my fingers. I would not get it right the first time, but given time I will eventually set it up correctly.

This is true whenever I set up an environment for a new project. Debugging the setup can be time consuming and frustrating. It is not efficient, and that is probably one of the key reasons why infrastructure as code (IaC) has become a trending topic in recent years.

Provisioning these infrastructures using code implies:

  • version control on code and, in turn, infrastructural changes made by members of the development team
  • easily reproducible infrastructures
  • automation

One of the frontrunners in this industry is Terraform. All that is required are the configurations written in files ending with the “tf” extension placed in the same directory.

The VPC

Start by provisioning the VPC.

We set the CIDR block to provide the maximum number of private ip addresses that an AWS VPC allows. This implies that you can have up to 65,536 AWS resources in your VPC, assuming each of them requires a private IP address for communication purposes.

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # 65536 ip addresses

  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

The variables project_name and env can be placed in a separate .tf file, as long as all the files are in the same directory when Terraform eventually runs to apply the changes.

The Gateways

Next, we setup the Internet gateway (IGW) and NAT gateway (NGW).

The IGW allows for resources in the public subnets to communicate with the outside Internet.

The NGW does the same thing, but for the resources in the private subnets. Sometimes, these resources need to download packages from the Internet for updates etc. This is in direct conflict with the security requirements that placed them in the private subnets in the first place. The NGW balances these 2 requirements.

# IGW
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

resource "aws_route_table" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "igw-${var.project_name}${var.env}"
  }
}

resource "aws_route" "igw" {
  route_table_id = aws_route_table.igw.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id = aws_internet_gateway.main.id
}

# NGW
resource "aws_route_table" "ngw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "ngw-${var.project_name}${var.env}"
  }
}

resource "aws_route" "ngw" {
  route_table_id = aws_route_table.ngw.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id = aws_nat_gateway.main.id
}

### NOTE ###
resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id = aws_subnet.public-ap-southeast-1a.id

  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

Both gateways need to be associated to their respective aws_route_table via an aws_route that will route out to everywhere on the Internet, as indicated by the 0.0.0.0/0 CIDR block.

The NGW requires some additional setup.

First, a NAT gateway requires an elastic IP address due to the way it is engineered. I would not pretend to know how it works well enough to tell you why a static IP address is required, but I do know we can easily provision one using Terraform.

This static IP address will also come in useful if your private instances need to make API calls to third party sources that require the instances’ ip address for whitelisting purposes. The outgoing requests from the private instances will bear the ip address of the NGW.

In addition, a NAT gateway needs to be placed in one of the public subnets in order to communicate with the Internet. As you can see, we have made an implicit dependency on the aws_subnet resources, which we will define later. Terraform will ensure the NAT gateway is created after the subnets are set up.

The Subnets

Now, let’s setup the subnets.

We will set up 1 public and 1 private subnet in each availability zone that the region provides. I will be using the ap-southeast-1 (Singapore) region. That will be a total of 6 subnets to provision, as there are 3 availability zones in this region.

#### public 1a
resource "aws_subnet" "public-ap-southeast-1a" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.100.0/24"
  availability_zone_id = "apse1-az2"

  tags = {
    Name = "public-ap-southeast-1a-${var.project_name}${var.env}"
  }
}

resource "aws_route_table_association" "public-ap-southeast-1a" {
  subnet_id = aws_subnet.public-ap-southeast-1a.id
  route_table_id = aws_route_table.igw.id
}

#### public 1b
resource "aws_subnet" "public-ap-southeast-1b" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.101.0/24"
  availability_zone_id = "apse1-az1"

  tags = {
    Name = "public-ap-southeast-1b-${var.project_name}${var.env}"
  }
}

resource "aws_route_table_association" "public-ap-southeast-1b" {
  subnet_id = aws_subnet.public-ap-southeast-1b.id
  route_table_id = aws_route_table.igw.id
}

#### public 1c
resource "aws_subnet" "public-ap-southeast-1c" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.102.0/24"
  availability_zone_id = "apse1-az3"

  tags = {
    Name = "public-ap-southeast-1c-${var.project_name}${var.env}"
  }
}

resource "aws_route_table_association" "public-ap-southeast-1c" {
  subnet_id = aws_subnet.public-ap-southeast-1c.id
  route_table_id = aws_route_table.igw.id
}

#### private 1a
resource "aws_subnet" "private-ap-southeast-1a" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
  availability_zone_id = "apse1-az2"

  tags = {
    Name = "private-ap-southeast-1a-${var.project_name}${var.env}"
  }
}

resource "aws_route_table_association" "private-ap-southeast-1a" {
  subnet_id = aws_subnet.private-ap-southeast-1a.id
  route_table_id = aws_route_table.ngw.id
}

#### private 1b
resource "aws_subnet" "private-ap-southeast-1b" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
  availability_zone_id = "apse1-az1"

  tags = {
    Name = "private-ap-southeast-1b-${var.project_name}${var.env}"
  }
}

resource "aws_route_table_association" "private-ap-southeast-1b" {
  subnet_id = aws_subnet.private-ap-southeast-1b.id
  route_table_id = aws_route_table.ngw.id
}

#### private 1c
resource "aws_subnet" "private-ap-southeast-1c" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.3.0/24"
  availability_zone_id = "apse1-az3"

  tags = {
    Name = "private-ap-southeast-1c-${var.project_name}${var.env}"
  }
}

resource "aws_route_table_association" "private-ap-southeast-1c" {
  subnet_id = aws_subnet.private-ap-southeast-1c.id
  route_table_id = aws_route_table.ngw.id
}

Amidst this long snippet of configuration for the subnets, it is essentially a repeat of the same resource associations.

The public subnets are assigned the CIDR blocks 10.0.100.0/24, 10.0.101.0/24 and 10.0.102.0/24 respectively. Each will have up to 256 ip addresses to house 256 AWS resources that require an ip address. Taking the first subnet as an example, its addresses will run from 10.0.100.0 to 10.0.100.255.

The private subnets occupy the CIDR blocks 10.0.1.0/24, 10.0.2.0/24 and 10.0.3.0/24 respectively.

To be exact, there will be fewer than 256 usable addresses per subnet, as AWS reserves some private IP addresses in every subnet. Of course, you can provision more or fewer ip addresses per subnet with the correct subnet mask, as illustrated below.
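
As a quick sanity check of these numbers, Ruby's built-in IPAddr can count the addresses in a CIDR block:

require 'ipaddr'

IPAddr.new('10.0.0.0/16').to_range.count   # => 65536, the whole VPC
IPAddr.new('10.0.100.0/24').to_range.count # => 256, one subnet (before AWS reserves a few)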

Each subnet is associated with a different availability zone via the availability_zone_id to spread the resources across the region.

Each public subnet is also associated to the aws_route_table that is related to the IGW, while each private subnet is associated to the aws_route_table related to the NGW.

The Database

Next, we set up the database. We will provision the database using RDS and place it in the private subnets for security purposes.

At this point of time, I must admit that I do not know if this is the best way to set up the database. I personally have a lot of questions on how the infrastructure will change when the application eventually scales, especially for the database. How will the database be sharded into different regions to serve a global audience? How do the databases sync across the different regions? These are side quests that I will have to pursue in the future.

For now, a single instance in a private subnet.

resource "aws_db_instance" "main" {
  allocated_storage = 20
  storage_type = "gp2"
  engine = "mysql"
  engine_version = "5.7"
  instance_class = "db.t2.micro"
  identifier = "rds-${var.project_name}${var.env}"
  name = "something"
  username = "something"
  password = "something"

  skip_final_snapshot = false
  # notes time of creation of rds.tf file
  final_snapshot_identifier = "rds-${var.project_name}${var.env}-1573454102"

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name = aws_db_subnet_group.main.id

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Name = "rds-${var.project_name}${var.env}"
  }
}

resource "aws_db_subnet_group" "main" {
  name = "db-private-subnets"
  subnet_ids = [
    aws_subnet.private-ap-southeast-1a.id,
    aws_subnet.private-ap-southeast-1b.id,
    aws_subnet.private-ap-southeast-1c.id
  ]

  tags = {
    Name = "subnet-group-${var.project_name}${var.env}"
  }
}

Here, we can review the full configuration of the database in code, as compared to having to navigate around the AWS management console to piece the puzzle together. We can easily know the size of the database instance we have provisioned as well as its credentials (ok, it is debatable whether we want to commit sensitive data in our code).

In this configuration, I ensured that the database will produce a final snapshot in the event it gets destroyed.

Access to the database will be guarded by an aws_security_group that will be defined later.

The database is also associated with the aws_db_subnet_group resource, which consists of all the private subnets that we provisioned. This creates an implicit dependency on these subnets, ensuring that the database will only be created after the subnets are created. It also tells AWS to place the database in the custom VPC that the subnets exist in.

I also ensured the database will not be destroyed by Terraform accidentally using the lifecycle configuration.

The Bastion

The bastion server allows us to access the servers and the database instance in the private subnets. We will provision the bastion inside the public subnet.

resource "aws_instance" "bastion" {
  ami = "ami-061eb2b23f9f8839c"
  associate_public_ip_address = true
  instance_type = "t2.nano"
  subnet_id = aws_subnet.public-ap-southeast-1a.id
  vpc_security_group_ids = ["${aws_security_group.bastion.id}"]
  key_name = aws_key_pair.main.key_name

  tags = {
    Name = "bastion-${var.project_name}${var.env}"
  }
}

resource "aws_key_pair" "main" {
  key_name = "${var.project_name}-${var.env}"
  public_key = "ssh-rsa something"
}


output "bastion_public_ip" {
  value = aws_instance.bastion.public_ip
}

I am using a Ubuntu 18.04 LTS image to set up the bastion instance. Note that the AMI id will differ from region to region, even for the same operating system. The images below show the difference in the AMI id between the Singapore and Tokyo regions.

ubuntu ami in ap-southeast-1| vic-l
ubuntu ami in tokyo region | vic-l

I will mainly use the bastion to tunnel commands to the private subnets, so there is no need for much computation power. The cheapest and smallest instance size of t2.nano is chosen.

It is associated with one of the public subnets that we created. Any of them will work, but make sure it is a public one, as we need to be able to connect to it from the Internet.

Its security group will be defined later.

All EC2 instances in AWS can be given an aws_key_pair. We can generate a custom key pair using the ssh-keygen command, or use the default ssh key on the local machine so that we can ssh into the bastion easily without having to specify the identity file each time.

Then, there is the output block. After Terraform has completed its magic, it will output values defined in these output blocks. In this case, the public ip address of the bastion server will be shown on the terminal, making it easy for us to obtain the endpoint.
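
The output values can also be queried again at any time after the apply. A minimal sketch in Ruby, assuming the terraform CLI is available locally (or through the docker wrapper covered in another article on this blog):

# re-prints the value from the state without re-applying anything
bastion_public_ip = `terraform output bastion_public_ip`.chomp
puts bastion_public_ip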

The Security Groups

Lastly, the setup is not complete without the security groups that guard the traffic going in and out of the resources. This was the bane of my AWS Solutions Architect journey. With the required configurations spelled out in code instead of steps in the console that exist only in memory, Terraform has helped me greatly to further understand this feature.

There are a total of 3 aws_security_group resources to be created, representing the bastion, the web server instances and the database respectively. Each of them has its own set of inbound and/or outbound rules, named “ingress” and “egress” in Terraform terms, that are configured separately.

While you can configure the inbound and outbound rules together within the resource block of the respective aws_security_group, I would recommend against that. Doing so results in tight coupling between the security groups, especially if one of the aws_security_group_rule entries points to another aws_security_group as its source. This is problematic when we eventually make changes to the security groups because, for example, one of them cannot be destroyed while another security group that depends on it is not supposed to be destroyed.

And the frustrating thing is that Terraform, or maybe the underlying AWS API, does not indicate the error. In fact, it takes forever to destroy security groups created this way, only to fail after making us wait for a long time, which makes debugging exceedingly tedious.

There are many issues on Github mentioning this or something related, like this one. It has to do with what has been termed “enforced dependencies”, which Terraform currently has no mechanism to handle.

By decoupling each aws_security_group and its respective aws_security_group_rule into separate resources, we give Terraform and ourselves an easier time removing and changing the security groups in the future.

Bastion

Let’s see how we can configure Terraform to set up the security of the subnets. We start off with the security group for the bastion server, which has 3 rules.

# bastion
resource "aws_security_group" "bastion" {
  name = "${var.project_name}${var.env}-bastion"
  description = "For bastion server ${var.env}"
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

resource "aws_security_group_rule" "ssh-bastion-world" {
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  # Please restrict your ingress to only necessary IPs and ports.
  # Opening to 0.0.0.0/0 can lead to security vulnerabilities
  # You may want to set a fixed ip address if you have a static ip
  security_group_id = aws_security_group.bastion.id
  cidr_blocks = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "ssh-bastion-web_server" {
  type = "egress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  security_group_id = aws_security_group.bastion.id
  source_security_group_id = aws_security_group.web_server.id
}

resource "aws_security_group_rule" "mysql-bastion-rds" {
  type = "egress"
  from_port = 3306
  to_port = 3306
  protocol = "tcp"
  security_group_id = aws_security_group.bastion.id
  source_security_group_id = aws_security_group.rds.id
}

The first is an ingress rule to allow us to ssh into the bastion from wherever we are. Of course, this is not ideal, as it means anyone from anywhere can attempt to ssh into it. We should scope it to the ip address we work from, be it home or the office. However, in my case, as a digital nomad, the ip address I work from changes so often as I move around that it makes more sense to open it up to the world. I am taking a calculated risk here. Please don’t try this at home.

The second is an egress rule that allows the bastion instance to ssh into the web servers in the private subnets. The source of this rule is set as the aws_security_group of the web servers.

The third rule is another outbound rule to allow the bastion to communicate with the database. Since I am using mysql as the database engine, the port used is 3306. This allows us to run database operations on the isolated database instance in the private subnet via the bastion, over the correct port, securely.

Web Servers

Next is the security group for the web servers. The only rule it requires is the ingress rule allowing the bastion to ssh into the web servers over port 22.

resource "aws_security_group" "web_server" {
  name = "${var.project_name}${var.env}-web-servers"
  description = "For Web servers ${var.env}"
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

resource "aws_security_group_rule" "ssh-web_server-bastion" {
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  security_group_id = aws_security_group.web_server.id
  source_security_group_id = aws_security_group.bastion.id
}

RDS

Lastly, the RDS instance’s security group. It consists of 2 rules.

resource "aws_security_group" "rds" {
    name = "rds-${var.project_name}${var.env}"
    description = "For RDS ${var.env}"

vpc_id = aws_vpc.main.id
  tags = {
    Name = "${var.project_name}${var.env}"
  }
}

resource "aws_security_group_rule" "mysql-rds-web_server" {
  type = "ingress"
  from_port = 3306
  to_port = 3306
  protocol = "tcp"
  security_group_id = aws_security_group.rds.id
  source_security_group_id = aws_security_group.web_server.id
}

resource "aws_security_group_rule" "mysql-rds-bastion" {
  type = "ingress"
  from_port = 3306
  to_port = 3306
  protocol = "tcp"
  security_group_id = aws_security_group.rds.id
  source_security_group_id = aws_security_group.bastion.id
}

The first is of course to open up port 3306 to allow requests from the web servers to reach the database so the application can run.

The second allows the bastion to communicate with the database over port 3306. Previously, we defined the egress rule on the bastion server itself to connect out to the RDS instance. Now, this ingress rule allows the incoming requests from the bastion server to reach the RDS instance instead of being blocked off.

Terraform Apply

These resources can be defined in a single or multiple terraform files with the extension tf, as long as they are in the same directory.

If you are using docker to run terraform, you can do a volume mount of the current directory into the workspace of the docker container and apply the infrastructure!
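
As a sketch of that, reusing the Ruby-wrapped docker pattern from the deployment scripts in the email-collection article; the credential variables and image tag are simply the ones used elsewhere on this blog.

# aws_access_key_id and aws_secret_access_key obtained as in the earlier
# deployment scripts; the current directory holds the .tf files
['init', 'apply -auto-approve'].each do |command|
  puts `docker run \
    --rm \
    --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
    --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
    -v #{Dir.pwd}:/workspace \
    -w /workspace \
    hashicorp/terraform:0.12.12 \
    #{command}`
end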

Improvements

We can harden the security of this setup further by, for example, configuring the Network Access Control List (NACL, or Network ACL). In this setup, the default allows all inbound and outbound traffic for all the resources. However, that is beyond the scope of this article.

What’s Next

Note that I did not provision any EC2 instances where my application will run. At this point, you can feel free to provision the EC2 instances for the web servers just like the bastion server, but associate them with the private subnets.

For me, I favor AWS Elastic Beanstalk for handling the deployment. What I have done so far is only the provisioning of the infrastructure. Hence, in my case, instead of defining the EC2 instances, I will define an Elastic Beanstalk environment to host my Rails application and configure it to use this VPC to leverage all the security.

Terraform With Docker

This is a documentation on how to use Terraform with Docker to provision cloud resources, mainly using AWS as the provider. It contains tips on certain practices that I personally deem best practices for various reasons.

It revolves around these 3 commands:

docker run -v `pwd`:/workspace -w /workspace hashicorp/terraform:0.12.9 init
docker run -v `pwd`:/workspace -w /workspace hashicorp/terraform:0.12.9 apply
docker run -v `pwd`:/workspace -w /workspace hashicorp/terraform:0.12.9 destroy

The Terraform image comes with terraform as its entrypoint command, so we only append the subcommands init, apply and destroy respectively.

The Flags

The most straightforward way to run Terraform on docker is to do a docker run with a volume mount connecting the directory where the terraform files are to the working directory in the docker container. Assuming the current working directory is where the files are, we can simply run the commands shown above.

The -v option mounts your current working directory into the container’s /workspace directory.

The -w flag creates the /workspace directory and sets it as the new working directory, overwriting the terraform image’s original.

To verify this, we can run the command below to see that the current working directory in the container is in fact /workspace.

docker run -v `pwd`:/workspace -w /workspace --entrypoint /bin/sh hashicorp/terraform:0.12.9 -c pwd

Over here, we are overriding the default terraform entrypoint to run a shell command instead.

terraform init

The init command will download the necessary provider files and modules required for execution to the working directory in the container. Due to the volume mount, the files will be reflected in the current working directory on the local machine, inside the .terraform folder.

It is paramount to have these files downloaded to the current directory because, on subsequent runs, they will not need to be downloaded again since they are persisted and available.

terraform apply

apply is the command to deploy the resources to the cloud.

If you do not use a Terraform backend, the tfstate file that holds all the information for the provisioning will be written to the working directory, and in turn to the current directory.

If you do use a Terraform backend, no tfstate file is written locally; the state is written to the backend that you specified. However, in the event that a Terraform deployment fails, you will have an errored.tfstate file written to the working directory. This errored.tfstate file is extremely important for keeping track of the state of the provisioned environment in the event of failures.

A possible failure scenario is lost connectivity, which I encountered while traveling in some remote areas of Brazil. The volume mount saved my life. I am pretty surprised that there is not much about this in the official terraform docs. I tried googling the query below but there were no results.

site:https://www.terraform.io/ "errored.tfstate"

Without the errored.tfstate file, undesirable duplicate resources may be created on subsequent deployments. In other cases, the subsequent deployment itself might fail because it tries to create resources with identifiers that already exist, which AWS prohibits; this would not have occurred if the state had not gone out of sync.

To update the state, we can run the command

docker run -v `pwd`:/workspace -w /workspace hashicorp/terraform:0.12.9 state push errored.tfstate

Apart from the errored.tfstate file, the terraform log file that you specify will also be written to the mounted directory, which may be used for debugging your terraform deployments.
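
The log verbosity and destination are controlled through Terraform's TF_LOG and TF_LOG_PATH environment variables. A minimal sketch, wrapped in Ruby back ticks like the deployment scripts in the earlier article, passes them into the container so that the log lands in the mounted directory:

# TF_LOG sets the verbosity (TRACE, DEBUG, INFO, WARN, ERROR); TF_LOG_PATH sets
# the file to write to. Pointing it inside /workspace persists it via the mount.
puts `docker run \
  --rm \
  --env TF_LOG=DEBUG \
  --env TF_LOG_PATH=/workspace/terraform.log \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  hashicorp/terraform:0.12.9 \
  apply -auto-approve`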

terraform destroy

Lastly, destroy is the command to remove the resources from the cloud.