How To Set Up And Debug Google App Script

This is a quick start guide on setting up Google App Script and using it in G Suite applications. Mainly, I will go through Google Sheets and Google Drive since they are the most widely used services.

Motivation

This is meant to remove a layer of repetitive, mundane housekeeping tasks via automation. In my side hustle, we recently shifted our files from Dropbox to Google Drive, giving us the ability to automate our tasks with Google App Scripts.

Setup

Head to your Google App Scripts console. There are many functions you can explore here. I will touch on just 3: projects, executions and triggers.

Every project holds the script to execute based on a trigger. It is up to you to decide whether a project should uphold the Single Responsibility Principle or do multiple tasks as part of a bigger singular operation.

Clicking on the ‘New Project’ button near the top left of the screen will bring you to the code editor where you can code your script.

In a script, you can define any number of functions. The exact function to run can be chosen during the trigger selection process. The usual setup is to have a main function, and the other functions are helpers.

Debug

The most important thing in coding is debugging. After all, it makes up most of our time as software developers.

In App Script, there is no luxury of debugging the code interactively. You will need to rely on the usual logger. To print a log, write the code as such:

function main() {
  Logger.log('Hello World!');
}

There are 2 ways to view the logs.

First, within the script editor, go to View -> Logs. A dialog box will pop up to show you the logs. The shortcut on a Mac is Cmd + Enter.

Second is the “My Executions” page back in the App Script console. I personally prefer looking at the logs there because it shows not only the output but also other details, like whether the script is even running in the first place.

Click on the play arrow button to execute the function for debugging purposes.

Triggers

In the App Script editor, click on the clock icon to bring yourself to the function’s triggers page.

Add a trigger by clicking the bottom right button. A dialog box will pop up and you can select the conditions of the trigger.

You can select the function in the script to run, as mentioned earlier, and the type of trigger. For Google Sheets, there are more options for triggering the script. More on that later.

Basics

Before you begin, it is good to know that every entity in G Suite has an ID. This ID is the gibberish string that you see in the URL of an opened Google Sheets, Google Docs or even a folder in Google Drive. It is highlighted in yellow in the image below.

Google Docs ID

Knowing the ID of the particular file or folder allows you to carry out your operations without having to write code to search for it. Then again, you can also search for them by name.

The entities that can be called and utilized in the App Script are documented here. Let’s take a look at DriveApp for example.

Example With Google Drive On Time Driven Trigger

Let’s say you want to remove editors you previously gave access to in your files. You want them to be removed when the files are moved into a particular folder.

function removeEditors() {
  var folder = DriveApp.getFolderById(ID_OF_FOLDER);

  iterateFiles(folder);
}

function iterateFiles(folder) {
  // the whitelist is constant, so define it once outside the loops
  var approvedEmails = [
    'luffy@gmail.com',
    'zoro@gmail.com',
    'sanji@gmail.com',
    'usopp@gmail.com',
    'nami@gmail.com',
    'chopper@gmail.com',
    'robin@gmail.com',
    'franky@gmail.com',
    'brooks@gmail.com',
    'jinbei@gmail.com'
  ];

  var files = folder.getFiles();
  while (files.hasNext()) {
    var file = files.next();
    Logger.log(`Looking at file ${file.getName()}`);

    var editors = file.getEditors();
    editors.forEach(function(editor) {
      var email = editor.getEmail();
      if (!approvedEmails.includes(email)) {
        Logger.log(`Removing editor: ${email} from ${file.getName()}`);
        file.removeEditor(email);
      }
    });
  }
}

Given the ID_OF_FOLDER, this script will iterate through all the files in that folder, and for each file it will check the editors’ emails against the list of approved emails. If an email is not in the list, that editor is stripped of the editor role for the file.

Note the way the files variable is being looped. The loop is carried out if hasNext() returns true, and the next file in line is retrieved via next(). This is different from how the editors are looped with forEach, which is the usual way developers loop through arrays in JavaScript.

The last step is to setup the time driven trigger to your liking.

The code, unfortunately, cannot be optimized by checking the file’s getLastUpdated() date and sparing the need to loop through editors for files that were moved into this folder eons ago. This is because moving files into different folders does not update the file’s getLastUpdated() value.

Albeit a small inconvenience, I do expect more features and attributes to be developed in App Script in the future for us to utilize and optimize our code. Until then, I sincerely hope that it stays free, as it already is right now, without any usage limitation or tiering. In fact, I hope it stays free forever!

Example With Google Sheet OnEdit Trigger

Let’s look at another example with Google Sheet. In a spreadsheet, we want to compile the values entered in a sheet into another sheet in the same spreadsheet in real time. Simple and straightforward. Let’s see how we can work on it.

Before that, take note of an extremely crucial step. Make sure the spreadsheet is a Google Sheet. If you uploaded a Microsoft Excel sheet and opened it using Google Spreadsheet, you will not be able to run any App Script until you convert it to a Google Sheet. If this applies to you, go to File -> Save as Google Sheet as shown below to make this necessary change.

Save as Google Sheets

The function is as shown below.

function compile(event) {
  var row = event.range.getRow();
  var column = event.range.getColumn();
  var value = event.range.getValue();
  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  var targetSheet = spreadsheet.getSheetByName("Target Sheet");
  targetSheet.getRange(row, column).setValue(value * 2);
}

We get the row and column that the user was editing on, whichever sheet that was, and multiply its value by 2 before saving it in the same row and column in the desired sheet with the name Target Sheet.

Now to add the trigger. In the script editor, click on the clock icon. It will bring you to the triggers of this project. Click on the button to add trigger on the bottom right of the page. Under Select event source, you will now see a new option From spreadsheet, and when you select it, you can select the event type to kickstart the function. We are looking for the onEdit event type for this case.

As you can see, this project is associated with only 1 spreadsheet – the spreadsheet it was created from. Additionally, each spreadsheet can have only 1 project as well.

Hence, you cannot have the same script running on different spreadsheets. At least not if you do it this way. There may be another way to overcome this restriction but I have yet to explore it.

Potential

The potential of App Scripts does not end here. Other than scripts, you can even code HTML views to present visual dashboards based on real-time changes.

On top of that, you can publish your own app scripts, as well as use the scripts other developers have made. This forms a community that can supercharge automation to increase productivity.

Conclusion

Make use of App Script and automate away!

Customize Devise Forgot Password Token In Rails

This is a documentation on how to change the forgot password user flow using the devise gem.

Motivation

I often work on projects that are purely API based and are served only on mobile devices as apps. This poses a problem when using devise because its main audience is web applications. The user flow it sets up thus assumes the presence and usage of web pages. That is absent in this case, and setting it up is troublesome to say the least.

In the case of forgot and resetting password flow, the user will receive an email with a link to reset their password. This link leads to a webpage and they will reset their password there and then.

For a pure API environment, this translates to an immense amount of extraneous work. We need to set up the hosting of the webpages, prepare the styling assets, tweak CSS, wire up the SSL certificate, validate the input fields, and properly redirect from the website into the app once the password has been reset.

And the frustrating thing is that this is a small but essential function in a typical application. We cannot do away with it, but it is often neglected, or should I say taken for granted. The disproportionately large effort it takes to set up 1 web page just for this seemingly insignificant function is often overlooked.

I see the need to implement a way for devise to allow users to change passwords without the use of a webpage.

Project Specifications

The UX flow in my projects will look like this.

Upon submission of the forgot password form, the reset password flow will send an email containing a token.

The user will enter their new password as well as the token in their app to change their password.

How Does Reset Password Work In Devise?

Before we can get to work, we need to understand how reset password flow runs in devise.

By default when the user submits his/her email in the forgot password flow, this controller method in the Devise::PasswordsController is called.

# POST /resource/password
def create
  self.resource = resource_class.send_reset_password_instructions(resource_params)
  yield resource if block_given?

  if successfully_sent?(resource)
    respond_with({}, location: after_sending_reset_password_instructions_path_for(resource_name))
  else
    respond_with(resource)
  end
end

The send_reset_password_instructions method is called on the class of the resource. The main code snippet is as shown below, retrieved from the source code in devise github repository.

module Devise
  module Models
    module Recoverable
      extend ActiveSupport::Concern
      protected
        def set_reset_password_token
          raw, enc = Devise.token_generator.generate(self.class, :reset_password_token)

          self.reset_password_token   = enc
          self.reset_password_sent_at = Time.now.utc
          save(validate: false)
          raw
        end
      module ClassMethods
        def send_reset_password_instructions(attributes={})
          recoverable = find_or_initialize_with_errors(reset_password_keys, attributes, :not_found)
          recoverable.send_reset_password_instructions if recoverable.persisted?
          recoverable
        end
      end
    end
  end
end

Eventually, the set_reset_password_token method will be called to generate the reset_password_token. This method becomes one of the User model’s instance methods when the model includes the recoverable module.

The logic of generating the reset_password_token is wrapped in the TokenGenerator class of the Devise module as shown below.

module Devise
  class TokenGenerator
    def generate(klass, column)
      key = key_for(column)

      loop do
        raw = Devise.friendly_token
        enc = OpenSSL::HMAC.hexdigest(@digest, key, raw)
        break [raw, enc] unless klass.to_adapter.find_first({ column => enc })
      end
    end
  end
end

As the user will need to enter the token together with their new password after receiving it in an email, I definitely want to dictate the number of characters the token has for the sake of a humane user experience. We will see how to tweak this method to generate a token of suitable length.

The send_reset_password_instructions method will also be triggered thereafter to send an email. By default, the reset password token that was generated will be appended to a url that is sent along in that email. That url is meant for users to click to go to a webpage and change their password. For my case, I will not be presenting a url to be clicked in the email, but just the token string instead.

Generate Custom Reset Password Token

Here we will change the set_reset_password_token for the User model. This will only affect the User model and not other models, which may be crucial for you.

In my projects, I usually have another devise model, ie. the AdminUser model, which needs access to a CMS system. The CMS system is authenticated by none other than devise in the usual devise way. Hence, I do not want to make a site-wide change to all my models due to this sort of “hybrid”.

Hence, the new code will look like this.

protected
def set_reset_password_token
  raw, enc = Devise.token_generator.custom_generate(self.class, :reset_password_token)

  self.reset_password_token   = enc
  self.reset_password_sent_at = Time.now.utc
  save(validate: false)
  raw
end

The only line that was changed is line 3. I am using the custom_generate method, which I define below.

module Devise
  class TokenGenerator
    def custom_generate(klass, column)
      key = key_for(column)

      loop do
        raw = SecureRandom.alphanumeric(Rails.configuration.confirmation_token_length)
        enc = OpenSSL::HMAC.hexdigest(@digest, key, raw)
        break [raw, enc] unless klass.to_adapter.find_first({ column => enc })
      end
    end
  end
end

Compared to the above, the only line that is changed in this case is line 7. The default method uses Devise.friendly_token, the source code of which can be found here.

I replaced it with a custom method of mine to generate an alphanumeric string of a custom desired length.
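For instance, assuming a configured length of 8, SecureRandom from the standard library behaves like this:

require 'securerandom'

SecureRandom.alphanumeric(8) # => "q2Hf8zP1" (sample output; yours will differ)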

Seeing that I changed so little in each part of the code, I could have just redefined Devise.friendly_token and saved some effort in copying and pasting code. However, since I am still going to have an AdminUser that will make use of the default configuration of devise as it is, I cannot apply the change site-wide. Of course, if I had only 1 devise model to work with, that would be a plausible route to take.

Send Email With Customized Reset Password Token

So now that the reset_password_token has been generated, it is time to send it out in the email.

There’s nothing to change on the Devise::Mailer class here. All we need to change is the email view under reset_password_instructions.html.erb. Below is the default view from the devise repository.

<p>Hello <%= @resource.email %>!</p>

<p>Someone has requested a link to change your password. You can do this through the link below.</p>

<p><%= link_to 'Change my password', edit_password_url(@resource, reset_password_token: @token) %></p>

<p>If you didn't request this, please ignore this email.</p>
<p>Your password won't change until you access the link above and create a new one.</p>

We now have the @token that we can just display to users as a string, and we do not need the edit_password_url link to be generated.
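Below is a minimal sketch of how the view could look with the token displayed directly (the copy is my own):

<p>Hello <%= @resource.email %>!</p>

<p>Someone has requested to change your password. Enter this token in the app to proceed:</p>

<p><%= @token %></p>

<p>If you didn't request this, please ignore this email.</p>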

Conclusion

This is how we can modify the reset_password_token using devise with the least possible code changes. It involves understanding the flow of logic throughout the different components in devise, as well as the role each component plays, so that you know what can and should be modified. The same can be applied to the confirmation flow as well.

This can be integrated with doorkeeper and devise on top of this guide that I wrote on integrating these 2 gems without hiccups, since the amount of change is small and not show-stopping.

Class And Instance Methods in Ruby Metaprogramming

This is my own summary on the class and instance methods in relation to metaprogramming in ruby.

Motivation

Many articles out there have already given a detailed write up on this topic. Here I am giving my 2 cents, partly to help me revise when I stumble on this concept again. Because this is how I understand it.

The articles written by various other experienced developers are by no means inadequate. I guess it is just people having different learning styles, and so here I am documenting mine, intending to serve only 1 audience, ie me 🙂

Start With The Syntax

I stumbled on this topic because of the various different syntaxes I have seen in various ruby code bases. And they are nothing like ruby. Weird symbols that go against ruby’s English-like syntax, and blocks with deep implicit meaning that bring about confusion when reading the code, are some examples.

So I thought it is best to come clear with the syntaxes first.

Instance Methods

So there are a couple of ways to define instance methods in ruby.

class Instance1
  def hello
    p "self inside hello is #{self}"
  end

  class_eval do
    def hello
      p "self inside class_eval hello is #{self}"
    end
  end
end

class Instance2
  class_eval do
    def hello
      p "self inside class_eval hello is #{self}"
    end
  end

  def hello
    p "self inside hello is #{self}"
  end
end

Here are 2 classes Instance1 and Instance2 with the same method names and definitions. The only difference is the order which they are defined. And if we take a look at their output:

instance1 = Instance1.new
instance2 = Instance2.new

instance1.hello # "self inside class_eval hello is #<Instance1:0x00007ff17da771c0>"
instance2.hello # "self inside hello is #<Instance2:0x00007ff17d7bbb98>"

The method that is defined later will run. The significance here is that these 2 definitions are in fact the same thing. They are both legit but different ways to define instance methods, and the methods defined later will override the one defined initially as expected.

Class Methods

There are 3 different ways based on my research.

class Class1
  def self.hello
    p "self inside self.hello is #{self}"
  end

  instance_eval do
    def hello
      p "self inside instance_eval hello is #{self}"
    end
  end

  class << self
    def hello
      p "self inside class << self is #{self}"
    end
  end
end

class Class2
  instance_eval do
    def hello
      p "self inside instance_eval hello is #{self}"
    end
  end

  class << self
    def hello
      p "self inside class << self is #{self}"
    end
  end

  def self.hello
    p "self inside self.hello is #{self}"
  end
end

class Class3
  class << self
    def hello
      p "self inside class << self is #{self}"
    end
  end

  def self.hello
    p "self inside self.hello is #{self}"
  end

  instance_eval do
    def hello
      p "self inside instance_eval hello is #{self}"
    end
  end
end

The same thing here. 3 classes with the same 3 methods defined in different orders. And yes, the output will be dictated by the last one defined.

Class1.hello # "self inside class << self is Class1"
Class2.hello # "self inside self.hello is Class2"
Class3.hello # "self inside instance_eval hello is Class3"

No shit. They are just different ways to do the same thing.

Best Practices

So now with the confusion over different syntaxes out of the way, let’s refer to a single class for the rest of the article.

class MyClass
  class_eval do
    def hello
      p "self inside class_eval hello is #{self}"
    end
  end

  instance_eval do
    def hello
      p "self inside instance_eval hello is #{self}"
    end
  end
end

MyClass.hello # "self inside instance_eval hello is MyClass"
MyClass.new.hello # "self inside class_eval hello is #<MyClass:0x00007ff188369ad0>"

These are the best practices to define a class method and an instance method in the metaprogramming way. The instance_eval method is especially important, in my humble opinion, in replacing the class << self syntax that is so baneful in ruby linguistics.

Reading ruby code is like reading English

Why The Need For a Different Way To Define A Method?

One purpose of metaprogramming derives from the need to define methods during runtime. This is often done in conjunction with the method define_method, defining new methods based on variables that are only available at runtime.

An example would be the current_user method in the authentication related devise gem. If you have multiple models, like AdminUser and Player on top of User, you can easily access an instance of them in the context of the controller via current_admin_user and current_player respectively. This is done without you having to copy paste the content of current_user into these “new” methods.

This is made possible by metaprogramming. devise defines these new methods at runtime, looking at all the models that require its involvement, and generates these helper methods without you writing extra code. A simplified sketch follows.
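This is not devise’s actual source, but a hypothetical sketch of how such helpers could be generated with define_method, assuming the scope names are known at boot time:

class ApplicationController
  %w[user admin_user player].each do |scope|
    # defines current_user, current_admin_user and current_player at runtime
    define_method("current_#{scope}") do
      # stand-in body; the real implementation authenticates against the session
      instance_variable_get("@current_#{scope}")
    end
  end
end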

The robustness that metaprogramming brings is clearly essential.

The Essence of Instance and Class of a Class

So there is this whole confusion about a hidden class in a class in ruby. This all stems from 1 fact: everything in ruby is an object. That includes a class.

Everything is an object in ruby

So how can a class as we know it, with all its inheritance and class method and instance method properties, be an instance of a ruby object?

This is made possible with the existence of a hidden metaclass whenever a class is defined in ruby. This hidden class holds the common properties of classes as we know them and allow us to use mere ruby objects like a class. There’s a lot of confusion in this sentence due to the overlapping usage of the word class. Be sure to read it again.

Hence class methods are in fact instance methods of this metaclass. These methods of the metaclass are not inherited by instances of the class, just as class methods should behave.

Singleton Class In Ruby

Another name for these hidden metaclasses is singleton class (‘eigenclass’ is another). I find this naming more apt in the context of ruby (And I will refer to it as singleton class from here onwards). Allow me to explain.

Classes have a unique namespace in their codebase. Therefore when a class is defined, and likewise its corresponding metaclass, it will not be defined again. The one and only instance of itself will thus exist for the lifetime of the application with no duplicate. This makes the term ‘singleton’ make so much more sense.

In fact, I perceive it as the official definition in ruby because of the method singleton_class which gives an object instance access to its “metaclass” instance. Here is an example.

MyClass.new.singleton_class.hello # "self inside instance_eval hello is #<Class:#<MyClass:0x00007faa7a3bb448>>"

Note the memory address of the singleton class. It indicates that this singleton class is in fact an instance.

instance_eval and class_eval

With a better concept of a class, a singleton class and an instance, let’s look at the instance_eval and class_eval blocks.

Initially, it came across to me as unintuitive that a class method is defined under instance_eval, and an instance method is defined under class_eval. Why make life so difficult?

However, once we understand the concept, everything will fall into place.

Under instance_eval, we are looking at the MyClass under the context that it is an instance. We are evaluating it as an instance. And an instance of a class can only refer to the singleton class that will hold anything that should possess the properties of a typical class that we know in computing.

Under class_eval, we are evaluating MyClass as a class, where we define methods to be applied on instances of MyClass as usual.

These 2 methods determine the context in which the methods in them are defined. In particular, they dictate what the variable self refers to in each scope. This article has a much more detailed explanation on that.

Conclusion

There’s definitely more to metaprogramming than defining methods during runtime. This idea of using code to generate code has immense potential, and this may just be the tip of the iceberg.

Integrating reCaptcha V3 With Turbolinks In Rails

Google has published the latest version of reCaptcha, V3, and I had to integrate it into my recent Rails projects. The greatest difference from the old version is the improvement in user experience. It removes the friction of requiring users to click on the notorious “I am not a robot” checkbox and, at times, take a spontaneous image verification quiz. In its place, the new reCaptcha observes the user’s actions on the website to determine if he/she is a genuine human user. It generates a score which the backend of the website will need to verify against, to decide if the score is above the threshold of what is considered a real user. On the frontend, there’s no more extra step required to submit the form. Pretty neat!

In the midst of integrating it to my project, I had some problems, as usual, with turbolinks. The biggest of them is navigating between pages. Hence, this article seeks to document the process.

Initializing

Due to the use of turbolinks, the initialization process is different from what was documented. In fact, there is little to no documentation on the alternative way to initialize the recaptcha library. With reference to this blog, the initialization step is as such.

Note that I am using the slim template engine to generate my HTML views.

// in the < head >
script src='https://www.google.com/recaptcha/api.js?render=explicit&onload=renderCaptcha'
= javascript_pack_tag 'recaptcha', 'data-turbolinks-track': 'reload'

I insert this snippet at the head of the pages that require reCaptcha using the content_for helper.

This method of requiring the file allows us to use a custom function to initialize the grecaptcha object. This in turn provides us control over when we want to initialize the object, so as to prevent reinitialization when navigating between pages in a turbolinks environment.

This method is documented in an obscure area in the recaptcha V3 docs and is also usable in V2 as documented here.

The javascript function renderCaptcha will be called when the file has loaded, and it is constructed in the recaptcha.js.erb file.

Note that this file is given the attribute data-turbolinks-track with a value of reload. This implies that when we navigate between pages where the tracked assets required are different, the site will do a full reload instead of going through turbolinks. In particular for this case when navigating from a page with recaptcha to another without recaptcha, there will be a full reload of the page as the tracked asset, recaptcha.js.erb is no longer present.

This ensures that the recaptcha library is downloaded again and the renderCaptcha function is called when the script is loaded for initialization.

Let’s take a look at the content of the renderCaptcha function.

The Javascript

// recaptcha.js.erb
window.renderCaptcha = function() {
  document.grecaptchaClientId = grecaptcha.render('recaptcha_badge', {
    sitekey: "<%= Rails.application.credentials.dig(Rails.env.to_sym, :recaptcha, :site_key) %>",
    badge: 'inline', // must be inline
    size: 'invisible' // must be invisible
  });
  window.pollCaptchaToken();
}
window.pollCaptchaToken = function() {
  getCaptchaToken();
  setTimeout(window.pollCaptchaToken, 90000);
}
window.getCaptchaToken = function() {
  grecaptcha.execute(document.grecaptchaClientId).then(function(token) {
    document.getElementById('recaptcha_token').value = token;
  });
}
document.addEventListener("turbolinks:load", () => {
  $('#contact-form').on('ajax:success', event => {
    ...
    $('#contact-form').trigger('reset');
    window.getCaptchaToken();
  });
});

Firstly, note that this is an erb file. This allows us to render ruby variables into javascript, to be compiled by Webpacker during build time. Refer to this documentation on installing erb with Webpacker, or my article on setting up bootstrap with Rails 6 and Webpacker, on how to set this up. In this case, I am storing my recaptcha site key using the new Rails way since 5.2 and parsing it into the javascript file during build time for consumption.

The renderCaptcha() function initializes the recaptcha script and renders the recaptcha badge on an HTML element with the id recaptcha_badge. Once initialized, getCaptchaToken() will then retrieve the recaptcha token and utilize it in its callback function. I will be setting the value of an input element with the id recaptcha_token. This input will be sent along to the backend to use for verification. More on the views in a bit.

My logic is to poll for a new token every 1.5 minutes, as the token expires every 2 minutes. The 30-second buffer should be sufficient for my backend, which will receive the recaptcha token, to verify with the recaptcha server before the token expires. I have split pollCaptchaToken() from the actual token-getting function getCaptchaToken() because I will be using getCaptchaToken() explicitly to refresh the token after I submit the form.

Note the use of window and document here. These objects persist in between page navigations in a turbolinks environment. Hence, they provide us a way to keep track of data so we do not initialize the function multiple times while navigating back and forth. And the key data to track here is the grecaptchaClientId on the document object. It tracks whether we have initialized the recaptcha script already or not.

That said, remember the data-turbolinks-track attribute with the value reload added to the script? Once again, it ensures the page fully reloads should the tracked assets be any different in between page navigations. This ensures 2 things:

  1. Prevents multiple initializations occurring while navigating between pages, because grecaptchaClientId is not null
  2. Ensures initialization will occur when traversing from a page without the recaptcha script, due to the full reload. Otherwise, we would have to wait for the polling function to happen before we can get our token, and that would be disastrous should the user submit the form with a blank token before then.

Lastly, I add an ajax:success event listener on the form to handle a successful remote javascript call to my Rails backend. Note that I cannot add the listener on the document object as such:

$(document).on('ajax:success', '#contact-form', function() { ... })

As the document object persists between navigations, it will result in the event listener being added each time a page navigation occurs, hence causing undesirable effects.

The View

#recaptcha_badge.d-none data-turbolinks-permanent=''
= hidden_field_tag :recaptcha_token, '', data: { turbolinks_permanent:'' }

The #recaptcha_badge element will hold the badge of the reCaptcha. You can style it in whatever way you want, but I am using the bootstrap d-none css class to hide it totally as I do not need it.

The hidden_field_tag renders a hidden input field where I will store the recaptcha token.

These elements are given the data-turbolinks-permanent attribute. This is a crucial step. It ensures that the elements with the same id are not re-rendered in between page navigations in a turbolinks environment. Persisting the form element across page loads prevents the input from losing the recaptcha token. Without it, we will need to wait for the polling function to occur again after navigation before we are able to get a new recaptcha token for submission.

That said, the data-turbolinks-permanent on the #recaptcha_badge may not be necessary. But I am just persisting it across pages as well for trivial reasons.

Of course, make sure the input is within the form element so that it gets passed to the backend upon submission.

Conclusion

This new recaptcha user experience is definitely a good step towards improving conversion. But integrating with turbolinks is troublesome as always. I hope this article helped to address it, and provided adequate explanation of why each step is required.

Sort Algorithms Cheatsheet

This is a summary of the key features that make up each sort algorithm.

Motivation

While it is easy to understand the concept of each sort algorithm, I find it difficult to remember the key steps that define an algorithm or are characteristic of it. A key step may prove pivotal in implementing the algorithm but may come across as insignificant from the macro view of its concept.

Hence, I decided that it will be better for me to jot the key pointers down.

Quicksort

def quicksort array
  recursion(array, 0, array.length - 1) #IMPT
  array
end

def recursion array, start, finish
  if start < finish # IMPT
    pivot_index = partition(array, start, finish)
    recursion(array, start, pivot_index - 1) # IMPT
    recursion(array, pivot_index + 1, finish) # IMPT
  end
end

def partition array, start, finish
  pivot = array[finish]
  pivot_index = start

  (start...finish).each do |index| # IMPT
    if array[index] <= pivot # NOTE
      array[index], array[pivot_index] =
        array[pivot_index], array[index]
      pivot_index += 1
    end
  end

  array[finish], array[pivot_index] =
    array[pivot_index], array[finish]

  pivot_index
end

Key Steps

  • Gist: Using a pivot value, distribute the array into 2 halves that are not ordered, but are collectively smaller on the left side and collectively larger on the right side.
  • The array is mutated.
  • The pivot value of each iteration will find its rightful position in the array, eventually leading to a sorted array.

Discussions

Let’s start with the recursion function.

In the recursion function, note that the arguments are indices of the array, not the length. Keep this at the back of your mind so that you can understand when to end a loop.

Line 7 ensures we are iterating at least 2 elements.

In lines 9 and 10, the recursion occurs on either side of the pivot index in that iteration. Note that the pivot index does not participate in the next recursion, since it is already at where it belongs.

Now for the partition function.

In the partition function, the pivot does not participate in the reordering. Line 18 ensures the loop ends before reaching the last index, finish, which is the pivot, with the non-inclusive range constructor operator.

In the loop in line 18, we are trying to push the values smaller than or equal to the pivot to the left. It is also ok to use <.

We do so by swapping them with those that are bigger than the pivot, but exist on the left of those that are smaller.

The pivot_index increments at each swap and remembers the last position that was swapped. Hence, at the end of the loop, it holds the position of the first value that is bigger than the pivot. Everything on the left is either smaller than or equal to the pivot.

This is where the pivot belongs to in the array. We swap the pivot into that position. Ascend the throne!

The partition state of the array does not change in this last swap: all elements on the left of the pivot are still smaller than or equal to the pivot, while all elements on the right of the pivot are still bigger than the pivot. They remain unsorted.

The function returns the pivot’s position to the parent recursion function, which needs it to know where to split the array for the next iteration.

Lastly, let’s go back to the calling function where the initial recursion function is triggered. Make sure to pass in the last index of the array instead of its length.

Mergesort

def merge_sort(list)
  return list if list.length <= 1

  mid = list.length / 2
  left = merge_sort(list[0...mid])
  right = merge_sort(list[mid...list.length])
  merge(left, right)
end

def merge(left, right)
  return right if left.empty?

  return left if right.empty?

  if left.first <= right.first
    [left.first] + merge(left[1...left.length], right)
  else
    [right.first] + merge(left, right[1...right.length])
  end
end

Key Steps

  • Gist: first recursively halve the array until we are dealing with 1 element, then recursively merge the elements back in sorted order until we get back an array of the same size, which will now be sorted
  • A recursive function that consists of 2 parts in order: recursively split and recursively merge
  • The array is not mutated here; newly merged arrays are returned (unlike the alternative below, which mutates)
  • In lines 16 and 18, we continuously take the smaller of the first elements of the left and right arrays to build up the merged result
  • Lines 11 and 13 take care of the case where 1 side has been fully merged while the other still has elements inside. Since these arrays are already sorted at whichever iteration, we can just append the whole remaining array
  • Remember the terminating condition in line 2

Unfortunately, while this use of recursion is great, the number of recursions may become too excessive and cause a “stack level too deep” error.

We may need to prepare an alternative in case the stack overflows.

def merge_sort(list)
  return list if list.length <= 1

  mid = list.length / 2
  left = merge_sort(list[0...mid])
  right = merge_sort(list[mid...list.length])
  merge(list, left, right)
end

def merge(array, left, right)
  left_index = 0
  right_index = 0
  index = 0

  while left_index < left.length &&
    right_index < right.length
    if left[left_index] <= right[right_index]
      array[index] = left[left_index]
      left_index += 1
    else
      array[index] = right[right_index]
      right_index += 1
    end
    index += 1
  end

  array[index...index + left.length - left_index] =
    left[left_index...left.length]
  array[index...index + right.length - right_index] =
    right[right_index...right.length]

  array
end

Lines 15 to 33 basically carry out the merge operation with a while loop instead of recursion. It mutates the array along the way.

Insertion sort

Key Steps

  • Gist: insert elements one by one from the unsorted part of the array into the sorted part of the array (see the sketch after this list)
  • Divide the array into a sorted portion and an unsorted portion
  • The sorted portion always starts from the first element, as an array of 1 element is always sorted
  • The first element of the unsorted portion shifts forward until it reaches the start of the sorted portion OR until it meets an element smaller than or equal to itself
  • The order of the sorted portion is maintained
  • As it shifts forward, the last element of the sorted portion takes its vacated place
  • The next iteration starts on the next element of the unsorted portion, which is now the first element of the current unsorted portion
  • The loop mutates the array
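Here is a minimal ruby sketch of these steps:

def insertion_sort(array)
  (1...array.length).each do |index|
    value = array[index] # first element of the unsorted portion
    position = index
    # shift forward until the start of the sorted portion
    # or until an element smaller than or equal to value is met
    while position > 0 && array[position - 1] > value
      array[position] = array[position - 1]
      position -= 1
    end
    array[position] = value # the element takes its place
  end
  array
end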

Discussions

  • Best case is an already sorted array, so no shifting of elements from the unsorted to the sorted portion of the array, resulting in a time complexity of n
  • The worst case is a reverse sorted array, which results in the whole sorted portion having to shift for each iteration. The first element of the unsorted portion of the array is always the smallest and needs to go to the front of the sorted portion. Time complexity is n^2

Selection sort

Key Steps

  • Gist: scan the array to find the smallest element and eliminate it from subsequent iterations (see the sketch after this list)
  • Swap the smallest element with the front most element
  • Scan the array in the next iteration excluding the smallest element(s)
  • The last remaining single element will be of the largest value, so iterations take place until n - 2
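And a minimal ruby sketch of these steps:

def selection_sort(array)
  # iterations take place until n - 2, as the last element is left in place
  (0...array.length - 1).each do |front|
    smallest = front
    (front + 1...array.length).each do |index|
      smallest = index if array[index] < array[smallest]
    end
    # swap the smallest element found with the front most element
    array[front], array[smallest] = array[smallest], array[front]
  end
  array
end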

Discussions

  • Time complexity is n^2

Bubble sort

def bubble_swap array
  swap_took_place = true
  while swap_took_place
    swap_took_place = false
    (0...array.length - 1).each do |index|
      if array[index] > array[index + 1]
        array[index + 1], array[index] =
          array[index], array[index + 1]
        # increment swaps here to record
        # number of swaps that took place
        swap_took_place = true
      end
    end
  end
  array
end

Key Steps

  • Gist: keep swapping adjacent elements if the left is larger than the right down the array, and repeat this iteration for as many times as there are elements in the array. The last iteration will have no swaps occur, declaring the array sorted.

Discussions

  • Time complexity is n^2

Connecting MSSQL Database Using Ruby On Rails

This is a documentation on how to connect to a MSSQL database in a Rails application. We will use FreeTDS as the main toolkit to establish the connection.

Motivation

I came across a gig that required me to connect to a MSSQL database to extract data via the application that I was building in Ruby On Rails. I spent quite some time experimenting and playing with it before I managed to get it to work.

It will be good to document my steps and reasons in case I come across another such request and my memory fails me.

Installation

While Ruby on Rails has a gem that serves as a wrapper around the FreeTDS library of files, it requires the FreeTDS binaries to be installed natively on the machine that is running the application.

This presents a number of challenges. First, the local machine used may be different for different users. Second, the operating system used in the servers and local machine may be different too.

For my case, I use macOS for my development work, and the Amazon flavored linux for my staging and production sites.

Installing FreeTDS on macOS

The steps listed here follow this guide closely.

First, install these files locally using homebrew.

brew update
brew install unixodbc freetds

ODBC is an API that is meant for database access across different platforms. unixODBC is the driver manager that allows unix systems to connect to ODBC-capable databases.

MSSQL is one such database. However, while it uses ODBC for connection, it uses the TDS protocol on the application layer for communication. Hence, an ODBC driver alone is insufficient for the machine to process the data in the database. This is where FreeTDS comes in.

FreeTDS is a set of libraries that will do the translation and allow our application to connect to the database and retrieve the data.

Installing FreeTDS on Amazon Linux

Credits to this answer on stackoverflow. He even gave the steps required to install the packages via Elastic Beanstalk, which is convenient for me as I also use Elastic Beanstalk for deployment.

[ ! -e /home/ec2-user/freetds-1.00.86.tar.gz ] && \
wget -nc ftp://ftp.freetds.org/pub/freetds/stable/freetds-1.00.86.tar.gz -O /home/ec2-user/freetds-1.00.86.tar.gz || \
true

The first section of the code, enclosed within a pair of square brackets, is a unix command to check for the existence of the compressed file, which contains the necessary libraries, in the home path of the server. In the Amazon Linux system, the home path is /home/ec2-user by default. Adjust accordingly if you are installing on a local linux machine.

Should the file exist, the subsequent command to download the file will not be executed due to the logical && operation.

The last || operation with a true ensures the command returns true, so the whole Elastic Beanstalk process will continue even if the file already exists. Of course, this step is not necessary if we are installing the libraries manually on our local linux machine.

[ ! -e /home/ec2-user/freetds-1.00.86 ] && \
tar -xvf /home/ec2-user/freetds-1.00.86.tar.gz -C /home/ec2-user/ || \
true

Similarly, this step checks for the presence of the extracted files to prevent repeated and unnecessary extraction of the compressed library.

[ ! -e /usr/local/etc/freetds.conf ] && cd /home/ec2-user/freetds-1.00.86 && \
sudo ./configure --prefix=/usr/local --with-tdsver=7.4 || \
true

[ ! -e /usr/local/etc/freetds.conf ] && \
( cd /home/ec2-user/freetds-1.00.86 && sudo make && sudo make install ) || \
true

The next 2 commands set up the configuration for FreeTDS and finally install its libraries. Upon installation, the config file freetds.conf will be produced, which explains the checks against its existence to prevent duplicate installation operations.

Application in Ruby on Rails

With the FreeTDS libraries installed, we can look at how to use the tiny_tds gem to communicate with the MSSQL database. After installing it via bundler, we can use the following commands to connect.

client = TinyTds::Client.new(
  username: Rails.application.credentials.dig(Rails.env.to_sym, :deltek, :username),
  password: Rails.application.credentials.dig(Rails.env.to_sym, :deltek, :password),
  host: Rails.application.credentials.dig(Rails.env.to_sym, :deltek, :host),
  port: Rails.application.credentials.dig(Rails.env.to_sym, :deltek, :port),
  database: Rails.application.credentials.dig(Rails.env.to_sym, :deltek, :database)
)

Following the new practice of using the credential file to store secrets, I have stored all the database credentials in the encrypted credentials.yml.enc file.

client.execute("
  SET ANSI_WARNINGS ON;
  SET ANSI_PADDING ON;
  SET ANSI_NULLS ON;
  SET QUOTED_IDENTIFIER ON;
  SET ANSI_NULL_DFLT_ON ON;
  SET CONCAT_NULL_YIELDS_NULL ON;
  SELECT @@OPTIONS;
").each

This next snippet sets the settings of the connection. I would not pretend to understand the reasons for the settings made here. However, these are the final settings that worked for me to make the subsequent queries to the database tables. I came to this final configuration after googling around for the different errors that were thrown at me while getting FreeTDS to work.

result = client.execute("SELECT TOP 1 * FROM SOME_TABLE").each

This is an example of executing a query in the SQL language. The result variable will be an array of hashes, where each hash represents 1 row of the result.

client.close

Last but not least, make sure to close the client’s connection. This is not Active Record, which “automagically” does that for you.

Data Structures Cheatsheet

This is a concise glossary of the concepts, features and applications of various data structures.

Motivation

Due to the coronavirus outbreaks, the major lockdowns in Europe that ensued, and the stay home quarantine I have to undergo upon return to my country, I am ceasing my digital nomad life, which I have recorded in my Instagram account. So here I am, refreshing my memory on data structures as I prepare to welcome a new phase of my life.

I have a problem finding a good, concise cheatsheet that can properly remind me of the concepts of all the data structures, their key features and their runtime performance for various operations. More importantly, when to use them and, as I am a rubyist, how they are applied in ruby.

Each section will talk about 1 data structure. It will consist of the main concept behind how it is constructed, some key features that are unique to it, its best use cases, and whether there is something similar in ruby. These concepts follow HackerRank’s youtube channel’s playlist on Data Structures.

ArrayList

An arraylist is a dynamic array that will expand its capacity when it reaches its maximum. An array requires pre-allocated memory to be created. That means we need to establish the size of each element in the array and their total count.

Typically, when the arraylist reaches capacity, its size will be doubled by some complicated built-in algorithm in one of the library files of the language. It also has methods that can be called manually to ensureCapacity of the array.

Array and arrayList are used interchangeably in this article.

Key Features

  • Expands capacity when required

Runtime

  • Access: O(1) with use of known index of element in array
  • Search: O(n)
  • Insert
    • prepend: O(n) due to need to shift all elements
    • append: O(1)
  • Deletion: O(n) due to need for search to destroy

Applications

  • List of items of any kind of order

Ruby Alternative

In ruby, everything is an object. That includes arrays. Arrays in ruby are made dynamic to behave like ArrayList, like in most other dynamic languages. The array object has some operation to ensure capacity for the array.

It is also heterogeneous, which allows for different data types to exist together as elements of the same array (since all of them are objects anyway).

Binary (Search) Tree

Trees are most of the time referring to binary trees. Each node in a binary tree can have a maximum of 2 child nodes. This “tree” is kind of like a linked list of objects. It is not an array.

And a binary search tree (BST) has to have an increasing order in relation to a node from its left to its right node. Based on this rule, the binary search can be carried out by propagating through the nodes, asking the deterministic question: is the value being searched for smaller or larger than the value of the current node? With a known sort order, each iteration can, probably, halve the total nodes to search. This results in a faster search time.

This is only “probably” achievable if the BST is balanced. If the tree is lopsided on the right side for example, each iteration does not exactly halve the number of nodes to search. The worst case scenario would be to comb through all the nodes if they all exist on the right node of one another.

There are many self balancing trees; one of them is the AVL tree, named after its inventors. It involves rebalancing the tree when it becomes unbalanced to ensure that “the heights of the two child subtrees of any node differ by at most one“.

Duplicates are allowed in some BSTs, meaning there can be 2 nodes with the same value. They should always obey the rule that the left node is <= the current node. Duplicates introduce complexity in the search algorithms to determine the correct node to pick.
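For illustration, a minimal ruby sketch of that deterministic search, assuming each node responds to value, left and right:

def search(node, target)
  return nil if node.nil?
  return node if node.value == target

  # the sort order lets each step discard one whole subtree
  target < node.value ? search(node.left, target) : search(node.right, target)
end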

Key Features

  • A node and its left and right nodes have to be sorted in a specific order that can be classified as ascending or descending
  • Needs to be balanced to be useful
  • Traversal is always from the left node then the right node, with the current node hopping between and around the left and right to define the 3 different methods of traversal, ie.
    • inorder
    • preorder
    • postorder

Runtime

  • Balanced
    • Access: O(log n)
    • Search: O(log n)
    • Insert: O(log n)
    • Deletion: O(log n)
  • Imbalanced (worst case scenario)
    • Access: O(n)
    • Search: O(n)
    • Insert: O(n)
    • Deletion: O(n)

Applications

  • Database like CouchDB
  • Huffman Coding Algorithm for file compression
  • Generally large data with sortable characteristic, and its size should be large enough to justify the use of BST over arrays

Ruby Alternative

There is no native implementation of BST in ruby. However, there are gems out there that implement it. RubyTree by evolve75 is my favorite as it allows for content payload to be added to each node.

BST is quite an old and established concept. Hence, these gems might appear old and unmaintained.

Min/Max Heap

This tree is always populated from left to right across each level. It is considered minimum or maximum depending on whether the smallest or largest value is at the root node respectively.

After insertion, the new node is “bubbled” up to the correct position by a series of swaps with its parent node until it reaches the root node, if it even reaches the root node.

If the root node is deleted, the last node replaces it and is “bubbled” down to the correct position.

Because of the way the data is populated, there will be no gaps in between nodes, hence this tree can be stored as an array (no need for a linked list)! One can simply use the index of a node in the array to access it, and some formulas to get the indices of its neighbouring nodes and access them as well (a sketch follows the list):

  • parent: (index - 1) / 2 (rounded down)
  • left: 2 * index + 1
  • right: 2 * index + 2
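To illustrate, here is a minimal ruby sketch of a min heap stored as an array, using the index formulas above for the “bubble up” insertion (the class name is my own):

class MinHeap
  def initialize
    @nodes = []
  end

  def push(value)
    @nodes << value
    bubble_up(@nodes.length - 1)
  end

  private

  def bubble_up(index)
    return if index.zero?

    parent = (index - 1) / 2
    return if @nodes[parent] <= @nodes[index]

    # swap with the parent and keep bubbling towards the root
    @nodes[parent], @nodes[index] = @nodes[index], @nodes[parent]
    bubble_up(parent)
  end
end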

Key Features

  • Essentially an array
  • Root node is always the minimum or maximum, the last node is always the opposite
  • Nodes in between do not necessarily obey the order.
  • Root node is usually the one being removed in application, replaced by the last node, and bubbled to the correct position accordingly
  • Min heap always look to find the smallest value among its children to swap down, opposite for max heap

Runtime

  • Access: O(1) with use of known index of element in array
  • Search: O(n)
  • Insert
    • append/prepend: O(1)
    • ordered insert: O(log n)
  • Deletion: O(log n)

Applications

  • Priority queues (eg. for elderly and disabled then healthy adults using weighted representation)
  • Hospital queues for coronavirus victims based on age and, therefore, savableness
  • Schedulers (eg task with higher priority will have higher weightage and will be bubbled to the correct position when added to the queue)
  • Continuous median problem

Ruby Alternative

There is no native heap implementation in Ruby. Gems are available.

Hash Table

Interestingly, a hash table consists of a hashing function and an array of linked lists. Together, they form a key value datastore.

The key that maps to the value to be stored undergoes a hashing function to get an integer. This integer represents the index in the array at which to store the data, that is, the value corresponding to the key in the hash table. The value will be added to the linked list behind that index of the array.

The data is saved as a linked list instead of an element in the array due to the probability of collisions from the hashing function. This allows multiple values to be stored at the same index of the array, but only if their keys are different. Otherwise, they will overwrite the old data, as hash tables do not allow duplicate keys.

It is crucial for the hashing function to have a good key distribution. This is to prevent any of the linked lists from being overwhelmingly long, resulting in long search times hopping through the linked list. Murmur hash is a good hashing function for this purpose.

Key Features

  • Hash function maps keys to index of array
  • Array is made up of linked list to store data while avoiding collisions from the hashing
  • Hash function with good distribution crucial to performance
  • No order

Runtime

  • Access: O(1)
  • Search: O(1)
  • Insert: O(1)
  • Deletion: O(1)

Applications

  • Anything that does not require order

Ruby Alternative

Murmur hash seems to be used in the native ruby hash. Note that it is easily reversible. Hence while it can be used for maintaining good key distribution, it is not ideal for cryptographic purposes.

Linked List

Each node will point to the next node. The last node will point to null. Accessing elements can be slow as the pointer needs to hop through nodes, unlike an array which can access elements instantly via their index. The advantage of a linked list over an array is that you do not need to allocate the required memory at the start. You will only use the memory that you need without wastage. It is very space efficient.

Another advantage is its speed during prepending elements or inserting them in the middle. Unlike the array where every element thereafter has to be shifted, it can be done in constant time in a linked list.

A variation, the doubly linked list, gives each node a bearing on its adjacent nodes on both ends. It allows traversal in both directions as its biggest advantage. The maintenance needed to keep track of the 2nd neighbouring node in all operations may be costly.

The last variation is the circular linked list. That said, there’s a classic linked list question on how to detect if a linked list has a cycle (not necessarily circular). The solution is to use a fast pointer and a slow pointer to loop through the linked list until they point to the same node in a linked list with a cycle, or to null for a linked list without one. This simple cycle detection algorithm is known as Floyd’s tortoise and hare, and is entertainingly portrayed in the video below.

Side note for me: the distance of the loop, not coincidentally but mathematically, equals the distance from the start of the linked list to the location where the hare and tortoise meet. Again, not coincidentally but mathematically, the distance from the start of the linked list to the start of the loop equals the distance from the location where the hare and tortoise met to the start of the loop (continuing in the direction that the tortoise was originally moving in).
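Here is a minimal ruby sketch of the tortoise and hare, assuming a simple Node struct with a next_node pointer:

Node = Struct.new(:value, :next_node)

def cycle?(head)
  slow = fast = head
  while fast && fast.next_node
    slow = slow.next_node           # tortoise moves 1 node at a time
    fast = fast.next_node.next_node # hare moves 2 nodes at a time
    return true if slow == fast     # they can only meet inside a cycle
  end
  false # the hare reached null, so there is no cycle
end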

Key Features

  • Head node may be null
  • Last node will point to null
  • Doubly linked list is another variation, where each node points to its previous and next node

Runtime

  • Access: O(n)
  • Search: O(n)
  • Insert
    • append/prepend: O(1)
    • ordered insert: O(n)
  • Deletion: O(n)

Applications

  • Anything that requires order and needs to save on memory

Ruby Alternative

There is no native linked list in ruby. However, there are gems, and this one by spectator is still pretty active.

Queue

Queue is a collection of data that obeys the First In First Out (FIFO) principle.

Theoretically, as traversal is not supposed to happen in a queue, I believe it is best implemented with a linked list rather than an array. There are no resizing overheads, and no need to shift all the elements every time an element is taken out of the front of the queue.

Addition to the queue might mean having to hop through the whole linked list to add the element at the back. However, I would solve this by using a circular linked list to keep a grip on the first and last elements, which are actually all the queue cares about (see the sketch below). Of course, things will be different if it is a not-so-simple queue, like a least recently used (LRU) implementation.
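
As a rough sketch of that idea in Ruby, the same effect can be had with explicit head and tail references on a plain singly linked list instead of a circular one; the names are illustrative:

class LinkedQueue
  Node = Struct.new(:value, :next_node)

  def initialize
    @head = @tail = nil
  end

  # O(1): append at the tail without hopping through the list
  def enqueue(value)
    node = Node.new(value)
    if @tail
      @tail.next_node = node # link behind the current last element
    else
      @head = node           # queue was empty
    end
    @tail = node
  end

  # O(1): remove from the head
  def dequeue
    return nil unless @head
    value = @head.value
    @head = @head.next_node
    @tail = nil unless @head # queue emptied
    value
  end
end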

However, there are certain advantages to consider when implementing with arrays. Arrays can be cached more easily as they consist of memory units adjacent to one another. On the other hand, a linked list consists of memory units that exist sparsely in the memory pool, which hurts its caching capabilities. The reason is a TODO for me when I go beyond data structures during these revision weeks.

Nonetheless, cache engines like Redis have their own implementation of a linked list (Redis List) in their cache database. I do not know if this is the same caching mechanism that is affected by the sparse memory locations of a linked list, but it is probably good to know.

Key Features

  • FIFO

Runtime

  • Insert (enqueue at the back): O(1)
  • Deletion (dequeue from the front): O(1)

Applications

  • Restaurant queues

Ruby Alternative

Arrays are usually used as queues in Ruby. It does, however, have a native Queue class, which is meant for multi-threaded operations. On top of that, there is a SizedQueue class to ensure the size stays within capacity.

Stacks

Stacks are like the brothers of queues. The only difference is that they obey the Last In First Out (LIFO) principle.

Again, as traversals are not supposed to happen, we can use a linked list for the same advantages and considerations as explained in the "Queue" section above. But instead of appending to the end of the linked list, we will prepend to it, where the head of the linked list represents the top of the stack. It will be constantly changing, and it is where all the action takes place.

This ensures there is no overhead from resizing the array when the data gets too big, but the need for caching and its performance will have to be taken into consideration. A sketch of such a linked-list stack is below.
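
A minimal Ruby sketch of such a stack, where push and pop both act on the head (the top of the stack); the names are illustrative:

class LinkedStack
  Node = Struct.new(:value, :next_node)

  def initialize
    @head = nil
  end

  # O(1): the new node is prepended and becomes the head
  def push(value)
    @head = Node.new(value, @head)
  end

  # O(1): the head is detached and returned
  def pop
    return nil unless @head
    value = @head.value
    @head = @head.next_node
    value
  end
end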

Arrays are more suited to implement a stack than a queue. This is because all the pushes and pops take place at the end of the array, unlike the queue, which needs to remove elements from the front of the array, shifting all remaining elements forward.

Two stacks can be used to implement a queue with minimal performance overhead as well, as shown in the video below.
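
For reference, a rough Ruby sketch of that two-stack queue:

class TwoStackQueue
  def initialize
    @inbox = []  # receives every push
    @outbox = [] # serves every pop, in reversed (i.e. FIFO) order
  end

  def enqueue(value)
    @inbox.push(value)
  end

  def dequeue
    if @outbox.empty?
      # Drain the inbox only when the outbox runs dry; each element is
      # moved at most once, keeping the amortized cost O(1).
      @outbox.push(@inbox.pop) until @inbox.empty?
    end
    @outbox.pop
  end
end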

Key Features

  • LIFO

Runtime

  • Insert (push): O(1)
  • Deletion (pop): O(1)

Applications

  • Matching balanced parenthesis problems
  • Anagram / palindrome problems
  • Backtracking in maze
  • Reversals

Ruby Alternative

No native stack in Ruby. A simple array will suffice. Linked list gems are available too.

Graph

A graph is a superset of the linked list. Unlike a linked list, where a node is associated with a next element (and its previous element for a doubly linked list), a graph can have links to multiple nodes, not just the adjacent ones.

The link between each node of a graph contains data to give more meaning to the relationship between nodes. In graph terminology, this link is called an "edge" (as a matter of fact, "nodes" are termed "vertices"). An edge can be directed or undirected. Think being friends (undirected) versus being a follower (directed) between users in a social network.
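
As an illustration (with hypothetical data), one simple way to represent this in Ruby is a Hash from each vertex to its edges, with directed edges stored on one side only and undirected edges mirrored on both:

# Directed ("follower"): the edge lives on the follower's side and
# carries data about the relationship.
follows = {
  'alice' => [{ to: 'bob', since: 2019 }],
  'bob'   => []
}

# Undirected ("friends"): the edge is recorded on both vertices.
friends = {
  'alice' => ['bob'],
  'bob'   => ['alice']
}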

2 common ways to search a graph are Depth First Search (DFS) and Breadth First Search (BFS). DFS has a weakness in that it will search the full depth down one branch before moving on to the next. This translates to inefficiency if the vertex that we are searching for is down another branch. Hence, BFS is preferred.

Typically in BFS, a queue is used to store the next vertices to search.

There can be cycles of vertices having edges to one another; hence, during the search, it is imperative to check whether a vertex has been visited to prevent going round in circles. Unless, of course, we are talking about a Directed Acyclic Graph (DAG), where there are no directed cycles. A sketch combining the queue and the visited check follows.
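
Putting the queue and the visited check together, here is a minimal BFS sketch in Ruby over an adjacency-list graph; the Hash-of-neighbours shape is an assumption for illustration:

require 'set'

def bfs(graph, start)
  visited = Set[start]
  queue = [start] # Array#shift used as the dequeue for brevity
  until queue.empty?
    vertex = queue.shift
    yield vertex if block_given?
    graph.fetch(vertex, []).each do |neighbour|
      # Set#add? returns nil when the vertex was already visited,
      # which is what stops us going round in circles.
      queue << neighbour if visited.add?(neighbour)
    end
  end
end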

Key Features

  • Nodes are termed vertices
  • Edges contain data to describe the relation between vertices
  • BFS preferred
  • Flag each vertex as visited during a search to prevent looping inside a cyclic relationship among vertices

Runtime

The time complexity of graph operations depends on how the edges and vertices are stored. The optimal choice of storage depends on any prior knowledge of what the graph might look like.

Applications

  • Social networks, as in the friends/followers example above

Ruby Alternative

There is no native graph data type in ruby. Gems are available.

Set

The data structure of a set is the same as that of a hash table. The difference is that a set is not really concerned with the mapped value of a key. It just tracks whether the key is present.

This implies that there can be no duplicate keys, just like in a hash table. And unlike a hash table, where a key can be mapped to a null value, in a set the key is simply removed when nullified.

Key Features

  • No order
  • No duplicates

Runtime

  • Access: O(1)
  • Search: O(1)
  • Insert: O(1)
  • Deletion: O(1)

Applications

  • Attendance

Ruby Alternative

Ruby has a native data type for a set.
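
For example, the attendance use case with the standard library's Set (hypothetical names):

require 'set'

attendance = Set.new
attendance.add('alice')
attendance.add('alice')      # duplicate, silently ignored
attendance.include?('alice') # => true
attendance.size              # => 1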

Rails select helper with default selected option disabled and prompt

This is a documentation, once and for all, on how to render a select tag in a Rails view with a disabled option that is preselected.

Motivation

I find myself having to refer to Stack Overflow often for the solution because it just isn't intuitive. It requires some sort of hacking before it can be rendered the way I need it. Often, I would not understand the purpose of writing the code that way unless I could remember the purpose and see the end goal of what it is trying to achieve.

Code that is not self-explanatory is a sign of code smell. That is so not Rails.

The Old Way Of Coding

<%= tweet_form.select :user_id,
	options_for_select(
		['Please select user'] + @users.map { |user| [user.name, user.id] },
		selected: tweet_form.object.user_id || 'Please select user',
		disabled: 'Please select user'
	),
	{},
	{ class: 'form-control' }
%>

The way I used to code out the list of options was to create an array that contains yet more arrays, as shown above. In each inner array, the first value is the label users will see when selecting the option from the list, while the second value is the actual value to be passed to the backend.

Here is the part that raises questions. On line 3, I prepend the array of users, using the + operator, with an array containing Please select user. This is meant to be the first value to be selected so that it can act as a prompt for the select tag.

I use the options_for_select helper to make it a little easier to work with the selected and disabled options. The selected key will select the prepended array's value as the default value for the select tag, and the disabled key will set it as disabled so that users cannot choose it and send an invalid value to the backend.

Hence, without knowing the end goal, the code from lines 3 to 5 will raise eyebrows. There has to be a better way, and indeed there is. But first, let me touch on the remaining lines for completeness' sake.

Line 7 is for other options for the Rails select helper, like include_blank which we are not using.

Line 8 is for additional HTML attributes.

The New Way Of Coding

Rails 6 has added the prompt helper in the Rails select tag to achieve exactly this purpose. Let’s compare the new way of writing that snippet.

<%= tweet_form.select :user_id,
	@users.map { |user| [user.name, user.id] },
	{
      selected: tweet_form.object.user_id || "",
      disabled: "",
      prompt: 'Please select user'
    },
	{ class: 'form-control' }
%>

We are no longer using the options_for_select helper, as we do not need its selected and disabled options for any more unsettling hackery.

In its place, we use the prompt key on line 6, under the options argument of the Rails select helper, to achieve the same result.

The disabled key here makes the prompt an unselectable option. Leave it out if your UX allows selection of a nil value. You may need to engage the include_blank key here as well.

The selected key sets the selected value conditionally to be the prompt's unless the form object already has a user_id value. This part is a little quirky; the concept of selecting the value of the form object should already be implemented by default without having to write out the conditional code on line 4. I believe this is constructed to fit all scenarios for all UX requirements, but I can't really be sure. I'll check it out someday.

Note that by using prompt, the prompt will not be present as one of the options if the form object already has a value.

Conclusion

This is a lot neater and much more like the Rails we know. There is no more questionable code and every line and option has a clear purpose.

Handling DOM Elements From link_to remote: true Callback

This is a documentation of how to handle the response from a link_to remote: true API call and manipulate the DOM with minimal Javascript code.

Motivation

In the past, I used Javascript to add a click listener on a button element in order to make a jquery.ajax() API call to my Rails server.

A typical use case would be to delete a row in a list. While I can make a resource delete request to the Rails server and it will reload the page with the new list the RESTful way, this UX flow does not work out in some cases.

Hence, I had to use the jquery.ajax() way to work things out. I did not use the link_to helper with remote: true because I thought there was no way for me to listen for the response and react.

The only way I could listen for the callback and handle the DOM element thereafter, without having to reload the page, was to use the success callback of jquery.ajax(), or so I thought.

The Magic

So a better way is to use the link_to remote: true helper to render out the element without fuss, the Rails way. The key step is to add a Javascript listener on the element.

<!-- page.html.erb -->
<div class="parent-row" data-model-id="<%= @model.id %>">
  <%= link_to 'DELETE', model_path(@model), method: :delete, class: 'delete-model', data: { confirm: 'Are you sure?' }, remote: true %>
</div>

// page.js
$(document).on('ajax:success', '.delete-model', event => {
  // event.detail carries the response (assumed to contain model_id and message)
  const [response, status, xhr] = event.detail;
  // remove the row whose data-model-id matches the deleted record
  $(`.parent-row[data-model-id="${response.model_id}"]`).remove();
  alert(response.message);
});

The listener will listen for an ajax:success callback on the element that made the API call. When triggered by the event, it executes the block of code in its callback function. In this callback function, we receive the data passed from the backend, which we can use to remove the DOM element as required.

Note that you might not want to use $(document).on() in a turbolinks environment as the listener will be added every time the page changes. A particular use case is documented here.

We can add an ajax:error listener as well to handle errors.

The Advantages

This is a no-hassle method of writing code in Ruby (well, for the rendering of the element at least). The old way that I used, the jquery.ajax() method, requires more tear-inducing Javascript code to conjure. For a full stack Rails developer, it is not the most welcome.

On top of that, using Rails helper to render out the HTML element allows us to make use of the various Rails helpers to supercharge our development speed.

URL route helpers parse out the actual RESTful route to call with ease. Since the route is dynamically interpolated, no code change is required should there be a route change.

We can also still take advantage of rails-ujs, which has some handy features commonly needed for development. In the example above, I added a data-confirm attribute. This will trigger rails-ujs to ask for confirmation before proceeding with the request, and gracefully abort the operation should the user cancel the confirmation.

This will require proper setup with the new Rails 6 version. Check out my article on how to properly set up Rails 6 with bootstrap, and of course, integrate the new rails-ujs in its brand new frontend paradigm running on webpack.

Conclusion

Utilizing Rails helpers as much as possible will exude the strength of the Rails framework even more, which is rapid development. This method of listening to remote API calls and acting accordingly on the response allows exactly this.

AWS Lambda and API Gateway Integration With Terraform To Collect Emails

This is a documentation on creating a service that collects emails. It runs on serverless technology utilizing AWS lambda and API Gateway. It is also made easy to deploy to the cloud with infrastructure as code via Terraform in the form of a plug-and-play methodology.

Motivation

Often, I have to make static websites that are not exactly completely static because they require a backend to collect emails. While 3rd party services like mailchimp and sendgrid have their own SDKs to support easy integration for email collection, we have to worry about hitting the limits of their packages and plans. This translates to stress for developers, as we have to find a solution quickly and properly. If this happens on a weekend or a Friday (somehow this is always the case, as more people are surfing the net then), the intensity is amplified.

For a new website, it is very hard to gauge the traffic and thus the plan required for the 3rd party service. This poses difficulties when budgeting for the project. Underutilizing the service also translates to unnecessary cost. The best kind of plan for such a website is a pay-as-you-go model, in my opinion, and that can be achieved by integrating with cloud providers like AWS.

Technology Stack

AWS Lambda

Enter AWS Lambda, where you only pay for what you use. You do not need to fork out money at the start of your project. Instead, you will just pay for how much you use, relieving you of the worry of wasting money on resources you are not using. In fact, cost only becomes an issue if you are hitting 1 million potential users signing up with their email every month, because AWS Lambda has 1 million free requests every month before they start charging. This is highly unlikely for a new website, which means you now have a backend for your static website for free.

API Gateway

For the serverless function, that is, AWS Lambda, to connect to the Internet via an API, we need the API Gateway. This exposes the serverless function to the World Wide Web with an HTTPS endpoint. It runs on the encrypted transport layer security (TLS) protocol to uphold security by default. This allows your websites to use the serverless function via API calls.

Terraform

To set up the infrastructure, the usual way is to navigate the AWS management console, deploy the required AWS resources and link them. This can be a challenge if you are not familiar with the required configurations. Not only will this translate to a loss of precious time debugging these issues, time which developers could otherwise have spent with their loved ones and challenge the meme below, but it will also lead to frustration.

While frustration is part and parcel of life as a programmer, we can also avoid it with our knowledge of code. Here is where Terraform enters the fray. It is Infrastructure as Code, where you write the configuration of the infrastructure once and can deploy it multiple times without having to go through the whole forest of the AWS console each time. This means you do not need to remember every single step, and do not need to deal with surprise bugs because you forgot one of them, or worse, had a spelling error.

Programming is like magic. You write very specific instructions in arcane languages to invoke commands, and if you get it even a little bit wrong you risk unleashing demons and destroying everything.

— Diana Carrier (@artemis_134) June 23, 2018

Since the blueprint of the infrastructure is in code, we can leverage version control features with git and work together to improve the code base along the way, without fear of not being able to roll back to the previous successful configuration.

Terraform Files

I will start off with the Terraform files required to set up the infrastructure and deploy the code. First up is the place to store our emails.

The database – AWS DynamoDB

I will store the emails collected in AWS's own noSQL database, DynamoDB. This is a fast, simply structured and schemaless storage, which fits my use case very aptly.

It allows fast and simultaneous writes at high speed, so there is no fear of race conditions from a spike in the volume of signups during a PR event promoting the product and getting people to leave their emails at the website.

Since it is schemaless, we can easily add new user details that you would like to collect on top of their emails along the way, without having to migrate and fiddle with the structure of the database. With proper metaprogramming, you do not need to touch the backend code either, leaving only the frontend to work on for adding the new text fields for data collection.

For the sake of argument, we could also use a traditional relational database management system (RDBMS) for this project. It is written in SQL, which is a language most, if not all, developers who have ever touched a database would know. There is no need to use fancy noSQL for this simple project. In addition, the chances of leveraging the scaling advantage of noSQL over SQL databases are low, because you will need a lot of traffic for that to become a worry. For a new website, that is highly unlikely to happen.

However, highly influenced by the cost, I am still sticking with DynamoDB in this case. To set up an AWS RDS to host a managed relational database, the cheapest MySQL database already goes for around 20 USD a month, compared to the pay-as-you-go model that DynamoDB employs. On top of that, DynamoDB has a generous amount of free usage and storage under its free tier. This free tier does not last for just the first 12 months after signup, but forever, unlike its RDS counterpart. We probably will NOT incur any cost using DynamoDB unless your marketing is brilliant for your new website.

resource "aws_dynamodb_table" "main" {
  name = "${var.project_name}-dynamodb_table"
  billing_mode = "PROVISIONED"
  read_capacity = var.dynamodb-read_capacity
  write_capacity = var.dynamodb-write_capacity
  hash_key = "email"

  attribute {
    name = "email"
    type = "S"
  }
}

Provisioning the database is the simplest part. I am using Terraform variables to substitute in the values that set the number of read and write units required, as well as the table name, for robustness' sake.

I have set the billing mode to "provisioned" for simplicity's sake. After all, I am not expecting any insane burst of traffic for a site that is not popular. Even if it happens, maybe due to some incredible promotion at some hugely popular event, I do not expect the load to require me to scale the read and write capacities of the database. It is going to be a quick write of a few bytes.

On top of that, provisioned capacity means fewer configurations are needed for the permissions to autoscale the capacities of the database. It can take some time to configure that, and since it is outside the topic of this article, I will stick to the "provisioned" billing mode.

The hash_key, or "partition key" in other definitions, is analogous to the primary key in a SQL database table. It requires specific details under the attribute property. You can specify the range_key, or "sort key", here if you require one, and remember to add an attribute block to describe it as well.

Other attributes that are neither the partition key nor the sort key need not have an attribute property in this file. You can simply write them to the database and they will register. After all, this is a schemaless database.

On top of that, it is a fully managed database, so it comes with all the goodies like backup and version maintenance to spare developers from all these chores.

The backend – AWS Lambda

Next is the lambda function. It is written in Javascript using Nodejs. The file below is the configuration file that sets up the infrastructure required. Let's dive into it.

resource "aws_lambda_function" "main" {
  filename = var.zipfile_name
  function_name = "${var.project_name}"
  role = aws_iam_role.main.arn
  handler = "index.handler"

  source_code_hash = "${filebase64sha256("${var.zipfile_name}")}"

  runtime = "nodejs12.x"
}

resource "aws_iam_role" "main" {
  name = "${var.project_name}-iam_lambda"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_policy" "main" {
  name = "main"
  path = "/"
  description = "IAM policy for lambda to write to dynamodb table and logging"

  policy = templatefile("${path.module}/lambda_policy.tmpl", { dynamodb_arn = aws_dynamodb_table.main.arn })
}

resource "aws_iam_role_policy_attachment" "main" {
  role = "${aws_iam_role.main.name}"
  policy_arn = "${aws_iam_policy.main.arn}"
}

resource "aws_lambda_permission" "main" {
  statement_id = "AllowExecutionFromAPIGateway"
  action = "lambda:InvokeFunction"
  function_name = aws_lambda_function.main.function_name
  principal = "apigateway.amazonaws.com"

  source_arn = "${aws_api_gateway_rest_api.main.execution_arn}/*/*/*"
}

Uploading of the backend code is done using the base64 hash of the zip file of the code. The code will need to be compressed and zipped first before taking this action. We will see how we can automate this process later.

This lambda function will need the permissions to write to the DynamoDB table. This is done using:

  • aws_iam_role to establish trust between the 2 AWS services
  • aws_iam_policy to give permission for the lambda function to access the database resource and perform the PutItem action. Details of the policy are interpolated via a template file, which we will go through later
  • aws_iam_role_policy_attachment to bind the aws_iam_role to the aws_iam_policy on the lambda function
  • aws_lambda_permission to allow API Gateway to integrate the lambda function and invoke it

The template file for the aws_iam_policy is shown below. It lists the actions that the lambda function is permitted to perform on the specified DynamoDB table. It also contains the permissions for the lambda function to push logs to AWS Cloudwatch. By the way, these logging permissions are the default permissions for a lambda function, and this template adds the DynamoDB permissions on top of them. Note the dynamodb_arn variable that is interpolated, which justifies the use of the template file instead of hardcoding the whole policy in the main terraform file, for robustness' sake.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "dynamodb:PutItem",
      "Resource": "${dynamodb_arn}",
      "Effect": "Allow"
    },
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}

The API layer – AWS API Gateway

The API Gateway is required to expose the lambda function so it can be consumed by servers and websites via a URL endpoint. The endpoint will be served over HTTPS, which requires some extra configurations, as documented below.

resource "aws_api_gateway_rest_api" "main" {
  name = var.project_name
}

resource "aws_api_gateway_resource" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id = aws_api_gateway_rest_api.main.root_resource_id
  path_part = "email"
}

resource "aws_api_gateway_integration" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = aws_api_gateway_method.main.http_method
  integration_http_method = aws_api_gateway_method.main.http_method
  type = "AWS_PROXY"
  uri = aws_lambda_function.main.invoke_arn
}

resource "aws_api_gateway_integration_response" "main" {
  depends_on = [aws_api_gateway_integration.main]

  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = aws_api_gateway_method.main.http_method
  status_code = aws_api_gateway_method_response.main.status_code
}

resource "aws_api_gateway_method" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = "POST"
  authorization = "NONE"
}

resource "aws_api_gateway_deployment" "main" {
  depends_on = [
    "aws_api_gateway_integration_response.main",
    "aws_api_gateway_method_response.main",
  ]
  rest_api_id = aws_api_gateway_rest_api.main.id
}

resource "aws_api_gateway_method_settings" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name = aws_api_gateway_stage.main.stage_name
  
  # settings not working when specifying the single method
  # refer to: https://github.com/hashicorp/terraform/issues/15119
  method_path = "*/*"

  settings {
    throttling_rate_limit = 5
    throttling_burst_limit = 10
  }
}

resource "aws_api_gateway_stage" "main" {
  stage_name = var.stage
  rest_api_id = aws_api_gateway_rest_api.main.id
  deployment_id = aws_api_gateway_deployment.main.id
}

resource "aws_api_gateway_method_response" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.main.id
  http_method = aws_api_gateway_method.main.http_method
  status_code = "200"
}

output "endpoint" {
  value = "${aws_api_gateway_stage.main.invoke_url}${aws_api_gateway_resource.main.path}"
}

So let’s break it down.

The aws_api_gateway_rest_api represents the project in its entirety.

The aws_api_gateway_resource refers to each api route of this project, and there is only 1 in this case.

I have set up only 1 stage environment of aws_api_gateway_stage for this project using a Terraform variable. You can set up different stages to differentiate the staging and production environments.

The aws_api_gateway_stage is associated with an aws_api_gateway_method_settings that sets the throttling rate of the API to prevent spam and overloading. For the method_path property, the wildcard route is used to apply the settings to all routes instead of the only API route that was created. It is trivial in this case, but the explanation for picking this "easy" route is simply due to a bug. If I were to specify the exact route, which is in the form of {resource_path}/{http_method}, the settings on the throttling rate would not propagate. It was documented here on github but was not properly resolved. Leaving it here for now.

The aws_api_gateway_deployment configures the deployment of the API. Note the depends_on attribute that was assigned. This explicit dependency is critical to ensure the deployment is called into effect after all the necessary resources have been provisioned.

The aws_api_gateway_integration configuration sets the integration to lambda proxy using the POST HTTP method without any authorization, as specified by the aws_api_gateway_method configuration. Lambda proxy allows us to handle the request from the server like how we would in a typical web application backend framework. The full request object is passed to the lambda function, and the API Gateway plays no part in mapping any of the request parameters. The API Gateway mapping has great potential to integrate interfaces properly, but for our use case, it is not necessary. I find this article doing a great job in explaining the API Gateway features with easy-to-consume information and summaries, like a gameshark guide book written by the half-blood prince. Do take a look to understand AWS API Gateway better.

The aws_api_gateway_integration_response is responsible for handling the response from the lambda function. This is where we can make changes to the headers returned from the lambda function using the response_parameters property, which is not used in this case. This is also the place to map and transform the response data from the backend to fit the desired data structure using the response_templates property.

The aws_api_gateway_method_response is where we can filter what response headers and data from aws_api_gateway_integration_response to pass on to the caller.

The transformation and mapping of the headers and data from the backend (ie the lambda function) in aws_api_gateway_integration_response, and the filtering of headers and data before passing them to the frontend in aws_api_gateway_method_response, are not needed in this sample application. It is just good knowledge to have. There are 2 reasons why we do not need them here.

First, in a bit, we will go through the frontend, which will make an API call that is a simple request. A simple request does not require a preflight request, which is an API call made by browsers prior to the actual API call; simple requests are deemed safe since they use standard CORS-safelisted request headers. In the event that one does need a preflight request, because one is not making a simple request, we will need to set up another API route that will transform the headers returned from the backend and allow the relevant headers to be passed on to the frontend for this preflight request. This will allow the frontend website to overcome the CORS policy enabled by default in modern browsers. It will mean configuring a new set of aws_api_gateway_rest_api, aws_api_gateway_integration, aws_api_gateway_method, aws_api_gateway_integration_response and aws_api_gateway_method_response just for this preflight request. Things can get complicated here, so I will leave it out of this article. If you still need to implement CORS, this gist (https://gist.github.com/keeth/6bf8b67c82f9a085e03ecbb289a859d6) is a good reference.

Second, we are using lambda proxy integration, so the full response from the lambda will be passed to the front end and mapped automatically, provided the response from the lambda code is properly formatted. Refer to this documentation for more details on it.

Lastly, the output resource will print the value of the endpoint of the API for us to integrate in our frontend.

The Admin Stuff

This file contains the details that we will need to set up Terraform and the variables we are using. The provider's region attribute here is hardcoded, which ideally should not be the case. I have yet to figure out how to make this dynamic and robust. The names with the todo- prefix should be changed to fit the project.

We are using an S3 bucket as the Terraform backend to hold the state of the infrastructure provisioned by Terraform. Creation of the bucket will be automated via a script that we will go through in the section on deployment.

provider "aws" {
  version = "~> 2.24"
  region = "eu-west-1"
}

terraform {
  required_version = "~> 0.12.0"
  backend "s3" {
    bucket = "todo-project-tfstate"
    key = "terraform.tfstate"
    region = "eu-west-1"
  }
}

variable "project_name" {
  type = string
  default = "todo-project"
}

variable "region" {
  type = string
  default = "eu-west-1"
}

variable "stage" {
  type = string
  default = "todo-stage"
}

variable "zipfile_name" {
  type = string
  default = "todo-project.zip"
}

variable "dynamodb-read_capacity" {
  type = number
  default = 1
}

variable "dynamodb-write_capacity" {
  type = number
  default = 1
}

The Application

Here is the application code, written in Nodejs. It is a simple write to DynamoDB with basic error handling. It takes in only 1 parameter, that is, the email. This code can definitely be improved by allowing more parameters to be written to the database in a dynamic way, so that the same code base can be used for a site that collects the first and last name of the user, as well as another site that collects the date of birth of the user. I will leave that as a future personal quest.

// Load the AWS SDK for Node.js
const AWS = require('aws-sdk');

// Set the region 
AWS.config.update({region: 'eu-west-1'});

// Create the DynamoDB service object
const ddb = new AWS.DynamoDB({apiVersion: '2012-08-10'});

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2));
  const params = {
    TableName: 'todo-project-dynamodb_table',
    Item: {
      'email' : {S: JSON.parse(event.body).email}
    }
  };

  // Call DynamoDB to add the item to the table
  try {
    const result = await ddb.putItem(params).promise();
    console.log("Result", result);
    const response = {
      statusCode: 204,
      headers: {
        "Access-Control-Allow-Origin" : "*",
      },
    };
    return response;
  } catch(err) {
    console.log(err);
    const response = {
      statusCode: 500,
      headers: {
        "Access-Control-Allow-Origin" : "*",
      },
      body: JSON.stringify({ error: err.message }),
    };
    return response;
  }
};

A thing to note here is the need to return the Access-Control-Allow-Origin header in the response. The response also has to follow a particular, but straightforward and common, format in order for the lambda proxy integration with API Gateway to work. This will map the response properly to the API Gateway method response and return it to the frontend websites, overcoming the CORS policy implemented by modern browsers.

Deployment

I will be using 3 ruby scripts for deployment-related tasks, namely init.rb, apply.rb and destroy.rb, together with a helper service object, get_aws_profile.rb.

Let’s take a look at them.

get_aws_profile.rb

# get_aws_profile.rb

class GetAwsProfile
  def self.call
    aws_profile = "todo-aws_profile"

    begin
      aws_access_key_id = `aws --profile #{aws_profile} configure get aws_access_key_id`.chomp
      abort('') if aws_access_key_id.empty?

      aws_secret_access_key = `aws --profile #{aws_profile} configure get aws_secret_access_key`.chomp
      abort('') if aws_secret_access_key.empty?
    rescue Errno::ENOENT => e
      abort("Make sure you have aws cli installed. Refer to https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html for more information.")
    end

    p "AWS_ACCESS_KEY_ID = #{aws_access_key_id}"
    p "AWS_SECRET_ACCESS_KEY = #{aws_secret_access_key}"

    [aws_profile, aws_access_key_id, aws_secret_access_key]
  end
end

This is a helper method that will get the aws_access_key_id and the aws_secret_access_key for use in the scripts. Note that it uses the aws cli command to obtain the keys. Hence, the aws cli has to be installed on your local machine before running this. It also assumes you are using a named profile to hold your credentials.

I don’t really like this setup since it requires these prerequisites, but that can be solved in the future.

init.rb

The first script to run is init.rb.

The init.rb script will create the S3 bucket to be used as the Terraform backend. Line 20 checks for the presence of this bucket and throws an exception if the bucket does not exist. The rescue block, if triggered, will create the missing bucket.

The initialization process of Terraform is run via its docker image.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'pry'
  gem 'aws-sdk-s3', '~> 1'
end

require './get_aws_profile.rb'

aws_profile, aws_access_key_id, aws_secret_access_key = GetAwsProfile.call

s3_client = Aws::S3::Client.new(
  access_key_id: aws_access_key_id,
  secret_access_key: aws_secret_access_key,
  region: 'eu-west-1'
)

begin
  s3_client.head_bucket({
    bucket: 'todo-project-tfstate',
    use_accelerate_endpoint: false
  })
rescue StandardError
  s3_client.create_bucket(
    bucket: 'todo-project-tfstate',
    create_bucket_configuration: {
      location_constraint: 'eu-west-1'
    }
  )
end

response = `docker run \
  --rm \
  --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
  --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  -it \
  hashicorp/terraform:0.12.12 \
  init`

puts response

apply.rb

Once initialized, the next script to run is apply.rb.

Prior to applying the Terraform infrastructure, the backend code is packaged into a zip file. After application, the zip file is deleted for housekeeping.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'pry'
  gem 'rubyzip', '>= 1.0.0'
end

require './get_aws_profile.rb'
require 'zip'

aws_profile, aws_access_key_id, aws_secret_access_key = GetAwsProfile.call

folder = Dir.pwd
input_filenames = ['index.js']
zipfile_name = File.join(Dir.pwd, 'todo-project.zip')

File.delete(zipfile_name) if File.exist?(zipfile_name)

Zip::File.open(zipfile_name, Zip::File::CREATE) do |zipfile|
  input_filenames.each do |filename|
    zipfile.add(filename, File.join(folder, filename))
  end
end

response = `docker run \
  --rm \
  --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
  --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  -it \
  hashicorp/terraform:0.12.12 \
  apply -auto-approve`

puts response

File.delete(zipfile_name) if File.exist?(zipfile_name)

With this, the API is now deployed and can be called from any website. We will go through a sample frontend integration in a bit.

destroy.rb

Once you are done with the project or are in the process of debugging, the destroy script will remove all the resources deployed. It will also remove the S3 backend that was created outside of Terraform.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'pry'
  gem 'aws-sdk-s3', '~> 1'
end

require './get_aws_profile.rb'

aws_profile, aws_access_key_id, aws_secret_access_key = GetAwsProfile.call

response = `docker run \
  --rm \
  --env AWS_ACCESS_KEY_ID=#{aws_access_key_id} \
  --env AWS_SECRET_ACCESS_KEY=#{aws_secret_access_key} \
  -v #{Dir.pwd}:/workspace \
  -w /workspace \
  -it \
  hashicorp/terraform:0.12.12 \
  destroy -auto-approve`

puts response

s3_client = Aws::S3::Client.new(
  access_key_id: aws_access_key_id,
  secret_access_key: aws_secret_access_key,
  region: 'eu-west-1'
)

begin
  s3_client.head_bucket({
    bucket: 'todo-project-tfstate',
    use_accelerate_endpoint: false
  })

  s3_client.delete_object({
    bucket:  'todo-project-tfstate',
    key: 'terraform.tfstate', 
  })
  s3_client.delete_bucket(bucket: 'todo-project-tfstate')
rescue StandardError
  puts "todo-project-tfstate S3 bucket already destroyed."
end

Sample Frontend Integration

<!DOCTYPE html>
<html>
<head>
  <script
  src="https://code.jquery.com/jquery-3.4.1.min.js"
  integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo="
  crossorigin="anonymous"></script>
</head>
<body>

  <h2>HTML Forms</h2>

  <form id="form">
    <label for="email">First name:</label><br>
    <input type="text" id="email" name="email" value="test@test.com"><br>
    <input type="submit" value="Submit">
  </form>

  <script type="text/javascript">
    $( "#form" ).submit(function(event) {
      event.preventDefault();

      $.ajax({
        type: "POST",
        url: "https://todo-endpoint.execute-api.eu-west-1.amazonaws.com/todo-stage/email",
        data: JSON.stringify({
          email: $('#email').val()
        }),
        success: function(data, textStatus, jqXHR) {
          debugger
        },
        error: function(jqXHR, textStatus, errorThrown) {
          debugger
        }
      });
    });
  </script>

</body>
</html>

Above is a simple HTML web page that has the email prefilled for demonstration purposes. The form will submit via jquery.ajax() using default settings, so as not to trigger the need for a preflight request.

You will see that the email will be added to the DynamoDB table, and the logs of the lambda function will be recorded in AWS Cloudwatch.

Conclusion

This exercise helped me understand how lambda integrates with API Gateway, as well as the immense potential of the latter as a robust middleware. In addition, I got to understand preflight requests and CORS better, as well as the jquery.ajax() function.

The project is saved in this repository for future reference.