Connect Multiple Heroku Apps To Same Postgresql Add-on & Overcome Tailwind SASSC Conflict

Tailwind css has a conflict with sassc gem, as stated in the README file of their github repository.

This is a problem for my application because I’m using rails admin for my admin panel, and it needs sassc to compile its assets.

This conundrum between tailwind css and other gems that need the usual rails way of compiling assets can be solved by setting up 2 heroku apps that talks to the same database. Here is how with postgresql.

First, deploy the main heroku app. This automatically adds the heroku-postgresql add-on.

Second, setup a new environment called admin for example. Here are the list of things to prepare for this new environment.

  • Create new config/environments/admin.rb file. Copy the content of config/environments/production.rb into it (this depends on your needs)
  • Install rails_admin gem under the admin group like group :admin do. Do the necessary installation for rails_admin
  • Mount rails_admin route under admin environment only, like mount RailsAdmin::Engine => '/', as: 'rails_admin' if Rails.env.admin?

Third, create a new heroku app. This creates it’s copy of postgresql addon. Remove it by heroku addons:remove admin-postgresql-addon-name. The name of the add on can be found in heroku dashboard under Overview > Installed add-ons

Fourth, attach the postgresql addon of your main app with heroku addons:attach main-postgresql-addon-name

Fifth, change the configs as such heroku config:set RAILS_ENV=admin && heroku config:set RACK_ENV=admin

Lastly, deploy your admin application! 🎉🎉

PS. I might not have all the steps recorded here, but if there’s any new error, it should be fairly straightforward to solve!

Using Postgresql JSON functions to escape backslash in Rails store

Storing Rails store as JSONB

When using Rails store, we can set the data type as plain text or a structure datatype like json, hstore or jsonb.

Between json and jsonb, depending on the use case, one might be more favorable than the other as the choice of Rails store data type. For 1, json take the text input as it is and saves it as json text, whereas jsonb saves the text input in a “decomposed binary format”, which adds overheads during writing, but reduces overhead when reading since no parsing needed.

In addition, jsonb allows indexing and executing database queries against, even with ActiveRecord. So most of the time, if you’re thinking of using Rails store, you probably would want jsonb. More on the advantages and disadvantages between json and jsonb can be found here.

In this case, I’m storing it as jsonb because I’m going to query it.

Rails store will save backslash

Creating a simple Rails store like that as depicted in the docs will generate a string with backslashes to escape the ". Taking this code snippet as an example:

class User < ActiveRecord::Base
  store :settings, accessors: [ :height, :weight ], coder: JSON 
end

And updating the user like user.update(height: 200, weight: 100) will save the data as "{\"height\": 200, \"weight\": 100}".

This causes a problem when we try to access the values inside via postgresql JSON functions. When we try to do a WHERE users.settings::jsonb->'height' IS NOT NULL, the query will not work as expect and the output will always be null. This is because users.settings is not a proper json, so accessing any keys just would not make sense.

Escaping Rails store backslashes

The trick here is to first use the #>> annotation to convert the jsonb into text first. This step removes the backslash, but outputs a text.

The resultant text can, in turn, be converted into jsonb via typecasting, like (users.settings#>>'{}')::jsonb.

Thereafter, we can manipulate the jsonb as a proper json and query it with how we would have used the postgresql JSON functions, like this where clause: WHERE (users.settings#>>'{}')::jsonb->'height' IS NOT NULL.

How To Edit A Rake Task From A Gem

Apart from having to contribute to the open source project of a gem that you’re using in your Ruby On Rails application in order to edit a rake task in there, you can simply use Rake::Task#enhance to add some codes before and/or after the rake task. Unfortunately, Rake::Task#enhance does not allow adding anything in the middle of the rake task 😢

The Endgame

Eventually, you just need to run the same rake task command rake my:task, and the pre and post actions will be executed as well

How To Use Enhance On rake task

Rake::Task#enhance is execute on the rake task that you intend to “enhance”, so to execute it would look like Rake::Task['my:task'].enhance.

It takes in a &block argument that would run after the rake task has completed. I love that you can add binding.pry in this block of code to debug the post rake task action that you intend to enhance too.

Here is my practical example using Rake::Task#enhance. I use the i18n-js gem whose function is to take all the yaml locale files in rails and translate them into javascript variables in a javascript file. The application can then do the translation on the frontend in javascript without having to reload to render the page from server side in order the get the translation from the backend.

The problem with it is that the translation javascript file that it generates does not include a fingerprint (last time I read the docs, there does not seem to have an option to add the fingerprint too). This causes problem for us when we update the content in the translation.js file since we use CDNs to cache our assets. When we update the translation.js file, the old cached version is served because the name of the file is the same.

The code snippet below enhances the rake task by:

  1. Generating the translation.js file first
  2. Removing the previous translation-XXfingerprintXX.js files if any
  3. Add the current timestamp as the fingerprint for the newly generated js file in step 1.
Rake::Task['i18n:js:export'].enhance do
   puts "Removing previous translations.js files"
   Dir.glob(Rails.root.join('public/javascripts/translations-*.js')).each do |file|
     File.delete(file)
   end
 puts "Adding fingerprint to public/javascripts/translations.js…"
   File.rename(
     Rails.root.join('public/javascripts/translations.js'),
     Rails.root.join("public/javascripts/translations-#{Time.current.to_i}.js")
   )
end

How To Add Code Before The rake task

Above, we cover how to run the task after the rake task has run. To run something before that, you would need to pass the code as an argument.

Rake::Task['my:task'].enhance(puts("Start my:task")) do
  puts("End my:task")
  puts("Start my:new:task")
  Rake::Task['my:new:task'].invoke
  puts("End my:new:task")
end

Other Examples

If you use capistrano to do deployment, you can refer to their github source code for the after macro.

It uses enhance to synchronous order of the rake tasks to run during deployment.

Improvements

Since we are providing a &block argument, I would have expected to be able to use the yield keyword to control when to execute the rake task, which is the commonly used in ruby or rails. So for example,

Rake::Task.enhance do
  puts "Start"
  yield
  puts "End"
end

I believe this will improve familiarity of the code to fellow rubyists! Good old convention over configuration philosophy!

Note to self: There might be some limitations/problems executing a rake task this way; explore first and maybe propose in rails github 🥶

Critical CSS in Rails 6

The idea here is to load the css that is precompiled by webpacker along with the html of the page so that the page can go on to load itself in a presentable appearance without having to wait for the required css to be downloaded.

We are going to read the compiled css file and load it inline it in the head element of the webpage.

始めましょう!

Preparing The Development Environment

Firstly, you need to set extract_css as true in your webpacker.yml configuration file for your development environment.

default: &default
  ...
  development:
    <<: *default
    extract_css: true
    ...

This will ensure that the file is also compiled in the development environment, and can be found in the public/packs folder.

Note that the file is still kept in memory and not physically stored in that folder. I believe this is due to some webpack-dev-server magic happening.

In production, the file will be compiled and physically stored in that public/packs folder.

Picking The Critical css Files To Compile

Next, you need to add the relevant css files in a javascript file in the app/javascript/packs folder.

This folder app/javascript/packs is where Webpacker will look at, by default, to get the files it needs to compile.

And if these file contains css codes, its css related loaders will compile them accordingly into a css with the same name as the javascript file itself. It will then be placed in the public/packs folder as mentioned.

In this snippet below, we are compiling only some of the modules of the bootstrap library into a javascript file called common.js. We should expect a file named common-somegibberish.css to be compiled.

// app/javascript/packs/common.js
require("bootstrap/scss/bootstrap-grid.scss")
require("bootstrap/scss/bootstrap-utilities.scss")

At this point of time, this resultant compiled css file is a legit css file that can be added to any html file and read by any browser, or maybe non Internet Explorer browser. You can proceed to read off this file into your application.html.erb file if you had removed the fingerprints on the compiled css file.

If not, this is how you can dynamically retrieve the file.

Lookup Compiled CSS File

A simple File.read to read off the content of the file, complemented by the Webpacker.manifest.lookup method to search for the file with the correct fingerprint.

<!-- application.slim -->
doctype html
 html lang="en"
   head
     ...
     style
       = File.read(File.join(Rails.root, 'public', Webpacker.manifest.lookup('common.css'))).html_safe
       ...

How To Remove Fingerprinting In Assets With Rails + Webpacker

In order to remove fingerprinting, that is the hash value appended to the file name of a compiled asset, be it a javascript, css or image file, you will need to configure the webpacker environment configuration files differently for each asset.

Due to the their different configurations, each method is different and I will try to explain why and how I got to this solution in this article.

TLDR

// config/webpack/environment.js

const { environment,  config } = require('@rails/webpacker')
//* REMOVE FINGERPRINT for js
environment.config.set('output.filename', 'js/[name].js')
//* REMOVE FINGERPRINT for images
const fileLoader = environment.loaders.get('file')
fileLoader.use[0].options.name = function(file) {
  if (file.includes(config.source_path)) {
    return 'media/[path][name].[ext]'
  }
  return 'media/[folder]/[name].[ext]'
}
//* REMOVE FINGERPRINT for css
const miniCssExtractPlugin = environment.plugins.get('MiniCssExtract')
miniCssExtractPlugin.options.filename = 'css/[name].css'
miniCssExtractPlugin.options.chunkFilename = 'css/[name].chunk.css'
module.exports = environment

Note that these codes are placed in the environment.js file because I wanted to apply the configuration to all my environments. If you need to apply to only 1 environment, go on to read my explanation of each step so that you understand the concept and can yield the code according to how you like and not the other way round.

Motivation

So I have a project where the identical code base needs to be hosted on 2 servers that each handles different traffic based on the requested paths.

Both servers sit behind a CDN, AWS Cloudfront in my case, and are 2 of the many origins I have set up. The CDN is configured to route, for explanation sake, paths starting with /onepiece to server 1, and the rest to server 2.

This setup poses a challenge where webpages going to server 1 will still be requesting assets, namely javascript, css and images files, that are configured to come from server 2.

The problem here is these files do not exist on server 2! This is because the fingerprint value generated by each server during compilation is different.

TODO: I need to double check on this point because wouldn’t this make caching problematic if deployment, and thus compilation, is frequent and the hash keeps changing?

This messes up the page styling and it seems that the best way is to remove the generated fingerprint during asset compilation, at least for my case.

Now I know, there is a good reason for the existence of fingerprinting, but for my particular case, I needed to get rid of it. I know what I’m doing, I hope.

So, 始ります!

Remove Fingerprint For Javascript Files

Javascript files are the easiest to have its fingerprint removed, or have their file names configured.

These are the only files the only files that webpacker can read, compile and output natively without any plugins or loaders.

Hence, changing its output name is as simple as 1 line of code as shown below.

environment.config.set('output.filename', 'js/[name].js')

We are removing the [contenthash] in the file name, from its original configuration of js/[name]-[contenthash].js, which is the variable that is telling the compiler to add a hash in the output file name.

In case you are wondering how do I get the default setting, you can do a console.log(environment.config.output.filename) and run the compile command to identify it. Of course it will be better if you can go to the source code of webpacker to confirm this but ain’t nobody got time for that.

This code can be placed in config/webpack/environment.js to standardise the configuration across all environment, or in config/webpack/production.js to execute only in the production environment.

Note that you are targeting only javascript files here. Other assets like css and images are handled differently. Hence, the extension is always js.

Some issues on Github complained about that [ext] does not work, and, I believe, is due to some misconception on how Webpacker work (Disclaimer: I don’t really know how it works either).

Of course it is not going to work here because the output of only javascript and javascript only files are configured in this step, hence there is no need for a variable to dynamically determine the extension value.

Remove Fingerprint For Images

So when do we use [ext]? Well, that’s in this section where we are configuring other assets like images, gifs and fonts (maybe even audio and video but I have not tried them), where there are multiple formats.

And this configuration is not performed on webpacker itself, but on file-loader.

file-loader is a loader that is included by default in Rails’ webpacker to handle the compilation of non javascript and css assets. We change the output of the file name here.

const { environment,  config } = require('@rails/webpacker')
...
const fileLoader = environment.loaders.get('file')
//* to see definition of name function
// console.log(fileLoader.use[0].options.name.toString());
fileLoader.use[0].options.name = function(file) {
  if (file.includes(config.source_path)) {
    return 'media/[path][name].[ext]'
  }
  return 'media/[folder]/[name].[ext]'
}

Since the file-loader is already part of the webpacker settings, we first need to get our hands on that object, and line 3 shows how it is done.

Next, we change the name option. This option is in charge of the format of the output and under the default settings of webpacker, it is a function with some conditional logic. You can see the original function using the code snippet in line 5. I basically remove the hash variable.

And yes, [ext] will work here, in file-loader.

Notice in line 7, there is this config.source_path. A little explanation about where it came from.

In the original code, it is “spelled” sourcePath. And if we were to just copy it and run our compilation code, it will throw an undefined value. So we have to understand its origins in order to provide the same object as the argument.

Basically, it is an attribute of the webpacker config object that has undergone an es6 destructuring to change from snake to camel case, and its value is populated from a yml file. You can do a search on sourcePath and source_path in the webpacker repository on Github to learn more about it.

And where do we get the config object? Back in line 1, we exported it along side environment. In the default environment.js file, this is not the default setting; only environment is imported so take note here.

With that, and understand the various placeholders of the file-loader loader, you can probably understand the logic behind the Rails webpacker team on generating the asset files, and change the configuration according to your needs.

Now on to the css files.

Remove Fingerprint For CSS

Here is yet another way to handle assets compilation.

For images, loaders were use. For css, we use plugins. The MiniCSSExtractPlugin is used by default in webpacker to minimize the final css files after they have gone through pre processes, like converting from sass, and post processes, like postcss, and output the minimized file. Naturally, here is where the output file name is configured.

const miniCssExtractPlugin = environment.plugins.get('MiniCssExtract')
miniCssExtractPlugin.options.filename = 'css/[name].css'
miniCssExtractPlugin.options.chunkFilename = 'css/[name].chunk.css'

Line 1 shows how we can get access to the plugin object that is already configured by default in webpacker.

In lines 2 and 3, we change the file name by removing the hash variable from the default value.

Some solutions that I have found mentioned they append the plugin. This still works, but it will generate both css files with and without the hash in the file name. And this is because 2 instances of MiniCSSExtractPlugin is executed in the process.

Conclusion

There is quite a lot of abstractions here for this simple configuration, and the documentation is not the best for webpacker. I guess the webpacker team is still trying to optimize webpacker and would rather focus on that than the API and the documentation. Hopefully that will change in the future!

Ruby Threads Are Concurrent, Not Parallel, Thanks To Global Interpreter Lock

This is a concept in Ruby, at least for Ruby 2 before the 2020 Christmas present by Matz and team, where threads can only run concurrently, but not in parallel.

This distinction needs to be clear.

In parallelism, multiple threads run together and doing stuff like iops, switching and allocating memory and variables, as well as making HTTP request.

On the other hand in concurrency, it is only when 1 thread becomes idle, maybe from making a HTTP request that will take at least finite amount of time to return, does another thread execute its program. This is the principle that ruby threads adhere to, at least if they are running under Ruby MRI which is the default.

The reason for this design is to prevent race conditions and uphold thread safety, at the expense of parallelism and thus performance, which is totally understandable. And the guardian of this job is none other than the Global Interpreter Lock (GIL).

Note to self: The GIL is analogous to the event loop mechanism in the Javascript engine.

To illustrate this concept better, below is a simple sinatra application that opens up 2 routes. Both executes 2 commands of sleep, but one of them is done using threads.

require 'sinatra'

get '/sleep' do
  start_time = Time.now

  2.times do
    sleep 2
  end

  elapsed_time = Time.now - start_time
  elapsed_time.to_s
end

get '/sleep_with_thread' do
  start_time = Time.now

  threads = []
  2.times do
    threads << Thread.new do
      sleep 2
    end
  end

  threads.each(&:join)

  elapsed_time = Time.now - start_time
  elapsed_time.to_s
end

As sleep is an idle operation, we would expect the GIL to release the lock on the first thread and execute the next thread. This will result in the elapsed_time for the sleep_with_thread route to be around 2 seconds, while that for the sleep route will be accumulatively at around 4 seconds, as shown in the screen shots below.

However, things will behave differently if it was not an idle command like sleep. Let’s setup 2 routes that goes to work rather than sleep!

require 'sinatra'
 get '/work' do
   start_time = Time.now
 2.times do
     string = ''
     50_000.times do
       string += '仕事!'.freeze
     end
   end
 elapsed_time = Time.now - start_time
   elapsed_time.to_s
 end

 get '/work_with_thread' do
   start_time = Time.now
 threads = []
   2.times do
     threads << Thread.new do
       string = ''
       50_000.times do
         string += '仕事!'.freeze
       end
     end
   end
 threads.each(&:join)
 elapsed_time = Time.now - start_time
   elapsed_time.to_s
 end

Note: The freeze command is a mere memory optimization and does not affect the response time in any significant manner.

The elapsed_time between the 2 operations are similar, as shown in the screenshots below.

This clearly depicts that the 2 threads does not run simultaneously, which would otherwise have resulted in the operation concluding in half the time shown.

Instead, the 2 threads are executed synchronously, one after another, because the operations involved are not idle.

This little experiment clearly defines the difference between concurrency versus parallelism, and portrays the limited capability of using threads in Ruby.

Conclusion

Hence, only spawn new threads when there is an idle operation involved, with the most practical example being making a HTTP request, in order to take advantage of threading in ruby.

Otherwise, you will be doing a move as redundant as “taking off your pants to fart”.

How To Setup Headless Chrome Browser For Rails RSpec Feature Test With Selenium And Capybara

When working on feature tests on a non headless browser, the automated browser will tend to take control over our display and we tend to lose focus. That means while the test is ongoing, we cannot do anything but to wait for the test to run its course. And that sucks.

Headless chrome to the rescue!

Since April in 2017, Chrome ships with a headless version. It allows running the browser without a GUI. That is just perfect for automated feature testing without having to lose control over your machine.

Fast forward to 2020 (actually it’s been around for quite a while but I just couldn’t set it up properly, until recently), you can setup headless chrome to run feature tests in your Rails application with almost no custom code that are suggested all over stackoverflow. I will go through how using Capybara and Selenium.

Install Latest Capybara and Webdrivers gems

It is important to make sure you are using the latest version of gems. Learn how to find the latest ruby gem version here.

# Gemfile
group :test do
  gem 'capybara', '~> 3.34'
  gem 'webdrivers', '~> 4.0'
end

Setup the Configuration for Selenium Headless Chrome

Setup this configuration in your rails_helper.rb file.

# rails_helper.rb

Capybara.javascript_driver = :selenium_chrome_headless
# Capybara.javascript_driver = :selenium_chrome

Line 3 specifies to use the default selenium chrome headless browser as the driver to run your feature tests. This driver comes with the latest capybara gem as one of the default drivers.

Line 4 specifies to use the default selenium chrome browser to run your tests. This setting will run browser in the foreground and you will lose control of your machine. It is commented out, but you probably might need it when you are debugging your tests, so I will leave it here. To use it, simply uncomment it and comment out line 3.

And there you have it!

That is all the configuration you need. The whole setup is already incorporated into the latest gems so you do not need to add in any more custom code.

String Algorithm Problems

Motivation

I tend to forget the flow of an algorithm and the reasons for executing certain steps that may be pivotal in getting the correct algorithm. So as a revision and as a post to my future self, here are my notes for the common algorithms related to strings.

Longest Common Subsequence

There’s a number of ways to solve this. First is using recursion, which has a caveat but forms the core to the optimized solution using dynamic programming. Let’s take a look at solving for longest common subsequence problems using recursion.

Solving Longest Common Subsequence problems using Recursion

def lcs(s1, s2)
  define_method :recursion do |index1, index2|
    case true
    when index1 == s1.length || index2 == s2.length
      return 0
    when s1[index1] == s2[index2]
      return 1 + recursion(index1 + 1, index2 + 1)
    else
      return [
        recursion(index1 + 1, index2),
        recursion(index1, index2 + 1)
      ].max
    end
  end

  recursion(0, 0)
end

lcs(s1,s2)

Note: the define_method is a ruby method commonly used in dynamic programming in ruby. I’m using it lazily here to allow the nested recursion method to be able to access the variables s1 and s2 in the parent scope. It is not necessary to use it and there are other ways to construct this.

The idea here is to constantly compare each character in s1 with each character in s2 to accumulate the count if their characters match, and move on to the next character along each string.

The breaking function, which will cause the recursion algorithm to stop, is when we reach the end of either string. In that scenario, the count will not be incremented.

The caveat here is that there is a repetition of computation of the characters of either string as we progress through the other.

Some memoization will help with this longest common subsequence problem. We do that with a table.

Solving Longest Common Subsequence problems using Dynamic Programming and Memoization

def lcs(s1, s2)
  table = Array.new(s1.length + 1) {
    Array.new(s2.length + 1) {0}
  }

  (1...s1.length + 1).each do |s1i|
    (1...s2.length + 1).each do |s2i|
      if s1[s1i - 1] == s2[s2i - 1] # IMPT
        table[s1i][s2i] =
          table[s1i - 1][s2i - 1] + 1 # IMPT
      else
        table[s1i][s2i] = [
          table[s1i - 1][s2i],
          table[s1i][s2i - 1]
        ].max
      end
    end
  end

  table.last.last
end

The approach here is to construct a table that will memorize the steps calculated before. This is the common approach to LCS questions and I believe the video below best explain the theory behind this.


Some things to note.

  • The last element of the table will accumulate the length of common subsequence, so looking at the last element in the table will get you the maximum.
  • In line 6, make sure to minus 1 off the index when getting the character in the string. This is because the table has 1 more element than the string along its corresponding dimension, so we need remove the offset.
  • Take note of the order of index to call in the multidimensional table. I tend to mix them. We are traversing vertically down the table first, then horizontally, and accessing the element is index from the vertical first then the horizontal, which is, unfortunately, opposite of the coordinate system where the horizontal (x) comes before the vertical (y).

To get the subsequence (not the length), you will need to trace back from the table, as explained in the video. Here is the code for that.

def lcs s1, s2
  ... # after getting the table

  i = s1.length
  j = s2.length
  string = ''

  while table[i][j] != 0
    value = table[i][j]

    case true
    when value == table[i][j-1]
      j -= 1
    when value == table[i-1][j]
      i -= 1
    else
      string += s2[j - 1] # NOTE
      i -= 1
      j -= 1
    end
  end

  string.reverse
end

We start off at the last element (bottom right cell) of the table, where i and j is the length of each string respectively.

Remember, the table has 1 more element on each dimension than its respective string, so assigning each string’s length to i and j respectively will point to the last element in the table.

In line 8, we start the loop of backtracing.

The breaking condition is to check if the current element in the table is 0 or not. If it is zero, it means all possible matching characters have been accounted for and there are no more matching character moving forward along to the front of either string. We do not need to worry about going out of index also, because the 1st horizontal and vertical of the table contains 0, so the backtrace will cease when it reaches there under this same breaking condition.

Line 12 to 15, we check if the current value in the table is derived from the table element on top of it or on its left. The order does not matter, as either way will bring us to the character that is eventually matching on both string at their respective correct position.

In line 17 to 19, the value in the current table element does not have the same value as either its top and left elements. This means there was a match and it got its value from the diagonal element. Remember, when there was a match, the table element will take the diagonally top left element and add 1 to itself.

On line 17, note that you can use either character from either string, since they are matched. Just make sure you are using the correct index for the correct string. Remember to offset 1 from the index as the table has 1 more element that the string in the corresponding dimension.

We append the matching character to a string and continue to loop. Eventually the string will contain the matching characters and we just have to reverse it to get the longest common subsequence in the correct order.


However, if you just need the length of the longest common subsequence, you can do some space optimization by using 2 rows instead of a table. Afterall, the only rows involved in the computation is the current and the previous row.

The idea here is to set the current_row as the previous_row once the row has completed its round of iteration, because the current_row will become the previous_row in the next iteration.

def lcs s1, s2
  previous_row = Array.new(s1.length + 1) {0}
  current_row = Array.new(s1.length + 1) {0}
  
  row_index = 1
  while row_index < s2.length + 1
    (1...current_row.length).each do |col_index|
      if s1[col_index - 1] == s2[row_index - 1] # IMPT
        current_row[col_index] =
          previous_row[col_index - 1] + 1
      else
        current_row[col_index] = [
          previous_row[col_index],
          current_row[col_index - 1]
        ].max
      end
    end

    previous_row, current_row =
      current_row, previous_row
    row_index += 1
  end
    
  previous_row.last
end

With that in mind, we can further optimize by observing the pattern that all the elements, starting from the front, in the previous_row will have the same value as the elements in the current_row until the element where there is a match. From there on, the same calculation resumes.

If there is no match, the whole current_row will have the same values as the previous_row.

Hence, we can skip the loop for these elements that we know will stay the same.

def lcs s1, s2
  previous_row = Array.new(s1.length + 1) {0}
  current_row = Array.new(previous_row.length) {0}
  
  row_index = 1
  while row_index < s2.length + 1
    s2char = s2[row_index - 1]
    matched_index = s1.index(s2char)

    if matched_index.nil?
      current_row = previous_row.dup # IMPT
    else
      col_index = matched_index + 1
      current_row[0...col_index] =
        previous_row[0...col_index].dup # IMPT

      while col_index < current_row.length
        if s1[col_index - 1] == s2char
          current_row[col_index] =
            previous_row[col_index - 1] + 1
        else
          current_row[col_index] = [
            previous_row[col_index],
            current_row[col_index - 1]
          ].max
        end
        col_index += 1
      end

      previous_row, current_row =
        current_row, previous_row
    end
    
    row_index += 1
  end
    
  previous_row.last
end

Here are some differences with the previous algorithm.

Line 8 will check if there is any character in s1 that matches the current character of s2 that we are looking at.

If there are no matches, the previous_row‘s value is copied over to the current_row. Be sure to use dup function to make sure it is a deep copy so that we are not manipulating the same object.

If there is a match, matched_index will hold the index of the first matched character in s1. We have to add 1 to it as the current_row is set to be longer than s1 by 1 (check out the video once again if you don’t understand why).

From there, we will follow the same logic of populating the rest of the current_row.

Note that we can only do the matching index trick for the first character that matches. The rest of the row will need to be recalculated once the change comes in.

Longest Common Substring

A similar approach to solving longest common subsequence solves the problem of Longest Common Substring. The difference between longest common substring and longest common subsequence is that the former must be a contiguous set of characters in the string, while the latter need not be contiguous as long it is sequential.

By the way, “contiguous” means elements in being adjacent to one another, while continuous means never ending.

It also involves a similar table, and we will see how it differs in the algorithm below.

def lcs(s1, s2)
  table = Array.new(s1.length + 1) {
    Array.new(s2.length + 1) {0}
  }

  max = 0
  indices = []
  (1...s1.length + 1).each do |s1i|
    (1...s2.length + 1).each do |s2i|
      if s1[s1i - 1] == s2[s2i - 1] # IMPT
        table[s1i][s2i] =
          table[s1i - 1][s2i - 1] + 1 # IMPT

        # this part onwards depend on the question
        if table[s1i][s2i] > max
          indices = [[s1i, s2i]] # IMPT
          max = table[s1i][s2i]
        elsif table[s1i][s2i] == max
          indices << [s1i, s2i]
        end
      end
    end
  end

  # do something with max or indices or table
end

Relative to the operations required for Longest Common Subsequence, this is much simpler.

In line 2, we create the same kind of table, where it’s longer than the corresponding string along the corresponding dimension by 1, and it is filled up by 0.

We then loop each element in the table, ignoring the first row and column.

We check whether the corresponding characters at each string at the corresponding indices match. Take note to minus 1 from the corresponding index when accessing the character.

If they do not match, we set that element as 0, and since the table is already filled with 0, we do not need to do anything.

Now that the else condition is out of the way, let see what we do when there is a match.

We simply add 1 to the value at the element at the diagonally top left of the current element, as shown in lines 11 and 12. This is essentially the formula.

The other steps at line 15 to 20 will be dependent on what your question is. In this case, I am keeping track of the largest common substrings and taking note of their indices.

Line 19 appends the current indices when there is a tie on the longest substring length, while line 16 nullifies the old records and replaces it with the current indices as there is a new maximum. Note that we are assigning it an array of array.

What’s left is how you want to use the data after the computation. The maximum length of the common substring can be easily obtained from the max variable.

What about to find the longest substring(s)?

def lcs(s1, s2)
  ... # after getting the table and indices
  strings = []
  indices.each do |index_pair|
    index1 = index_pair[0]
    index2 = index_pair[1]
    string = ''
    while table[index1][index2] != 0
      string += s1[index1 - 1] # IMPT
      index1 -= 1
      index2 -= 1
    end
    strings << string.reverse
  end

  strings.uniq
end

We continue from where we left off. We loop through the indices that hold the end positions of the longest common substrings between the 2 strings. These common substrings can be the same, and can also be different.

So for each pair of index in each element on indices, we add the character at the position, and move on to the element in its diagonally top left position. We append the characters as long as the value at its position in the table is not 1, signifying a match.

2 things to note when recording the character:

  • Remember to minus 1 from the index because the table’s dimension is longer that the corresponding string
  • While you can use either string to get the character since they will be the same as there is a match, you need to remember to use the correct index to access the correct string.

The resultant string is the common substring, but in reverse. Make sure to reverse it as shown in line 13.

As there may be repeated common substrings, use uniq like in line 16 to remove duplicates as required.

Z Algorithm

There are many pattern matching/string search algorithm developed over the years. Examples include Knuth-Morris-Pratt algorithm and Rabin-Karp algorithm, Aho-Corasick algorithm and Boyer-Moore string-search algorithm.

Z algorithm is another example and we will look into its implementation.

This video clearly explains the mechanism andI give it full credit. In this article, I will just be highlighting the key points based on the mistakes that I usually made while constructing the algorithm without reference.

Here’s a simple summary and refresher about what z algorithm is.

Z algorithm uses a “z-box” to mark characters whose neighboring patterns have been calculated before and save on those calculations and thus the iterations through them for optimization sake.

The steps involve generating this legendary z-box and

One caveat of the “z-box” occurs where the number subsequent matching characters reaches the end of the z-box. We cannot confidently say that the next character beyond the right end of the z-box does not contribute to a longer pattern match, so we need to take note of how we mark the end of the z-box.

With that, let’s take a look at the algorithm.

def zalgorithm s1, s2
  l = r = index = 0
  s = "#{s1}|#{s2}"
  z = Array.new(s.length) {0}
  index = 1

  while index < s.length
    if index > r
      l = r = index
      while r < s.length && s[r - l] == s[r]
        r += 1
      end
      z[index] = r - l
      r -= 1
    else
      if z[index - l] > r - index # IMPT
        l = index
        while r < s.length && s[r - l] == s[r]
          r += 1
        end
        z[index] = r - l
        r -= 1
      else
        z[index] = z[index - l]
      end
    end

    index += 1
  end

  z
end

We setup the variables required for the loop in lines 2 to 5.

l and r marks the left and right edge of the z-box that can be generated multiple times in the loop.

We combine s1 and s2 separated by a | character, assuming this | character will not appear in either string. We are trying to match s1 pattern where it appear in s2.

z in line 4 is not the z-box. It holds the number of subsequent subsequent that matches the pattern that we are looking for.

Lastly, we set the index to start at 1 because the first character is not matching with anything. And yes, we are starting, possibly, from within s1 to record any repeated pattern within s1 itself.

Now for the loop. It consist of an if-else statement at the top level.

In line 9 to 14, this logic is executed when we are out of the current z-box. We are no longer able to predict the number of matching characters from now on, so it is time to form a new box, and lines 9 to 14 does exactly this.

We set l to match the current index of the current iteration, and we want to find the end of this z-box that we are creating, which means to give r a value.

In line 10, we take care not to exceed the length of the string itself while finding the end of the z-box. We are also matching whether this current character matches the character in s1 at the same order (r - l gives the position at the start of the string, which is the pattern we are trying to match). This will also trigger a stop when we hit the special | character and incorrectly take s2 into account as part of the pattern to match.

We then record down how many of the subsequent characters from the current character matches the pattern, with the same order. Then come the important step at line 14. We do not include the last element that r currently holds as part of the z-box because there is a possibility of the pattern to continue matching with the character(s) beyond the end of the z-box. Taking it out of calculation will prevent this tragedy.

We now take a look at what the algorithm does when it has a z-box and the current iteration is inside it. Line 16 to 25 handles this.

We need to check if the recorded number of repeated string will reach the end of the z-box.

In line 16, index - l is the position of the pattern where the current character is expected to match, and we check its position in z to see if how many characters after that matches it. If this values bring us to beyond the edge of the z-box, we need to regenerate the z-box starting from the current character. Remember, we took out the last element of the z-box in line 14, so the comparator operator here will just be >.

Otherwise, we can simply copy the corresponding values that we recorded initially at the corresponding position, as depicted at line 24.

Regenerating the z-box is done in line 17 to 22 is exactly the same process as the case when we are outside the z-box, with only line 17 being different.

We only set l to be equal to index while r remains at its position, marking the edge of the z-box. This is because we are continuing to expand the z-box and predict the matching patterns of the subsequent characters beyond the edge of the previous z-box. We do not need to repeat what we have calculated from index to r.

Eventually, the z array will contain the number of subsequent characters matching from each index. We can manipulate the values in the z array to achieve what we need, like where are the positions of the full matches, how many full matches were there, etc.

Things to note

You can apply z algorithm on a single string, for the case of finding the suffixes that match the prefix of the string. In this case, you do not need to add the special character | as we did in line 3.

z algorithms find matches to the pattern at the start of the string. It does NOT match subpattern, that is the matching patterns start from anywhere in the pattern but the first character.

Window Sliding Algorithm for Substring problems

The window sliding technique is useful for problems involving consecutive elements. The object is usually to derive some results from some of these substrings, whether it is the minimum or maximum length of the substring that passes certain criteria, or whether substring contains certain characters.

Since this applies to consecutive elements, note that the window sliding technique can also be applied to non string problems, probably involving arrays or even linked lists.

The signature of a window sliding solution looks like this.

def solution args
  # setup variables
  
  window_end = 0 # change start point
  while window_end < array.length
    # deal with the current element first

    # check if certain condition is maintained
    # if not, adjust the start of the window to fulfill maintain the criteria

    window_end += 1
  end

  # return ans
end

The algorithm will involve looping the array to make sure we touch on every element. As we loop though the array, we shift the window along (marked by the ever increasing window_end variable that marks the end of the window) and carry out our logic to find our answer.

Once again, this sliding window method works exactly because we are looking at CONSECUTIVE elements. This property allows us achieve our answer within 1 pass.

At the start of the loop at line 6, we always deal with the current element first. Whether it is appending it to some array that we will need to keep track of, or accumulating its value for some calculation related to the answer, we deal with it first. This puts the current element into the sliding window for analysis, taking into account its effect on the sliding window.

Line 8 and 9 are some comments. They mark the location where the main analysis of the window take place. Here, depending on the question, it may consist of a while loop (yes, an inner while loop) to ascertain the start position of the window, usually when a criteria is not fulfilled.

This inner while loop will continuously shift the start of the window towards the right (the right end of the window is in standstill at this moment) as it seeks out the new start of the window where the criteria is fulfilled.

Now, we will need to track the start of the window, so a window_start variable is required. Remember to increment this window_start variable. With some arithmetic operations with the window_end variable

When it finally reaches the new position that will make the window fulfill the criteria, we process the result of the new sliding window. Maybe we need the length to find a maximum or minimum among all the substrings that fit the criteria, maybe we need to find if the window contains a certain character and note the string that the window forms. Whichever the case, once processed, remember to exit the loop.

However, there are times when you do not need this inner while loop. This happens when the length of the window is specified. For example, finding the substring of length X with the most number of vowels. In this case, there is no need to shift the start of the window constantly in search of the more optimum position of the window since the length of the window has to be fixed.

Instead, we just need to shift once, and use an if statement to check if the window has gone beyond its size.

We also do not need the window_start position of the window since we can derive it from the window_end position and the required length of the window.

When that happens, we need to remove the effect of the element at the start of the window, and then analyse what’s left of the window. Remember, at this point, the current element has been processed and already forms part of the window. Don’t process the current element here again!

Next, line 11 will increment the end of the window as we continue sliding it towards the end of the string.

Credits to this video below which clearly explains this concept. We will write out the algorithms to solve some questions involving the different variants as mentioned above.

So let’s try the example of fixed window length. Find the substring(s) of length 5 that has the most number of vowels. For simplicity sake, we assume the given string will have at least 5 characters and they are all lowercase.

def sw s, k = 5
  first_string = s[0...k]
  strings = [first_string]
  max = first_string.count('aeiou')
  
  window_end = k
  while window_end < s.length
    string = s[window_end - k + 1..window_end]

    count = string.count('aeiou')
    if count > max
      max = count
      strings = [string]
    elsif count == max
      strings << string
    end

    window_end += 1
  end

  strings
end

The code snippets are organized following the signature of the window sliding algorithm as presented above.

Line 2 to 4, we setup the variables needed for the loops later.

In line 6, based on the question and with the prior knowledge, we skip the first qualified substring and save on some iterations in case k gets insanely big.

Line 8 is where we process the current element first. Here we form the string with the current element as the last character.

In line 11, we check the condition: does the string have a higher vowel count than our maximum.

Lines 11 to 16, we process the result. Fairly straightforward here. The key thing to note is that there is no window_start. We do not need to continue looping to find a better start of the window as the size of it is fixed. So no inner while loop.

Remember to increment window_end for the loop to propagate, and lastly return the answer in line 21.

Now let’s look at a question that requires a moving window_start point.

Let’s say you want to find the number of contiguous elements in an array of item that can sum up to reach certain value. These elements can be of any length, but they must be adjacent to each other in the sequence.

def solution(arr, sum)
    count = 0

    window_start = 0
    window_end = 0
    while window_end < arr.length
        window = arr[window_start..window_end]

        loop do
            count += 1 if window.reduce(:+) == sum

            break if window_start == window_end

            window_start += 1
            window = arr[window_start..window_end]
        end

        window_end += 1
    end

    count
end

The gist lies in lines 9 to 16. A loop that continuously shift the start of the window to the right and continuously check the condition to improve the answer.

The key thing here is deciding how the loop is going to start and end, its break condition, and making sure to take care of the scenarios occurring at the edge of the window.

Next lexicographical permutation algorithm

The key idea here is to recognize that the last permutation in lexicographical order occurs when the last character is the smallest while the first character is the largest.

In other words, when the string is lexicographically reversed. And this holds true for substrings, or suffix in particular.

Starting from the end of the string, as we move forward character by character, the moment this reversed lexicographical order gets “reversed”, it means that the index we are at marks the start of a new substring that needs to be in a sorted order.

def solution s
  index = s.length - 1
  small = nil
  while index > 0 # IMPT
    if s[index - 1] >= s[index] # IMPT
      index -= 1
    else
      small = index - 1
      break
    end
  end

  return "no answer" if small.nil?

  char = s[small]
  array = s[small...s.length].chars.sort
  offset = 1
  small_index = array.index(char)
  while array[small_index + offset] == char
    offset += 1
  end

  front = array.delete_at(small_index + offset)
  s[0...small] + array.unshift(front).join('')
end

We set index to be the index of the last character in line 2, then we loop it forward. As we are constantly comparing the character in front of index, we do not end the iteration at the start of the string as there will not be anything in front to compare.

Take note of line 5 where we also ignore the case where the value before is equal to the value at the current index. There is no lexicographical difference between same characters.

In line 8 and 9, we meet the scenario we are looking for, a “reverse” in the reversed lexicographical order of the suffix we are currently iterating through. We assign the index before the current one to the variable small and break the loop to save on the iteration count.

We do an early return in line 13, where the string is already lexicographically reversed and is already in its last lexicographical permutation.

In lines 15 to 21, we are going to find the next lexicographical permutation for the substring we are in.

The idea here is simple. As it stands, we are at the last lexicographical permutation of the suffix with the first character (let’s call this first_character being whatever the index of the string is. So the next permutation has to start with another character that is next in the lexicographical order. Once the new first character is established, the rest of the suffix just need to be sorted as we usually do and we are done.

To find this next character, sort the suffix in ascending order (which should be the lexicographical order) and identify the index of the first_character.

This character will never be the last character in the sorted suffix due to that “reversal” as we were looping from the end of the original string in lines 4 to 10. We always know that there’s at least 1 more character after it.

If the string is made up of unique characters, the next character after it will be the new first character of the new suffix to make up the next lexicographical permutation of the original string.

However, if there can be duplicates in the original string, then we have to make sure the next character has to be different. With that, we use an offset, which will be minimum of value 1 since there’s always a next value and we do not need to worry about hitting out of range.

Line 19 to 20, we start from where the first_character is in the sorted suffix, and continuously increase the offset until the we find a different character that the first_character. Once again, we are looping along a sorted suffix here, and there will always be a character larger than the first_character, so we will definitely find the next character to start the suffix.

With this new character to mark the new suffix, we just have to join it back with the prefix and we will get the next lexicographical permutation. Line 23 and 24 shows just one way to do this.

Longest Palindromic Substring Using Manacher’s Algorithm

The common algorithm to solve for the longest common substring is the Manacher’s algorithm. There are many tutorials that explain it better than I will ever be able to, so I would not be covering the exact logic behind it. The best video explaining this concept is added below.

I would, however, be covering the gist of the code largely for my revision and refresher purpose. But before that, I will like to quickly remind myself what Manacher’s algorithm does not solve.

The Manacher’s algorithm only solves for the length of the longest substring that is a palindrome in a given string. It can also give you information of how many longest palindromic substrings are there.

The Manacher’s algorithm does not tell you what substrings they are nor the indices of where they start or where their centers are.

With that in mind, let’s hop into the algorithm.

def manacher string
  s = "|#{string.chars.join('|')}|"
  array = Array.new(s.length) {0}
  c = r = 0

  i = 1
  while i < s.length - 1 # IMPT
    if i < r
      array[i] = [
        r - i,
        array[2 * c - i]].min # IMPT
    end

    
    loop do
      offset = array[i] + 1
      break if s[i - offset] != s[i + offset]

      array[i] += 1
    end

    if i + array[i] > r # IMPT
      c = i
      r = i + array[i]
    end

    i += 1
  end

  array.max
end

We start off by creating the modified string, s, that has special characters on either side of each character. This is required to ensure all substrings have odd number of characters in them so that there is only 1 true center. Note that these special characters will take part in the formation of the palindrome, but in a genius manner, they do not affect the result.

An array of equal length to the string s is also created to hold the number of characters that match on either side of the corresponding character of the string at each index.

We set c as the center of the possible palindromes that we will be investigating, and r to remember the location of the edge of the palindrome that we have processed at each iteration.

We loop through the string s in lines 7 to 28 starting from index 1 and ending 1 before the last. This is because the characters at either end of the string will not form a proper center of a palindrome and thus there is no need to process them.

Remember, in this algorithm, we are constantly finding new center of possible palindromes and ascertaining the possible lengths of these new centers. As a center, it has to have a least a character on either side. That’s why the first and last characters are disqualified.

In each iteration within lines 7 and 28, we have to check 3 cases.

The first case in lines 8 to 12, we check if we are still within the right edge of the palindrome that we calculated in the previous iteration with center c. If so we can simply copy over the value at the mirror index of the current index i, pivoted around the center c, but with a caveat that we will cover shortly.

The idea here is that within the palindrome of center c, what’s on the right is the same as the left. Hence, if there are sub-palindromes on the right side of c, they will already have been calculated on the left side. We can thus copy over their lengths that have already been recorded in the array variable. 2 * c - 1 will give the mirror index.

However, this holds true only if the length of the palindrome previously recorded at the mirror index does not exceed the right edge of the palindrome of the current iteration. This is because it is possible that the next character after the right edge of the palindrome can chain and form a longer palindrome than what was previously calculated.

If that happens, the minimum length of this potentially longer palindrome is going to be at least r - i.

Hence, we set the value at i of array to the minimum of those 2 lengths. Note that this is not the final answer, we still have to calculate and find out if there’s a longer palindrome that can be formed.

The second case in lines 15 to 20, we ascertain how long a palindrome can be formed with character at i as the center. We compare the characters at an offset on either side of the center i and continually increase until they do not match.

The array[i] sets the offset at a starting value and skips repeated calculations that characters that we have already confirmed to be palindromic with the center i.

Lastly, we check if the new palindrome with the center i extends beyond the right edge of the current palindromic center c. If it does, we set the new values of c and r since they hold the latest data and allow us to iterate further.

Line 27, remember to increase i!

The array now contains the lengths of longest palindromic substring that each character at each index can form as the center. Use the max function to find the answer.

To find the number of longest palindromic substring, find the max value in the array and simply count how many times it appeared.

Anagrams

Palindrome

Cron Job, Elastic Beanstalk, Rails Rake

I ran into some issues lately where the the cron job is not running in my Rails deployed on Amazon Elastic Beanstalk. After much debugging, I found that it was because the environment variables are not loaded when the rake tasks were run.

This caused errors like bundler not installed or XX gems not found to appear all over my logs to frustrate the hell out of my weekends.

Took a deep dive and I found the solution. Or should I say the reason, because it remains unsolved and I had to take inspiration from an ancient war tactic to side step this issue. Before we get to the pseudo-solution, let’s find out the reason. And the main reason is…

Long story short: Rake tasks on cron job with Rails on Elastic Beanstalk will not work with Amazon Linux 2.

This is something I wished someone would have told me before I decided to upgrade my stack to the new version.

According to this pretty recent issue on Github, the Beanstalk team in AWS is aware that environment variables are not retrievable on the AL2 platforms like how it was on the old AL1.

The issue mentioned the fact that environment variables in the Beanstalk environment settings did not load, but according to my experiences, the environment variables on the system’s level are not loaded either.

I’m talking about $GEM_PATH, $BUNDLE_PATH etc. These are needed for ruby and rails, in particular, and the bundler to know where the gems are to run its operations and avoid the cursed bundle: command not found or gem not found errors.

I’m not the best candidate to debug a solution to fix it on AL2 platforms. Furthermore, by the time I can miraculously figure how to load the variables, Amazon would probably have already solved it.

So it’s much wiser to switch back to AL1. Henceforth, I will cover the method to run rails rake tasks via cron jobs in elastic beanstalk.

And yes this solution is indeed the last war tactic from Sun Tzu Art Of War.

These are the files you need.

# .ebextensions/01_cron.config
container_commands:
  01_remove_cron_jobs:
    command: "rm /etc/cron.d/cron_jobs || exit 0"
  02_add_cron_jobs:
    command: "cat .ebextensions/cron_tasks.txt > /etc/cron.d/cron_jobs && chmod 644 /etc/cron.d/cron_jobs"
    leader_only: true
# .ebextensions/cron_tasks.txt
* * * * * root bash -l -c "su -m webapp; cd /var/app/current && rake test"

# Take NOTE it ends with a new line!!

The commands of the 01_cron.config file will be run during deployment.

The first command, 01_remove_ctron_jobs, will remove all cron jobs. Should it fail, it will return a success status code to prevent the deployment procedure from stopping. You can also use ignoreErrors flag to achieve the same thing.

Why do we need to remove the cron jobs each time we deploy? Great question. We will get to that in a bit.

The second command 02_add_cron_jobs will copy the cron job definitions located inside the cron_tasks.txt file into another file. This other file will be located in /etc/cron.d directory, where the cron daemon will pick up files from and run the cron jobs as instructed.

The cron job basically runs the rake task as the webapp user, which has the knowledge of the necessary environment variables to execute rails command related to your application.

Now comes a very important point!

Make sure the last line of cron job end with a new line.

The definition of a cron job will be deemed invalid if it does not end with a line break. Hence, a common error occurs where the last cron job did not run due to this reason. I had this error for breakfast last Sunday.

Lastly, the leader_only flag will be set to ensure your cron job will only run on a single leader instance in your Elastic Beanstalk environment. If you intend to run the cron job on all instances, simply remove it and the option will revert to false as its default.

So back to the cliffhanger. Why do we have to remove the all the cron jobs during deployment?

This is because the leader of the cluster in the elastic beanstalk environment may not stay the same during each deployment. Maybe it was lost, physically, in an annual California wildfire event, or maybe it was scaled down and the leader was lost as a result

Doing this will ensure that at all times, there will only be 1 instance that will be executing your cron job, if so desired.

HOWEVER~

I just lied.

According to this forum thread, the leader of a cluster is determined during the elastic beanstalk environment creation. And it will remain the same throughout all subsequent deployments.

If the leader was lost during a timed scale down event, the leader is lost forever and the cluster in Elastic Beanstalk will remain leaderless until the end of time.

Which means the removal of cron jobs during each deployment is like taking off your pants to fart. It’s trivial.

Well, I’m just placing it there in case a leader reelection feature gets implemented.

The best way to solve this leaderless problem is to use a separate dedicated worker environment and period tasks. It is the official and cleaner way to do cron jobs anyway. Having your main server clusters running cron jobs eat at their computation prowess and that is something you would should architect your system to avoid.

The downside of it is that you will need to consider the cost of having another server to handle this.

Natural Sorting With Datatables

The most significant difference between natural sorting and your typical sorting is the order of a string when there are digits involved. For instance, the string “something2” will be placed before the string “something10” under normal sorting, but 10 is more than 2 for that matter.

This default sorting algorithm causes sorting problems with Datatables and we will see 2 ways to solve it, using a natural sort plugin and the data-sort or data-order attribute.

Natural Sort Plugin In Datatables

The documentation for this plugin lives here. This is how it is implemented in my Rails projects that are running on Turbolinks.

First, install the datatables plugin package via the command below:

yarn add datatables.net-plugins

In the javascript file, the snippet looks like that.

// app/packs/any.js
require('datatables.net-plugins/sorting/natural’)

document.addEventListener("turbolinks:load", () => {
  if (dataTables.length === 0 && $('.data-table').length !== 0) {
    $('.data-table').each((_, element) => {
      dataTables.push($(element).DataTable({
        pageLength: 50,
        columnDefs: [
          { type: 'natural-nohtml', targets: '_all' }
        ]
      }));
    });
  }
});

document.addEventListener("turbolinks:before-cache", () => {
  while (dataTables.length !== 0) {
    dataTables.pop().destroy();
  }
});

There is quite a bit of complexity in implementing this. The explanation can be found in this article and is largely due to the need to adapt to a turbolinks driven environment.

The key implementation takes place in line 9 and 10. The natural-nohtml type is specified to strip any html during sorting, while the _all value for the target key means to apply natural sort as the default sorting for all columns. More information and configuration options can be found in its documentation.

data-sort Or data-order Attribute

We can use the data-sort or data-order attribute of the table cell to indicate to Datatables to use these value to do the sorting instead of the values in the cell.

<td data-order="02">2</td>
<td data-order="10">10</td>

This will place the string in the correct numerical order. However, you would have to do the heavy lifting of padding the digits with the appropriate number of 0s.

It is a more useful feature if you want to sort values with a vastly different display value that its actual value that can mess up the sorting order, like date.

<td data-order="1332979200">Thu 29th Mar 12</td>
<td data-order="1354406400">Sun 2nd Dec 12</td>

Under normal sorting, the second <td> element will be placed above the first because ‘S’ comes before ‘T’. However, when we dictate the sort order to be using their epoch timestamp, the story will be different.