When working on feature tests on a non headless browser, the automated browser will tend to take control over our display and we tend to lose focus. That means while the test is ongoing, we cannot do anything but to wait for the test to run its course. And that sucks.
Fast forward to 2020 (actually it’s been around for quite a while but I just couldn’t set it up properly, until recently), you can setup headless chrome to run feature tests in your Rails application with almost no custom code that are suggested all over stackoverflow. I will go through how using Capybara and Selenium.
Install Latest Capybara and Webdrivers gems
It is important to make sure you are using the latest version of gems. Learn how to find the latest ruby gem version here.
# Gemfile
group :test do
gem 'capybara', '~> 3.34'
gem 'webdrivers', '~> 4.0'
end
Setup the Configuration for Selenium Headless Chrome
Setup this configuration in your rails_helper.rb file.
Line 3 specifies to use the default selenium chrome headless browser as the driver to run your feature tests. This driver comes with the latest capybara gem as one of the default drivers.
Line 4 specifies to use the default selenium chrome browser to run your tests. This setting will run browser in the foreground and you will lose control of your machine. It is commented out, but you probably might need it when you are debugging your tests, so I will leave it here. To use it, simply uncomment it and comment out line 3.
And there you have it!
That is all the configuration you need. The whole setup is already incorporated into the latest gems so you do not need to add in any more custom code.
I tend to forget the flow of an algorithm and the reasons for executing certain steps that may be pivotal in getting the correct algorithm. So as a revision and as a post to my future self, here are my notes for the common algorithms related to strings.
Longest Common Subsequence
There’s a number of ways to solve this. First is using recursion, which has a caveat but forms the core to the optimized solution using dynamic programming. Let’s take a look at solving for longest common subsequence problems using recursion.
Solving Longest Common Subsequence problems using Recursion
def lcs(s1, s2)
define_method :recursion do |index1, index2|
case true
when index1 == s1.length || index2 == s2.length
return 0
when s1[index1] == s2[index2]
return 1 + recursion(index1 + 1, index2 + 1)
else
return [
recursion(index1 + 1, index2),
recursion(index1, index2 + 1)
].max
end
end
recursion(0, 0)
end
lcs(s1,s2)
Note: the define_method is a ruby method commonly used in dynamic programming in ruby. I’m using it lazily here to allow the nested recursion method to be able to access the variables s1 and s2 in the parent scope. It is not necessary to use it and there are other ways to construct this.
The idea here is to constantly compare each character in s1 with each character in s2 to accumulate the count if their characters match, and move on to the next character along each string.
The breaking function, which will cause the recursion algorithm to stop, is when we reach the end of either string. In that scenario, the count will not be incremented.
The caveat here is that there is a repetition of computation of the characters of either string as we progress through the other.
Some memoization will help with this longest common subsequence problem. We do that with a table.
Solving Longest Common Subsequence problems using Dynamic Programming and Memoization
def lcs(s1, s2)
table = Array.new(s1.length + 1) {
Array.new(s2.length + 1) {0}
}
(1...s1.length + 1).each do |s1i|
(1...s2.length + 1).each do |s2i|
if s1[s1i - 1] == s2[s2i - 1] # IMPT
table[s1i][s2i] =
table[s1i - 1][s2i - 1] + 1 # IMPT
else
table[s1i][s2i] = [
table[s1i - 1][s2i],
table[s1i][s2i - 1]
].max
end
end
end
table.last.last
end
The approach here is to construct a table that will memorize the steps calculated before. This is the common approach to LCS questions and I believe the video below best explain the theory behind this.
Some things to note.
The last element of the table will accumulate the length of common subsequence, so looking at the last element in the table will get you the maximum.
In line 6, make sure to minus 1 off the index when getting the character in the string. This is because the table has 1 more element than the string along its corresponding dimension, so we need remove the offset.
Take note of the order of index to call in the multidimensional table. I tend to mix them. We are traversing vertically down the table first, then horizontally, and accessing the element is index from the vertical first then the horizontal, which is, unfortunately, opposite of the coordinate system where the horizontal (x) comes before the vertical (y).
To get the subsequence (not the length), you will need to trace back from the table, as explained in the video. Here is the code for that.
def lcs s1, s2
... # after getting the table
i = s1.length
j = s2.length
string = ''
while table[i][j] != 0
value = table[i][j]
case true
when value == table[i][j-1]
j -= 1
when value == table[i-1][j]
i -= 1
else
string += s2[j - 1] # NOTE
i -= 1
j -= 1
end
end
string.reverse
end
We start off at the last element (bottom right cell) of the table, where i and j is the length of each string respectively.
Remember, the table has 1 more element on each dimension than its respective string, so assigning each string’s length to i and j respectively will point to the last element in the table.
In line 8, we start the loop of backtracing.
The breaking condition is to check if the current element in the table is 0 or not. If it is zero, it means all possible matching characters have been accounted for and there are no more matching character moving forward along to the front of either string. We do not need to worry about going out of index also, because the 1st horizontal and vertical of the table contains 0, so the backtrace will cease when it reaches there under this same breaking condition.
Line 12 to 15, we check if the current value in the table is derived from the table element on top of it or on its left. The order does not matter, as either way will bring us to the character that is eventually matching on both string at their respective correct position.
In line 17 to 19, the value in the current table element does not have the same value as either its top and left elements. This means there was a match and it got its value from the diagonal element. Remember, when there was a match, the table element will take the diagonally top left element and add 1 to itself.
On line 17, note that you can use either character from either string, since they are matched. Just make sure you are using the correct index for the correct string. Remember to offset 1 from the index as the table has 1 more element that the string in the corresponding dimension.
We append the matching character to a string and continue to loop. Eventually the string will contain the matching characters and we just have to reverse it to get the longest common subsequence in the correct order.
However, if you just need the length of the longest common subsequence, you can do some space optimization by using 2 rows instead of a table. Afterall, the only rows involved in the computation is the current and the previous row.
The idea here is to set the current_row as the previous_row once the row has completed its round of iteration, because the current_row will become the previous_row in the next iteration.
def lcs s1, s2
previous_row = Array.new(s1.length + 1) {0}
current_row = Array.new(s1.length + 1) {0}
row_index = 1
while row_index < s2.length + 1
(1...current_row.length).each do |col_index|
if s1[col_index - 1] == s2[row_index - 1] # IMPT
current_row[col_index] =
previous_row[col_index - 1] + 1
else
current_row[col_index] = [
previous_row[col_index],
current_row[col_index - 1]
].max
end
end
previous_row, current_row =
current_row, previous_row
row_index += 1
end
previous_row.last
end
With that in mind, we can further optimize by observing the pattern that all the elements, starting from the front, in the previous_row will have the same value as the elements in the current_row until the element where there is a match. From there on, the same calculation resumes.
If there is no match, the whole current_row will have the same values as the previous_row.
Hence, we can skip the loop for these elements that we know will stay the same.
Here are some differences with the previous algorithm.
Line 8 will check if there is any character in s1 that matches the current character of s2 that we are looking at.
If there are no matches, the previous_row‘s value is copied over to the current_row. Be sure to use dup function to make sure it is a deep copy so that we are not manipulating the same object.
If there is a match, matched_index will hold the index of the first matched character in s1. We have to add 1 to it as the current_row is set to be longer than s1 by 1 (check out the video once again if you don’t understand why).
From there, we will follow the same logic of populating the rest of the current_row.
Note that we can only do the matching index trick for the first character that matches. The rest of the row will need to be recalculated once the change comes in.
Longest Common Substring
A similar approach to solving longest common subsequence solves the problem of Longest Common Substring. The difference between longest common substring and longest common subsequence is that the former must be a contiguous set of characters in the string, while the latter need not be contiguous as long it is sequential.
By the way, “contiguous” means elements in being adjacent to one another, while continuous means never ending.
It also involves a similar table, and we will see how it differs in the algorithm below.
def lcs(s1, s2)
table = Array.new(s1.length + 1) {
Array.new(s2.length + 1) {0}
}
max = 0
indices = []
(1...s1.length + 1).each do |s1i|
(1...s2.length + 1).each do |s2i|
if s1[s1i - 1] == s2[s2i - 1] # IMPT
table[s1i][s2i] =
table[s1i - 1][s2i - 1] + 1 # IMPT
# this part onwards depend on the question
if table[s1i][s2i] > max
indices = [[s1i, s2i]] # IMPT
max = table[s1i][s2i]
elsif table[s1i][s2i] == max
indices << [s1i, s2i]
end
end
end
end
# do something with max or indices or table
end
Relative to the operations required for Longest Common Subsequence, this is much simpler.
In line 2, we create the same kind of table, where it’s longer than the corresponding string along the corresponding dimension by 1, and it is filled up by 0.
We then loop each element in the table, ignoring the first row and column.
We check whether the corresponding characters at each string at the corresponding indices match. Take note to minus 1 from the corresponding index when accessing the character.
If they do not match, we set that element as 0, and since the table is already filled with 0, we do not need to do anything.
Now that the else condition is out of the way, let see what we do when there is a match.
We simply add 1 to the value at the element at the diagonally top left of the current element, as shown in lines 11 and 12. This is essentially the formula.
The other steps at line 15 to 20 will be dependent on what your question is. In this case, I am keeping track of the largest common substrings and taking note of their indices.
Line 19 appends the current indices when there is a tie on the longest substring length, while line 16 nullifies the old records and replaces it with the current indices as there is a new maximum. Note that we are assigning it an array of array.
What’s left is how you want to use the data after the computation. The maximum length of the common substring can be easily obtained from the max variable.
What about to find the longest substring(s)?
def lcs(s1, s2)
... # after getting the table and indices
strings = []
indices.each do |index_pair|
index1 = index_pair[0]
index2 = index_pair[1]
string = ''
while table[index1][index2] != 0
string += s1[index1 - 1] # IMPT
index1 -= 1
index2 -= 1
end
strings << string.reverse
end
strings.uniq
end
We continue from where we left off. We loop through the indices that hold the end positions of the longest common substrings between the 2 strings. These common substrings can be the same, and can also be different.
So for each pair of index in each element on indices, we add the character at the position, and move on to the element in its diagonally top left position. We append the characters as long as the value at its position in the table is not 1, signifying a match.
2 things to note when recording the character:
Remember to minus 1 from the index because the table’s dimension is longer that the corresponding string
While you can use either string to get the character since they will be the same as there is a match, you need to remember to use the correct index to access the correct string.
The resultant string is the common substring, but in reverse. Make sure to reverse it as shown in line 13.
As there may be repeated common substrings, use uniq like in line 16 to remove duplicates as required.
Z algorithm is another example and we will look into its implementation.
This video clearly explains the mechanism andI give it full credit. In this article, I will just be highlighting the key points based on the mistakes that I usually made while constructing the algorithm without reference.
Here’s a simple summary and refresher about what z algorithm is.
Z algorithm uses a “z-box” to mark characters whose neighboring patterns have been calculated before and save on those calculations and thus the iterations through them for optimization sake.
The steps involve generating this legendary z-box and
One caveat of the “z-box” occurs where the number subsequent matching characters reaches the end of the z-box. We cannot confidently say that the next character beyond the right end of the z-box does not contribute to a longer pattern match, so we need to take note of how we mark the end of the z-box.
With that, let’s take a look at the algorithm.
def zalgorithm s1, s2
l = r = index = 0
s = "#{s1}|#{s2}"
z = Array.new(s.length) {0}
index = 1
while index < s.length
if index > r
l = r = index
while r < s.length && s[r - l] == s[r]
r += 1
end
z[index] = r - l
r -= 1
else
if z[index - l] > r - index # IMPT
l = index
while r < s.length && s[r - l] == s[r]
r += 1
end
z[index] = r - l
r -= 1
else
z[index] = z[index - l]
end
end
index += 1
end
z
end
We setup the variables required for the loop in lines 2 to 5.
l and r marks the left and right edge of the z-box that can be generated multiple times in the loop.
We combine s1 and s2 separated by a | character, assuming this | character will not appear in either string. We are trying to match s1 pattern where it appear in s2.
z in line 4 is not the z-box. It holds the number of subsequent subsequent that matches the pattern that we are looking for.
Lastly, we set the index to start at 1 because the first character is not matching with anything. And yes, we are starting, possibly, from within s1 to record any repeated pattern within s1 itself.
Now for the loop. It consist of an if-else statement at the top level.
In line 9 to 14, this logic is executed when we are out of the current z-box. We are no longer able to predict the number of matching characters from now on, so it is time to form a new box, and lines 9 to 14 does exactly this.
We set l to match the current index of the current iteration, and we want to find the end of this z-box that we are creating, which means to give r a value.
In line 10, we take care not to exceed the length of the string itself while finding the end of the z-box. We are also matching whether this current character matches the character in s1 at the same order (r - l gives the position at the start of the string, which is the pattern we are trying to match). This will also trigger a stop when we hit the special | character and incorrectly take s2 into account as part of the pattern to match.
We then record down how many of the subsequent characters from the current character matches the pattern, with the same order. Then come the important step at line 14. We do not include the last element that r currently holds as part of the z-box because there is a possibility of the pattern to continue matching with the character(s) beyond the end of the z-box. Taking it out of calculation will prevent this tragedy.
We now take a look at what the algorithm does when it has a z-box and the current iteration is inside it. Line 16 to 25 handles this.
We need to check if the recorded number of repeated string will reach the end of the z-box.
In line 16, index - l is the position of the pattern where the current character is expected to match, and we check its position in z to see if how many characters after that matches it. If this values bring us to beyond the edge of the z-box, we need to regenerate the z-box starting from the current character. Remember, we took out the last element of the z-box in line 14, so the comparator operator here will just be >.
Otherwise, we can simply copy the corresponding values that we recorded initially at the corresponding position, as depicted at line 24.
Regenerating the z-box is done in line 17 to 22 is exactly the same process as the case when we are outside the z-box, with only line 17 being different.
We only set l to be equal to index while r remains at its position, marking the edge of the z-box. This is because we are continuing to expand the z-box and predict the matching patterns of the subsequent characters beyond the edge of the previous z-box. We do not need to repeat what we have calculated from index to r.
Eventually, the z array will contain the number of subsequent characters matching from each index. We can manipulate the values in the z array to achieve what we need, like where are the positions of the full matches, how many full matches were there, etc.
Things to note
You can apply z algorithm on a single string, for the case of finding the suffixes that match the prefix of the string. In this case, you do not need to add the special character | as we did in line 3.
z algorithms find matches to the pattern at the start of the string. It does NOT match subpattern, that is the matching patterns start from anywhere in the pattern but the first character.
Window Sliding Algorithm for Substring problems
The window sliding technique is useful for problems involving consecutive elements. The object is usually to derive some results from some of these substrings, whether it is the minimum or maximum length of the substring that passes certain criteria, or whether substring contains certain characters.
Since this applies to consecutive elements, note that the window sliding technique can also be applied to non string problems, probably involving arrays or even linked lists.
The signature of a window sliding solution looks like this.
def solution args
# setup variables
window_end = 0 # change start point
while window_end < array.length
# deal with the current element first
# check if certain condition is maintained
# if not, adjust the start of the window to fulfill maintain the criteria
window_end += 1
end
# return ans
end
The algorithm will involve looping the array to make sure we touch on every element. As we loop though the array, we shift the window along (marked by the ever increasing window_end variable that marks the end of the window) and carry out our logic to find our answer.
Once again, this sliding window method works exactly because we are looking at CONSECUTIVE elements. This property allows us achieve our answer within 1 pass.
At the start of the loop at line 6, we always deal with the current element first. Whether it is appending it to some array that we will need to keep track of, or accumulating its value for some calculation related to the answer, we deal with it first. This puts the current element into the sliding window for analysis, taking into account its effect on the sliding window.
Line 8 and 9 are some comments. They mark the location where the main analysis of the window take place. Here, depending on the question, it may consist of a while loop (yes, an inner while loop) to ascertain the start position of the window, usually when a criteria is not fulfilled.
This inner while loop will continuously shift the start of the window towards the right (the right end of the window is in standstill at this moment) as it seeks out the new start of the window where the criteria is fulfilled.
Now, we will need to track the start of the window, so a window_start variable is required. Remember to increment this window_start variable. With some arithmetic operations with the window_end variable
When it finally reaches the new position that will make the window fulfill the criteria, we process the result of the new sliding window. Maybe we need the length to find a maximum or minimum among all the substrings that fit the criteria, maybe we need to find if the window contains a certain character and note the string that the window forms. Whichever the case, once processed, remember to exit the loop.
However, there are times when you do not need this inner while loop. This happens when the length of the window is specified. For example, finding the substring of length X with the most number of vowels. In this case, there is no need to shift the start of the window constantly in search of the more optimum position of the window since the length of the window has to be fixed.
Instead, we just need to shift once, and use an if statement to check if the window has gone beyond its size.
We also do not need the window_start position of the window since we can derive it from the window_end position and the required length of the window.
When that happens, we need to remove the effect of the element at the start of the window, and then analyse what’s left of the window. Remember, at this point, the current element has been processed and already forms part of the window. Don’t process the current element here again!
Next, line 11 will increment the end of the window as we continue sliding it towards the end of the string.
Credits to this video below which clearly explains this concept. We will write out the algorithms to solve some questions involving the different variants as mentioned above.
So let’s try the example of fixed window length. Find the substring(s) of length 5 that has the most number of vowels. For simplicity sake, we assume the given string will have at least 5 characters and they are all lowercase.
def sw s, k = 5
first_string = s[0...k]
strings = [first_string]
max = first_string.count('aeiou')
window_end = k
while window_end < s.length
string = s[window_end - k + 1..window_end]
count = string.count('aeiou')
if count > max
max = count
strings = [string]
elsif count == max
strings << string
end
window_end += 1
end
strings
end
The code snippets are organized following the signature of the window sliding algorithm as presented above.
Line 2 to 4, we setup the variables needed for the loops later.
In line 6, based on the question and with the prior knowledge, we skip the first qualified substring and save on some iterations in case k gets insanely big.
Line 8 is where we process the current element first. Here we form the string with the current element as the last character.
In line 11, we check the condition: does the string have a higher vowel count than our maximum.
Lines 11 to 16, we process the result. Fairly straightforward here. The key thing to note is that there is no window_start. We do not need to continue looping to find a better start of the window as the size of it is fixed. So no inner while loop.
Remember to increment window_end for the loop to propagate, and lastly return the answer in line 21.
Now let’s look at a question that requires a moving window_start point.
Let’s say you want to find the number of contiguous elements in an array of item that can sum up to reach certain value. These elements can be of any length, but they must be adjacent to each other in the sequence.
def solution(arr, sum)
count = 0
window_start = 0
window_end = 0
while window_end < arr.length
window = arr[window_start..window_end]
loop do
count += 1 if window.reduce(:+) == sum
break if window_start == window_end
window_start += 1
window = arr[window_start..window_end]
end
window_end += 1
end
count
end
The gist lies in lines 9 to 16. A loop that continuously shift the start of the window to the right and continuously check the condition to improve the answer.
The key thing here is deciding how the loop is going to start and end, its break condition, and making sure to take care of the scenarios occurring at the edge of the window.
Next lexicographical permutation algorithm
The key idea here is to recognize that the last permutation in lexicographical order occurs when the last character is the smallest while the first character is the largest.
In other words, when the string is lexicographically reversed. And this holds true for substrings, or suffix in particular.
Starting from the end of the string, as we move forward character by character, the moment this reversed lexicographical order gets “reversed”, it means that the index we are at marks the start of a new substring that needs to be in a sorted order.
def solution s
index = s.length - 1
small = nil
while index > 0 # IMPT
if s[index - 1] >= s[index] # IMPT
index -= 1
else
small = index - 1
break
end
end
return "no answer" if small.nil?
char = s[small]
array = s[small...s.length].chars.sort
offset = 1
small_index = array.index(char)
while array[small_index + offset] == char
offset += 1
end
front = array.delete_at(small_index + offset)
s[0...small] + array.unshift(front).join('')
end
We set index to be the index of the last character in line 2, then we loop it forward. As we are constantly comparing the character in front of index, we do not end the iteration at the start of the string as there will not be anything in front to compare.
Take note of line 5 where we also ignore the case where the value before is equal to the value at the current index. There is no lexicographical difference between same characters.
In line 8 and 9, we meet the scenario we are looking for, a “reverse” in the reversed lexicographical order of the suffix we are currently iterating through. We assign the index before the current one to the variable small and break the loop to save on the iteration count.
We do an early return in line 13, where the string is already lexicographically reversed and is already in its last lexicographical permutation.
In lines 15 to 21, we are going to find the next lexicographical permutation for the substring we are in.
The idea here is simple. As it stands, we are at the last lexicographical permutation of the suffix with the first character (let’s call this first_character being whatever the index of the string is. So the next permutation has to start with another character that is next in the lexicographical order. Once the new first character is established, the rest of the suffix just need to be sorted as we usually do and we are done.
To find this next character, sort the suffix in ascending order (which should be the lexicographical order) and identify the index of the first_character.
This character will never be the last character in the sorted suffix due to that “reversal” as we were looping from the end of the original string in lines 4 to 10. We always know that there’s at least 1 more character after it.
If the string is made up of unique characters, the next character after it will be the new first character of the new suffix to make up the next lexicographical permutation of the original string.
However, if there can be duplicates in the original string, then we have to make sure the next character has to be different. With that, we use an offset, which will be minimum of value 1 since there’s always a next value and we do not need to worry about hitting out of range.
Line 19 to 20, we start from where the first_character is in the sorted suffix, and continuously increase the offset until the we find a different character that the first_character. Once again, we are looping along a sorted suffix here, and there will always be a character larger than the first_character, so we will definitely find the next character to start the suffix.
With this new character to mark the new suffix, we just have to join it back with the prefix and we will get the next lexicographical permutation. Line 23 and 24 shows just one way to do this.
Longest Palindromic Substring Using Manacher’s Algorithm
The common algorithm to solve for the longest common substring is the Manacher’s algorithm. There are many tutorials that explain it better than I will ever be able to, so I would not be covering the exact logic behind it. The best video explaining this concept is added below.
I would, however, be covering the gist of the code largely for my revision and refresher purpose. But before that, I will like to quickly remind myself what Manacher’s algorithm does not solve.
The Manacher’s algorithm only solves for the length of the longest substring that is a palindrome in a given string. It can also give you information of how many longest palindromic substrings are there.
The Manacher’s algorithm does not tell you what substrings they are nor the indices of where they start or where their centers are.
With that in mind, let’s hop into the algorithm.
def manacher string
s = "|#{string.chars.join('|')}|"
array = Array.new(s.length) {0}
c = r = 0
i = 1
while i < s.length - 1 # IMPT
if i < r
array[i] = [
r - i,
array[2 * c - i]].min # IMPT
end
loop do
offset = array[i] + 1
break if s[i - offset] != s[i + offset]
array[i] += 1
end
if i + array[i] > r # IMPT
c = i
r = i + array[i]
end
i += 1
end
array.max
end
We start off by creating the modified string, s, that has special characters on either side of each character. This is required to ensure all substrings have odd number of characters in them so that there is only 1 true center. Note that these special characters will take part in the formation of the palindrome, but in a genius manner, they do not affect the result.
An array of equal length to the string s is also created to hold the number of characters that match on either side of the corresponding character of the string at each index.
We set c as the center of the possible palindromes that we will be investigating, and r to remember the location of the edge of the palindrome that we have processed at each iteration.
We loop through the string s in lines 7 to 28 starting from index 1 and ending 1 before the last. This is because the characters at either end of the string will not form a proper center of a palindrome and thus there is no need to process them.
Remember, in this algorithm, we are constantly finding new center of possible palindromes and ascertaining the possible lengths of these new centers. As a center, it has to have a least a character on either side. That’s why the first and last characters are disqualified.
In each iteration within lines 7 and 28, we have to check 3 cases.
The first case in lines 8 to 12, we check if we are still within the right edge of the palindrome that we calculated in the previous iteration with center c. If so we can simply copy over the value at the mirror index of the current index i, pivoted around the center c, but with a caveat that we will cover shortly.
The idea here is that within the palindrome of center c, what’s on the right is the same as the left. Hence, if there are sub-palindromes on the right side of c, they will already have been calculated on the left side. We can thus copy over their lengths that have already been recorded in the array variable. 2 * c - 1 will give the mirror index.
However, this holds true only if the length of the palindrome previously recorded at the mirror index does not exceed the right edge of the palindrome of the current iteration. This is because it is possible that the next character after the right edge of the palindrome can chain and form a longer palindrome than what was previously calculated.
If that happens, the minimum length of this potentially longer palindrome is going to be at least r - i.
Hence, we set the value at i of array to the minimum of those 2 lengths. Note that this is not the final answer, we still have to calculate and find out if there’s a longer palindrome that can be formed.
The second case in lines 15 to 20, we ascertain how long a palindrome can be formed with character at i as the center. We compare the characters at an offset on either side of the center i and continually increase until they do not match.
The array[i] sets the offset at a starting value and skips repeated calculations that characters that we have already confirmed to be palindromic with the center i.
Lastly, we check if the new palindrome with the center i extends beyond the right edge of the current palindromic center c. If it does, we set the new values of c and r since they hold the latest data and allow us to iterate further.
Line 27, remember to increase i!
The array now contains the lengths of longest palindromic substring that each character at each index can form as the center. Use the max function to find the answer.
To find the number of longest palindromic substring, find the max value in the array and simply count how many times it appeared.
I ran into some issues lately where the the cron job is not running in my Rails deployed on Amazon Elastic Beanstalk. After much debugging, I found that it was because the environment variables are not loaded when the rake tasks were run.
This caused errors like bundler not installed or XX gems not found to appear all over my logs to frustrate the hell out of my weekends.
Took a deep dive and I found the solution. Or should I say the reason, because it remains unsolved and I had to take inspiration from an ancient war tactic to side step this issue. Before we get to the pseudo-solution, let’s find out the reason. And the main reason is…
Long story short: Rake tasks on cron job with Rails on Elastic Beanstalk will not work with Amazon Linux 2.
This is something I wished someone would have told me before I decided to upgrade my stack to the new version.
According to this pretty recent issue on Github, the Beanstalk team in AWS is aware that environment variables are not retrievable on the AL2 platforms like how it was on the old AL1.
The issue mentioned the fact that environment variables in the Beanstalk environment settings did not load, but according to my experiences, the environment variables on the system’s level are not loaded either.
I’m talking about $GEM_PATH, $BUNDLE_PATH etc. These are needed for ruby and rails, in particular, and the bundler to know where the gems are to run its operations and avoid the cursed bundle: command not found or gem not found errors.
I’m not the best candidate to debug a solution to fix it on AL2 platforms. Furthermore, by the time I can miraculously figure how to load the variables, Amazon would probably have already solved it.
So it’s much wiser to switch back to AL1. Henceforth, I will cover the method to run rails rake tasks via cron jobs in elastic beanstalk.
# .ebextensions/cron_tasks.txt
* * * * * root bash -l -c "su -m webapp; cd /var/app/current && rake test"
# Take NOTE it ends with a new line!!
The commands of the 01_cron.config file will be run during deployment.
The first command, 01_remove_ctron_jobs, will remove all cron jobs. Should it fail, it will return a success status code to prevent the deployment procedure from stopping. You can also use ignoreErrors flag to achieve the same thing.
Why do we need to remove the cron jobs each time we deploy? Great question. We will get to that in a bit.
The second command 02_add_cron_jobs will copy the cron job definitions located inside the cron_tasks.txt file into another file. This other file will be located in /etc/cron.d directory, where the cron daemon will pick up files from and run the cron jobs as instructed.
The cron job basically runs the rake task as the webapp user, which has the knowledge of the necessary environment variables to execute rails command related to your application.
Now comes a very important point!
Make sure the last line of cron job end with a new line.
The definition of a cron job will be deemed invalid if it does not end with a line break. Hence, a common error occurs where the last cron job did not run due to this reason. I had this error for breakfast last Sunday.
So back to the cliffhanger. Why do we have to remove the all the cron jobs during deployment?
This is because the leader of the cluster in the elastic beanstalk environment may not stay the same during each deployment. Maybe it was lost, physically, in an annual California wildfire event, or maybe it was scaled down and the leader was lost as a result
Doing this will ensure that at all times, there will only be 1 instance that will be executing your cron job, if so desired.
HOWEVER~
I just lied.
According to this forum thread, the leader of a cluster is determined during the elastic beanstalk environment creation. And it will remain the same throughout all subsequent deployments.
If the leader was lost during a timed scale down event, the leader is lost forever and the cluster in Elastic Beanstalk will remain leaderless until the end of time.
Which means the removal of cron jobs during each deployment is like taking off your pants to fart. It’s trivial.
Well, I’m just placing it there in case a leader reelection feature gets implemented.
The best way to solve this leaderless problem is to use a separate dedicated worker environment and period tasks. It is the official and cleaner way to do cron jobs anyway. Having your main server clusters running cron jobs eat at their computation prowess and that is something you would should architect your system to avoid.
The downside of it is that you will need to consider the cost of having another server to handle this.
This is a summary of the key features that make up an algorithm.
Motivation
While it is easy to understand the concept of each sort algorithm, I find it difficult for me to remember the key steps that define an algorithm or is characteristic of that algorithm. That key step may prove to be pivotal in solving the algorithm but may come in as insignificant from the macro view of its concept.
Hence, I decided that it will be better for me to jot the key pointers down.
Quicksort
def quicksort array
recursion(array, 0, array.length - 1) #IMPT
array
end
def recursion array, start, finish
if start < finish # IMPT
pivot_index = partition(array, start, finish)
recursion(array, start, pivot_index - 1) # IMPT
recursion(array, pivot_index + 1, finish) # IMPT
end
end
def partition array, start, finish
pivot = array[finish]
pivot_index = start
(start...finish).each do |index| # IMPT
if array[index] <= pivot # NOTE
array[index], array[pivot_index] =
array[pivot_index], array[index]
pivot_index += 1
end
end
array[finish], array[pivot_index] =
array[pivot_index], array[finish]
pivot_index
end
Key Steps
Gist: Using a pivot value, distribute the array into 2 halves that are not ordered, but are collectively smaller on the left side and collectively larger on the right side.
The array is mutated.
The pivot value of each iteration will find its rightful position in the array at every iteration, eventually leading to a sorted array.
Discussions
Let’s start with the recursion function.
In the recursion function, note that the arguments are indices of the array, not the length. Keep this at the back of your mind so that you can understand when to end a loop.
Line 7 ensures we are iterating at least 2 elements.
In lines 9 and 10, the recursion occurs on either side of the pivot index in that iteration. Note that the pivot index does not participate in the next recursion, since it is already at where it belongs.
Now for the partition function.
In the partition function, the pivot does not participate in the reordering. Line 18 ensures the loop ends before reaching the last index, finish, which is the pivot, with the non-inclusive range constructor operator.
In the loop in line 18, we are are trying to push the values smaller than or equal to the pivot to the left here. It is also ok to use <.
We do so by swapping them with those that are bigger than the pivot, but exist on the left of those that are smaller.
The pivot_index increments at each swap and remembers the last position that was swapped. Hence, at the end of the loop, it holds the position of the first value that is bigger than the pivot. Everything on the left is either smaller than or equal to the pivot.
This is where the pivot belongs to in the array. We swap the pivot into that position. Ascend the throne!
The state of the array does not change in this last swap: all elements on the left of the pivot is still smaller or equal to the pivot, while all elements on the right of the pivot is still bigger than the pivot. They remain unsorted
The function returns the pivot’s position to the parent recursion function, which needs it to know where to split the array for the next iteration.
Lastly, let’s go back to the calling function where the initial recursion function is triggered. Make sure to pass in the last index of the array instead of its length.
Mergesort
def merge_sort(list)
return list if list.length <= 1
mid = list.length / 2
left = merge_sort(list[0...mid])
right = merge_sort(list[mid...list.length])
merge(left, right)
end
def merge(left, right)
return right if left.empty?
return left if right.empty?
if left.first <= right.first
[left.first] + merge(left[1...left.length], right)
else
[right.first] + merge(left, right[1...right.length])
end
end
Key Steps
Gist: first recursively halve array until we are dealing with 1 element, then recursively merge the elements back in a sorted order until we get back the array of the same size, and now it will be sorted
A recursive function that consist of 2 parts in order: recursively split and recursively merge
The array will be mutated
In lines 16 and 18, we are continuously appending the smaller of the first element on the right vs left array.
Lines 11 and 13 will take care of the comparison that is still ongoing, when 1 side has been fully appended while the other still have elements inside. Since these arrays are already sorted at whichever iteration, we can just append the whole array.
Remember the breaking function in line 2
Unfortunately, while this use of recursion is great, the number of recursions may become too excessive and cause a “stack level too deep” error.
We may need to to think prepare an alternative if the stack overflows.
def merge_sort(list)
return list if list.length <= 1
mid = list.length / 2
left = merge_sort(list[0...mid])
right = merge_sort(list[mid...list.length])
merge(list, left, right)
end
def merge(array, left, right)
left_index = 0
right_index = 0
index = 0
while left_index < left.length &&
right_index < right.length
if left[left_index] <= right[right_index]
array[index] = left[left_index]
left_index += 1
else
array[index] = right[right_index]
right_index += 1
end
index += 1
end
array[index...index + left.length - left_index] =
left[left_index...left.length]
array[index...index + right.length - right_index] =
right[right_index...right.length]
array
end
Line 15 till 36 basically carry out the operation with a while loop instead of recursion. It mutates the array along the way.
Insertion sort
Key Steps
Gist: insert elements one by one from unsorted part of array into sorted part of array
Divide the array into sorted portion and unsorted portion
Sorted partition always starts from the first element, as array of 1 element is always sorted
First element of unsorted array will shift forward until the start of the sorted portion of the array OR until it meets an element bigger than itself
Order of the sorted portion is maintained
The last element of the sorted array takes its place
The next iteration start on the next element of the unsorted portion, which is now the first element of the current unsorted portion
The loop mutates the array
Discussions
Best case is an already sorted array, so no shifting of elements from the unsorted to the sorted portion of the array, resulting in a time complexity of n
The worst case is a reverse sorted array, which results in the whole sorted array having to shift for each iteration. The first element of the unsorted portion of array is always at the the smallest and need to go to the front of the sorted portion. Time complexity is n^2
Selection sort
Key Steps
Gist: scan array to find the smallest element and eliminate it for the next iterations
Swap smallest element with the front most element
Scan the array in the next iteration excluding the smallest element(s)
Last remaining single element will be of the largest value, so iterations take place until n - 2
Discussions
Time complexity is n^2
Bubble sort
def bubble_swap array
swap_took_place = true
while swap_took_place
swap_took_place = false
(0...array.length - 1).each do |index|
if array[index] > array[index + 1]
array[index + 1], array[index] =
array[index], array[index + 1]
# increment swaps here to record
# number of swaps that took place
swap_took_place = true
end
end
end
array
end
Key Steps
Gist: keep swapping adjacent elements if left is larger than right down the array, and repeat this iteration for as many times as there are elements in the array. The last iteration will not have any swap occur to declare the array swapped.
This is a concise glossary of the concepts, features and applications of various data structures.
Motivation
Due to the coronavirus outbreaks, the major lockdowns in Europe that ensued, and the stay home quarantine I have to undergo upon return to my country, I am ceasing my digital nomad life, which I have recorded in my Instagram account. So here I am, refreshing my memory on data structures as I prepare to welcome a new phase of my life.
I have a problem finding a good concise cheatsheet that can properly remind me of the concepts of all the data structures, their key features and their runtime performance for various operations. More importantly, when to use them and, as I am a rubyist, how are they applied in ruby.
Each section will talk about 1 data structure. It will consist of the main concept behind how they are constructed, some key features that are unique to them, when it is the best use case for them, and if there is something similar in ruby. These concept follows the HackerRank’s youtube channel’s playlist on Data Structures.
ArrayList
An arraylist is a dynamic array that will expand its capacity when it reached its maximum. An array requires pre allocated memory to be created. That means we need to establish the size of each element in the array and their total count.
Typically, when the arraylist reaches capacity, its size will be doubled by some complicated built-in algorithm in one of the library files of the language. It also has methods that can be called manually to ensureCapacity of the array.
Array and arrayList are used interchangeably in this article.
Key Features
Expands capacity when required
Runtime
Access: O(1) with use of known index of element in array
Search: O(n)
Insert
prepend: O(n) due to need to shift all elements
append: O(1)
Deletion: O(n) due to need for search to destroy
Applications
List of items of any kind of order
Ruby Alternative
In ruby, everything is an object. That includes arrays. Arrays in ruby are made dynamic to behave like ArrayList, like in most other dynamic languages. The array object has some operation to ensure capacity for the array.
It is also heterogenous, which allows for different data types exists together as elements of the same array (since all of them are objects anyway).
Binary (Search) Tree
Trees are most of the time referring to binary trees. Each node in binary tree can have a maximum of 2 nodes. This “tree” is kind of like a linked list of objects. It is not an array.
And for binary search tree (BST), it has to have an increasing order in relation to a node from its left to right nodes. Based on this rule, the binary search can be carried out by propagating through the nodes by asking the deterministic question: Is the left node more or less than the right node. With a known sort order, each iteration can, probably, halve the total nodes to search. This results in a faster search time.
This is only “probably” achievable if the BST is balanced. If the tree is lopsided on the right side for example, each iteration does not exactly halve the number of nodes to search. The worst case scenario would be to comb through all the nodes if they are all existing on the right node of one another.
Duplicates are allowed in some BST, meaning the there can be 2 nodes with the same value. It should always be obey the rule that the left node is <= to the current node. Duplicates introduce complexity in the search algorithms to determine the correct node to pick.
Key Features
A node and its left and right nodes has to be sorted in a specific order that can be classified as ascending or descending
Needs to be balanced to be useful
Traversal is always from left node then right node, with the current node hoping between and around the left and right to define the 3 different methods of traversal, ie.
Generally large data with sortable characteristic, and its size should be large enough to justify the use of BST over arrays
Ruby Alternative
There is no native implementation of BST in ruby. However, there are gems out there that implement it. RubyTree by evolve75 is my favorite as it allows for content payload to be added to each node.
BST is quite an old and establish concept. Hence, these gems might appear old and unmaintained.
Min/Max Heap​
This tree always populated from the left to right across each level. It is considered minimum or maximum depending on whether the smallest or largest value is at the root node respectively.
After insertion, the new node is “bubbled” up to the correct node by a series of swapping with its parent node until it reaches the root node, if it reaches the root node.
If root node is deleted, the last node replaces it and “bubbled” down to the correct position.
Because of the way it is data is populated, there will be no gaps in between nodes, hence this tree can be stored as an array (no need for linked list)! One can simply use the index of the node in the array to access itself, and some formula to get the index of its neighbouring nodes and access them as well:
parent: (index – 1) / 2 (rounded down)
left: 2 * index + 1
right: 2 * index + 2
Key Features
Essentially an array
Root node is always the minimum or maximum, the last node is always the opposite
Nodes in between does not necessarily obey the order.
Root node is usually the one being removed in application, replaced by the last node, and bubbled to the correct position accordingly
Min heap always look to find the smallest value among its children to swap down, opposite for max heap
Runtime
Access: O(1) with use of known index of element in array
Search: O(n)
Insert
append/prepend: O(1)
ordered insert: O(log n)
Deletion: O(log n)
Applications
Priority queues (eg. for elderly and disabled then healthy adults using weighted representation)
Hospital queues for coronavirus victims based on age and, therefore, savableness
Schedulers (eg task with higher priority will have higher weightage and will be bubbled to the correct position when added to the queue)
Continuous median problem
Ruby Alternative
There is no native heap implementation in Ruby. Gems are available.
Hash Table
Interestingly, a hash table consist of a hashing function and an array of linked lists. Together, they form a key value datastore.
The key to map to the value to store undergoes a hashing function to get an integer. This integer will represent the index in the array in which to store the data, that is the value corresponding to the key in the hash table. It will be added to the linked list behind the index of that array.
The data is saved as a linked list instead of an element in the array due to the probability of collisions from the hashing function. This allows multiple values to be stored in the same index of the array, but only if their key is different. Otherwise, they will overwrite the old data, as hash tables do no allow duplicate keys.
It is crucial for the hashing function to have a good key distribution. This is to prevent any of the linked list from being overwhelmingly long, resulting in long search time hopping through the linked list. Murmur hash is a good hashing function for this purpose.
Key Features
Hash function maps keys to index of array
Array is made up of linked list to store data while avoiding collisions from the hashing
Hash function with good distribution crucial to performance
Each node will point to the next node. The last node will point to null. Accessing elements can be slow as the pointer need to jump through nodes, unlike array which can access instantly via the index. The advantage of linked list over array is that you do not need to allocate the required memory at the start. You will only use the memory that you need without wastage. It is very space efficient.
Another advantage is its speed during prepending elements or inserting them in the middle. Unlike the array where every element thereafter has to be shifted, it can be done in constant time in a linked list.
A variation, the doubly linked list, gives bearing to adjacent node on both ends. It allows traversal in both direction as its biggest advantage. The maintenance needed to maintain that the 2nd neighbouring node in all operations may be costly.
Last variation is the circular linked list. That said, there;s a classic linked list question on how to detect if a linked list has a cycle (not necessarily circular). The solution is to use a fast pointer and a slow pointer to loop through the link list until they point to the same node in a linked list with a cycle, or null for a non circular linked list. This is simple cycle detection algorithm known as Floyd’s tortoise and hare, and is entertainingly portrait in the video below.
Side note for me: the distance of the loop, not coincidentally but mathematically, equals to the start of the linked list to the location where the hare and tortoise meet. Again, not coincidentally but mathematically, the distance from the start of the linked list to the start of the loop equals to the distance from location where the hare and tortoise met to the start of the loop (continuing in the direction that the tortoise was originally moving in).
Key Features
Head node may be null
Last node will point to null
Doubly linked list is another variation, where each node points to its previous and next node
Runtime
Access: O(n)
Search: O(n)
Insert
append/prepend: O(1)
ordered insert: O(n)
Deletion: O(n)
Applications
Anything that requires order and needs to save on memory
Ruby Alternative
There is not native linked list in ruby. However, there are gems and this by spectator is still pretty active.
Queue
Queue is a collection of data that obeys the First In First Out (FIFO) principle.
Theoretically, as traversal is not suppose to happen in a queue, I believe that it is best implemented with linked list rather than an array. There is no resizing overheads, and no need to shift all the elements every time an element is taken out of the front of the queue.
Addition to the queue might mean having to hop through the whole link list to add the element at the back. However, I would solve this by using a circular linked list to have a grip on the first and last element, which is actually all the queue would care about. Of course, things will be different if it is a not-so-simple queue like a least recently used (LRU) implementation.
However, there are certain advantages we should consider implementing with arrays. Arrays can be cached more easily as they are consist of memory units adjacent to one another. On the other hand, a linked list consist of memory units that exist sparsely in the memory pool which hurts its caching capabilities. The reason is a TODO for me when I go beyond data structures during this revision weeks.
Nonetheless, cache engines like Redis has their own implementation of a linked list (Redis List) in their cache database. I do not know if this is the same caching mechanism that is affected by the sparse memory locations of a linked list, but it is probably good to know.
Key Features
FIFO
Runtime
Insert (prepend): O(1)
Deletion (shift): O(1)
Applications
Restaurant queues
Ruby Alternative
Arrays are usually used as queues in ruby. It, however, does have a native Queue class, which is meant for multi threaded operations. On top of that, it has a SizedQueue class to ensure the size is within capacity.
Stacks
Stack are like the brother of queues. The only difference is they obey the Last In First Out (LIFO) principle.
Again as traversals are not supposed to happen, we can use linked list for the same advantages and considerations as explained in the “Queue” section above. And instead of appending to the end of the linked list, we will prepend to the linked list instead, where the head of the linked list represents the top of the stack. It will be constantly changing and where all the action will take place.
This ensures there is no overhead from resizing from using the array when the data gets too big, but it will need to take the need and performance in caching into consideration.
Arrays will be more suited to implement a stack than a queue. This is because all the push and pop will take place at the end of the array, unlike the queue which need to remove element from the front of the array and cause shifting of all elements forward.
Two stacks can be used to implement a queue with minimal performance overhead as well, as shown in the video below.
Key Features
LIFO
Runtime
Insert (push): O(1)
Deletion (pop): O(1)
Applications
Matching balanced parenthesis problems
Anagram / palindrome problems
Backtracking in maze
Reversals
Ruby Alternative
No native stack in Ruby. A simple array will suffice. Linked list gems are available too.
Graph
A graph is a superset of linked list. Unlike a linked list where it needs to be associated to a next element, and its previous element for a doubly linked list, a graph can have links to multiple nodes, not just to the adjacent ones.
The link between each node of a graph contains data to give more meaning to the relationship between nodes. In graph terminology, this link is called an “edge” (as a matter of fact, “nodes” are termed “vertices“). An edge can be directed or undirected. Think being friends (undirected) versus being a follower (directed) between users in a social network.
2 common ways to search a graph is the Depth First Search (DFS) and Breadth First Search (BFS). DFS has a weakness where it will search the full depth from one edge before moving on to the next. This translates to inefficiency if the vertex that we are searching for is on the other edge. Hence BFS is preferred.
Typically in BFS, a queue is used to store the next vertices to search.
There can be cycles of vertices having an edge to one another, hence during the search, it is imperative to check if a vertex has been visited or not to prevent going round in circles during the search. Unless we are talking about a Directed Acyclic Graph (DAG) where there are no directed cycles.
Key Features
Nodes are termed vertices
Edges contain data to describe the relation between vertices
BFS preferred
Flag to check if visited the vertex before in a search algorithm to prevent looping inside a cyclic relationship among vertices.
There is no native graph data type in ruby. Gems are available.
Set
The data structure of a set is the same as that of a hash table. The difference is that set is not really concerned with the mapped value of a key. It just tracks whether the key is present.
This implies that there can be no duplicates in the keys just like a hash table. And unlike a hash table where a key can be mapped to a null value, the key will be removed if nullifed for the case of a set.
This is a documentation on how to restrict text search to within specific directories per project you are working on in Sublime.
You may often find it annoying that a simple text search is searching in folders that is not part of your source code. While you can easily flip the switch in the user settings of the sublime text editor, it might not be ideal if you wear multiple hats like me and work on different frameworks.
One framework’s trash may be another’s treasure. Folders that are considered junk in one framework might be important in another. And if we happen to work on these framework together simultaneously, we would have to constantly flip the switch on and off as we jump between working on these projects.
If you are using sublime text because the framework you are working on is simple enough to manage and you do not want the computation-heavy indexing and compilation process to be running in the background constantly, this is an article for you to boost your productivity.
The User Setting Way
The commonly documented way of configuring your sublime editor is to tweak the configuration option in the user setting. Using restricted file search as an example, we can simple press CMD + , (assuming you are on a mac) to call up the user setting files in the sublime editor.
Next add this setting under the Preferences.sublime-settings file as shown:
Line 5, the path key is required. This is added by default when we save our project as a sublime project. More information on its function and purpose can be found here.
Next, close sublime. This time, open our project via the sublime project file that you have created.
Make a search now and you will realise that you are no longer searching in the folders that you have no interest in. The search process is also much faster than before because the program is going through less files, and ignoring the bulkier ones.
Hallelujah!
Housekeeping
Once we created the .sublime-project file, another file with the extension .sublime-workspace will also be created. The latter contains user specific data and you will not want to share it with other developers who may be working on the same source code as you. Add this file to our <code>.gitignore</code> file to achieve this.