I ran into some issues lately where the the cron job is not running in my Rails deployed on Amazon Elastic Beanstalk. After much debugging, I found that it was because the environment variables are not loaded when the rake tasks were run.
This caused errors like bundler not installed or XX gems not found to appear all over my logs to frustrate the hell out of my weekends.
Took a deep dive and I found the solution. Or should I say the reason, because it remains unsolved and I had to take inspiration from an ancient war tactic to side step this issue. Before we get to the pseudo-solution, let’s find out the reason. And the main reason is…
Long story short: Rake tasks on cron job with Rails on Elastic Beanstalk will not work with Amazon Linux 2.
This is something I wished someone would have told me before I decided to upgrade my stack to the new version.
According to this pretty recent issue on Github, the Beanstalk team in AWS is aware that environment variables are not retrievable on the AL2 platforms like how it was on the old AL1.
The issue mentioned the fact that environment variables in the Beanstalk environment settings did not load, but according to my experiences, the environment variables on the system’s level are not loaded either.
I’m talking about $GEM_PATH, $BUNDLE_PATH etc. These are needed for ruby and rails, in particular, and the bundler to know where the gems are to run its operations and avoid the cursed bundle: command not found or gem not found errors.
I’m not the best candidate to debug a solution to fix it on AL2 platforms. Furthermore, by the time I can miraculously figure how to load the variables, Amazon would probably have already solved it.
So it’s much wiser to switch back to AL1. Henceforth, I will cover the method to run rails rake tasks via cron jobs in elastic beanstalk.
* * * * * root bash -l -c "su -m webapp; cd /var/app/current && rake test"
# Take NOTE it ends with a new line!!
The commands of the 01_cron.config file will be run during deployment.
The first command, 01_remove_ctron_jobs, will remove all cron jobs. Should it fail, it will return a success status code to prevent the deployment procedure from stopping. You can also use ignoreErrors flag to achieve the same thing.
Why do we need to remove the cron jobs each time we deploy? Great question. We will get to that in a bit.
The second command 02_add_cron_jobs will copy the cron job definitions located inside the cron_tasks.txt file into another file. This other file will be located in /etc/cron.d directory, where the cron daemon will pick up files from and run the cron jobs as instructed.
The cron job basically runs the rake task as the webapp user, which has the knowledge of the necessary environment variables to execute rails command related to your application.
Now comes a very important point!
Make sure the last line of cron job end with a new line.
So back to the cliffhanger. Why do we have to remove the all the cron jobs during deployment?
This is because the leader of the cluster in the elastic beanstalk environment may not stay the same during each deployment. Maybe it was lost, physically, in an annual California wildfire event, or maybe it was scaled down and the leader was lost as a result
Doing this will ensure that at all times, there will only be 1 instance that will be executing your cron job, if so desired.
I just lied.
According to this forum thread, the leader of a cluster is determined during the elastic beanstalk environment creation. And it will remain the same throughout all subsequent deployments.
If the leader was lost during a timed scale down event, the leader is lost forever and the cluster in Elastic Beanstalk will remain leaderless until the end of time.
Which means the removal of cron jobs during each deployment is like taking off your pants to fart. It’s trivial.
Well, I’m just placing it there in case a leader reelection feature gets implemented.
The best way to solve this leaderless problem is to use a separate dedicated worker environment and period tasks. It is the official and cleaner way to do cron jobs anyway. Having your main server clusters running cron jobs eat at their computation prowess and that is something you would should architect your system to avoid.
The downside of it is that you will need to consider the cost of having another server to handle this.
This is a summary of the key features that make up an algorithm.
While it is easy to understand the concept of each sort algorithm, I find it difficult for me to remember the key steps that define an algorithm or is characteristic of that algorithm. That key step may prove to be pivotal in solving the algorithm but may come in as insignificant from the macro view of its concept.
Hence, I decided that it will be better for me to jot the key pointers down.
Gist: Using a pivot value, recursively split the array (or segments of array) and its resultant halves, where all the elements in left half is smaller than the pivot and all the elements in the right half is larger
3 steps: stop condition, partition, recursion
Check if the given left index is strictly smaller than the given right index
If they are equal, it means we are dealing with 1 element and there’s nothing to sort
Left index should not and will not be more than the right index
The key here is to return a partition index
Randomly select element and use element value as pivot (pivot index is ignored)
Left and right index will traverse towards center
Stop incrementing left index when its value is greater than or equal pivot value
Stop decrementing right index when its value is smaller than or equal pivot value
Swap the values and continue traversing
Ensure left <= right (this logic should be hoisted to the top)
Return the left index as the partition index
This partition index will be the eventual midpoint of this subarray
Call itself twice
One will use the given left index and partition index – 1
The other will the given right index the partition index + 1
Each iteration frees up 1 element
For each element freed, quicksort is executed lg n times on average
Average time complexity is n lg n
Worst case is n^2, if the pivot is always the smallest/largest value, so need to be careful when choosing pivot
Randomize selection of pivot to effectively reduce probability of hitting worst case
In place sorting algorithm where no extra memory is needed and the operation can be carried out on the array itself
Gist: first recursively halve array until we are dealing with 1 element, then recursively merge the elements back in a sorted order until we get back the array of the same size, and now it will be sorted
A recursive function that consist of 3 parts in order: split, recursion, merge
We will mutate the array, but it will not be an in place sorting. This is because it is difficult to do so.
So first create an empty arrayCopy of same size to temporarily hold the values of sections of the original array that are already sorted to be overwritten into the corresponding section in the original array.
Given a subarray array, the arrayCopy, the leftStart and rightEnd of the subarray to be split
Continue splitting and duplicate a copy of the left half and the right half
Stop condition: Stop the current iteration if the given array is only 1 element and just return
Call itself on the left half
Then call itself on the right half
So the splitting will continue until we are dealing with 1 element to kick start the merge
Dealing with the same subarray from the split step, provided with the leftStart and rightEnd of the subarray
The leftEnd and the rightStart are at the midpoint of this subarray
These 4 key indices mirror the indices in the original array
Traverse from leftStart to leftEnd together from rightStart to rightEnd
Fill up the arrayCopy in ascending order
Ensure the indices do not overshoot its respective end
Eventually, one half will have all its elements filled up in the arrayCopy, while the other half can simply append the remaining elements to the arrayCopy from where it left off
The arrayCopy is sorted for that exact portion of the original array, so we will overwrite that portion of the original array with the same portion of the arrayCopy
That portion of the original array may not be sorted in contemplation of its whole, but that will be addressed and its values overwritten in the subsequent merge steps.
Time complexity of n lg neven in worst case
Need extra n space for the array copy
Gist: insert elements one by one from unsorted part of array into sorted part of array
Divide the array into sorted portion and unsorted portion
Sorted partition always starts from the first element, as array of 1 element is always sorted
First element of unsorted array will shift forward until the start of the sorted portion of the array OR until it meets an element bigger than itself
Order of the sorted portion is maintained
The last element of the sorted array takes its place
The next iteration start on the next element of the unsorted portion, which is now the first element of the current unsorted portion
The loop mutates the array
Best case is an already sorted array, so no shifting of elements from the unsorted to the sorted portion of the array, resulting in a time complexity of n
The worst case is a reverse sorted array, which results in the whole sorted array having to shift for each iteration. The first element of the unsorted portion of array is always at the the smallest and need to go to the front of the sorted portion. Time complexity is n^2
Gist: scan array to find the smallest element and eliminate it for the next iterations
Swap smallest element with the front most element
Scan the array in the next iteration excluding the smallest element(s)
Last remaining single element will be of the largest value, so iterations take place until n - 2
Time complexity is n^2
Gist: keep swapping adjacent elements if left is larger than right down the array, and repeat this iteration for as many time as there are elements in the array
End before the 2nd last index so that we do not go out of bound when comparing the current element with the next for swapping
The largest elements starts forming at the end of the array
Subsequent iterations can end before the already sorted section of the array at the end as optimization
A flag reseted at each iteration can be used to end the function early if no swap is detected for the current iteration, which would mean the array is already sorted
This is a concise glossary of the concepts, features and applications of various data structures.
Due to the coronavirus outbreaks, the major lockdowns in Europe that ensued, and the stay home quarantine I have to undergo upon return to my country, I am ceasing my digital nomad life, which I have recorded in my Instagram account. So here I am, refreshing my memory on data structures as I prepare to welcome a new phase of my life.
I have a problem finding a good concise cheatsheet that can properly remind me of the concepts of all the data structures, their key features and their runtime performance for various operations. More importantly, when to use them and, as I am a rubyist, how are they applied in ruby.
Each section will talk about 1 data structure. It will consist of the main concept behind how they are constructed, some key features that are unique to them, when it is the best use case for them, and if there is something similar in ruby. These concept follows the HackerRank’s youtube channel’s playlist on Data Structures.
An arraylist is a dynamic array that will expand its capacity when it reached its maximum. An array requires pre allocated memory to be created. That means we need to establish the size of each element in the array and their total count.
Typically, when the arraylist reaches capacity, its size will be doubled by some complicated built-in algorithm in one of the library files of the language. It also has methods that can be called manually to ensureCapacity of the array.
Array and arrayList are used interchangeably in this article.
Expands capacity when required
Access: O(1) with use of known index of element in array
prepend: O(n) due to need to shift all elements
Deletion: O(n) due to need for search to destroy
List of items of any kind of order
In ruby, everything is an object. That includes arrays. Arrays in ruby are made dynamic to behave like ArrayList, like in most other dynamic languages. The array object has some operation to ensure capacity for the array.
It is also heterogenous, which allows for different data types exists together as elements of the same array (since all of them are objects anyway).
Binary (Search) Tree
Trees are most of the time referring to binary trees. Each node in binary tree can have a maximum of 2 nodes. This “tree” is kind of like a linked list of objects. It is not an array.
And for binary search tree (BST), it has to have an increasing order in relation to a node from its left to right nodes. Based on this rule, the binary search can be carried out by propagating through the nodes by asking the deterministic question: Is the left node more or less than the right node. With a known sort order, each iteration can, probably, halve the total nodes to search. This results in a faster search time.
This is only “probably” achievable if the BST is balanced. If the tree is lopsided on the right side for example, each iteration does not exactly halve the number of nodes to search. The worst case scenario would be to comb through all the nodes if they are all existing on the right node of one another.
Duplicates are allowed in some BST, meaning the there can be 2 nodes with the same value. It should always be obey the rule that the left node is <= to the current node. Duplicates introduce complexity in the search algorithms to determine the correct node to pick.
A node and its left and right nodes has to be sorted in a specific order that can be classified as ascending or descending
Needs to be balanced to be useful
Traversal is always from left node then right node, with the current node hoping between and around the left and right to define the 3 different methods of traversal, ie.
Generally large data with sortable characteristic, and its size should be large enough to justify the use of BST over arrays
There is no native implementation of BST in ruby. However, there are gems out there that implement it. RubyTree by evolve75 is my favorite as it allows for content payload to be added to each node.
BST is quite an old and establish concept. Hence, these gems might appear old and unmaintained.
This tree always populated from the left to right across each level. It is considered minimum or maximum depending on whether the smallest or largest value is at the root node respectively.
After insertion, the new node is “bubbled” up to the correct node by a series of swapping with its parent node until it reaches the root node, if it reaches the root node.
If root node is deleted, the last node replaces it and “bubbled” down to the correct position.
Because of the way it is data is populated, there will be no gaps in between nodes, hence this tree can be stored as an array (no need for linked list)! One can simply use the index of the node in the array to access itself, and some formula to get the index of its neighbouring nodes and access them as well:
parent: (index – 1) / 2 (rounded down)
left: 2 * index + 1
right: 2 * index + 2
Essentially an array
Root node is always the minimum or maximum, the last node is always the opposite
Nodes in between does not necessarily obey the order.
Root node is usually the one being removed in application, replaced by the last node, and bubbled to the correct position accordingly
Min heap always look to find the smallest value among its children to swap down, opposite for max heap
Access: O(1) with use of known index of element in array
ordered insert: O(log n)
Deletion: O(log n)
Priority queues (eg. for elderly and disabled then healthy adults using weighted representation)
Hospital queues for coronavirus victims based on age and, therefore, savableness
Schedulers (eg task with higher priority will have higher weightage and will be bubbled to the correct position when added to the queue)
Continuous median problem
There is no native heap implementation in Ruby. Gems are available.
Interestingly, a hash table consist of a hashing function and an array of linked lists. Together, they form a key value datastore.
The key to map to the value to store undergoes a hashing function to get an integer. This integer will represent the index in the array in which to store the data, that is the value corresponding to the key in the hash table. It will be added to the linked list behind the index of that array.
The data is saved as a linked list instead of an element in the array due to the probability of collisions from the hashing function. This allows multiple values to be stored in the same index of the array, but only if their key is different. Otherwise, they will overwrite the old data, as hash tables do no allow duplicate keys.
It is crucial for the hashing function to have a good key distribution. This is to prevent any of the linked list from being overwhelmingly long, resulting in long search time hopping through the linked list. Murmur hash is a good hashing function for this purpose.
Hash function maps keys to index of array
Array is made up of linked list to store data while avoiding collisions from the hashing
Hash function with good distribution crucial to performance
Each node will point to the next node. The last node will point to null. Accessing elements can be slow as the pointer need to jump through nodes, unlike array which can access instantly via the index. The advantage of linked list over array is that you do not need to allocate the required memory at the start. You will only use the memory that you need without wastage. It is very space efficient.
Another advantage is its speed during prepending elements or inserting them in the middle. Unlike the array where every element thereafter has to be shifted, it can be done in constant time in a linked list.
A variation, the doubly linked list, gives bearing to adjacent node on both ends. It allows traversal in both direction as its biggest advantage. The maintenance needed to maintain that the 2nd neighbouring node in all operations may be costly.
Last variation is the circular linked list. That said, there;s a classic linked list question on how to detect if a linked list has a cycle (not necessarily circular). The solution is to use a fast pointer and a slow pointer to loop through the link list until they point to the same node in a linked list with a cycle, or null for a non circular linked list. This is simple cycle detection algorithm known as Floyd’s tortoise and hare, and is entertainingly portrait in the video below.
Side note for me: the distance of the loop, not coincidentally but mathematically, equals to the start of the linked list to the location where the hare and tortoise meet. Again, not coincidentally but mathematically, the distance from the start of the linked list to the start of the loop equals to the distance from location where the hare and tortoise met to the start of the loop (continuing in the direction that the tortoise was originally moving in).
Head node may be null
Last node will point to null
Doubly linked list is another variation, where each node points to its previous and next node
ordered insert: O(n)
Anything that requires order and needs to save on memory
There is not native linked list in ruby. However, there are gems and this by spectator is still pretty active.
Queue is a collection of data that obeys the First In First Out (FIFO) principle.
Theoretically, as traversal is not suppose to happen in a queue, I believe that it is best implemented with linked list rather than an array. There is no resizing overheads, and no need to shift all the elements every time an element is taken out of the front of the queue.
Addition to the queue might mean having to hop through the whole link list to add the element at the back. However, I would solve this by using a circular linked list to have a grip on the first and last element, which is actually all the queue would care about. Of course, things will be different if it is a not-so-simple queue like a least recently used (LRU) implementation.
However, there are certain advantages we should consider implementing with arrays. Arrays can be cached more easily as they are consist of memory units adjacent to one another. On the other hand, a linked list consist of memory units that exist sparsely in the memory pool which hurts its caching capabilities. The reason is a TODO for me when I go beyond data structures during this revision weeks.
Nonetheless, cache engines like Redis has their own implementation of a linked list (Redis List) in their cache database. I do not know if this is the same caching mechanism that is affected by the sparse memory locations of a linked list, but it is probably good to know.
Insert (prepend): O(1)
Deletion (shift): O(1)
Arrays are usually used as queues in ruby. It, however, does have a native Queue class, which is meant for multi threaded operations. On top of that, it has a SizedQueue class to ensure the size is within capacity.
Stack are like the brother of queues. The only difference is they obey the Last In First Out (LIFO) principle.
Again as traversals are not supposed to happen, we can use linked list for the same advantages and considerations as explained in the “Queue” section above. And instead of appending to the end of the linked list, we will prepend to the linked list instead, where the head of the linked list represents the top of the stack. It will be constantly changing and where all the action will take place.
This ensures there is no overhead from resizing from using the array when the data gets too big, but it will need to take the need and performance in caching into consideration.
Arrays will be more suited to implement a stack than a queue. This is because all the push and pop will take place at the end of the array, unlike the queue which need to remove element from the front of the array and cause shifting of all elements forward.
Two stacks can be used to implement a queue with minimal performance overhead as well, as shown in the video below.
Insert (push): O(1)
Deletion (pop): O(1)
Matching balanced parenthesis problems
Anagram / palindrome problems
Backtracking in maze
No native stack in Ruby. A simple array will suffice. Linked list gems are available too.
A graph is a superset of linked list. Unlike a linked list where it needs to be associated to a next element, and its previous element for a doubly linked list, a graph can have links to multiple nodes, not just to the adjacent ones.
The link between each node of a graph contains data to give more meaning to the relationship between nodes. In graph terminology, this link is called an “edge” (as a matter of fact, “nodes” are termed “vertices“). An edge can be directed or undirected. Think being friends (undirected) versus being a follower (directed) between users in a social network.
2 common ways to search a graph is the Depth First Search (DFS) and Breadth First Search (BFS). DFS has a weakness where it will search the full depth from one edge before moving on to the next. This translates to inefficiency if the vertex that we are searching for is on the other edge. Hence BFS is preferred.
Typically in BFS, a queue is used to store the next vertices to search.
There can be cycles of vertices having an edge to one another, hence during the search, it is imperative to check if a vertex has been visited or not to prevent going round in circles during the search. Unless we are talking about a Directed Acyclic Graph (DAG) where there are no directed cycles.
Nodes are termed vertices
Edges contain data to describe the relation between vertices
Flag to check if visited the vertex before in a search algorithm to prevent looping inside a cyclic relationship among vertices.
There is no native graph data type in ruby. Gems are available.
The data structure of a set is the same as that of a hash table. The difference is that set is not really concerned with the mapped value of a key. It just tracks whether the key is present.
This implies that there can be no duplicates in the keys just like a hash table. And unlike a hash table where a key can be mapped to a null value, the key will be removed if nullifed for the case of a set.
This is a documentation on how to restrict text search to within specific directories per project you are working on in Sublime.
You may often find it annoying that a simple text search is searching in folders that is not part of your source code. While you can easily flip the switch in the user settings of the sublime text editor, it might not be ideal if you wear multiple hats like me and work on different frameworks.
One framework’s trash may be another’s treasure. Folders that are considered junk in one framework might be important in another. And if we happen to work on these framework together simultaneously, we would have to constantly flip the switch on and off as we jump between working on these projects.
If you are using sublime text because the framework you are working on is simple enough to manage and you do not want the computation-heavy indexing and compilation process to be running in the background constantly, this is an article for you to boost your productivity.
The User Setting Way
The commonly documented way of configuring your sublime editor is to tweak the configuration option in the user setting. Using restricted file search as an example, we can simple press CMD + , (assuming you are on a mac) to call up the user setting files in the sublime editor.
Next add this setting under the Preferences.sublime-settings file as shown:
Line 5, the path key is required. This is added by default when we save our project as a sublime project. More information on its function and purpose can be found here.
Next, close sublime. This time, open our project via the sublime project file that you have created.
Make a search now and you will realise that you are no longer searching in the folders that you have no interest in. The search process is also much faster than before because the program is going through less files, and ignoring the bulkier ones.
Once we created the .sublime-project file, another file with the extension .sublime-workspace will also be created. The latter contains user specific data and you will not want to share it with other developers who may be working on the same source code as you. Add this file to our <code>.gitignore</code> file to achieve this.