Notes on “Common Sense DSA” by Jay Wengrow
Chapter 1: Why data structures matter
-
Data structure operations
- Read
- Search
- Insert
- Delete
-
Array
- They are allocated as contiguous blocks of memory.
- Read - 1 step
- Search - n steps (where n is the size of the array)
- Insert
- at end - 1 step
- at beginning - n + 1 steps (see the sketch below)
- Delete - n steps
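A minimal JavaScript sketch (my own, not from the book; `insertAtBeginning` is a hypothetical helper) of why inserting at the beginning costs n + 1 steps: every existing element must first shift one cell to the right.
```js
// Insert `value` at index 0 of `array`, counting the steps taken.
function insertAtBeginning(array, value) {
  let steps = 0;
  // Shift every existing element one cell to the right: n steps.
  for (let i = array.length - 1; i >= 0; i--) {
    array[i + 1] = array[i];
    steps++;
  }
  array[0] = value; // the actual insertion: 1 step
  steps++;
  return steps; // n + 1 in total
}

console.log(insertAtBeginning([2, 3, 4], 1)); // 4 (n = 3)
```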
-
Sets
- (simplification:) Same as arrays except that it does not allow duplicate entries.
- Read, Search and Delete are all the same as for normal arrays.
-
The insert operation has a different time complexity.
- Insertion at end: \(n + 1\) steps. n steps for searching if the value we want to insert already exists and 1 step for actual insertion.
- Insertion at beginning: \(n + n + 1 = 2n + 1\) steps. n steps to search if value to be inserted already exists. n steps to shift every pre-existing value to the right. 1 step for the actual insertion at the beginning.
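A sketch of set insertion at the end under the simplified array-based model above (`setInsertAtEnd` is an illustrative name, not a real API):
```js
// Insert `value` at the end of the array-based set, counting steps.
function setInsertAtEnd(set, value) {
  let steps = 0;
  // Search the entire set for a duplicate first: n steps.
  for (let i = 0; i < set.length; i++) {
    steps++;
    if (set[i] === value) return steps; // duplicate: nothing inserted
  }
  set[set.length] = value; // the actual insertion: 1 step
  steps++;
  return steps; // n + 1 when the value is new
}

console.log(setInsertAtEnd([1, 2, 3], 4)); // 4 (n = 3)
```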
Chapter 2: Why algorithms matter
-
Even after we settle on a data structure, the algorithm we choose to run on it makes a massive difference in efficiency. This is why algorithms matter.
-
Next data structure - Ordered arrays
-
What is it?
It is a classic array except that its elements are kept in ascending order at all times.
-
Insertion
Insertion is more tedious than it was for classic arrays. To insert a value, we first determine the index where it belongs, then shift every element from that index to the end of the array one step to the right, and finally insert the value at that index.
Hence, insertion takes \(N + 2\) steps irrespective of the index. When the index is closer to the beginning, there are fewer comparisons and more shifts; when it is farther from the beginning, there are more comparisons and fewer shifts.
The exception to this rule is inserting at the end of the array: N comparisons plus 1 step for the actual insertion, making it \(N + 1\) steps.
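A sketch of ordered-array insertion that counts comparisons and shifts (my own illustration of the \(N + 2\) rule, not the book's code):
```js
// Insert `value` into an ordered array, counting comparisons and shifts.
function orderedInsert(array, value) {
  let steps = 0;
  // Compare from the left until we find where `value` belongs.
  let index = 0;
  while (index < array.length && array[index] < value) {
    index++;
    steps++; // a comparison that kept us scanning
  }
  if (index < array.length) steps++; // the comparison that stopped the scan
  // Shift everything from `index` onward one cell to the right.
  for (let i = array.length - 1; i >= index; i--) {
    array[i + 1] = array[i];
    steps++; // one shift
  }
  array[index] = value; // the actual insertion
  steps++;
  return steps; // N + 2 for any middle index, N + 1 at the very end
}

console.log(orderedInsert([1, 3, 5, 7], 4)); // 6 (N = 4)
console.log(orderedInsert([1, 3, 5], 7));    // 4 (N = 3, end insertion)
```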
-
Searching & binary search
Having an ordered array enables us to use binary search which is orders of magnitude faster than our traditional linear search. Binary search only works with ordered arrays.
This is an example of how choosing the right data structure (ordered arrays) enables us to choose an algorithm (binary search) that is much more efficient than any search algorithm available to us in a classic array.
For binary search, though, each time we double the size of the array, we only need to add one more step.
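A standard binary search in JavaScript (a sketch, not the book's exact listing):
```js
// Binary search on an ordered array: O(log N).
function binarySearch(array, value) {
  let low = 0;
  let high = array.length - 1;
  while (low <= high) {
    // Inspect the midpoint and discard half of the remaining range.
    const mid = Math.floor((low + high) / 2);
    if (array[mid] === value) return mid;
    if (array[mid] < value) low = mid + 1;
    else high = mid - 1;
  }
  return -1; // value not present
}

console.log(binarySearch([3, 17, 75, 80, 202], 75)); // 2
```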
-
Chapter 3: O Yes! Big O Notation
-
The key question: If there are N data elements, how many steps will the algorithm take?
-
O(N) and O(1)
When an algorithm is O(N) it means when there are N data elements, the algorithm takes N steps.
-
An algorithm with O(N) complexity is said to have linear time complexity.
eg: linear search
-
An algorithm with O(1) complexity runs in constant time, i.e. irrespective of the number of data elements (N), the algorithm always takes only 1 step.
eg: Reading from an array.
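A quick side-by-side illustration of the two categories (my own sketch):
```js
// O(N): linear search may touch every element.
function linearSearch(array, value) {
  for (let i = 0; i < array.length; i++) {
    if (array[i] === value) return i;
  }
  return -1;
}

// O(1): reading by index is a single step, whatever the array size.
function read(array, index) {
  return array[index];
}

const data = [10, 20, 30, 40];
console.log(linearSearch(data, 30)); // 2 (up to N steps)
console.log(read(data, 3));          // 40 (always 1 step)
```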
-
-
The soul of Big O
How will an algorithm’s performance change as the data increases?
In other words, it does not just tell us how many steps an algorithm takes for N data inputs. It tells us how the number of steps increases as the data changes.
This is why, if we have an algorithm that takes 3 steps irrespective of the data size, we do not write its complexity as O(3). Instead, we write it as O(1). As mentioned above, Big-O is concerned with how the number of steps increases as the data changes. In this regard, O(3) is the same as O(1).
-
Big-O and its pessimism
Big-O can theoretically depict both best-case and worst-case scenarios. For example, in linear search we have the scenario where the searched element is at index 0 itself. In this best case, linear search becomes O(1). But in the worst case it is, as we know, O(N).
Generally, though, Big-O represents the worst case, since this pessimistic approach helps us understand how inefficient an algorithm can get and make choices accordingly.
-
O (log N)
Refers to algorithms that take one additional step each time the data is doubled. In Big-O notation, whenever we say O(log N) we actually mean \(O(\log_2 N)\); we just omit the base 2 for convenience.
Binary search has O(log N).
-
Algorithm times sorted from most efficient to least efficient
- O(1)
- O(log N)
- O(N)
Chapter 4: Speeding up your code with Big O
-
Bubble sort
In each pass through, the highest unsorted value “bubbles” up to its correct position.
Bubble sort has \(O(N^2)\) complexity. This means bubble sort has quadratic time.
An explanation of how bubble sort works is given on Wikipedia.
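A minimal bubble sort sketch matching the description above (not the book's exact listing):
```js
// Bubble sort: O(N^2). Each pass-through bubbles the largest
// unsorted value up to its correct position at the end.
function bubbleSort(array) {
  let unsortedUntil = array.length - 1;
  let sorted = false;
  while (!sorted) {
    sorted = true;
    for (let i = 0; i < unsortedUntil; i++) {
      if (array[i] > array[i + 1]) {
        // Swap adjacent out-of-order elements.
        [array[i], array[i + 1]] = [array[i + 1], array[i]];
        sorted = false;
      }
    }
    unsortedUntil--; // the last position is now sorted
  }
  return array;
}

console.log(bubbleSort([65, 55, 45, 35, 25, 15, 10]));
```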
-
\(O(N^2)\) complexity
Very often (but not always), when an algorithm nests one loop inside another, the algorithm is \(O(N^2)\).
```js
function hasDuplicateValue(array) {
  let steps = 0; // count of steps
  for (let i = 0; i < array.length; i++) {
    for (let j = 0; j < array.length; j++) {
      steps++; // increment number of steps
      if (i !== j && array[i] === array[j]) {
        return true;
      }
    }
  }
  console.log(steps); // print number of steps if no duplicates
  return false;
}
```
Here, assume we have an array of length N. The outer loop iterates through every element of the array one by one, and for every iteration of the outer loop, the inner loop iterates N times. So the comparison in the inner loop executes \(N^2\) times in the worst case. Hence, this code is \(O(N^2)\).
The criterion here is: how many times is the code in the inner loop (the comparison statement) executed in the worst case?
The above code can be optimized to an algorithm that has \(O(N)\) instead of \(O(N^2)\).
```js
function hasDuplicateValue(array) {
  let steps = 0; // count of steps
  let existingNumbers = []; // index i is set to 1 once value i is seen
  for (let i = 0; i < array.length; i++) {
    steps++;
    if (existingNumbers[array[i]] === 1) {
      return true; // value seen before: duplicate found
    } else {
      existingNumbers[array[i]] = 1; // record this value
    }
  }
  console.log(steps); // print number of steps if no duplicates
  return false;
}
```
Here, we get rid of the inner loop and keep just the outer loop. This means the new code is O(N).
-
Exercises
Chapter 5: Optimizing Code with and Without Big O
-
Selection sort
In each pass-through, the lowest unsorted element is placed in the current starting position. Selection sort takes about half the number of steps bubble sort does, which means selection sort is twice as fast as bubble sort. If bubble sort is \(O(N^2)\), then selection sort should be \(O(N^2/2)\).
But Big-O notation ignores constants. Therefore, in the Big O notation, Bubble sort and Selection sort both have \(O(N^2)\) runtimes.
An animation of how selection sort works is given on GeeksforGeeks.
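A selection sort sketch (my own, not the book's exact listing):
```js
// Selection sort: O(N^2), but roughly half the steps of bubble sort
// because each pass-through does at most one swap.
function selectionSort(array) {
  for (let i = 0; i < array.length - 1; i++) {
    // Find the index of the lowest value in the unsorted remainder.
    let lowestIndex = i;
    for (let j = i + 1; j < array.length; j++) {
      if (array[j] < array[lowestIndex]) lowestIndex = j;
    }
    // Place the lowest value in the current starting position.
    if (lowestIndex !== i) {
      [array[i], array[lowestIndex]] = [array[lowestIndex], array[i]];
    }
  }
  return array;
}

console.log(selectionSort([4, 2, 7, 1, 3])); // [1, 2, 3, 4, 7]
```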
-
Why Big-O ignores constants
Big-O is concerned with the overarching categories of algorithmic speeds. O(N) and \(O(N^2)\) are completely different categories. O(N) grows linearly as the data increases; \(O(N^2)\) grows quadratically.
An O(N) algorithm with any constant factor will, beyond some data size, become faster than an \(O(N^2)\) algorithm. Hence, these are differentiated in Big-O.
But both O(N) and O(100N) grow linearly as the data increases. This difference does not interest Big-O, so both of these algorithms are put under the same category of O(N).
This is why bubble sort and selection sort are both \(O(N^2)\) even though in reality, selection sort is twice as fast as bubble sort.
Chapter 6: Optimizing for optimistic scenarios
-
Chapter intent:
Big-O focuses on worst-case scenarios. But what about situations that are not worst-case but average-case scenarios? Those are the focus of this chapter.
-
Insertion sort
An animation of how this works is given on Wikipedia.
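An insertion sort sketch (my own, not the book's exact listing):
```js
// Insertion sort: O(N^2) worst case, but close to O(N) on
// nearly-sorted data, which is why its average case is better.
function insertionSort(array) {
  for (let i = 1; i < array.length; i++) {
    const tempValue = array[i]; // value being inserted this pass-through
    let j = i - 1;
    // Shift larger values one cell to the right to open a gap.
    while (j >= 0 && array[j] > tempValue) {
      array[j + 1] = array[j];
      j--;
    }
    array[j + 1] = tempValue; // drop the value into the gap
  }
  return array;
}

console.log(insertionSort([4, 2, 7, 1, 3])); // [1, 2, 3, 4, 7]
```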
-
Big O Notation only takes into account the highest order of N when we have multiple orders added together.
i.e. \(O(N^2 + 2N - 2) \implies O(N^2 + N) \implies O(N^2)\)
-
Time complexity of insertion sort
In the worst-case scenario, insertion sort has a time complexity of \(O(N^2)\), the same as bubble sort and selection sort.
-
Efficiency of insertion sort
- Selection sort: \(O(N^2/2) \implies O(N^2)\)
- Bubble sort: \(O(N^2) \implies O(N^2)\)
- Insertion sort: \(O(N^2+2N-2) \implies O(N^2)\)
Remember these are worst case time complexities. In the average case, insertion sort performs much better.
-
Insertion sort v/s selection sort
| | Best case | Average case | Worst case |
| --- | --- | --- | --- |
| Selection sort | \(N^2/2\) | \(N^2/2\) | \(N^2/2\) |
| Insertion sort | \(N\) | \(N^2/2\) | \(N^2\) |
-
Chapter 7: Big O in everyday code
Exercises worked out here.