Sunday 30 March 2014

Week 11: Sorting and Efficiency!

Over the past couple of weeks, the lectures have been focusing on the topic of sorting and efficiency. Sorting is one of the fundamental problems in computer science: it involves arranging the elements of a list in some desired order. Computer scientists have come up with numerous sorting algorithms over the past couple of decades, and some of them are naturally more efficient than others. Needless to say, it is important to choose the most efficient sorting algorithm for the task at hand in order to optimize performance, and hence we analyze some of the most common sorting algorithms below.

Among the most famous algorithms are bubble, selection, insertion, merge and quick sorts. Bubble sort involves comparing adjacent items and exchanging those out of order; each subsequent pass through the list places the next largest value in its proper place. Bubble sort is by far the least efficient of these algorithms, having a performance of O(n^2) in every single case and having to exchange values in every single comparison in its worst case. Not very far ahead is selection sort; each pass of this algorithm places the smallest remaining value in its proper location. It also has an average performance of O(n^2), but unlike bubble sort only one exchange has to be made in each pass through the list. Insertion sort is slightly more efficient than selection sort. This algorithm always keeps a sorted sublist in the lower indices of the list; in each pass, the next item is inserted into the right position in this sublist, making the sorted sublist one element larger until it contains the entire list. Insertion sort has an average and worst case performance of O(n^2) like bubble and selection sorts; however, it is much more efficient in its best case (an already sorted list), in which its running time becomes O(n).
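To make the insertion idea concrete, here is a minimal, generic sketch of insertion sort (not the course's own code): each pass shifts larger items right until the current item's slot in the sorted sublist is found.

```python
def insertion_sort(lst):
    """Sort lst in place; the slice lst[:i] is always sorted."""
    for i in range(1, len(lst)):
        item = lst[i]
        j = i
        # shift larger items one position right to make room for item
        while j > 0 and lst[j - 1] > item:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = item
    return lst
```

On an already sorted list the inner while loop never runs, which is exactly why the best case is O(n).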

Compared to bubble, selection and insertion sorts, merge and quick sorts are considerably more efficient. Merge sort is a recursive sorting algorithm that works by repeatedly splitting the list in half until the halves are either empty or contain a single item; it then merges the sorted halves back together in the correct order, continuing until the entire list is merged. Quick sort partitions the given list around a pivot value into two parts: the left (everything with a lesser value than the pivot) and the right (everything with a greater value), and recursively calls quick sort on each part until the parts have length less than two, in which case they are already sorted. Quick and merge sorts have an average efficiency of O(n log n), so their performance is roughly equivalent; however, in its worst case, quick sort's performance drops to O(n^2), since the pivot can turn out to be the greatest element of the list at every step, splitting the list into two very uneven parts.
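The split-then-merge strategy can be sketched as follows; this is a textbook-style version, not the lecture code:

```python
def merge_sort(lst):
    """Return a new sorted list using merge sort."""
    if len(lst) < 2:                       # base case: already sorted
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])           # sort each half recursively
    right = merge_sort(lst[mid:])
    # merge the two sorted halves in order
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```

Because the list is halved at each level and each level does O(n) merging work, the running time is O(n log n) in every case.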

Python's built-in sorting algorithm, however, is by far the most efficient in practice, as it is a hybrid of different algorithms including merge and insertion sorts; it also takes advantage of any existing ordering in the given list, making it faster on inputs that are already partially sorted.

Thursday 20 March 2014

Week 10

Assignment 2 part 2 was due this week, and, surprisingly, I found it very straightforward! Although painfully long to write, the functions is_regex, all_regex_permutations, and build_regex_tree were particularly uncomplicated (even though they might look very complex at first glance).

On the other hand, the regex_match function was slightly harder. The trickiest part of this function was checking whether a given StarTree matched some string, because this involved splitting the string into all possible concatenations of any length, which I found especially tricky. In the end, I did this recursively by splitting the given string into two parts in all possible ways and then recursively splitting those two parts in all possible ways until I had all possible concatenations of the string. This made my function particularly confusing and not very efficient, in my opinion, but I could not find a better way to do it; the code was actually so inefficient that when checking whether a StarTree matched a very long string, Python took a painfully long time to perform the task. In some cases, it even crashed because there was not sufficient stack space.
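The kind of exhaustive splitting described above can be illustrated with a hypothetical helper (all_splits is my own name, not the assignment's code) that returns every way of breaking a string into non-empty pieces:

```python
def all_splits(s):
    """Return every way to break s into a tuple of non-empty pieces."""
    if s == "":
        return [()]
    results = []
    for i in range(1, len(s) + 1):
        head = s[:i]
        # recursively split whatever remains after the first piece
        for rest in all_splits(s[i:]):
            results.append((head,) + rest)
    return results
```

A string of length n has 2**(n - 1) such splittings, which explains why this approach becomes painfully slow on long strings.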

This week's lab was slightly harder than last week's. The first task, which required us to write an inorder traversal for a binary tree, was much simpler than the second. The second task was to write a function named longest that takes the root of a binary tree as an argument and returns the first node of a linked list containing every value in a longest path from the root of the binary tree to one of its leaves. This was particularly hard since we were not allowed to use an intermediate Python list or similar container class to hold our information temporarily. In the end, we were successful in writing this challenging function, though it appears we did it in a way other than expected: our code used a helper function that kept track of the length of the longest path, adding new nodes to the linked list until its length was the same as that of the longest path.
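Our actual lab code isn't reproduced here, but the idea can be sketched as follows (BTNode and LLNode are hypothetical class names): build the linked list directly out of recursive results, with a helper that measures linked-list length instead of using an intermediate Python list.

```python
class BTNode:
    """A binary tree node."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

class LLNode:
    """A singly linked list node."""
    def __init__(self, value, nxt=None):
        self.value, self.nxt = value, nxt

def longest(root):
    """Return the head of a linked list of the values on a longest
    root-to-leaf path, using no intermediate Python list."""
    if root is None:
        return None
    left, right = longest(root.left), longest(root.right)
    # keep whichever sublist represents the longer path
    tail = left if _length(left) >= _length(right) else right
    return LLNode(root.value, tail)

def _length(node):
    """Count the nodes in a linked list by walking it."""
    n = 0
    while node is not None:
        n, node = n + 1, node.nxt
    return n
```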

During this week's lectures, we went over more "big-oh" material and were introduced to two new sorting algorithms, merge and quick sorts, which I will talk about in detail next week!

Saturday 15 March 2014

Week 9

Over the course of this week, we started covering material about "big-oh" in lectures. The so-called "big-oh" describes how an algorithm's running time grows as the size of its input grows. The most common use of "big-oh" analysis is in comparing different sorting algorithms, as I will cover in depth in week 11. Common growth rates for a function's running time include constant, logarithmic, linear, and quadratic, among others.
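These growth rates can be made concrete with three toy functions (my own illustrative examples, not lecture code): the first does one step no matter the input size, the second touches each element once, and the third touches every pair of elements.

```python
def first_element(lst):
    """O(1): one step regardless of the list's size."""
    return lst[0]

def total(lst):
    """O(n): one step per element."""
    result = 0
    for x in lst:
        result += x
    return result

def count_pairs(lst):
    """O(n^2): one step per pair of elements (nested loops)."""
    pairs = 0
    for x in lst:
        for y in lst:
            pairs += 1
    return pairs
```

Doubling the input size leaves first_element unchanged, doubles the work in total, and quadruples the work in count_pairs.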

Some of the different types of sorting algorithms that we went over in lecture this week included insertion and selection sorts, which we had already learned about in CSC108 so it was mostly review material for most of the students.

This week's lab was particularly straightforward, as it involved writing the method count_less for the class of binary search trees in several different ways. The first way was to write a nested helper function within the method count_less, and this was very easy as we were allowed to make use of recursion. Other ways included avoiding the use of None as well as implementing a helper method in the class _BSTNode. The hardest version was the last: we were expected to write the method non-recursively, using iteration instead. Surprisingly, this task was very uncomplicated and easy to understand, as my partner and I used a for loop.

Sunday 9 March 2014

Week 8

During this week, we received our midterm marks back, and to be honest I was quite pleased with my performance!

In lecture, we covered more LinkedList examples by implementing new methods in the class; this task was particularly straightforward and involved a lot of recursion. We then moved on to a new topic: binary search trees. Binary search trees are a special kind of tree that satisfies some specific constraints:
- each node in a binary tree can have a maximum of two children
- data in the left subtree is less than that in the root, which in turn is less than that in the right subtree

This makes searching for a particular node especially efficient, since the portion of the tree left to search is roughly halved each time we move past a node.
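That halving is visible in an iterative search sketch (a generic illustration; BSTNode is my own minimal class): at every node, the ordering property tells us which single subtree could still contain the value.

```python
class BSTNode:
    """A binary search tree node: left values < value < right values."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def contains(node, value):
    """Return whether value appears in the BST rooted at node."""
    while node is not None:
        if value == node.value:
            return True
        # discard the half of the remaining tree that cannot hold value
        node = node.left if value < node.value else node.right
    return False
```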

The methods implemented in the BST class made further use of recursive strategies, as in the methods insert and __str__.

In this week's lab, we dealt with more LinkedList examples, implementing some of the methods of Python's built-in list class for LinkedList, including __setitem__, __len__, and __delitem__. In writing the method __len__, we decided to add a simple attribute in the __init__ method of LinkedList to keep track of the length of the list, updating it whenever the list changed; this allowed __len__ to be very short and efficient.
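The size-attribute idea can be sketched like this (a minimal illustration with hypothetical class names, not our lab code): every mutation updates the counter, so __len__ never has to traverse the list.

```python
class LLNode:
    """A singly linked list node."""
    def __init__(self, value, nxt=None):
        self.value, self.nxt = value, nxt

class LinkedList:
    """A linked list that tracks its own size."""
    def __init__(self):
        self.front = None
        self.size = 0              # updated on every insert/delete

    def prepend(self, value):
        """Add value to the front of the list."""
        self.front = LLNode(value, self.front)
        self.size += 1

    def __len__(self):
        # O(1): no traversal needed thanks to the size attribute
        return self.size
```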

This was also the week during which the first part of assignment two was due. Part I was very simple compared to the first assignment; it dealt with a class hierarchy of regular expression trees. The way I created my hierarchy was based on the number of children in each type of tree as well as the specific root node for each type of tree. This made the implementation of the methods particularly straightforward.
Part II, on the other hand, seems to be quite challenging. I've begun working on the very first part of Part II, but I still haven't quite wrapped my head around it. Hopefully things will get better next week!

Thursday 27 February 2014

Week 7: Recursion!

The main focus of this course so far has been on recursion. Recursion is a technique in which a problem is solved by solving smaller instances of the same problem. It involves writing a function whose code includes a reference to the function itself; in other words, functions that call themselves to solve smaller subproblems.

A recursive function, as we have seen in lecture, often consists of a single return statement, frequently built around a list comprehension, with two separate parts: the base case and the recursive call. A list comprehension allows for the creation of a new list through iteration within a single line, making the code much more concise and understandable. The base case is the instance of the function that does not lead to a recursive call; this is probably one of the most important elements of recursion, since without it the function would recurse infinitely; in other words, the function would keep calling itself until the program crashes. The recursive call part of the return statement, on the other hand, includes the reference to the function itself, guarded by the condition of an if statement.

Among many of the examples of recursion we saw in lecture are methods involving the classes Tree, LinkedList, and Turtle. We have also seen recursion on nested lists, such as the function sum_list, which takes in a nested list of numbers as an argument and returns the sum of all of its elements.
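A possible one-line version of sum_list (my own sketch, not necessarily the lecture's code) shows both parts at once: the non-list elements are the base case, and the recursive call handles each nested list.

```python
def sum_list(lst):
    """Return the sum of all numbers in a possibly nested list."""
    # recursive call for nested lists, base case for plain numbers
    return sum(sum_list(x) if isinstance(x, list) else x for x in lst)
```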

An advantage of utilizing recursion is the fact that sometimes it is the only way of solving certain very complex problems. By breaking a very complex problem into smaller, simpler cases, the problem suddenly becomes much easier to deal with. Moreover, the code can become much more elegant and sometimes even more straightforward through the use of recursion.

On the other hand, a disadvantage of the recursive technique is that when processing a large instance of the problem that requires a very large number of recursive calls, the operation can exceed the maximum recursion depth; in Python this raises a RecursionError, causing the program to crash.

Overall, recursion is very useful, but sometimes it should be avoided in particularly simple problems in order to prevent Pythonistas from overcomplicating the code.

Sunday 16 February 2014

Week 6

During this week, we learned about the concept of trees. Trees are data structures that consist of two sets: a set of nodes and a set of directed edges. One of the nodes is distinguished as the root; it is defined as the only node with no parent. Every non-root node has exactly one parent. Leaves are defined as nodes that have no children, while internal nodes are those that have one or more children.

Paths can be formed between certain nodes in a Tree. A path is a sequence of nodes n(1), n(2), ..., n(k), where there is an edge from n(i) to n(i+1). The length of any path is defined as the number of edges in this path. A very important property of trees is the fact that there exists a single, unique path from the root to each node in the tree. For example, in the case of the root itself this path is just n(1) (if the root is n(1)), with length zero.

Another very important property of trees is the fact that paths cannot form any loops. The height of a tree is the maximum path length in a tree. Trees with a height of at least one contain subtrees, which are trees formed by any tree node together with its descendants and the edges leading to them. The arity of any specific tree is defined as the maximum number of children for any node in that tree.
During lecture, we wrote specific functions for trees, including pre-order, post-order and in-order traversals.
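The three traversals differ only in when the root's value is visited relative to its subtrees; here is a minimal sketch (a generic Tree class of my own, not the lecture's):

```python
class Tree:
    """A binary tree node."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(t):
    """Root first, then left subtree, then right subtree."""
    return [] if t is None else [t.value] + preorder(t.left) + preorder(t.right)

def inorder(t):
    """Left subtree, then root, then right subtree."""
    return [] if t is None else inorder(t.left) + [t.value] + inorder(t.right)

def postorder(t):
    """Left subtree, then right subtree, then root last."""
    return [] if t is None else postorder(t.left) + postorder(t.right) + [t.value]
```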

Moving on to a different subject, in this week's lab we were expected to transform list comprehensions into loops over iterables. In my opinion, this was a particularly straightforward task, and my partner and I were even able to finish the lab early for the first time since the beginning of the semester!

The last new things we learned about this week were the two built-in functions any and all. The first can be applied to an iterable to check whether any of its elements satisfy a given condition, while the latter checks whether all of its elements satisfy a specific condition. These are particularly useful for writing shorter and more comprehensible code!
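A tiny example of both functions, each paired with a generator expression:

```python
nums = [2, 4, 6, 7]

# any: True if at least one element satisfies the condition
has_odd = any(n % 2 == 1 for n in nums)

# all: True only if every element satisfies the condition
all_even = all(n % 2 == 0 for n in nums)
```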

Saturday 8 February 2014

Week 5

Throughout this week, we have been dealing with deeper recursion problems as well as learning about the hierarchy of name and method searches. I also began working on Assignment 1 in the beginning of the week, and finally finished it a few days ago.

Assignment 1 deals a lot with class definitions and methods, as well as recursion. When I first looked at the code for GUIController.py, ConsoleController.py and TOAHModel.py, I was startled. I did not understand most of the code in GUIController.py and was really confused as to how to start the assignment. However, after following the first two steps in the assignment handout, I created the method header and code for each required method, and was pleasantly surprised when I ran GUIController.py to play the game manually and it worked! On the other hand, I thought ConsoleController.py was particularly straightforward and finished it quite quickly.

The challenge, however, lay in Tour.py, where we were expected to write the recursive function that would move a tower of cheeses from the first stool to the last with the highest possible efficiency. I found this particular step especially hard because I could not find a way to optimize the choice of i. After a lot of brainstorming, writing equations on pieces of paper and searching to understand the problem, I believe I derived the right formula for finding i, which seemed to work for all numbers. From then on, I finished the assignment fairly efficiently and was extremely excited when everything worked!
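My actual formula isn't shown here, but one standard way to think about choosing i for four stools (an assumption on my part; this is the classic Frame-Stewart recurrence, not necessarily the assignment's intended method) is: move n - i cheeses aside using all four stools, move the remaining i using only three stools (2**i - 1 moves), then move the n - i back on top, and pick the i that minimizes the total.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_moves(n):
    """Minimum number of moves for n cheeses on four stools
    under the Frame-Stewart strategy."""
    if n == 1:
        return 1
    # try every split point i and keep the cheapest
    return min(2 * min_moves(n - i) + 2 ** i - 1 for i in range(1, n))
```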

The lab this week dealt even more with recursion problems. We were expected to write tons of recursive functions, and then modify them further until they were very efficient. I was actually surprised to find out that I understood recursion pretty well. After tracing each step on paper first, writing the code seemed particularly straightforward. My partner and I were able to finish all the functions and were even able to work on the bonus exercises.

During lecture, we learned about the hierarchy of searching for names and methods in Python. When a Pythonista refers to a name in Python, the interpreter first looks in the innermost scope, called the local scope; this scope usually consists of the function body. If it does not find the name there, it goes on to look at any enclosing scopes, called nonlocal; these occur when the reference to the name appears in a function nested within other functions. Next, Python looks for the name at the level of the current module, referred to as the global scope. Finally, if it does not find the name in any of the previous places, it checks whether there is a built-in with that name. If the name is not found in any of these locations, an error is raised.
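This lookup order (local, then enclosing, then global, then built-in) can be seen with three same-named variables:

```python
x = "global"                # module-level (global) scope

def outer():
    x = "nonlocal"          # enclosing scope
    def inner():
        x = "local"         # innermost (local) scope is found first
        return x
    # inner() sees its own x; outer still sees its enclosing x
    return inner(), x

# built-in scope is searched last, e.g. the name len
```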

When a method is called, the process is similar to that of searching for names. First, Python looks in the object's own class and then works its way up the inheritance hierarchy until it finds a method with that name. Similarly, if Python fails to find the method, it raises an error.
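A small illustration of that upward search through an inheritance chain:

```python
class A:
    def greet(self):
        return "A.greet"

class B(A):
    pass                     # no greet here: Python climbs up to A

class C(B):
    def greet(self):
        return "C.greet"     # found immediately in C itself
```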