Panoply: Console-based TODO list manager

That’s right, yet another todo list manager. I am calling it Panoply, for no other reason than the fact that I like that word. Well, it is a learning project. I wanted something that does not depend on having an Internet connection all the time (unlike, e.g., Wunderlist and TeuxDeux, though it is possible that those awesome products have offline modes), and something that is free (unlike Things). More importantly, I wanted to check the feasibility of Python for building, testing and maintaining a relatively large project. It certainly doesn’t have the graphical finesse of the apps mentioned above, but if I ever decide to go in that direction with it, it will be yet another learning adventure. As it stands right now though, Panoply remains a CLI app.

It’s still very much in a pre-alpha stage, but I have something running and working that I can now keep tweaking and enhancing. I am trying to follow Test Driven Development in this project as faithfully as I can (though I guess I could be better). The goal is essentially to write a command line tool in Python that helps with deadlines and with managing personal tasks. It is supposed to be an app that automatically adjusts itself somehow, based on current deadlines, future deadlines, and past overdue deadlines, so that it is better than a plain text todo list. I am the sole developer, user experience designer, user interaction designer and tester of all things for now. I am hoping that will change at some point. Panoply currently stores its data in simple CSV files. This aspect of the project might very well change in the future if I feel the need for a better data structure.

My idea is to keep the scope minimal, and enhance it one feature at a time. As of the day of writing this blog entry, the app lets you start a task collection, add a task to the collection with a user name, and save and load the entire list of tasks. It also lets you selectively check off items, marking them as ‘done’ so that they no longer show in the list when you view it. Last but not least, it can scan the tasks and task collections and prompt the user that they need to hustle if any task has a deadline before the current date (termed ‘overdue’ in the Panoply universe).
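To make the ‘overdue’ idea concrete, here is a minimal sketch of what such a check over CSV-backed tasks could look like. It is not Panoply’s actual code: the column names (user, task, deadline, done), the sample rows and the helper names load_tasks and overdue are all assumptions made up for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical CSV layout; the real Panoply columns are not described in this post.
SAMPLE = """user,task,deadline,done
alice,write report,2013-05-01,no
alice,buy milk,2013-06-15,no
bob,file taxes,2013-04-15,yes
"""


def load_tasks(csv_text):
    """Parse CSV text into a list of dicts, one per task."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def overdue(tasks, today):
    """Return the tasks that are not done and whose deadline is before today."""
    return [
        t for t in tasks
        if t["done"] != "yes" and date.fromisoformat(t["deadline"]) < today
    ]


late = overdue(load_tasks(SAMPLE), today=date(2013, 6, 1))
print([t["task"] for t in late])  # ['write report']
```

Passing today in explicitly, rather than calling date.today() inside the function, keeps the check easy to unit test, which fits the TDD approach mentioned above.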

I am having a lot of fun with this project, trying to hack on it for 10 minutes every other day. My full time job and other obligations don’t allow for more at this point, but Panoply will certainly grow with time.

Fixing pip and easy_install, and getting colorful output from unittest

I’ve noticed in a lot of talks and videos that people in the Ruby universe who work with tools like RSpec for writing unit tests or doing TDD often have nice colorful output. Aesthetics aside, I like the idea of the visual feedback – with red indicating your errors or failures and green or blue indicating your passing tests. It is much easier to take in feedback through color than to carefully read the text scrolling on the terminal. There has to be a reason, after all, why TDD often has the moniker red-green-refactor. Searching on this matter, I rather quickly came across pyrg, which seems like an excellent tool.

However, since I kind of overwrote my existing Python installation (OS X 10.6) with the version from MacPorts, my easy_install and pip were still pointing toward one of my older Python installations. Ergo, pyrg wouldn’t run. After researching for a while I came across this excellent post on Stack Overflow, and followed the instructions to uninstall my easy_install and then reinstall using the distribute package mentioned on there, followed by reinstalling pip.

This by itself didn’t do anything, and I had to make sure in my shell’s rc file that the MacPorts version of Python was in the path before other, regular places like /usr/local/bin, etc. That fixed it. Run with $ pyrg python_file or $ python python_file |& pyrg. Colors can be easily customized as well. Hooray for going from the colorless test run output on the left to proper visual feedback on the right!

Colorful with pyrg

Colorless without pyrg

First Python patch accepted

I blogged about starting open source contributions to Python, and about how my first patches were the most basic, unimaginative child’s play possible and yet still a new beginning (see the post here).

Finally after quite a bit of waiting I got notifications today that one of the patches has been accepted and committed to the core in Python 2.7, 3.2, 3.3 as well as 3.4 – this is fantastic news, and I am all the more motivated to keep going and to make more patches.

You can see the bug report, the discussions, and the applied fixes and patches here, and in particular look at some of the differences and, most importantly, my name at the bottom of the page here (I am also attaching a screenshot just because). Now if you ever download the Python source and look under Misc/ACKS, you can find my name for eternity.

Image

Extremely happy about this development 🙂

Sleep sort in Python

People keep coming up with smart ideas, even though sometimes they might not be useful for scaling up. One such example is that of sleep sort, which works like a charm, although I highly doubt that it would work for a long array. It certainly won’t work for negative numbers. Whether it works for real numbers or not depends on the implementation. Here’s a simple Python script I wrote, taking from the original idea, that works for non-negative integers. The idea is simple – for each number in an array, read the number, spawn a new process, sleep for that many seconds, and then print the number.


"""
Sleep sort using Python multiprocessing module
Idea:
- read a number n from a list
- spawn a new process
- within the process
    - sleep n seconds
    - print that number
Question - for how many numbers would it work?
"""
from multiprocessing import Process
import time


def worker(n):
    # Sleep for n seconds, then print n; smaller numbers wake up first.
    time.sleep(n)
    print(n)


def main():
    numbers = [4, 3, 6, 1, 5, 7, 0, 9, 6, 8]
    for n in numbers:
        # One process per number; args must be a tuple, hence the trailing comma.
        p = Process(target=worker, args=(n,))
        p.start()


if __name__ == '__main__':
    main()

I am using multiprocessing, which is allegedly better than threading, because it side-steps the Global Interpreter Lock in Python by using subprocesses rather than threads.
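For comparison, here is a minimal sketch of the same idea using threads. It is a variation, not the script above: the function name sleep_sort and the scale parameter are my own illustrative choices. Because time.sleep releases the Global Interpreter Lock while waiting, threads are enough for this particular toy, and collecting the numbers in a list (instead of printing them) makes the result easy to check. The scale factor shortens the wait and also lets non-negative real numbers through, as long as they are far enough apart for the timing to separate them.

```python
import threading
import time


def sleep_sort(numbers, scale=0.1):
    """Return non-negative numbers in the order their sleeping threads wake up."""
    result = []
    lock = threading.Lock()

    def worker(n):
        # Sleep proportionally to n, then record n.
        time.sleep(n * scale)
        with lock:
            result.append(n)

    threads = [threading.Thread(target=worker, args=(n,)) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread so the list is complete
    return result


if __name__ == '__main__':
    print(sleep_sort([3, 1, 2, 0]))
```

The ordering depends entirely on timing, so values that are very close together (relative to scale) can still come out in the wrong order on a busy machine.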

Oftentimes in computing you just have to look at an idea and smile at its beauty, even if you won’t really ever use it in your work.

Fast Vector Multiplication in Python

Multiplication is an expensive operation. If your code has to perform multiplication thousands or, say, hundreds of thousands of times, it will be slow, especially in a dynamically typed language like Python when you are using just the vanilla language to perform the computation.

After trial and error, trying hard to speed up my code, I stumbled upon certain methods available in numpy, which helped in more than doubling my execution speed. I won’t talk about the actual use case for it, but let me illustrate with an example. Both my actual use case as well as this example talk only about multiplying multi-dimensional vectors efficiently.

Assume that you have two 100-dimensional vectors that you want to multiply component-wise. How does the execution time change with the number of times you perform the computation? How does traditional Python compare against numpy’s dot method (which, for two 1-D arrays, multiplies component-wise and sums the products into a single number)? The code for this experiment is available here.

As you can see in the attached image, using numpy.array and numpy.dot gives a speedup of up to 22 times. The y-axis of the graph represents time taken in seconds, and the x-axis depicts (in 1000s) the number of times the two vectors were multiplied component-wise. The red line represents traditional Python and the green line represents numpy.dot. Of course, numpy is using C to achieve this, but kudos to the numpy developers for making lives easier for those who only want to use Python (or do not want to reinvent the wheel, or do not want to make terrible mistakes in something that is otherwise taken for granted).

Python vs numpy for multiplying vectors
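The linked experiment code is not reproduced here, so the following is only a rough sketch of how such a comparison could be set up, assuming numpy is installed. The helper python_dot, the vector contents and the repetition count are illustrative assumptions, and the measured times will differ from machine to machine. It also shows the difference between the element-wise product (a * b) and numpy.dot, which returns a single number for two 1-D arrays.

```python
import timeit

import numpy as np


def python_dot(a, b):
    """Plain-Python inner product: multiply component-wise, then sum."""
    return sum(x * y for x, y in zip(a, b))


# Two 100-dimensional vectors, as lists and as numpy arrays.
a_list = [float(i) for i in range(100)]
b_list = [float(i) * 0.5 for i in range(100)]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

elementwise = a_arr * b_arr   # array of 100 component-wise products
inner = np.dot(a_arr, b_arr)  # single number: the sum of those products

# Time 10,000 repetitions of each approach.
py_time = timeit.timeit(lambda: python_dot(a_list, b_list), number=10000)
np_time = timeit.timeit(lambda: np.dot(a_arr, b_arr), number=10000)
print("pure Python: %.4fs, numpy.dot: %.4fs" % (py_time, np_time))
```

Note that the arrays are built once, outside the timed section; converting lists to arrays inside the loop would eat into the speedup.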

Beginning Open Source Contributions to Python

Open Source is a great way to learn and grow – to look at huge codebases written by world class developers, and to be mentored by them when you try to write a patch for an existing issue. You learn the entire workflow of starting a patch to having it committed to the central source code repository, which can be a very enriching experience. The discussions that accompany the process are also priceless in terms of how much you learn about software development as well as practicality.

Earlier this year I had decided that I would start with contributing to the Python project. Being one of my favorite languages with a vibrant, large and intellectual community (and also increasing industry support – check out the sponsors at PyCon 2013), it was a natural choice. I might decide to get involved with the Scala community at a later point as well – which is fast becoming my other favorite language 🙂 But that story belongs to another blog post.

A lot has been written about how to start contributing to Python here, here, and here. After the idea had been incubating in my subconscious for too long, I decided it was now or never. As a first step, I downloaded and built CPython on a Linux virtual machine, but gave up because of several issues – mostly it was very inconvenient to fire up a virtual machine every time I wanted to do something with the source, and Virtual Box does not really provide the smoothest experience. I decided to build the CPython source on OS X eventually, which also worked without issues.

The official Python bug list is maintained here. A natural first step is to look for some ‘easy’ task: specifically, to look at the tests that accompany the project, and to try to refine and enhance them, thereby increasing the test coverage. A lot of people also start with documentation fixes, which is obviously a lot easier than actually trying to take a stab at the code. Furthermore, the parts of Python that are written in Python would probably be easier to handle than the parts that are written in C. Still, it can be a daunting task to actually find an issue to work on. Luckily for me, I found something between the two extremes, coding and documentation, to get my hands dirty.

The official Python documentation has a lot of example code, and there is a mechanism which automatically tests the code and the results given in the documentation against what actually happens when Python runs that piece of example code. As it turns out, because of several reasons – some legacy, some pedantic, and some errors and omissions – there are a lot of failing test cases for doctest (the automatic mechanism that checks the example code against reality). I found some easy changes to make that fixed some of the issues, and submitted two patches, here and here. The discussions and even the code diffs should be visible publicly, even if you do not have an account on the bug list.
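To give a feel for how doctest compares documented examples against reality, here is a small self-contained sketch. The functions add and halve and the helper run_examples are made up for illustration and have nothing to do with my actual patches. The second docstring shows a typical ‘legacy’ failure: the documented result assumes Python 2 integer division, so under Python 3 the example no longer matches.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


def halve(n):
    """Return n divided by two.

    The expected output below is stale: it assumes Python 2 integer division.

    >>> halve(7)
    3
    """
    return n / 2


def run_examples(func):
    """Run the docstring examples of func and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Provide the function itself as the only global the examples need.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == '__main__':
    print(run_examples(add))
    print(run_examples(halve))
```

For the failing example, doctest prints a report showing the expected and the actual output, which is exactly the kind of mismatch a documentation patch then has to resolve.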

The community has been great, very welcoming and very professional. Of course, the next steps are for me to look at the comments made by the core committers on my patches, and enhance the patches themselves until they are good candidates for being accepted to the codebase. A common question that arises is how exactly to bundle the changes, and which changes can be bundled as one patch and which ones should be a patch of their own. Another one is to make sure that the problem that the patch is trying to address is also solved for the different versions of Python – 2.5, 2.6, 2.7, 3.1, 3.2, 3.3 and 3.4 – which, to be very honest, is the only thing I really dislike about Python – the existence of so many flavors concurrently. It also complicates my workflow a little bit – to make sure I have access to all these versions at any given time.

All the same, I am hopeful that these patches will eventually be accepted, and that I’ll eventually be able to make contributions to the codebase itself that are more valuable than what I have done so far. That might pave the way to becoming a core committer, which is ambitious for me at this stage and depends upon how much time I have in the near future. Very proud at the prospect of having my name listed along with all the contributors to the Python project!

My first Hackathon, and lessons learned in Git and Python

Out of the blue I had the idea this summer that I should be doing a hackathon. The idea of collaborating with someone in real time, with the constraint that we want to build something within two days, seemed appealing to me. I only collaborated with one other person, but it turned out great. We brainstormed, determined something concrete we wanted to work on, and hacked away for two days in 7-hour sessions (so, yeah, it was a mini semi quasi hackathon). Not only did I have fun with it, but I also learned some small nuances about Git and Python.

Specifically, we worked on collecting random tweets using the Twitter API, and trying to associate those tweets with Wikipedia articles. I worked with the Twitter API (the search API) and performed tons of preprocessing and cleaning of the data, removing noise, and deducing for each tweet Wikipedia surface forms present in the tweet (which then become the keywords associated with the tweet). My collaborator worked on deriving a tree-like structure from the Wikipedia category hierarchies. The stepwise details of how we organized it can be found here, and some of the code is here. There’s unfinished business to attend to, but we intend to work on the problem again sometime later this summer.

Also specifically, I learned what to do in cases where you’re using Git and execute an mv command followed by a git commit. It has the effect of showing the new names as being staged for the next commit while the older ones are listed under deleted files. However, you cannot just delete the older files listed like that because they no longer exist; you moved them to the new names. I found this blog which helped me get it done, especially the git rm `git ls-files --deleted` command. The screenshots show my actual before and after; see how Git is smart enough to know that files have just been moved, after you’ve done the step the linked blog outlines.

Before the magic

After the magic

I further had the chance to use some of the best practices and idiomatic usage in Python, found here, in particular the parts about using get and setdefault for dictionaries, naming conventions and some optimization used in building strings from substrings. All in all, a fantastic experience, with more to come!
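A quick sketch of the three idioms mentioned above, using made-up data rather than the hackathon code: dict.get for counting with a default, dict.setdefault for grouping, and str.join for building a string from pieces instead of repeated concatenation.

```python
words = ["git", "python", "git", "tweet", "python", "git"]

# dict.get: read a value with a default when the key is missing.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# dict.setdefault: insert a default (here, an empty list) the first time
# a key is seen, and return the stored value either way.
positions = {}
for i, w in enumerate(words):
    positions.setdefault(w, []).append(i)

# str.join: build the string in one pass instead of using += in a loop.
sentence = " ".join(words)

print(counts)     # {'git': 3, 'python': 2, 'tweet': 1}
print(positions)  # {'git': [0, 2, 5], 'python': [1, 4], 'tweet': [3]}
print(sentence)   # git python git tweet python git
```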

Python Koans

As of today, I am a Python Koans graduate =)


291 test cases, one after the other, taught me a lot of things that books probably could not have taught me. I learned the nuances of multiple inheritance and of __setattr__, __getattr__ and __getattribute__ in particular. It’s amazing. My next Koans will most probably be the JavaScript ones.
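As a reminder to myself of how the three attribute hooks differ, here is a small sketch (my own example, not one of the Koans; the class name Demo and the log format are made up). __getattribute__ runs on every attribute lookup, __getattr__ runs only when the normal lookup fails, and __setattr__ runs on every assignment.

```python
class Demo:
    def __init__(self):
        # Create the log through object.__setattr__ so that our own
        # __setattr__ (which needs the log) is not involved yet.
        object.__setattr__(self, "log", [])
        self.x = 1  # goes through __setattr__ below

    def __getattribute__(self, name):
        # Called for EVERY attribute lookup; skip logging for the log itself.
        if name != "log":
            object.__getattribute__(self, "log").append("getattribute:" + name)
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Called ONLY when the normal lookup raised AttributeError.
        self.log.append("getattr:" + name)
        return "default"

    def __setattr__(self, name, value):
        # Called for EVERY attribute assignment.
        self.log.append("setattr:" + name)
        object.__setattr__(self, name, value)


d = Demo()
d.x        # found normally: only __getattribute__ runs
d.missing  # not found: __getattribute__ runs first, then __getattr__
print(d.log)
# ['setattr:x', 'getattribute:x', 'getattribute:missing', 'getattr:missing']
```

Delegating to object.__getattribute__ and object.__setattr__ is what prevents infinite recursion inside the hooks.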

Shout out to Greg Malcolm for going through the trouble and making it available. A must for any Python developer.

Cheers!

Lessons in Python multiple inheritance

While hacking away at the Python Koans, I came across multiple inheritance in Python, and thought it worthwhile to blog about some specifics with some concrete code. The code and explanation are here, and talk about

[1] Priority in multiple inheritance when multiple super classes contain methods that have the same name

[2] Priority in multiple-stage inheritance when the super class as well as super-super class contain methods that have the same name

[3] The method resolution ordering in Python multiple inheritance.

[4] The effect of calling super() in Python.
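The linked write-up has the full code; as a compact illustration of the points above (written for Python 3, with class names invented for this sketch), a diamond-shaped hierarchy shows both the method resolution order and how super() follows it.

```python
class Base:
    def who(self):
        return "Base"


class Left(Base):
    def who(self):
        return "Left -> " + super().who()


class Right(Base):
    def who(self):
        return "Right -> " + super().who()


class Child(Left, Right):
    # No who() here, so lookup follows the method resolution order.
    pass


print([c.__name__ for c in Child.__mro__])
# ['Child', 'Left', 'Right', 'Base', 'object']

print(Child().who())
# Left -> Right -> Base
```

The second output is the interesting one: inside Left, super() does not necessarily mean Base. It means the next class in the MRO of the instance’s class, which for a Child instance is Right.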

I’ll be back with more.

A few Python ‘Gotchas’: sorted() and list.extend()

I recently got bitten by a couple of bugs in a project I was doing in Python and thought I’d share. Short and sweet, I’ll get directly to the point.

The built-in sorted() function does not sort an iterable in place. It returns a new, sorted list and leaves the original untouched. Remember to assign the result to a variable (or use the list.sort() method if you really do want to sort a list in place).

my_list = [5, 3, 1, 4]
sorted(my_list, reverse = True) # Useless, unless you are printing it directly
my_list = sorted(my_list, reverse = True) # Correct
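A slightly fuller sketch of the same gotcha (variable names are just for illustration), including the mirror-image mistake with list.sort(), which sorts in place and returns None:

```python
scores = [5, 3, 1, 4]

ranked = sorted(scores, reverse=True)  # new list; scores is unchanged
print(ranked)   # [5, 4, 3, 1]
print(scores)   # [5, 3, 1, 4]

returned = scores.sort()               # sorts scores in place
print(returned)  # None
print(scores)    # [1, 3, 4, 5]

# sorted() accepts any iterable and always returns a list.
letters = sorted("panoply")
print(letters)   # ['a', 'l', 'n', 'o', 'p', 'p', 'y']
```

So `my_list = my_list.sort()` is the opposite bug: it replaces the list with None.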

The next one is a bit more subtle. The extend() method of a list is useful if you want to add the contents of another list _individually_. But if you pass a string to extend(), it iterates over the string and adds each character as a separate item, because a string is itself an iterable of characters. To add the string as a single item, use append() instead. The following example will make it clear.

my_str = 'randomness'
my_list = []
my_list.extend(my_str) # my_list = ['r', 'a', 'n', 'd', 'o', 'm', 'n', 'e', 's', 's']
my_list.extend(['hello', 'world']) # my_list = ['r', 'a', 'n', 'd', 'o', 'm', 'n', 'e', 's', 's', 'hello', 'world']
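For contrast, a short sketch of the ways to get the string in as one item (variable names are illustrative):

```python
my_str = 'randomness'

appended = []
appended.append(my_str)       # adds the string as a single item
print(appended)               # ['randomness']

wrapped = []
wrapped.extend([my_str])      # wrapping it in a list works too
print(wrapped)                # ['randomness']

chars = []
chars.extend('abc')           # a bare string is iterated character by character
print(chars)                  # ['a', 'b', 'c']

appended.extend(['hello', 'world'])
print(appended)               # ['randomness', 'hello', 'world']
```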

Small things. Must remember. So long, and happy Pythoning!