My first Hackathon, and lessons learned in Git and Python

Out of the blue, I got the idea this summer that I should do a hackathon. Collaborating with someone in real time, under the constraint of building something within two days, appealed to me. I collaborated with only one other person, but it turned out great. We brainstormed, settled on something concrete to work on, and hacked away for two days in 7-hour sessions (so, yeah, it was a mini semi quasi hackathon). Not only did I have fun, I also learned some small nuances of Git and Python.

Specifically, we worked on collecting random tweets using the Twitter API and trying to associate those tweets with Wikipedia articles. I worked with the Twitter API (the Search API) and did tons of preprocessing and cleaning of the data. That meant removing noise and, for each tweet, identifying the Wikipedia surface forms present in it (the phrases used to refer to an article, such as "Big Apple" for New York City). Those surface forms then became the keywords associated with the tweet. My collaborator worked on deriving a tree-like structure from the Wikipedia category hierarchy. The stepwise details of how we organized it can be found here, and some of the code is here. There's unfinished business to attend to, but we intend to work on the problem again sometime later this summer.
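The cleaning and keyword step can be sketched roughly as follows. This is a minimal illustration, not our actual code. The regular expressions, the tiny SURFACE_FORMS dictionary, and the clean_tweet and extract_keywords helpers are all hypothetical. A real surface-form dictionary would be derived from Wikipedia (for instance, from article titles and link anchor texts) and would be far larger.

```python
import re

# Hypothetical surface-form dictionary: lowercase phrase -> Wikipedia article title.
SURFACE_FORMS = {
    "python": "Python (programming language)",
    "new york": "New York City",
    "big apple": "New York City",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions, RT markers and punctuation, and tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = re.sub(r"\brt\b", " ", text)  # retweet marker
    text = NON_WORD_RE.sub(" ", text)    # also turns "#python" into "python"
    return text.split()


def extract_keywords(text, max_ngram=3):
    """Return Wikipedia titles whose surface forms occur as n-grams in the tweet."""
    tokens = clean_tweet(text)
    found = []
    # Scan longer n-grams first so multi-word phrases are listed before single words.
    for n in range(max_ngram, 0, -1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            title = SURFACE_FORMS.get(phrase)
            if title is not None and title not in found:
                found.append(title)
    return found


print(extract_keywords("RT @someone: Learning #Python in New York! http://t.co/abc"))
# ['New York City', 'Python (programming language)']
```

Matching every n-gram against a dictionary is the simplest possible approach. It does not resolve ambiguous phrases that could point to several articles.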

I also learned what to do when you rename files in a Git repository with a plain mv command (instead of git mv) and then add the new files with git add before committing. Git shows the new names as staged for the next commit, while the old names are listed as deleted. You cannot simply delete those old files, because they no longer exist on disk; you moved them to the new names. I found this blog post, which helped me get it done, especially the command git rm `git ls-files --deleted`, which stages the removal of every file Git lists as deleted. The screenshots show my actual before and after. Once you've done the step the linked post outlines, Git is smart enough to recognize that the files have simply been moved.


Before the magic

After the magic
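Git stores content, not moves, so the "renamed" entry in git status is inferred. When a deleted path and an added path have identical content, they hash to the same blob ID, and Git pairs them as a rename. For edited files it falls back to a similarity score. The Python sketch below imitates only the exact-match part. The blob_id and detect_exact_renames helpers are hypothetical illustrations, not Git's own code.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA-1 object ID Git gives a file's contents: sha1(b"blob <size>\\0" + data)."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def detect_exact_renames(deleted, added):
    """Pair deleted and added paths whose contents are byte-for-byte identical.

    Both arguments map path -> file contents (bytes).
    Returns a list of (old_path, new_path) tuples.
    """
    by_id = {}
    for path, data in added.items():
        by_id.setdefault(blob_id(data), []).append(path)
    renames = []
    for old_path, data in sorted(deleted.items()):
        candidates = by_id.get(blob_id(data))
        if candidates:
            # Each added path can explain at most one deleted path.
            renames.append((old_path, candidates.pop(0)))
    return renames


print(detect_exact_renames(
    {"old_name.py": b"print('hi')\n"},
    {"new_name.py": b"print('hi')\n", "other.py": b"x = 1\n"},
))
# [('old_name.py', 'new_name.py')]
```

This is why the rename only shows up after both halves are staged. Git needs the deletion and the addition in the index at the same time to pair them.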

I also got the chance to use some of the best practices and idiomatic Python usage collected here. I used the parts about get and setdefault for dictionaries and about naming conventions. I also used the more efficient way of building strings from substrings: collecting the pieces in a list and joining them, rather than concatenating them one by one. All in all, a fantastic experience, with more to come!
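The three idioms look roughly like this. The word list and helper names are made up for illustration. The patterns are the point: dict.get with a default instead of a membership test, dict.setdefault to create a list on first use, and str.join to assemble a string instead of repeated +=.

```python
def count_words(words):
    """Count occurrences using dict.get with a default of 0."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def group_by_initial(words):
    """Group words by first letter; setdefault creates the list on first use."""
    groups = {}
    for word in words:
        groups.setdefault(word[0], []).append(word)
    return groups


def build_line(words):
    """Build a string from substrings with str.join rather than repeated +=."""
    parts = []
    for word in words:
        parts.append(word.upper())
    return " ".join(parts)


words = ["git", "python", "git", "pandas"]
print(count_words(words))       # {'git': 2, 'python': 1, 'pandas': 1}
print(group_by_initial(words))  # {'g': ['git', 'git'], 'p': ['python', 'pandas']}
print(build_line(words))        # GIT PYTHON GIT PANDAS
```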
