
Here is a completely new and fully working (as far as what I have implemented goes) blog entry. I like this setup a lot, and might eventually move my whole blog to it. It is built on Twitter’s Bootstrap (CSS and JS), Picons free icons, google-code-prettify and my own stuff.

Currently the promised post on how to refactor a C program is up. Happy refactoring!

Teaser for my new blog

Teaser for my new blog :) Still under construction… but this is what it will look like. I am taking help from Bootstrap’s CSS/JS, google-code-prettify and free icons from Picons, putting it all together in Dreamweaver, and adding a blog post about some refactoring that I did for a C application.

[There will be a post here about the C refactoring]

The masterpiece


I finally jumped on the NoSQL bandwagon and gave Redis a try.

I’ve been hearing about NoSQL for quite some time as a lightweight but much faster database system (the speed and ease being the advantage over RDBMS, with the disadvantage being the lack of relations). One of the several NoSQL systems is Redis, which I read about recently.

It is essentially a flat dictionary that stores keys and values, but the values themselves can be data structures like sets, lists, dictionaries (hashes), ordered sets (which Redis calls sorted sets) etc. It is an in-memory data store, and it is also persistent: once you populate the dictionary, Redis saves the data to disk, so you can stop the server, restart it, and the data you uploaded will still be there.

The main advantage is that it’s blazingly fast, with reported benchmarks of around 100,000 set and 80,000 get operations per second.

I installed Redis on my work machine, which took 4 lines of command line instructions, and started the Redis server. Then I installed the Python client for Redis, packaged as python-redis in the Ubuntu repos. Then I wrote a script that reads through the n-grams and uploads them into the Redis data store. The script is available from my GitHub account.

I chose ordered sets for this. It seemed to be a good idea to have 5 of them, one for 1gms, one for 2gms, etc., with the following structure (the keys carry a language prefix in the code further down, e.g. en:1gms):

key = {1gms/ 2gms/ 3gms/ 4gms/ 5gms}
value = ordered set, all n-grams with the scores

It took 20 minutes to upload 13 million 1gms into the store, and now the look-up is instantaneous. And the best thing is that as long as the Redis server is running on my work machine, anybody who can reach that machine over the network can access the store.

The steps are (from the Python CLI/ REPL):

import redis
rs = redis.Redis('localhost') # or the host address
rs.zcard('en:1gms') # will return the cardinality of the ordered set 'en:1gms'
rs.zscore('en:1gms', 'hello') # will return the count for hello


Even if I stop the server now and restart it, all these counts will still be there. Redis has commands for sets, dictionaries, ordered sets, lists, etc. The methods above whose names start with ‘z’ are the ones meant for ordered sets.
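
The upload step mentioned earlier could look roughly like the sketch below. This is not the script from my account, only a minimal illustration under some assumptions: the input has one n-gram per line followed by a tab and its count, the function names and the batch size are made up for the example, and the client is redis-py 3.0 or later, where zadd takes a dict that maps members to scores.

```python
from itertools import islice


def parse_ngram_line(line):
    """Split 'n-gram<TAB>count' into (ngram, count). The format is assumed."""
    ngram, count = line.rstrip("\n").rsplit("\t", 1)
    return ngram, int(count)


def batches(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def upload_ngrams(client, key, lines, batch_size=10000):
    """Add every n-gram to the ordered set `key`, scored by its count."""
    total = 0
    for chunk in batches(lines, batch_size):
        mapping = dict(parse_ngram_line(line) for line in chunk)
        client.zadd(key, mapping)  # redis-py >= 3.0: zadd(name, mapping)
        total += len(mapping)
    return total
```

With a running server, something like upload_ngrams(redis.Redis('localhost'), 'en:1gms', open('1gms.txt')) would fill the set (the file name is hypothetical). Sending the n-grams in batches keeps the number of round trips to the server low.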

Of course it doesn’t do smoothing, but I read about Redis and I was excited to try it. I can now use Redis for anything that has large data and needs fast look-ups =D

Obviously a key-value store like this is just a flat dictionary, but I am sure Redis is using some efficient mechanism for storing the data. It is written in ANSI C, so that only makes sense. Besides, Redis comes with ‘batteries included’: it has a server which, once running, can serve any client; a persistent data store, which comes back alive even if the server is stopped and restarted; values that don’t have to be strings but can be more complex data structures themselves; and finally, clients in a number of languages.

Redis provides quite a few benefits over a classical RDBMS, as enumerated here – for example better, more efficient data structures. However, a caveat is that once the data becomes greater than the memory and the system starts paging, the performance degrades radically. So perhaps this method is not the silver bullet for looking up all n-grams instantaneously if you don’t have enough memory. But it can still be useful (and it’s easy to use, and it comes with batteries included as mentioned earlier) for several other scenarios.

Go learn yourself some Redis :)

JavaScript’s Peculiarities

I recently started learning JavaScript properly [the last word being the keyword]. I found a totally awesome online book, Eloquent JavaScript, and its best feature is the presence of a console and a coding area right below the text of each chapter. Here are some points about JavaScript that stood out to me while reading the first few chapters:
- When you use == to compare entities, JavaScript tries to convert the value types. So if you do not want automatic type conversion to happen, use === or !== instead
- When concatenating with a string, numbers are automatically converted to strings, e.g. “Apollo” + 5 produces “Apollo5”. “5” * 5 produces 25 on the other hand, because JavaScript tries to convert the string to a number
- NaN == NaN and NaN === NaN both return false. The way to check whether a number is NaN is using the function isNaN, e.g. isNaN(NaN)
- A return statement with no expression results in returning undefined
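- A block of code within braces does not produce a new local scope for variables declared with var – only functions can produce a new scope for them
- When both operands are objects (including arrays), JavaScript’s == operator compares identity rather than contents, much like Python’s is keyword; for primitive values it compares by value, after type conversion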
- Arrays in JavaScript can be non-homogeneous, i.e. they might have values of all types mixed together
- In the JavaScript Date class, the month numbers start from 0 and go up to 11, which is confusing because the date numbers do start from 1
- JavaScript will not warn you if you try to use a variable name that has been taken before. It will silently execute if you assign something to var max, even though a standard function of the same name, Math.max, already exists
- [Personal opinion] Since JavaScript is not statically typed, its advanced constructs like higher order functions can become really confusing when you need to determine which arguments are functions and which arguments are values/ variables

More to come.

Many have heard of the good things that Vim can do. Many propose and maintain that once you take the time to actually learn Vim, it will be your last editor, ever, because you’ll never want anything else. Many claim that nothing can beat the awesome combination of simplicity and power that is Vim.

This is a neutral description of some Vim stuff that I learned recently, and I’ll let the readers decide whether they conclude the same as the others above.

Apart from learning some built-in commands and keyboard shortcuts in Vim from the ground up, like here, I think it is also essential to actually learn some VimScript to be able to fully leverage the power of Vim – for example here. Finally, after having familiarized yourself with these, the next step would be to actually start looking at some of the plugins that exist in the Vim repository. Installing them is mostly easy – some of them can just be copied verbatim to your own .vimrc file.

Imagine being able to insert something at the end of every line in a block of code in a few keystrokes. Imagine being able to delete, find, replace, edit anything in a line without using the mouse, and without pressing the arrow keys at all. And it doesn’t end there.

Consider, for example, how helpful an automatic code/ snippet generator can be – something that enables you to, for example, press for followed by two tabs and automatically inserts for(int i = 0; i < count; i++) { // Code }. Imagine that on top of that, you place the cursor on top of the i and type another name, and the name changes in all the three places where it appears. Cool, isn’t it? Think of the time it saves, and the errors that it precludes. And, you guessed it, a plugin like this already exists, and you can even see it in action in a video before you decide to install it!
Happy Vimscripting!

Growing up in another country, I did not even dream of how technology and measurement could actually make our lives better, healthier, and yet easier to manage in the way that I am about to talk about. Extensive, if intermittent, reading on the Web, living in this country with easy and inexpensive access to technology and resources, and the passage of time have together changed a lot in my everyday lifestyle, and I must say that the change has been for the better.

Consider for example nutrition. We know that our bodies need a specific amount of energy every day. If we work out or play any sports, the amount of energy required for these activities is added. If you want to be healthy, you must avoid certain foods, eat more of other foods, and exercise. The basic equation for maintaining weight is the balance between how much the body is burning and how much we are eating. We all know that. Yet wouldn’t it be better if we could break this down to a quantitative level of detail, so we would know exactly ‘how much’? If something is worth doing, it’s worth doing well. Why do things by halves? Why do things based on intuition? We can do better than that. And it’s easy. How?

Buy a measuring cup. You know that 1 cup of brown rice gives you (give or take) 250 calories. So use 1 cup. Don’t use a handful. Don’t do it ‘by feeling’. Do it accurately. Do it right. The same goes for measuring spoons and weight scales. These things are inexpensive on Amazon. 6 oz of white meat is what your portion should be. And it’s easy to measure. Buy a George Foreman grill – so you can grill your chicken breasts and don’t have to use a knife/ pan/ baking pans etc etc. Get an electric rice cooker. Make life easy for yourself – we’re talking about appliances that cost less than $15 each!

But it doesn’t end there. Use LoseIt – something I came across when I read Scott Hanselman‘s blog entry here. It’s an incredibly easy and useful application and it’s free. Tell it what your target weight is, and what your current weight is, and it will tell you how many calories you need to consume every day to reach that goal. Add every day everything you eat, and it will collectively count the calories for you. If you eat something that doesn’t exist in LoseIt’s database, you can ADD it – get the nutrient information from the Web and just create a new food item, and it’s there for you to use later! Graph your weight progress, and your eating habits. The calories you spend working out are subtracted from the overall calories for the day. Find on the Web how many calories you spend doing a sport if you don’t already see it in LoseIt’s database. See weekly reports. Measure it accurately. Don’t get me wrong – you don’t have to go to the extreme – no matter what you do it will still be slightly inaccurate – what with that odd extra tablespoon of oyster sauce you put in your chicken – but it’s still way better than not doing it at all.

I only wish I could have this app on my desktop. Maybe there are alternatives out there. But I like it so far, despite the fact that it’s on the Web.

Okay, now let’s talk about work, professional stuff, and productivity. Do you know how much of your time every day is wasted in email, surfing the Web, and contemplating stuff? Do you want to nail down the exact amount of time you are spending on a project, and see nice graphical reports? Use Toggl - which I came across thanks to Vanessa Hurst‘s blog entry here. The basic app is free, despite being on the Web (again, maybe there are desktop alternatives – and I would like to have something like that locally installed). But the interface is great and very useful. Create several projects. Start tracking time when you’re actually working on your project, and then stop tracking when you get to your email or some random activity. Start tracking again (maybe another project this time) when you get to it. See how many hours you REALLY work in a week, and not just sit in front of your computer. Eye-opening, isn’t it?

Technology today is awesome. And it can make our lives better, healthier, more efficient, and easier to manage. We just need to figure out the right tools for the job, and then use them smartly.

Measure it – because if you don’t measure it, you know nothing about it, let alone being able to manage it.

 

Peculiarities of Python

Hi all,

After a short hiatus because of my summer internship, I am back to blogging, and what better way to start again than by summarizing the peculiarities of Python [for people coming from a Java/ C++ background]. So, fasten your seatbelts, and here goes:

[1] Python supports multiple inheritance

[2] Python’s self is the same thing as C++/ Java/ C# this

[3] There is no function/ method overloading in Python

[4] All methods in Python are by default virtual, without the explicit use of the keyword

[5] Python has a built-in copy module that can copy arbitrary Python objects

[6] Object identity in Python is tested with the keyword is, which plays the role that == plays for object references in Java. The == in Python represents value equality, as defined by the class

[7] Any class in Python can define the __len__ method, which dictates what the len function returns. If a class defines the __eq__ method (or, in Python 2, __cmp__), its objects can be compared by value using ==

[8] There are wrapper classes like UserDict and UserString that extend/ modify the functionality of built-in types like dict and str. Since Python 2.2 you may also derive directly from dict and str

[9] Member variables are called data attributes, and they are usually defined inside the __init__ method. Static variables are called class attributes; they are defined in the class body outside any method, and do not need a static keyword

[10] There are no constants in Python

[11] Nothing is protected in Python

[12] Anything inside a class whose name starts with __ and does not end with it (method or attribute) is treated as private through name mangling, and everything else is public – the keywords private and public are not used
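
A few of these points are easier to see in code. The sketch below is only an illustration: the class names Base, Mixin and Playlist are made up, and the bracketed numbers in the comments refer to the points in the list above.

```python
import copy


class Base:
    def greet(self):
        return "base"

    def call_greet(self):
        # Methods are looked up at run time, so this dispatches to a
        # subclass override, like a virtual method in C++.
        return self.greet()


class Mixin:
    def tag(self):
        return "mixin"


class Playlist(Base, Mixin):          # [1] multiple inheritance
    instances = 0                     # [9] class attribute ("static")

    def __init__(self, songs):
        self.songs = list(songs)      # [9] data attribute (per instance)
        self.__secret = "hidden"      # [12] mangled to _Playlist__secret
        Playlist.instances += 1

    def greet(self):                  # [4] overrides Base.greet
        return "playlist"

    def __len__(self):                # [7] used by len()
        return len(self.songs)

    def __eq__(self, other):          # [6] defines value equality for ==
        return isinstance(other, Playlist) and self.songs == other.songs


a = Playlist(["x", "y"])
b = copy.deepcopy(a)                  # [5] copy module; __init__ is not re-run

print(a == b, a is b)                 # True False
print(len(a), a.call_greet(), a.tag())  # 2 playlist mixin
print(hasattr(a, "__secret"), a._Playlist__secret)  # False hidden
print(Playlist.instances)             # 1
```

Note that the mangled name is still reachable as _Playlist__secret, so the privacy is a convention that discourages access rather than an enforced rule.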

This is just the beginning, more to come.

The deque class in C++

Going through some really valuable and practical C++ issues from this resource from Stanford, I came across the deque STL class in C++. Seems pretty interesting to me. There are performance issues to take into consideration: unlike the vector class, which guarantees that all elements sit in one contiguous block of memory, a deque may spread its elements out across several separate blocks. However, it does allow for pushing and popping at both ends of the queue (hence the name deque: double-ended queue), and in certain situations when the queue is going to be small and you are frequently going to be pushing and popping at both ends, deque is a really valuable resource to have in your toolkit. This location very nicely and thoroughly differentiates deques from vectors. One of the exercises in the course reader was a ring buffer, which I implemented and posted here. It was really straightforward and simple to implement.
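
For readers who have not met a ring buffer before, here is a minimal sketch of the idea. It is written in Python to match the other snippets on this blog, and it is not the C++ version from the exercise. The class name, the method names and the choice to overwrite the oldest element when the buffer is full are all assumptions made for the illustration.

```python
class RingBuffer:
    """Fixed-capacity FIFO queue stored in a circular array."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._head = 0      # index of the oldest element
        self._size = 0      # number of stored elements

    def __len__(self):
        return self._size

    def push(self, item):
        """Append at the back; overwrite the oldest element when full."""
        tail = (self._head + self._size) % len(self._data)
        self._data[tail] = item
        if self._size == len(self._data):
            # Full: the slot just written held the oldest element.
            self._head = (self._head + 1) % len(self._data)
        else:
            self._size += 1

    def pop(self):
        """Remove and return the oldest element."""
        if self._size == 0:
            raise IndexError("pop from empty ring buffer")
        item = self._data[self._head]
        self._data[self._head] = None
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return item
```

The indices wrap around with the modulo operator, so no elements are ever shifted and both push and pop take constant time.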

Graph-plotting in Python

Having played with other options for plotting graphs, I’ve most recently settled on Python, and in particular Matplotlib and pyplot. Pyplot is a part of Matplotlib, which in turn is Python’s answer to Matlab. Really neat and easy to use. I recently generated about 120 graphs using an app I wrote, which is called the Disruptive Set Analyzer and can be downloaded here. The readme and the code should be self-explanatory.

[Figure: Pyplot/ Matplotlib]

The thresholds are determined empirically. To summarize, this app can be used to generate plots like this for any scenario where you want to compare two systems, where each system produces some kind of numerical score for the same set of instances. The instances themselves are divided into different sets, which are also produced by the app in the form of text files. I am currently planning on making this available through the Web, where the user uploads a CSV file and a pair of systems they are interested in, along with the threshold values, and the backend generates this plot. Next, the user can hover the mouse over the individual plot points and the actual instance [it might be a string or anything] will then be revealed. This might be accompanied by a legend of sorts at the bottom of the plot. However, these are just an added layer of user interface, so they are at a lower priority for me right now.

PS: in the code you see that I’ve identified by comments the part which makes it possible to produce graphs like this and save them in PNG format even if you only have a non-X-based shell access to the machine on which you are running the code and producing the graphs.
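
For anyone who only wants that last trick, a minimal sketch follows. It is not the code from the Disruptive Set Analyzer. It assumes Matplotlib is installed, and the function name, the axis labels and the scatter layout are invented for the example. The key step is selecting the non-interactive Agg backend, which renders to a file and needs no X display.

```python
import matplotlib

matplotlib.use("Agg")  # pick the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt


def save_comparison_plot(scores_a, scores_b, path):
    """Scatter system A's scores against system B's and write a PNG file."""
    fig, ax = plt.subplots()
    ax.scatter(scores_a, scores_b)
    ax.set_xlabel("System A score")
    ax.set_ylabel("System B score")
    fig.savefig(path, format="png")
    plt.close(fig)  # release the figure, which matters when making many plots
    return path
```

Calling save_comparison_plot([0.1, 0.5, 0.9], [0.2, 0.4, 0.8], "comparison.png") over a plain SSH session should write the file without trying to open a window.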

This is not going to be a complete comparison of everything that the two languages, Python and C#, have to offer, but a beginner’s first look at how the two languages look and feel different. Expect more posts on further differences and similarities in the future.

Apart from Python being dynamically typed and interpreted, and C# being statically typed and compiled, here are some subtle points of interest:

1. In C#, you can use {0}, {1} etc. as placeholders for variables in a format string in a Console.WriteLine() call, and the format string is followed by a comma-separated list of the variables used. In Python (2.6 and later, including Python 3), you do that by calling the format method of a string object that has the {0}, {1}, etc. embedded: the string is followed by a dot, the method name format, and parentheses containing all the variables as arguments.

2. In C#, using the directive using <namespace name> is sufficient to include everything from the corresponding namespace into the current file. In Python, even after you use import, you need to qualify the name of the corresponding entity with the name of the namespace, e.g. random.randint(). If you want to avoid doing that, you may either do from module import *, or from module import entity. You may use something like System.Console.WriteLine() in C# without saying using System, but you cannot do something like that in Python.

3. In C# there are no performance differences whichever variant you use. I am not sure what happens under the hood in Python. I will have to get back to you on that one.

4. In Python, once you install a module, you can import it easily from anywhere. In C# on the other hand, some assemblies (read, modules for all practical purposes), are located in different directories and the compiler will need to be told explicitly about that.

5. Being a compiled language (with a compiled bunch of files called an assembly), C# (or any .NET language, for that matter) gives rise to things like the IL (CIL or MSIL), metadata, manifest, and by extension assembly viewers (disassemblers), decompilers, etc… we don’t have these problems in Python. Python does have py2exe which generates a compiled executable for the Windows platform. However, py2exe apparently only keeps the byte-compiled version of the code and not the raw one, so as it appears right now, it might not be that straightforward to get your raw Python source code back if you only have the compiled version. In a C# assembly, you can get back to the actual code by using decompilers, e.g. Reflector.
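
On the Python side, points 1 and 2 can be sketched as follows. The variable names and values are made up for the illustration.

```python
import random                 # qualified access: random.randint(...)
from random import randint    # unqualified access: randint(...)

name, count = "hello", 42

# Positional placeholders, comparable to Console.WriteLine("{0} ... {1}", a, b)
message = "{0} occurs {1} times; again: {0}".format(name, count)
print(message)  # hello occurs 42 times; again: hello

# Both import styles bind the very same function object.
print(random.randint is randint)  # True
```

A placeholder index can be repeated, as {0} is here, which is handy when the same value appears more than once in the output.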

That’s all for now.

PS: WordPress, please allow using C# as a tag, don’t automatically convert C# to C++ !! For now, I am using C-sharp as the tag:-)
