[Adapted from David Patterson's article "The Trouble with Multicore," IEEE Spectrum magazine, July 2010]
For the past several years, the semiconductor industry has been focusing on putting several microprocessors on a single chip. But this has been done with no clear notion of how such devices will, in general, be programmed.
Why, then, has the industry taken such a gamble, hoping that someone, someday, will figure out how to program multiple cores? Well, it turns out there was no alternative.
For decades, the dominant trend in the industry has been to squeeze as many transistors as possible onto a chip. Processing power was pushed further up by the advent of microprocessors that could do several things at once. The steadily shrinking size of transistors and the consistent increase in clock rates worked out quite well for a considerable time.
However, around 2003, the whole process stagnated. Why? Because the operating voltage could not be reduced any further. Adding more transistors therefore caused the heat dissipated per square millimeter of silicon to go up, hitting the so-called power wall. Add more transistors to a standard chip now, and keeping it cool becomes a problem. After all, as David Patterson puts it in the article, nobody wants a laptop that burns their lap.
So heat problems, and the failure to keep increasing the performance of a single processor, have led designers to shift their focus to assembling multiple cores on a chip. Potentially, with several simpler microprocessors working together in parallel, you can get much more computing power.
Welcome multicore microprocessors, or many-core microprocessors.
So the major change in trend has been this: instead of focusing on how to pack more transistors on a chip [using efficient circuit techniques], the focus is now on how to pack more cores on a chip. The core has become the new transistor, so to speak.
So why does all this make programming these chips difficult? And why are we unable to fully utilize the computing power provided by the standard chips shipping from Intel?
For starters, not all problems can be transformed into several smaller problems that are capable of running in parallel, independent of each other. Complications arise if one of these parts cannot begin until another is finished. All the parts also have to be timed so that they finish together; otherwise the finished segments sit idle, waiting for any segments that are still running.
The technical terms for these problems are sequential dependency, load balancing, and synchronization. And it is the job of the programmer to handle them. Hence the challenge.
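A minimal sketch of all three ideas, using a hypothetical parallel sum in Python. The chunking is the load balancing, the `join` calls are the synchronization, and the final addition is a sequential dependency: it cannot start until every partial sum exists. (In CPython the global interpreter lock means these threads won't truly run simultaneously; the point here is the coordination pattern, not the speedup.)

```python
import threading

data = list(range(1_000_000))
n_workers = 4
chunk = len(data) // n_workers          # load balancing: equal-sized chunks
partials = [0] * n_workers

def worker(i):
    start = i * chunk
    end = len(data) if i == n_workers - 1 else start + chunk
    partials[i] = sum(data[start:end])  # each worker's independent share

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # synchronization: wait for the slowest worker to finish

total = sum(partials)  # sequential dependency: needs all partial sums first
```

If one chunk were much larger than the others, the `join` loop would stall on that one slow worker while the rest sat idle, which is exactly the load-balancing problem described above.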
One hope was that the right parallel programming language would make parallel programming straightforward. APL, Id, Linda, Occam, SISAL: languages have come and gone, and some have even made parallel programming easier, but none has made it as fast, efficient, and flexible as traditional sequential programming. Hence none has become very popular either.
At the other end of the spectrum, certain visionaries believed that if they just designed the proper hardware, things would be smooth sailing. That idea hasn't worked out so far either.
Automatic parallelization of programs by software hasn't been much of a success either. While it has been shown to be effective for up to eight cores, its usefulness for any larger number of cores is viewed with skepticism. Research in this area continues.
Having talked about the negative aspects, let us now look at the bright side. One area where parallelism does work is when a group of smart programmers can divide a problem into several parts that do not depend much on each other. ATM transactions, airline ticketing, and Internet search are some examples; essentially, it is easier to parallelize a problem where many users are doing the same thing than one where a single user is doing something complicated.
Another success story is computer graphics, where several unrelated scenes can be generated in parallel. At a much more complicated level, algorithms have been devised to parallelize the computation of single images too. High-end GPUs (graphics processing units) may contain hundreds of processors.
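The reason single images parallelize well is that every pixel can be computed independently. A toy sketch of the idea, with a made-up gradient "shader" and image dimensions chosen purely for illustration: rows are handed to separate workers, mimicking on a small scale what a GPU does with hundreds of cores.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48

def shade(x, y):
    # toy shader: a simple gradient; real shaders likewise run the same
    # small program over every pixel, which is why they parallelize so well
    return (x + y) % 256

def render_row(y):
    # each row depends only on its own coordinates, not on other rows
    return [shade(x, y) for x in range(WIDTH)]

with ThreadPoolExecutor() as pool:
    image = list(pool.map(render_row, range(HEIGHT)))
```

The same structure, scaled up to millions of pixels and hundreds of hardware cores, is the data-parallel model that GPUs are built around.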
Scientific computing and weather prediction are more such examples.
To summarize so far, data-parallel or embarrassingly parallel problems lend themselves to solution by parallelism. Another important point: it usually takes hordes of doctorate-holders and highly skilled programmers to fully utilize the computing power of multicore processors, and desktop-level applications simply lack that kind of intellectual horsepower behind them.
As more and more people work on the problem of parallelization, there is increasing hope. Programmers are mostly focusing on dual- and quad-core processors for now. Besides, while programmers in the past could depend on chip makers to keep handing them faster chips for bigger problems, they can no longer count on single cores getting any faster, so they have to put in the effort to invent the right way to program multicore chips.
Nevertheless, instead of seeking an all-encompassing way to convert every piece of software to run on many parallel processors, the trend, rather naturally, is to develop a few new applications that can take advantage of many-core processors. One such application is speech recognition.
One problem researchers face is that many-core processors are not yet being built, and simulating a 128-core processor in software is also complicated. A way around this is to use field-programmable gate arrays (FPGAs), chips whose circuitry can be reconfigured to mimic a proposed design.
To conclude, there are several possible ways in which the industry and programmers can move now, and it is going to be very interesting to watch how things develop over the next decade.