I am sure that a lot of you live or work in a building which has a battery of elevators to handle intra-building traffic. While I was (ironically) waiting for the elevators, I wondered why do the elevators require people to first request for up/down and then, inside the elevator, request for the proper floor. Wouldn’t it be more efficient if you could request the destination floor right at your current floor?

Imagine that instead of up/down buttons and up/down lights for each elevator on each floor,  you had a panel of buttons arranged vertically with lights for each elevator adjacent to it. Something like
[1] r g b y
[2] r g b y

where [1], [2], etc. are buttons and r g b y… are colored lights for elevators. When you want to go to a given floor, you directly press the floor’s button on the panel and the system immediately schedules an elevator to pick you up (you see that elevator’s colored light light-up on the panel). Then you go and walk to your elevator… when it comes, enter it, and then leave at your destination. The panel of buttons inside the elevator is optional. However, inside the elevator, it should give you an indication of which floors it is going to stop.

I think the biggest gain from using this approach is that you can schedule elevators efficiently. This is because the system has very early knowledge of destination floors. Thus, if two people want to go to the same floor, the system brings up only one elevator. At the same time, if two people want to go in the same direction but different floors, the system may bring up two elevators for them. This, this allows distribution of “floor load” equally among all the elevators instead of depending on the people to avoid flash crowds to a given elevator.

Another gain is that people don’t have to think about which floor they are currently on and decide whether they need to go up or down. All they need to think about is the destination floor (which they already do). This also reduces crowd movement inside the elevator near the panel where space is much smaller than a floor’s lobby.

The disadvantage of this approach is that you need more number of buttons, lights and wires installed on every floor. However, the cost for that shouldn’t be very high compared to the elevator system itself.

P.S. It probably would be possible to do an even better elevator schedule if the feedback from the system about which elevator is going to which floor is allowed to come late (as late as when an elevator actually arrives). However, first that doesn’t seem good from a usability standpoint (the user has to wait for a flash or a bell to indicate that their elevator has come). Secondly, that would encourage crowding around the panel in the floor lobby. So, I would let go of this schedule optimization for usability sake.

Posted in Computing, Life | 3 Comments

Pushing the limits in Internet technology

I went through an interesting talk by Geoff Huston at the linux.conf.au conference (video link). The tone of the talk reminds of another paper I read some time back by Bob Briscoe “Flow Rate Fairness: Dismantling a Religion“. While Bob’s argument was about how the fundamental protocols on the internet (TCP) lack a market oriented view of fairness, Geoff’s argument is about how the fundamental protocols on the internet (IPv4) lacks a market oriented view of allocating IP addresses. More importantly, this is a lesson for me again in the importance of business logistics in adoption of any kind of infrastructure technology. While this interplay may not be so important for end user technology (such as a consumer device like smartphones or laptops), it is extremely important for technology which forms the basis of businesses of a large number of independent firms. Geoff argued that the primary reason for lack of IPv6 adoption was a lack of understanding in market incentives. The future for both address allocation and flow rate fairness as we march into the future remains unknown. I look forward to an interesting interplay between technology, openness and market forces in the Internet. We live in interesting times!

related posts: Network Neutrality and flow rate fairness

Posted in Economics, Internet | Leave a comment

(x) amount of RAM should be enough for anybody

Just came across this on coding horror: http://www.codinghorror.com/blog/2011/01/24-gigabytes-of-memory-ought-to-be-enough-for-anybody.html


RAM is as good a resource in a computer as CPU time itself. While CPU scheduling can be done in the order of milliseconds (thread context save/restore + CPU cache re-population) which makes applications appear to run simultaneously, the same cannot be said for RAM scheduling. RAM scheduling requires paging out memory to disk and give the newly freed RAM resources to the demanding application. Disk I/O is slow and a RAM context switch cannot be done in milliseconds. Thus, for a responsive computer, while it is okay to have a CPU good enough to keep UI features for a single application responsive, we need to make sure that the RAM working sets for all applications need to be resident in RAM for a well functioning (responsive) computer.

From a developer’s point of view, given that he can measure and control exactly how much time does it take for his single application to respond on a CPU, dealing with CPU scheduling is generally fine for him. He assumes that his app is the only app running on the user’s system and in most cases, his assumptions about code path lengths for app responsive-ness are reasonably accurate. Memory is a whole different beast. The developer has no idea how much of a working set can he assume a computer to have for its app. The worst case figure is essentially zero, so the developers go by the best case figure of having the entire system RAM to their application. If for some reason, the sum of working sets of all the applications running is greater than RAM, the computer fundamentally cannot effectively schedule those applications leading to a phenomenon commonly referred to as “thrashing” where the computer spends most of its time context switching between RAM working sets without getting any actual work done.

The solution we have all been comfortable with so far has been to keep inflating the amount of RAM in our systems such that the developer’s expectation of the amount of RAM needed trails behind the actual RAM present in our system. We could aim to make it closer to the CPU way of doing things where its possible for the developer to track the working set requirements of their applications and make it possible for them to gracefully degrade application experience with smaller amounts of RAM. A simple malloc() + demand paging in the background doesn’t cut it then. We would need a more elaborate memory programming model where the system would let the application know when it is taking RAM away from it.

Posted in Computing | 1 Comment

Peer-to-peer Google Wave – cloudless style

If you read my Google Wave review, you may also have noticed my comment about a non-cloud way of doing Google Wave. There are quite a number of people who are getting concerned about more and more personal information being uploaded to the Internet and the privacy nightmare it is turning into. Google wave itself seems to be an interesting concept for next generation collaboration on the Internet. It would be unwise not to look at it from a privacy centric view point.

Communication over the Internet is, at a very low level, done via message passing. The network essentially is a collection of computers (clients and servers) and messages are passed between them for any kind of communication. However, with the rise of forums, wikis and web 2.0 sites (and cloud computing in general) the communication pattern at a higher level is no more simple message passing but collaborative editing of documents which live on some server. Suddenly, the actual nodes in the network and the messages between them is no more the conduit of information. The living evolving document is the new, higher level conduit of information. For all practical purposes the collaboratively edited document is the “network channel” for communication between the participating clients.

Google Wave takes this essential idea much further. A wave document is the central focal point for communication between users, robots, external websites and even applications. It would not be a stretch of imagination to give an application a pointer to a wave document and make it automatically collaborate with other participants and applications without knowing about their IP addresses or connectivity status.

The unfortunate part of the story is that wave documents still reside in the cloud and hence will suffer from the same privacy concerns we have been dealing with when using wikis and forums. Peer-to-peer communication would be better for privacy but p2p architectures are still very much message passing oriented. Can we marry Google Wave and peer-to-peer?

P2P waves

It is not very hard to imagine an application which can keep peer-to-peer connections with friends (a friend-to-friend network) and collaborate over wave like documents. The documents would be replicated, similar to the way they are replicated between federated domains in Google Wave. In fact, the easiest way to do this may be to use some VPN software to create a private network between friends and run the open-source Google Wave servers on each node. Some of such VPN software is also not very hard to configure [eg. Remobo, Leaf, Wippien]. However, each user basically maintains his or her own wave domain and the Google Wave servers are probably not designed with such a use case in mind.

Even if we leave behind the re-usability of Google’s reference implementation, some practical architectural problems will remain. The Google wave servers are probably not designed for frequent disconnection in mind. The wave document should be accessible from some authoritative server at all points in time. Also, given the federated authentication, parts of the wave document may live entirely within the user domains first given access to. Adding more users to such parts of the document would require those domains to be online. This always available assumption might hold for cloud storage but doesn’t hold for peer-to-peer architectures… at least the kind discussed above.

P2P waves with redundant but encrypted content

I was thinking about this problem when I hit upon an idea. The basic problem is that if I wanted to attach a private message to a friend of mine on a peer-to-peer wave document then I cannot send the message if my friend is offline. However, if I encrypt my message using a shared key then even though the wave document is shared across the p2p network, only my trusted friend(s) can read the private message. Thus, comes the key insight:

Redundantly propagating encrypted content among peers can help with the “always available” problem with peer-to-peer networks.

However, if we are willing to do this for private messages, we can also extend the same for the whole wave document! Thus, instead of maintaining connectivity with at least a few of the participants in the wave document and sending the wave document in plain text to them, we can now maintain connectivity with unknown peers in a peer-to-peer network and encrypt the wave document with keys shared with just the users the document is intended for. Such an architecture would ensure good connectivity, fast downloads and very importantly uninterrupted availability which is one of the main strengths of storing wave documents in the cloud.

Curiously, a peer-to-peer network of the above nature already exists. Its called Wuala and it is a peer-to-peer file backup and sharing service with attention to privacy, selective sharing of files and file availability. A very interesting tech talk about the internals of the technology behind Wuala is here. Though the Wuala guys have currently focused on just getting the peer-to-peer storage right (which is a tough computer science problem in itself), it is not very hard to imagine having a wave document hosting over it. The essential idea is to take read-only file sharing and make it read-write with some level of revision history and conflict management.

However, we definitely lose on realtime communication with a Wuala like approach. Peer-to-peer storage and lookup will definitely be more latency than a direct connection to a wave server. On the other hand, that may not be a small price to pay for privacy.

Posted in Computing, Internet, Social | 16 Comments

Perspective on Google Chrome OS, iPhone / Palm Pre and Microsoft Windows

Microsoft being in the software business pretty much relies on the fact that they deliver compelling platform value to the hardware vendors to whom they sell their client OS. What strikes me most about the iPhone / Palm Pre development is the fact that they subsidize their hardware to such an extent that it doesn’t make any sense for hardware vendors to sell their bare hardware with Microsoft Software (priced separately) on top of it. You cannot get hardware of the class of iPhone / Palm Pre at the aggressive price point of $200 without already being locked into a platform.

Microsoft cannot just use their old “sell software platform” over commodity hardware to get to the feature list of iPhone / Palm Pre. They will have to partner with somebody or make their own hardware. This already happened with game consoles.

Google Chrome OS is possibly a similar shot at consumer notebook / netbook platform business. Here is a hypothetical scenario: Somebody makes a really slick 3G/4G/WiFi capable netbook. However, instead of them scrambling for a free Linux distribution, Google pays them to install their OS on their netbook and subsidize its price for consumers. It is not such a crazy idea given that Google already pays Mozilla to make Google search the default and subsidize Firefox (making it free, actually) and many craplet providers pay notebook sellers to put craplets on your notebook. The netbook OS itself is locked in to Google technologies and Google webapps. Google makes money off their web platform. Suddenly, there is a price difference (a huge one actually… given Microsoft’s Windows OEM pricing is high compared to netbook prices) and Google has an advantage in consumer netbook space.

However, its not that rosy for Google yet. Though I talked about “Google technologies” and “Google platform”, there is none right now which makes as much money as search. It would be better for Google, if they come up with a bunch of popular webapps that make ad money for them… something which people would use most of the time. Google would then be able to play its huge and successful ad network to its advantage.

What can Microsoft do? Given that its significantly behind Google in advertising revenue, it should play on its other strengths. It is investing heavily in Azure and providing ways for enterprises to transition and operate off the cloud. However, this article is about consumer devices. If Microsoft focuses on integrating Windows with Azure specific services then they will have a way to subsidize notebook / netbooks similar to the above scenario and make money from Azure services. However, its situation is also similar to Google’s… they don’t have a highly successful (money generating) “web platform” yet.

What kind of web platform am I talking about? Some basic services are essential:

  1. Single sign-on across web applications
  2. In-built Internet wide notification mechanism (nice interface over email / IM).
  3. Collaboration platform (not very different from Live Mesh, though something like Google Wave is probably better)
  4. Some basic apps like blog publishing, photo sharing etc.

Microsoft already has such services under the Live branding. However, the critical step really is to release Azure frameworks to enable other developers to tie in to Live web applications. It may not be a bad idea to showcase such apps in an App Store much like the iPhone App Store and make money off that.

Google has an edge that it starts to make money immediately over its search the moment Chrome OS gets into consumer hands. Also, it has Google App Engine which competes with Azure. In a sense, Google has two sources for making money, ad revenue and Google App Engine, and it is strong in ads. Microsoft is not so strong in ads and like Google, is just starting with Azure.

It is also interesting to note that Facebook already has a way for applications to be rolled in a highly popular social network. However, its inability to make money through advertising and keeping the social graph closed inside Facebook makes me feel that its necessary to look at providing Cloud infrastructure as a means for making money. Instead of keeping a closed social network, may be its a better idea to open up the network and compete for webapp developers who would target your cloud platform. Of course, all of this is great for the consumer who gets more and more free applications and subsidized hardware. However, this also means that traditional desktops like the Windows desktop, Mac OS X, KDE and Gnome don’t cut it for the netbooks… and the difference between netbooks and notebooks widen as the developments in the two different ecosystems deviate from each other.

Posted in Computing, Internet | 3 Comments

Google Wave is impressive

So, if you haven’t noticed yet, Google Wave happened recently. If you have like an hour to kill for some really impressive tech demo, you might want to take a look at the video on the link.

Its a fairly ambitious project, one that merges email and chat into one seamless application. I call it ambitious because it tries to do the following simultaneously:

  • Merge real-time (chat) and asynchronous (mail) like communication together. They had that going on in a somewhat crude form (compared to google wave) with gmail’s chat.
  • Instead of independent messages being sent either way it more about editing a collaborative document. The concept is fairly close to a wiki except that its also real time.
  • It has a decentralized architecture where pieces (also known as wavelets) of the collaborative document can originate from multiple domains and can be edited. The various pieces or wavelets can have
    • Domain scope: some parts of the collaborative document may remain private in a set of domains.
    • Revision history: every edit is stored so that there is a revision history and accountability of every change made to the collaborative document.
  • The document model itself is pretty general and allows many views (and consequently many google wave “applications”) to be built with the existing framework of de-centralized real-time collaboration.

So its basically a decentralized, real-time wiki with a flexible document model ready for making applications and guess what, email are chat are the most basic things that can be built starting from that.

Comparison with file syncing services

You may also compare this to Live Mesh where the basic idea really was some collaboratively edited cloud storage with edit notifications available as an activity feed for applications to tap in. In fact, when I played with various file syncing services, such as Live Mesh and dropbox, the limitation which stood out most was the inability of applications to manage conflicting changes. Changing the underlying document model from essentially a large chunk of bytes to hierarchically organized, change history retaining document model is just what applications need. Of course the applications need to be redesigned (separation of the actual document and the changes made to it) but that may be a necessary step to achieve what Live Mesh set out to do in the first place. Of course, I am not sure if Live Mesh was supposed to be either decentralized or real-time.

Note that even though the implementation of google wave demoed at Google I/O was an HTML 5 application, there is nothing stopping a normal desktop application like Microsoft Office to adopt a similar document model.

Comparison with Social Networks

Does existence of google wave necessarily obsolete social networking sites like facebook (i.e. sites that provide the social network infrastructure and not the specialized social networking applications such as dopplr, last.fm, digg)? I think they still have a place because social network communication over facebook is very public and that public meme is a value that facebook provides in addition to a collaborative infrastructure. However, I am pretty sure there will be somebody out there who will take google wave’s reference implementation and just add enough “public” mantra to it to make a competitive social networking platform.

This is of course great news for social networking applications such as last.fm, dopplr and digg who can now use google wave for their social networking infrastructure.

Will it be robust and glitch-free?

Managing concurrent real-time editing of a document originating in pieces from several domains is of course not going to be glitch-free :). However, there is quite some stuff to be learnt from decentralized source code management (tools like git and mercurial). These guys have been dealing with decentralized conflict management and revision history since quite some time now. I guess the following additions to the wave implementation will make it more robust than just keeping content (wavelets) in authoritative domains.

  • A wave domain should cache the entire wave aggressively (at least the parts it has access to).
  • The document model should be built such that it is incrementally consistent i.e. if a set of changes from a domain or a user is removed then the document isn’t corrupted.
  • Each incremental edit should be checksummed for integrity against the entire history of edits previous to this edit. This ensures that edits originating in different domains agree on consistency of their edits. This also requires that we revoke edits that happen over an older revision (the whole “rebase-ing” concept in git applies here… and should be possible to do automatically if changes are synced sufficiently in real-time).

Wave without the cloud?

I am sure we will all be happy to see Google or somebody else provide free and nearly unlimited storage for wave documents. Like gmail, it should be easy to monetize for the cloud expenses using ads. However, I was thinking, if it was possible to build it using p2p techniques alone. Given git like consistency and aggressive caching, it may not be a bad idea to let the client be authoritative on its wave documents but be offline too. It also achieves storage redundancy automatically so one may be able to just start a fresh client and sync up its waves from its peers if it has the appropriate signing key. Git itself may not be suitable for building this as git doesn’t provide a standard incremental consistency model (I am thinking basic DTD document type check). However, it should be fairly easy to adapt the core git storage with such a type check, handling wavelets from different authentication domains and wrap it in a real-time network layer to make such a client.

The bigger problem really is maintaining connections to a large number of peers in the network. However, NAT traversal schemes should help.

Posted in Computing, Internet | 2 Comments

Many-core task parallelism: low overhead task switching

With the upcoming multi/many-core processors, there is an industry wide push towards programming for task-parallelism. Without it, programs aren’t getting any faster. There are high level programming constructs such as OpenMP and ConcRT which can be used at the programming language level to exploit task parallelism. However, one often forgets that every task runs within a context (usually a thread context) and switching between contexts (for synchronization or fair scheduling) is costly. It is generally recommended that one should try to expose as much parallelism in their programs as possible and let a runtime schedule them on the cores that are available on the system. Of course, with that comes context switching cost till there comes a time where dividing up work into finer parallel chunks leads to poor performance due to a large amount of context switch overhead.

At the hardware level, task parallelism allows the processor to schedule instructions from multiple streams together on its functional units. It can do so because the individual tasks are supposed to be independent instruction streams. However, with it also lies the caveat, these streams often contain a large amount of state which the hardware manages independently and which the system software has to save/restore on task switch. While very fine grained task parallelism seems like an elegant goal to achieve with software in general, after a level of parallelism granularity, task switch overheads can spoil the entire show. [A task switch in (system) software involves saving and restoring the entire register state including vector registers which can be a large amount of task state].

Fortunately, there has been some work (in computer architecture) to deal with this problem. One of the nicest examples I found was the Tera Computer system. Its architecture is detailed in this paper [PDF]. There are two notable features of the Tera Computer (later the Cray MTA) which enabled low overhead task switching:

  1. A large number of hardware thread contexts. The Tera Computer processors had 21 stage pipelines. However, a single processor could store the context of 128 threads in total. Thus, it was very effective at hiding instruction and memory latencies.
  2. The Tera Computer supported hardware based thread synchronization primitives. You could make a (hardware) thread wait on the state of a memory word which you could update from another thread, thus “waking” up this thread. There was no need for system software to do any context save/restore as the thread context was always present in hardware.

However, the above technique seems to have a limitation. If my program has more than 128 hardware contexts then I need to resort to context save/restore. If my program has fewer than 128 hardware contexts (but enough to keep the 21 stage pipeline busy) I am wasting hardware resources (register storage).

I thought about this for a while and it occurred to me that register based architectures are perhaps not suitable for MIMD (task-parallel) processors. I know this is a bold claim but I feel that may be resorting to a memory operand based architectures (where an instruction’s operands are memory addresses) may eliminate context switch overhead all together. Note that the working set of a hardware thread is probably cached on the processor L1 cache and is available as fast as register storage itself*. However, having 3 64bit memory addresses in every instruction doesn’t seem like a good idea. It increases program size and takes up valuable instruction memory bandwidth. What I propose instead is to do something similar to the register stack engine in Itanium or register windows in SPARC. However, instead of using bona-fide registers, the instruction operands are offsets from a per-thread stack pointer. Using offsets allows register like compressed instruction encoding while at the same time not deviating from the memory operand instruction model. The only context that the software needs to save on a task switch is the stack pointer which is okay as its just a single register. Thus task switching simply involves saving/restoring the stack pointer and resuming execution by popping the return address off the stack. With this CPU architecture, its possible to do lightweight context switching without any assistance from hardware (like the Tera Computer).

But who am I kidding… I am going to see only x64 and ARM for the rest of my life :/.

Update: When I was discussing this with a colleague, an interesting question came up: How do we integrate thread wake-up from software and hardware events with the above scheme? At first I thought just having a trap for the hardware events would be okay… however hardware events may require a much lower service latency and special processing than software events to get good performance (eg. wake-up after L1 cache miss). So I finally thought of the following: the architecture keeps separate lists of ready-to-run contexts (stack pointers) for hardware and software events and service the hardware ones preferentially. While notifications for wake-up of software events could be done in software, the hardware would need to take care of hardware wake-up events and wake-up (and resume the halted instruction) in the corresponding threads. Thus, it is also important to keep this list in pinned L1 cache. There also needs to be a way for the hardware to let the software know when a previously hardware interrupted thread is ready-to-run… a trap could work here as long as we guarantee that it won’t cause any further trap invocations. Another simpler and more performant mechanism could be to just pre-empt the current running thread with the hardware woken one and maintain a stack of contexts to switch back to the pre-empted ones once the current one arrives at a hardware / software synchronization point.

One may also think of using low overhead task parallelism to cleanly emulate cache hierarchies, cache coherency and many other kinds of system specific processor facilities (reminds me of the PALcode in DEC Alpha).

* The fact that the Itanium architects went with a backing store for their register stack (and not direct memory operands) makes me a little skeptical that my idea can be implemented in a performant way on real hardware. However, given a large number of switchable contexts, an occasional event of high latency to L1 shouldn’t be a real problem.

Posted in Computing | 1 Comment