Sharvil Nanavati

Windows 7 Internals

The folks over at Channel9 recently interviewed Mark Russinovich, a Technical Fellow at Microsoft. Mark shares some nitty-gritty details of the improvements they've made in the Windows 7 kernel - particularly with respect to performance. The interview is a bit slow at times but there are enough interesting tidbits to warrant a watch.

Here are some of the improvements that I found particularly interesting:

  • Scalability
    • Finer grained locks on dispatcher queue
    • Finer grained locks on PFN database
    • Support for up to 256 cores
  • Power Consumption/Battery Life
    • Core parking: putting cores into deeper sleep states by migrating processes away to more active cores
    • Socket parking: putting an entire socket into a low power state by parking cores on the same socket (this is really cool!)
    • Timer coalescing API
  • Virtualization
    • Integrated support for creating/mounting VHDs
    • Boot directly from VHD!

The timer coalescing API needs an explanation. Suppose you have two timers on your system firing every 5ms, except the first timer was set at t=0ms and the second was set at t=1ms. Then your timer interrupts have to fire at 5ms, 6ms, 10ms, 11ms, ... to service those timers. Since both timers have a period of 5ms, it would be more efficient to reuse the interrupt at t=5ms by advancing the first servicing of the second timer. This way, the CPU has more time between interrupts to go into a deeper sleep state or to execute other code.

In short, the timer coalescing API allows timers to share the same interrupt by adjusting the first timer event of a new timer.
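To make the example concrete, here is a minimal sketch in Python that counts the interrupts with and without coalescing. It is a toy model of the description above, not the Windows implementation: the function names and the "same period" alignment rule are my own assumptions for illustration.

```python
def interrupt_times(timers, horizon_ms):
    """Return the sorted times (in ms) at which at least one timer fires.

    Each timer is a (first_fire_ms, period_ms) pair.
    """
    times = set()
    for first_fire, period in timers:
        times.update(range(first_fire, horizon_ms + 1, period))
    return sorted(times)


def coalesce(existing, new_timer):
    """Advance a new timer's first firing onto an existing timer's schedule.

    Toy rule (an assumption, not the Windows algorithm): if an existing timer
    has the same period, move the new timer's first firing back to the most
    recent tick of that timer.
    """
    first_fire, period = new_timer
    for other_first, other_period in existing:
        if other_period == period and first_fire >= other_first:
            offset = (first_fire - other_first) % period
            return (first_fire - offset, period)
    return new_timer


first = (5, 5)   # set at t=0ms, fires at 5, 10, 15, ...
second = (6, 5)  # set at t=1ms, fires at 6, 11, 16, ...

uncoalesced = interrupt_times([first, second], 20)
coalesced = interrupt_times([first, coalesce([first], second)], 20)

print(uncoalesced)  # [5, 6, 10, 11, 15, 16, 20]
print(coalesced)    # [5, 10, 15, 20]
```

Over the first 20ms, the CPU services four interrupts instead of seven.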

Filed under  //   channel9   kernel   microsoft   windows  

Exhibit A

[Photo: lunch at Google Waterloo]

WTF? Yes, this was supposed to be lunch at Google Waterloo yesterday. The black goo is supposedly melted (yes, melted) seaweed. Reminds me of another food story.

Policy-Based Component Design

It's difficult to predict the exact needs of a client when building reusable software components. Sometimes even your own needs change drastically over the course of a project. One useful technique to deal with the variability is policy-based component design.

The idea is to build a component that implements a core behaviour and applies a client-specified policy to make the key decisions. For example, if I built a hash table component, the client should be able to specify when and how to resize the table. Some clients may double the size of their hash table when the load factor reaches 0.8 whereas others may increase the table size to some fixed value based on expected data growth once they have a good estimate of it. It all depends on the nature of the environment in which the component is used. And only the client knows that.

There are many ways to implement a policy-based design. In Java, I could use the strategy pattern. In C, I would use function pointers and a context (e.g. void pointer). Andrei Alexandrescu used C++ templates to achieve this goal in Modern C++ Design. All of those approaches are perfectly valid and result in more reusable components.
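As a rough illustration of the hash table example, here is a minimal strategy-style sketch in Python. The class and method names (`DoubleAtLoadFactor`, `GrowToFixedSize`, `new_capacity`) are hypothetical, and the table itself is deliberately bare-bones. The point is only that the table never decides when or how to resize; it asks the policy.

```python
class DoubleAtLoadFactor:
    """Policy: double the capacity once the load factor reaches a threshold."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def new_capacity(self, size, capacity):
        if size / capacity >= self.threshold:
            return capacity * 2
        return capacity


class GrowToFixedSize:
    """Policy: jump once to a fixed capacity chosen from expected data growth."""

    def __init__(self, target_capacity, threshold=0.8):
        self.target_capacity = target_capacity
        self.threshold = threshold

    def new_capacity(self, size, capacity):
        if capacity < self.target_capacity and size / capacity >= self.threshold:
            return self.target_capacity
        return capacity


class HashTable:
    """Separate-chaining hash table; the resize decision belongs to the policy."""

    def __init__(self, policy, capacity=8):
        self.policy = policy
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.size += 1
        # Core behaviour ends here; the policy makes the key decision.
        wanted = self.policy.new_capacity(self.size, len(self.buckets))
        if wanted != len(self.buckets):
            self._rehash(wanted)

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self, capacity):
        items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(capacity)]
        for k, v in items:
            self.buckets[hash(k) % capacity].append((k, v))


doubling = HashTable(DoubleAtLoadFactor(0.8))
planned = HashTable(GrowToFixedSize(target_capacity=1024))
for n in range(10):
    doubling.put(n, n * n)
    planned.put(n, n * n)

print(len(doubling.buckets))  # 16
print(len(planned.buckets))   # 1024
```

Both tables share the same core code; only the policy object handed in by the client differs.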

A final note: policy-based design applies to components at any granularity, not just at the class/function level. PAM, which I mentioned in a previous post, is a fine example of a coarse-grained policy-based component. Or consider a hypervisor that manages physical resource allocation based on a user's needs (e.g. allocate 512MB of RAM to virtual machine 1).

Image from Amazon

CUSEC 2009

I just attended the CUSEC 2009 conference in Montreal last week(end) and it was incredible. Between the speakers' stories, the crazy evening parties, and meeting so many awesome people, there was little to complain about.

Speakers included Leah Culver of pownce.com... um, fame?, Dan Ingalls (best known for his work on Smalltalk), and Richard Stallman who shouldn't need much of an introduction. Each speaker added a different flavour to the conference and despite their diverse backgrounds, they all showed a genuine passion for their work. It's impossible to be in such company and not get fired up.

The only downside was the insane focus on web technologies, but I suppose I'll just have to find a systems conference in the area...

Picture by caribb

Filed under  //   cusec   montreal  

Amazon EC2: Persistent Storage

Amazon has just started a private beta program for a new persistent storage API in EC2. According to their documentation, they provide an API to create and manage volumes between 1GB and 1TB in size that behave like unformatted disks. Each volume is persistent and independent of EC2 instances, and a single EC2 instance can mount multiple volumes. The disks are supposed to be low-latency and high-throughput, and the API includes calls to store snapshots on S3.

A lack of persistent storage has been the biggest challenge for developers as EC2 (in my experience) has rather high failure rates. With this persistent storage API (scheduled for public release later this year), Amazon has just made EC2 a dead-easy buy-in.

Filed under  //   amazon   ec2   s3   web  

Amazon EC2: The Potential

Amazon's Elastic Compute Cloud (EC2) is easily their most powerful web service offering. With EC2, you get flexible, on-demand computing resources: by launching an instance you get full access to a brand-new machine and its resources. Each CPU-hour costs $0.10, which equates to less than $80 per month if you run an instance 24/7! What's more, you can launch as many instances as you'd like so you can have your own network of machines hosted by Amazon. The clincher? All data transferred between EC2 instances and S3 is free!
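The monthly figure is easy to check. Here is a small sketch of the arithmetic in Python, using the $0.10 hourly rate quoted above; the function name is my own, and amounts are kept in cents to avoid floating-point rounding.

```python
HOURLY_RATE_CENTS = 10  # $0.10 per instance-hour, as quoted above


def running_cost_cents(hours, instances=1, rate_cents=HOURLY_RATE_CENTS):
    """Cost in cents of running `instances` machines for `hours` hours each."""
    return rate_cents * hours * instances


# One instance running 24/7 for a 31-day month.
month = running_cost_cents(24 * 31)
print(f"${month / 100:.2f}")  # $74.40
```

Even in the longest months, a single always-on instance stays under the $80 mark.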

The default configuration has the following specs:

  • CPU: 32-bit, 1.0-1.2 GHz Opteron/Xeon equivalent
  • RAM: 1.7 GB
  • Disk: 160 GB
  • NIC: 100 Mbit/s

They have additional configurations if you need more resources on a single machine. For details, see the EC2 site. To make all of this work, EC2 allocates a virtual machine running on the Xen hypervisor instead of a physical machine for every instance launched.

Amazon designed EC2 primarily to perform many computationally expensive operations - something like batch video encoding or image recognition. Instead of making large hardware investments to perform these (potentially one-shot) tasks, you run the tasks in parallel on a few (hundred?) EC2 instances. Once the tasks are complete, just shut down the instances and your billing stops there. While Amazon's vision for EC2 is pretty sweet, the reality is that there's so much more potential there.

EC2 is the next-generation data center.

Instead of doing capacity planning as I would with a traditional data center, with EC2 I could monitor the load on my server and programmatically launch parallel instances once it reaches a threshold utilization. When the utilization drops again, I can terminate the extra instances and go back to a fairly quiescent state. With free traffic between EC2 and S3, I can churn through collected data as many times as I need to, just as in Amazon's vision. Amazon could even issue hardware updates (e.g. more RAM) to running instances without rebooting! With the inexpensive per-hour prices, any small business can afford to keep an active standby. The flexibility offered by programmatically managing machines running in a virtualized data center is tremendous. Coupled with Amazon's pricing model, this sort of service is poised to take some serious market share away from the traditional, physical data centers.
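The threshold-based scaling I have in mind could look something like the sketch below. Everything in it is an assumption for illustration: the thresholds, the instance limits, and the `launch`/`terminate` hooks, which stand in for whatever calls actually start and stop instances. It does not use any real EC2 API.

```python
def desired_instances(current, utilization, scale_up_at=0.75, scale_down_at=0.25,
                      minimum=1, maximum=20):
    """Return how many instances should run, given average utilization in [0, 1]."""
    if utilization >= scale_up_at:
        target = current + 1
    elif utilization <= scale_down_at:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))


def reconcile(current, utilization, launch, terminate):
    """Call the supplied hooks to move from `current` to the desired count."""
    target = desired_instances(current, utilization)
    for _ in range(target - current):
        launch()
    for _ in range(current - target):
        terminate()
    return target


# Stand-in hooks that just record what would have happened.
events = []
count = 1
for load in [0.90, 0.80, 0.50, 0.10, 0.10, 0.10]:
    count = reconcile(count, load,
                      launch=lambda: events.append("launch"),
                      terminate=lambda: events.append("terminate"))

print(count)   # 1
print(events)  # ['launch', 'launch', 'terminate', 'terminate']
```

Stepping by one instance per check keeps the sketch simple; a real deployment would also need to account for instance start-up time and avoid flapping around the thresholds.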

While EC2 has the potential to be all of this and probably much more, it's not currently ready to displace traditional data centers. In a subsequent post, I'll discuss some of the issues preventing EC2 from realizing this dream.

Filed under  //   amazon   ec2  
Posted March 2, 2008

OCamlPAM 1.0 Released!

PAM is a slick policy-based authentication mechanism. It abstracts away the method of authentication from applications and makes it possible to change the authentication method for a deployed application/service while running instead of making that decision at compile-time. I've come to love PAM because it makes single sign-on a possibility and lets me focus on my application logic rather than the details of, say, LDAP authentication.

Since I've been playing around with Objective Caml lately and I needed to do some authentication, I wrote an OCaml wrapper for PAM. Take a look, give it a go, and authenticate away!

Filed under  //   authentication   ocaml   programming  

Power Profiling: Saving Energy in Software

I came across a pretty nifty profiling tool a little while ago called PowerTOP. It gives you a live view of the power consumed by applications running on your Linux machine in an interface that resembles top. As a system-level power profiler, it gives you an idea of which applications or drivers are waking up the CPU or preventing it from entering a sleep state.

I think this kind of tool is awesome for both developers and end-users. As a user, it tells me which applications are draining my laptop's battery so I might close those applications if I'm not using them or I might find an alternative altogether. From what I understand, some people have gained significantly higher use-times for their laptops by just killing innocuous-looking apps that don't let the processor enter a deeper sleep state. As a developer, this is good news because it will give me a quantitative measure of my application's power consumption, giving me a means to determine the power-efficiency of different designs. This tool has also managed to expose bugs in a few applications!

Check out PowerTOP, fix your applications, and keep this planet green (and my laptop running longer and cooler)!

Filed under  //   energy   programming  

A Closer Look: Amazon S3

One of the most mature Amazon web services, Amazon Simple Storage Service (S3) provides a virtually unlimited data storage service. That's right: you can upload as much data as you'd like and it will be held on their machines with all the network capacity you could ever want and with redundancy built in. Hard drive failures are easily the primary cause of server downtime, and Amazon has taken it upon themselves to manage all the devices and the failures that go along with them. As the name implies, the service is designed to provide simple access, so you can't do funky things like mount the virtual filesystem directly.

I've been using S3 for over a year and I haven't had any reliability issues with it. Others have had brief outages but they were mostly when the service was first introduced. I'm quite happy with S3 but there's one missing feature that keeps it from being the ultimate simple storage service: range-PUT.

Suppose I've got a file on S3 and I want to update a small part of it. Without range-PUT, I have to transfer the entire file again using the HTTP PUT method to store it on the remote host. With range-PUT, I could use the Content-Range header to specify just the range of bytes that have changed within the file and transfer only that portion. This feature would save a lot of bandwidth (and, consequently, money) for files that are often partially modified.

Of course, supporting Content-Range opens up a can of worms. What happens if the file doesn't exist and the start of my range isn't offset 0? What if the file does exist but the start offset is beyond the end of file (i.e. not a simple append)? I can think of two solutions that seem reasonable: return an error, or create the file if it doesn't exist and zero-pad the holes. The former would be easier to implement while the latter would produce a behaviour like Linux sparse files.
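Here is a minimal sketch of how those two behaviours might look on the storage side. S3 offers no such operation; the `range_put` function, the `zero_pad` flag, and the dict standing in for the object store are all hypothetical.

```python
class RangeError(Exception):
    """Raised when a range-PUT would leave a hole and holes are not allowed."""


def range_put(store, key, start, data, zero_pad=False):
    """Write `data` at byte offset `start` of the object stored under `key`.

    `store` is a plain dict mapping keys to bytes, standing in for the service.
    A missing object is treated as empty, so a write at offset 0 creates it.
    Returns the new length of the object.
    """
    existing = store.get(key, b"")
    if start > len(existing):
        if not zero_pad:
            # Option 1: refuse any write that starts past the end of the object.
            raise RangeError(
                f"offset {start} is past the end of the object ({len(existing)} bytes)")
        # Option 2: fill the hole with zeros, much like a sparse file reads back.
        existing += b"\x00" * (start - len(existing))
    store[key] = existing[:start] + data + existing[start + len(data):]
    return len(store[key])


store = {"song.mp3": b"TAGold-AUDIO"}

range_put(store, "song.mp3", 3, b"new")             # partial update in place
range_put(store, "log", 0, b"abc")                  # create at offset 0
range_put(store, "log", 3, b"def")                  # simple append
range_put(store, "sparse", 4, b"xy", zero_pad=True) # hole gets zero-padded

print(store["song.mp3"])  # b'TAGnew-AUDIO'
print(store["log"])       # b'abcdef'
print(store["sparse"])    # b'\x00\x00\x00\x00xy'
```

With `zero_pad` left at its default, the last call would raise `RangeError` instead, which is the simpler of the two options to implement.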

There are two major application classes that range-PUT would be suited to. The first would be the class of applications where we always append to the end-of-file. Log files would fit into this category but, more importantly, we could resume broken transfers. When uploading large files (S3 supports file sizes of up to 5GB), I've found that my connections often get dropped so if I could just append to an existing file, I could write an upload tool that would auto-resume. The second class of applications would be the ones that only update part of a file. In most cases, I'd imagine this kind of update would take place to change some file metadata. For example, if I modify the metadata for my MP3 file, I'd rather just upload the few changed bytes instead of uploading the whole MP3 again. The music is the same, it's just the metadata that has changed. This problem is even worse when dealing with video files.

S3 is a fantastic storage service. It's reliable, it's cheap, and it takes away the hassle of managing your own hardware or creating a highly-available, redundant persistent store. If S3 supported range-PUT, it would save a huge amount of bandwidth resulting in an even lower cost of operation.

Filed under  //   amazon   s3