anil madhavapeddy // anil.recoil.org

Yurts for Digital Nomads

29 April 2010   |   Anil Madhavapeddy   |   tags: perscon,ocaml   |   post syndicated from Personal Containers   |   all posts

The App Engine data collector for Personal Containers is coming on nicely, and is on track for an alpha preview release fairly soon. Working with App Engine has been interesting; it’s got excellent availability and you can’t beat the price (free), but coding robust Python that doesn’t trip over the tight resource limits for individual requests, asynchronous tasks and queries is tricky. While it is good for small records such as my iPhone or Find My iPhone GPS traces, it doesn’t work so well with my gigabytes of photographs or decades of e-mail.

This confirmed our earlier intuition that there is no one perfect solution for personal data handling; instead, we need to embrace diversity and construct an infrastructure that can cope with change over the coming decades. Mobile programming has changed beyond recognition in just a few years, and cloud providers are specialising in different ways (e.g. PiCloud for simple compute, or EC2 for fancy services like elastic load balancing).

So to recognise this, we are building components that all interoperate with your personal data, keep it secure, and ensure it persists for more than a few years. Malte Schwarzkopf came up with the term “digital yurts”, and it’s stuck. We’ve written a draft paper about it, and would love to hear your comments and feedback on the approach.

There are some interesting recent trends that make doing this particularly important:

If you’re interested, join our group or contact me directly. At this stage, you need desire and the ability to hack code, but things are settling down over the next few months…


Pulling together a user interface

15 April 2010   |   Anil Madhavapeddy   |   tags: perscon,web   |   post syndicated from Personal Containers   |   all posts

We’ve been hacking away on fleshing out the App Engine node for personal containers. We’re building this node first because, crucially, deploying an App Engine VM is free to anyone with a Google account. The service itself is limited since you can only respond to HTTP or XMPP requests and do HTTP fetches, and so its primary use is as an always-on data collection service with a webmail-style UI written using extjs.

Personal containers gather data from a wide variety of sources, and normalise them into a format which understands people (address book entries, with a set of services such as e-mail, phone, IM and online IDs), places (GPS, WOEID), media (photos, movies) and messages (Tweets, emails, Facebook messages). I’ll post more about the data model behind personal containers in a follow-up as the format settles.

The App Engine node has a number of plugins to gather data and aggregate them into a single view (see screenshot). Plugins include:

I’m switching tacks briefly; we received an Amazon Research Grant recently and I’m building a node that runs as a Linux server to act as a longer-term archival and search server. This is being written in OCaml and uses Tokyo Cabinet (with Jake Donham’s excellent bindings) and so should be speedy and a useful alternative implementation of the HTTP REST interface. The plan is to automatically synchronize meta-data across all the nodes of a personal container, but store large and historical data away from expensive cloud storage such as App Engine.

There are lots more plugins in development, such as Foursquare and Gowalla OAuth collectors, an Android mobile application to upload location and contacts information, and Google GData synchronization. If you’re interested in one of these or something else, please do get in touch or just fork the project and start hacking!


Opening a website

29 March 2010   |   Anil Madhavapeddy   |   tags: perscon,web   |   post syndicated from Personal Containers   |   all posts

We’ve been working away at building a new type of database to help individuals keep a rein on their ever-increasing personal digital information. The first prototypes run freely on Google App Engine to gather your data behind-the-scenes, and we are working on more advanced versions that run on embedded devices and the cloud.

If you’re interested in keeping track of your personal data, you can start off with the installation instructions to clone your own version. After that, read up on the design of the system (which is still changing as we research new ideas around it). When you find something you want to fix, or add a new plugin data source, just clone the code and send us back fixes!


C#, F# and Other Programming Language Fun at PDC 2008

01 November 2008   |   Anil Madhavapeddy   |   tags: pdc2008   |   all posts

My favourite session of PDC 2008 was by Anders Hejlsberg as he described the Future of C#. Anders is an excellent speaker, and it was educational watching how he made fairly complex type systems come across to an audience more used to simpler programming languages. I say this after attending ICFP recently, where that wasn’t true of all the presentations!

The high-level message was exactly what I wanted to hear: in order to exploit multi-core processors more effectively, Microsoft’s languages are focussing on improvements in three areas:

An interesting aside was his assertion about “co-evolution”. Microsoft have a number of languages which have recently been unified under the CLR, such as Visual Basic, C# and most recently F#. Rather than have one language race ahead of the others, they like to “borrow” features as appropriate into other languages. This is made easier by having a common run-time foundation, and an example of co-evolution is the introduction of lambdas into C# 3.0, a feature already familiar from F#.

Concurrency

Anders observes that attempts to automatically parallelize software have not yielded good results in the past, and so support needs to be built into the language to do “top-down” parallelism instead of the compiler inferring it from the bottom-up. To this end, they are introducing parallel extensions into the .NET framework. This makes use of functional features to parallelize code, including LINQ queries or CPU-intensive compute.

For an example of how this code looks, check out Jurgen Van Gael’s post on using F# with the Task Parallel Library. I wanted to learn more about this, and wandered over to the Hands On Labs where F# and Visual Studio 10 were all pre-installed. I then ran into an extremely cool feature of F#… asynchronous workflows.

F# introduces an extension to the usual ML let operator in the form of let!. As Don Syme explains on his blog, this can be interpreted as “run the asynchronous computation on the right and wait for its result. If necessary suspend the rest of the workflow as a callback awaiting some system event”. So this construct lets you write straight-line code which can potentially block, without the hassle of spawning threads or encoding continuation passing style constructs in the code. Already a nice improvement over OCaml! (although to be fair I’ve not had a chance to check out some of the parallel OCaml extensions).
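For readers without F# to hand, the shape of this can be sketched in Python, where `await` plays roughly the role that `let!` plays in a workflow. This is an analogy and not F# semantics; `fetch_length`, `total_length` and the simulated delay are invented for illustration.

```python
import asyncio
from typing import List


async def fetch_length(name: str, delay: float) -> int:
    # Stand-in for an operation that would normally block, such as a
    # network fetch. `await` suspends this coroutine without tying up
    # a thread while the (simulated) I/O is in flight.
    await asyncio.sleep(delay)
    return len(name)


async def total_length(names: List[str]) -> int:
    # Straight-line code: no explicit callbacks and no thread spawning.
    total = 0
    for name in names:
        total += await fetch_length(name, 0.01)
    return total


def run_example() -> int:
    # Drive the coroutine to completion on a fresh event loop.
    return asyncio.run(total_length(["perscon", "yurt"]))
```

The loop body reads like ordinary sequential code, yet each `await` is a point where the rest of the function is parked until the result arrives.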

Dynamic Programming

Dynamic programming is the main focus of C# 4.0. In order to support dynamic objects as first-class citizens in a statically typed world, they introduce a new static type called dynamic. This forces all method resolution on that object to happen at run-time and disables static checks by the compiler (aside from looking for mix-ups between dynamic and static objects). The dynamic type now makes it easy to wrap support for Python, Ruby and Javascript, since the relevant dispatch functions can all be hidden away in the method resolver at run-time, leaving the programmer with a single syntax for invoking methods across these different languages.

The actual definition of method resolvers is pretty straight-forward; he demonstrated custom getters and setters (similar to Python for example) by using the IDynamicObject interface to define actions to take when properties are accessed. His example did the usual dictionary wrapper which mapped setting arbitrary properties onto an internal dictionary variable.
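Since the comparison is to Python, here is roughly what that dictionary wrapper looks like on the Python side. This is a minimal sketch of the pattern, not a port of the code shown in the talk; the class name `PropertyBag` is invented.

```python
class PropertyBag:
    """Maps arbitrary attribute access onto an internal dictionary."""

    def __init__(self):
        # Bypass our own __setattr__ so the backing store is a real
        # instance attribute rather than an entry in itself.
        object.__setattr__(self, "_values", {})

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Every assignment lands in the dictionary.
        self._values[name] = value


bag = PropertyBag()
bag.title = "Future of C#"
bag.year = 2008
```

Reading `bag.title` goes through `__getattr__` and comes back out of the dictionary, while an attribute that was never set raises `AttributeError` as usual.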

Another improvement in this space is the addition of optional arguments and named arguments. Both of these have well-defined semantics (optional arguments have to come after non-optional ones, and evaluation of arguments is left-to-right) and are purely syntactic improvements with no run-time cost. One of the best examples he showed of using these was around COM interoperability. In current versions of C#, due to the lack of named arguments a common function such as “Save As” might require 12 or more stub arguments to be specified as ref missing. Now, those long, repetitive lines can be folded down to only the arguments which are required.
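The effect on call sites is easy to see in any language with keyword arguments. The sketch below uses Python; `save_as` and its parameters are hypothetical stand-ins for a COM-style call, not a real Office API.

```python
def save_as(path, file_format="native", password=None, read_only=False,
            create_backup=False, add_to_recent=True):
    # Hypothetical stand-in for a "Save As" call with many rarely used
    # parameters; each has a default so callers can leave it out.
    return {
        "path": path,
        "file_format": file_format,
        "password": password,
        "read_only": read_only,
        "create_backup": create_backup,
        "add_to_recent": add_to_recent,
    }


# Positional style: every argument before the one you care about
# has to be spelled out.
verbose = save_as("report.doc", "native", None, True)

# Named style: only the argument that matters appears at the call site.
concise = save_as("report.doc", read_only=True)
```

Both calls produce the same settings; the second one just says what it means.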

Sort of related to the earlier co-evolution was his demonstration of how similar Javascript and C# syntax is now with this new support. He took a Silverlight 2 code snippet written in Javascript and ported it over to C# 4.0 with some very mechanical changes. The languages are definitely converging fast!

If you’re interested in checking out more on this topic, look at IronPython, the Dynamic Language Runtime (DLR), and the TL10 session on dynamic languages on .NET. The DLR is currently maintained as part of IronPython, but is being moved out to sit on top of the CLR for more languages such as Visual Basic, Javascript or COM. The DLR covers common optimizations which are useful for scripting languages, such as call-site caching, dynamic dispatch and expression trees (e.g. for LINQ). The DLR has a bunch of binders which bridge between different backends such as .NET, Silverlight, the native Python or Ruby backends, or COM applications such as Microsoft Office.

Meta-programming

Later on at the Future of Programming Languages panel session, Anders talked about meta-programming as being one of the future improvements he’s looking at. Currently, there is a lot of ad-hoc code generation in place when creating Windows applications, and unifying this into the language would give safety and maintainability improvements.

In order to do this, for C# 5.0 they are rewriting the compiler to be self-hosting in C#, since it has historically been a C++ application. This permits them to switch the compiler from being a traditional “black box” compiler to a hosted .NET service which can be called directly by .NET programs in order to do dynamic run-time compilation of code. Other portions of the compiler chain are also exposed to permit incremental program construction by third-party code.
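Dynamic languages have long offered a small version of this idea. As a rough analogy only (this says nothing about how the C# compiler service is exposed), Python's built-in `compile` and `exec` let a running program turn source text into callable code; the helper name `compile_function` is invented.

```python
def compile_function(source, name):
    # compile() turns source text into a code object at run time;
    # exec() then evaluates it inside a fresh namespace.
    namespace = {}
    code = compile(source, "<generated>", "exec")
    exec(code, namespace)
    return namespace[name]


# Build a function from text that could have been generated by a tool.
square = compile_function("def square(x):\n    return x * x\n", "square")
```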

He demonstrated this with a pretty nifty C# top-level, into which he directly typed Winforms code to construct a window with a few simple buttons using the C# compiler server. Not to be outdone by this, Miguel de Icaza promptly upstaged Anders at his (fantastic) Mono 2.2 session. He demonstrated the new C# shell which is present in Mono trunk builds and can essentially be used like an OCaml or Python top-level to mess around and manipulate C# code. He also talked about embedded Mono and SIMD support which pushes their compiler ahead of Microsoft’s in the 3D performance game.

Summary

I’m firmly convinced about the potential of F# now. I had the opportunity at the Open Spaces area to quiz Scott Guthrie about whether or not F# was a toy language. He replied using the same arguments as Anders that the higher-level language approach (declarative, functional) was very important strategically to Microsoft to let their developer platform continue to survive in a multi-core world.

This boils down to the individual languages not being that important any more (as seen by the sharing of features between C# and F#), and the underlying execution layer (the CLR/DLR) adding efficient support. Now any old language can adopt higher-level features without having to re-do all the optimization grunt work again and again. Much like Xen offers a new golden age for innovative new OS research by freeing programmers from writing a million hardware device drivers, it looks like .NET is ushering in a new age of programming language innovation!

Inspired by the PDC talks, I’ve got MonoDevelop and F# up and running on my MacBook Air, and am just playing with GTK# and Cocoa#. If this works as well as OCaml, then it might finally be time to abandon the old stalwart and move to a new language for my day-to-day stuff!


Working through TED videos on the train

28 October 2008   |   Anil Madhavapeddy   |   all posts

I’m over at the Microsoft PDC in Los Angeles this week, and commuting over from Anand’s place in Camarillo using the Metrolink. The three-hour commute gives me a great excuse to catch up on my TED talks on my iPhone.

The entire set of talks is available online via iTunes, and these really stood out:

There are about 80 videos left on my viewing list... this is going to take a while!


Peeking under the hood of High Availability

17 September 2008   |   Anil Madhavapeddy   |   tags: citrix,xen   |   post syndicated from Citrix   |   all posts

Well, the big launch of XenServer 5 has gone smoothly, and with it has arrived a flood of questions about how exactly the new High Availability functionality works.  I’ll use this post to explain the overall architecture of HA in XenServer 5, and also how some of the fault detection and failure planning works.

Fundamentally, HA is about making sure important VMs are always running on a resource pool. There are two aspects to this: reliably detecting host failure, and computing a failure plan to deal with swift recovery.

Detecting host failure reliably is difficult since you need to remotely distinguish between a host disappearing for a while versus exploding in a ball of flames.  If we mistakenly decide that a master host has broken down and elect a new master in its place, there may be unpredictable results if the original host were to make a comeback!   Similarly, if there is a network issue and a resource pool splits into two equal halves, we need to ensure that only one half accesses the shared storage and not both simultaneously.

Heartbeating for availability

We solve all these problems in XenServer by having two mechanisms: a storage heartbeat and a network heartbeat. When you enable HA in a pool, you must nominate an iSCSI or FC storage repository to be the heartbeat SR. XenServer automatically creates a couple of small virtual disks in this SR. The first disk is used by every physical host in the resource pool as a shared quorum disk. Each host allocates itself a unique block in the shared disk and regularly writes to the block to indicate that it is alive.
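To make the quorum-disk idea concrete, here is a toy version in Python that uses an ordinary file in place of the shared virtual disk. The block size, the record layout and the function names are all assumptions made for illustration; the post does not describe the real on-disk format.

```python
import struct
import tempfile

BLOCK_SIZE = 512                 # assumed; real layout is not described here
RECORD = struct.Struct("<Qd")    # (sequence number, timestamp in seconds)


def write_heartbeat(disk, host_index, sequence, now):
    # Each host writes only to its own block, so hosts never contend
    # for the same bytes on the shared disk.
    disk.seek(host_index * BLOCK_SIZE)
    disk.write(RECORD.pack(sequence, now).ljust(BLOCK_SIZE, b"\0"))
    disk.flush()


def live_hosts(disk, host_count, now, timeout):
    # A host counts as alive if its block was refreshed within `timeout`.
    alive = []
    for index in range(host_count):
        disk.seek(index * BLOCK_SIZE)
        block = disk.read(BLOCK_SIZE)
        if len(block) < RECORD.size:
            continue  # block was never written
        sequence, stamp = RECORD.unpack_from(block)
        if sequence > 0 and now - stamp <= timeout:
            alive.append(index)
    return alive


def demo():
    with tempfile.TemporaryFile() as disk:
        write_heartbeat(disk, 0, 1, now=100.0)   # fresh
        write_heartbeat(disk, 2, 1, now=70.0)    # stale by the time we look
        return live_hosts(disk, 3, now=105.0, timeout=30.0)
```

In `demo`, host 0 has written recently, host 1 has never written and host 2 last wrote 35 seconds ago, so only host 0 is reported as alive.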

I asked Dave 'highly available' Scott, the principal engineer behind HA, about the startup process:

“When HA starts up, all hosts exchange data over both network and storage channels, indicating which hosts they can see over both channels; i.e. which I/O paths are working and which are not.  This liveness information is exchanged until a fixed point is reached and all of the hosts are satisfied that they are in agreement about what they can see.  When this happens, the HA functionality is ‘armed’ and the pool is protected.”

This HA arming process can take a few minutes to settle for larger pools, but is only required when HA is first enabled.
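The phrase "until a fixed point is reached" can be illustrated with a small simulation. This is not the XenServer protocol, which the post does not spell out; it is a generic sketch in which every host repeatedly merges the views of the peers it can reach, stopping once a round changes nothing. The function name and input format are invented.

```python
def exchange_until_stable(can_see):
    """can_see maps each host to the set of hosts it can reach directly.

    Returns (views, rounds): each host's final view of the pool, and the
    number of rounds in which at least one view changed.
    """
    views = {host: {host} | peers for host, peers in can_see.items()}
    rounds = 0
    while True:
        # Each host merges in whatever its reachable peers currently know.
        updated = {
            host: view.union(*(views[peer] for peer in can_see[host]))
            for host, view in views.items()
        }
        if updated == views:
            return views, rounds  # fixed point: nothing left to learn
        views = updated
        rounds += 1


pool = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
final_views, rounds_needed = exchange_until_stable(pool)
```

Here host a cannot see host c directly, but learns about it through b, and after that exchange all three views agree.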

Once HA is active, each host regularly writes storage updates to the heartbeat virtual disk, and network packets over the management interface.  It is vital to ensure that network adapters are bonded for resilience, and that storage interfaces are using dynamic multipathing where supported.  This will ensure that any single adapter or wiring failure does not result in any availability issues.

The worst-case scenario for HA is the situation where a host is thought to be off-line but is actually still writing to the shared storage, since this can result in corruption of persistent data.  In order to prevent this situation without requiring active power strip control, we implemented hypervisor-level fencing.  This is a Xen modification which will hard-power the host off at a very low level if it doesn’t hear regularly from a watchdog process running in the control domain.  Since it is implemented at a very low level, this also covers the case where the control domain becomes unresponsive for any reason.
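The watchdog logic itself is simple, which is part of why it can live at such a low level. A minimal model in Python, with an invented `Watchdog` class and explicit timestamps in place of a hardware timer, might look like this:

```python
class Watchdog:
    """Toy model of a fencing watchdog; times are plain numbers (seconds)."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_pet = now

    def pet(self, now):
        # Called regularly by the healthy control-domain process.
        self.last_pet = now

    def should_fence(self, now):
        # The low-level side only looks at elapsed time. It does not need
        # to know why the control domain went quiet.
        return now - self.last_pet > self.timeout


wd = Watchdog(timeout=10, now=0)
wd.pet(5)
```

The important property is that silence is treated as failure: a hung control domain cannot talk its way out of being fenced.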

Hosts will self-fence (i.e. power off and restart) in the event of any heartbeat failure unless any of the following hold true:

Planning for failure

The heartbeat system gives us reliable notification of host failure, and so we move onto the second step of HA: capacity planning for failure.

A resource pool consists of several physical hosts (say, 16), each with potentially different amounts of host memory and a different number of running VMs.  In order to ensure that no single host failure will result in the VMs on that host being unrestartable (e.g. due to insufficient memory on any other host), the XenServer pool dynamically computes a failure plan which calculates the actions that would be taken on any host failure.
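As a rough illustration of what a single-host failure plan involves, the sketch below considers memory only and places displaced VMs with a first-fit decreasing heuristic. The function names, the input format and the choice of heuristic are assumptions for the example, not the algorithm XenServer uses.

```python
def place_vms(vms, free_memory):
    """Return {vm: host} using first-fit decreasing, or None if a VM won't fit."""
    free = dict(free_memory)
    plan = {}
    # Place the largest VMs first; they are the hardest to fit.
    for vm, need in sorted(vms.items(), key=lambda item: -item[1]):
        target = next((h for h in sorted(free) if free[h] >= need), None)
        if target is None:
            return None
        free[target] -= need
        plan[vm] = target
    return plan


def single_failure_plans(hosts):
    """hosts: {host: {"memory": total_mb, "vms": {vm_name: mb}}}.

    Returns {failed_host: plan or None} for every possible single failure.
    """
    plans = {}
    for failed, info in hosts.items():
        free = {
            name: other["memory"] - sum(other["vms"].values())
            for name, other in hosts.items()
            if name != failed
        }
        plans[failed] = place_vms(info["vms"], free)
    return plans


pool = {
    "h1": {"memory": 8192, "vms": {"web": 4096, "db": 2048}},
    "h2": {"memory": 8192, "vms": {"mail": 2048}},
    "h3": {"memory": 4096, "vms": {"build": 3072}},
}
plans = single_failure_plans(pool)
```

A `None` entry means this heuristic could not find a home for every VM from that host. Because first-fit is only a heuristic, `None` does not prove that no plan exists.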

But there’s one more complexity… a single host failure plan does not cover more advanced cases such as network partitions which take out entire groups of hosts.  It would be very useful to be able to create a plan that could tolerate more than a single host failure, so that administrators could ignore the first host failure and be safe in the knowledge that (for example) three more hosts could fail before the pool runs out of spare capacity.

That’s exactly what we do in XenServer… the resource pool dynamically computes a failure plan which considers the “number of host failures to tolerate” (or nhtol).  This represents the number of disposable servers in a pool for a given set of protected VMs.

The planning algorithms are pretty complex, since doing a brute force search of all possible failures across all hosts across all VMs is an exponential problem.  We apply heuristics to ensure we can compute a plan in a reasonably small time:

Since planning algorithms are designed for unexpected host failures, we only consider absolutely essential resource reservations which would prevent the VM from starting on the alternative host (e.g. storage is visible, and enough memory is present).  We do not perform CPU reservation on the basis that it can be optimised at a later stage via live relocation once the VM is back up and running.
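To see why brute force gets expensive, here is a deliberately naive calculation of the number of host failures to tolerate. It tries every combination of failed hosts, checks memory only, and uses a simple first-fit packing check. All names and the input format are invented; the real planner relies on heuristics and is not reproduced here.

```python
from itertools import combinations


def fits(vm_sizes, free):
    """First-fit decreasing check: can every VM size be placed somewhere?"""
    free = sorted(free, reverse=True)
    for need in sorted(vm_sizes, reverse=True):
        for i, room in enumerate(free):
            if room >= need:
                free[i] -= need
                break
        else:
            return False  # no surviving host has room for this VM
    return True


def max_failures_tolerated(hosts):
    """hosts: {name: (total_memory, [protected_vm_memory, ...])}."""
    names = list(hosts)
    tolerated = 0
    for n in range(1, len(names)):
        # The number of combinations grows very quickly with pool size.
        for failed in combinations(names, n):
            displaced = [m for h in failed for m in hosts[h][1]]
            free = [hosts[h][0] - sum(hosts[h][1])
                    for h in names if h not in failed]
            if not fits(displaced, free):
                return tolerated
        tolerated = n
    return tolerated


roomy = {"h1": (8, [2]), "h2": (8, [2]), "h3": (8, [2]), "h4": (8, [2])}
tight = {"h1": (8, [6]), "h2": (8, [6]), "h3": (8, [2])}
```

With four lightly loaded hosts the answer is 3, since one survivor can hold everything. In the tighter pool, losing both large hosts at once leaves nowhere for the second 6 GB VM, so the answer is 1.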

Overcommit protection

We now have HA armed and a failover plan for our VMs.  But what if you want to make changes to your configuration after HA is enabled?  This is dealt with via overcommit protection.

The XenServer pool dynamically calculates a new failover plan in response to every XenAPI call which would affect it (e.g. starting a new VM).  If a new plan cannot be calculated due to insufficient resources across the pool, XenServer will return an overcommitment error message to the client which blocks the operation.

The “What if?” Machine

This overcommit protection would be quite irritating if you had to keep trying things and seeing if a plan exists or not, and so we built a “What If?” machine into XenServer to facilitate counter-factual reasoning.

When reconfiguring HA via XenCenter, you can supply a hypothetical series of VM priorities, and XenServer will return a number of host failures which would be tolerated under this scheme.  This lets you try various combinations of VM protections depending on your business needs, and see if the number of host failures is appropriate to the level of paranoia you desire.

This can even be done via the CLI, using the snappily named “xe pool-ha-compute-max-host-failures-to-tolerate” when HA is enabled.

The nice thing about XenServer HA is that it is done at the XenAPI level, and so  any of the standard clients (such as the xe CLI or XenCenter) or any third-party clients which use the XenAPI will all interoperate just fine.  The XenServer pool dynamically recalculates plans in response to the client requests, and so no special “oracle” is required outside of the pool to figure out HA plans.

Finally, HA makes master election completely invisible.  Any host in a pool can be a master host, and the pool database is constantly replicated across all nodes and also backed up to shared storage on the heartbeat SR for additional safety.  Any XenAPI client can connect to any host, and a redirect is issued to the current master host.

Protection Levels

Each VM in an HA pool can be either fully protected, best-effort or unprotected. VMs which are protected are all included in the failover planning, and if no plan exists for which they can all be reliably restarted then the pool is considered to be overcommitted. Hugh Warrington (who implemented the XenCenter HA support) explained what protection levels are useful for:

“Best-effort VMs are not considered when calculating a failover plan, but the pool will still try to start them as a one-off if a host that is running them fails.  This restart is attempted after all protected VMs are restarted, and if the attempt to start them fails then it will not be retried.  This is a useful setting for test/dev VMs which aren’t critical to keep running, but would be nice to do so in a pool which also has some important VMs which absolutely must run.”

There are some advanced features which are only available via the CLI.   Each protected VM in an HA pool can be assigned a numeric ha-restart-priority.  If a pool is well-resourced with a high nhtol, then these restart priorities are not relevant: the VMs are all guaranteed to be started.

If more hosts fail than have been planned for, then the priorities are used to determine the order in which VMs are restarted.  This ensures that in over-committed pools, the most important VMs are restarted first.  Although the pool will start priority 1 VMs first, they might not finish booting before the priority 2 VMs, and so this should not be used as the basis for service ordering.
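In code terms the priority is just a sort key for the order in which starts are issued. A tiny sketch, with an invented `restart_order` helper and VM list:

```python
def restart_order(vms):
    """vms: list of (name, ha_restart_priority); lower numbers start first."""
    # sorted() is stable, so VMs with equal priority keep their input order.
    return [name for name, priority in sorted(vms, key=lambda vm: vm[1])]


order = restart_order([("build", 3), ("db", 1), ("web", 2), ("dns", 1)])
```

This only fixes the order in which start requests go out. It says nothing about when each VM finishes booting, which is why the priorities are a poor substitute for proper service dependencies.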

Note that it's very important to ensure that a VM is agile when protecting it by HA.  If the VM is not agile (e.g. has a physical CD drive mapped in from a host), then it can only be assigned Best Effort restart since it is tied to one host.

XenCenter support for HA

The best practice for HA is not to make configuration changes while it is enabled.  Instead, it is intended to be the "2am safeguard" which will restart hosts in the event of a problem when there isn't a human administrator nearby.  If you are actively making configuration changes such as applying patches, then HA should be disabled for the duration of these changes.

XenCenter makes some common changes under HA much more user-friendly, which I asked Ewan Mellor (the principal GUI engineer) about:

So, I hope this short article has given you a taster… just kidding! This post is almost as long as my PhD thesis, but then, HA is a complex topic. Please do feel free to get back to me with comments and feedback about how we can improve it in future releases, or if you just love it the way it is.  Many thanks to Dave Scott, Richard Sharp, Ewan Mellor and Hugh Warrington for their input to this article.


