Using Project “Orleans” in Halo

UPDATE: Corrected date of talk.


On Wednesday I gave a talk at the Build 2014 conference on how we used the Project “Orleans” technology from Microsoft Research to build the cloud services that power Halo experiences.

Project “Orleans” is the result of work done in Microsoft Research to simplify the development of distributed systems by offering a programming model and runtime that embrace the Actor Model and enable developers to better reason about scalable, distributed concurrency. The result is increased developer productivity and services that scale well by default.

We’ve talked about this at some level before in videos and in other presentations, but this talk was different for two reasons. First, we went into a lot more depth, including code examples and some of the unique benefits of the “virtual” actor concept in “Orleans”. Second, we released a public preview of “Orleans”, enabling folks to download it, play with it and give feedback to the development team.

I would love to say more about this, but before I do I’d like folks to watch the video, download “Orleans”, play with the samples and join the discussions on CodePlex. As I see the discussions evolve I will tailor future blog posts to address the things I see.

Looking forward to seeing what cool things folks build with Orleans! Based on the conversations I’ve been seeing on Twitter, it looks like I’m going to start by working on a post that talks more about the programming model of Orleans.

Posted in Cloud, Project "Orleans"

Seagate Kinetic Open Storage

A few weeks ago, Seagate announced their Kinetic Platform and Drive, a reinvention of the disk interface. I’ll admit that when I first heard about the vision I was skeptical but when you dig into the details of their design and in particular their focus, it starts to get really interesting.

While we have seen evolutions in hard drive technology across various parts of the technology stack, most modern drives are still based on the logical block addressing (LBA) abstraction introduced by SCSI. Today, scalable cloud storage from folks like Google, Facebook, Amazon and Microsoft is built on systems like GFS, HDFS, Cosmos and Dynamo. All of these systems reinvented the way we think about file systems; however, most are still built on top of POSIX-like file systems using disk command protocols based on block addressing. Seagate is intent on changing this by developing a new set of drive access APIs that use Ethernet as the physical interconnect to the drive and a key/value abstraction (apparently using Protocol Buffers) to replace LBA.
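To make the contrast concrete, here is a rough sketch (in Python, with made-up names and an in-memory dict standing in for the drive, not the actual Kinetic client API) of the kind of interface a key/value drive exposes instead of block addresses: the drive is addressed like a network service, and values are stored and retrieved by key along with a version number.

```python
# Hypothetical sketch of a key/value drive interface; the names and semantics
# are my own illustration (backed by a dict), not the real Kinetic API.

class KeyValueDrive:
    """Stand-in for a drive that speaks keys and values over Ethernet."""

    def __init__(self, address):
        self.address = address        # the drive is addressed like a network service
        self._store = {}              # key -> (value, version)

    def put(self, key, value, expected_version=None):
        """Write a value; optionally require the current version to match."""
        current = self._store.get(key)
        if expected_version is not None and current and current[1] != expected_version:
            raise ValueError("version mismatch")
        new_version = current[1] + 1 if current else 1
        self._store[key] = (value, new_version)
        return new_version

    def get(self, key):
        """Return (value, version) for a key."""
        return self._store[key]


drive = KeyValueDrive("drive-01.rack7:8123")   # hypothetical drive address
version = drive.put(b"blob/0001", b"hello")
print(drive.get(b"blob/0001"), version)        # (b'hello', 1) 1
```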

A key idea here is the elimination of servers that exist purely to have drives connected to them. Instead, clients connect directly to individual data drives. In GFS this means eliminating chunk servers; in Azure Storage, you might be able to eliminate extent nodes. Their wiki contains some examples of how this could work with Hadoop, Riak CS and others. It seems that in the Hadoop case they don’t eliminate the need for data nodes, but in the case of Riak CS the need for object servers is eliminated. This should increase the drive density in a typical datacenter storage rack (example). Seagate hopes to work with teams building large-scale storage solutions to have them build in support for writing to drives through the Kinetic API layer. I’m excited to see how much traction they get on this.

Kinetic reminded me of the CORFU project in Microsoft Research. CORFU takes a different approach, exposing an append-only log API instead of a key/value API, optimized for the characteristics of flash-based storage. The intent is similar: remove storage servers from the picture and introduce a protocol that allows clients to interact directly with network-connected “flash units”, each called a SLICE (shared log interface controller). To prove out CORFU, the research team built an implementation of ZooKeeper on top of it.

Both Kinetic and CORFU talk about supporting multiple clients and replication, and both rely on clients to initiate replication. Kinetic returns version numbers in its GET API and accepts a version number in its PUT API. It seems that with these APIs, a Kinetic drive might meet all the requirements of a CORFU slice, allowing Kinetic drives to be used to implement a log using the ideas outlined in the CORFU papers. On top of this I think one could implement a fairly fast pub-sub messaging system with features similar to Kafka without requiring a separate service as the broker.
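As a thought experiment, here is a toy model (my own sketch, not CORFU or Kinetic code) of how a conditional “write only if the key does not exist yet” PUT could provide the write-once log positions a CORFU-style shared log needs: each log position becomes a key, and only one client can successfully claim it.

```python
# Toy model (not real Kinetic or CORFU code) of write-once log positions
# built on a conditional PUT: each position can be written exactly once.

class FakeDrive:
    """In-memory stand-in for a key/value drive with conditional writes."""
    def __init__(self):
        self._store = {}

    def put_if_absent(self, key, value):
        if key in self._store:
            return False              # position already claimed by another writer
        self._store[key] = value
        return True

    def get(self, key):
        return self._store[key]


def append(drive, position, payload):
    """Try to claim a log position; exactly one client wins each position."""
    return drive.put_if_absent(f"log/{position:020d}".encode(), payload)


drive = FakeDrive()
assert append(drive, 0, b"first record")
assert not append(drive, 0, b"conflicting record")   # the second writer loses
print(drive.get(b"log/" + b"0" * 20))                # b'first record'
```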

I do not know a whole lot about how Kinetic drives are implemented, but I wonder whether the drives themselves would implement the Kinetic key/value API using a log-structured approach internally, even if they aren’t SSDs, as is the case in key/value stores like BigTable, Riak’s BitCask, Azure Storage’s streams and others.
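For illustration, here is a heavily simplified, BitCask-flavoured sketch (my own toy, not any shipping store) of the log-structured approach: every write appends to a log, and an in-memory index maps each key to the offset of its most recent value.

```python
# Toy log-structured key/value store: writes are appends, reads go through
# an in-memory index of offsets. Real stores add segment files, compaction
# and crash recovery; this is only an illustration.

import struct

class LogStructuredStore:
    def __init__(self):
        self.log = bytearray()        # stands in for the on-disk append-only log
        self.index = {}               # key -> (offset, length) of the latest value

    def put(self, key, value):
        record = struct.pack(">I", len(key)) + key + struct.pack(">I", len(value)) + value
        value_offset = len(self.log) + 8 + len(key)   # where the value bytes will land
        self.log.extend(record)
        self.index[key] = (value_offset, len(value))  # newer writes shadow older ones

    def get(self, key):
        offset, length = self.index[key]
        return bytes(self.log[offset:offset + length])


store = LogStructuredStore()
store.put(b"k", b"v1")
store.put(b"k", b"v2")     # an overwrite is just another append, never an in-place update
print(store.get(b"k"))     # b'v2'
```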

I really like the idea of lower level components evolving to support modern use cases and I look forward to the observations and new technologies that come from this. In particular I wonder what existing design choices might need to be revisited as we build new systems.

Posted in Cloud, Storage

Advanced Erasure Codes for Cheaper Redundant Storage

Last weekend I re-read this paper on the internals of Azure Storage. It’s an excellent deep dive into the architecture of Windows Azure Storage, and its stream layer shows the similarity to Cosmos, although this paper goes into much more detail than I’ve seen in any paper that references Cosmos. Streams are append-only, which allows for much faster reads and writes; Google’s GFS and Hadoop’s HDFS are similar in this regard.

RangePartitions in the Object Tables use Log-Structured Merge-Trees, which work similarly to the SSTable architecture of Google’s BigTable; the HFile format used in Apache HBase has some similarities as well. Section 5 goes into a lot of detail about the in-memory data structures used to index and return results, as well as the triggers and algorithms involved in splitting and merging RangePartitions according to traffic.
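As a toy illustration of the LSM idea (my own simplification, not Azure’s or BigTable’s actual code): writes land in an in-memory table that is periodically flushed as an immutable sorted run, and reads check the memtable first, then the runs from newest to oldest.

```python
# Minimal LSM-style sketch: a mutable memtable plus immutable sorted runs.
# Real systems write the runs to disk, index them, and compact them; this
# toy keeps everything in memory to show the read/write paths.

class TinyLSM:
    def __init__(self, flush_threshold=4):
        self.memtable = {}            # mutable, in-memory writes
        self.runs = []                # immutable sorted runs ("SSTables")
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.runs.append(sorted(self.memtable.items()))   # "flush an SSTable"
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):          # newest run wins
            for k, v in run:                     # real systems binary-search or use block indexes
                if k == key:
                    return v
        return None


lsm = TinyLSM()
for i in range(6):
    lsm.put(f"row{i}", f"value{i}")
print(lsm.get("row1"), lsm.get("row5"))          # value1 value5
```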

The reason for the re-read was in preparation for learning more about Azure’s use of erasure codes for redundant storage. Replicated storage is becoming expensive as the amount of data we store grows, and erasure codes provide a more efficient solution. As an analogy, conventional triple replication is the distributed equivalent of RAID 1 disks (well, 0+1 really), while the use of erasure codes is like a distributed equivalent of RAID 6 drive arrays: instead of replicating each block in a file, parity blocks are created that can be used to recover any of the original blocks.

RAID 5 uses parity stripes as well, but its parity computation is simpler and easier to understand. In RAID 5, the parity block is constructed by computing the bitwise XOR of all the other blocks. This takes advantage of the fact that if A ⊕ B ⊕ C = P, then P ⊕ B ⊕ C = A, A ⊕ B ⊕ P = C, and so on. So if you lose a block, you can reconstruct it from all the others. The limitation of XOR as a parity computation is that you can only use it to recover from a single failure. RAID 6 introduced a second parity block that uses a different function; the most common codes used for this are Reed-Solomon codes.
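Here is a quick demonstration of that XOR property in Python: the parity block is the XOR of the data blocks, and any single lost block can be rebuilt by XOR-ing the parity with the survivors.

```python
# RAID-5-style XOR parity: P = A ^ B ^ C, and any single lost block can be
# rebuilt from the parity plus the surviving blocks.

def xor_blocks(*blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

a, b, c = b"AAAA", b"BBBB", b"CCCC"
p = xor_blocks(a, b, c)              # the parity block

recovered_a = xor_blocks(p, b, c)    # pretend A was lost; rebuild it
assert recovered_a == a
print(recovered_a)                   # b'AAAA'
```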

The challenge with these codes is that they result in higher recovery costs: recovering a lost block requires transmitting a large amount of data to a single location for reconstruction, and recovery is often computationally expensive. Recent new approaches are helping to change that.

Windows Azure uses erasure codes to replicate fragments of “sealed” data (see section 4.4 of the first paper). By leveraging a new code they developed called Local Reconstruction Codes (LRCs), the Azure team has been able to achieve even more efficient storage. A key part of this approach is to generate parity blocks from subsets of the full collection of blocks; with this, any block in a subset can be recovered without needing to read all of the blocks. LRCs achieve a higher mean time to failure than triple replication or Reed-Solomon codes and offer a configurable tradeoff between storage overhead and reconstruction cost. Azure can also use LRCs to deal with a network partition, using the parity blocks to reconstruct data from a block that is temporarily unavailable. For more, see this paper on “Erasure Coding in Windows Azure Storage”.
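To show the local-reconstruction idea with a toy example (plain XOR per group; the real LRC construction uses Reed-Solomon-style coefficients over a Galois field and adds global parities on top): split the data blocks into small groups, give each group its own parity, and a single failed block can then be repaired by reading only its group rather than the whole stripe.

```python
# Toy "local parity group" illustration (XOR only, not the actual LRC math):
# six data blocks in two groups of three, one local parity per group.

from functools import reduce
from operator import xor

data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]       # six tiny "blocks"
groups = [data[0:3], data[3:6]]                    # two local groups
local_parity = [reduce(xor, g) for g in groups]    # one XOR parity per group

# Repair the block at index 1 of group 1 (0x55) using only that group:
# three reads (two survivors + the local parity) instead of all six blocks.
group, lost = 1, 1
survivors = [blk for i, blk in enumerate(groups[group]) if i != lost]
repaired = reduce(xor, survivors, local_parity[group])
assert repaired == 0x55
print(hex(repaired))                               # 0x55
```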

Facebook has been using a version of HDFS (HDFS RAID) that uses Reed-Solomon codes to do something similar for cold files. In a recent paper they also introduced a new flavor of HDFS (HDFS-Xorbas) that uses uncannily similar techniques; they even use the same acronym, LRC, for Locally Repairable Codes. Their solution focuses on reducing the I/O and network cost of repairs when using Reed-Solomon codes.

It’s exciting to see these kinds of cost-saving innovations in storage architecture. I wish I could learn more about how Amazon’s Glacier works under the covers to find out whether they’re doing anything like this. I look forward to seeing (and trying to keep up with) future advances in this space.

Posted in Cloud, Storage

Yes, everyone should learn to code.

I just read this article and I think the author somewhat missed the point. Learning to code is useful in much the same way that learning basic math and science skills can be useful to someone who eventually spends all their time in the humanities. A knowledge of logic and the basic intuitions that come from learning basic programming skills have applicability far beyond the world of computer science.

The article lists a number of intents and behaviors that can be used to identify people who shouldn’t learn how to code. Those all seem like fine ways of identifying people who shouldn’t become professional programmers, but the fact that I hated biology doesn’t mean it wasn’t something I should have learned a bit of in my youth.

Everyone should learn to code at a basic level, perhaps not as a separate subject in school but as part of a standard math course. It’s really unfortunate how many kids out there don’t even know what programming is, and we need to fix that. While doing that, we can also give them the useful skill of an analytical thinking process that has broad applicability.

Posted in Learning to Code

Starting it back up

I’ve had this blog for a while, but then I moved over to play with posterous.com for a bit. Posterous is shutting down, and I never really used my space there that much anyway. WordPress seems like a good place to park myself for now. I actually have two posts I’ve been working on that I hope to publish shortly.

Posted in Uncategorized