Sonian Leverages the Cloud Computing Model to Revolutionize Email Archiving

By John Panagulias, submitted by John Panagulias, Tuesday, February 16, 2010, 8:00AM

Cloud computing is often described as a disruptive shift in the IT landscape. That can be misconstrued as a one-time event rather than a process that occurs over time. The idea that computing is evolving, however, shouldn't be too difficult to grasp. In fact, the way cloud computing is transforming how businesses operate can be seen in some very practical ways. It is, for all intents and purposes, the very essence of how disruption happens: in an almost methodical way.

One area that is ripe for a new approach is email archiving. The requirement is not new. Companies have long had to deal with the issue of how to archive and manage email. But few would argue that it is a core competency that any company should develop. It is costly and time-consuming for the IT department to handle.

The founders of Sonian thought there was a better way: a software as a service (SaaS) solution that leverages cloud-based infrastructure to provide a secure, compliant and affordable archive and discovery service.

Sonian was founded in 2006 and launched its SaaS solution in 2007. Altogether, the company has raised $7 million in venture funding, including a Series A round with Prism Venture Works and Summerhill Venture Partners.

To learn more about Sonian, I sat down for an in-depth interview with founder and CTO Greg Arnette. We discussed how Sonian designed its software as a service (SaaS) offering from the ground up to leverage cloud-based infrastructure, as well as the customer benefits of using Sonian as a replacement for a traditional email archiving solution. Below is a condensed version of our conversation.

John Panagulias (JP): Can you describe your cloud computing solution?

Greg Arnette (GA): Sonian delivers a software as a service (SaaS) solution for email archiving. We're focused on solving the pain points that companies have around data retention: things like compliance, e-discovery, litigation support, long-term storage management, and employee-generated content. And that means we provide a central, easy-to-use, very affordable, cloud-powered data management repository that can help a company solve these critical problems for a lot less than if they were going to do it all themselves.

JP: Talk about the value proposition of using Sonian. There's obviously a cost component, but there's also, maybe even more importantly, the ability to give a company more flexibility because it doesn't have to maintain the infrastructure to support email archiving. Thus, resources can be applied to areas that are core to the business.

GA: Well, the first part of the value proposition is certainly around cost and return on investment. So, on average, when a company decides they need to deal with the email archiving problem, they have two choices: do it themselves, or use a SaaS offering.

If a company were to solve that problem themselves, it typically costs around $110.00 per employee per year all-in. They have to purchase software from vendor A and hardware from vendor B and do the traditional implementation, then operate the archiving system in their own network and spend enough on storage and personnel for the overall management of it. When you think about the massive storage required for archiving, and then backing up the storage and making redundant copies for continuity, disaster recovery events and all the other stuff, it gets expensive quickly.

With Sonian's cloud computing solution it's about one-third the cost. That's the same feature set as "do it yourself" with a dramatically quicker implementation time - usually a company can get up and running in 15 minutes - all for around $36.00 per employee per year. That's all-in. In that respect, it's very easy to quantify the value proposition in terms of how it impacts an IT budget.
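
To make the arithmetic concrete, here is a minimal sketch of the comparison using only the two per-employee figures quoted above. The function names and the 500-employee headcount are illustrative assumptions, not Sonian's pricing tool.

```python
# Per-employee annual figures quoted in the interview (USD).
DIY_COST_PER_EMPLOYEE = 110.00
SAAS_COST_PER_EMPLOYEE = 36.00


def annual_cost(employees: int, cost_per_employee: float) -> float:
    """Total yearly archiving cost for a given headcount."""
    return employees * cost_per_employee


def savings_summary(employees: int) -> dict:
    """Compare the two options for a hypothetical headcount."""
    diy = annual_cost(employees, DIY_COST_PER_EMPLOYEE)
    saas = annual_cost(employees, SAAS_COST_PER_EMPLOYEE)
    return {
        "diy": diy,
        "saas": saas,
        "savings": diy - saas,
        # Fraction of the do-it-yourself cost that the SaaS option represents.
        "saas_share_of_diy": saas / diy,
    }


if __name__ == "__main__":
    # 500 employees is an assumed example size, not a figure from the interview.
    print(savings_summary(500))
```

The ratio 36/110 comes to roughly 0.33, which is where the "about one-third the cost" figure comes from.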

But it also solves some intangibles, things that are harder to quantify. It removes a distracting pain point from the IT personnel so they can focus on more value-added problems that face their business. It lets IT focus on the things that help make that particular business or organization more competitive, more efficient.

For planning purposes, it's easy to plan on a per-employee, per-year subscription model for paying for IT, especially when there are no hidden charges. We include unlimited storage with that fee, so it's easy to plan and it makes the cost of IT more predictable.

And it also allows companies to participate in the cloud ecosystem, which means getting a lot of value for a reasonable IT dollar, without having to make this big wholesale conversion from on-premise to the cloud.

Think about Sonian as kind of an on-ramp to additional kinds of cloud computing services. We show companies how easy it is to solve one major pain point. Then, they can look at state-of-the-art providers like Amazon Web Services, RackSpace and Microsoft Azure. The cloud is a perfect use case match for solving the archiving problem, which is all about moving lots of data into the cloud efficiently and securely and then being able to search and mine that data continuously on cloud-scale infrastructure. It's a perfect way to approach the problem.

JP: Let's shift gears a bit. You've done something a lot of software companies would like to do: you built a SaaS solution from the ground up for cloud computing. Can you describe your approach?

GA: We figured out how to build software that's designed for cloud computing. It's very different from software you might think of as being deployed in the enterprise, or even in a traditional data center, because at the core of our software is the ability to scale up in real time across hundreds of thousands of CPUs and then, when the job is done, scale back down. So it's a constantly fluid CPU or compute footprint.

We've been thinking about this stuff for three years now, whereas a lot of software companies are suddenly realizing, "Holy cow, this cloud thing is taking off and we need to get there, and you know, let's put a cloud label on something we're doing, even though we might not technically be in the cloud."

There are a couple of things on this topic. First, even though we're a SaaS cloud computing company serving the enterprise, we don't have to worry about the muck of the data center plumbing that we live on top of. We're trusting that other vendors in our stack of services, like Amazon, are going to take care of that and do it in a way that's really affordable and really reliable. This lets Sonian focus on our software IP to solve a problem in a domain area that we know really well, which is compliance and storage management of communications data and data objects.

We can focus on the really important stuff and let someone else deal with the commodity stuff, like network plumbing and so forth.

As a result, we've built an organization that is very efficient in its use of people and capital, and I'm focusing Sonian's software developers on this new world of designing an application software stack that is almost self-aware of how much CPU it needs at any given moment to operate and, in turn, how much it costs to operate. And that's this idea of being able to think differently about solving some of these problems.

And, of course, there are new challenges that come with cloud computing. For example, how do you secure infrastructure that you don't technically own? How should I think about the storage volumes that I use to store data? I have to think about security and data quite differently. That doesn't mean these problems are insurmountable; we have solved them. But it makes you think differently about how to approach them. It makes you assume failure in every transaction you perform, so we program for failure very early on in our models, and then you scale up and you get more reliable from that early thinking.
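
One common way to "program for failure" is to wrap each remote operation in a retry with exponential backoff. The sketch below illustrates that general idea only. The interview does not describe Sonian's actual mechanism, and `TransientError`, `call_with_retries` and `make_flaky_store` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a recoverable failure (timeout, lost node, throttling)."""


def call_with_retries(operation, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Back off 1x, 2x, 4x ... the base delay, plus a little jitter.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, base_delay))


def make_flaky_store(failures_before_success):
    """Hypothetical storage call that fails a fixed number of times first."""
    state = {"calls": 0}

    def store():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated storage failure")
        return "stored after %d calls" % state["calls"]

    return store


if __name__ == "__main__":
    print(call_with_retries(make_flaky_store(2)))
```

The caller sees a failure only after every attempt is exhausted, so individual transient faults do not bubble up to the user.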

Built into our software stack are the concepts of a distributed enterprise service bus architecture, where components are loosely coupled but there's a core backbone of a job queue that coordinates all the activities, from the front-end experience that a customer has all the way through to the backend.
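
The job-queue backbone can be pictured with a small in-process sketch: producers put jobs on a queue, and workers that know nothing about the producers pick them up. This is an assumption-laden toy using Python's standard `queue` and `threading` modules, not Sonian's architecture, and `run_pipeline` and the handler names are hypothetical.

```python
import queue
import threading


def run_pipeline(jobs, handlers, worker_count=2):
    """Feed (kind, payload) jobs through a shared queue to worker threads."""
    job_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                job_queue.task_done()
                return
            kind, payload = job
            outcome = handlers[kind](payload)  # component chosen by job kind
            with results_lock:
                results.append((kind, outcome))
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:
        job_queue.put(None)  # one sentinel per worker
    job_queue.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Hypothetical components: one "indexes" a message, one "archives" it.
    handlers = {
        "index": lambda text: len(text.split()),
        "archive": lambda text: text.upper(),
    }
    print(sorted(run_pipeline([("index", "a b c"), ("archive", "hi")], handlers)))
```

Because components meet only at the queue, a new handler can be added without touching the code that submits jobs.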

We're also using concepts that Google has pioneered like MapReduce, which allows us to parcel out a processing job. A job might be to search over a terabyte of data looking for some keywords across N number of CPUs and then aggregate the results back into the UI very quickly. That's what the cloud is really good at, you know, having a bunch of CPU available on demand so that you can provide a very pleasing experience to your customers and also manage your costs very granularly.
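
The keyword-search job described above maps naturally onto the MapReduce pattern: split the data into chunks, count matches in each chunk independently, then merge the partial counts. In this minimal sketch, threads stand in for cloud CPUs, and the function names are hypothetical. It illustrates the pattern, not Sonian's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk, keywords):
    """Map step: count keyword hits in one chunk of messages."""
    counts = Counter()
    for message in chunk:
        words = message.lower().split()
        for keyword in keywords:
            counts[keyword] += words.count(keyword)
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results."""
    return left + right


def search(messages, keywords, workers=4):
    """Split messages across workers, then aggregate keyword counts."""
    keywords = [k.lower() for k in keywords]
    chunk_size = max(1, -(-len(messages) // workers))  # ceiling division
    chunks = [messages[i:i + chunk_size]
              for i in range(0, len(messages), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: map_chunk(c, keywords), chunks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    mailbox = [
        "Quarterly audit report",
        "audit the audit trail",
        "lunch plans",
        "Compliance audit",
    ]
    print(search(mailbox, ["audit", "compliance"]))
```

Each chunk is processed independently of the others, which is what lets the work spread across as many workers as are available.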

JP: Can you tell us a little more about your design from a technical perspective?

GA: Some of the technology choices we made are things that you wouldn't see in an enterprise setting. For instance, we're using Erlang as a framework for distributed OLTP processing. We're using the Clojure language on top of the Java Virtual Machine (JVM), which is a very efficient and kind of rising star language that takes advantage of the JVM's sturdy reliability but lets one be a more efficient developer. Thus, you're not mired in the inefficiencies of the Java language, but you're taking advantage of the JVM.

And we chose Ruby on Rails as our web UI presentation framework, which affords agile development for the feature sets that customers see. Our goal is to deliver new features every six to eight weeks, as opposed to the traditional enterprise setting, where new features showed up every 18 to 24 months. With agile development, Sonian can be very reactive to the customer's desires.

And also, on the product management front, you can have a sliding scale between providing a leading-edge feature set and being a fast follower. So, we can experiment with things that we think customers might want to see but can't articulate until they see them. And we can also react very quickly to a feature request that we didn't anticipate, when we see an opportunity for more than one customer to benefit from that new feature.

JP: In terms of your Ruby on Rails development, are you using a cloud-based platform or do you do the development on-premise?

GA: Each developer writes code on their own local computer, but very quickly it gets pushed up to the cloud where testing and integration happen.

Each developer has access to a test sandbox. It's very important - and that's the big differentiation in developing in the cloud. In the past, there used to be a really firm dividing line between software development and systems administration.

In the cloud it feels like it's very blurry. A lot of the developers are actually participating in sys admin types of things, because on the cloud, sys admin means something different than what it meant in the old, colocated world, where sys admins would take care of hardware issues and firewalls and routing and hard drives and stuff like that. In the cloud, it's more like developer ops or a developer admin kind of blend, where a lot of sys admin stuff is really all in software and it's all around just making things operate more efficiently and fluidly.

So it's very important to get code into a mirror image of the production system as quickly as possible to see how things will work, and that's what we do - we develop on the cloud because we're going to be deployed on the cloud.

JP: How about issues related to security and privacy and where the data lives? Can you talk about how Sonian deals with these issues?

GA: It all starts off with an architecture that, from day one, was designed to meet the concerns that an enterprise IT decision maker would have around where their data resides and how it will be managed.

Within a cloud environment, and specifically, the Amazon and RackSpace clouds, customer data is stored in separate silos. There's no commingling of data. With Sonian, customer information lives in its own separate data silos because we can carve up the cloud any way we want and deliver a totally separate experience. Customers love the idea that their data is only their data; it's not mixed with other tenants of the system. Contrast this with SalesForce.com, where everything is commingled in a massive relational database.

Furthermore, the customer gets to enter a pass-phrase that they use to generate the encryption key for their data, and they can change that when they want to. This is the concept of key rotation. So they're participating in the security decisions. There's also total transparency on how we secure this information. We use AES 256-bit encryption, which is the Department of Defense standard.
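
Turning a pass-phrase into a 256-bit key is typically done with a key derivation function. The sketch below uses PBKDF2 from Python's standard `hashlib` module as an assumed choice. The interview does not say which derivation scheme Sonian uses, and the iteration count, salt size and function names here are illustrative. The sketch stops at key derivation and does not perform AES encryption.

```python
import hashlib
import os

KEY_LENGTH_BYTES = 32   # 256 bits, the key size AES-256 expects
ITERATIONS = 200_000    # assumed work factor, not a figure from the interview
SALT_LENGTH_BYTES = 16  # assumed salt size


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from a pass-phrase and salt using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        ITERATIONS,
        dklen=KEY_LENGTH_BYTES,
    )


def new_key(passphrase: str):
    """Create a fresh random salt and the key derived from it."""
    salt = os.urandom(SALT_LENGTH_BYTES)
    return salt, derive_key(passphrase, salt)


if __name__ == "__main__":
    salt, key = new_key("original pass-phrase")
    # Rotation in this sketch: a new pass-phrase and salt yield a new key.
    new_salt, rotated_key = new_key("replacement pass-phrase")
    print(len(key), key != rotated_key)
```

In a real system, rotating the key would also require re-encrypting existing data or re-wrapping the keys that protect it, which is outside this sketch.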

Data is always stored and encrypted inside the cloud, even temp and swap space on CPUs. If, for some reason, someone at Amazon were to mount a storage volume that we were using on a different CPU that we weren't controlling, that data would just be a bunch of gibberish because they wouldn't have the encryption keys. So every bit of storage is encrypted, and you can't say that about the competitors who are doing things in their own environment, or in a dedicated datacenter model because they're not encrypting everything. They encrypt basically by the cage that this stuff lives in inside the data center.

In the cloud, since you don't own the hardware, you encrypt everything and make it unusable to anybody but Sonian or our customers. And then there are different checks and balances in the system for audit trails and the ability to create check sums on the data so we know if it's been tampered with or if it's been changed since the last time we stored it. All the appropriate things you do in a compliance setting.
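
The checksum idea can be shown in a few lines: record a digest when the data is stored, then recompute and compare it later. This minimal sketch uses SHA-256 as an assumed algorithm, since the interview does not name the one Sonian uses, and the function names are hypothetical.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_checksum: str) -> bool:
    """True if the data still matches the checksum recorded at storage time."""
    return hmac.compare_digest(checksum(data), recorded_checksum)


if __name__ == "__main__":
    original = b"Subject: Q4 forecast\n\nNumbers attached."
    recorded = checksum(original)  # stored alongside the archived message

    print(is_unchanged(original, recorded))
    print(is_unchanged(original + b" (edited)", recorded))
```

Any change to the stored bytes, however small, produces a different digest, so a mismatch signals that the data was altered after it was archived.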

But it's transparency in describing how we do it and pointing to the algorithms and how durable they are for encryption, and the dedicated data silo model that make customers comfortable with how we manage their information in a cloud environment.

JP: Changing gears, let's talk about customers. What's the traction been like for Sonian?

GA: To date we have over 2,000 customers. There was a huge surge in growth starting in mid-2009. It was literally a hockey stick. We planted a lot of seeds in 2008 and into 2009. We've benefited from the early business development work. We think it's been impressive growth for a company that opened its doors only in 2007. We're happy to have 2,000 customers, but we know we've got a lot more potential.

Equally interesting is our multi-pronged distribution method, where we sell directly but we also sell through white-label partners who OEM our service inside of their offering. An OEM partner for us is a company that's selling some type of email or collaboration service, whether it's the direct kind of service, like a hosted mailbox, or whether it's in the email security space, like anti-spam and anti-virus service.

Sonian provides a very nice API that allows a partner to embed our service inside of their service, so that email archiving capability can be offered to their specific customer base. So it's a good distribution model. To date, we have seven OEM white-label partners that are redistributing Sonian through their channels. It gives us a nice scale and leverage effect.

JP: Okay Greg, here's your final question: What's the roadmap look like for Sonian?

GA: Our initial feature set is focused on email communications and instant messaging communications. We have in development the ability to archive and manage all types of enterprise content within our archive platform. You should see this in the second half of the year.

Our roadmap, essentially, maps into where the buying audience's decisions are going to be based as time goes on. Today, enterprise decision makers are buying email and instant messaging archiving. What we're hearing from our customers and our partners is the desire to broaden that to all types of enterprise content.

In fact, our architecture was designed from day one to be very broad and encompassing of all data object types, but we had to map into a feature set that we knew an audience was buying right now. That got us focused on the email aspect of communication. Today, that's where the biggest pain point is. But clearly there is a movement afoot within enterprises to, in parallel, layer in other types of information sharing systems. SharePoint is one that is growing very quickly. So there needs to be an ecosystem around SharePoint to archive and manage that content in a very durable and cost-effective way.