Paul Schnackenburg chats to Microsoft’s Scott Schnoll, Principal Technical Writer, Exchange Server, about Exchange 2010, the new features in the recently released Service Pack 1, block mode replication, Database Availability Groups and using JBOD for storage, Exchange storage architecture, Role Based Access Control, Exchange Online, the 2 TB maximum recommended database size, being an Exchange administrator, checking database integrity while the database is online, and exporting and importing PST files without having Outlook installed.
PS So Service Pack 1 for Exchange 2010 came out this morning.
SS Yeah, about 10 hours ago.
PS For me it was this morning, I woke up to the announcement this morning. In your opinion, what is the most important feature in SP1?
SS Oh, wow. As you saw in the slide deck, there are more than 75 features, but I’m going to be biased: since I work primarily in the high availability, DR and site resiliency space, to me the biggest feature is the improvement in continuous replication, with a new feature called block mode replication. Historically we had file mode, which we introduced in Exchange Server 2007, where we shipped closed transaction logs [of a mailbox database] from the active copy to the passive copy. Now, once a passive copy has caught up, we can tell the active copy that, and instead of sending closed transaction logs it can send blocks directly out of the ESE buffer as we’re writing them on the active copy, shipping that same write asynchronously over to the passive copy. So we’re sending the data over to the other side pretty much as soon as it’s written. This means we get the data off the box faster, and we’re also reducing the amount of data being shipped, because we’re only sending ESE blocks instead of a 1 MB closed log file each time. What’s interesting is that copies [of databases] can move in and out of this state independently, so one copy might be keeping up and go into block mode while another copy with higher latency on the network stays in file mode. The system automatically detects whether things have caught up and switches seamlessly back and forth between the two. That’s one of my favourite features, but there’s so much good stuff. We’ve done a lot of work in the Exchange Control Panel, adding new user interface around journaling, transport rules and ActiveSync device policies. We’ve done some work in the Exchange Management Console to add UI around the DAG experience, like IP addresses and alternate witness servers. Other experiences, like public folder client permissions, are now available in the UI as well.
I could literally go on and on about all the features but those are some of the things that stand out in my mind.
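To make the file mode / block mode switching concrete, here is a toy Python model. It is a sketch under stated assumptions, not how the Exchange replication service is implemented: the `PassiveCopy` class, the `update_mode` function and the rule that a copy counts as caught up when its copy queue is empty are all hypothetical simplifications of what Scott describes.

```python
from dataclasses import dataclass


@dataclass
class PassiveCopy:
    """Hypothetical model of one passive database copy in a DAG."""
    name: str
    copy_queue_length: int  # closed logs on the active copy not yet shipped to this copy
    mode: str = "file"


def update_mode(copy: PassiveCopy) -> str:
    """Pick the replication mode for a single copy.

    Assumption for illustration: a copy is 'caught up' when nothing is waiting
    in its copy queue. Caught-up copies receive writes as they happen (block
    mode); lagging copies fall back to shipping closed log files (file mode).
    Each copy is evaluated on its own, so copies can switch back and forth
    independently of one another.
    """
    copy.mode = "block" if copy.copy_queue_length == 0 else "file"
    return copy.mode


if __name__ == "__main__":
    copies = [PassiveCopy("MBX2", 0), PassiveCopy("MBX3-remote", 4)]
    for c in copies:
        # prints "MBX2 block", then "MBX3-remote file"
        print(c.name, update_mode(c))
```

The design choice worth noting is that the decision is per copy, which mirrors the point that a nearby copy can sit in block mode while a high-latency remote copy stays in file mode.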
PS So what kind of improvements are you seeing with block mode replication? Are we talking about big enterprises seeing benefits here because there’s less data on the wire?
SS Well, that’s an interesting question. The amount of data on the wire is directly related to messaging activity: if there’s a lot of messaging activity, there are also going to be a lot of logs. It’s also driven by your hardware capabilities; you may be generating a ton of log files but also have hardware and a network that can keep up with that generation. So it really depends, almost on a copy-by-copy basis, on the activity, available bandwidth, latency and everything else going on in the system at that time, and the system figures out whether a copy can switch into this mode. But the benefits for any copy that goes into this mode are pretty big. For one thing, we’re getting the data off that box a lot quicker. Depending on the environment, it might take 10 seconds or it might take 10 minutes to fill up a whole log file, which means data might already have been sitting there for several minutes before it’s actually externalised off the box. One of the reasons we originally gave this feature the name “Continuous Replication” was to reinforce the idea that replication should occur continuously. The whole idea of data resiliency is to get the data copied somewhere else as soon as possible. Copying the data directly out of the write buffer, instead of waiting for a closed log file to be generated and then copying that over, externalises the data much faster. And of course, the sooner you get data off the box, the less data is left stranded on it should some sort of failure occur.
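The “10 seconds to 10 minutes” window follows from simple arithmetic: in file mode, nothing leaves the server until the current log fills and closes, so the worst-case wait is the log size divided by the rate at which log data is generated. The sketch below uses the 1 MB log size Scott mentions; the two generation rates in the example are made-up values for illustration, not measurements.

```python
LOG_FILE_BYTES = 1024 * 1024  # closed transaction logs are 1 MB


def seconds_until_log_closes(log_bytes_per_second: float) -> float:
    """Worst-case time data can wait on the active server in file mode.

    The first write into a fresh log cannot be shipped until the log fills
    and closes, so it waits this long before it is externalised.
    """
    if log_bytes_per_second <= 0:
        raise ValueError("log generation rate must be positive")
    return LOG_FILE_BYTES / log_bytes_per_second


if __name__ == "__main__":
    busy = 100 * 1024  # hypothetical busy database: 100 KB of log data per second
    quiet = 2 * 1024   # hypothetical quiet database: 2 KB of log data per second
    print(seconds_until_log_closes(busy))   # 10.24 (seconds)
    print(seconds_until_log_closes(quiet))  # 512.0 (seconds, roughly 8.5 minutes)
```

Block mode removes this wait entirely for caught-up copies, which is why the quiet database benefits most.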
PS How has the market received Exchange 2010?
SS Oh, I think really well. I don’t know if you saw the statistics in my slides, but there are already almost 40 million mailboxes running in the cloud right now. On the SP1 code we’ve got well over half a million mailboxes running between MSIT (Microsoft Internal IT) and TAP (Technology Adoption Program: companies that adopt Microsoft beta code in production). And just look at the attendance in my session: standing room only, in a pretty big room to begin with. That shows you that people are very hungry for Exchange, particularly customers who are on Exchange 2003, for example. They’re realising they need to move on, and they’re seeing Exchange 2010 not just as the right time to move on but also as the right business value. So it’s interesting: technology and business drivers are running almost neck and neck, and I think you’ll see adoption increase because of that.
PS Now, one of the big things that changed between Exchange 2010 and earlier versions was the storage architecture: we now have DAGs (Database Availability Groups) and the option to go with cheaper SATA drives / tier 2 storage, whatever you want to call it. But a lot of Exchange administrators, especially old hands, are likely to be very attached to their SANs (Storage Area Networks). Microsoft still supports SANs, but what kind of storage architectures are you seeing out in the real world around Exchange 2010? Are lots of companies going with the tier 2 storage setup, or is everyone holding on to their SANs?
SS I don’t have any percentages for you, but I can tell you that all of the storage options are being deployed by all sorts of customers. As you said, there are customers that have already invested in a SAN, and we think it makes perfect sense to keep using it. You’ve already invested in that storage and you want to use it for Exchange; that’s one of the reasons we support it for Exchange 2010. But when it comes time to buy new storage, we also wanted to give you more options. Historically, taking Exchange 2003 as an example, storage accounted for about 80% of the capital cost of a typical enterprise-class solution. That’s one of the reasons we started, not in Exchange 2010 but actually in Exchange 2007, doing a lot of work inside the product to target low-cost storage. The other thing we started in Exchange 2007, which also carried through to Exchange 2010, is that with our development cycle we have to peer into our crystal ball to figure out what storage will look like three, four, even five years from now. When we started work on Exchange 2010, which was before Exchange 2007 had even RTM’ed, we forecast what the state of the industry was going to be. Some very clear pictures emerged from that. One is that drives aren’t getting any faster: they’re capped at 15,000 RPM, and they’re not likely to go any faster in my lifetime. But they are getting denser; it used to be 10 GB, then 100 GB, and now it’s one and two TB, soon to be three or four TB. They’re not getting any faster, but they are getting larger, and it turns out that if you target mostly sequential IO (input/output operations) with these drives, instead of the random IO we’ve historically done in Exchange, you get much better performance out of these lower-cost drives.
They perform in some cases six times better with sequential IO than with random IO. We did a tremendous amount of work in Exchange 2010, modifying the store schema for the first time since the beginning of Exchange, and when I say modify I don’t say that lightly. It took two of our top developers about three years to do this work, but it was worth it, as we can now target these large, slow, low-cost drives: tier 2 SATA, mid-tier SATA and so forth. I think somewhere along the line the message got confused, and people thought they had to choose one or the other, but the reality is that the choice is ultimately going to be based on what you already have and what you need. If you already have SAN storage that you want to repurpose for Exchange, great. If you’re buying new storage, you can certainly buy a new SAN, or you can save a lot of money and buy a lot more low-cost storage. The trade-off there is that the management paradigm shifts, and you may give up some things you get with certain storage configurations. So it’s really all about lighting up options; it’s not about one storage platform being better than another. Think about the work we’ve done: we reduced IOPS (Input/Output Operations per Second) for Exchange 2007 compared to 2003 by 70%, and then from 2007 to 2010 by another 70%. Because those reductions compound, that’s roughly a 91% reduction in IOPS going from 2003 to 2010. So we don’t need fancy, expensive storage anymore. We’ll certainly work on it, but you no longer have to buy it as your de facto storage solution like you did with Exchange 2003. I think that’s the message customers need to think about: you’ve got more choices and more options, and many of those options can significantly reduce your costs, not only as you move forward but also as you grow your mailbox storage sizes even larger.
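The headline IOPS figure comes from compounding two reductions, not adding them: each 70% cut leaves 30% of the previous version’s IOPS, so Exchange 2010 needs roughly 0.3 × 0.3 = 9% of Exchange 2003’s IOPS, about a 91% reduction. A small helper makes the arithmetic explicit; the function name is ours, for illustration only.

```python
from functools import reduce


def compound_reduction(*reductions: float) -> float:
    """Combine successive fractional reductions.

    Each step keeps (1 - r) of whatever remained after the previous step,
    so the remaining fractions multiply rather than add.
    """
    remaining = reduce(lambda acc, r: acc * (1 - r), reductions, 1.0)
    return 1 - remaining


if __name__ == "__main__":
    # Exchange 2003 -> 2007 (70%), then 2007 -> 2010 (another 70%)
    print(f"{compound_reduction(0.70, 0.70):.1%}")  # 91.0%
```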
PS Yes, I remember, before Exchange 2010 was released I wrote an article on the whole restructuring of the database and the schema changes to improve IO performance.
SS The amount of work that we did, both in the Store and in ESE, is just phenomenal, and it’s all specifically to target this low-cost storage. Everything we did was specifically to reduce IO. I know that here at TechEd Australia we had a session on Exchange storage, and that we changed it to be more practical, using the Exchange storage calculator, but you may have seen the original version (from TechEd US, find the session here http://www.msteched.com/2010/NorthAmerica/UNC301). In that 400-level session we talk about lazy view updates, write smoothing and database cache compression. All of those changes combine to, like I said, give you that significant IOPS reduction and pretty much make the storage argument unnecessary. We don’t need a lot of IOPS anymore; now what you choose is almost a business preference.
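Scott’s earlier point that spindles top out at 15,000 RPM matters because every random IO pays for a seek plus, on average, half a revolution of the platter, while sequential IO largely avoids both. The sketch below computes the rotational component exactly; the seek times in the example are assumed ballpark values for illustration, not vendor specifications, and the resulting IOPS numbers are rough estimates that ignore transfer time, queuing and caching.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the head waits half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough single-disk random IOPS: one seek plus half a revolution per IO."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Seek times below are assumptions chosen for illustration only.
    for rpm, seek_ms in [(15_000, 3.5), (7_200, 8.5)]:
        print(rpm, round(avg_rotational_latency_ms(rpm), 2),
              round(estimated_random_iops(rpm, seek_ms)))
```

Because neither term shrinks as drives get denser, cutting the number of random IOs (as the Store and ESE changes do) is what lets large, slow drives keep up.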
PS So that leads me to the next question, what’s the most common DAG setup that you see out there, do people go with three database copies, do they go with four, do they go with two, do they go with multi-site?
SS Again, that’s largely dictated by business needs. There are some that are just doing a single DAG within a single site; they just need resiliency within the datacentre, not site resiliency, yet. They may have just a couple of copies, so they’ll still do things like RAID protection. Then there are other companies that like the idea of what we call native Exchange data protection, where you’re using three or more copies and you’re also using single item recovery through retention policies and so forth, combining the different features together to provide that native data protection. That lets you eliminate or reduce traditional backups, and at the same time use lower-cost storage with multiple copies. So it depends almost entirely on the organisation’s individual needs: their high availability and site resiliency needs will largely dictate their DAG architecture, and the DAG architecture will dictate the storage architecture. For instance, if you’re going to use JBOD (Just a Bunch of Disks), a non-RAID set of disks, you have to have a minimum number of copies. We don’t want anyone using JBOD unless they’ve got at least three copies of the database. Once you have three copies, we think that’s sufficient, and we don’t think you need RAID, which is basically providing you with a copy at the hardware level. We think a software-based copy is better for you, because the software can tell the difference between good data and physically corrupt data, which the RAID stack can’t do for you.
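The JBOD guidance reduces to a simple rule, sketched below as a hypothetical planning check (the function names are ours, not part of any Exchange tool): without RAID, only deploy when each database has at least three copies. In this simplified model, where each copy sits on its own disk in a separate server, a database with n copies survives n − 1 disk failures.

```python
MIN_COPIES_FOR_JBOD = 3  # guidance from the interview: JBOD only with three or more copies


def layout_recommended(copies: int, uses_raid: bool) -> bool:
    """Return True if a storage layout follows the JBOD guidance.

    With RAID, hardware redundancy inside each server makes fewer database
    copies acceptable. Without RAID (JBOD), redundancy has to come from the
    database copies themselves, so require at least MIN_COPIES_FOR_JBOD.
    """
    if copies < 1:
        raise ValueError("a database needs at least one copy")
    return uses_raid or copies >= MIN_COPIES_FOR_JBOD


def disk_failures_tolerated(copies: int) -> int:
    """Simplified: each JBOD copy is on its own disk, so n copies survive n - 1 disk losses."""
    return copies - 1


if __name__ == "__main__":
    print(layout_recommended(2, uses_raid=False))  # False: two copies on bare disks
    print(layout_recommended(3, uses_raid=False))  # True: three copies on JBOD
    print(layout_recommended(2, uses_raid=True))   # True: two copies with RAID
```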
PS Another big change in Exchange 2010 was the introduction of RBAC (Role Based Access Control) and the whole permissions model changed and I notice that there are other Microsoft products coming out that are also adopting role based access control so that’s obviously where we’re heading in the future. How was that received in the enterprise?
SS I think many enterprises received it very well. They really like the idea of finally breaking out of the mould where the security model was mapped to ACLs (Access Control Lists) on objects. Now, with RBAC, you can map permissions to the tasks administrators are trying to perform and scope them to the servers or databases they can perform them on. That’s a pretty powerful thing. But there are also some organisations that prefer what we call the split permissions model, where we segregate the permissions for AD (Active Directory) and Exchange from one another: an Exchange admin can’t mess with non-Exchange AD objects, and an AD admin can’t mess with Exchange objects. That’s something we brought back in SP1; when you install SP1, we give you the ability to switch back to that split permissions model. You can use that model if you want, and you also have the ability later to switch back to the RBAC model, or go back and forth as you need to.
PS OK, so that’s not a combination, once you go to the split permissions model you’re sort of going back to the old way of doing things, setting permissions on objects and saying you’re an AD administrator, you have no access here, you’re an Exchange administrator, you have access here.
SS Yes. In RBAC, you may be an AD admin or an Exchange admin, but RBAC is going to determine what you’re able to do. You may have other administrative rights, perhaps even permissions that enable you to grant some sort of elevated status, but anything you try to do inside Exchange is going to be enforced by, in this case, the RBAC permissions model. You might have super-elevated status, but unless you’ve got permissions to run that cmdlet, and to run it against the scope you’re targeting, it’s not going to work for you.
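The RBAC rule Scott describes, that a command succeeds only if some role assignment grants that cmdlet and the target falls inside the assignment’s scope, can be modelled in a few lines. This is a deliberately simplified, hypothetical model (real Exchange management roles, role entries and management scopes have more structure); `Get-Mailbox`, `Set-Mailbox` and `Remove-Mailbox` are real Exchange cmdlet names, but the `RoleAssignment` class and the database names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    """Hypothetical, simplified RBAC assignment: what you may run, and where."""
    cmdlets: frozenset  # cmdlet names this assignment grants
    scope: frozenset    # servers or databases the assignment applies to


def is_allowed(assignments: list, cmdlet: str, target: str) -> bool:
    """Allowed only if one assignment grants the cmdlet AND covers the target.

    Other rights the user holds (for example, Active Directory admin
    membership) play no part in this check.
    """
    return any(cmdlet in a.cmdlets and target in a.scope for a in assignments)


if __name__ == "__main__":
    # Target here is the database hosting the mailbox being managed.
    helpdesk = [RoleAssignment(frozenset({"Get-Mailbox", "Set-Mailbox"}), frozenset({"DB01"}))]
    print(is_allowed(helpdesk, "Set-Mailbox", "DB01"))     # True: granted and in scope
    print(is_allowed(helpdesk, "Set-Mailbox", "DB02"))     # False: target outside scope
    print(is_allowed(helpdesk, "Remove-Mailbox", "DB01"))  # False: cmdlet not granted
```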
PS How are you guys going with the mixed scenario, where people have Exchange on premise and are also going online (through hosted Microsoft Exchange)? Do you see that becoming more popular, or do you see more people saying, well look, we’re going to stay on premise, or we’re going to go to the cloud?
SS We actually call that scenario cross premise, where you have stuff in both places. Right now I’m talking to customers who want to do all three things. There are some customers who say we’re definitely going to be on premise; we’re not going to the cloud because we want to maintain pure physical security for everything. Then there are customers who say they like the idea of moving VIP users to the cloud and having the cloud handle the high availability, the DR and all the operational tasks, while non-VIP users might stay onsite because they have very basic needs that don’t require HA or anything fancy. And then there’s a third group that’s very interested in cutting the cost of their on premise deployment as well. So they’ll be half in the cloud and half on premise, with the full feature set, but managing the two in separate ways. For example, if you look at something like a DAG, there’s no cross premise DAG; you can’t have a DAG that’s partially in your on premise organisation and partially in the cloud.
PS That would be pretty cool though.
SS Yes, that would be pretty cool, but you can understand the challenges associated with such a solution. The underlying cluster requires everything to be in the same domain, so right away there’s an issue we’d have to deal with, and then there are the two separate organisations and the cross-forest permissions model; trying to manage a DAG across all that would be very difficult too. But who knows what a future version will bring?
PS Exchange 15
SS Never heard of it.
PS If I could mind read I could find out all these really interesting things about the next version.
SS If you could mind read you wouldn’t sit here talking to me right now.
PS Do you have any figures on the browser usage now that Exchange 2010 supports the other browsers and you can get full fidelity in Outlook Web App?
PS How come you went from 200 GB to 2 TB database size as the maximum recommended size?
SS Ah, that speaks to the strength and maturity of the DAG platform. The DAG has changed and evolved considerably. We used to have a construct called a clustered mailbox server, which was actually a server network identity that we had to move over, along with all the databases and everything that came with it. Now that we can flip databases around within the DAG in 30 seconds or less, things are a lot faster and moving is a lot easier for you. The other thing to consider is that email volume is skyrocketing and that’s only going to continue, so mailboxes are only going to get larger, which means databases are only going to get larger. To support a large database you need some way of recovering it quickly, and the DAG and database copies give you that fast recovery. So if you have two or more copies of your database, you can grow that database to a much larger size, because you don’t have to restore from tape if there’s a problem. You just have another copy which is an almost bit-for-bit duplicate, nearly up to date, a few log files behind perhaps, and it’s online, waiting to be activated if necessary. A lot of the behaviours and paradigms from legacy versions of Exchange have literally been rendered unusable by these large datasets. That’s one of the reasons we dropped streaming backup going from Exchange 2007 to 2010. Streaming backups can’t handle databases that are hundreds and hundreds of GB; they’re certainly not going to do you any service on a database that’s 1 TB in size. But if you have another online copy of that database that the system is maintaining and keeping healthy for you, and that can be made active in production in 30 seconds or less, then you’re capable of working with much larger datasets and you can safely grow your dataset that large.
Now I’m not saying that everyone with a DAG and a couple of database copies should do that. There’s a level of operational maturity needed to manage this stuff, particularly if you’re going to run larger DAGs, and if you’re going to go all the way to three-plus copies in a JBOD environment, because there are new nuances that come with that management paradigm. We may be dropping the old stuff, but that doesn’t mean the new stuff is completely hands-free and doesn’t need to be managed; it just needs to be managed in a different way. That’s what a lot of people need to realise: the old behaviours only apply to the old versions, the new versions come with new behaviours and new methodologies behind them, and you can’t take the old stuff with you into the new era. It would be like trying to fuel a train today with coal. Trains don’t run on coal today; you can’t do that. It’s the same thing with Exchange 2010: the management paradigm is different, you won’t use streaming backups anymore, you don’t need a SAN by default anymore, and in some cases you don’t have the complexity in the system that you had before. Once people get over that mindset, I think that’s the final hurdle, and they’ll race to adopt it.
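The case for larger databases rests on recovery time. A back-of-the-envelope comparison: a full restore from backup scales with database size, while activating an existing DAG copy takes the same “30 seconds or less” regardless of size. The 100 MB/s restore throughput below is an assumed figure purely for illustration; real restore rates vary widely with media and infrastructure.

```python
def restore_hours(db_size_gb: float, restore_mb_per_s: float) -> float:
    """Time to copy a whole database back from backup at a steady throughput."""
    if restore_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return db_size_gb * 1024 / restore_mb_per_s / 3600


if __name__ == "__main__":
    ASSUMED_RESTORE_MB_PER_S = 100  # hypothetical throughput, not a measured figure
    for size_gb in (200, 2048):  # old 200 GB guidance vs the new 2 TB guidance
        hours = restore_hours(size_gb, ASSUMED_RESTORE_MB_PER_S)
        print(f"{size_gb} GB: about {hours:.1f} h to restore")
```

Under that assumption, a tenfold larger database means a tenfold longer restore (roughly half an hour versus almost six hours), while the activation time for a healthy passive copy does not grow at all.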
PS I think another thing, which I read somewhere in the documentation, is this whole notion of accepting that if you go with a DAG and JBOD, you’re going to have disks fail. And that’s a little scary to think about as an administrator.
SS Especially when we tell you to put the database and the logs on that one disk, yes.
PS That’s very alien to Exchange administrators.
SS It’s the exact opposite of everything we’ve been telling them. We used to say: use RAID, and separate your databases from your logs for recovery purposes. Yes, there was a performance benefit, but it was largely for recovery. If you lost your database you were still fine, because it’s really the log files that hold all the data that hasn’t been committed yet. Don’t use circular logging was another thing we used to say; there were only very few cases where you could use it. Now we say that if you’re not doing backups, the only way you’re going to get your logs truncated, the only way, is to use circular logging. So a lot of what we’re telling them now is the exact opposite of what we told them for the last 13 or 14 years, and I can understand why it’s difficult, because they’ve spent their whole careers as administrators doing the things we’re now telling them not to do. You tend to feel like a master of processes you’ve been doing for that long, and now you’re told that everything you’ve been doing no longer applies and you have to do things a new way. I can see that could make an administrator a little worried. All the job security they had from being the admin who knew all the best practices cold is now gone, and there are new best practices to learn. But that’s the nature of the industry; this whole industry is about constant, non-stop change. And the reality is that this is the way enterprises do things, but it isn’t new to them: we had JBOD before RAID existed, that’s what we all used, and then RAID came along and that was great because it provided value that JBOD didn’t give you. Now we’ve taken some of the value you get from RAID out of the storage stack and put it into the application. A perfect example is the page patching feature of Exchange 2010.
One of the benefits you get from RAID is bad block detection: if there’s a bad block on any of the disks in the RAID set, the RAID controller will prevent you from storing any data there. If you’re using JBOD instead, nothing tells you what’s good and what’s bad until you try to write to it. So we write to some location, and if it turns out to be a bad block, we now have physical corruption, page corruption, inside our database. In a replicated environment, we have the ability to ask one of the other database copies for a good copy of the corrupt page. So we stop the replication, send them a message, they send us back a good page, we patch it up, and the database that was just corrupt is now repaired. Because we know where that block was, we can mark it internally as a bad block and never write to it on the disk again. So we’ve put a lot of the functionality you normally get from RAID up into the application stack, and that makes it a lot easier, a lot more palatable, to feel warm and fuzzy about getting rid of RAID and going with JBOD. But as I said, JBOD is not new; it’s the original disk, literally what we had before RAID existed: individual platters, sometimes very thick, but isolated and autonomous. Now we’re going back to that, and enterprises do it because it’s less baggage; it’s easier to deal with than the complication, complexity and cost that RAID or a SAN brings with it. It’s much, much easier to manage.
As an example, in MSIT when we moved from SAN to DAS (direct-attached storage), we allowed the Exchange server administrators to become the administrators of the storage as well. When we were on the SAN we had to do management all the time, and we had to be super careful, because we might update the controller firmware in one place and find that it resets the SAN fabric and everything ripples through the SAN, even though we’d only touched one component. With DAS, the administrator can just do whatever DAS maintenance might be needed, which is very little because it’s not a complicated task, whenever they’re doing the regular server maintenance: they do the Windows updates and take care of any disk issues at the same time. Moving away from dedicated storage and storage administrators to having the Exchange administrators take care of the storage has actually lowered our cost of ownership for Exchange inside MSIT. It’s not that complex anymore; it’s just DAS and JBOD, simple management.
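The page patching flow Scott describes (detect a corrupt page, fetch a good copy from another database copy, patch it, and never write to the bad location again) can be sketched as a toy model. Everything here is hypothetical and simplified: real ESE’s mechanism differs in its details, whereas this sketch keeps expected checksums in a side list and uses an invented logical-page-to-physical-block map to show “never write to that block again”. The 32 KB page size matches Exchange 2010’s database page size.

```python
import zlib

PAGE_SIZE = 32 * 1024  # Exchange 2010 database pages are 32 KB


class DatabaseCopy:
    """Toy database copy: pages, their expected checksums, and a page-to-block map."""

    def __init__(self, pages: list):
        self.pages = list(pages)
        self.checksums = [zlib.crc32(p) for p in self.pages]  # recorded when pages were written
        self.block_of = list(range(len(self.pages)))  # logical page -> physical block (hypothetical)
        self.next_free_block = len(self.pages)
        self.bad_blocks = set()

    def is_valid(self, page_no: int) -> bool:
        return zlib.crc32(self.pages[page_no]) == self.checksums[page_no]


def patch_page(active: DatabaseCopy, replicas: list, page_no: int) -> bool:
    """Repair one corrupt page from the first replica holding a good copy.

    Returns False if the page was already valid, True if it was patched.
    """
    if active.is_valid(page_no):
        return False
    for replica in replicas:
        if replica.is_valid(page_no):
            active.bad_blocks.add(active.block_of[page_no])  # never write here again
            active.block_of[page_no] = active.next_free_block  # relocate the page
            active.next_free_block += 1
            active.pages[page_no] = replica.pages[page_no]
            return True
    raise RuntimeError(f"no healthy copy of page {page_no} is available")


if __name__ == "__main__":
    original = [bytes([i]) * PAGE_SIZE for i in range(4)]
    active, passive = DatabaseCopy(original), DatabaseCopy(original)
    active.pages[2] = b"\xff" * PAGE_SIZE  # simulate on-disk corruption of page 2
    print(patch_page(active, [passive], 2), active.is_valid(2), active.bad_blocks)
```

The point the sketch illustrates is the one Scott makes about RAID: the checksum lets the application distinguish good data from physically corrupt data, and the second copy supplies the repair.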
PS So what does Exchange Online run on?
SS It’s running Exchange 2010. Well, the true Exchange Online, what we internally call our friends and family, our Live@Edu (www.microsoft.com/liveatedu), that’s Exchange 2010. There’s also the Business Productivity Online Suite, BPOS, which I believe is still on Exchange 2007 and moving to 2010. I just wanted to clarify that, because not everyone knows the difference between the two offerings. But yes, we’re using DAGs on the servers, of course, including multisite DAGs, and as I said in my session, that’s all managed by the Exchange core team. Obviously we can’t run a very large, world-class service without lots of resiliency and redundancy built in, so we’re using DAGs with multiple copies across multiple sites, on JBOD, and it’s working great for us.
PS Doing the ISINTEG while the database is running, that’s pretty cool.
SS That is very cool. A lot of administrators were worried that ISINTEG went away in Exchange 2010 RTM, and I think they’re going to be very excited: not only is it back, in functionality if not in name, but it can be done online. You don’t have to take a whole database offline just to scan it and fix it. To me that’s an amazing thing.
PS Finally, exporting and importing to PST. It’s one of those things that have come up often and you mentioned the new functionality now available in SP1.
SS We finally got those cmdlets in there. We give you some properties and switches so you can control which folders get imported and exported; by default we do them all, but if you want certain folders included or excluded we can do that, so you control what goes in and out. But I think the biggest plus is that you don’t need Outlook on the box. That used to be a deal breaker for some shops, because previously you needed a 64-bit version of Outlook on the server to get this to work, and obviously not everybody had that. So the ability to do server-based PST import and export without having Outlook on the box is going to be a huge win for all administrators.
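In SP1 these are the `New-MailboxExportRequest` and `New-MailboxImportRequest` cmdlets, whose `IncludeFolders` and `ExcludeFolders` parameters control which folders move. The Python sketch below only models the selection logic Scott describes (everything by default, optionally narrowed by include and exclude lists); the glob-style matching, the `select_folders` function and the folder paths are assumptions for illustration and do not reproduce the cmdlets’ own folder syntax.

```python
from fnmatch import fnmatchcase


def select_folders(folders, include=None, exclude=None):
    """Pick the folders to export or import.

    Default: every folder. If include patterns are given, keep only matching
    folders; then drop anything matching an exclude pattern.
    """
    def matches(folder, patterns):
        return any(fnmatchcase(folder, p) for p in patterns)

    selected = [f for f in folders if include is None or matches(f, include)]
    if exclude:
        selected = [f for f in selected if not matches(f, exclude)]
    return selected


if __name__ == "__main__":
    mailbox = ["Inbox", "Inbox/Projects", "Sent Items", "Deleted Items", "Calendar"]
    print(select_folders(mailbox))                             # all five folders
    print(select_folders(mailbox, include=["Inbox*"]))         # ['Inbox', 'Inbox/Projects']
    print(select_folders(mailbox, exclude=["Deleted Items"]))  # everything but Deleted Items
```

Applying the include filter before the exclude filter means an exclusion always wins, which is the least surprising behaviour when the two lists overlap.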
PS So you’re pretty proud of Service Pack 1?
SS I’m very proud of it. A tremendous amount of work was done by a very large group of talented people in a very short amount of time. We RTM’ed Exchange 2010 back in early October, so in less than a year, ten months, we’ve got this service pack, and like I said, it’s got 75 new features in it. Just amazing productivity improvements for information workers and administrators.
PS I’ve always liked Exchange and I like writing about it.
SS As you might have guessed, I can talk about it forever, and in some of my sessions earlier this week I did! I was like, “I know I’m running late, but you’ve got to see this!” There’s a lot to be excited about, and I know there are some other email solutions out there, but the reality is: the world runs on Exchange. And I think that’s what makes it so exciting for me.
It’s cool to be part of the development process. I was an MVP for a long time before joining Microsoft, and being on the inside doing all this work to develop it has opened up opportunities for me personally that I would never have had. For instance, we do all this automated testing, but as part of writing my product documentation I want to do it all myself. I don’t want to use our automated tools to deploy Exchange and have them do it for me; I want to grab the bits, download them and start working with them in my own lab. So I was literally the first person in the world to build a DAG by hand. No one else can say that, and that by itself is really cool. I was the first person to manually install Exchange on a virtualized platform in a supported way. So I get opportunities to do things that no one else gets to do, and to me that’s the most exciting stuff ever. And then, after being the first person ever to do it, you get to go talk about the DAG to everybody, and it’s like, wow. And of course it’s fun: I get to come up with a lot of the names for things.
Things like Standby Continuous Replication: my name. Things like DACP, the Datacenter Activation Coordination Protocol: my name. I got to give it a name! It’s cool. I take a lot of pride in my work, and I end up sticking my nose into areas of Exchange that are well outside my core areas. I’ll do work on the deployment feature, for example, or go into our tools and modify strings that I don’t directly own but that will improve the product. To me it’s so awesome to be able to work on this product that’s used by so many people. And then I get to come here, to a room full of people who are just as interested and want to talk to you about it. It’s just awesome.
PS I think Exchange is one of those products that bring out passion in people.
SS It does! Definitely. I’ve seen that passion a lot, and that can be a good thing or a bad thing, but the fact that it brings out passion will probably always be a good thing, because it shows that people are interested in the product enough to spend some emotional cycles on it. If they didn’t care about it, they’d just leave and go somewhere else, but the fact that they’re complaining about it tells me that they care about this product and want it to improve. To me that’s a great customer to be talking to.
PS Thanks, it’s been great talking to you.
SS Great chatting to you.
Paul Schnackenburg attended Tech Ed Australia 2010 as a guest of Microsoft.