With SQL Server 2005 End of Support just six months away, a lot of organizations are scrambling to get their databases upgraded to SQL Server 2014.
Microsoft IT joined us on a webinar last week to share their lessons learned doing a SQL Server upgrade. Driven by their “Get Current, Stay Current” initiative, the staff running answers.microsoft.com upgraded from SQL Server 2008 to SQL Server 2014.
Michael Schaeffer, senior services engineer at Microsoft, discussed the project, highlighting:
- the criticality of site uptime, given its worldwide popularity with 2 million+ users every day
- the need to avoid downtime not just from outages, but from maintenance as well
- the importance of handling traffic surges without performance loss – the answers site saw traffic more than double in the days following the Windows 10 launch
- the decision to architect an abstraction layer between the web app and the database to enable a zero downtime environment
- the seamless failover after outages enabled by the abstraction layer
Some highlights of the results following the ScaleArc deployment:
- downtime measures just 0.05% (50/1000 of 1 percent), including patching time
- virtualization cut the 22 clustered physical SQL Servers down to 11
- the ScaleArc architecture is supporting a 1192% increase in traffic on the site
When Michael was detailing how the architecture – combining server virtualization, SQL Server 2014, and ScaleArc – has delivered zero downtime, the audience started peppering him with questions. Here are a few of the more interesting exchanges:
Q: With asynchronous mirroring, do you still get zero data loss and zero downtime?
A: Michael explained that the replication technique does not affect zero downtime. For the Microsoft site, the primary server sits in one US data center and replicates synchronously to the secondaries in that data center and asynchronously to the secondaries in the remote data centers – a second US data center, Singapore, and Dublin, Ireland. When the primary fails, the ScaleArc software holds inbound transactions in a queue while a secondary is promoted, and it continues to serve reads throughout. The mirroring technique is independent of that process.
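To make the topology concrete, here's a small sketch of the layout Michael describes – one primary, synchronous secondaries in the same data center, asynchronous secondaries in the remote ones. The server names and the promotion logic are illustrative assumptions, not Microsoft's actual configuration; the point is that a synchronous secondary is the natural promotion target, since it's guaranteed to hold every committed transaction.

```python
# Hypothetical sketch of the replication topology described in the webinar.
# Replica names and data center labels are made up for illustration.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    datacenter: str
    mode: str  # "sync" (same DC) or "async" (remote DC)

replicas = [
    Replica("sql-sec-1", "us-primary-dc", "sync"),   # same DC: synchronous
    Replica("sql-sec-2", "us-second-dc", "async"),   # remote DCs: asynchronous
    Replica("sql-sec-3", "singapore", "async"),
    Replica("sql-sec-4", "dublin", "async"),
]

def promotion_candidates(replicas):
    """Prefer synchronous secondaries: they have every committed
    transaction, so promoting one avoids data loss."""
    sync = [r for r in replicas if r.mode == "sync"]
    return sync or replicas  # fall back to async replicas if none are sync

print(promotion_candidates(replicas)[0].name)  # sql-sec-1
```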
Q: If a write transaction is in progress, and the connection to the database is lost, how does ScaleArc help keep the transaction live?
A: Michael detailed that ScaleArc does not try to keep the transaction live – if a write did not complete, it gets rolled back. The ScaleArc software never commits writes on behalf of the database. If a transaction had passed through ScaleArc but was not yet committed when the primary went down, that transaction will fail. If a transaction was still inbound to ScaleArc, the software holds it in queue until the secondary is ready to handle it.
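The behavior Michael describes across these two answers – queue what hasn't reached the database yet, fail what was in flight, never commit on the database's behalf – can be sketched as a toy proxy. This is not ScaleArc code; the class and method names are invented to show the shape of the logic.

```python
# Minimal sketch (not ScaleArc code) of the queue-during-failover behavior:
# inbound transactions are held while a secondary is promoted, then drained;
# writes already forwarded but uncommitted simply fail and the database
# rolls them back -- the proxy itself never commits anything.
from collections import deque

class FailoverProxy:
    def __init__(self):
        self.failing_over = False
        self.queue = deque()

    def submit(self, txn):
        if self.failing_over:
            self.queue.append(txn)  # hold until the secondary is ready
            return "queued"
        return self._forward(txn)

    def _forward(self, txn):
        # Stand-in for sending the transaction on to the database.
        return f"forwarded:{txn}"

    def begin_failover(self):
        self.failing_over = True    # writes already past the proxy will fail

    def complete_failover(self):
        self.failing_over = False
        drained = [self._forward(t) for t in self.queue]  # drain the queue
        self.queue.clear()
        return drained

proxy = FailoverProxy()
proxy.begin_failover()
print(proxy.submit("insert-1"))   # queued
print(proxy.complete_failover())  # ['forwarded:insert-1']
```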
Q: Does failover need manual intervention?
A: Michael discussed how Microsoft has used the ScaleArc API to instrument scripts that completely automate the failover. He noted that fast failover is crucial, so he doesn’t want to rely on manual intervention to complete the failover.
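The webinar doesn't show the actual ScaleArc API, so the endpoint paths and payloads below are placeholders – but the shape of the automation is the point: each failover step is driven by a script against the management API, with no human in the loop.

```python
# Illustrative automation sketch only. The real ScaleArc API endpoints,
# host, and payloads are not shown in the webinar; everything below is
# a placeholder standing in for them.
import json
import urllib.request

API_BASE = "https://scalearc.example.internal/api"  # placeholder host

def api_post(path, dry_run=False):
    """POST to the (placeholder) management API; dry_run just echoes."""
    if dry_run:
        return {"path": path, "status": "ok"}
    req = urllib.request.Request(API_BASE + path, data=b"{}", method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def automated_failover(cluster_id, dry_run=False):
    # The three logical steps a failover script needs to drive:
    steps = [
        f"/clusters/{cluster_id}/primary/offline",    # queue new writes
        f"/clusters/{cluster_id}/promote-secondary",  # promote a replica
        f"/clusters/{cluster_id}/resume",             # drain the queue
    ]
    return [api_post(s, dry_run=dry_run) for s in steps]

# Dry run shows the sequence without touching a real endpoint:
for step in automated_failover("answers-prod", dry_run=True):
    print(step["path"])
```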
Q: What about memory and CPU on the server it fails over to? Did you allocate extra capacity in advance, or add memory and CPU dynamically?
A: Michael noted that dynamic memory allocation can be problematic, so Microsoft has stuck with static memory allocation.
Q: You’re planning to migrate “answers” into Azure – does the ScaleArc software run there?
A: Yes, Michael confirmed. The plan is to take the ScaleArc architecture into Azure once Azure supports the SQL Server configurations the Answers team needs. You can find ScaleArc on the Azure Marketplace today, supporting both SQL Server and MySQL, with Oracle support coming soon.
To hear more about Microsoft’s upgrade, check out the full webinar recording or the written Microsoft case study.