Back in January 2011, VCE Customer Care was an aspiration. The leadership team worked closely with our support partners at VMware, EMC and Cisco to develop a philosophy of delivering the highest possible level of service to our customers. This included providing 24 x 7 global Vblock Systems customer support, escalation management, release management and account management services. It required creating unique roles, implementing new technologies and developing innovative processes that ensured VCE could exceed customer expectations by delivering a differentiated, end-to-end support experience.

Taking the First Step

VCE realizes that Vblock Systems represent a transformation for traditional IT departments, so our first focus was to provide customers with a designated Customer Advocate to help them transition to new operational models. VCE Customer Advocates are key to the success of VCE customer support, as they help customers maximize the value they get from their Vblock. Our Customer Advocates are supported by a team of Escalation Engineers, highly experienced technical experts who proactively assist customers with their Vblock management, advising from a technical standpoint on critical issues as well as more routine maintenance queries.

Since VCE Support responds to all questions about Vblock Systems technology across the globe, a Technical Service Representative (TSR) team was created to take over the initial customer contact for reporting new Service Requests. The TSRs went through an eight-week boot camp to develop a good grounding in all technical aspects of the Vblock, as well as ITIL practices and customer management.
Since they are the first person the customer interacts with, we wanted to ensure this team could field customer inquiries on all technical aspects of the Vblock, resolve issues on first contact, and effectively triage issues that might require further troubleshooting.

Supporting the Business

Throughout this time, new IT systems were developed and put in place to manage customer engagement. VCE selected out-of-the-box, best-in-breed cloud solutions for these systems, such as CRM and Knowledge Base systems and the BT Cloud Telephony solution, supported by a state-of-the-art B2B solution that connects VCE Support directly to our support partners. We also developed a portal where customers can log their service requests and access Vblock documentation and support matrices. All this was completed in less than six months from start to finish.

Evolution of VCE Global Customer Care

Today, VCE has developed its own fully standalone support organization, with a team of subject matter experts on all aspects of Vblock technology. These support engineers, called Vblock Platform Support Engineers, work in tandem with the TSR team to ensure that VCE’s customers get the best support available for their Vblock Systems. They provide in-depth support for the individual Vblock components, while their knowledge of the Vblock as a whole lets them see an issue holistically rather than as a standalone component problem. This is critically important when supporting Vblock Systems, because a Vblock is not just a set of individual components, but an infrastructure that has been pre-tested and engineered to work in a high-availability environment with maximum uptime.

What we have demonstrated to date is that, even with the bar set high for customer expectations, we have achieved a customer satisfaction rate in the high 90 percent range.
This proves that we are delivering on the plan that was set in motion just over two years ago. VCE Support will continue to evolve, using best-in-breed technology in our products and in our support delivery processes and systems, to ensure the best possible support experience for our customers.
This year, I had the opportunity to attend the World Economic Forum (WEF) in Davos, Switzerland, which brought together over 3,000 leaders from business, government, international organizations, civil society, academia, media and the arts. I’ve attended the gathering in the past, and I can say that this year’s meeting was my favorite for a few reasons.

First, the overarching sentiment of optimism among attendees was inspiring. There was a strong, positive feeling about the economy, as well as the role that technology is playing – and will continue to play – in solving some of the world’s biggest problems.

That said, attendees recognized that technology alone can’t drive positive change. Organizations across sectors and geographies must collaborate to realize the opportunities that technology presents. We’re all aware of how emerging technologies such as artificial intelligence, machine learning and robotics are beginning to disrupt industries, economies and even day-to-day human interactions, as outlined in our recently released research on Realizing 2030. It’s going to take strong leadership and corporate commitment to shape a future that works for all, and it was clear that those in attendance agreed.

This notion of collaborative partnership and collective responsibility was my second big observation, aligning with the conference theme of “Creating a Shared Future in a Fractured World.” Throughout the week, some of the largest corporations in the world made actionable, measurable commitments in partnership with third-party public sector organizations, particularly in the areas of sustainability and the circular economy.
A great example was the unveiling of the Platform for Accelerating the Circular Economy (PACE), a public-private collaboration co-chaired by the CEO of Philips and the heads of the Global Environment Facility and UN Environment.

Dell Technologies was part of this coalition of forward-thinking leaders that pledged to accelerate the implementation of the circular economy. The principles of a circular economy, which seeks to design out inefficiencies and turn “waste” into a valuable resource, have long been a key Dell strategy for meeting our Legacy of Good 2020 goals, most recently through programs including ocean plastics and closed-loop gold. In the pledge announced at WEF, we committed to closing the loop on all used Dell equipment, including capital equipment that becomes available to us, by 2020.

My third observation concerned the amount of open, honest discussion around diversity, gender and race. It was clear that every company sees diversity as a business imperative, but many are struggling with how to foster environments of inclusion, including managing negative sentiment such as aggression, bullying and tone. There isn’t a simple, one-size-fits-all solution to ensuring equality in the workplace and society, especially when you apply a global lens to the challenges various groups face across geographies. To make progress, the general consensus was that companies really need to examine their cultures and address issues head-on through strong leadership commitment and example – something I’ve always believed in very strongly.
One of the ways we’ve approached this at Dell Technologies is through our early and strong participation in Catalyst’s Men Advocating Real Change (MARC) initiative. MARC aims to create a more inclusive workplace by engaging leaders in candid conversations about the role of gender in the workplace, as well as topics such as unconscious bias.

Last but not least, a real highlight of WEF was my engagement with our customers who also attended the conference. Throughout our conversations, I was so glad to hear that the Dell Technologies value proposition and strategy is resonating. We shared an excitement about all the opportunity ahead to realize the digital future and drive positive change.

While I walked away from Davos inspired and energized, a few areas have also stayed with me. These are areas where, as a global community, we need to invest more time, resources and expertise. One is the refugee crisis. I had the opportunity to participate in a simulated session called “A Day in the Life of a Refugee,” which was incredibly eye-opening for me personally and educated me on the magnitude of the opportunity we have for change. This crisis, which affects an unacceptable 65+ million men, women and children, represents a devastating loss of human life and potential.

The second is equality for women in society and the workplace, which I alluded to above within the broader conversation of diversity and inclusion. While the conference sent a strong message by having an all-female co-chair team this year, the crowd was still largely male-dominated, which even our customers noticed and flagged in our discussions. This reflects the need for more women in leadership positions – across the private and public sectors, as well as government.

I opened this blog discussing the amazing potential for technology to reshape economies, industries and lives.
I’ll close with the notion that with great opportunity comes great responsibility. As the “Fourth Revolution” is upon us, we must anticipate what digital transformation means for labor forces of all ages and skill sets. We must be prepared to reskill the workforce for the digital economy, while at the same time encouraging the next generation to explore careers in STEAM (science, technology, engineering, arts, math). By remaining accountable and collaborative, optimistic and forward-thinking, I believe we will be able to capitalize on the tremendous opportunity that technology – and Dell Technologies – can bring to the world to advance human progress.
As we covered in our previous post, ScaleIO can easily be configured to deliver 6-9’s (99.9999%) of availability or higher using only two replicas. This saves 33% of the cost compared to other solutions while providing very high performance. In this blog we will use math to walk through the facts of availability and bust the myth surrounding ScaleIO’s high availability.

For data loss or data unavailability to occur in a system with two replicas of data (such as ScaleIO), there must be two concurrent failures. In other words, a second failure must occur before the system recovers from the first. Therefore, one of the following four scenarios must occur:
Two drive failures in a storage pool, OR
Two node failures in a storage pool, OR
A node failure followed by a drive failure, OR
A drive failure followed by a node failure

Let us choose two popular ScaleIO configurations and derive the availability of each:
20 x ScaleIO servers deployed on Dell EMC PowerEdge R740xd servers with 24 SSD drives each, 1.92TB SSD drive size, using a 4 x 10GbE network. In this configuration we will assume that the rebuild time is network bound.
20 x ScaleIO servers deployed on Dell EMC PowerEdge R640 servers with 10 SSD drives each, 1.92TB SSD drives, using a 2 x 25GbE network.
In this configuration we will assume that the rebuild time is SSD bound.

Note: ScaleIO best practices recommend a maximum of 300 drives in a storage pool. Therefore, for the first configuration (20 servers x 24 drives = 480 drives) we will configure two storage pools with 240 drives in each pool.

To calculate the availability of a ScaleIO system, we will leverage a couple of well-known academic publications: RAID: High-Performance, Reliable Secondary Storage (from UC Berkeley) and A Case for Redundant Arrays of Inexpensive Disks (RAID). We will adapt the formulas in these papers to the ScaleIO architecture and model the different failures.

Two Drive Failures

We will use the following formula to calculate the MTBF of a ScaleIO system for a two-drive failure scenario:
Where:
N = Number of drives in a system
G = Number of drives in a storage pool
M = Number of drives per server
K = 8,760 hours (1 year)
MTBFdrive = MTBF of a single drive
MTTRdrive = Mean Time to Repair, the repair/rebuild time of a failed drive

Note: This formula assumes that two drives that fail in the same ScaleIO SDS (server) will not cause DU/DL, as the ScaleIO architecture guarantees that replicas of the same data will NEVER reside on the same physical node.

Let’s consider two scenarios. In the first, the rebuild process is constrained by network bandwidth. In the second, the rebuild process is constrained by drive performance bandwidth.

Network Bound

In this case we assume that the rebuild time/performance is limited by the availability of network bandwidth. This will be the case if you deploy a dense configuration, such as Dell EMC PowerEdge R740xd servers with a large number of SSDs in a single server.
In this case, the MTTR function is:
Where:
S – Number of servers in a ScaleIO cluster
Network Speed – Bandwidth in GB/s available for rebuild traffic (excluding application traffic)
Conservative_Factor – additional time factor applied to the rebuild estimate, to be conservative

Plugging the relevant values into the formula above, we get an MTTR of ~1.5 minutes for the 20 x R740, 24 SSDs @ 1.92TB w/ 4 x 10GbE network configuration (two storage pools w/ 240 drives per pool). The 20 x R640, 10 SSDs @ 1.92TB w/ 2 x 25GbE network configuration provides an MTTR of ~2 minutes. These MTTR values reflect the superiority of ScaleIO’s declustered RAID architecture, which results in a very fast rebuild time. In a later post we will show why these MTTR values are critical and how they impact system availability and operational efficiency.

SSD Drive Bound

In this case, the rebuild time/performance is bound by the SSD drives, and the rebuild time is a function of the number of drives available in the system. This will be the case if you deploy less dense configurations, such as the 1U Dell EMC PowerEdge R640 servers. In this case, the MTTR function is:
Where:
G – Number of drives in a storage pool
Drive_Speed – Drive speed available for rebuild
Conservative_Factor – additional time factor applied to the rebuild estimate, to be conservative

System availability is calculated by dividing the time that the system is available and running by the sum of that running time and the restore time. For availability we will use the following formula:
Where:
RTO – Recovery Time Objective, or the amount of time it takes to recover a system after a data loss event (for example, if two drives fail in a single pool), where data needs to be recovered from a backup system.
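To make the drive-failure model concrete, here is a minimal Python sketch. The original formulas are not reproduced in this text, so the expressions below are our own reconstruction in the spirit of the Berkeley RAID papers. They are built from the variables defined above, plus one input the text implies but does not list: the drive capacity. The reconstructed relationships are:

- two-drive MTBF = MTBFdrive² / (N × (G − M) × MTTRdrive) / K
- network-bound MTTR = capacity / (S × Network Speed) × Conservative_Factor, treating Network Speed as per-server rebuild bandwidth (an assumption)
- SSD-bound MTTR = capacity / (G × Drive_Speed) × Conservative_Factor
- availability = MTBF / (MTBF + RTO)

Function names and every input value in the usage lines are hypothetical placeholders, not ScaleIO’s published parameters.

```python
"""Hypothetical sketch of the drive-failure availability model (our reconstruction)."""

HOURS_PER_YEAR = 8_760  # K in the text


def mttr_network_bound_hours(drive_tb: float, servers: int,
                             net_gb_per_s: float, conservative_factor: float) -> float:
    """Rebuild time when network bandwidth is the bottleneck (assumed form).

    Assumes the failed drive's data is re-protected in parallel by all S
    servers, each contributing net_gb_per_s of rebuild bandwidth.
    """
    seconds = drive_tb * 1_000 / (servers * net_gb_per_s)  # TB -> GB, then GB / (GB/s)
    return seconds * conservative_factor / 3_600


def mttr_ssd_bound_hours(drive_tb: float, pool_drives: int,
                         drive_gb_per_s: float, conservative_factor: float) -> float:
    """Rebuild time when SSD throughput in the pool is the bottleneck (assumed form)."""
    seconds = drive_tb * 1_000 / (pool_drives * drive_gb_per_s)
    return seconds * conservative_factor / 3_600


def mtbf_drive_after_drive_years(n: int, g: int, m: int,
                                 mtbf_drive_h: float, mttr_drive_h: float) -> float:
    """MTBF (years) of two drive failures in one pool on two different servers.

    Any of the N drives can fail first; data is exposed only if one of the
    G - M drives on *other* servers in the same pool fails within MTTRdrive.
    """
    return mtbf_drive_h ** 2 / (n * (g - m) * mttr_drive_h) / HOURS_PER_YEAR


def availability(mtbf_years: float, rto_hours: float) -> float:
    """Availability = MTBF / (MTBF + RTO), with both terms in hours."""
    mtbf_h = mtbf_years * HOURS_PER_YEAR
    return mtbf_h / (mtbf_h + rto_hours)


# Usage with purely hypothetical inputs (not ScaleIO's real parameters).
mttr = mttr_network_bound_hours(drive_tb=1.92, servers=20,
                                net_gb_per_s=1.0, conservative_factor=2.0)
mtbf = mtbf_drive_after_drive_years(n=480, g=240, m=24,
                                    mtbf_drive_h=1_000_000, mttr_drive_h=mttr)
print(f"MTTR {mttr * 60:.1f} min, MTBF {mtbf:,.0f} years, "
      f"A = {availability(mtbf, rto_hours=24):.9f}")
```

The sketch shows why fast rebuilds matter. MTTRdrive sits in the denominator, so halving the rebuild time doubles the two-drive MTBF.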
We will be highly conservative and treat Data Unavailability (DU) scenarios as being as bad as Data Loss (DL) scenarios. Therefore, we will use RTO in the availability formula.

Note: the only purpose of RTO is to translate MTBF into availability.

Node and Device Failure

Next, let’s discuss the system’s MTBF when a node fails and is followed by a drive failure. For this scenario we will use the following model:
Where:
M = Number of drives per node
G = Number of drives in the pool
S = Number of servers in the system
K = Number of hours in 1 year, i.e. 8,760 hours
MTBFdrive = MTBF of a single drive
MTBFserver = MTBF of a single node
MTTRserver = repair/rebuild time of a failed server

In a similar way, one can develop the formulas for the other failure sequences, such as a node failure after a drive failure, and a second node failure after a first node failure.

Network Bound Rebuild Process

In this case we assume that rebuild time/performance is constrained by network bandwidth. We will make similar assumptions as for drive failure. In this case, the MTTR function is:
Where:
M – Number of drives per server
S – Number of servers in a ScaleIO cluster
Network Speed – Bandwidth in GB/s available for rebuild traffic (excluding application traffic)
Conservative_Factor – additional time factor applied to the rebuild estimate, to be conservative

Plugging the relevant values into the formula above, we get an MTTR of ~30 minutes for the 20 x R740, 24 SSDs @ 1.92TB w/ 4 x 10GbE network configuration (two storage pools w/ 240 drives per pool). The 20 x R640, 10 SSDs @ 1.92TB w/ 2 x 25GbE network configuration provides an MTTR of ~20 minutes. During system recovery, ScaleIO rebuilt about 48TB of data for the first configuration and about 21TB for the second configuration.

SSD Drive Bound

In this case we assume that the rebuild time/performance is SSD drive bound, and the rebuild time is a function of the number of drives available in the system.
Using the same assumptions as for drive failures, the MTTR function is:
Where:
G – Number of drives in a storage pool
M – Number of drives per server
Drive_Speed – Drive speed available for rebuild
Conservative_Factor – additional time factor applied to the rebuild estimate, to be conservative

Based on the provided formulas, let’s calculate the availability of a ScaleIO system for the two configurations:

20 x R740, 24 SSDs @ 1.92TB w/ 4 x 10GbE network (2 storage pools w/ 240 drives per pool)
Failure scenario | Reliability (MTBF) | Availability
Drive After Drive | 43,986 years | 0.999999955
Drive After Node | 6,404 years | 0.999999691
Node After Drive | 138,325 years | 0.999999985
Node After Node | 38,424 years | 0.999999897
Overall System | 4,714 years | 0.99999952 or 6-9’s

20 x R640, 10 SSDs @ 1.92TB w/ 2 x 25GbE network
Failure scenario | Reliability (MTBF) | Availability
Drive After Drive | 105,655 years | 0.999999983
Drive After Node | 27,665 years | 0.999999937
Node After Drive | 276,650 years | 0.999999993
Node After Node | 69,163 years | 0.999999975
Overall System | 15,702 years | 0.99999989 or 6-9’s

Since these calculations are complex, ScaleIO provides its customers with FREE online tools to build hardware configurations and obtain availability numbers that include all possible failure scenarios. We advise customers to use these tools, rather than crunch complex mathematics, to build system configurations based on desired availability targets.

As you can see, yet again, we prove that the ScaleIO system easily exceeds 6-9’s of availability with just two replicas of the data. Unlike other vendors’ solutions, neither extra data replicas nor erasure coding is required! So do you have to deploy three replica copies to achieve enterprise availability? No, you do not! The myth is BUSTED.
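The Overall System figures can be derived from the four per-scenario MTBFs if each failure sequence is treated as an independent failure mode. Under that assumption their failure rates add, so 1/MTBF_overall = Σ 1/MTBF_scenario. The short Python sketch below applies this to the published per-scenario numbers, and it reproduces the Overall System values of 4,714 and 15,702 years to within rounding.

The sketch also includes a hypothetical reconstruction of the drive-after-node model. It uses the variables listed under Node and Device Failure; the original formula is not reproduced in this text. In this reconstruction, any of the S servers can fail first, and data is exposed only if one of the G − M pool drives on other servers fails within MTTRserver. The function names are ours.

```python
"""Combine per-scenario MTBFs (years) into an overall system MTBF.

Assumes the four failure sequences are independent failure modes, so their
failure rates (1 / MTBF) add. The drive-after-node formula is a hypothetical
reconstruction, not the original ScaleIO model.
"""

HOURS_PER_YEAR = 8_760  # K in the text


def mtbf_drive_after_node_years(s: int, g: int, m: int, mtbf_server_h: float,
                                mtbf_drive_h: float, mttr_server_h: float) -> float:
    """A server fails first (rate S / MTBFserver); data is exposed only if one
    of the G - M pool drives on other servers fails within MTTRserver."""
    return (mtbf_server_h * mtbf_drive_h) / (s * (g - m) * mttr_server_h) / HOURS_PER_YEAR


def overall_mtbf_years(scenario_mtbfs: dict[str, float]) -> float:
    """Sum the failure rates of the individual scenarios and invert."""
    return 1.0 / sum(1.0 / mtbf for mtbf in scenario_mtbfs.values())


# Per-scenario MTBFs (years) from the availability tables in this post.
R740_CONFIG = {"drive_after_drive": 43_986, "drive_after_node": 6_404,
               "node_after_drive": 138_325, "node_after_node": 38_424}
R640_CONFIG = {"drive_after_drive": 105_655, "drive_after_node": 27_665,
               "node_after_drive": 276_650, "node_after_node": 69_163}

for name, cfg in (("20 x R740", R740_CONFIG), ("20 x R640", R640_CONFIG)):
    weakest = min(cfg, key=cfg.get)  # scenario with the shortest MTBF
    print(f"{name}: overall ~{overall_mtbf_years(cfg):,.0f} years "
          f"(weakest link: {weakest})")
# prints:
# 20 x R740: overall ~4,714 years (weakest link: drive_after_node)
# 20 x R640: overall ~15,702 years (weakest link: drive_after_node)
```

Summing rates makes the weakest scenario dominate the result. In both configurations that scenario is a drive failure during a node rebuild, which is why the node MTTR discussed above matters so much.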
The numbers don’t lie. Sixty percent of Dell EMC VxRack FLEX customers make a second purchase within 180 days. And these are big customers: Fortune 500 companies that have resources, do their research and select proven and competitive solutions after going through rigorous bids and POCs. This tells us that our rack-scale hyper-converged infrastructure (HCI) system delivers on its promise, and that VxRack FLEX is on an upward trajectory with no signs of slowing down.

Launched in 2015, VxRack FLEX offers a turnkey experience that includes delivery and support of a single solution managed holistically. Under the covers, it includes world-class compute, storage and integrated networking for both virtualized and non-virtualized environments. Because VxRack FLEX is simple to deploy and manage on an ongoing basis, an analyst report found that it offers 6x faster time-to-value and 30 percent lower TCO compared to a traditional SAN.

VxRack FLEX enables massive scale-out capabilities for the data center, along with flexible deployment options (compute and storage residing on the same server or separated out). Add nodes one by one within a single rack, or scale out with additional racks as compute and storage resources are consumed. This provides your infrastructure with elastic sizing and efficient scalability, allowing you to start small with your proof of concept and grow to web-scale size as your requirements evolve. Because VxRack FLEX supports diverse environments, it is ideal for consolidating both traditional and modern applications that demand high performance, availability and resiliency onto a single system.

There have been a number of important enhancements to VxRack FLEX already this year. Available this month, VxRack FLEX is integrated with the latest 14th generation Dell EMC PowerEdge servers.
This next-generation system means more powerful handling of workloads, greater capacity and improved flexibility:
5x more IOPS per node
60 percent more flash capacity per node
4x more memory
34 percent more virtual machines per node
250 percent more bandwidth

Dell EMC PowerEdge 14th generation servers are designed and tailored specifically for HCI workloads, which depend on the capabilities of both servers and storage. This enables us to offer customers enhanced storage capacity and flexibility, allowing them to optimize their storage configurations for their hyper-converged environment. Thanks to higher core counts, faster clock frequency, more memory channels and faster memory, these servers enable VxRack FLEX to deliver significantly faster access to applications and data.

Also new this month are enhanced monitoring, alerting and reporting capabilities. These are a huge priority for our customers running mission-critical applications, because they allow for greater agility and control of server resources. With these new features, failures are quickly identified and Dell EMC Support is informed immediately for speedy resolution.
Proactive alerting and automated technical support mean less time is spent troubleshooting, so more time can be spent on business priorities.

If you’ll be at Dell Technologies World this year, be sure to join our breakout sessions for a deeper dive into our architecture, use cases and workloads:
VxRack FLEX: Introduction & What’s New
VxRack FLEX: Running Mission-Critical Enterprise Applications on HCI
VxRack FLEX: Architecture Overview
VxRack FLEX: Achieving the Best Performance & Availability

If you can’t be there in person, check out some of our recent white papers:
Wikibon White Paper: VxRack FLEX Business Value Findings
IDC Integrated Networking White Paper
Splunk on VxRack FLEX for Machine Data Analytics Paper
ESG Technical White Paper: Running Traditional Databases on VxRack FLEX
ESG White Paper: Modernizing Virtual Infrastructures using VxRack FLEX
ESG White Paper: Enabling Enterprise Cloud Adoption with VxRack FLEX

VxRack FLEX had tremendous momentum coming out of 2017, and with these exciting new releases happening in Q1 and Q2, we expect nothing but acceleration and growth. Reach out to us at any time to discuss the best HCI approach to meet your needs!