A Guide to Your SLAs | Part 2
In our last post, CloudCover’s Director of Data Science, Aaron Sorensen, talked about how Service Level Agreements (SLAs) are typically defined and measured, and how AI is helping make service delivery more efficient. Aaron has had a long career in maintenance management and can speak with clarity about some of the technical details that trip up clients and vendors alike.
In the second part of our interview, Aaron discusses the flexibility of SLAs, the importance of relationships between TPMs and OEMs, and how service is changing due to advances in AI technology. We have also provided definitions of key terms and some additional context about maintenance management.
Defining What a “Four-Hour Response Time” Means in an SLA
Service-level agreements often guarantee 7x24x4 service, meaning you can log a ticket 24 hours a day, 7 days a week, with a four-hour response time to resolve problems. But what “four hours” means in practice is often ambiguous, because there is usually a gap between problem identification and the problem diagnosis that precedes any response. Aaron explains something many clients don’t recognize, which is that a four-hour response time isn’t always necessary:
Lots of customers say, ‘I need four-hour service for everything.’ We can perform that, but it’s a lot less expensive if the SLA were not just four hours for everything, but maybe [only] for critical failures—meaning productivity is affected, productivity is down. You need four-hour service for anything that’s an escalated case. But if it’s a hard drive failure, maybe that’s OK to sit down [with us] and we can give you a price that matches those types of SLAs. In fact, we can build a smart quote around everything so the AI can look at all of the equipment types and say, ‘some of this equipment is really, really valuable. It needs to be up 100 percent of the time. Some of the equipment [problems], like drive failures, aren’t necessarily as big of a deal.’
If you have a storage array with [lots of] drives in it, you don’t necessarily need every single drive replaced within four hours. In fact, you could wait a couple of weeks to get three or four drive failures or 10 drive failures. Those are realities that exist in [our] customer system that we can manage through a better quote that’s less expensive for our customers.
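The tiered approach Aaron describes can be sketched in a few lines of Python. This is only an illustration, not CloudCover's actual quoting logic: the `Ticket` fields, the set of routine failure types, and the 14-day window for routine failures (standing in for "a couple of weeks") are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical response targets; real values come from the negotiated SLA.
RESPONSE_TARGETS = {
    "critical": timedelta(hours=4),
    "routine": timedelta(days=14),  # assumed stand-in for "a couple of weeks"
}

# Hypothetical list of failures that can usually wait, per the interview.
ROUTINE_FAILURES = {"hard_drive", "battery"}


@dataclass
class Ticket:
    asset: str
    failure_type: str
    productivity_affected: bool
    escalated: bool = False


def severity(ticket: Ticket) -> str:
    """Classify a ticket as 'critical' or 'routine'."""
    # Anything that hurts productivity or has been escalated is critical.
    if ticket.productivity_affected or ticket.escalated:
        return "critical"
    if ticket.failure_type in ROUTINE_FAILURES:
        return "routine"
    # Unknown failure types default to the stricter tier.
    return "critical"


def response_target(ticket: Ticket) -> timedelta:
    """Return the response window that applies to this ticket."""
    return RESPONSE_TARGETS[severity(ticket)]


drive = Ticket("storage-array-1", "hard_drive", productivity_affected=False)
switch = Ticket("core-switch", "port_failure", productivity_affected=True)
print(response_target(drive))   # routine window
print(response_target(switch))  # four-hour window
```

Defaulting unknown failure types to the critical tier is a deliberate choice in this sketch: it keeps the cheaper, slower window limited to failures the customer has explicitly agreed can wait.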
How AI is Impacting Service Levels
As we explained in our last post, IT maintenance support is often grouped into tiers of service: level one covers help desk assistance, level two covers more in-depth technical support, and level three covers direct assistance from a product and service support specialist. Aaron provided some additional details about how these service levels function in practice:
A level one agent is an agent who may not have all the experience in the world working with failures. Typically, we allow level one agents to manage hard drive failures, battery failures, real routine stuff. They’ll look up part numbers. Oftentimes, in this day and age, we [CloudCover] can have an AI doing a lot of that [work]. A level one agent will come in and just look over [what the AI has done] and make sure that it’s accurate.
The benefit of a provider using AI is that problems don’t have to wait for a live person to be available before diagnosis and resolution can begin, so the customer gets faster and more accurate service delivery. In addition to diagnostic functionality, AI can support TPM providers through chatbots and other customer interface applications:
What I’m seeing take place is not that AI is taking over anybody’s job, but it’s allowing us to move level one [agents] into different responsibilities. Really, what we’re [CloudCover is] able to do is keep a smaller service desk, but hyper-use in real focused areas on level two and level three [problems]. Level three guys are becoming much better customer communicators because they’re more needed in bridge lines. They’re often times needed to talk to an executive about what they’re seeing fail rather than needing them to diagnose. So we’re moving our level twos into diagnostic [roles]. Our level ones are managing, helping to validate the part numbers that they’re seeing [failures in]. And so we’re using level ones, level twos, level threes a little differently because of A.I. But it hasn’t cost anybody their jobs. It’s making really dynamic and fast service possible using people in really different ways.
How Do You Negotiate an SLA?
An SLA is a legally binding contract, so it’s important to get the terms of service correct. Understanding why you need certain service levels is a step most companies don’t realize they can take. Aaron discusses how you can negotiate an SLA by working with a vendor that can customize it for your organization. With a truly intelligent service provider, customers should pay only for what they need.
First of all, anything can be negotiated with anybody at any time. It’s just, “how much are we willing to spend?” The reason why OEMs and typically TPMs want to define the SLAs the way they do, because they want every ticket to look the same so that a level one or two or three [agent] knows how to process it. They try to make everything as uniform as possible so that the limitations of people aren’t showing up, so we can manage those things quickly. The funny thing is that AI is making all of that go away. AI can read and treat cases totally differently and see it the same way every single time, so [AI] can manage thousands of different types of scenarios really easily.
The way that service delivery is being shaped, and the way it will look in two to five years, it’s going to be very different than it looks today. And so that’s kind of what we’re seeing. But ultimately, SLAs are always renegotiated and can be negotiated with the OEM or the TPM. It doesn’t really matter. In fact, you can have an OEM who says, no longer is this product being sold, no longer is this product under support. But if you’re big enough, you have enough money, they’ll give you support for those devices. So the reality is, all those rules that are hard, ironclad rules are open to change. At any point in time, you can negotiate just about anything you need to. So whatever works best for your company is how generally TPMs and OEMs [can] work around you.
How Does Hybrid Maintenance Impact SLAs?
Hybrid maintenance management refers to the practice of combining original manufacturer and third-party maintenance solutions in order to customize maintenance packages, cut down on costs, and optimize the service levels needed for each asset. Aaron talked about some specific examples that illustrate CloudCover’s approach to hybrid maintenance:
[Networking] is a great example. You have not only hardware that’s being managed by the OEM, so they’ll warranty all the hardware for you, but they’ll also warranty the software, and that becomes a huge caveat for customers to get their mind around. Legally, we can’t. What that means is I can’t reproduce firmware or sell you firmware or give you firmware that was from the OEM and then charge you for it. It’s illegal for us to do it. So how do we, how do we provide service around firmware or bug reports or software that’s being serviced through the OEM? We do it through the OEM.
If hybrid maintenance is part of your proposed SLA, you want to ensure that the TPM provider has a relationship with the OEM:
The reality is that not every TPM is the same. Not every TPM is able to do these things. What you want is a TPM to hold a relationship with the OEM, so that if they’re covering the hardware on network equipment, they’re able to manage any piece of hardware that fails. They’re also able to look at potential bugs that come up or software problems that come up. We have a TAC that’s more than capable of evaluating any problem that comes up on any port or any switch or any network device at all. But they’re also able to say, ‘Hey, we’re going to recommend that you upgrade your software to this level. Here’s the level that you need. And I’m going to go ahead and place a call with the OEM, the manufacturer, to help facilitate the transfer of that software.’
This is exactly how CloudCover functions:
And so that’s what we’ll end up doing. And we work with OEMs all day and all night to produce all of the software that’s required because the OEM really isn’t warrantying their software in the same way they’re warrantying their hardware. They’re giving new software support for the life of the equipment, as long as they build this software. So they’ll offer that to all of their customers, and by extension, we’re able to work with them through that and offer that same thing to [the customer].
Software and Firmware Updates
IT hardware needs frequent software and firmware updates, at least until it reaches end-of-life (EOL), at which point updates become less frequent and may eventually cease altogether. Aaron addresses why TPM providers are well-positioned to service assets that have reached EOL:
OEMs are really faithful about updating their software. And the reason why is because bugs show up. You have a switch that has all sorts of problems three years after it was manufactured, right? Those problems may not be hardware related. What we find typically is those software [errors] are a bug. In fact, most of the time they’re known bugs. And so the OEM builds software patches to fix those known bugs and sends it out, or will send it out upon requests. The reality is those bugs take place all the time, and the OEMs are always repatching, rebuilding patches, sending those patches out for all of this equipment that they service. The problem becomes when the OEM stops providing software service to customers for particular product lines. And so that’s really an end-of-service life kind of a situation that takes place for OEMs. That does not mean that that equipment is now null and void and you need to throw it all away. It’s still something that can be maintained by a TPM. It’s just no new software updates are coming out for it, which may be fine because that level of software may be perfectly adequate. And you won’t experience new bugs on it either.
It’s okay to have questions about your SLAs. Just be sure the service providers you work with are asking you questions about your requirements rather than defaulting to a blanket quote. The goal should be to get coverage for exactly what your business needs.