The recent Max 8 incident was a wake-up call of sorts to modern businesses. It was a visceral reminder of our utter dependence on software. One hundred years ago we became a society that depended on electricity. Today we depend on software, often with life or death consequences.
And it’s not just Netflix, Google or Amazon for whom a technology glitch can cause millions of dollars in losses. It’s all businesses, especially in the aviation industry.
In fact, just six months into his job as the new CIO at Delta, Rahul Samant had his first trial by fire that proved how dependent aviation business models are on technology. All Delta’s IT operations were down for 72 hours, a devastating event Samant vowed would never happen again.
“The outage and resulting operational disruption we recently experienced shows just how important a solid, reinforced IT foundation is to our business,” said Samant in a recent Delta.com article.
Rising operating costs, despite record-breaking revenue, are putting enormous pressure on profitability. Fuel costs, wage increases, and real estate appreciation are important contributors to this equation.
According to the Deloitte report, “Travel brands are already leaning on a mix of cost-cutting strategies to rein in expenses…” Part of the cost-cutting must include a reduction in IT spending while maximizing efficiency and optimizing operations.
Maintaining and growing profitability can’t depend solely on cost-cutting. Airlines also need to maximize revenue streams by continuing to appeal to value travelers, while simultaneously attracting the growing market segment that values premium services and is willing to pay more for them.
This includes the need to fuse marketing and IT functions, and hire technology leaders like Maya Leibman, American Airlines’ CIO, whose pedigree includes member acquisition, retention, marketing, co-branding, and analytics while leading AA’s Revenue Management department. Dynamic pricing needs to become more sophisticated so as to deliver the right offer to the right person at the right time using the right channel. Airlines need new dynamic pricing engines that can update fares as quickly as every 15 seconds. This cannot be achieved with simple plug-and-play solutions.
Consequently, carriers need to establish real-time integrations between CRM and revenue management systems, among other sophisticated structural data management processes and modern digital architectures.
Passengers have become more demanding. Amazon and other technology behemoths have created a desire for flawless, instant service. Competition from international carriers has also highlighted some of the service challenges of U.S. carriers. Passengers today are less tolerant of poor service than ever before.
Part of the solution lies in near 100% system availability, especially for customer-facing solutions that enable self-check-in and ticketing. But systems that leverage predictive analytics, cognitive computing and just-in-time disaster planning can also help prevent costly delays due to the perfect storm of travel surges, weather events and system downtime.
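To make "near 100%" concrete, it helps to translate an availability target into the downtime it actually permits. The snippet below is a minimal sketch of that arithmetic; the function name and the three sample targets are our own illustrative choices, not figures from any airline.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


# Illustrative targets only (assumed, not taken from any carrier's SLA).
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

At 99.9% the yearly allowance works out to under nine hours, so a single 72-hour outage like the one described earlier would exceed it many times over.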
Delays cost airlines more than $68 per block minute, not to mention angering travelers who are more than willing to resort to social media when things go wrong (and some airlines have resorted to blocking these angry travelers!).
Marc Andreessen’s prescient 2011 prediction that software is eating the world is the status quo today. It’s not just the internet giants whose business models are software-centric, but traditional businesses as well, including (and especially) aviation.
Think of baggage handling, aircraft and passenger operations, and customer-facing apps for ticketing, check-in, notifications about changes or delays, and more.
Given the current context, we propose a 5-pillar technology framework for airlines to modernize their technology operations.
As Rahul Samant’s experience with the 72-hour Delta outage shows, downtime is unacceptable. Yet airline IT systems are still playing catch-up.
The solution? Site Reliability Engineering, or SRE, a framework pioneered by Google and the hottest trend in technology infrastructure management in an always-on, DevOps-centered world.
According to Google, “SRE is what happens when you ask a software engineer to design an operations team.”
In the traditional sysadmin approach to infrastructure management, there was a built-in conflict between the software engineering team and systems administrators. Software engineers create new functionality and need to launch it into the wild. Systems administrators want to keep launches of new functionality to a minimum, because change causes disruption and potential failure.
Systems admins take a manual approach to fixing problems after the fact. They are pager-driven, and thus reactive. When a disaster happens, they become heroic firefighters.
But today you’d need a million-man army of systems admins just to keep up. In the current DevOps environment updates flow like water from a broken dam. The scale of technology operations makes throwing more labor at the problem cost-inefficient and unscalable, not to mention dangerous to your business model.
Site Reliability Engineering means that software engineers take care of tasks that used to be handled by sys admins. And a software engineer looks at things in a fundamentally different manner. If an SRE is forced to do something manually, they will do it one time, and reluctantly. Then they will ask themselves: “How can I automate this with software from now on?”
That’s the essence of Business Reliability. Airline operations need a software engineering approach to their mission critical operations. We can break this down into 9 parts:
To undergird business reliability, airlines need technology resiliency. Resilient hybrid technology platforms are the foundation for a solid infrastructure, and consist of the following:
And if our business brethren thought technology was too abstract, now we can put the “sexy” into technology.
With a visual critical business dashboard, technology and business executives can set up a NASA-like control room that enables predictive monitoring, with triggers for proactive support to ensure availability of critical business functions.
This can include an incident response management dashboard to reduce the mean time to detect (MTTD) and mean time to resolve (MTTR) for incidents, including potentially catastrophic outages that can knock airline operations offline for hours.
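For readers who want to see how those two numbers are derived, here is a minimal sketch. It assumes each incident record carries three timestamps, and it measures MTTR from the start of the incident to full resolution; some teams measure from detection instead. The `Incident` class, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    started_at: datetime   # when the fault began
    detected_at: datetime  # when monitoring or a person noticed it
    resolved_at: datetime  # when service was fully restored


def mean_minutes(durations: list[timedelta]) -> float:
    """Average a list of durations and express the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mttd_minutes(incidents: list[Incident]) -> float:
    """Mean time to detect: average gap between fault start and detection."""
    return mean_minutes([i.detected_at - i.started_at for i in incidents])


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to resolve: average gap between fault start and resolution."""
    return mean_minutes([i.resolved_at - i.started_at for i in incidents])


# Hypothetical sample data, for illustration only.
incidents = [
    Incident(datetime(2019, 1, 5, 8, 0), datetime(2019, 1, 5, 8, 10), datetime(2019, 1, 5, 9, 0)),
    Incident(datetime(2019, 2, 11, 14, 0), datetime(2019, 2, 11, 14, 30), datetime(2019, 2, 11, 16, 0)),
]

print(f"MTTD: {mttd_minutes(incidents):.0f} minutes")
print(f"MTTR: {mttr_minutes(incidents):.0f} minutes")
```

Tracking both figures separately matters: a long MTTD points to gaps in monitoring, while a long MTTR with a short MTTD points to gaps in the response itself.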
Airlines need an organized way to resolve, prevent and proactively manage incidents, from minor to emergency level. The complex technology landscape at most airlines, a mix of legacy technology and bolted-on applications from different eras over the last thirty years, means that potentially devastating outages can come from anywhere.
A critical incident response framework establishes processes for escalating from non-critical to major incidents, ensuring that the right people are engaged and the right responses are executed at the right time. The goal is to keep MTTR to a minimum and to prevent disasters from happening in the first place.
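One way to picture such a framework is as a simple escalation rule: the more severe the incident and the longer it stays open, the higher up the chain it goes. The sketch below is only an illustration under assumptions of our own. The severity tiers, the responder roles, and the time thresholds are hypothetical, and a real framework would be tuned to each airline's organization.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    EMERGENCY = 3


# Hypothetical escalation chain, lowest level first.
ESCALATION_CHAIN = ["on-call engineer", "team lead", "incident commander"]

# Hypothetical thresholds: minutes an open incident may stay at one level
# before it moves up to the next.
ESCALATE_AFTER_MINUTES = {
    Severity.MINOR: 60,
    Severity.MAJOR: 15,
    Severity.EMERGENCY: 5,
}


def responder_for(severity: Severity, minutes_open: int) -> str:
    """Return who should own an incident, given its severity and age."""
    if minutes_open < 0:
        raise ValueError("minutes_open must not be negative")
    start_level = severity - 1  # more severe incidents start higher up the chain
    steps_up = minutes_open // ESCALATE_AFTER_MINUTES[severity]
    level = min(start_level + steps_up, len(ESCALATION_CHAIN) - 1)
    return ESCALATION_CHAIN[level]


print(responder_for(Severity.MINOR, 10))
print(responder_for(Severity.MINOR, 70))
print(responder_for(Severity.MAJOR, 20))
```

Encoding the rule in software, rather than in a binder on a shelf, is what lets the right person be paged automatically instead of waiting for someone to decide an incident has become serious.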
By taking an SRE approach we build an IT operation that drives bottom-line results while optimizing IT operations.
The traditional approach is to throw bodies at problems. The mindset was: the more people we hire to “handle” tickets, the better we can manage the constant, never-ending flow of incidents.
With an SRE approach, costs go down while value goes up. As we stated earlier, software engineers view operations differently. An SRE professional never saw a manual approach they wouldn’t try to automate.
In 2011 Marc Andreessen said that software was eating the world. In 2019 that’s yesterday’s news. We’ve already crossed the Rubicon. And among industries that predate the internet, airlines are on the leading edge of that transformation.
But IT operations for the most part have lagged behind. It’s time for the airline industry to implement a software-engineering approach to managing operations.
And it’s not just a technology imperative. It’s a business imperative. Maintaining near 100% uptime, flexibly and dynamically delivering the right offer at the right price to the right people at the right time, and cutting costs while optimizing and modernizing operations are all critical to business survival.
Today, software can mean life and death, as we saw with the Max 8. And for business operations, it’s a matter of business life or death.