The AI Hype Cycle

Marcus Burton, Architect, Cloud Technology
March 16th 2020

Every year, Gartner publishes a hype cycle chart for emerging technologies. Get this: in the 2019 hype cycle, Gartner lists 29 emerging technologies, and at least 16 of them are related to data science, machine learning (ML), and artificial intelligence (AI)! Over half! That’s quite a dose of hype for one chart. The only topic with more industry hysteria might be 5G. But probably not.

ML/AI is definitely here to stay, so we need to understand it. Networking complexity keeps growing while businesses build deeper network integrations, so ML/AI becomes necessary to make networks more autonomous. Nonetheless, not every problem needs “data-driven AI,” since many networking challenges are well-solved by rule-driven, model-driven, or expert-driven approaches (more on this in a future post). But, it’s hard to get to that truth, or to tease apart the realities of general AI vs narrow AI. That’s why we’re embarking on this article series, because we want to sift through the noise and attempt to take an honest look at ML and AI, including the good, bad, and confusing.

To do that, let’s start with a little setup. ML/AI is emerging as the topic du jour because of three primary trends, which we will tackle next: cloud, data, and accessible ML/AI.

Cloud

I know, we beat this topic to death. Give me a chance to explain though, because this cloud thread runs across all industries, products, and solutions. The right kind of cloud makes data science easier. It’s generally true that machine learning workloads in the cloud could be done on-premises (as they say, “cloud is just someone else’s computer”), but doing so is far more complex and costly, more difficult to evolve, less modular, and harder to manage programmatically with a thin IT team focused on network-driven business objectives.

Consider some of the data and performance advantages with cloud solutions:

  • Data Centralization – one of the most common problems with distributed and on-premises management is the need to centralize data visibility. Where your management, configuration, and provisioning flows exist, your troubleshooting, analytics, and data-driven workflows should also exist. If you use the crystal ball, it’s very clear that the two flows are becoming more integrated every day, especially with ML and closed-loop automation. The UI is where you see the results of data science, and it’s most convenient day-to-day if we centralize UI in one place. 
  • Planning for Peak Load – data volume (from client traffic, events, device telemetry, diagnostic data, etc) naturally fluctuates as client load changes throughout days, weeks, and months. On-premises (i.e. “local”) solutions must be provisioned (with storage, compute, memory) for the peaks of fluctuation; but, at any operating load less than peak, capacity (and thus, cost) sits idle. In the cloud—with orchestration and serverless compute—dynamic, unpredictable, and occasional workloads are easy to address at the lowest possible cost and resource commitment.
  • Software Architecture Evolution – in ML data pipelines, the data stack and underlying architecture may need to change occasionally to adopt more modern tools, but it’s difficult for traditional vendors to make these changes smoothly with fixed HW and VM resource allocations on-premises. Conversely, cloud architectures can evolve as needed with no particular form factor limitations or resource constraints, and often transparently for customers (but the new features keep comin’!).
  • Software Change Management – with the fast pace of data science evolution, customers want new data metrics, calculations, diagnostics, and features all the time. This pace of iterative change has proven very difficult to deliver with traditional enterprise software packages without bugs, migration issues, and support handholding. For stability, you’ll need to wait for a reliable maintenance release (e.g. MR2 or MR3) and then sit on it for a long time, which means your solution stops evolving to solve business objectives with data. In an agile cloud, a full-time operations team manages this entire change process in small, controlled, and reliable steps, leveraging test environments and controlled migration alongside high-confidence AP firmware pushes to deliver new data applications.
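To put a rough shape on the peak-load point above, here is a minimal sketch in Python. The hourly load figures are invented purely for illustration, and the "elastic" model is idealized: it assumes you pay only for the capacity each hour actually uses and ignores any difference in unit price between cloud and on-premises resources.

```python
# Hypothetical hourly load for one day, in arbitrary "capacity units".
# These numbers are made up for illustration; they are not measurements.
hourly_load = [20, 15, 10, 10, 15, 30, 60, 90, 100, 95, 90, 85,
               80, 85, 90, 95, 100, 80, 60, 50, 40, 35, 30, 25]


def peak_provisioned_capacity(load):
    """Fixed provisioning must cover the busiest hour, for every hour."""
    return max(load) * len(load)


def elastic_capacity(load):
    """Idealized elastic provisioning: pay only for what each hour uses."""
    return sum(load)


def idle_fraction(load):
    """Share of peak-provisioned capacity that sits unused."""
    provisioned = peak_provisioned_capacity(load)
    return (provisioned - elastic_capacity(load)) / provisioned


if __name__ == "__main__":
    print("Peak-provisioned unit-hours:", peak_provisioned_capacity(hourly_load))
    print("Elastic unit-hours:", elastic_capacity(hourly_load))
    print(f"Idle share under peak provisioning: {idle_fraction(hourly_load):.0%}")
```

With this made-up load curve, a bit over 40% of the peak-provisioned capacity sits idle. Real curves, and real pricing, will of course vary.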

Clearly it’s self-promoting to focus on the benefits of cloud, but it’s real. And cloud also helps us solve the next issue, which is related to data.

Data

To get data science right, you need good data. Shocker, right?! Really though, you need data that is accurate, granular, accessible, and representative of all the diverse interests of your data algorithms (again, more on this in a future post). You need enough of the right data, and the ability to adapt to new data needs quickly. At the same time, too much data creates transport, storage, and processing overhead, which drives up costs and compromises ROI. Thankfully, the “data-driven” mantra has created a technology culture that is increasingly data literate, and in some cases, well equipped with the right balance of accessible data.

You might be surprised, but despite the seeming ubiquity of “big data,” most scalable data systems that provide the kind of performance we need for ML pipelines are cumbersome to manage and maintain because they were built for DevOps teams with very large data centers. Data redundancy and clustering can be difficult to wrangle, corruption happens, and indexing processes hang. Also, increasing data granularity or storage duration to solve data science problems can multiply storage requirements in a hurry, and you can’t just “throw more disk at it” all the time to scale. But, this is largely solved with DevOps’ help operating a distributed data platform, where it becomes possible to leverage data science on a stable foundation. And perhaps it’s another success story of public and private cloud, where small operators get the same benefits of scale and stability without the overhead.
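To see how quickly granularity and retention multiply storage, here is a back-of-the-envelope sketch in Python. Every number in it (device count, metrics per device, bytes per sample, intervals, retention) is a hypothetical assumption chosen for easy arithmetic, and it ignores compression, indexing, and replication overhead.

```python
SECONDS_PER_DAY = 86_400


def storage_bytes(devices, metrics_per_device, bytes_per_sample,
                  sample_interval_s, retention_days):
    """Raw telemetry volume: one sample per metric per interval, kept for N days."""
    samples_per_day = SECONDS_PER_DAY // sample_interval_s
    return (devices * metrics_per_device * bytes_per_sample
            * samples_per_day * retention_days)


# Hypothetical baseline: 5-minute samples kept for 30 days.
baseline = storage_bytes(devices=1000, metrics_per_device=50,
                         bytes_per_sample=16, sample_interval_s=300,
                         retention_days=30)

# Same fleet, but 1-minute samples kept for 90 days.
finer = storage_bytes(devices=1000, metrics_per_device=50,
                      bytes_per_sample=16, sample_interval_s=60,
                      retention_days=90)

if __name__ == "__main__":
    print(f"Baseline: {baseline / 1e9:.1f} GB")
    print(f"Finer and longer: {finer / 1e9:.1f} GB")
    print(f"Growth factor: {finer // baseline}x")
```

Sampling five times as often and keeping data three times as long multiplies raw storage by fifteen, before any redundancy is added.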

Architectures with data-centric approaches have several benefits:

  • Data scientists have a wealth of anonymized (hopefully) data by which to test and choose the best ML technique for specific challenges
  • ML models can be trained on a huge dataset, which leads to higher confidence and accuracy out of the gate
  • End-to-end velocity improves as ML models are designed, trained, deployed, and retrained in an efficient lifecycle

So, cloud and data architectures are ready, but the lynchpin for ML/AI progress is the software.

Accessible ML/AI

If you take a quick survey of the data science industry, there are an endless number of tools and software solutions: data streaming and warehousing options, compute and ETL choices, data engineering platforms, complex event processing applications, transport (e.g. bus/queue) mechanisms, visualization libraries, and on and on. The sea of choice is vast, and growing rapidly as data becomes the language of the next decade.

This tooling ecosystem has completely revolutionized data science in just 5 years. If you rewound the clock just a bit and tried solving complex data problems “back then,” you’d hit new roadblocks (which required custom development work) around every corner. The delay and cost incurred by that one-off development were just too much to make many data projects successful. But, the open-source community and cloud computing giants have made effective data science possible. Several of the best tools today were jumpstarted as internal projects at startups and the Big Five tech companies (Amazon, Apple, Facebook, Google, and Microsoft), and were either open sourced or are available at approachable cost as part of a cloud computing framework.

The accessibility (“democratization,” as they say) of this end-to-end ML/AI ecosystem makes it possible for everyone to build solutions with minimal ML/AI expertise. With even modest investment in ML/AI expertise alongside domain experts, you can build very sophisticated systems with impressive efficiency. And if you think about the value chain as companies are willing to open-source their algorithms, what this means is that the real value is not in the algorithm itself, but in the customer data; we just need to extract it and apply it to solving business problems.

Bringing it All Together

It’s becoming common knowledge that ML/AI is not really a new technology. The theory and basic algorithms have been around for decades. But, for ML to really gain traction, we needed all of these layers of the ecosystem to develop around it. We needed sufficient progress in end-to-end handling of data volume and cost; ubiquitous computing (i.e. mobile) to generate relevant data for business applications; cloud services to lower the barriers of entry and enable rapid prototyping; and flexible software tooling for diverse market applications. And we are finally here. 

Back to Gartner…their so-called “trough of disillusionment” is the post-hype stage when we face the disappointment of reality. If you remember the promises of SDN, the industry was all hyped up for automated everything; the reality of SDN was an incremental process improvement for some use cases, another tool in your toolbelt. ML/AI for networking will also follow the hype cycle, but the trough should be much more exciting. Still, like most emerging technologies, the hype happens because of misinformation: AI is treated as a magic wand. In goes the data, out come perfect insights.

That is the “why” behind this series. Again, ML/AI is here to stay, so we need to understand it. Everyone wants to sell its magical powers; but if we’re honest, there will be phases of evolution, stairsteps of progress with some disillusionment mixed in, as well as some really helpful use cases and operational enhancement too—we’ve already seen some of them. Artificial general intelligence (i.e. systems behaving like data-aware humans) is still distant. In the meantime, as networks are becoming more complex, intelligent data-driven systems are closing the gap.

So here’s the rub. Follow best practices in design. Continue building domain skills. Learn to use data-driven tools. AI is no magic cure for crappy deployments. But ML, AI, and data-driven systems are raising awareness and facilitating problem-solving already, which is why Extreme believes in, and is investing in, all the varieties of data science at our fingertips.

***This is the first of a series of blogs exploring data science, ML, and AI. In upcoming blogs and videos, we’ll get into some of the technology of data science in-depth, and explore some ways Extreme is leveraging data to solve problems.

For more information about Extreme’s intelligent public, private, and local cloud options, here’s a link to our wares: https://www.extremenetworks.com/products/.