5 Lessons We Learned from Adding 10 Terabits-per-Second to Our Network in 18 Months Flat

Kyle Okamoto, VP of Network, Technology and Operations, Verizon Digital Media Services

It was the best of times, but it could have been the worst of times. The goal: add a whopping ten terabits per second of network capacity to Verizon Digital Media Services’ Content Delivery Network while simultaneously redoing our entire network foundation. The deadline: 18 months. The number of things that could have gone wrong: too many to count.

But our employees rose to the challenge, and our customers saw a 30 percent improvement in quality, redundancy that went from n+1 to n+2, and less impact from routine maintenance procedures. Now that the smoke has cleared, I can pinpoint the lessons that this massive undertaking taught us. They may involve fiber networks and points of presence (PoPs), but they’re applicable to anyone who’s undertaking an ambitious new project.

Build yourself a crystal ball

During any sort of major company change, you’re probably asking a lot of questions. Ours went something like this: If we add a PoP (a physical location that connects to and helps other devices connect to the internet) in Oman, what impact does it have on London and Ashburn and Amsterdam? If a customer traditionally uses us for small downloads but then decides to start downloading massive amounts of information, how will that affect our modeling and forecasting? If our traffic base is growing at a steady rate, but then a slew of new customers get added onto the platform at unpredictable times due to our expansive growth in media, how does that impact the overall network outlook?

To find the answers, we constructed something I like to call a smarter crystal ball. We built statistical scaling models using regression and sensitivity analyses to better predict where traffic would grow, how quickly and on which specific networks. We shared this data with our partners. We then automated the models to account for traffic engineering methods, spiky customers, and the impact a new PoP or partner would have on other PoPs and partners. This allowed us to get a holistic look at our network, to predict and understand asymmetrical growth and to communicate better downstream.
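To make the idea concrete, here is a minimal sketch of a regression-plus-sensitivity forecast for a single PoP. It is an illustration only, not the models described above: the traffic numbers are invented, the straight-line trend is an assumption, and the function names (fit_trend, forecast, growth_sensitivity) are hypothetical.

```python
# Hypothetical monthly peak traffic for a single PoP, in Gbps.
# These numbers are invented for illustration only.
MONTHS = [0, 1, 2, 3, 4, 5]
PEAK_GBPS = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def fit_trend(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope, intercept, month):
    """Predicted peak traffic (Gbps) at a future month index."""
    return intercept + slope * month


def growth_sensitivity(slope, intercept, month, bump=0.10):
    """Extra Gbps needed at `month` if growth runs `bump` faster than fitted."""
    faster = forecast(slope * (1 + bump), intercept, month)
    return faster - forecast(slope, intercept, month)


slope, intercept = fit_trend(MONTHS, PEAK_GBPS)
print(f"fitted growth: {slope:.1f} Gbps per month")
print(f"forecast at month 12: {forecast(slope, intercept, 12):.1f} Gbps")
print(f"extra if growth is 10% faster: {growth_sensitivity(slope, intercept, 12):.1f} Gbps")
```

The sensitivity figure answers the "what if" side of the question: how much additional capacity a PoP would need if a customer or region grew faster than the fitted trend suggests. A real model would likely cover many PoPs at once and account for how traffic shifts between them.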

For successful scaling up, it’s not enough just to ask the right questions. You need to make sure you build the correct tools to give yourself the answers.

Dive into a parallel universe

Another major question we faced: how to keep everything running smoothly for our customers as we rebuilt the network foundation behind the scenes?

Testing, testing and testing. We ended up building a completely separate environment (a parallel universe, if you will) that we termed Shadow. We directed 0.001 percent of our traffic through this parallel environment, which was running on new equipment, new processes, new everything. This allowed us to test our network on real traffic instead of relying solely on predictions.
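One common way to send a tiny, stable slice of traffic to a test environment is to hash a request identifier and route on the result. The sketch below assumes that approach; the article does not say how Shadow selected its traffic, and the names here (route, SHADOW_NUMERATOR, SHADOW_DENOMINATOR) are hypothetical. The only figure taken from the text is the 0.001 percent share, which works out to 1 request in 100,000.

```python
import hashlib

# 0.001 percent of traffic is 1 request in 100,000.
SHADOW_NUMERATOR = 1
SHADOW_DENOMINATOR = 100_000


def route(request_id, numerator=SHADOW_NUMERATOR, denominator=SHADOW_DENOMINATOR):
    """Return "shadow" or "production" for a request, deterministically.

    The same request_id always lands in the same environment, because the
    decision depends only on a hash of the identifier.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % denominator
    return "shadow" if bucket < numerator else "production"


# Example: count how many of 100,000 synthetic request IDs go to Shadow.
shadow_count = sum(route(f"req-{i}") == "shadow" for i in range(100_000))
print(f"{shadow_count} of 100000 requests routed to shadow")
```

Integer buckets are used instead of a floating-point fraction so that the share is exact. Hashing rather than random sampling keeps a given request or session in one environment, which makes it easier to compare behavior between the two.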

Shadow, our parallel universe, allowed us to segregate our resources and assets at a very granular level and then dole out those resources to specific applications or customers. We tested various hypothetical scenarios out on our parallel environment and came up with innovative ways to manage traffic, such as carving out a network within a network.

In short, we grew familiar enough with our parallel universe that by the time we began implementing changes in the “real world,” we were more than confident that everything would work as it should.

Keep calm and empower employees

As we strove to become a billion-dollar company, we couldn’t keep doing what we were doing and expect it to work on a larger scale. Researchers have learned that large companies have trouble making changes, especially when it comes to implementing innovative ideas (which fail 70-90 percent of the time), because of inertia.

We wanted to make sure we didn’t fall into that trap. As Entrepreneur points out, large companies would do well to take inspiration from the fertile, “beta” stage of startups: let employees take on active roles in creative problem solving, drop hierarchies, take chances and move fast.

So instead of continuing to run everything through the C-suite, we simply released the reins. We empowered the working level to make decisions, form their own teams, sign contracts and run their corners of the business; this meant we were taking the gamble of giving employees with 1-2 years of experience an immense amount of responsibility and accountability.

This wasn’t a blind gamble, though. Researchers have studied what happens when you give back some of the power to your employees, and guess what? It works. One study found that high empowerment leads to higher job satisfaction and lower attrition. Another learned that psychological empowerment was strongly related to job satisfaction and commitment to the organization at large.

I personally watched my employees respond instantly to this new bottom-up structure. For example, a woman on my team who had been here for a mere four months spearheaded an initiative for us to adopt Kanban, the uber-efficient Japanese scheduling system, on our operational side. It streamlined our workflow so successfully that other teams across the company adopted it as well.

When you maintain a start-up attitude and a bit of irreverence toward hierarchy, even employees who are practically brand-new feel empowered enough to suggest ideas, and you never know which idea will change everything.

Hire wisely, especially on a deadline

During times of great scaling, it’s especially important to use your resources smartly. It can seem that throwing money at the problem will speed things up, but taking time to find the right tool, the right hire, will more than pay off.

When we took the time to find unicorns (like the network engineer in Latin America who can speak Portuguese and Spanish, has a deep Rolodex, and knows how to facilitate regulatory and legal contracts over the entire continent), we ended up saving time and money in the long run. Studies have shown that when you find the right high-performing employee, he or she can deliver 400 percent more productivity than your average worker.

Now that’s what I call scaling up fast.
