How can engineering management effectively address the escalating risks of system failure caused by growing system complexity and technical debt?

CTO in Media, a year ago

You'll need to understand your teams' ability to manage complexity and scale.

Some companies successfully manage size and complexity that would crush others.

You need some level of system observability, and you should keep track of known risks.

If your team is at all mature, you can likely start with a survey asking them what your largest risks are.

Perhaps you have only one person who understands critical areas, or who knows how to deploy or change critical infrastructure or services.

Ensure you have those risks covered.
Test your response processes so the team knows how to work in high stress situations (downtime, outage, DR).
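As a rough illustration of the single-owner risk mentioned above, here is a minimal Python sketch that takes survey answers about who can operate each critical area and lists the areas with too few people. The function name `single_owner_risks`, the `coverage` mapping, and the threshold of two people are all assumptions made for this example, not something from a specific tool.

```python
def single_owner_risks(coverage, minimum=2):
    """Return the areas where fewer than `minimum` distinct people can step in.

    `coverage` is a hypothetical mapping built from a team survey:
    area name -> list of people who say they can deploy or change it.
    """
    return sorted(
        area
        for area, people in coverage.items()
        if len(set(people)) < minimum
    )


if __name__ == "__main__":
    # Illustrative survey results only.
    survey = {
        "payments deploy": ["ana"],
        "database failover": ["ana", "raj"],
        "dns changes": [],
    }
    for area in single_owner_risks(survey):
        print(f"Needs cross-training or documentation: {area}")
```

Anything this flags is a candidate for pairing, runbooks, or a practice drill.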

Track delivery and reliability metrics over time (deployment frequency, change failure rate, downtime, etc.) to see if you are getting slower or less stable.
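One way to watch for that kind of drift is to group deployment records by month and compare the counts and failure rates. The Python sketch below assumes you can export one record per deployment with a date and a flag for whether it caused a failure. The `Deployment` type, its `caused_failure` field, and the `monthly_trends` function are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    day: date
    caused_failure: bool  # assumption: True if the deploy needed a rollback or hotfix


def monthly_trends(deployments):
    """Return {(year, month): (deploy_count, change_failure_rate)} in month order."""
    buckets = defaultdict(list)
    for d in deployments:
        buckets[(d.day.year, d.day.month)].append(d)

    trends = {}
    for month in sorted(buckets):
        items = buckets[month]
        failures = sum(1 for d in items if d.caused_failure)
        trends[month] = (len(items), failures / len(items))
    return trends


if __name__ == "__main__":
    # Illustrative data only.
    history = [
        Deployment(date(2024, 1, 5), False),
        Deployment(date(2024, 1, 20), True),
        Deployment(date(2024, 2, 3), True),
    ]
    for (year, month), (count, rate) in monthly_trends(history).items():
        print(f"{year}-{month:02d}: {count} deploys, {rate:.0%} change failure rate")
```

A falling deploy count together with a rising failure rate over several months is the kind of signal worth raising with the team.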

Also survey your team on their attitude and morale, as an unhealthy technical system often leads to frustration in the team. Let them fix the things that cause the most frustration.

VP of Engineering, a year ago

This may sound obvious, but consider reducing complexity. A great measure of software quality is: how easy is it to change? Engineers sometimes build overly complex systems in anticipation of future scale or business requirements that never materialize. An important part of addressing technical debt is to recognize where this has happened and whether the complexity is really needed.

Sometimes microservices are not worth the complexity; a simpler monolithic application where multiple copies of data are not needed could be faster, more reliable, and easier to change. Over-normalization of a database may be an obstacle to efficiency. Too much code abstraction or code reuse could mean changes are harder to make or cause bugs elsewhere in the system. The simpler an application can be and still meet its requirements, the better.

Director of Information Security, a year ago

Engineering management can mitigate the risks of system failure caused by growing complexity and the demand for quicker delivery by making the case to business leaders and clients that their products, processes, and/or services should be certified against recognized standards.

For example, the most rapidly growing risk area for many IT and telecom systems is cyber threat. As AI tools become more available, cyber bad actors will see their capabilities explode. It's very important for manufacturers and service providers to ensure their products and processes get certified against things such as the ISO 27000 family of standards or NIST SP 800-53. Certification against these standards requires independent auditors to ensure that products and services are as safe as they can be. It keeps engineers and designers “honest” and can act as a marketing advantage with nervous clients who read daily about the latest DoS or ransomware attacks.

VP of IT, a year ago

An effective way to address system complexity and technical debt is to first acknowledge and prioritize the issue. As in the story of the woodcutter who never stops to sharpen his axe, failing to step back and address system complexity increases the risk of system failure. To manage competing priorities, it's crucial to allocate some bandwidth for addressing technical debt in every sprint.
