Should production support always be separate from sprint execution?

VP of IT in Software, 4 years ago

No... it should not always be separate from sprint execution.

It gets complicated after that, but here are a few things to consider.

If it is a question of developers wanting to push code and ops wanting to preserve stability, then you have a problem. This is the conflict DevOps was trying to solve, and separating the teams on that basis is not a good approach.

If you think ops takes too much time away from projects, that isn't a good reason either. All work is work, and you want to work on the most valuable work, which may be ops or feature development. Creating a bucket based only on the source of the work means you are expending resources on less valuable things, reducing the total value delivered.

BUT... unplanned work is very disruptive. Production support has a lot of false alarms and busy work.

1. I create a team to catch the noise. I try to automate the noise away, but until I do, I don't want a bunch of it to hit my development teams.

2. Once we get past the noise, if there is only a small number of incidents, then I can have the development team support them. I generally reserve capacity for this each sprint based on the expected amount, then have the product owner prioritize anything in excess.
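The capacity reservation in point 2 could be sketched roughly as below. This is only an illustration under stated assumptions: work is measured in story points, the "expected amount" is taken as the average incident load of recent sprints, and the names `plan_sprint` and `triage` are hypothetical, not something the poster described.

```python
from dataclasses import dataclass


@dataclass
class SprintPlan:
    reserved_points: float  # capacity held back for production incidents
    feature_points: float   # capacity left for planned sprint work


def plan_sprint(team_capacity, past_incident_points):
    """Reserve the average incident load of recent sprints (assumed heuristic)."""
    if past_incident_points:
        expected = sum(past_incident_points) / len(past_incident_points)
    else:
        expected = 0.0
    reserved = min(expected, team_capacity)  # never reserve more than the team has
    return SprintPlan(reserved, team_capacity - reserved)


def triage(incident_points, reserved_points):
    """Split incoming incidents into those the reserve absorbs and the excess.

    The excess list is what the product owner would prioritize against
    feature work.
    """
    absorbed, overflow, used = [], [], 0.0
    for points in incident_points:
        if used + points <= reserved_points:
            absorbed.append(points)
            used += points
        else:
            overflow.append(points)
    return absorbed, overflow
```

For example, a team with 40 points of capacity that spent 6, 10, and 8 points on incidents in its last three sprints would hold back 8 points and plan 32 points of feature work.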

For more complex environments....

3. For modern applications, I create an SRE team that is focused on the operational/scaling/instrumentation/automation side of things. Developers often aren't good with these things. This team starts by converting all of the operational decisions into business-facing metrics that the business will care about if they are failing. I use these to balance work between running down technical debt and feature development. Otherwise the SRE team focuses on toil reduction, automation, scalability, latency planning, etc. and can change the code to increase resilience.

4. The SRE and Development teams are a single resource bucket. If resilience is below business requirements, resources shift to the SRE team until it is back on target. If resilience is above targets, then resources shift back to the feature team to accelerate development.
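The rebalancing rule in point 4 can be written down as a small function. This is a minimal sketch, assuming resilience is expressed as a single number (for example an availability ratio) compared against a business target, and that people move one at a time; the function name, the `step` size, and the `min_per_team` floor are all hypothetical.

```python
def rebalance(sre_engineers, feature_engineers, measured, target,
              step=1, min_per_team=1):
    """Return the new (sre, feature) headcount after one review period.

    measured and target are the same resilience metric, e.g. availability.
    Both teams draw from one pool, so the total headcount never changes.
    """
    if measured < target and feature_engineers - step >= min_per_team:
        # Below the business requirement: shift people toward resilience work.
        return sre_engineers + step, feature_engineers - step
    if measured > target and sre_engineers - step >= min_per_team:
        # Above target: shift people back to accelerate feature development.
        return sre_engineers - step, feature_engineers + step
    # On target, or a team is already at its floor: leave the split alone.
    return sre_engineers, feature_engineers
```

With 3 SRE and 7 feature engineers, a measured availability of 0.995 against a 0.999 target would move one person to the SRE side, giving a 4/6 split.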

In both cases, though, the development team owns accountability for the resilience and product performance of its application. Everyone involved with development/operations is measured by speed, throughput, ops cost per unit of value, product stability, and customer and employee satisfaction, so we don't have different measurement systems.

Director of Information Security in Energy and Utilities, 4 years ago

You are really splitting ops from dev at this point. Ideally, you have people who lend L3/deep engineering support to your operations staff following each release. That way, the experienced people who developed the features are available to support them if something goes wrong and to educate your ops people on supporting them in the future. If your devs only develop and then immediately switch to the next sprint without spending any time supporting ops as they deal with new release issues, you will have pretty big knowledge gaps (unless you have a dedicated SWAT team in place).

Chief Information Officer in Software, 4 years ago

I think it's a cyclical thing, it's not a one and done decision. They get tired of this separation of production support from development and the production support team feels like they never get to develop and work on new features. So you go back to the 80/20 model but then you think, "We're not able to meet our commitments because the 20% soon becomes 50%." And you go back to production support.

It's okay to balance the two. What I'm proposing for now is separation because my team is not able to focus enough or have the discipline for 80/20. We're splitting the two but with the promise that we will have options to rotate in and out, since we're growing 50% plus year over year. So what I've said is, "As we grow and as you excel, you have an opportunity to go into the scrum teams, and then we’ll get new people in on the production support side." Some people just want to be in production support and that's fine too.

no title, 4 years ago

In an enterprise, if you're embracing things like scaled agile framework (SAFe) or LeSS, you have that improvement sprint that allows you to get all those user stories in so that you can methodically start working on those. But it becomes mechanical after a while.

People want to work on things that are exciting and actually bring delight. Gamifying it is something that our developers and operators really love. Every quarter we used to have an entire day of games where we would pull the user stories, squash as many bugs as possible, and then celebrate over beer. So try to gamify production support, otherwise, it becomes a chore and people will not embrace the true spirit of why we're doing it. That's something I would caution.

Senior Executive Advisor in Software, 4 years ago

I like the philosophy that if you create/build your feature, then you'll carry that feature with you and fix it in production. Because that is the promise of an agile team: you created it and if it's broken, you get to fix it. But it doesn't scale and you definitely want to have a separate organization that is focused on stability rather than feature development.

When I was running engineering and operations, one of the things that we did was dedicate 20% of the developers' time to proactive repair maintenance for any of the user stories that come out of production incidents. We called them improvement sprints and that time was baked into their day-to-day work. That was the only way that we were able to reduce while increasing the reliability on our platform and this was aside from all the innovation.

It was exciting for a lot of my developers because they were happy to be squashing so many bugs that could potentially come out in production. So we do need to have two different teams but at the same time, we still want to bring that agility of the combined approach and have visibility into problems all the way up to development so they can come in and fix them proactively.

no title, 4 years ago

That's a very good point because the IT operations side won’t be in a position to go back in and do lots of development, so it will go back to the bug fixes that need to be done by the development side of the house. But I think the key point is that the combined approach is also an incentive for developers to get it right the first time, otherwise they'll have to fix the mess they created.

CIO / Managing Partner in Manufacturing, 4 years ago

I recently started as CIO in an automotive parts manufacturing company that's growing very rapidly. They've now far outgrown their small IT group but from their history, production support and the actual development and project work were done by the same small number of people. The trouble is that when you have that situation, regardless of whether you're using agile or waterfall, the fire of the day disrupts the project. Then your projects are always late and always over budget.

So as you start to scale you have to separate that out. You have an IT operations group that runs your infrastructure, network, and applications on a day-to-day basis. You have your solution delivery piece that is all project-related. Apart from avoiding any disruption of projects, the key reason for doing that is actually different mindsets. The mindset of a group running IT operations is around metrics of uptime, number of incidents, number of problems solved—solving the root cause, the Pareto principle of common things that are happening, etc. The metrics of managing a project are around scope, budgets, timeline, etc. Because they're very different mindsets, if you have the two together then you have an inherent conflict.

no title, 4 years ago

I would agree with that. Developers are focused on pushing as many new features as possible, while the IT operations team wants to maintain stability, resiliency and scalability.
