Should production support always be separate from sprint execution?



CIO / Managing Partner in Manufacturing, 2 - 10 employees
I recently started as CIO in an automotive parts manufacturing company that's growing very rapidly. They've now far outgrown their small IT group, but historically production support and the actual development and project work were done by the same small number of people. The trouble is that when you have that situation, regardless of whether you're using agile or waterfall, the fire of the day disrupts the project. Then your projects are always late and always over budget.

So as you start to scale you have to separate that out. You have an IT operations group that runs your infrastructure, network, and applications on a day-to-day basis. You have your solution delivery piece that is all project-related. Apart from avoiding any disruption of projects, the key reason for doing that is actually different mindsets. The mindset of a group running IT operations is around metrics of uptime, number of incidents, number of problems solved—solving the root cause, the Pareto principle of common things that are happening, etc. The metrics of managing a project are around scope, budgets, timeline, etc. Because they're very different mindsets, if you have the two together then you have an inherent conflict.
Senior Executive Advisor in Software, 10,001+ employees

I would agree with that. Developers are focused on pushing as many new features as possible, while the IT operations team wants to maintain stability, resiliency and scalability.

Senior Executive Advisor in Software, 10,001+ employees
I like the philosophy that if you create/build your feature, then you'll carry that feature with you and fix it in production. Because that is the promise of an agile team: you created it and if it's broken, you get to fix it. But it doesn't scale and you definitely want to have a separate organization that is focused on stability rather than feature development. 

When I was running engineering and operations, one of the things that we did was dedicate 20% of the developers' time to proactive repair and maintenance for any of the user stories that came out of production incidents. We called them improvement sprints and that time was baked into their day-to-day work. That was the only way that we were able to reduce incidents while increasing the reliability of our platform, and this was aside from all the innovation.

It was exciting for a lot of my developers because they were happy to be squashing so many bugs that could potentially come out in production. So we do need to have two different teams but at the same time, we still want to bring that agility of the combined approach and have visibility into problems all the way up to development so they can come in and fix them proactively.
CIO / Managing Partner in Manufacturing, 2 - 10 employees

That's a very good point because the IT operations side won’t be in a position to go back in and do lots of development, so it will go back to the bug fixes that need to be done by the development side of the house. But I think the key point is that the combined approach is also an incentive for developers to get it right the first time, otherwise they'll have to fix the mess they created.

Chief Information Officer in Software, 1,001 - 5,000 employees
I think it's a cyclical thing; it's not a one-and-done decision. Teams get tired of the separation of production support from development, and the production support team feels like they never get to develop and work on new features. So you go back to the 80/20 model, but then you think, "We're not able to meet our commitments because the 20% soon becomes 50%." And you go back to separate production support.

It's okay to balance the two. What I'm proposing for now is separation because my team is not able to focus enough or have the discipline for 80/20. We're splitting the two but with the promise that we will have options to rotate in and out, since we're growing 50% plus year over year. So what I've said is, "As we grow and as you excel, you have an opportunity to go into the scrum teams, and then we’ll get new people in on the production support side." Some people just want to be in production support and that's fine too.
Senior Executive Advisor in Software, 10,001+ employees

In an enterprise, if you're embracing things like the Scaled Agile Framework (SAFe) or LeSS, you have that improvement sprint that allows you to get all those user stories in so that you can methodically start working on them. But it becomes mechanical after a while.

People want to work on things that are exciting and actually bring delight. Gamifying it is something that our developers and operators really love. Every quarter we used to have an entire day of games where we would pull the user stories, squash as many bugs as possible, and then celebrate over beer. So try to gamify production support, otherwise, it becomes a chore and people will not embrace the true spirit of why we're doing it. That's something I would caution.

Director of Information Security in Energy and Utilities, 5,001 - 10,000 employees
You are really splitting ops from dev at this point, and ideally you have folks who lend L3/deep engineering support to your operations folks following each release. This way the experienced folks who developed the features are available to support them in case they go wrong and to educate your ops people on supporting them in the future. If you have devs who only develop and then immediately switch to the next sprint without spending any time supporting ops as they deal with new release issues, then you'd have pretty big knowledge gaps (unless you have a dedicated SWAT team in place).

VP of IT in Software, 10,001+ employees
No... it should not always be separate from sprint execution.

It gets complicated after that, but here are a few things to consider in the answer.

If it is a question of developers wanting to push code and ops wanting to preserve stability, then you have a problem. This is what DevOps was trying to solve. This is not a good approach.

If you think ops takes too much time away from projects, that isn't a good answer either. All work is work. You want to work on the most valuable work, which may be ops or feature development. Creating a bucket based only on the source of the work means that you are expending resources on less valuable things, reducing the total value delivered.

BUT... unplanned work is very disruptive.  Production support has a lot of false alarms and busy work.

1. I create a team to catch the noise.  I try to automate the noise away but until I do I don't want a bunch of it to hit my development teams.  

2. Once we get past the noise, if there is a small number of incidents, then I can have the development team support it. I generally reserve capacity for this each sprint based on the expected amount, then have the product owner prioritize anything in excess.

For more complex environments....

3. For modern applications, I create an SRE team that is focused on the operational/scaling/instrumentation/automation side of things. Developers often aren't good with these things. This team starts by converting all of the operational decisions into business-facing metrics that the business will care about if they are failing. I use this to balance work between running down technical debt and feature development. Otherwise the SRE team focuses on toil reduction, automation, scalability, latency planning, etc. and can change the code to increase resilience.

4. The SRE and development teams are a single resource bucket. If resilience is below business requirements, resources shift to the SRE team until it is back on target. If resilience is above targets, then resources shift back to the feature team to accelerate development.
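
To make points 2 and 4 concrete, here is a rough sketch in Python of how the two rules might look if you wrote them down. Everything in it is an assumption for illustration only: the names `Incident`, `plan_sprint`, and `rebalance`, the use of story points, the example 99.9% target, and moving one person at a time are hypothetical and not something described above.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    title: str
    points: int          # estimated effort in story points (assumed unit)
    business_value: int  # higher means it ranks first (assumed scale)


def plan_sprint(capacity, expected_support, incidents):
    """Point 2: reserve support capacity up front; anything that does
    not fit in the reserve goes to the product owner to prioritize."""
    reserve = min(expected_support, capacity)
    accepted, overflow, used = [], [], 0
    ranked = sorted(incidents, key=lambda i: i.business_value, reverse=True)
    for incident in ranked:
        if used + incident.points <= reserve:
            accepted.append(incident)
            used += incident.points
        else:
            overflow.append(incident)
    return {
        "feature_points": capacity - reserve,
        "accepted": accepted,
        "overflow_for_product_owner": overflow,
    }


def rebalance(sre, feature, resilience, target, step=1):
    """Point 4: one shared pool of people. Move `step` people toward
    SRE while resilience is below target, and back when it is above."""
    if resilience < target and feature >= step:
        return sre + step, feature - step
    if resilience > target and sre >= step:
        return sre - step, feature + step
    return sre, feature


# Hypothetical sprint: 40 points of capacity, 8 expected for support.
plan = plan_sprint(40, 8, [
    Incident("Retry storm", 5, 9),
    Incident("Log noise", 5, 3),
    Incident("Slow report", 3, 6),
])

# Hypothetical pool: 2 SRE and 8 feature engineers, target 99.9%.
below_target = rebalance(2, 8, resilience=0.995, target=0.999)
above_target = rebalance(3, 7, resilience=0.9995, target=0.999)
```

In this made-up sprint the two highest-value incidents fill the 8-point reserve and the third is left for the product owner to rank against feature work. A real version would probably use a tolerance band around the target so people are not moved back and forth every time the metric wobbles.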

In both cases though, the development team owns accountability for the resilience and product performance of their application. All people involved with development/operations are measured by speed, throughput, ops cost per unit of value, product stability, and customer and employee satisfaction. So we don't have different measurement systems.