Has anyone had any success evaluating the impact of generative AI tools such as GitHub Copilot on developer productivity or performance? I see a lot of qualitative discussion where developers say they are more productive, but how are you actually measuring that impact?

One of the things we're looking at is the number of commits, PRs, and deploys in a repo before and after devs start using Copilot. We're also considering test coverage percentage and post-deployment issues.
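For anyone wanting to try the before/after comparison, here is a minimal Python sketch of the idea. It assumes you can export weekly counts (merged PRs here, but commits or deploys would work the same way) and that you know the date Copilot was rolled out. The function name, the data, and the adoption date are all made up for illustration.

```python
from datetime import date
from statistics import mean

def before_after_change(weekly_counts, adoption_date):
    """Compare mean weekly counts before and after an adoption date.

    weekly_counts: list of (week_start, count) tuples.
    Returns (mean_before, mean_after, relative_change).
    """
    before = [count for week, count in weekly_counts if week < adoption_date]
    after = [count for week, count in weekly_counts if week >= adoption_date]
    if not before or not after:
        raise ValueError("need data on both sides of the adoption date")
    mean_before = mean(before)
    mean_after = mean(after)
    if mean_before == 0:
        raise ValueError("mean before adoption is zero; relative change is undefined")
    return mean_before, mean_after, (mean_after - mean_before) / mean_before

# Hypothetical weekly merged-PR counts for one repo (illustrative only).
weekly_prs = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 8), 12),
    (date(2024, 1, 15), 8),
    (date(2024, 1, 22), 11),
    (date(2024, 1, 29), 13),
    (date(2024, 2, 5), 12),
]
adoption = date(2024, 1, 22)  # assumed rollout date

mean_before, mean_after, change = before_after_change(weekly_prs, adoption)
print(f"before: {mean_before}, after: {mean_after}, change: {change:.0%}")
```

Raw counts like these are easy to move for reasons unrelated to Copilot (team size, release schedule, PR size), so a change in either direction is a prompt for a closer look rather than proof.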
Here is a link to a study where they recruited 95 professional developers, split them randomly into two groups, and timed how long it took them to write an HTTP server in JavaScript.
Research: quantifying GitHub Copilot’s impact on developer productivity and happiness - The GitHub Blog


The group that used GitHub Copilot had a higher rate of completing the task (78%, compared to 70% in the group without Copilot).
The striking difference was that developers who used GitHub Copilot completed the task significantly faster–55% faster than the developers who didn’t use GitHub Copilot. Specifically, the developers using GitHub Copilot took on average 1 hour and 11 minutes to complete the task, while the developers who didn’t use GitHub Copilot took on average 2 hours and 41 minutes. These results are statistically significant (P=.0017) and the 95% confidence interval for the percentage speed gain is [21%, 89%].
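As a quick sanity check on the quoted numbers, the snippet below computes the relative reduction in mean completion time from the two averages given above. That the study's "55% faster" figure is defined this way is my assumption; the calculation lands close to it.

```python
# Mean completion times quoted from the study, converted to minutes.
with_copilot = 1 * 60 + 11      # 1 hour 11 minutes
without_copilot = 2 * 60 + 41   # 2 hours 41 minutes

# Reduction in time taken, relative to the group without Copilot.
time_reduction = (without_copilot - with_copilot) / without_copilot
print(f"{time_reduction:.1%}")
```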
Thanks for sharing. Interesting study 🤔

🤗 you are welcome. 

Thanks Romano. Yes, I had seen that study (really the only one I found that had actual metrics). It's a start, but it's a fairly artificial example, since in real life we would never set a bunch of our developers up to all code the same thing. I was hoping that someone had done a live before-and-after measurement of developer productivity. The search continues . . .

We have a team of 30 developers who have been using Copilot for the last five months, exclusively Java and TypeScript developers building enterprise software. We've seen around 5% productivity gains, which is pretty much what we expected. It's very good for well-documented APIs and boilerplate code. It's pretty much a wash with proprietary business logic.
Thanks Matthew. How was the 5% gain calculated? We're really looking to see if there is a way to actually measure the impact, short of doing a survey and asking the devs if they thought they were more productive.

We measure the cycle time (from feature start until merge) across all of our development teams. 
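A rough Python sketch of what a cycle-time comparison could look like, in case it helps. It is not the calculation described above: using the median and expressing the gain as a fractional reduction are my assumptions, and the function names and sample data are invented purely for illustration.

```python
from datetime import datetime, timedelta
from statistics import median

def median_cycle_time_hours(features):
    """features: list of (started_at, merged_at) datetime pairs."""
    durations = [
        (merged - started).total_seconds() / 3600
        for started, merged in features
    ]
    return median(durations)

def cycle_time_gain(before, after):
    """Fractional reduction in median cycle time (positive means faster)."""
    baseline = median_cycle_time_hours(before)
    current = median_cycle_time_hours(after)
    return (baseline - current) / baseline

def make_feature(start, hours):
    """Helper for the made-up sample data below."""
    return (start, start + timedelta(hours=hours))

# Hypothetical features: start time plus hours until merge.
before_copilot = [
    make_feature(datetime(2024, 1, 2, 9), 100),
    make_feature(datetime(2024, 1, 9, 9), 80),
    make_feature(datetime(2024, 1, 16, 9), 120),
]
after_copilot = [
    make_feature(datetime(2024, 3, 4, 9), 95),
    make_feature(datetime(2024, 3, 11, 9), 90),
    make_feature(datetime(2024, 3, 18, 9), 100),
]

gain = cycle_time_gain(before_copilot, after_copilot)
print(f"median cycle time reduced by {gain:.0%}")
```

The median is used here because a single long-running feature can skew a mean badly; if your tooling reports means instead, swapping `median` for `mean` is a one-line change.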


