This post isn’t so much about this particular problem as about the way we solved it.
We have a number of ways to track performance problems, and this is a good example: for some reason, this test failed because it took too long to run:
To investigate, I didn’t want to keep re-running the full test; I didn’t actually care that much about the test itself. What I wanted was to be able to reproduce the scenario independently of it.
To do that, I added:
This opens the studio with all the data that we have for this test, which is great, since it means we can export the data.
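For context, RavenDB’s test infrastructure has a helper that pauses a running test and opens the studio against the embedded test server. A sketch of how such a helper is typically used (the class, test names, and setup here are my assumptions for illustration, not the author’s actual code):

```csharp
using Raven.TestDriver;
using Xunit;

public class SlowTestRepro : RavenTestDriver
{
    [Fact]
    public void Capture_state_of_failing_test()
    {
        using (var store = GetDocumentStore())
        {
            // ... set up the same data and operations the failing test performs ...

            // Pause the test here and open the studio against the test server,
            // so the full database state can be inspected and exported.
            WaitForUserToContinueTheTest(store);
        }
    }
}
```

The point of the helper is exactly what the post describes: the test stops midway, and the studio gives you a live view of the database state at that moment.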
That done, we can import it into an instance that we control and start testing the performance. In particular, we can run it under a profiler, to see what it is doing.
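The export and import can be done through the studio UI, but the import step can also be scripted against a local instance. A hedged sketch, assuming the RavenDB client’s smuggler API; the URL, database name, and dump path are illustrative:

```csharp
using System.Threading.Tasks;
using Raven.Client.Documents;
using Raven.Client.Documents.Smuggler;

public static class ImportDump
{
    public static async Task Run()
    {
        using (var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" }, // local instance we control
            Database = "ReproDb"                      // illustrative database name
        }.Initialize())
        {
            // Import the dump that was exported from the test database,
            // then wait for the server-side operation to finish.
            var operation = await store.Smuggler.ImportAsync(
                new DatabaseSmugglerImportOptions(),
                @"C:\dumps\test-export.ravendbdump"); // illustrative path
            await operation.WaitForCompletionAsync();
        }
    }
}
```

Once the data is in a local database, the workload can be replayed under a profiler without any of the test harness in the way.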
The underlying cause turned out to be an issue with how we flush things to disk, which was easy to fix once we had narrowed it down. The hard part was reproducing it reliably. This approach, being able to stop midway through a test and capture the full state of the system, is invaluable for troubleshooting what is going on.