The Scavenger Method of Refactoring
Refactoring big pieces of code without getting lost
Like most startups, our code base was chaotic in the beginning and has been tamed over time. At first we didn't know if we were building the right product, so we focused on getting something out in front of people.
Now our core feature set is solidifying, and a lot of that early code has benefited from refactoring. Most of the refactorings are fairly small: fix a type here, rename a function there, split something up a bit so it is easier to read. For those I use the Tit-for-tat method, where I do a refactoring for each feature ticket.
I try to keep the refactorings proportional to the feature, so a small feature gets a small refactoring. That keeps things constantly improving.
But sometimes there are larger changes that need to be made.
There is one piece of code I'm working on right now. What it does isn't important, but I wrote the code first and then made an interface out of it, because it is meant to be used in multiple contexts.
Now that I'm writing another implementation of the interface, I can see that it is not really an abstract description of how to solve the problem: it's just the type signatures of all the functions I wrote. I want to fix the interface, then fix the existing implementation, and then write the second implementation.
The current implementation of the interface is a smidgen over a thousand lines of code, so not huge, but it's still a lot to refactor.
My solution to this is to do what I like to call Scavenger Refactoring.
All I do is write the interface as I think it should be, completely ignoring how I solved the problem before. Then I implement it as if it's a new piece of code. I scavenge from the previous implementation for things that fit, copying them into my new implementation.
Sometimes this means I write all new code, sometimes I can modify pieces of the old implementation. The important part is that I leave the old implementation untouched. Once done, I can delete the old implementation.
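To make the shape of this concrete, here is a minimal sketch in Python. Everything in it is hypothetical and stands in for real code: `OldReportStore` plays the old implementation, `Store` is the interface written fresh, and `NewStore` is the new implementation that scavenges one helper from the old one. The old class is never edited.

```python
from typing import Dict, Optional, Protocol


# --- Old implementation: left untouched while the new one is written. ---
class OldReportStore:
    """Hypothetical legacy class; its 'interface' just mirrors its internals."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def normalize_key(self, key: str) -> str:
        return key.strip().lower()

    def write_row(self, key: str, value: str) -> None:
        self._rows[self.normalize_key(key)] = value

    def read_row(self, key: str) -> Optional[str]:
        return self._rows.get(self.normalize_key(key))


# --- New interface: written as it should be, ignoring the old design. ---
class Store(Protocol):
    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


# --- New implementation: built as new code, scavenging what still fits. ---
def _normalize_key(key: str) -> str:
    # Scavenged: copied from OldReportStore.normalize_key, not moved.
    return key.strip().lower()


class NewStore:
    """Satisfies the Store protocol structurally."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[_normalize_key(key)] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(_normalize_key(key))
```

Once `NewStore` covers everything callers need, `OldReportStore` can be deleted in one clean step, and until then the boundary between old and new is obvious from the file itself.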
For me, this is nice because:
I find it's easier to not get lost in the refactoring. When modifying in-place, there is no line of demarcation between my new code and the old code. Sometimes I can get a bit lost in a large change, not sure if I've refactored everything, what code I can delete, and so on. Keeping them separate makes it really clear what is new, what is not, and how far along I am.
I don't feel constrained by how the old version was implemented. I think I get better code this way. When I'm refactoring code in-place, I often feel like I'm trying to change as little as possible rather than turn it into the correct solution.
Having both versions alive in the code base means I can flip back and forth between them, which is nice if, for example, I run across a new test case I want to validate. I can write it in the original version, see what it does, and then make sure my new implementation does the same thing.
It's easier to pull in changes that happen while I'm working on my change. Because the old code is around, if I rebase off main, that code is updated and I can modify my new implementation to match. It can be really painful to do this when modifying in-place because of the merge conflicts.
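The flip-back-and-forth point can be sketched as a small comparison helper: run the same inputs through both versions and list where they disagree. This is only an illustration in Python; `find_mismatches`, `old_slugify`, and `new_slugify` are hypothetical names, not code from the project described here.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def find_mismatches(
    old: Callable[[A], B],
    new: Callable[[A], B],
    inputs: Iterable[A],
) -> List[Tuple[A, B, B]]:
    """Return (input, old_result, new_result) for every input where they differ."""
    mismatches = []
    for value in inputs:
        expected = old(value)  # the old version is the reference behavior
        actual = new(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches


# Hypothetical old implementation: splits on any run of whitespace.
def old_slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Hypothetical new implementation: only splits on spaces, so tabs slip through.
def new_slugify(title: str) -> str:
    words = [word for word in title.lower().split(" ") if word]
    return "-".join(words)


if __name__ == "__main__":
    for value, expected, actual in find_mismatches(
        old_slugify, new_slugify, ["Hello World", "tab\tseparated"]
    ):
        print(f"{value!r}: old={expected!r} new={actual!r}")
```

In this made-up case the tab-separated title is the newly discovered test case: the old version shows what the behavior should be, and the mismatch points at what the new version still has to handle. This only works because the old code is still there to call.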
I originally got the idea when I stumbled upon something John Carmack said:
What I try to do nowadays is to implement new ideas in parallel with the old ones, rather than mutating the existing code. This allows easy and honest comparison between them, and makes it trivial to go back to the old reliable path when the spiffy new one starts showing flaws. The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.