A client project had a database server upgrade in early December, and as I alluded to in a different post from around that time, Git was the shit when it came to making my angle of that migration go smoothly. Past Me made Current Me's life a lot simpler.
After any major upgrade/change (this particular system usually sees one "large" annual release and more nuanced and frequent small changes), I like to keep a little closer eye on the logs. In this particular case, we have Google Analytics (GA) coupled with the system and we take advantage of its event reporting features. Specifically, any time a key or routine component fails or misbehaves in a way the end user will recognize, we send an event report in with the GA data. This allows us to logically analyze errors, look for trends, and so forth.
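For a rough idea of what "looking for trends" can mean in practice, here is a minimal Python sketch. It assumes the GA events have been exported into simple (day, category, action) rows; the field layout, the event names, and the dates are all made up for illustration and are not the real system's data.

```python
from collections import Counter
from datetime import date

# Hypothetical export of GA events as (day, category, action) rows.
# Names and dates are illustrative only.
events = [
    (date(2020, 1, 6), "error", "report_export_failed"),
    (date(2020, 1, 6), "error", "login_timeout"),
    (date(2020, 1, 6), "error", "login_timeout"),
    (date(2020, 1, 6), "engagement", "page_scroll"),
    (date(2020, 1, 7), "error", "report_export_failed"),
    (date(2020, 1, 7), "error", "report_export_failed"),
    (date(2020, 1, 7), "error", "report_export_failed"),
    (date(2020, 1, 7), "error", "login_timeout"),
]


def daily_counts(rows, category="error"):
    """Count events per (day, action) for a single event category."""
    counts = Counter()
    for day, cat, action in rows:
        if cat == category:
            counts[(day, action)] += 1
    return counts


def rising_actions(counts):
    """Return actions whose count on the latest day beats every earlier day."""
    if not counts:
        return []
    latest = max(day for day, _ in counts)
    rising = []
    for action in sorted({a for _, a in counts}):
        today = counts.get((latest, action), 0)
        earlier = [n for (d, a), n in counts.items() if a == action and d < latest]
        if today > max(earlier, default=0):
            rising.append(action)
    return rising


print(rising_actions(daily_counts(events)))
```

With the sample rows above, only `report_export_failed` is flagged, since it went from one hit to three while `login_timeout` dropped. Real data would want a longer baseline than a single previous day.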
This time, with the database server upgrade, we encountered some more unusual problems or changes (versus the standard release), as things were configured a little differently than before and some of the stuff that "just worked" back in the day is now deprecated. Some of that is my own fault (poor design), and some of it I've inherited. What this means is we can be dealing with design decisions from 2005... and that can present some significant technical debt at times.
Fortunately, the most egregious situations were all caught and addressed in advance of the actual cutover (through the dev/test environments and some use case testing). However, some interesting patterns have cropped up in the last few weeks that weren't caught in previous iterations.
Enter 'Budget' CI/CD
Full disclosure: there's really no true CI/CD happening here. Very little in this particular workflow is automated, though Git/GitHub makes this process immensely simpler than it used to be. For the most part this is tackling very small, specific issues, making/testing changes, rolling them to production, and watching the logs (and/or GA). With the advent of 'realtime' monitoring in GA, coupled with the aforementioned "events" reporting, I can respond to issues almost instantaneously, making small changes and rolling them to production to see if they're working for the user base. I call this "budget" CI/CD because, while it takes a dedication of time, it's most interesting to see issues, fix issues, and verify fixes in production as they are actually happening.
It makes me feel like the one-man-show version of how the big boys like Amazon and others roll out tests to their user base.
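The "fix it, ship it, watch GA" loop mostly comes down to asking whether an error event fires less often after a deploy than before it. A minimal Python sketch of that check follows, assuming you have the timestamps of one error event on hand. The function names, the two-hour window, and the 50% threshold are my own illustrative choices, not anything GA provides.

```python
from datetime import datetime, timedelta


def events_per_hour(timestamps, start, end):
    """Rate of events with start <= t < end, in events per hour."""
    hours = (end - start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be after start")
    hits = sum(1 for t in timestamps if start <= t < end)
    return hits / hours


def fix_looks_effective(timestamps, deployed_at,
                        window=timedelta(hours=2), max_ratio=0.5):
    """Compare the error rate in equal windows before and after a deploy."""
    before = events_per_hour(timestamps, deployed_at - window, deployed_at)
    after = events_per_hour(timestamps, deployed_at, deployed_at + window)
    if before == 0:
        # No errors beforehand, so the best the fix can do is keep it at zero.
        return after == 0
    return after / before <= max_ratio


# Hypothetical deploy time and error timestamps, for illustration only.
deployed_at = datetime(2020, 1, 7, 12, 0)
error_times = [deployed_at - timedelta(minutes=m) for m in (5, 20, 45, 70, 95, 110)]
error_times.append(deployed_at + timedelta(minutes=30))

print(fix_looks_effective(error_times, deployed_at))
```

This is a crude comparison: it ignores traffic volume and time-of-day effects, so it is a sanity check to go alongside watching the realtime view, not a replacement for it.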
One of the things I've learned from this process is that, in addition to some simple breakfix stuff, I've been able to identify and solve some long-standing issues that haven't been well-reported, well-understood, or caught in the testing process. Casually, I'd get a report of "hey, a user reported that x didn't work for them, but that's all we know..." with no data, information, or even a timeframe of when the issue cropped up. Worse, these issues were rarely critical failures, so they didn't show up in proper server logs for parsing.
Being able to leverage real, recent, and sometimes realtime user data can be super valuable in addressing or better identifying these seemingly random issues, or catching and correcting issues before they're even reported.
Headline image via giphy