Technology Solutions for Everyday Folks
Trimming a hedge with lawnmower attached to tractor loader

TRIM-ming up for Summer

Not really TRIM-ming for summer, but taking the opportunity to briefly write about Rule #1 when dealing with unknown (or incoming) data.

Sanitize. Thy. Inputs.

Without going into great detail (as it doesn't matter at the end of the day), last week I encountered a vendor that apparently doesn't know how to do this very well. Or at all. What ultimately transpired is the complete failure of a reasonably-routine change process to capture a critical failure in production.

You see, the presence of additional (and apparently unexpected) whitespace in a pipe-delimited export file appeared to import correctly (at least without fatal error). And I suppose it did actually import properly; however, the additional/unexpected whitespace also imported for each field in question. This led to several application failures (for the user-facing side), including the inability to communicate with end users.

TL;DR: the same format (including additional whitespace) was handled without issue in the vendor's previous import mechanism (also pipe delimited), and their documentation doesn't speak at all to anything about whitespace. So I used the same process, with additional and reorganized data points, I have used for many other processes and vendors.

Never Trust Outside Data

I find it disappointing that something so simple to handle caused so much unexpected chaos (and unnecessary downtime). What's further disappointing is that it was end users who first reported any issues. None of this very basic business had been caught in testing, including testing done with automatically-generated production data.

Whenever accepting input from an outside source (usually users, but also batch imports), you always sanitize, trim, validate, or otherwise check the data in whatever ways are necessary. And fail gracefully if something isn't write. Or at least log something useful.

My "fix," should it be necessary

As of this writing, the process isn't yet fixed by the vendor. Presumptively, I made a post-processing function to address the problem on the sourcing end. And so, with eight lines, this particular problem is just "fixed" without issue:

$handle = fopen($inputFile, 'r');
while($row = fgetcsv($handle,null,'|')){
  foreach($row as $colNum => $record) {
    $cols = count($row);
    $outputText .= trim($record) . ($colNum !== $cols - 1 ? '|' : "\r\n");
file_put_contents($outputFile, $outputText);

The above quite literally just pulls in the source file, splits it out by pipe into an array, trims the data, and rebuilds each line with its proper pipe (or carriage return at each line's end). The output string is written to file for transport to the destination where it'll presumably import correctly without issue. We'll see how this plays out...will the vendor deal with input sanitation?

Headline image via giphy


Post Comments

Restricted HTML

  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>
  • Lines and paragraphs break automatically.
  • Web page addresses and email addresses turn into links automatically.