I thought I was being practical. Our vendor kept forgetting to send files, so I carried the filename forward through every stage of the pipeline. That way, I could prove what came in and what didn’t. Backward traceability: where did this record come from?
Then I hit a bug. My batching logic merged files until I had ~500MB, then flushed. But I forgot an else
when the buffer ended without hitting the threshold. Two small days of files slipped right through. Silent failure.
The fix was simple: flush the buffer no matter what.
buffer = []
size = 0
for file in files:
buffer.append(file)
size += file.size
if size >= 500: # flush at threshold
flush(buffer)
buffer, size = [], 0
# the missing else
if buffer: # flush leftovers
flush(buffer)
Because I had filenames all the way through, I could trace forward. I parsed the parquet outputs and hash DB, spotted which files were “processed” but missing, marked them unprocessed, deleted the hashes, and replayed them.
Engineering Rule: Trace in Both Directions
- Provenance tells you where data came from.
- Lineage shows where it flows to.
- Without both, you’ll never see the whole system.
The takeaway:
- Backward traceability = provenance
- Forward traceability = resilience
If you build one, you often get the other “for free”.
Think of it like pizza delivery orders. You don’t just want to know where the pizza came from — you also need to know if it ever got to your house. Forget the flush, and you’ve got two days where the delivery guy just kept the pizzas in his car.
Nami’s supply runs were supposed to be simple. Pack goods into bags until they weighed about 500MB — I mean, pounds. Then ship them back to the Sunny.
But I forgot the else
. When the trip ended without hitting the threshold, two whole days of supplies never left the cart. Silent failure.
The fix? Flush the cart no matter what.
cart = []
weight = 0
for item in items:
cart.append(item)
weight += item.weight
if weight >= 500:
ship(cart)
cart, weight = [], 0
# the missing else
if cart: # ship leftovers
ship(cart)
I only carried the bag tags (filenames) because Nami’s suppliers sometimes “forgot” to send stock. I wanted to trace backward. But when the cart bug happened, those same tags let me trace forward. I saw which items never made it back to the Sunny, reset them, cleared the hash marks, and shipped them again.
That’s the power of tracing both ways.
Pirate Rule A1: Trace Both Ways
- A map that only shows where you came from won’t help you find the treasure.
- Carry your bag tags both ways: back to source and forward to the Sunny.
The moral:
- Trace backward to catch the world’s mistakes
- Trace forward to catch your own
Do both, and even when two days of supplies vanish in the fog, you can still bring them home.