Kent Beck on Empirical Software Design: When & Why

Mike's Notes

An ACM Tech Talk interview yesterday with Kent Beck, author of Tidy First?

Resources

References


Repository

  • Home > Handbook > 

Last Updated

18/04/2025

Kent Beck on Empirical Software Design: When & Why

By: Kent Beck and Margaret-Anne Storey
ACM Tech Talk: 18/04/2025

Kent Beck is an American software engineer and the creator of Extreme Programming, a software development methodology that eschews rigid formal specification for a collaborative and iterative design process. Beck was one of the 17 original signatories of the Agile Manifesto.

Beck pioneered Test-Driven Development, its successor TCR: Test && Commit || Revert, software design patterns, and 3X: Explore/Expand/Extract. He wrote the SUnit unit testing framework for Smalltalk, which spawned the xUnit series of frameworks, notably JUnit for Java, which Beck wrote with Erich Gamma. Beck popularized CRC cards with Ward Cunningham, the inventor of the wiki.

Margaret-Anne Storey is a Professor of Computer Science and a Canada Research Chair in Human and Social Aspects of Software Engineering at the University of Victoria. Together with her students and collaborators, she seeks to understand how software tools, communication media, data visualizations, and social theories can be leveraged to improve how software engineers and knowledge workers explore, understand, analyze, and share complex information and knowledge. She has published widely on these topics and collaborates extensively with high-tech companies and non-profit organizations to ensure real-world applicability of her research contributions and tools.

Since the publication of Parnas' "On the Criteria to Be Used in Decomposing Systems into Modules," we have had good advice on how to design software. However, most software is more difficult to change than it should be, and that friction compounds over time. The Empirical Design Project seeks to resolve the seemingly irresolvable tradeoff between short-term feature progress and long-term optionality, focusing on:

  • How is software actually designed? What can we learn from data about how software is designed?
  • When should software design decisions be made? What is the optimal moment given unclear and changing information & priorities?
  • How can we enhance the survival of software projects while expanding optionality?

Spoiler alert: make design decisions later and in small, safe steps.

Other talks

Tidy First? A Daily Exercise in Empirical Design • Kent Beck • GOTO 2024

Wikipedia Structured Contents

Mike's Notes

Good news from Kaggle and Wikimedia. An opportunity to get structured data.

"...

As part of Wikimedia's mission to make all knowledge freely accessible and useful, Wikimedia is publishing a beta version of its structured content on Kaggle in French and English. This release gives data scientists, researchers, and machine learning enthusiasts a new, streamlined way to explore and analyze this global information resource.

..." - Kaggle.com

Resources

References


Repository

  • Home > 

Last Updated

18/04/2025

Wikipedia Structured Contents

By: Wikimedia Enterprise Team
Wikimedia Enterprise: 16/04/2025

Wikimedia Enterprise has released a new beta dataset on Kaggle, featuring structured Wikipedia content in English and French. Designed with machine learning workflows in mind, this dataset simplifies access to clean, pre-parsed article data that’s immediately usable for modeling, benchmarking, alignment, fine-tuning, and exploratory analysis.

This release is powered by our Snapshot API’s Structured Contents beta, which outputs Wikimedia project data in a developer-friendly, machine-readable format. Instead of scraping or parsing raw article text, Kaggle users can work directly with well-structured JSON representations of Wikipedia content—making this ideal for training models, building features, and testing NLP pipelines.

The dataset upload, as of 15 April 2025, includes high-utility elements such as abstracts, short descriptions, infobox-style key-value data, image links, and clearly segmented article sections (excluding references and other non-prose elements). Because all content is derived from Wikipedia, it is freely licensed under Creative Commons Attribution-ShareAlike 4.0 and the GNU Free Documentation License (GFDL), with some additional cases where public domain or alternative licenses may apply.
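For a quick first look at the structure, the sketch below iterates over a Structured Contents dump and prints a few fields. It assumes the download is a JSON Lines file and uses hypothetical field names (name, abstract, sections); check the schema documented on the Kaggle dataset page before building anything on top of it.

import json

# Minimal sketch: stream a Structured Contents dump one record at a time.
# The file name and field names are assumptions for illustration only;
# consult the dataset's documentation on Kaggle for the real schema.
def iter_articles(path):
    with open(path, encoding="utf-8") as f:
        for line in f:              # one JSON object per line (JSON Lines)
            yield json.loads(line)

if __name__ == "__main__":
    for article in iter_articles("enwiki_structured_contents.jsonl"):
        title = article.get("name", "<untitled>")
        abstract = article.get("abstract", "")
        sections = article.get("sections", [])
        print(f"{title}: abstract of {len(abstract)} chars, {len(sections)} sections")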

“As the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation’s data. Kaggle is already a top place people go to find datasets, and there are few open datasets that have more impact than those hosted by the Wikimedia Foundation. Kaggle is excited to play a role in keeping this data accessible, available and useful." - Brenda Flynn, Partnerships Lead, Kaggle

As a beta release, this dataset is an invitation to explore, test, and improve. We welcome feedback, questions, and suggestions from the Kaggle community directly in the dataset’s discussion tab.

Get the Dataset

Access the dataset directly on Kaggle

About Kaggle

Kaggle is home to one of the world’s largest communities of machine learning practitioners, researchers, and data enthusiasts. With millions of users and an expansive ecosystem of datasets, notebooks, and competitions—including challenges like the Arc Prize—Kaggle provides an ideal environment for experimenting with open structured data like Wikimedia’s Structured Content. Whether you’re testing a new architecture, evaluating data quality, or building a pipeline from scratch, this Wikipedia dataset is ready to plug into your process.

More info at Google Blog

Neobrutalism: Definition and Best Practices

Mike's Notes

This article by Hayat Sheikh from the NN/g newsletter is relevant to the Ajabbi Design System. That design system has a simple, clunky style, like the web of 20 years ago, and will be the default for the Ajabbi workplace apps and the Ajabbi website. The UI priorities are: nice, simple, reliable, fast, secure, and functional.

Users can easily change the style sheets to use their own design system.

Resources

References


Repository

  • Home > Design System >
  • Home > Ajabbi Research > Library > Subscriptions > The NN/g Newsletter

Last Updated

16/04/2025

Neobrutalism: Definition and Best Practices

By: Hayat Sheikh
The NN/g Newsletter: 11/04/2025

Hayat Sheikh, a Senior Designer at Nielsen Norman Group, is celebrated for her award-winning designs and extensive experience from renowned agencies. She also teaches at Lebanese American University, focusing on branding and human-centric design, and manages her NFT collection 'The Self.'

Summary:

As a UI design style, neobrutalism focuses on raw, unrefined elements like bold colors, simple shapes, and intentionally "unfinished" aesthetics.

Emerging as a reaction against sleek, minimalistic designs, neobrutalism creates a striking (almost rebellious) visual style. But while neobrutalism draws attention, designers must carefully balance its distinctive look with usability to avoid ending up with an overwhelming or confusing interface.

Defining Neobrutalism

Neobrutalist website design blends bold colors and sharp contrast for striking interfaces.

Neobrutalism (or neubrutalism), an evolution of traditional brutalism, is a visual-design trend defined by high contrast, blocky layouts, bold colors, thick borders, and “unpolished” elements.

Brutalism vs. Neobrutalism

Brutalism and neobrutalism are both edgy visual-design styles that draw inspiration from the architectural movements they get their names from. In digital design, brutalism tends to appear raw, harsh, unfinished, or utilitarian. Brutalist websites might use plain HTML elements and limited color palettes.

For example, Drudge Report embodies brutalist aesthetics with its barebones HTML structure, monospaced headlines, and rigid table-based layout, evoking the look of the pre-CSS web.


Drudge Report’s website embraces a brutalist style with its stripped-down aesthetic.

In contrast, neobrutalism combines the brutalist design style with nostalgic 90s graphic-design elements. Unlike true brutalist web design, neobrutalist designs are likely to be more colorful and orderly.

A striking example is Look Beyond Limits by Halo Lab, which features oversized typography, bold dividers, and thick strokes with pops of bright color.

Look Beyond Limits by Halo Lab embraces neobrutalism with its raw, structured layout, and oversized typography.

Characteristics of Neobrutalism

High Contrast and Bright Colors

Neobrutalist designs use bold, primary colors and high-contrast combinations to emphasize key functions and UI elements. This approach introduces striking, contrasting hues to capture attention and enhance visual impact. It also helps users focus on essential elements while creating an unconventional, memorable experience.


99percentoffsale.com embraces neobrutalism through bold, high-contrast colors for a striking visual style.

Thick Lines and Geometric Shapes

This style does not shy away from using thick borders, angular forms, and solid lines that create structure without relying on gradients or shadows.


byooooob.com: Thick borders, solid lines, bright colors, and striking, playful shapes are typical for the neobrutalist visual style.

Stark Drop Shadows

Unlike minimalism, neobrutalism encourages bold, striking shadows instead of soft, layered ones. It incorporates solid, single-color shadows (e.g., a black drop shadow offset by 4px) to add depth while maintaining the "raw" aesthetic.


Unlike minimalist designs, which usually emphasize a flat, simple look and feel, neobrutalism often features bold, solid shadows that create depth while preserving a raw aesthetic.

Bold Type

Neobrutalism promotes the use of bold, “unpolished” elements that often include quirky or slightly eccentric typefaces. Despite their expressive forms, these typeface choices are balanced by a generous use of whitespace, creating a visual rhythm that feels deliberate rather than overwhelming. Typography in neobrutalism serves both as a functional element and as a focal point of the overall design.


Tony’s Chocolonely eCommerce by Tinloof uses bold, quirky typography that reinforces the brand’s personality.

Skeuomorphic Elements

Neobrutalism might incorporate nostalgic elements from early digital interfaces, such as Windows 98-style buttons and monospace fonts. These features create a sense of familiarity while blending retro aesthetics with modern design. For example, a neobrutalist design might use UI elements from an old browser, with traditional iconic buttons and appearance mimicking early web experiences.

cyanbanister.com: Neobrutalism blends retro UI elements (such as pixel art and old-style browser windows) with modern design elements (such as contemporary typography and layout).

Examples of Neobrutalism in Practice

Many brands are embracing the bold, raw aesthetic of neobrutalism to create memorable experiences through striking contrasts, unconventional typography, and minimalistic design. This approach reflects a shift toward prioritizing purpose and functionality over excessive polish, allowing brands to stand out in a crowded digital landscape.

Brands like Figma and Gumroad have incorporated bold, high-contrast colors and raw elements, with a focus on user experience and simplicity.

Figma's brand refresh, with its use of bold contrasts and unconventional typography, exemplifies neobrutalist design. Just like its tools, the refreshed identity emphasizes creative freedom, flexibility, and a dynamic user experience, allowing users to work in ways that feel authentic and engaging.


Figma’s bold, geometric design reflects creative freedom.

Similarly, Gumroad, an ecommerce platform for independent creators, uses neobrutalism's raw aesthetic to align with its ethos of empowering independent creators. By stripping away unnecessary polish and focusing on functionality over flourish, the platform emphasizes simplicity and accessibility, staying true to its purpose of providing creative freedom and a straightforward user experience.


Gumroad’s raw design empowers creators with simplicity.

Designing with Neobrutalism: Best Practices

While neobrutalism thrives on bold colors, heavy typography, and sharp contrasts, without balance, it can overwhelm users and hinder accessibility. These tips help create designs that are both visually striking and user-friendly.

Design with Usability at the Forefront

Prioritize usability with clear buttons, readable type, and ample whitespace to keep the experience intuitive and accessible, even within a bold, raw aesthetic.

The API World landing page incorporates a neobrutalist aesthetic while maintaining usability through its clear search functionality and calls to action.

Contrast Ratios Matter

Bold colors must meet text-contrast standards. Avoid pairing vibrant hues like yellow and cyan that fail readability tests. Tools like Coolors' contrast checker ensure that combinations remain accessible while staying visually striking.


Although neobrutalism uses bright, contrasting colors, it still needs to meet readability and accessibility standards.
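The contrast-ratio arithmetic behind such checkers is easy to reproduce. The sketch below is a minimal Python implementation of the WCAG 2.x formula: convert each colour to sRGB relative luminance, then take (L1 + 0.05) / (L2 + 0.05) with the lighter colour on top; 4.5:1 is the usual minimum for body text.

def relative_luminance(hex_color):
    # sRGB relative luminance as defined by WCAG 2.x
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))
    def linearize(c):
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast_ratio(color_a, color_b):
    lighter, darker = sorted((relative_luminance(color_a),
                              relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black text on neon yellow passes comfortably; yellow on cyan does not.
print(round(contrast_ratio("#000000", "#FFFF00"), 2))  # about 19.6 -> passes AA (needs 4.5)
print(round(contrast_ratio("#FFFF00", "#00FFFF"), 2))  # about 1.17 -> fails for body text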

Limit Your Color Palette

Restrict your palette to 2–3 bold, high-contrast colors (e.g., black, neon green, electric blue) to avoid overwhelming users.


bieffeforniture.it uses 2 main high-contrast colors (electric blue and red) to help maintain clarity and avoid overwhelming users.

Prioritize Readability

Pair bold, unconventional headlines (e.g., a chunky sans-serif font) with clean, neutral body fonts like Roboto or Inter. Avoid overly decorative or condensed typefaces for paragraphs to maintain legibility across devices.

dodonut.com adopts a neobrutalist style while still maintaining clear buttons, readable text, and ample whitespace.

Use Whitespace Strategically

Offset dense geometric shapes and thick borders with generous padding (e.g., 24–32px margins) to create breathing room, prevent clutter, and guide users to key actions or content.


Content in a neobrutalist layout needs enough padding to create space and focus users’ attention on key elements.

Test Interactions

Ensure that interactive elements (buttons, links) remain recognizable. Use underlines on hover or subtle color shifts to indicate state changes. For example, a neon button could lighten on click to signal feedback without gradients or shadows.

Interactive elements in a neobrutalist layout need clear feedback. Use underlines or color shifts to signal interaction.

Avoid Oversimplification

Retain hierarchy through size variation (e.g., headlines twice as large as body text) and color intensity. Even in a minimalistic layout, ensure that clear calls to action and key usability elements stand out to create a seamless user interface.

sui.io/overflow#overview maintains visual hierarchy by using different font sizes for headers, page text, and button labels, thus ensuring that CTAs and interactive elements stand out.

Key Takeaways

Neobrutalism’s rebellious aesthetic can grab attention, but its success hinges on balancing boldness with usability. By grounding the style in accessibility principles and testing with users, designers can create interfaces that are both striking and functional.

Shadow Table Strategy for Seamless Service Extractions and Data Migrations

Mike's Notes

Here is an InfoQ article by Apoorv Mittal & Rafal Gancarz, referenced in Data Engineering Weekly. It covers a valuable way to migrate data while keeping critical production systems running. It is something to use in the future.

Resources

References


Repository

  • Home > Ajabbi Research > Library > Subscriptions > Data Engineering Weekly

Last Updated

14/04/2025

Shadow Table Strategy for Seamless Service Extractions and Data Migrations

By: Apoorv Mittal & Rafal Gancarz
InfoQ: 09/04/2025

Apoorv Mittal is a passionate Software Engineer at Block (CashApp) based in Seattle, WA. With over a decade of experience in distributed systems and fintech, he has led transformative projects at Block, Dropbox, and AWS. His expertise spans modernizing legacy systems into scalable microservices, architecting resilient financial infrastructures, and pioneering cloud security innovations, notably through his patented AWS Traffic Mirroring solution. You can find Apoorv on LinkedIn.

Key Takeaways

  • The shadow table strategy creates a synchronized duplicate of the data that keeps the production system fully operational during changes, enabling zero-downtime migrations.
  • Database triggers or change data capture frameworks actively replicate every change from the original system to the shadow table, ensuring data integrity.
  • The shadow table strategy supports diverse scenarios - including database migrations, microservices extractions, and incremental schema refactoring - that update live systems safely and progressively.
  • Shadow tables deliver stronger consistency and simplify recovery compared to dual-writes or blue-green deployments.
  • Industry case studies from GitHub, Shopify, and Uber demonstrate that the shadow table approach drives robust large-scale data migrations by actively maintaining continuous data integrity and offering rollback-friendly safeguards.

Introduction

Modern software systems often need to evolve without disrupting users. When you split a monolith into microservices or modify a database schema, you must migrate data with minimal downtime and risk. Shadow tables have emerged as a powerful strategy to achieve this. In a nutshell, the shadow table approach creates a duplicate of the data (a shadow version) and keeps it in sync with the original, allowing a smooth switchover once the new setup is ready.

This article explores how shadow tables help in different migration scenarios — database migrations, service extractions, and schema changes — while referencing real case studies and comparing this approach to alternatives like dual-writes, blue-green deployments, and event replay mechanisms.

What is the Shadow Table Strategy?

The shadow table strategy maintains a parallel copy of data in a new location (the "shadow" table or database) that mirrors the original system’s current state. The core idea is to feed data changes to the shadow in real time, so that by the end of the migration, the shadow data store is a complete, up-to-date clone of the original. At that point, you can seamlessly switch to the shadow copy as the primary source. In practice, implementing a shadow table migration typically follows a pattern:

  1. Create a Shadow Table: Prepare a new table (or database) with the desired schema or location. Although initially empty, you structure it to accommodate the migrated data.
  2. Backfill Initial Data: Copy existing records from the original data store into the shadow table, processing them in chunks to avoid overloading the system.
  3. Sync Ongoing Changes: As the system runs, apply every new write or update from the original data to the shadow. Use database triggers, change data capture (CDC) events, or application-level logic to propagate each INSERT, UPDATE, or DELETE from the source to the shadow to remain in sync.
  4. Verification: Optionally, run checks, such as comparing row counts or sample records, to confirm that the shadow’s data matches the source, giving you confidence that no data was missed.
  5. Cutover: Point the application to the shadow table (or perform a table rename/swapping in the database) once you verify it is up to date. The switch occurs with negligible downtime because you have kept the shadow current.
  6. Cleanup: Retire the old data store after cutover or keep it in read-only mode as a backup until you no longer need it.

By using this approach, you can complete migrations with zero downtime. The production system continues running during the backfill and sync phases because reads and writes still hit the original data store while you build the shadow. When you are ready, you can quickly switch to the new store, often through a simple metadata update like a table rename or configuration change.
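To make the pattern concrete, here is a minimal, self-contained sketch of steps 1 to 5 using SQLite and triggers. It is illustration only, not production tooling: real migrations lean on tools such as gh-ost or CDC pipelines, throttled backfills, and monitoring, and the table and column names here are invented.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# An existing production table with live data.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "a@example.com"), (2, "b@example.com")])

# Step 1: create the shadow table with the desired new schema (extra column here).
cur.execute("""CREATE TABLE users_shadow (
                   id INTEGER PRIMARY KEY,
                   email TEXT,
                   status TEXT DEFAULT 'active')""")

# Step 2: backfill existing rows (a real system would do this in throttled chunks).
cur.execute("INSERT INTO users_shadow (id, email) SELECT id, email FROM users")

# Step 3: keep the shadow in sync with ongoing writes using triggers.
cur.executescript("""
CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_shadow (id, email) VALUES (NEW.id, NEW.email);
END;
CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    UPDATE users_shadow SET email = NEW.email WHERE id = NEW.id;
END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    DELETE FROM users_shadow WHERE id = OLD.id;
END;
""")

# Live traffic keeps hitting the original table; the triggers mirror every change.
cur.execute("INSERT INTO users (id, email) VALUES (3, 'c@example.com')")
cur.execute("UPDATE users SET email = 'b2@example.com' WHERE id = 2")

# Step 4: verify the shadow matches the source before cutting over.
source_rows = cur.execute("SELECT id, email FROM users ORDER BY id").fetchall()
shadow_rows = cur.execute("SELECT id, email FROM users_shadow ORDER BY id").fetchall()
assert source_rows == shadow_rows, "shadow has drifted from the source"

# Step 5: cutover via quick renames (drop the sync triggers first).
cur.executescript("""
DROP TRIGGER users_ins; DROP TRIGGER users_upd; DROP TRIGGER users_del;
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_shadow RENAME TO users;
""")
conn.commit()

At this point the application sees the new schema under the old table name, and users_old can be kept around read-only for a while as a rollback path.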

Figure 1: Data migration using the shadow table strategy

This strategy is sometimes also called the ghost table method (notably by GitHub’s schema migration tool gh-ost) because the new table is like a "ghost" of the original (gh-ost: GitHub's online schema migration tool for MySQL - The GitHub Blog).

Use Cases Where Shadow Tables Shine

Shadow tables offer a robust and flexible mechanism for managing complex migrations, service extractions, and schema refactorings while keeping production systems running uninterrupted. There are three common scenarios where shadow tables can be especially beneficial: database migrations with zero downtime, service extractions in a microservices transition, and incremental schema changes with data model refactoring.

Database Migrations with Zero Downtime

Modern applications often rely on large, heavily used production databases that cannot afford extended downtime for schema modifications or engine migrations. Direct alterations — like adding a new column, changing data types, or indexing — can cause long locking periods and stall critical operations. Shadow tables provide an alternative approach that minimizes the risk of disruption.

Begin by creating a new table that mirrors the structure of the production table while incorporating the desired schema changes. This shadow table starts empty or partially populated; a controlled backfill procedure then copies historical data from the production table into it in batches, allowing the system to run concurrently.

After the backfill, set up a continuous synchronization mechanism by leveraging database triggers or CDC frameworks that propagate every new insertion, update, or deletion from the production table to the shadow table. This dual-write mechanism ensures that the shadow table remains an up-to-date replica of the production system.

Simultaneously, automated verification processes continuously compare key metrics between the two tables. Checksums, row counts, and deep object comparisons confirm data integrity and ensure that the shadow table accurately mirrors the production data. Only once these validations confirm that the shadow is consistent with the source can the final cutover be executed, often through a fast, atomic table rename or pointer switch. This approach enables the migration to be completed with minimal downtime, reducing risk and preserving user experience.
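A verification pass does not have to be elaborate to be valuable. The sketch below, written against the Python DB-API with SQLite standing in for the real database, compares row counts and a chunked checksum of key-ordered rows between the production and shadow tables; the table and column names are placeholders, and any mismatch should block the cutover until it is explained.

import hashlib
import sqlite3

def table_checksum(cur, table, key_col, value_cols, chunk_size=10_000):
    # Stream rows in key order, chunk by chunk, and fold them into one digest.
    # Identifiers are trusted constants here, never user input.
    digest = hashlib.sha256()
    last_key = None
    while True:
        if last_key is None:
            cur.execute(f"SELECT {key_col}, {value_cols} FROM {table} "
                        f"ORDER BY {key_col} LIMIT ?", (chunk_size,))
        else:
            cur.execute(f"SELECT {key_col}, {value_cols} FROM {table} "
                        f"WHERE {key_col} > ? ORDER BY {key_col} LIMIT ?",
                        (last_key, chunk_size))
        rows = cur.fetchall()
        if not rows:
            return digest.hexdigest()
        for row in rows:
            digest.update(repr(row).encode())
        last_key = rows[-1][0]

def shadow_matches_source(cur, source, shadow, key_col="id", value_cols="email"):
    counts = [cur.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
              for t in (source, shadow)]
    sums = [table_checksum(cur, t, key_col, value_cols) for t in (source, shadow)]
    return counts[0] == counts[1] and sums[0] == sums[1]

# Toy demonstration with an in-memory database:
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE users_shadow (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO users_shadow SELECT * FROM users;
""")
print(shadow_matches_source(cur, "users", "users_shadow"))  # True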

Service Extractions in a Microservices Transition

Transitioning from a monolithic architecture to a microservices-based system requires more than just rewriting code; you often must carefully migrate data associated with specific services. Extracting a service from a monolith risks inconsistency if you do not transfer its dependent data accurately and completely. Here, shadow tables play a crucial role in decoupling and migrating a subset of data without disrupting the existing system.

In a typical service extraction, the legacy system continues to handle all live operations while developers build a new microservice to handle a specific functionality. During extraction, engineers mirror the data relevant to the new service into a dedicated shadow database. Whether implemented through triggers or event-based replication, the dual-write mechanism ensures that the system simultaneously records every change made in the legacy system in the shadow database.

Once the new microservice processes data from the shadow database, engineers perform parallel validation to ensure that its outputs match expectations. A comparison framework automatically checks that the outputs of the new service match the expected results derived from the legacy system. This side-by-side validation allows engineers to identify discrepancies in real time and make adjustments as necessary.

Teams carefully manage the gradual transition of traffic from the legacy system to the new microservice. By initially routing only a small portion of user requests to the new service, teams can monitor performance, validate data consistency, and ensure that the new system behaves as expected.

Once the shadow database and the new microservice have proven to maintain the same level of data integrity and functionality as the legacy system, engineers execute a controlled, incremental cutover. Over time, they shift all operations to the new service and gradually reduce the legacy system’s role until they fully decommission it for that functionality. This phased approach mitigates risk and provides a built-in rollback mechanism if they detect any issues during the transition.
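One simple way to implement that gradual traffic shift is a deterministic percentage rollout keyed on a stable identifier, so each user consistently lands on the same side while the percentage is dialed up. A minimal sketch follows; the service names are invented for illustration.

import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    # Hash the stable identifier into one of 100 buckets and route the lowest
    # `percent` buckets to the new service. Deterministic across requests.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

# Example: send roughly 5% of users to the extracted microservice.
for uid in ("alice", "bob", "carol", "dave"):
    target = "orders-service" if in_rollout(uid, 5) else "monolith"
    print(uid, "->", target)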

Incremental Schema Changes and Data Model Refactoring

Even for smaller-scale changes, such as refactoring a table or updating a data model, shadow tables offer a powerful way to mitigate risk. In many systems, evolving the data model is an ongoing challenge, whether splitting a single table into multiple logical parts, merging fields, or adding non-null constraints to previously optional columns.

Instead of applying changes directly to the live table, engineers create a shadow version to reflect the new design. The system simultaneously writes data to both the original and shadow tables, ensuring that it captures any update in real time across both structures. This dual-writing approach allows continuous validation of the new schema against the existing one, enabling engineers to compare outcomes and ensure that the refactored data model handles all business logic correctly. 

Automated comparison tools play an essential role during this phase. By continuously monitoring and comparing data between the old and new schemas, the tools can detect discrepancies early — whether they arise from differences in data type conversions, rounding issues, or unforeseen edge cases. Once engineers have thoroughly validated the shadow table and adjusted for anomalies, they can seamlessly switch the application to the new schema. They can then gradually phase out the original table, with the shadow table taking over as the primary data store.

This incremental approach to schema changes minimizes the need for extended maintenance windows and reduces the risk of data loss or service interruptions. It provides a controlled path to evolve the data model while maintaining full operational continuity.

Industry Examples and Best Practices

Successful migrations using shadow tables have been reported by many organizations, forming a set of best practices:

  • Online Schema Change Tools: Companies like GitHub and Facebook built tools (gh-ost and OSC) to perform online schema changes using shadow/ghost tables. These tools have become open-source solutions that others use. MySQL migrations now follow the standard procedure of creating a shadow table, syncing changes, and then renaming (Zero downtime MySQL schema migrations for 400M row table). Similarly, Shopify used the open-source LHM gem in their Rails applications to safely add columns, as it "uses the shadow-table mechanism to ensure minimal downtime" (Safely Adding NOT NULL Columns to Your Database Tables - Shopify). The best practice here is to automate the shadow table process with rigorous checks (row counts, replication lag monitoring, etc.) and fallback paths if something goes wrong (for example, aborting the migration leaves the original table untouched, which is safer than a half-completed direct ALTER).
  • Strangler Pattern for Microservices: Combining the strangler fig pattern with shadow reads/writes has proven to be a successful approach for migrating from a monolith. Amazon, Netflix, and others have used the idea of routing a portion of traffic to a new system in shadow mode to build confidence. Over time, they shifted reads and finally writes to the new service, effectively strangling out the old component. Best practice here: migrate in phases (e.g., shadow/dual-run, verify, then cutover) and use monitoring/metrics to ensure the accuracy of the new service’s data. The shadow phase can catch any discrepancies early, avoiding faulty migrations.
  • Data Pipeline and CDC Usage: When using event streams for migration, you must ensure ordering and idempotency. Teams often choose Kafka or similar durable logs to replay events to the shadow database. The order of events must match the source’s commit order to maintain consistency. Industry best practice recommends schema versioning and backward-compatible change events when using this method, so that the new system can process events even if the schema evolves during the migration. Decoupling the pipeline (so that the old and new systems communicate via the event log rather than direct dual writes) also reduces risk to the production load. However, teams should monitor the lag between source and shadow and have a way to reconcile differences if the pipeline falls behind.
  • Fallback and Rollback Plans: A migration is not truly safe without a rollback plan. In many cases, shadow table strategies lend themselves to easy rollback. If you find a problem during verification, simply discard the shadow table before switching over; this will not impact users. Even after a cutover, if the new system/table misbehaves, switch back to the old one (provided you kept it intact for a while). Uber’s migration post-mortem stresses having the ability to reverse traffic back to the old system if needed (Uber’s Billion Trips Migration Setup with Zero Downtime). As a best practice, keep the old system running in read-only mode for a short period after cutover, just in case you need to fall back. This safety net, combined with thorough monitoring, makes the migration resilient.

Comparing Shadow Tables to Alternative Migration Approaches

While shadow table (or shadow database) migrations are powerful, you should choose the right strategy for your situation.

Shadow Tables vs. Dual-Write Approach

The shadow table strategy often uses triggers or external pipelines to sync data, whereas a pure dual-write approach relies on the application to perform multiple writes. Dual-writing can achieve a similar goal of keeping two systems in sync, but it brings the complexity of distributed transactions with it.

Without careful design, dual writes can lead to race conditions or partial failures – for example, the app writes to the new database but crashes before writing to the old one, leaving data out of sync. To mitigate this, developers use patterns like the Outbox Pattern, where the application writes changes to the primary DB and also to a special outbox table in the same transaction; the application then asynchronously publishes these changes to the second system. 
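A minimal sketch of the outbox idea follows (SQLite for brevity; the table and event names are invented). The business row and the outbox row are committed in one local transaction, and a relay, which would run asynchronously in a real system, later reads the unpublished outbox rows and forwards them to the second system.

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id, total):
    # The business write and the outbox record succeed or fail together.
    with conn:  # one local transaction
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"type": "OrderPlaced",
                                  "order_id": order_id, "total": total}),))

def relay_outbox(publish):
    # Forward unpublished changes to the second system, oldest first.
    rows = conn.execute("SELECT id, payload FROM outbox "
                        "WHERE published = 0 ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

place_order(1, 99.50)
relay_outbox(lambda event: print("forwarding", event))

The relay is an extra moving part that must deliver events at least once and in order, which is exactly the bookkeeping the trigger- and CDC-based approaches sidestep.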

In contrast, a trigger-based shadow table inherently ties the two writes into the source database’s transaction (the trigger runs inside the commit), and a CDC-based approach will capture the exact committed changes from the log. Such an approach often makes shadow table strategies more reliable for consistency than ad-hoc dual-write logic.

Figure 2: Data migration using dual-write approach

However, when you control both systems, dual writes may be simpler to implement at the application level, and they avoid the need for database-level fiddling or extra tooling. In summary, dual writes give you more control in application code, but you must exercise extreme care to avoid inconsistency. In contrast, shadow table methods leverage the database or pipeline to guarantee consistency.

Shadow Tables vs. Blue-Green Deployments

Shadow table strategy complements blue-green setups: one can see the shadow table as part of the green environment being prepared. The key difference is that blue-green by itself doesn’t specify how to keep the data in sync – it assumes you have a way to copy and refresh data in the green environment. A full outage could do this (not ideal), or a shadow/copy process could. So, in many cases, shadow table migrations are an enabling technique to achieve a blue-green style cutover for databases.

Figure 3: Blue-green deployments working with shadow tables

The ability to test the entire new stack in parallel is the advantage of a blue-green deployment. For example, you might run a new version of your service against the shadow database (green) while the old version runs against the old database (blue). You can then switch over when ready, and even switch back if something fails, since the blue environment is still intact. The downside is cost and complexity: temporarily doubling your infrastructure. 

Maintaining two full environments (including databases) and keeping them in sync is not trivial. Shadow tables ease this by focusing on the data layer sync. If your migration is purely at the database layer (e.g., moving to a new database server or engine), a shadow table approach is a blue-green deployment of the database. If your migration also involves application changes, you might do a blue-green deployment of the app in tandem with the shadow table migration of the data.

Both strategies share the goal of a zero-downtime switch, and they pair well, but blue-green is a broader concept encompassing more than data. In contrast, the shadow table strategy is laser-focused on data consistency during the transition.

Shadow Tables vs. Event Replay (Rebuilding from Event Logs)

Event replay leverages an event log or sequence of change events to build up the state in a new system. It’s related to CDC but slightly different in intent. In a replay scenario, you might start a brand new service by consuming a backlog of historical events (for example, reprocessing a Kafka topic of all transactions for the past year) to reconstruct its database state. Alternatively, if your system is event-sourced (storing an append-only log of changes), you can initialize a new read model or database by replaying all events from the start. This approach ensures that the new database’s state is equivalent to that of the old system, since it is derived from the same sequence of inputs.

Figure 4: Data migration using event replay

Unlike shadow tables, event replay can be more time-consuming and is usually done offline or in a staging environment first because processing a considerable history of events can take a while. Shadow table migrations tend to operate on live data in real time, whereas you might use replay to bootstrap and then switch to a live sync method (like CDC) for the tail end. Another difference is that event replay might capture business-level events rather than low-level row changes. 

For example, instead of copying rows from a SQL table, you might replay a stream of "OrderPlaced" and "OrderShipped" events to rebuild the state. This approach can be useful if you’re also transforming the data model in the new system (since the new system can interpret events differently). However, if you miss any events or the event log isn’t a perfect record, you risk an incomplete migration.
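A replay consumer along those lines can be very small. The sketch below folds a stream of business events (mirroring the "OrderPlaced" / "OrderShipped" example above) into a fresh read model; the event shapes are invented for illustration, and the apply step is written to be idempotent so that replaying a duplicate event leaves the state unchanged.

from typing import Iterable

def replay(events: Iterable[dict]) -> dict:
    # Rebuild a read model of orders by folding over the event log in order.
    orders = {}
    for event in events:
        oid = event["order_id"]
        if event["type"] == "OrderPlaced":
            # setdefault keeps this idempotent if the same event is delivered twice
            orders.setdefault(oid, {"total": event["total"], "status": "placed"})
        elif event["type"] == "OrderShipped" and oid in orders:
            orders[oid]["status"] = "shipped"
    return orders

event_log = [
    {"type": "OrderPlaced", "order_id": 1, "total": 99.50},
    {"type": "OrderShipped", "order_id": 1},
    {"type": "OrderPlaced", "order_id": 2, "total": 12.00},
]
state = replay(event_log)
# Replaying with a duplicate event produces the same state.
assert state == replay(event_log + event_log[:1])
print(state)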

In practice, engineers often use event replay in combination with shadow strategies: one might do an initial event replay to catch up a new system, then use incremental CDC or dual-writes to capture any new events that occur during the replay (so the shadow doesn’t fall behind). The combination yields the same outcome: a fully synced shadow ready to take over. The choice between using database-level shadow copy versus event-level replay often comes down to what data you have available. 

Replay might be straightforward if you have a clean event log (like an append-only journal). Otherwise, tapping into the database (via triggers or log capture) might be more manageable. Both approaches aim for eventual consistency, but shadow table syncing (especially trigger-based) will typically have the new store up-to-date within seconds of the original, whereas an event replay might apply changes in batches and catch up after some delay.

Conclusion

The shadow table strategy has proven effective in performing complex data migrations safely and incrementally. Teams keep a live replica of data changes; this enables them to migrate databases, extract services, or refactor schemas without halting the application. Companies apply this pattern to add columns without downtime, migrate enormous tables, or gradually siphon traffic to new microservices, all while preserving data integrity.

Of course, no single approach fits all situations. Shadow tables shine when you need up-to-the-second synchronization and confidence through parallel run comparisons. Alternatives like dual-writes or event replay might be more appropriate in systems built around event messaging or in simpler scenarios where a full shadow copy is overkill. Many real-world migrations end up using a blend of these techniques. For example, one might do an initial bulk load (replay), then switch to a live shadow sync, or use dual-writes in the app plus a trigger-based audit to double-check consistency.

It’s essential that software engineering teams plan migrations as first-class projects and leverage industry best practices: they should run systems in shadow mode to validate behavior, keep toggles or backstops for quick rollback, and monitor everything. When executed with discipline, the shadow table strategy provides a moderate-complexity path to achieving significant changes with little downtime. It enables the evolutionary changes that modern software demands, all while keeping users blissfully unaware that anything changed under the hood.

Thermodynamics in plain sight

Mike's Notes

I figured out how to use thermodynamics to describe Pipi's state. The solution was hiding in plain sight. I credit Terrence Deacon's work and James "Jim" Miller for introducing me to Terrence's writings.

I also had to add a small process/state engine I built in 2018 and integrate some Markov.

The root Pipi system (sys) has now been updated.

And yes, there are feedback loops.

As part of refactoring Pipi 8 to Pipi 9, I had to run many systems manually to test them. To get ready for production, I'm slowly checking and finishing them to switch them over to complete automation.

The main problems I have discovered are:

  • Silly minor typos and inconsistent naming of variables.
  • Manual overrides of automated results, requiring tweaked parameters.
  • The odd thing that was never finished and forgotten about, like the problem just solved.

I should have a 48U rack cabinet to house Pipi in about a week.

Resources

References


Repository

  • Home > Ajabbi Research > Library > Author > Terrence Deacon
  • Home > Ajabbi Research > Library > Thermodynamics

Last Updated

15/04/2025

A workforce for Ajabbi

Mike's Notes

A highly skilled remote workforce will be required when Ajabbi scales. Finding people will be the thing.

I like PostHog, so I checked the PostHog Handbook to see what they do. Start with contractors and pay them well.

PostHog uses a SaaS called Deel to perform all the administrative tasks of paying contractors. Deel can provide this service in 200 countries, using multiple currencies and payment methods to keep people happy.

Resources

References


Repository

  • Home > Handbook > Teams

Last Updated

13/04/2025

A workforce for Ajabbi

By: Mike Peters
On a Sandy Beach: 13/03/2025

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

A community is slowly growing around Pipi 9. People are helping test the open-source aspects and acting as a sounding board for me to bounce crazy ideas off.

Sometime later this year, the 95% of Ajabbi that is currently hidden will be updated, made visible, and go live, with a proper home page that is interesting and changes. A demo version will become available. Open-source apps will appear on GitHub. The community will grow, and early adopters will gather.

If Pipi 9 works as designed and tested, it will make money solving some enormous, expensive problems.

These surplus funds will go to a foundation to support the community and fund a research outfit with a paid workforce to support the closed core.

I am being sneaky. Bletchley Park had the right idea for finding codebreakers, and I'm doing the same. See if you can find the puzzle and then solve it.

My approach is to find good people first and then find them jobs. There will be plenty to do.

Whenever I meet people in person or online, I ask myself, "Could I work with this person?"

I expect everyone to prove their worth by what they do and how they act. Attitude comes first. I learned this in Christchurch while leading one of the large earthquake recovery groups.

Take people on as contractors first, see how they go. Then ...

I am a loyal and generous person by nature. I like to work with the same people I know well and trust. Prove yourself, and the door will always be open.

Thermodynamics at the root of Pipi

Mike's Notes

This week, I had a hunch that I needed to use thermodynamics to describe the state of Pipi. You can read about this adventure in this latest blog post. Instead of answers, I have lots of questions. There will be experiments. So it's off into the deep end again.

Resources

References


Repository

  • Home > Ajabbi Research > Library > Thermodynamics

Last Updated

12/04/2025

Thermodynamics at the root of Pipi

By: Mike Peters
On a Sandy Beach: 12/03/2025

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

The root system of Pipi (sys) gives each copy of Pipi a unique name. The names are taken from a list of the old gods of early human history: Loki, Zeus, etc. Each name stays with that Pipi regardless of version numbering. This system deals with the growth, replication and death of each Pipi. It will also enable the Pipis to talk to each other.

Each unique Pipi system consists of hundreds of nested subsystems, which can be combined in various ways to give different emergent properties.

It is said that the whole is more than the sum of the parts. Prof. Terrence Deacon believes that with life, the whole is less than the sum of the parts due to constraints (I think Terrence is correct about many things). That is the version I am playing with here.

So far, I am figuring out at a high level which bits constrain others and clarifying where and when self-organisation occurs—basically, I am building a working data model.

I suspect I will get deeply into Shannon information theory and Boltzmann entropy.

Will this act as a feedback loop? Probably.

Hopefully, I won't end up in some rabbit hole.

I discovered this issue a few days ago while methodically checking, testing and tidying up a long list of systems. Most of them were good to go, but I found something I had missed entirely.

It's a bit of a brain teaser, and I am making many drawings on paper. I want to understand the problem more clearly and then find a solution.

Time for more coffee.

Getting the Client's IP Address Using Lucee and ColdFusion

Mike's Notes

A useful blog post by Gregory Alexander on collecting IP addresses and client variables when using ColdFusion.

Resources

References


Repository

  • Home > Ajabbi Research > Library > Software > ColdFusion

Last Updated

05/04/2025

Getting the Client's IP Address Using Lucee and ColdFusion

By: Gregory Alexander
Gregory's Blog: 2/03/2025

As a long-term ColdFusion developer, I have used CGI environment variables for the last twenty years to extract IP addresses and client variables. However, in this day and age, getting the IP address is much more complex and is no longer a trivial issue.

Background

When I first started programming, ColdFusion made it easy to extract client information using CGI environment variables. For example, getting the client IP address was simple—you only needed to use the remote_addr CGI environment variable. However, using CGI environment variables today is antiquated and no longer the preferred way to extract client information. 

When programming a visitor log for Galaxie Blog (the software that drives the blog you're looking at now), I tried using this CGI variable to extract the client IP and quickly learned I was getting garbage data. Every user seemed to be coming from a blank IP or, worse, 127.0.0.1!

CGI Environment Variables are Outdated and Have Security Risks

While the scope of this article is extracting client data, such as getting the IP and browser information, CGI variables are an outdated technology with several risks. CGI environment information is less reliable and efficient than HTTP headers, which use newer technology and are an adopted standard for passing information between the client and server.

CGI environment variables also do not always work, especially when proxy servers handle networking traffic. The HTTP header standard offers flexibility when passing client data in various networking environments. If available, the HTTP header information is also more secure and reliable.

Using ColdFusion/Lucee to Extract Client Information From HTTP Headers

Ironically, Lucee and ColdFusion document how to get HTTP header information using the GetHTTPRequestData method, but they do not provide detailed examples of extracting the actual values. One method to acquire the client IP address is:

getHTTPRequestData()['headers']['x-forwarded-for']

However, as we will see, more methods are available.

The IP Address and Client Information May Not Be Available in the Header

While getting client information from the HTTP Headers is preferred, it is up to the network administrators to properly configure the server's headers, and the IP address and client information may not be available. If client information is available, you may have to look in multiple spots to find this information. 

It should also be noted that the IP address and client data are unreliable even if the server is configured correctly to pass this information. No matter what method you use, the IP address is never guaranteed to be authentic!

How to Get the Client IP

Using the CGI.Remote_Addr Environment Variable

The client IP address may be available using the CGI Remote_Addr environment variable. However, this may also be the IP address of the proxy server and not the real client IP. If you use the CGI.remote_addr and notice that your logs contain the same IP address, your web server is likely behind a proxy server.

<cfset clientIp = CGI.remote_addr>

Using X-Forwarded-For

The X-Forwarded-For HTTP Header key is the most widely used method for extracting the client's IP. However, this is not a formal standard, and multiple IP addresses can exist. If there are multiple IPs, the originating client IP is usually the first IP in the comma-delimited list. See https://en.wikipedia.org/wiki/X-Forwarded-For for more information.

<cfset clientIp = listGetAt( getHTTPRequestData()['headers']['x-forwarded-for'], 1 )>

The X-Real-Ip

The X-Real-Ip stores the client IP when load balancing is used, or an optional library is installed and configured with nginx servers. 

<cfset clientIp = getHttpRequestData().headers["x-real-ip"]>

Using the CF-Connecting-IP Key

The CF-Connecting-IP contains the client IP when your web server is behind a Cloudflare proxy. Cloudflare recommends using this key if available; however, judging by several threads on Reddit, you may have to upgrade your plan to use this key, and Cloudflare always seems to set the more commonly used X-Forwarded-For key to the same value.

<cfset clientIp = getHttpRequestData().headers["cf-connecting-ip"]>

Using Fastly-Client-Ip

Like the CF-Connecting-IP, the fastly-client-ip is a proprietary key that stores the client IP when Fastly proxy servers are used.

<cfset clientIp = getHTTPRequestData()['headers']['fastly-client-ip']>

Using the Forwarded Key

The forwarded HTTP header is a newer standard for getting the client's IP. However, it is not frequently used, and it carries other key-value pairs: by, for, host, and proto. All key-value pairs are optional and can be used in multiple ways. I have not yet seen this header key in use and may document it more in the future.

A ColdFusion/Lucee Script to get the Client IP

Based upon my current research, the following script can be used to get the most authoritative IP.

<!--- Get the HTTP headers --->
<cfset httpHeaders = getHTTPRequestData()["headers"]>
<!---
If you already know which key you want, and that your server supports the key, you can skip this step and use:
<cfset ipAddress = getHTTPRequestData()["headers"]["x-forwarded-for"]> (replace x-forwarded-for with your desired key).
--->

<!--- Determine which header key exists and take the client IP from it. --->
<cfif structKeyExists(httpHeaders, "x-forwarded-for")>
    <!--- The x-forwarded-for key is by far the most common header for getting the IP. However, it still may not exist, especially on Windows-based servers! It can hold a comma-delimited list, so take the first (originating) entry. --->
    <cfset ipAddress = listFirst(httpHeaders["x-forwarded-for"])>
<cfelseif structKeyExists(httpHeaders, "x-real-ip")>
    <!--- The x-real-ip key is used with load balancing or on nginx servers. --->
    <cfset ipAddress = httpHeaders["x-real-ip"]>
<cfelseif structKeyExists(httpHeaders, "cf-connecting-ip")>
    <!--- This is a proprietary Cloudflare header. Cloudflare will still populate x-forwarded-for along with this key, and some have reported that cf-connecting-ip is only available with an upgraded premium plan. --->
    <cfset ipAddress = httpHeaders["cf-connecting-ip"]>
<cfelseif structKeyExists(httpHeaders, "fastly-client-ip")>
    <!--- This is a proprietary Fastly header. --->
    <cfset ipAddress = httpHeaders["fastly-client-ip"]>
<cfelse>
    <!--- Fall back to the CGI environment variable. --->
    <cfset ipAddress = CGI.remote_addr>
</cfif>
<cfoutput>ipAddress: #ipAddress#</cfoutput>

Further Reading