
Building High-Performance Teams in 2025: Beyond the Location Debate

Mike's Notes

A thoughtful article about in-office versus remote work from Leah Brown, Managing Editor at IT Revolution. Pipi 9 has been built specifically for high-performance DevOps teams.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > IT Revolution
  • Home > Handbook > 

Last Updated

17/05/2025

Building High-Performance Teams in 2025: Beyond the Location Debate

By: Leah Brown
IT Revolution: January 6 2025

The debate over in-office versus remote work misses a fundamental truth: high-performing teams succeed based on how they’re organized, not where they sit. Through extensive research across industries, Gene Kim and Dr. Steven J. Spear found that three key mechanisms consistently enable team excellence: slowing down to speed up, breaking down complexity, and amplifying problems early.

As they explain in their award-winning book Wiring the Winning Organization, the leaders of the highest-performing teams will use these three simple mechanisms to “wire their organization for success instead of mediocrity.” 

1. Slow Down to Speed Up

High-performing teams in 2025 must prioritize solving problems in controlled environments before they appear in production, a practice Kim and Spear term “slowification.” High-performing teams should look to:

  • Move complex problem-solving offline instead of firefighting during execution.
  • Create dedicated spaces for experimentation and learning.
  • Build standard approaches based on validated solutions.
  • Test new processes in low-stakes environments.

Toyota exemplifies this approach using careful preparation and practice to achieve industry-leading performance. Known as the Toyota Production System, this method of slowing down to solve problems has long been proven to help the highest-performing teams succeed. And it will continue to be a differentiator for high-performing teams in 2025 and beyond.

2. Break Down Complexity 

High-performing organizations like Amazon have transformed their performance by making complex work manageable through what Kim and Spear term “simplification.”

Simplification is the process of making complex work more manageable by:

  • Creating small, self-contained teams that own complete workflows.
  • Defining clear handoffs between specialized functions.
  • Implementing changes incrementally rather than all at once.
  • Designing linear processes with obvious next steps.

Amazon has used these principles to evolve from making only twenty software deployments per year to over 136,000 daily deployments. They achieved this by breaking down monolithic systems into smaller, independent services with clear interfaces.

3. Amplify Problems Early

Drawing from their research of high-performing organizations in manufacturing, healthcare, and technology, Kim and Spear found that great organizations create mechanisms to detect and respond to small issues before they become major disruptions. This “amplification,” as they call it, requires teams to maintain reserve capacity to swarm problems when they occur and share solutions across teams to prevent recurrence down the road.

In other words, high-performing teams:

  • Make problems visible immediately when they occur.
  • Create rapid feedback loops between dependent teams.
  • Maintain reserve capacity to swarm and contain issues.
  • Share solutions across teams to prevent recurrence.

Leading the High-Performing Team

To create and lead your high-performing teams, Kim and Spear recommend starting with what they call a “model line”—a small segment where new approaches can be tested. Their research shows three phases of implementing a model line in any organization:

  • Start Small: Choose one critical workflow, form an initial cross-functional team, and implement basic performance metrics.
  • Expand Thoughtfully: Add supporting capabilities, establish clear team interactions, and build knowledge-sharing mechanisms.
  • Optimize Continuously: Refine team boundaries and interfaces while maintaining focus on outcomes.

The organizations that thrive in 2025 and beyond will be those that create what Kim and Spear call effective “social circuitry”—the processes and norms that enable great collaboration. When teams have well-defined boundaries, clear visibility into work, and mechanisms to coordinate when needed, location becomes irrelevant.

The future belongs to organizations that focus on creating the right conditions for teams to excel, whether in a physical, remote, or hybrid environment. By implementing the three key mechanisms of great social circuitry, leaders can build high-performing teams that consistently deliver exceptional results, regardless of where they sit. 

The evidence presented in Wiring the Winning Organization makes this clear: excellence comes from organizational design, not office design.

Curiosity, Open Source, and Timing: The Formula Behind DeepSeek’s Phenomenal Success

Mike's Notes

This week's Turing Post had an excellent summary of DeepSeek and some valuable links.

The original post is on Turing Post, and a longer version is on HuggingFace. The missing links on this page can be found in the original post.

Turing Post is worth subscribing to.

LM Studio is free for personal use and can run DeepSeek and other LLMs. It runs on Mac, Windows, and Linux; Windows requires 16 GB of RAM.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > Turing Post
  • Home > Handbook > 

Last Updated

17/05/2025

Curiosity, Open Source, and Timing: The Formula Behind DeepSeek’s Phenomenal Success

By: Ksenia Se
Turing Post: #85 January 27, 2025

How an open-source mindset, relentless curiosity, and strategic calculation are rewriting the rules in AI and challenging Western companies, plus an excellent reading list and curated research collection

When we first covered DeepSeek models in August 2024 (we are opening that article for everyone, do read it), it didn’t gain much traction. That surprised me! Back then, DeepSeek was already one of the most exciting examples of curiosity-driven research in AI, committed to open-sourcing its discoveries. They also employed an intriguing approach: unlike many others racing to beat benchmarks, DeepSeek pivoted to addressing specific challenges, fostering innovation that extended beyond conventional metrics. Even then, they demonstrated significant cost reductions.

“What’s behind DeepSeek-Coder-V2 that makes it so special it outperforms GPT-4 Turbo, Claude-3 Opus, Gemini 1.5 Pro, Llama 3-70B, and Codestral in coding and math?

DeepSeek-Coder-V2, costing 20–50x less than other models, represents a major upgrade over the original DeepSeek-Coder. It features more extensive training data, larger and more efficient models, improved context handling, and advanced techniques like Fill-In-The-Middle and Reinforcement Learning.” (Inside DeepSeek Models)

Although DeepSeek was making waves in the research community, it remained largely unnoticed by the broader public. But then they released R1-Zero and R1.

With that release they crushed industry benchmarks and disrupted the market by training their models at a fraction of the typical cost. But do you know what else they did? Not only did they prove that reinforcement learning (RL) is all you need in reasoning (R1 stands as solid proof of how well RL works), but they also embraced a trial-and-error approach – fundamental to RL – for their own business strategies. Having previously been overlooked, they calculated the release of R1 meticulously. Did you catch the timing? It was a strategic earthquake that shook the market and left everyone reeling:

  1. As ChinaTalk noticed: “R1's release during President Trump’s inauguration last week was clearly intended to rattle public confidence in the United States’ AI leadership at a pivotal moment in US policy, mirroring Huawei's product launch during former Secretary Raimondo's China visit. After all, the benchmark results of an R1 preview had already been public since November.”
  2. The release happened just one week before the Chinese Lunar New Year (this year on January 29), which typically lasts 15 days. However, the week leading up to the holiday is often quiet, giving them a perfect window to outshine other Chinese companies and maximize their PR impact.

So, while the DeepSeek family of models serves as a case study in the power of open-source development paired with relentless curiosity (from an interview with Liang Wenfeng, DeepSeek’s CEO: “Many might think there's an undisclosed business logic behind this, but in reality, it's primarily driven by curiosity.”), it’s also an example of cold-blooded calculation and a triumph of reinforcement learning applied to both models and humans :). DeepSeek has shown a deep understanding of how to play Western games and excel at them. Of course, today’s market downturn is concerning to many, but the market will likely recover soon. However, if DeepSeek can achieve such outstanding results, Western companies need to reassess their strategies quickly and clarify their actual competitive moats.

Worries about NVIDIA

Of course, we’ll still need a lot of compute – everyone is hungry for it. Here’s a quote from Liang Wenfeng, DeepSeek’s CEO: “For researchers, the thirst for computational power is insatiable. After conducting small-scale experiments, there's always a desire to conduct larger ones. Since then, we've consciously deployed as much computational power as possible.”

So, let’s not count NVIDIA out. What we can count on is Jensen Huang’s knack for staying ahead and finding ways to stay relevant (NVIDIA wasn’t started as an AI company, if you remember). But what the rise of innovators like DeepSeek could push NVIDIA to do is double down on openness. Beyond the technical benefits, an aggressive push toward open-sourcing could serve as a powerful PR boost, reinforcing NVIDIA’s centrality in the ever-expanding AI ecosystem.

As I was writing these words about NVIDIA, they sent a statement regarding DeepSeek: “DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”

So – to wrap up – the main takeaways from the DeepSeek breakthrough are:

  • open-source and decentralize
  • stay curiosity-driven
  • apply reinforcement learning to everything

For DeepSeek, this is just the beginning. As curiosity continues to drive its efforts, it has proven that breakthroughs come not from hoarding innovation but from sharing it. As we move forward, it’s these principles that will shape the future of AI.

We are reading (it’s all about 🐳)

Here is a collection of superb articles covering everything you need to know about DeepSeek:

Curated Collections

7 Open-source Methods to Improve Video Generation and Understanding

Weekly recommendation from an AI practitioner 👍🏼

To run DeepSeek models offline using LM Studio:

  1. Install LM Studio: Download the appropriate version for your operating system from the LM Studio website. Follow the installation instructions provided.
  2. Download the DeepSeek model: Open LM Studio and navigate to the "Discover" tab. Search for "DeepSeek" and select your desired model. Click "Download" to save the model locally.
  3. Run the model offline: Once downloaded, go to the "Local Models" section. Select the DeepSeek model and click "Load." You can interact with the model directly within LM Studio without an internet connection.

News from The Usual Suspects ©

  • Data Center News: $500B Stargate AI Venture by OpenAI, Oracle, and SoftBank. With plans to build massive data centers and energy facilities in Texas, Stargate aims to bolster U.S. AI dominance. Partners like NVIDIA and Microsoft bring muscle to this high-stakes competition with China. Trump supports it; Musk trashes it.
  • Meta's Manhattan-Sized AI Leap: Mark Zuckerberg’s AI ambitions come on a smaller scale (haha) – $65 billion for a data center so vast it could envelop Manhattan. With 1.3 million GPUs powering it, Meta aims to revolutionize its ecosystem and rival America’s AI heavyweights. The era of AI megaprojects is here.
  • Mistral’s IPO Plans: Vive la Résistance. French AI startup Mistral isn’t selling out. With €1 billion raised, CEO Arthur Mensch eyes an IPO while doubling down on open-source LLMs. Positioned as a European powerhouse, Mistral’s independence signals Europe’s readiness to play hardball in the global AI race.
  • SmolVLM: Hugging Face Goes Tiny. Hugging Face introduces SmolVLM, two of the smallest foundation models yet. This open-source release proves size doesn’t matter when efficiency leads the charge, setting new standards for compact AI development.
  • OpenAI's Agent Takes the Wheel: CUA (Computer-Using Agent) redefines multitasking with Operator, seamlessly interacting with GUIs like a digital power user. From downloading PDFs to complex web tasks, it’s the closest we’ve come to a universal assistant. CUA is now in Operator's research preview for Pro users. Blog. System Card.
  • Google DeepMind, A Year in Gemini’s Orbit: They just published an overview of 2024. From Gemini 2.0's breakthroughs in multimodal AI to Willow chip’s quantum strides, innovation soared. Med-Gemini aced medical exams, AlphaFold 3 advanced molecular science, and ALOHA redefined robotics. With disaster readiness, educational tools, and responsible AI initiatives, DeepMind balanced cutting-edge tech with global impact. A Nobel-worthy streak indeed.
  • Google DeepMind, Cost-Cutting AI with "Light Chips": Demis Hassabis unveils Google's next move – custom "light chips" designed to slash AI model costs while boosting efficiency. These chips power Gemini 2.0 Flash, with multimodal AI, 1M-token memory, and a "world model" vision for AGI. DeepMind’s edge? Owning every layer of the AI stack, from chips to algorithms.

Top models to pay attention to

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Enhance reasoning in LLMs with multi-stage reinforcement learning, outperforming competitors in benchmarks like AIME 2024 and MATH-500.
  • Kimi K1.5: Scaling Reinforcement Learning with LLMs Scale reasoning capabilities with efficient reinforcement learning methods, optimizing token usage for both long- and short-chain-of-thought tasks.
  • VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Advance image and video understanding with multimodal integration, achieving top results in temporal reasoning and long-video tasks.
  • Qwen2.5-1M Series Support 1M-token contexts with open-source models, leveraging sparse attention and lightning-fast inference frameworks for long-context tasks.

The freshest research papers, categorized for your convenience

There were quite a few TOP research papers this week; we will mark them with 🌟 in each section.

Specialized Architectures and Techniques

  • 🌟 Demons in the Detail: Introduces load-balancing loss for training Mixture-of-Experts models.
  • 🌟 Autonomy-of-Experts Models: Proposes expert self-selection to improve Mixture-of-Experts efficiency and scalability.
  • O1-Pruner: Length-Harmonizing Fine-Tuning: Reduces inference overhead in reasoning models through reinforcement learning-based pruning.

Language Model Reasoning and Decision-Making

  • 🌟 Evolving Deeper LLM Thinking: Explores genetic search methods to enhance natural language inference for planning tasks, achieving superior accuracy.
  • 🌟 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training: Develops a framework for LLMs to self-correct using Monte Carlo Tree Search and iterative refinement.
  • 🌟 Reasoning Language Models: A Blueprint: Proposes a modular framework integrating reasoning methods to democratize reasoning capabilities.
  • Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback: Enhances mathematical reasoning with stepwise binary feedback for more accurate LLM outputs.
  • Test-Time Preference Optimization: Introduces a framework for aligning LLM outputs to human preferences during inference without retraining.

Multi-Agent Systems and Coordination

  • SRMT: Shared Memory for Multi-Agent Lifelong Pathfinding: Demonstrates shared memory use for enhanced coordination in multi-agent systems.
  • Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks: Develops a hierarchical agent framework for mobile assistants with self-evolution capabilities.

Generative and Retrieval-Augmented Models

  • Chain-of-Retrieval Augmented Generation: Presents a stepwise query and reasoning framework for retrieval-augmented generation.
  • Can We Generate Images with CoT?: Integrates Chain-of-Thought reasoning for compositional and iterative image generation.

Multi-Modal and GUI Systems

  • UI-TARS: Pioneering Automated GUI Interaction: Advances vision-based agents for human-like GUI task performance.
  • InternLM-XComposer2.5-Reward: Improves multi-modal reward modeling for text, image, and video alignment.

Robustness, Adaptability, and Uncertainty

  • Trading Inference-Time Compute for Adversarial Robustness: Examines inference-time compute scaling to improve robustness against adversarial attacks.
  • Evolution and the Knightian Blindspot of Machine Learning: Advocates integrating evolutionary principles into machine learning for resilience to uncertainty.

Planning and Execution in AI

  • LLMs Can Plan Only If We Tell Them: Proposes structured state tracking to enhance planning capabilities in LLMs.
  • Debate Helps Weak-to-Strong Generalization: Leverages debate methods to improve model generalization and alignment.

Social and Cognitive Insights

  • Multiple Predictions of Others’ Actions in the Human Brain: Examines neural mechanisms for predicting social behaviors under ambiguity.

AI Infrastructure and Hardware

  • Good Things Come in Small Packages: Advocates Lite-GPUs for scalable and cost-effective AI infrastructure.

Nine ways to shoot yourself in the foot with PostgreSQL

Mike's Notes

This handy post by Phil Booth on his website has some excellent warnings about misusing PostgreSQL.

There are a lot more in-article links in the original post.

Phil has a great website full of interesting articles.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

17/05/2025

Nine ways to shoot yourself in the foot with PostgreSQL

By: Phil Booth
PhilBooth.com: 23/04/2023

Previously for Extreme Learning, I discussed all the ways I've broken production using healthchecks. In this post I'll do the same for PostgreSQL.

The common thread linking most of these gotchas is scalability. They're things that won't affect you while your database is small. But if one day you want your database not to be small, it pays to think about them in advance. Otherwise they'll come back and bite you later, potentially when it's least convenient. Plus in many cases it's less work to do the right thing from the start than it is to change a working system to do the right thing later on.

1. Keep the default value for work_mem

The biggest mistake I made the first time I deployed Postgres in prod was not updating the default value for work_mem. This setting governs how much memory is available to each query operation before it must start writing data to temporary files on disk, and can have a huge impact on performance.

It's an easy trap to fall into if you're not aware of it, because all your queries in local development will typically run perfectly. And probably in production too; at first, you'll have no issues. But as your application grows, the volume of data and complexity of your queries both increase. It's only then that you'll start to encounter problems, a textbook "but it worked on my machine" scenario.

When work_mem becomes over-utilised, you'll see latency spikes as data is paged in and out, causing hash table and sorting operations to run much slower. The performance degradation is extreme and, depending on the composition of your application infrastructure, can even turn into full-blown outages.

A good value depends on multiple factors: the size of your Postgres instance, the frequency and complexity of your queries, the number of concurrent connections. So it's really something you should always keep an eye on.

Running your logs through pgbadger is one way to look for warning signs. Another way is to use an automated 3rd-party system that alerts you before it becomes an issue, such as pganalyze (disclosure: I have no affiliation to pganalyze, but am a very happy customer).

At this point, you might be asking if there's a magic formula to help you pick the correct value for work_mem. It's not my invention but this one was handed down to me by the greybeards:

work_mem = ($YOUR_INSTANCE_MEMORY * 0.8 - shared_buffers) / $YOUR_ACTIVE_CONNECTION_COUNT

EDIT: Thanks to afhammad for pointing out you can also override work_mem on a per-transaction basis using SET LOCAL work_mem.
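
For illustration, here is a rough sketch of how these settings might be applied; the instance size, connection count, and resulting values are hypothetical:

-- Assuming a 16 GB instance, shared_buffers = 4 GB and ~100 active connections:
-- work_mem = (16 GB * 0.8 - 4 GB) / 100 ≈ 88 MB
ALTER SYSTEM SET work_mem = '88MB';
SELECT pg_reload_conf();

-- Per-transaction override for a single heavy query, as mentioned in the edit above:
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the expensive query here ...
COMMIT;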

2. Push all your application logic into Postgres functions and procedures

Postgres has some nice abstractions for procedural code and it can be tempting to push lots or even all of your application logic down into the db layer. After all, doing that eliminates latency between your code and the data, which should mean lower latency for your users, right? Well, nope.

Functions and procedures in Postgres are not zero-cost abstractions, they're deducted from your performance budget. When you spend memory and CPU to manage a call stack, less of it is available to actually run queries. In severe cases that can manifest in some surprising ways, like unexplained latency spikes and replication lag.

Simple functions are okay, especially if you can mark them IMMUTABLE or STABLE. But any time you're assembling data structures in memory or you have nested functions or recursion, you should think carefully about whether that logic can be moved back to your application layer. There's no TCO in Postgres!
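
For example, here's a minimal sketch of the kind of simple function that's fine to keep in the database (the function itself is hypothetical):

-- Pure computation on its arguments, no table access, so it can be marked IMMUTABLE.
CREATE OR REPLACE FUNCTION full_name(first_name text, last_name text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
  SELECT first_name || ' ' || last_name;
$$;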

And of course, it's far easier to scale application nodes than it is to scale your database. You probably want to postpone thinking about database scaling for as long as possible, which means being conservative about resource usage.

3. Use lots of triggers

Triggers are another feature that can be misused.

Firstly, they're less efficient than some of the alternatives. Requirements that can be implemented using generated columns or materialized views should use those abstractions, as they're better optimised by Postgres internally.
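
As a sketch of the first alternative, a generated column can replace a trigger that keeps a derived value in sync (table and column names are hypothetical; generated columns need Postgres 12 or later):

CREATE TABLE order_items (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  quantity integer NOT NULL,
  unit_price numeric NOT NULL,
  -- Maintained by Postgres on every write; no trigger function required.
  line_total numeric GENERATED ALWAYS AS (quantity * unit_price) STORED
);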

Secondly, there's a hidden gotcha lurking in how triggers tend to encourage event-oriented thinking. As you know, it's good practice in SQL to batch related INSERT or UPDATE queries together, so that you lock a table once and write all the data in one shot. You probably do this in your application code automatically, without even needing to think about it. But triggers can be a blindspot.

The temptation is to view each trigger function as a discrete, composable unit. As programmers we value separation of concerns and there's an attractive elegance to the idea of independent updates cascading through your model. If you feel yourself pulled in that direction, remember to view the graph in its entirety and look for parts that can be optimised by batching queries together.

A useful discipline here is to restrict yourself to a single BEFORE trigger and a single AFTER trigger on each table. Give your trigger functions generic names like before_foo and after_foo, then keep all the logic inline inside one function. Use TG_OP to distinguish the trigger operation. If the function gets long, break it up with some comments but don't be tempted to refactor to smaller functions. This way it's easier to ensure writes are implemented efficiently, plus it also limits the overhead of managing an extended call stack in Postgres.
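
A minimal sketch of that discipline, assuming a hypothetical foo table with an updated_at column:

CREATE OR REPLACE FUNCTION before_foo()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  -- Branch on TG_OP so all BEFORE logic for the table lives in one function.
  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    NEW.updated_at := now();
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER before_foo
BEFORE INSERT OR UPDATE ON foo
FOR EACH ROW EXECUTE FUNCTION before_foo();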

4. Use NOTIFY heavily

Using NOTIFY, you can extend the reach of triggers into your application layer. That's handy if you don't have the time or the inclination to manage a dedicated message queue, but once again it's not a cost-free abstraction.

If you're generating lots of events, the resources spent on notifying listeners will not be available elsewhere. This problem can be exacerbated if your listeners need to read further data to handle event payloads. Then you're paying for every NOTIFY event plus every consequential read in the handler logic. Just as with triggers, this can be a blindspot that hides opportunities to batch those reads together and reduce load on your database.

Instead of NOTIFY, consider writing events to a FIFO table and consuming them in batches at a regular cadence. The right cadence depends on your application; maybe it's a few seconds, or perhaps you can get away with a few minutes. Either way it will reduce the load, leaving more CPU and memory available for other things.

A possible schema for your event queue table might look like this:

CREATE TABLE event_queue (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id uuid NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  type text NOT NULL,
  data jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  occurred_at timestamptz NOT NULL,
  acquired_at timestamptz,
  failed_at timestamptz
);

With that in place you could acquire events from the queue like so:

UPDATE event_queue
SET acquired_at = now()
WHERE id IN (
  SELECT id
  FROM event_queue
  WHERE acquired_at IS NULL
  ORDER BY occurred_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1000 -- Set this limit according to your usage
)
RETURNING *;

Setting acquired_at on read and using FOR UPDATE SKIP LOCKED guarantees each event is handled only once. After they've been handled, you can then delete the acquired events in batches too (there are better options than Postgres for permanently storing historical event data of unbounded size).
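
One possible shape for that batched cleanup; the retention window and batch size here are arbitrary:

DELETE FROM event_queue
WHERE id IN (
  SELECT id
  FROM event_queue
  WHERE acquired_at IS NOT NULL
    AND acquired_at < now() - interval '1 day'
  ORDER BY acquired_at
  LIMIT 1000 -- Delete in small batches to keep locks and WAL churn manageable
);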

EDIT: Thanks to labatteg, notfancy, reese_john, xnickb and Mavvie for pointing out the missing FOR UPDATE SKIP LOCKED in this section.

5. Don't use EXPLAIN ANALYZE on real data

EXPLAIN is a core tool in every backend engineer's kit. I'm sure you diligently check your query plans for the dreaded Seq Scan already. But Postgres can return more accurate plan data if you use EXPLAIN ANALYZE, because that actually executes the query. Of course, you don't want to do that in production. So to use EXPLAIN ANALYZE well, there are a few steps you should take first.

Any query plan is only as good as the data you run it against. There's no point running EXPLAIN against a local database that has a few rows in each table. Maybe you're fortunate enough to have a comprehensive seed script that populates your local instance with realistic data, but even then there's a better option.

It's really helpful to set up a dedicated sandbox instance alongside your production infrastructure, regularly restored with a recent backup from prod, specifically for the purpose of running EXPLAIN ANALYZE on any new queries that are in development. Make the sandbox instance smaller than your production one, so it's more constrained than prod. Now EXPLAIN ANALYZE can give you confidence about how your queries are expected to perform after they've been deployed. If they look good on the sandbox, there should be no surprises waiting for you when they reach production.
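
On the sandbox, usage is just a prefix on the query under development (the query and schema here are hypothetical):

-- ANALYZE actually executes the query, so run this on the sandbox, not in production.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EMEA'
ORDER BY o.created_at DESC
LIMIT 50;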

6. Prefer CTEs over subqueries

If you're regularly using EXPLAIN this one probably won't catch you out, but it's caught me out before so I want to mention it explicitly.

Many engineers are bottom-up thinkers and CTEs (i.e. WITH queries) are a natural way to express bottom-up thinking. But they may not be the most performant way.

Instead I've found that subqueries will often execute much faster. Of course it depends entirely on the specific query, so I make no sweeping generalisations other than to suggest you should EXPLAIN both approaches for your own complex queries.
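
For instance, the same aggregation can be written both ways and compared with EXPLAIN; the table and columns here are hypothetical:

-- CTE form:
WITH recent_orders AS (
  SELECT customer_id
  FROM orders
  WHERE created_at > now() - interval '7 days'
)
SELECT customer_id, count(*)
FROM recent_orders
GROUP BY customer_id;

-- Equivalent subquery form:
SELECT customer_id, count(*)
FROM (
  SELECT customer_id
  FROM orders
  WHERE created_at > now() - interval '7 days'
) AS recent_orders
GROUP BY customer_id;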

There's a discussion of the underlying reasons in the "CTE Materialization" section of the docs, which describes the performance tradeoffs more definitively. It's a good summary, so I won't waste your time trying to paraphrase it here. Go and read that if you want to know more.

EDIT: Thanks to Randommaggy and Ecksters for pointing out the subquery suggestion in this section is outdated. Since version 12, Postgres has been much better at optimising CTEs and will often just replace the CTE with a subquery anyway. I've left the section in place as the broader point about comparing approaches with EXPLAIN still stands and the "CTE Materialization" docs remain a worthwhile read. But bear in mind the comment thread linked above!

7. Use recursive CTEs for time-critical queries

If your data model is a graph, your first instinct will naturally be to traverse it recursively. Postgres provides recursive CTEs for this and they work nicely, even allowing you to handle self-referential/infinitely-recursive loops gracefully. But as elegant as they are, they're not fast. And as your graph grows, performance will decline.

A useful trick here is to think about how your application traffic stacks up in terms of reads versus writes. It's common for there to be many more reads than writes and in that case, you should consider denormalising your graph to a materialized view or table that's better optimised for reading. If you can store each queryable subgraph on its own row, including all the pertinent columns needed by your queries, reading becomes a simple (and fast) SELECT. The cost of that is write performance of course, but it's often worth it for the payoff.
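
A sketch of that denormalisation, assuming a hypothetical categories(id, parent_id, name) table:

CREATE MATERIALIZED VIEW category_paths AS
WITH RECURSIVE tree AS (
  -- Anchor: root categories
  SELECT id, parent_id, name, ARRAY[id] AS path
  FROM categories
  WHERE parent_id IS NULL
  UNION ALL
  -- Recursive step: append each child to its parent's path
  SELECT c.id, c.parent_id, c.name, t.path || c.id
  FROM categories c
  JOIN tree t ON c.parent_id = t.id
)
SELECT id, parent_id, name, path
FROM tree;

-- Reads become a simple SELECT; the write-side cost is refreshing the view.
REFRESH MATERIALIZED VIEW category_paths;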

8. Don't add indexes to your foreign keys

Postgres doesn't automatically create indexes for foreign keys. This may come as a surprise if you're more familiar with MySQL, so pay attention to the implications as it can hurt you in a few ways.

The most obvious fallout from it is the performance of joins that use a foreign key. But those are easily spotted using EXPLAIN, so are unlikely to catch you out.

Less obvious perhaps is the performance of ON DELETE and ON UPDATE behaviours. If your schema relies on cascading deletes, you might find some big performance gains by adding indexes on foreign keys.
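
The fix is a one-liner per foreign key; the table and column names here are hypothetical, and CONCURRENTLY avoids blocking writes on a live table:

CREATE INDEX CONCURRENTLY idx_orders_customer_id
  ON orders (customer_id);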

9. Compare indexed columns with IS NOT DISTINCT FROM

When you use regular comparison operators with NULL, the result is also NULL instead of the boolean value you might expect. One way round this is to replace <> with IS DISTINCT FROM and replace = with IS NOT DISTINCT FROM. These operators treat NULL as a regular value and will always return booleans.

However, whereas = will typically cause the query planner to use an index if one is available, IS NOT DISTINCT FROM bypasses the index and will likely do a Seq Scan instead. This can be confusing the first time you notice it in the output from EXPLAIN.

If that happens and you want to force a query to use the index instead, you can make the null check explicit and then use = for the not-null case.

In other words, if you have a query that looks like this:

SELECT * FROM foo
WHERE bar IS NOT DISTINCT FROM baz;

You can do this instead:

SELECT * FROM foo
WHERE (bar IS NULL AND baz IS NULL)
OR bar = baz;

Design Token-Based UI Architecture

Mike's Notes

Pipi 9 has an existing design system engine, one of its many parts. This engine describes the CSS files but does not yet automate or generate code.

Design Tokens have significantly matured, and the draft standard has recently improved.

The article below is copied from Martin Fowler's website. It describes how ThoughtWorks uses Design Tokens for code generation, which I will use as a starting point. It should work well with the existing design system engine.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Authors > Martin Fowler
  • Home > Handbook > 

Last Updated

17/05/2025

Design Token-Based UI Architecture

By: Andreas Kutschmann
MartinFowler.com: December 12 2024

Design tokens are design decisions as data and serve as a single source of truth for design and engineering. Utilizing deployment pipelines, they enable automated code generation across platforms, allowing for faster updates and improved consistency in design. Organizing tokens in layers—progressing from available options to tokens that capture how they are applied—ensures scalability and a better developer experience. Keeping option tokens (e.g. color palettes) private reduces file size and supports non-breaking changes. These benefits make design tokens particularly well-suited for organizations with large-scale projects, multi-platform environments or frequent design changes.

Contents

  • Role of design tokens
    • What are design tokens?
  • Establishing a single source of truth
  • Automated design token distribution
    • Fully automated pipeline
    • Pipeline including manual approval
  • Organizing tokens in layers
    • Option tokens: defining what design options are provided
    • Decision tokens: defining how styles are applied
    • Component tokens: defining where styles are applied
    • How many layers shall I use?
  • Token scope
    • File-based scope
    • A more flexible approach
  • Should I use design tokens?
    • When to use design tokens
    • When design tokens might not be necessary

Design tokens, or “tokens”, are fundamental design decisions represented as data. They are the foundational building blocks of design systems.

Since the release of the second editor’s draft of the design token specification in 2022 and the call for tool makers to start implementing and providing feedback, the landscape of design token tools has evolved rapidly. Tools like code generators, documentation systems, and UI design software are now better equipped to support design tokens, underscoring their growing importance in modern UI architecture.

In this article, I'll explain what design tokens are, when they are useful and how to apply them effectively. We'll focus on key architectural decisions that are often difficult to change later, including:

  1. How to organize design tokens in layers to balance scalability, maintainability and developer experience.
  2. Whether all tokens should be made available to product teams or just a subset.
  3. How to automate the distribution process of tokens across teams.

Role of design tokens

Around 2017, I was involved in a large project that used the Micro Frontend Architecture to scale development teams. In this setup, different teams were responsible for different parts of the user interface, which could be even on the same page. Each team could deploy its micro-frontend independently.

There were various cases where components would be displayed on top of each other (such as dialogs or toasts appearing on top of content areas), which were not part of the same micro frontend. Teams used the CSS property z-index to control the stacking order, often relying on magic numbers—arbitrary values that weren’t documented or standardized. This approach did not scale as the project grew. It led to issues that took effort to fix, as cross-team collaboration was needed.

The issue was eventually addressed with design tokens and I think makes a good example to introduce the concept. The respective token file might have looked similar to this:

{
  "z-index": {
    "$type": "number",
    "default": {
      "$value": 1
    },
    "sticky": {
      "$value": 100
    },
    "navigation": {
      "$value": 200
    },
    "spinner": {
      "$value": 300
    },
    "toast": {
      "$value": 400
    },
    "modal": {
      "$value": 500
    }
  }
}

The design tokens above represent the set of z-index values that can be used in the application and the name gives developers a good idea of where to use them. A token file like this can be integrated into the designers’ workflow and also be used to generate code, in a format that each team requires. For example, in this case, the token file might have been used to generate CSS or SCSS variables:

CSS:

:root {
    --z-index-default: 1;
    --z-index-sticky: 100;
    --z-index-navigation: 200;
    --z-index-spinner: 300;
    --z-index-toast: 400;
    --z-index-modal: 500;
  }

SCSS:

$z-index-default: 1;
  $z-index-sticky: 100;
  $z-index-navigation: 200;
  $z-index-spinner: 300;
  $z-index-toast: 400;
  $z-index-modal: 500;

What are design tokens?

Salesforce originally introduced design tokens to streamline design updates to multiple platforms.

The Design Tokens Community Group describes design tokens as “a methodology for expressing design decisions in a platform-agnostic way so that they can be shared across different disciplines, tools, and technologies.”

Let’s break this down:

Cross-Disciplinary Collaboration: Design tokens act as a common language that aligns designers, developers, product managers, and other disciplines. By offering a single source of truth for design decisions, they ensure that everyone involved in the product life cycle is on the same page, leading to more efficient workflows.

Tool integration: Design tokens can be integrated into various design and development tools, including UI design software, token editors, translation tools (code generators), and documentation systems. This enables design updates to be quickly reflected in the code base and synchronized across teams.

Technology adaptability: Design tokens can be translated into different technologies like CSS, SASS, and JavaScript for the web, and even used on native platforms like Android and iOS. This flexibility enables design consistency across a variety of platforms and devices.

Establishing a single source of truth

A key benefit of design tokens is their ability to serve as a single source of truth for both design and engineering teams. This ensures that multiple products or services maintain visual and functional consistency.

A translation tool takes one or more design token files as input and generates platform-specific code as output. Some translation tools can also produce documentation for the design tokens in the form of HTML. At the time of writing, popular translation tools include Style Dictionary, Theo, Diez or Specify App.


Figure 1: Translation tool

Automated design token distribution

In this section, we’ll explore how to automate the distribution of design tokens to product teams.

Let’s assume our goal is to provide teams with updated, tech-specific design tokens immediately after a designer makes a change. To achieve this, we can automate the translation and distribution process using a deployment pipeline for design tokens. Besides platform-specific code artifacts (like CSS for the web, XML for Android etc.), the pipeline might also deploy the documentation for the design tokens.

One crucial requirement is keeping design tokens under version control. Thankfully, plugins for popular design tools like Figma already integrate with Git providers like GitHub. It's considered best practice to use the Git repository as the single source of truth for design tokens—not the design tool itself. However, this requires the plugin to support syncing both ways between the repository and the design tool, which not all plugins do. As of now, Tokens Studio is a plugin that offers this bidirectional syncing. For detailed guidance on integrating Tokens Studio with different Git providers, please refer to their documentation. The tool enables you to configure a target branch and supports a trunk-based as well as a pull-request-based workflow.

Once the tokens are under version control, we can set up a deployment pipeline to build and deploy the artifacts needed by the product teams, which include platform-specific source code and documentation. The source code is typically packaged as a library and distributed via an artifact registry. This approach gives product teams control over the upgrade cycle. They can adopt updated styles by simply updating their dependencies. These updates may also be applied indirectly through updates of component libraries that use the token-based styles.


Figure 2: Automated design token distribution

This overall setup has allowed teams at Thoughtworks to roll out smaller design changes across multiple front-ends and teams in a single day.

Fully automated pipeline

The most straightforward way to design the pipeline would be a fully automated trunk-based workflow. In this setup, all changes pushed to the main branch will be immediately deployed as long as they pass the automated quality gates.

Such a pipeline might consist of the following jobs:

  • Check: Validate the design token files using a design token validator or a JSON validator.
  • Build: Use a translation tool like Style Dictionary to convert design token files into platform-specific formats. This job might also build the docs using the translation tool or by integrating a dedicated documentation tool.
  • Test: This job is highly dependent on the testing strategy. Although some tests can be done using the design token file directly (like checking the color contrast), a common approach is to test the generated code using a documentation tool such as Storybook. Storybook has excellent test support for visual regression tests, accessibility tests, interaction tests, and other test types.
  • Publish: Publish updated tokens to a package manager (for example, npm). The release process and versioning can be fully automated with a package publishing tool that is based on Conventional Commits like semantic-release. semantic-release also allows the deployment of packages to multiple platforms. The publish job might also deploy documentation for the design tokens.
  • Notify: Inform teams of the new token version via email or chat, so that they can update their dependencies.

    Figure 3: Fully automated deployment pipeline

    Pipeline including manual approval

    Sometimes fully automated quality gates are not sufficient. If a manual review is required before publishing, a common approach is to deploy an updated version of the documentation with the latest design token to a preview environment (a temporary environment).

    If a tool like Storybook is used, this preview might contain not only the design tokens but also show them integrated with the components used in the application.

    An approval process can be implemented via a pull-request workflow. Or, it can be a manual approval / deployment step in the pipeline.


    Figure 4: Deployment pipeline with manual approval

    Organizing tokens in layers

    As discussed earlier, design tokens represent design decisions as data. However, not all decisions operate at the same level of detail. Instead, ideally, general design decisions guide more specific ones. Organizing tokens (or design decisions) into layers allows designers to make decisions at the right level of abstraction, supporting consistency and scalability.

    For instance, making individual color choices for every new component isn’t practical. Instead, it’s more efficient to define a foundational color palette and then decide how and where those colors are applied. This approach reduces the number of decisions while maintaining a consistent look and feel.

    There are three key types of design decisions for which design tokens are used. They build on top of one another:

    • What design options are available to use?
    • How are those styles applied across the user interface?
    • Where exactly are those styles applied (in which components)?

    There are various names for these three types of tokens (as usual, naming is the hard part). In this article, we’ll use the terms proposed by Samantha Gordashko: option tokens, decision tokens and component tokens.

    Let’s use our color example to illustrate how design tokens can answer the three questions above.

    Option tokens: defining what design options are provided

    Option tokens (also called primitive tokens, base tokens, core tokens, foundation tokens or reference tokens) define what styles can be used in the application. They define things like color palettes, spacing/sizing scales or font families. Not all of them are necessarily used in the application, but they present reasonable options.

    Using our example, let’s assume we have a color palette with 9 shades for each color, ranging from very light to highly saturated. Below, we define the blue tones and grey tones as option-tokens:

    {
      "color": {
        "$type": "color",
        "options": {
          "blue-100": {"$value": "#e0f2ff"},
          "blue-200": {"$value": "#cae8ff"},
          "blue-300": {"$value": "#b5deff"},
          "blue-400": {"$value": "#96cefd"},
          "blue-500": {"$value": "#78bbfa"},
          "blue-600": {"$value": "#59a7f6"},
          "blue-700": {"$value": "#3892f3"},
          "blue-800": {"$value": "#147af3"},
          "blue-900": {"$value": "#0265dc"},
          "grey-100": {"$value": "#f8f8f8"},
          "grey-200": {"$value": "#e6e6e6"},
          "grey-300": {"$value": "#d5d5d5"},
          "grey-400": {"$value": "#b1b1b1"},
          "grey-500": {"$value": "#909090"},
          "grey-600": {"$value": "#6d6d6d"},
          "grey-700": {"$value": "#464646"},
          "grey-800": {"$value": "#222222"},
          "grey-900": {"$value": "#000000"},
          "white": {"$value": "#ffffff"}
        }
      }
    }

    Although it’s highly useful to have reasonable options, option tokens fall short of being sufficient for guiding developers on how and where to apply them.

    Decision tokens: defining how styles are applied

    Decision tokens (also called semantic tokens or system tokens) specify how those style options should be applied contextually across the UI.

    In the context of our color example, they might include decisions like the following:

    • grey-100 is used as a surface color.
    • grey-200 is used for the background of disabled elements.
    • grey-400 is used for the text of disabled elements.
    • grey-900 is used as a default color for text.
    • blue-900 is used as an accent color.
    • white is used for text on accent color backgrounds.

    The corresponding decision token file would look like this:

    {
      "color": {
        "$type": "color",
        "decisions": {
          "surface": {
            "$value": "{color.options.grey-100}",
            "description": "Used as a surface color."
          },
          "background-disabled": {
            "$value": "{color.options.grey-200}",
            "description":"Used for the background of disabled elements."
          },
          "text-disabled": {
            "$value": "{color.options.grey-400}",
            "description": "Used for the text of disabled elements."
          },
          "text": {
            "$value": "{color.options.grey-900}",
            "description": "Used as default text color."
          },
          "accent": {
            "$value": "{color.options.blue-900}",
            "description": "Used as an accent color."
          },
          "text-on-accent": {
            "$value": "{color.options.white}",
            "description": "Used for text on accent color backgrounds."
          }
        }
      }
    }

    As a developer, I would mostly be interested in the decisions, not the options. For example, color tokens typically contain a long list of options (a color palette), while very few of those options are actually used in the application. The tokens that are actually relevant when deciding which styles to apply would usually be the decision tokens.

    Decision tokens use references to the option tokens. I think of organizing tokens this way as a layered architecture. In other articles, I have often seen the term tier being used, but I think layer is the better word, as there is no physical separation implied. The diagram below visualizes the two layers we talked about so far:


    Figure 5: 2-layer pattern

    Component tokens: defining where styles are applied

    Component tokens (or component-specific tokens) map the decision tokens to specific parts of the UI. They show where styles are applied.

    The term component in the context of design tokens does not always map to the technical term component. For example, a button might be implemented as a UI component in some applications, while other applications just use the button HTML element instead. Component tokens could be used in both cases.

    Component tokens can be organised in a group referencing multiple decision tokens. In our example, these references might include text and background colors for different variants of the button (primary, secondary) as well as disabled buttons. They might also include references to tokens of other types (spacing/sizing, borders etc.) which I'll omit in the following example:

    {
      "button": {
        "primary": {
          "background": {
            "$value": "{color.decisions.accent}"
          },
          "text": {
            "$value": "{color.decisions.text-on-accent}"
          }
        },
        "secondary": {
          "background": {
            "$value": "{color.decisions.surface}"
          },
          "text": {
            "$value": "{color.decisions.text}"
          }
        },
        "background-disabled": {
          "$value": "{color.decisions.background-disabled}"
        },
        "text-disabled": {
          "$value": "{color.decisions.text-disabled}"
        }
      }
    }

    To some degree, component tokens are simply the result of applying decisions to specific components. However, as this example shows, this process isn’t always straightforward—especially for developers without design experience. While decision tokens can offer a general sense of which styles to use in a given context, component tokens provide additional clarity.


    Figure 6: 3-layer pattern

    Note: there may be “snowflake” situations where layers are skipped. For example, it might not be possible to define a general decision for every single component token, or those decisions might not have been made yet (for example at the beginning of a project).

    How many layers shall I use?

    Two or three layers are quite common amongst the bigger design systems.

    However, even a single layer of design tokens already greatly limits the day-to-day decisions that need to be made. For example, just deciding what units to use for spacing and sizing became a somewhat nontrivial task with up to 43 units for length implemented in some browsers (if I counted correctly).

    A three-layer architecture should offer the best developer experience. However, it also increases maintenance effort and token count, as new tokens are introduced with each new component. This can result in a larger code base and heavier package size.

    Starting with two layers (option and decision tokens) can be a good idea for projects where the major design decisions are already in place and/or relatively stable. A third layer can still be added if there is a clear need.

    An additional component layer makes it easier for designers to change decisions later or let them evolve over time. This flexibility could be a driving force for a three-layer architecture. In some cases, it might even make sense to start with component tokens and to add the other layers later on.

    Ultimately, the number of layers depends on your project's needs and how much flexibility and scalability are required.

    Token scope

    I already mentioned that while option tokens are very helpful to designers, they might not be relevant for application developers using the platform-specific code artifacts. Application developers will typically be more interested in the decision/component tokens.

    Although token scope is not yet included in the design token spec, some design systems already separate tokens into private (also called internal) and public (also called global) tokens. For example, the Salesforce Lightning Design System introduced a flag for each token. There are various reasons why this can be a good idea:

    • it guides developers on which tokens to use
    • fewer options provide a better developer experience
    • it reduces the file size as not all tokens need to be included
    • private/internal tokens can be changed or removed without breaking changes

    A downside of making option tokens private is that developers would rely on designers to always make those styles available as decision or component tokens. This could become an issue in case of limited availability of the designers or if not all decisions are available, for example at the start of a project.

    Unfortunately, there is no standardized solution yet for implementing scope for design tokens. So the approach depends on the tool-chain of the project and will most likely need some custom code.

    File-based scope

    Using Style Dictionary, we can use a filter to expose only public tokens. The most straightforward approach would be to filter on the file ending. If we use different file endings for component, decision and option tokens, we can use a filter on the file path, for example, to make the option tokens layer private.

    Style Dictionary config

    const styleDictionary = new StyleDictionary({
        "source": ["color.options.json", "color.decisions.json"],
        "platforms": {
          "css": {
            "transformGroup": "css",
            "files": [
              {
                "destination": "variables.css",
                "filter": token => !token.filePath.endsWith('options.json'),
                "format": "css/variables"
              }
            ]
          }
        }
      });

    The resulting CSS variables would contain only these decision tokens, and not the option tokens.

    Generated CSS variables

    :root {
        --color-decisions-surface: #f8f8f8;
        --color-decisions-background-disabled: #e6e6e6;
        --color-decisions-text-disabled: #b1b1b1;
        --color-decisions-text: #000000;
        --color-decisions-accent: #0265dc;
        --color-decisions-text-on-accent: #ffffff;
      }

    A more flexible approach

    If more flexibility is needed, it might be preferable to add a scope flag to each token and to filter based on this flag:

    Style Dictionary config

     const styleDictionary = new StyleDictionary({
        "source": ["color.options.json", "color.decisions.json"],
        "platforms": {
          "css": {
            "transformGroup": "css",
            "files": [
              {
                "destination": "variables.css",
                "filter": {
                  "public": true
                },
                "format": "css/variables"
              }
            ]
          }
        }
      });

    If we then add the flag to the decision tokens, the resulting CSS would be the same as above:

    Tokens with scope flag

     {
        "color": {
          "$type": "color",
          "decisions": {
            "surface": {
              "$value": "{color.options.grey-100}",
              "description": "Used as a surface color.",
              "public": true
            },
            "background-disabled": {
              "$value": "{color.options.grey-200}",
              "description":"Used for the background of disabled elements.",
              "public": true
            },
            "text-disabled": {
              "$value": "{color.options.grey-400}",
              "description": "Used for the text of disabled elements.",
              "public": true
            },
            "text": {
              "$value": "{color.options.grey-900}",
              "description": "Used as default text color.",
              "public": true
            },
            "accent": {
              "$value": "{color.options.blue-900}",
              "description": "Used as an accent color.",
              "public": true
            },
            "text-on-accent": {
              "$value": "{color.options.white}",
              "description": "Used for text on accent color backgrounds.",
              "public": true
            }
          }
        }
      }

    Generated CSS variables

    :root {
        --color-decisions-surface: #f8f8f8;
        --color-decisions-background-disabled: #e6e6e6;
        --color-decisions-text-disabled: #b1b1b1;
        --color-decisions-text: #000000;
        --color-decisions-accent: #0265dc;
        --color-decisions-text-on-accent: #ffffff;
      }

    Such a flag can now also be set through the Figma UI (if using Figma variables as a source of truth for design tokens); it is available as the hiddenFromPublishing flag via the Plugins API.

    Should I use design tokens?

    Design tokens offer significant benefits for modern UI architecture, but they may not be the right fit for every project.

    Benefits include:

    • Improved lead time for design changes
    • Consistent design language and UI architecture across platforms and technologies
    • Design tokens being relatively lightweight from an implementation point of view

    Drawbacks include:

    • Initial effort for automation
    • Designers might have to (to some degree) interact with Git
    • Standardization is still in progress

    Consider the following when deciding whether to adopt design tokens:

    When to use design tokens

    1. Multi-Platform or Multi-Application Environments: When working across multiple platforms (web, iOS, Android…) or maintaining several applications or frontends, design tokens ensure a consistent design language across all of them.
    2. Frequent Design Changes: For environments with regular design updates, design tokens provide a structured way to manage and propagate changes efficiently.
    3. Large Teams: For teams with many designers and developers, design tokens facilitate collaboration.
    4. Automated Workflows: If you’re familiar with CI/CD pipelines, the effort to add a design token pipeline is relatively low. There are also commercial offerings.

    When design tokens might not be necessary

    1. Small projects: For smaller projects with limited scope and minimal design complexity, the overhead of managing design tokens might not be worth the effort.
    2. No issue with design changes: If the speed of design changes, consistency and collaboration between design and engineering are not an issue, then you might also not need design tokens.

    Acknowledgments

    Thanks to Berni Ruoff—I don't think I would have written this article without all the great discussions we had about design systems and design tokens (and for giving feedback on the first draft). Thanks to Shawn Lukas, Jeen Suratriyanont, Mansab Uppal and of course Martin for all the feedback on the subsequent drafts.

    Growing the development forest - with Martin Fowler

    Mike's Notes

    This interview with Martin Fowler was in a recent Refactoring Newsletter.

    Resources

    References

    • Reference

    Repository

    • Home > Ajabbi Research > Library > Subscriptions > Refactoring
    • Home > Handbook > 

    Last Updated

    17/05/2025

    Growing the development forest - with Martin Fowler

    By: Luca Rossi
    Refactoring: 24/01/2024

    Martin is chief scientist at ThoughtWorks. He is one of the original signatories of the Agile Manifesto and author of several legendary books, among which is Refactoring, which shares its name with this podcast and this newsletter.

    With Martin, we talked about the impact of AI on software development, from the development process, to how human learning and understanding change, up to the future of software engineering jobs.

    Then we explored the technical debt metaphor, why it has been so successful, and Martin's own advice on dealing with it. And finally, we talked about the state of Agile, the resistance that still exists today towards many Agile practices and how to measure engineering effectiveness.

    (03:29) Introduction
    (05:20) Development cycle with AI
    (08:36) Less control and reduced learning
    (13:11) Splitting task between Human and AI
    (14:48) The skills shift
    (20:17) Betting on new technologies
    (27:22) Martin's Refactoring and technical debt
    (29:24) Accumulating "cruft"
    (33:14) Dealing with "cruft"
    (37:24) The financial value of refactoring
    (42:04) Measuring performances
    (46:19) Why the "forest" didn't spread
    (56:11) Make the forest appealing

    Show notes / useful links: