ECLASS Release 16.0

Mike's Notes

This is used by many manufacturers, including Rittal. This standard will be used by Pipi 9 for some of its internal models.

"ECLASS (formerly styled as eCl@ss) is a data standard for the classification of products and services using standardized ISO-compliant properties. The ECLASS Standard enables the digital exchange of product master data across industries, countries, languages or organizations. Its use as a standardized basis for a product group structure or with product-describing properties of master data is particularly widespread in ERP systems.

As an ISO-compliant and the world's only property-based classification standard, ECLASS also serves as a "language" for Industry 4.0 (IOTS)." 

- Wikipedia 

Resources

References

  • ECLASS

Repository

  • Home > Ajabbi Research > Library > Standards > ECLASS
  • Home > Handbook > 

Last Updated

29/11/2025

ECLASS Release 16.0

By: 
ECLASS Newsletter: 28/11/2025

License for ECLASS Release 16.0 in all available languages and all export formats. ECLASS 16.0 was released on Nov 27, 2025.

Product description

You purchase a license for ECLASS Release 16.0 in all available languages in the export formats BASIC (csv and XML) and ADVANCED (XML). Most language versions are fully translated; in partially translated versions, missing content is filled in with the English original. Twelve language versions are included by default. All other language versions available from ECLASS can be ordered via our contact form.

ECLASS Release 16.0 is an enhancement and modification of ECLASS Release 15.0.

Technical innovations in ECLASS 16.0

No changes in the data model or the XML schema.

New content

Compared to Release 15.0, ECLASS 16.0 comprises:

  • 995 new classes (CC, AS, BL, AC), of which 137 are new classification classes (CCs) 
  • 1,253 new properties 
  • 985 new values 
  • 126 new value lists 

All available downloads at the ECLASS Shop generally contain a complete ECLASS version in csv format or XML format (for initial implementation). For information on the data structure, please refer to the enclosed README file.

The content has been extensively expanded, particularly in the following segments: 

  • Segment 23 "Machine element, fastener, fixing, mounting" 
    • New commodity classes in 23-30 "Linear motion technology, Rotary systems", e.g. 
      • 23-30-17-00 "Linear motion module, Linear motion axis"
      • 23-30-18-00 "Electromechanical cylinder"
      • 23-30-24-00 "Electromechanical rotary actuator"
    • And new commodity classes on third level, e.g. 
      • 23-01-04-00 "Hand wheel / Crank handle (machine handle)"
      • 23-01-06-00 "Quick release fastener"
      • 23-01-07-00 "Leveling feet, leveling mount"
  • Segment 27 "Electric engineering, automation, process control engineering" 
    • New classes and enhancement of these with properties in 27-38-03-00 "Gripper (electric)", e.g.
      • 27-38-03-01 "Parallel gripper (electric)"
      • 27-38-03-02 "Angular gripper (electric)"
      • 27-38-03-03 "Rotary gripper (electric)"
    • New classes due to the ETIM Harmonisation, e.g. 
      • 27-21-07-17 "Immersion thermostat"
      • 27-21-07-18 "Fancoil thermostat"
      • 27-21-07-19 "Zone controller"
  • Segment 36 "Machine, apparatus" 
    • 36-64-14-00 "Machine Vision System"
  • Segment 50 "Interior furnishing", e.g. 
    • 50-11-09-25 "Rolling pin"
  • Segment 51 "Fluid power" 
    • New main group 51-58 "Electronics and software (hydraulics)", with new classification classes, e.g. 
      • 51-58-01-01 "Valve amplifier with feedback (hydraulics)"
      • 51-58-01-02 "Valve amplifier without feedback (hydraulics)" 
  • Extension of the "Material Declaration" aspect, which is attached to all classes
    • Includes information on critical ingredients and has been added for all classification classes (except classes describing services)
    • Enhancement with legal or regulatory requirement to disclose the specific components, compounds, or chemical substances that make up a manufactured item—ensuring transparency for safety, environmental compliance, or consumer awareness (to be found in Block 0173-1#01-AKA421#001)
  • Around 23,300 definitions of classification classes were generated using ChatGPT. If a definition has been generated by AI, this is indicated in the attribute "source of definition" with "machine generated by GPT-4o mini".

The product

With Release 7.0, an improved structure according to the underlying ISO 13584 data model and an ISO-standard XML format (OntoML) were introduced. 

Starting with ECLASS Release 7.0, there are two different versions of ECLASS, which contain the same classification classes but differ in the property- and value-based product description. Since Release 13.0, the ECLASS standard has included extensions for the Asset Administration Shell (AAS): in the data model, a new application class of type "Asset" has been introduced alongside BASIC and ADVANCED.

BASIC (in csv or XML format)

The BASIC version contains only the content that can be represented in csv, which was the only export format before Release 7.0. It therefore contains neither the property block structures nor the dynamic elements of the ADVANCED version. BASIC does contain all classes of ADVANCED, but the property- and value-based product description is structured much more simply. BASIC is therefore a subset of ADVANCED and only includes properties flagged as "Basic relevant".

ADVANCED (only in XML format)

The ADVANCED version is the leading version in the database and is built on the ISO 13584 data model. It contains all structural elements of the ECLASS classification system, including property blocks and dynamic elements such as reference properties, polymorphism, and cardinality blocks. A description of these extended structural components, as well as additional information, can be found under ADVANCED Version in our Technical Support.

Note: Each classification class refers to an ADVANCED, a BASIC, and an ASSET application class (AC), which contain the product description with the help of properties and values. The ADVANCED version contains the complete content of ECLASS and is therefore an extension of the BASIC version.

In our Technical Support you will find an overview (matrix of functionalities) that lets you compare the two versions (BASIC and ADVANCED) and their capabilities. A provided reference table (XML file: Mapping BASIC_ADVANCED_ADVANCED) enables users of the ADVANCED version to exchange technical data with users of the BASIC version. The necessary know-how must be provided by the user of the ADVANCED version.

ASSET (only in XML format)

In Release 13.0, Asset Application Classes were created for all classes on the fourth level in segments 17, 18, 19, 21, 23, 27, 28, 32, 33, 36, 49, 50 and 51. In addition, the so-called "submodel templates" for the AAS were created as aspects at the content level and added to the Asset AC in these classes. Alongside the BASIC and ADVANCED representations, ECLASS e.V. publishes a derivation of the type "Asset", which contains all Asset Application Classes including the submodel template aspects.

ECLASS 16.0 (Asset) is free of charge for all users; no order is required. You can activate the product yourself:  

After entering the discount code "asset-158" under "My profile", ECLASS 16.0 (Asset) is immediately available to all registered ECLASS users free of charge in the personal download area.

Translations

For release 15.0, 14 additional languages have been added, meaning that ECLASS now offers a total of 31 languages in version 15.0. This means that ECLASS now has almost all official languages of the European Union in its portfolio and also covers many other international languages. The language content of the new language versions has been translated using the automatic translation tool DeepL. For the first time, complete translations of the content (‘preferred names’) can be provided in 29 of the 31 languages available in ECLASS.

Content

ECLASS 16.0 BASIC (csv)

  • classes file (complete)
  • properties file (contains properties marked as "Basic relevant")
  • keywords and synonyms file
  • values file (complete)
  • unit file (complete)
  • class-property relations file
  • value lists (restricted property-value relations file, only BOOLEAN properties)
  • proposal lists (suggested class-property-value relations file incl. constraints)
  • README file ECLASS standard

ECLASS 16.0 BASIC (XML) or ADVANCED (XML)

  • Dictionaries: 39 XML files, i.e. one XML file per segment containing the complete content of that segment (classes, keywords, properties, synonyms, values, value lists, units and all relations) (complete) 
  • Templates: Since Release 10.0.1, the "Templates" folder known from previous Releases does not contain any content. The reason is the future handling of templates: only "default" templates with substantial content will be delivered and published. This may change in future Releases. Templates contain a Data Requirement Statement for data exchange, in which, for example, sequences or optional and mandatory fields between the data transmitter and the receiver can be defined. You can find more information in our Technical Support under Templates. 
  • ECLASS units ("UnitsML" in XML format) 
  • Only ADVANCED: the mapping file BASIC - ADVANCED is additionally included 
  • README-file 

In the following, you will find the structure of the XML files. The placeholder "xy" stands for the different language codes.

Structure of ECLASS 16.0 BASIC (XML)


Structure of ECLASS 16.0 ADVANCED (XML)

Building a Resilient Data Platform with Write-Ahead Log at Netflix

Mike's Notes

Lots of useful ideas and examples here about using write-ahead logs when scaling big.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

29/11/2025

Building a Resilient Data Platform with Write-Ahead Log at Netflix

By: Prudhviraj Karumanchi, Samuel Fu, Sriram Rangarajan, Vidhya Arvind, Yun Wang, John Lu
Netflix TechBlog: 26/09/2025

Introduction

Netflix operates at a massive scale, serving hundreds of millions of users with diverse content and features. Behind the scenes, ensuring data consistency, reliability, and efficient operations across various services presents a continuous challenge. At the heart of many critical functions lies the concept of a Write-Ahead Log (WAL) abstraction. At Netflix scale, every challenge gets amplified. Some of the key challenges we encountered include:

  • Accidental data loss and data corruption in databases
  • System entropy across different datastores (e.g., writing to Cassandra and Elasticsearch)
  • Handling updates to multiple partitions (e.g., building secondary indices on top of a NoSQL database)
  • Data replication (in-region and across regions)
  • Reliable retry mechanisms for real-time data pipelines at scale
  • Bulk deletes to databases causing OOMs on the Key-Value nodes

All the above challenges either resulted in production incidents or outages, consumed significant engineering resources, or led to bespoke solutions and technical debt. During one particular incident, a developer issued an ALTER TABLE command that led to data corruption. Fortunately, the data was fronted by a cache, so the ability to extend cache TTL quickly together with the app writing the mutations to Kafka allowed us to recover. Absent the resilience features on the application, there would have been permanent data loss. As the data platform team, we needed to provide resilience and guarantees to protect not just this application, but all the critical applications we have at Netflix.

Regarding the retry mechanisms for real time data pipelines, Netflix operates at a massive scale where failures (network errors, downstream service outages, etc.) are inevitable. We needed a reliable and scalable way to retry failed messages, without sacrificing throughput.

With these problems in mind, we decided to build a system that would solve all the aforementioned issues and continue to serve the future needs of Netflix in the online data platform space. Our Write-Ahead Log (WAL) is a distributed system that captures data changes, provides strong durability guarantees, and reliably delivers these changes to downstream consumers. This blog post dives into how Netflix is building a generic WAL solution to address common data challenges, enhance developer efficiency, and power high-leverage capabilities like secondary indices, cross-region replication for non-replicated storage engines, and widely used patterns like delayed queues.

API

Our API is intentionally simple, exposing just the essential parameters. WAL has one main API endpoint, WriteToLog, abstracting away the internal implementation and ensuring that users can onboard easily.

rpc WriteToLog (WriteToLogRequest) returns (WriteToLogResponse) {...}
/**
  * WAL request message
  * namespace: Identifier for a particular WAL
  * lifecycle: How much delay to set and original write time 
  * payload: Payload of the message
  * target: Details of where to send the payload 
  */
message WriteToLogRequest {
  string namespace = 1;
  Lifecycle lifecycle = 2;
  bytes payload = 3;
  Target target = 4;
}
/**
  * WAL response message
  * durable: Whether the request succeeded, failed, or unknown
  * message: Reason for failure
  */
message WriteToLogResponse {
  Trilean durable = 1;
  string message = 2;
}

A namespace defines where and how data is stored, providing logical separation while abstracting the underlying storage systems. Each namespace can be configured to use different queues: Kafka, SQS, or combinations of multiple queues. The namespace also serves as a central configuration point for settings such as the backoff multiplier or the maximum number of retry attempts. This flexibility allows our Data Platform to route different use cases to the most suitable storage system based on performance, durability, and consistency needs.
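As a rough sketch of the idea (class names and fields here are hypothetical, not Netflix's actual control-plane API), namespace resolution could look like this: each request names a namespace, and WAL looks up the queue type and retry settings to apply.

```python
from dataclasses import dataclass

# Hypothetical sketch of namespace resolution; the real WAL control plane
# stores namespace configurations in a globally replicated SQL database.
@dataclass
class NamespaceConfig:
    queue_type: str             # e.g. "KAFKA" or "SQS"
    backoff_multiplier: float = 2.0
    max_retry_attempts: int = 5

class NamespaceRegistry:
    def __init__(self) -> None:
        self._configs: dict[str, NamespaceConfig] = {}

    def register(self, name: str, config: NamespaceConfig) -> None:
        self._configs[name] = config

    def resolve(self, name: str) -> NamespaceConfig:
        # Every WriteToLog request carries a namespace; WAL applies that
        # namespace's queue and retry settings to the request.
        if name not in self._configs:
            raise KeyError(f"unknown WAL namespace: {name}")
        return self._configs[name]

registry = NamespaceRegistry()
registry.register("pds", NamespaceConfig(queue_type="SQS"))
registry.register("evcache_foobar", NamespaceConfig(queue_type="KAFKA", max_retry_attempts=10))
```

Because routing is looked up per request, swapping a namespace's backing queue is a configuration change rather than a client code change.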

WAL can assume different personas depending on the namespace configuration.

Persona #1 (Delayed Queues)

In the example configuration below, the Product Data Systems (PDS) namespace uses SQS as the underlying message queue, enabling delayed messages. PDS uses Kafka extensively, and failures (network errors, downstream service outages, etc.) are inevitable. We needed a reliable and scalable way to retry failed messages, without sacrificing throughput. That’s when PDS started leveraging WAL for delayed messages.

"persistenceConfigurations": {
  "persistenceConfiguration": [
  {
    "physicalStorage": {
      "type": "SQS",
    },
    "config": {
      "wal-queue": [
        "dgwwal-dq-pds"
      ],
      "wal-dlq-queue": [
        "dgwwal-dlq-pds"
      ],
      "queue.poll-interval.secs": 10,
      "queue.max-messages-per-poll": 100
    }
  }
  ]
}

Persona #2 (Generic Cross-Region Replication)

Below is the namespace configuration for cross-region replication of EVCache using WAL, which replicates messages from a source region to multiple destinations. It uses Kafka under the hood.

"persistence_configurations": {
  "persistence_configuration": [
  {
    "physical_storage": {
      "type": "KAFKA"
    },
    "config": {
      "consumer_stack": "consumer",
      "context": "This is for cross region replication for evcache_foobar",
      "target": {
        "euwest1": "dgwwal.foobar.cluster.eu-west-1.netflix.net",
        "type": "evc-replication",
        "useast1": "dgwwal.foobar.cluster.us-east-1.netflix.net",
        "useast2": "dgwwal.foobar.cluster.us-east-2.netflix.net",
        "uswest2": "dgwwal.foobar.cluster.us-west-2.netflix.net"
      },
      "wal-kafka-dlq-topics": [],
      "wal-kafka-topics": [
        "evcache_foobar"
      ],
      "wal.kafka.bootstrap.servers.prefix": "kafka-foobar"
    }
  }
  ]
}

Persona #3 (Handling multi-partition mutations)

Below is the namespace configuration for supporting the mutateItems API in Key-Value, where multiple write requests can go to different partitions and have to be eventually consistent. A key detail in the configuration below is the presence of both Kafka and durable_storage; these data stores are required to facilitate two-phase-commit semantics, which we discuss in detail below.

"persistence_configurations": {
  "persistence_configuration": [
  {
    "physical_storage": {
      "type": "KAFKA"
    },
    "config": {
      "consumer_stack": "consumer",
      "contacts": "unknown",
      "context": "WAL to support multi-id/namespace mutations for dgwkv.foobar",
      "durable_storage": {
        "namespace": "foobar_wal_type",
        "shard": "walfoobar",
        "type": "kv"
      },
      "target": {},
      "wal-kafka-dlq-topics": [
        "foobar_kv_multi_id-dlq"
      ],
      "wal-kafka-topics": [
        "foobar_kv_multi_id"
      ],
      "wal.kafka.bootstrap.servers.prefix": "kaas_kafka-dgwwal_foobar7102"
    }
  }
  ]
}

An important note: requests to WAL have at-least-once semantics due to the underlying implementation.

Under the Hood

The core architecture consists of several key components working together.

Message Producer and Message Consumer separation: The message producer receives incoming messages from client applications and adds them to the queue, while the message consumer processes messages from the queue and sends them to the targets. Because of this separation, other systems can bring their own pluggable producers or consumers, depending on their use cases. WAL’s control plane supports this pluggable model, letting us switch between different message queues as the use case demands.

SQS and Kafka with a dead letter queue by default: Every WAL namespace has its own message queue and gets a dead letter queue (DLQ) by default, because errors can be either transient or hard. Application teams using the Key-Value abstraction simply need to toggle a flag to enable WAL and get all this functionality without needing to understand the underlying complexity.

  • Kafka-backed namespaces: handle standard message processing
  • SQS-backed namespaces: support delayed queue semantics (we added custom logic to go beyond the standard defaults enforced in terms of delay, size limits, etc)
  • Complex multi-partition scenarios: use queues and durable storage
  • Target Flexibility: The messages added to WAL are pushed to the target datastores. Targets can be Cassandra databases, Memcached caches, Kafka queues, or upstream applications. Users can specify the target via namespace configuration and in the API itself.
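The retry/DLQ behavior described above can be sketched in a few lines. This is an illustration, not Netflix's actual implementation; the parameters and return shape are invented for the example. Transient failures are retried with exponential backoff, and a message that exhausts its attempts is routed to the DLQ.

```python
# Sketch of the retry/DLQ flow: retry with exponential backoff, then
# give up and hand the message to the dead letter queue.
def process_with_retries(message, send, max_attempts=3, base_delay_s=1.0, multiplier=2.0):
    """Return ("delivered", delays) or ("dlq", delays) for a message."""
    delays = []
    for attempt in range(max_attempts):
        if send(message):
            return "delivered", delays
        # Backoff grows geometrically per attempt (jitter omitted for clarity).
        delays.append(base_delay_s * multiplier ** attempt)
    return "dlq", delays

# Usage: a target that fails twice before succeeding on the third attempt.
calls = {"n": 0}
def flaky_send(msg):
    calls["n"] += 1
    return calls["n"] >= 3

outcome, delays = process_with_retries({"payload": b"x"}, flaky_send)
```

A real consumer would sleep (or re-enqueue with a delay) between attempts; here the computed delays are just returned so the backoff schedule is visible.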


Architecture of WAL

Deployment Model

WAL is deployed using the Data Gateway infrastructure. This means that WAL deployments automatically come with mTLS, connection management, authentication, runtime and deployment configurations out of the box.

Each data gateway abstraction (including WAL) is deployed as a shard. A shard is a physical concept describing a group of hardware instances. Each use case of WAL is usually deployed as a separate shard. For example, the Ads Events service will send requests to WAL shard A, while the Gaming Catalog service will send requests to WAL shard B, allowing for separation of concerns and avoiding noisy neighbor problems.

Each shard of WAL can have multiple namespaces. A namespace is a logical concept describing a configuration. Each request to WAL has to specify its namespace so that WAL can apply the correct configuration to the request. Each namespace has its own configuration of queues to ensure isolation per use case. If the underlying queue of a WAL namespace becomes the bottleneck of throughput, the operators can choose to add more queues on the fly by modifying the namespace configurations. The concept of shards and namespaces is shared across all Data Gateway Abstractions, including Key-Value, Counter, Timeseries, etc. The namespace configurations are stored in a globally replicated Relational SQL database to ensure availability and consistency.


Deployment model of WAL

Based on certain CPU and network thresholds, the Producer group and the Consumer group of each shard will (separately) automatically scale up the number of instances to ensure the service has low latency, high throughput and high availability. WAL, along with other abstractions, also uses the Netflix adaptive load shedding libraries and Envoy to automatically shed requests beyond a certain limit. WAL can be deployed to multiple regions, so each region will deploy its own group of instances.

Solving different flavors of problems with no change to the core architecture

The WAL addresses multiple data reliability challenges with no changes to the core architecture:

  • Data Loss Prevention: In case of database downtime, WAL can continue to hold the incoming mutations. When the database becomes available again, WAL replays the mutations back to it. The tradeoff is eventual rather than immediate consistency, with no data loss.
  • Generic Data Replication: For systems like EVCache (using Memcached) and RocksDB that do not support replication by default, WAL provides systematic replication (both in-region and across-region). The target can be another application, another WAL, or another queue — it’s completely pluggable through configuration.
  • System Entropy and Multi-Partition Solutions: Whether dealing with writes across two databases (like Cassandra and Elasticsearch) or mutations across multiple partitions in one database, the solution is the same — write to WAL first, then let the WAL consumer handle the mutations. No more asynchronous repairs needed; WAL handles retries and backoff automatically.
  • Data Corruption Recovery: In case of DB corruptions, restore to the last known good backup, then replay mutations from WAL omitting the offending write/mutation.
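The corruption-recovery flow in the last bullet can be illustrated with a toy replay. The record shapes here are hypothetical, not the WAL wire format: restore the last good backup, then re-apply every logged mutation except the ones identified as offending.

```python
# Illustrative sketch of corruption recovery: restore a backup, then
# replay WAL entries while omitting the offending write/mutation.
def recover(backup: dict, wal_entries: list, bad_entry_ids: set) -> dict:
    db = dict(backup)  # restore to the last known good state
    for entry in wal_entries:
        if entry["id"] in bad_entry_ids:
            continue  # skip the mutation that caused the corruption
        if entry["op"] == "put":
            db[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            db.pop(entry["key"], None)
    return db

backup = {"a": 1}
wal = [
    {"id": 1, "op": "put", "key": "b", "value": 2},
    {"id": 2, "op": "put", "key": "a", "value": "corrupt"},  # the bad write
    {"id": 3, "op": "delete", "key": "b", "value": None},
]
restored = recover(backup, wal, bad_entry_ids={2})
```

The point is that the WAL, not the database, is the replayable source of truth, so a bad mutation can be surgically excluded during recovery.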

There are some major differences between using WAL and directly using Kafka/SQS. WAL is an abstraction on the underlying queues, so the underlying technology can be swapped out depending on use cases with no code changes. WAL emphasizes an easy yet effective API that saves users from complicated setups and configurations. We leverage the control plane to pivot technologies behind WAL when needed without app or client intervention.

WAL usage at Netflix

Delay Queue

The most common use case for WAL is as a Delay Queue. If an application wants a request to be sent at a certain time in the future, it can offload the request to WAL, which guarantees that it will land after the specified delay.

Netflix’s Live Origin processes and delivers Netflix live stream video chunks, storing its video data in a Key-Value abstraction backed by Cassandra and EVCache. When Live Origin decides to delete certain video data after an event is completed, it issues delete requests to the Key-Value abstraction. However, a large burst of delete requests interferes with the more important real-time read/write requests, causing performance issues in Cassandra and timeouts for the incoming live traffic. To get around this, Key-Value issues the delete requests to WAL first, with a random delay and jitter set for each delete request. WAL, after the delay, sends the delete requests back to Key-Value. Since the deletes now form a flatter curve of requests over time, Key-Value is able to send them to the datastore with no issues.

Requests being spread out over time through delayed requests
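A minimal sketch of the jittered-delay idea, with illustrative numbers: spreading 1,000 deletes uniformly over a ten-minute window keeps the per-second rate far below the original burst.

```python
import random

# Sketch: flatten a burst of deletes by assigning each request a random
# delay within a window (window size and request count are illustrative).
def schedule_with_jitter(requests, window_s, rng):
    """Assign each request a delay drawn uniformly over window_s seconds."""
    return [(req, rng.uniform(0, window_s)) for req in requests]

rng = random.Random(42)  # seeded so the example is reproducible
burst = [f"delete-{i}" for i in range(1000)]
scheduled = schedule_with_jitter(burst, window_s=600, rng=rng)

# Count how many deletes land in each one-second bucket: instead of
# 1000 requests in one instant, each second sees only a handful.
buckets = {}
for _, delay in scheduled:
    buckets[int(delay)] = buckets.get(int(delay), 0) + 1
peak = max(buckets.values())
```

The datastore now sees a peak of a few requests per second instead of a single 1,000-request spike.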

Additionally, WAL is used by many services that utilize Kafka to stream events, including Ads, Gaming, Product Data Systems, etc. Whenever Kafka requests fail for any reason, the client apps will send WAL a request to retry the Kafka request with a delay. This abstracts away the backoff and retry layer of Kafka for many teams, increasing developer efficiency.

Backoff and delayed retries for clients producing to Kafka


Backoff and delayed retries for clients consuming from Kafka

Cross-Region Replication

WAL is also used for global cross-region replication. The architecture of WAL is generic and allows any datastore or application to onboard for cross-region replication. Currently, the largest use case is EVCache, and we are working to onboard other storage engines.

EVCache is deployed as clusters of Memcached instances across multiple regions, where each cluster in each region shares the same data. Each region’s client apps will write, read, or delete data from the EVCache cluster of the same region. To ensure global consistency, the EVCache client of one region will replicate write and delete requests to all other regions. To implement this, the EVCache client that originated the request will send the request to a WAL corresponding to the EVCache cluster and region.

Since the EVCache client acts as the message producer group in this case, WAL only needs to deploy the message consumer groups. From there, the multiple message consumers are set up to each target region. They will read from the Kafka topic, and send the replicated write or delete requests to a Writer group in their target region. The Writer group will then go ahead and replicate the request to the EVCache server in the same region.


EVCache Global Cross-Region Replication Implemented through WAL
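Sticking with the Persona #2 namespace configuration shown earlier, the fan-out of replication targets can be sketched as follows. The filtering logic is an illustration, not the actual consumer code: the "target" map mixes region keys with a "type" entry, and the origin region is excluded because the write already happened there.

```python
# Sketch: derive the Writer endpoints to replicate to from a namespace
# configuration's "target" map, given the region that originated the write.
def replication_targets(target_config: dict, origin_region: str) -> dict:
    return {
        region: endpoint
        for region, endpoint in target_config.items()
        if region != "type" and region != origin_region
    }

# Target map taken from the cross-region replication example above.
target = {
    "type": "evc-replication",
    "euwest1": "dgwwal.foobar.cluster.eu-west-1.netflix.net",
    "useast1": "dgwwal.foobar.cluster.us-east-1.netflix.net",
    "useast2": "dgwwal.foobar.cluster.us-east-2.netflix.net",
    "uswest2": "dgwwal.foobar.cluster.us-west-2.netflix.net",
}
fanout = replication_targets(target, origin_region="useast1")
```

A message consumer per target region would then forward each replicated write or delete to the Writer group at its endpoint.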

The biggest benefit of this approach, compared to our legacy architecture, is the ability to migrate from a multi-tenant to a single-tenant architecture for the most latency-sensitive applications. For example, Live Origin has its own dedicated Message Consumer and Writer groups, while less latency-sensitive services can remain multi-tenant. This reduces the blast radius of issues and prevents noisy neighbor problems.

Multi-Table Mutations

WAL is used by Key-Value service to build the MutateItems API. WAL enables the API’s multi-table and multi-id mutations by implementing 2-phase commit semantics under the hood. For this discussion, we can assume that Key-Value service is backed by Cassandra, and each of its namespaces represents a certain table in a Cassandra DB.

When a Key-Value client issues a MutateItems request to Key-Value server, the request can contain multiple PutItems or DeleteItems requests. Each of those requests can go to different ids and namespaces, or Cassandra tables.

message MutateItemsRequest {
  repeated MutationRequest mutations = 1;
  message MutationRequest {
    oneof mutation {
      PutItemsRequest put = 1;
      DeleteItemsRequest delete = 2;
    }
  }
}

The MutateItems request operates on an eventually consistent model. When the Key-Value server returns a success response, it guarantees that every operation within the MutateItemsRequest will eventually complete successfully. Individual put or delete operations may be partitioned into smaller chunks based on request size, meaning a single operation could spawn multiple chunk requests that must be processed in a specific sequence.

Two approaches exist to ensure Key-Value client requests achieve success. The synchronous approach involves client-side retries until all mutations complete. However, this method introduces significant challenges: datastores might not natively support transactions, so there is no guarantee that the entire request succeeds. Additionally, when more than one replica set is involved in a request, latency can spike unexpectedly, and the entire request chain must be retried. Partial failures in synchronous processing can also leave the database in an inconsistent state if some mutations succeed while others fail, requiring complex rollback mechanisms or compromising data integrity. The asynchronous approach was ultimately adopted to address these performance and consistency concerns.

Given Key-Value’s stateless architecture, the service cannot maintain the mutation success state or guarantee order internally. Instead, it leverages a Write-Ahead Log (WAL) to guarantee mutation completion. For each MutateItems request, Key-Value forwards individual put or delete operations to WAL as they arrive, with each operation tagged with a sequence number to preserve ordering. After transmitting all mutations, Key-Value sends a completion marker indicating the full request has been submitted.

The WAL producer receives these messages and persists the content, state, and ordering information to a durable storage. The message producer then forwards only the completion marker to the message queue. The message consumer retrieves these markers from the queue and reconstructs the complete mutation set by reading the stored state and content data, ordering operations according to their designated sequence. Failed mutations trigger re-queuing of the completion marker for subsequent retry attempts.

Architecture of Multi-Table Mutations through WAL


Sequence diagram for Multi-Table Mutations through WAL
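The producer/consumer handshake described above can be simulated in a few lines. This is an in-memory toy under stated assumptions: the real system uses Kafka plus a durable Key-Value store, and the record shapes here are invented for the illustration. The producer persists every mutation with its sequence number but forwards only the completion marker; the consumer reconstructs the ordered mutation set from durable storage when it sees the marker.

```python
# Minimal in-memory sketch of the completion-marker flow.
durable_storage = {}   # request_id -> list of (seq, mutation)
queue = []             # carries only completion markers

def wal_produce(request_id, mutations):
    # Phase 1: persist content, state and ordering to durable storage.
    durable_storage[request_id] = [(seq, m) for seq, m in enumerate(mutations)]
    # Phase 2: forward only the completion marker to the message queue.
    queue.append({"marker": request_id})

def wal_consume(apply):
    marker = queue.pop(0)
    request_id = marker["marker"]
    # Reconstruct the full mutation set in its designated sequence order.
    ordered = [m for _, m in sorted(durable_storage[request_id])]
    for mutation in ordered:
        apply(mutation)
    return ordered

applied = []
wal_produce("req-1", ["put a=1", "put b=2", "delete c"])
order = wal_consume(applied.append)
```

In the real system a failed mutation would re-queue the completion marker for a later retry; the sketch only shows the happy path to make the two phases visible.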

Closing Thoughts

Building Netflix’s generic Write-Ahead Log system has taught us several key lessons that guided our design decisions:

  • Pluggable Architecture is Core: The ability to support different targets, whether databases, caches, queues, or upstream applications, through configuration rather than code changes has been fundamental to WAL’s success across diverse use cases.
  • Leverage Existing Building Blocks: We had control plane infrastructure, Key-Value abstractions, and other components already in place. Building on top of these existing abstractions allowed us to focus on the unique challenges WAL needed to solve.
  • Separation of Concerns Enables Scale: By separating message processing from consumption and allowing independent scaling of each component, we can handle traffic surges and failures more gracefully.
  • Systems Fail — Consider Tradeoffs Carefully: WAL itself has failure modes, including traffic surges, slow consumers, and non-transient errors. We use abstractions and operational strategies like data partitioning and backpressure signals to handle these, but the tradeoffs must be understood.

Future work

  • We are planning to add secondary indices in Key-Value service leveraging WAL.
  • WAL can also be used by a service to guarantee delivery of requests to multiple datastores, for example to a database and a backup, or to a database and a queue, at the same time.

Acknowledgements

Launching WAL was a collaborative effort involving multiple teams at Netflix, and we are grateful to everyone who contributed to making this idea a reality. We would like to thank the following teams for their roles in this launch.

  • Caching team — Additional thanks to Shih-Hao Yeh, Akashdeep Goel for contributing to cross region replication for KV, EVCache etc. and owning this service.
  • Product Data System team — Carlos Matias Herrero, Brandon Bremen for contributing to the delay queue design and being early adopters of WAL giving valuable feedback.
  • KeyValue and Composite abstractions team — Raj Ummadisetty for feedback on API design and mutateItems design discussions. Rajiv Shringi for feedback on API design.
  • Kafka and Real Time Data Infrastructure teams — Nick Mahilani for feedback and inputs on integrating the WAL client into Kafka client. Sundaram Ananthanarayan for design discussions around the possibility of leveraging Flink for some of the WAL use cases.
  • Joseph Lynch for providing strategic direction and organizational support for this project.

Micro-Frontends: A Sociotechnical Journey Toward a Modern Frontend Architecture

Mike's Notes

A good way to think of this. It's added clarity to Pipi UI's architecture, so I need to make a tiny change to the CMS Engine, etc. Thanks, Luca.

The CMS Engine datamodel > page_section needed an additional column for the workflow process run by a specific team, agent or system.

30 minutes thinking, 1 minute coding. Done 😁

Now

Page > Module > Component

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > InfoQ Weekly Digest
  • Home > Handbook > 

Last Updated

28/11/2025

Micro-Frontends: A Sociotechnical Journey Toward a Modern Frontend Architecture

By: Luca Mezzalira
InfoQ: 24/11/2025

Principal Serverless Specialist Solutions Architect @AWS.

Luca Mezzalira is a Principal Solutions Architect and the author of Building Micro-Frontends. He helps organizations design evolutionary architectures and leads modernization programs for global engineering teams. Luca is also the host of the Micro-Frontends in the Trenches podcast, where he explores the future of distributed frontend systems, digging deeper into heuristics, pitfalls, and lessons learned from teams that turned this architectural evolution into a competitive advantage. The podcast is available on Spotify, Apple Podcasts, and YouTube.

  • Micro-frontends are different from components. Components are abstraction mechanisms designed for standardisation and reuse, while micro-frontends optimise for autonomy and flow.
  • Micro-frontends are not a technical pattern, but a sociotechnical shift mirroring Conway’s law.
  • The micro-frontends migration is not a binary decision but a continuum. The migration should start where autonomy brings the most value. The architecture needs to be aligned with the team structure.
  • Embrace duplication when it accelerates flow and favours iterative delivery over big rewrites. 
  • The first micro-frontend should go end-to-end: from design and development through deployment and observability. That vertical slice will surface every challenge you’ll face later - routing, shared dependencies, authentication, monitoring - on a scale that’s still manageable. 

For years, distributed systems have defined how we think about backend architecture. We’ve learned to break apart monoliths into independently deployable services, embracing autonomy, faster feedback, and continuous change. But on the frontend, many organisations are still trapped in the same cycle we escaped on the backend: large codebases that slow teams down, coupled deployments that introduce risk, and interfaces so entangled that any change becomes an exercise in fear management.

The rise of micro-frontends is not simply a reaction to this pain; it’s part of a deeper sociotechnical evolution. The same forces that once drove backend modularisation are now reshaping the frontend. As organisations demand faster delivery, greater autonomy, and continuous modernisation, our frontend architectures must evolve in step with our teams.

The distributed frontend era is here, but it’s not defined by new frameworks or fancy tooling. It’s defined by the way we align people, processes, and architecture around a shared goal: delivering value faster without losing control.

In this article, you’ll learn when micro-frontends make sense, how to evolve existing systems safely, and how to handle cross-cutting concerns like routing, state, and user experience without disrupting delivery. The goal is to show how micro-frontends represent not a trend, but a natural sociotechnical evolution.


Multiple teams working together on the same UI leveraging the independent nature of micro-frontends

Rethinking Micro-Frontends

This article focuses on architectural principles rather than any specific framework or vendor solution.

Micro-frontends are often introduced as a technical pattern - a way to break a large frontend into smaller, independently deployable pieces. But that framing misses the point. Micro-frontends are not a new stack; they are a new way of structuring work. They represent a sociotechnical shift - one that mirrors Conway’s Law, which tells us that system design reflects communication structures.

When teams are forced to coordinate through a single release train, decision-making slows. When every change requires syncing across multiple domains, creativity fades. The result is not just technical debt but organisational inertia. Micro-frontends reverse that dynamic. They allow teams to own slices of the product end-to-end - domain, design, delivery - without waiting for centralised approval.

It’s also essential to distinguish micro-frontends from components. Components are a software abstraction mechanism designed to standardise and reuse behaviours or interfaces within a single application or shared ecosystem. They optimise consistency and maintainability. Micro-frontends, instead, maximise independence and flow. They exist to reduce cognitive load, enable teams to make decisions autonomously, and accelerate delivery by removing cross-team dependencies. As a result, micro-frontends operate at a much coarser granularity than components. Where components encapsulate behaviours, micro-frontends encapsulate responsibility, for example, a complete vertical slice of the system owned and operated by a single team.

That autonomy doesn’t come for free. Distributed systems always bring complexity. But when applied thoughtfully, micro-frontends unlock a kind of evolutionary architecture that’s been missing from the client side: the ability to change direction safely and iteratively, at the speed of the business.

When Micro-Frontends Make Sense

Not every system needs to be decomposed. A small product built by a single team can thrive perfectly well with a modular monolith. The signal that you may need micro-frontends is rarely technical - it’s organisational.

If your release cadence slows down as your team grows, if changes in one area routinely break others, if onboarding new engineers feels like navigating an archaeological dig through years of intertwined code - these are the symptoms of a system that has outgrown its structure.

Micro-frontends address this by restoring local autonomy. Each team can ship independently, evolve its stack, and respond to customer needs without waiting for the rest of the organisation to move. The return on investment becomes visible early because, unlike a complete rewrite, migration to micro-frontends is incremental. Each small step delivers value on its own, reducing risk and building confidence.

At one media company I worked with, the shift from a shared frontend to domain-owned micro-frontends reduced coordination effort per release by more than half. Deployment frequency rose tenfold within months. That momentum created the trust needed to continue the migration safely.

But micro-frontends are not a silver bullet. For small teams or products with limited complexity, the overhead might outweigh the benefits. The goal is not to adopt a pattern for its own sake but to solve concrete problems: delivery bottlenecks, scaling limits, and the inability to modernise safely.

Evolving Existing Systems

The hardest part of adopting micro-frontends is not starting fresh; it’s evolving what already exists. Most organisations operate in brownfield environments - large, mature systems that can’t be simply replaced. The challenge is to introduce modularity without breaking everything that works today.

The first principle is to think iteratively. A successful migration doesn’t require shutting down the old world to build the new one. Instead, it uses the old system as scaffolding while gradually introducing new components. Each iteration becomes a slight, measurable improvement - an opportunity to validate assumptions, build new capabilities, and reduce uncertainty.

An iterative approach delivers faster ROI and lowers risk. It allows the business to continue operating while modernisation happens in parallel. It also keeps teams motivated, as each release produces visible progress rather than waiting months for the "big reveal".

From Big Bang to Incremental Change

In backend modernisation, the strangler fig pattern has long been used to gradually replace monolithic systems. The same principle applies beautifully to the frontend. Instead of mixing old and new code in the same pages, route traffic at the edge to decide which version of the application should serve a given request.

By placing routing logic at the CDN or edge layer, you can divert specific paths - say, /checkout or /dashboard - to a new micro-frontend while leaving the rest of the site untouched. If something goes wrong, rollback is instant: just change a routing rule. There’s no need to redeploy or revert code.
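The path-based routing described above can be reduced to a single pure function at the edge. The sketch below is a minimal illustration, not a specific CDN's API; the hostnames and route table are hypothetical placeholders you would replace with your own:

```typescript
// Hypothetical origins -- substitute your real hosts.
const LEGACY_ORIGIN = "https://legacy.example.com";
const MICRO_FRONTENDS: Record<string, string> = {
  "/checkout": "https://checkout.example.com",
  "/dashboard": "https://dashboard.example.com",
};

// Decide which origin serves a request, based purely on its path.
// Routing stays at the edge; neither codebase knows about the other.
function resolveOrigin(pathname: string): string {
  for (const [prefix, origin] of Object.entries(MICRO_FRONTENDS)) {
    if (pathname === prefix || pathname.startsWith(prefix + "/")) {
      return origin;
    }
  }
  return LEGACY_ORIGIN; // everything else stays on the monolith
}
```

Rolling back a migrated route is then just deleting one entry from the map - no redeployment of either application.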


Micro-frontends iterative migration with edge-compute

This pattern also unlocks powerful release strategies. You can test new experiences with canary deployments, feature flags, or country-based rollouts, and collect honest feedback before a full release. This iterative rhythm builds trust between technical and business stakeholders. Each deployment becomes both a delivery and a learning opportunity.

The key is to resist the temptation to blend old and new UI within the same page. Mixing rendering systems multiplies complexity and breaks the isolation that makes micro-frontends valuable in the first place. Clear, page-level or route-level boundaries keep the migration safe and reversible.

Planning the Migration

Every migration is a balance between impact and safety. The first module you choose sets the tone for everything that follows. It should be meaningful enough to prove value but isolated enough to minimise risk.

In practice, this often means starting with a new feature or a module already scheduled for major refactoring. That way, the migration effort aligns naturally with business goals - you’re not just modernising for its own sake, you’re enabling new capabilities.

Your first micro-frontend should go end-to-end: from design and development through deployment and observability. That vertical slice will surface every challenge you’ll face later - routing, shared dependencies, authentication, monitoring - on a scale that’s still manageable. The lessons learned there will inform every subsequent migration step.

Think of it as your pilot. If it works, it becomes a reusable template; if it doesn’t, you’ve lost little and gained invaluable insight. Treat migration not as a project but as an ongoing evolutionary process. Each success builds momentum. Each mistake refines your heuristics.

Designing for Modularity

When starting a greenfield project, micro-frontend boundaries should align with business domains, not technical layers. Instead of organising code by framework or feature type, design independent product capabilities that map to real user needs - catalogue, checkout, profile, analytics.

This domain-driven alignment enables micro-frontends to scale. Each module becomes a boundary between both code and communication. Teams own their space end to end, choosing their technologies, deployment pipelines, and release rhythms. Over time, this reduces coupling not only between systems but between people.

That autonomy requires a contract of trust. Shared guidelines - like how routing works, how design tokens are managed, or how observability is implemented - create coherence without reintroducing central control. The goal is a federation of teams, not anarchy of frameworks.

Evolutionary architecture is not about predicting the future; it’s about being ready for it. Designing change means optimising reversibility. Every decision - tooling, boundary, dependency - should be easy to revisit. The systems that last are those that can adapt, not those that were perfect on day one.

Handling Cross-Cutting Concerns

As with any distributed system, the most complex parts are the seams: routing, authentication, shared state, and user experience. These are the invisible threads that make a product feel cohesive.

Routing is the backbone of your migration. Centralising it at the edge keeps logic out of your application code and simplifies rollback. Using absolute URLs for navigation between systems ensures clarity and predictability. If something breaks, users never land in an undefined state - they’re simply redirected to the stable version.

In practice, this approach turns the edge into a single source of truth for traffic control during the migration period. Instead of embedding conditional logic inside every frontend, edge functions can decide, in milliseconds, whether a request should go to the legacy monolith or to a specific micro-frontend. This also enables progressive rollout strategies, such as canary releases or blue-green deployments, without touching your frontend code.
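A canary decision at the edge can be made deterministic so the same user always lands on the same variant and never flips between old and new UI mid-session. The helper below is a hedged sketch using a simple rolling hash; a real deployment might hash a session cookie or use the CDN vendor's built-in weighted routing instead:

```typescript
// Deterministic canary check: hash the user ID into a 0-99 bucket,
// so a given user consistently sees the same variant across requests.
function inCanary(userId: string, rolloutPercent: number): boolean {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // 32-bit rolling hash
  }
  return hash % 100 < rolloutPercent;
}
```

Raising `rolloutPercent` gradually widens the cohort served by the new micro-frontend; setting it to 0 is an instant rollback.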

One of the most valuable advantages is the ability to roll back instantly without polluting your legacy codebase or new micro-frontends with temporary routing hacks. If an issue appears, you simply switch the routing rule at the edge, and all traffic flows back to the stable version. There’s no redeployment, no manual intervention inside the applications, and no need to maintain hybrid rendering layers that mix old and new UI code. The separation between systems remains clean and reversible, which is critical for long-running migrations.

Centralised routing also reduces cognitive load for teams and improves platform stability. Developers no longer need to maintain throwaway routing logic in multiple repositories or synchronise URL patterns between systems. It also simplifies observability, since all incoming requests pass through a single control point that can emit consistent metrics and logs.

Authentication can often be handled more simply than teams expect. As long as all micro-frontends share the same subdomain as the legacy app, they can access the same cookies and session data.

Implementing refresh-token logic in both worlds keeps sessions alive without complex cross-app communication.

State management in micro-frontends should remain local to each application to preserve independence and avoid cross-team coupling. When multiple micro-frontends need to communicate, prefer loosely coupled events that broadcast intent and data where necessary over enforcing shared runtime dependencies. This reinforces architectural boundaries while enabling collaboration where required.
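The loosely coupled events mentioned above can be as small as a typed publish/subscribe bus. This is a framework-agnostic sketch (the event name `cart:item-added` is an invented example); in the browser the same idea can be expressed with `CustomEvent` on `window`:

```typescript
// Minimal event bus: each micro-frontend keeps its own state and only
// broadcasts intent, never shared mutable objects.
type Handler = (detail: unknown) => void;

class EventBus {
  private handlers = new Map<string, Set<Handler>>();

  // Returns an unsubscribe function so teams can clean up on unmount.
  subscribe(event: string, handler: Handler): () => void {
    if (!this.handlers.has(event)) this.handlers.set(event, new Set());
    this.handlers.get(event)!.add(handler);
    return () => this.handlers.get(event)?.delete(handler);
  }

  publish(event: string, detail: unknown): void {
    this.handlers.get(event)?.forEach((h) => h(detail));
  }
}
```

A checkout micro-frontend might publish `cart:item-added` while a header mini-cart subscribes; neither imports code from the other, so the architectural boundary holds.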

Some configurations, such as authentication tokens, locale, or feature flags, can be shared across systems using stable mechanisms, such as cookies or local storage, or by injecting a lightweight context into each micro-frontend. The key is to keep this shared layer minimal and predictable.
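As a concrete illustration of that minimal shared layer, reading a shared setting such as the locale from a cookie needs only a few lines. This is a generic sketch (key names are examples, not a prescribed contract):

```typescript
// Read one shared setting from a Cookie header string, with a fallback
// so each micro-frontend stays functional if the value is absent.
function readSharedContext(cookieHeader: string, key: string, fallback: string): string {
  for (const pair of cookieHeader.split(";")) {
    const [k, ...rest] = pair.trim().split("=");
    if (k === key && rest.length) return decodeURIComponent(rest.join("="));
  }
  return fallback;
}
```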

Embracing Duplication (Wisely)

Duplication in distributed systems should never be accidental; it should be intentional. The goal is not to eliminate all repetition but to make conscious decisions about where duplication helps teams move faster and where it introduces unnecessary overhead.

Good heuristics lie at the intersection of complexity and rate of change. For example, a design system evolves quickly during its early phases but stabilises over time. Once mature, its release cycle slows down - nobody changes the design system every day. Similarly, a shared logging library across multiple micro-frontends has low volatility and clear behaviour, so centralising it makes sense. On the other hand, consider a complex video player in a streaming platform. Its heuristics may vary by browser or device to optimise buffering, latency, or startup times. The complexity is high and the rate of change is frequent, so duplication would only multiply maintenance pain without tangible benefits.

Conversely, if you’re dealing with a simple component - say, a basic form field or small utility - that doesn’t change often and requires minimal effort to implement, feel free to duplicate it. Abstraction can always come later once patterns are proven, and the need for consistency is evident.

Every shared abstraction introduces governance, versioning, and ownership responsibilities. Leaving ownership inside the team responsible for a micro-frontend simplifies the process and preserves autonomy. The best abstractions are those that emerge naturally, not those imposed prematurely. Intentional duplication buys you speed and flexibility; thoughtful consolidation gives you long-term coherence. Balancing the two is what makes distributed architectures evolve gracefully.

The Fastest Path to Modernisation

One of the most persistent myths about micro-frontends is that you need microservices to support them. You don’t. The frontend and backend evolve at different speeds, and the frontend almost always moves faster.

Micro-frontends focus on how teams build and deliver interfaces, not on how data is served. As long as you have stable API contracts, your frontend can modernise independently of your backend. You can keep your monolithic APIs while introducing modular frontends on top of them. The stateless nature of frontends makes them ideal for incremental modernisation.

Backend migrations are often lengthy because data has gravity. Schema changes, replication strategies, and legacy dependencies can stretch timelines for months or years. Frontend migrations, by contrast, can deliver visible value in weeks. You can start improving performance, maintainability, and user experience immediately, without waiting for backend modernisation to catch up.

At a retail company I advised, the frontend migration to micro-frontends was completed in roughly 14 months. The backend modernisation took twice as long, but the organisation saw immediate value early on through faster releases and a better user experience, even while the backend was still monolithic.

The frontend can lead the way, serving as a proving ground for distributed practices and a catalyst for broader organisational change.

Modernisation at Human Speed

Modernisation is not a single event; it’s a journey. The temptation to start over is strong, especially when legacy systems feel immovable. But the "big bang" approach rarely succeeds. It freezes business progress, drains morale, and often ends in partial rewrites that never reach production.

Micro-frontends offer a different path, one that aligns with how organisations truly evolve. They let you move at human speed: fast enough to show progress, slow enough to stay safe. They encourage experimentation, continuous learning, and local ownership.

Every migration is a balancing act between ideal architecture and practical delivery. The teams that succeed are those that accept imperfections as part of the process. They know that good architecture is not about purity; it’s about flow.

If there’s a guiding principle to remember, it’s this: evolutionary beats revolutionary. Iterate, learn, and adapt. The systems you build today will change tomorrow, and that’s a feature, not a flaw.

For teams exploring this path, the distributed frontend era is not about frameworks or bundlers. It’s about designing systems and organisations that can evolve continuously, safely, and with purpose.

Conclusion

Micro-frontends are more than a technical pattern; they reflect how modern organisations build software. They embody the shift from centralised control to distributed ownership, from significant releases to continuous flow, from architectural rigidity to evolutionary change.

By approaching migration iteratively - starting small, learning fast, and keeping business goals at the centre - you can modernise your frontend without halting innovation. Whether your backend is still monolithic or already distributed doesn’t matter. What matters is your ability to evolve safely and deliver value continuously.

Architecture, at its best, mirrors the way people collaborate. When teams are empowered, systems follow. Micro-frontends are simply the architectural expression of that truth.

How to write a great agents.md: Lessons from over 2,500 repositories

Mike's Notes

Great working example here of how to do this. Agents.md is supported by multiple agents. Useful for future integration. Thanks, Matt.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

28/11/2025

How to write a great agents.md: Lessons from over 2,500 repositories

By: Matt Nigh
GitHub Blog: 19/11/2025

Program Manager Director, I lead the AI for Everyone program at GitHub.

Learn how to write effective agents.md files for GitHub Copilot with practical tips, real examples, and templates from analyzing 2,500+ repositories.

We recently released a new GitHub Copilot feature: custom agents defined in agents.md files. Instead of one general assistant, you can now build a team of specialists: a @docs-agent for technical writing, a @test-agent for quality assurance, and a @security-agent for security analysis. Each agents.md file acts as an agent persona, which you define with frontmatter and custom instructions.

agents.md is where you define all the specifics: the agent’s persona, the exact tech stack it should know, the project’s file structure, workflows, and the explicit commands it can run. It’s also where you provide code style examples and, most importantly, set clear boundaries of what not to do.

The challenge? Most agent files fail because they’re too vague. “You are a helpful coding assistant” doesn’t work. “You are a test engineer who writes tests for React components, follows these examples, and never modifies source code” does.

I analyzed over 2,500 agents.md files across public repos to understand how developers were using them. The analysis showed a clear pattern of what works: give your agent a specific job or persona, exact commands to run, well-defined boundaries to follow, and clear examples of good output to follow.

Here’s what the successful ones do differently.

What works in practice: Lessons from 2,500+ repos

My analysis of over 2,500 agents.md files revealed a clear divide between the ones that fail and the ones that work. The successful agents aren’t just vague helpers; they are specialists. Here’s what the best-performing files do differently:

  • Put commands early: List relevant executable commands in an early section: npm test, npm run build, pytest -v. Include flags and options, not just tool names. Your agent will reference these often.
  • Code examples over explanations: One real code snippet showing your style beats three paragraphs describing it. Show what good output looks like.
  • Set clear boundaries: Tell AI what it should never touch (e.g., secrets, vendor directories, production configs, or specific folders). “Never commit secrets” was the most common helpful constraint.
  • Be specific about your stack: Say “React 18 with TypeScript, Vite, and Tailwind CSS” not “React project.” Include versions and key dependencies.
  • Cover six core areas: commands, testing, project structure, code style, git workflow, and boundaries. Hitting all six puts you in the top tier.

Example of a great agents.md file

Below is an example of a documentation agent persona you can add to your repo at .github/agents/docs-agent.md:

---
name: docs_agent
description: Expert technical writer for this project
---
You are an expert technical writer for this project.

## Your role
- You are fluent in Markdown and can read TypeScript code
- You write for a developer audience, focusing on clarity and practical examples
- Your task: read code from `src/` and generate or update documentation in `docs/`

## Project knowledge
- **Tech Stack:** React 18, TypeScript, Vite, Tailwind CSS
- **File Structure:**
  - `src/` – Application source code (you READ from here)
  - `docs/` – All documentation (you WRITE to here)
  - `tests/` – Unit, Integration, and Playwright tests

## Commands you can use
Build docs: `npm run docs:build` (checks for broken links)
Lint markdown: `npx markdownlint docs/` (validates your work)

## Documentation practices
Be concise, specific, and value-dense.
Write so that a developer who is new to this codebase can understand your writing; don’t assume your audience are experts in the topic/area you are writing about.

## Boundaries
- ✅ **Always do:** Write new files to `docs/`, follow the style examples, run markdownlint
- ⚠️ **Ask first:** Before modifying existing documents in a major way
- 🚫 **Never do:** Modify code in `src/`, edit config files, commit secrets

Why this agents.md file works well

  • States a clear role: Defines who the agent is (expert technical writer), what skills it has (Markdown, TypeScript), and what it does (read code, write docs).
  • Executable commands: Gives AI tools it can run (npm run docs:build and npx markdownlint docs/). Commands come first.
  • Project knowledge: Specifies tech stack with versions (React 18, TypeScript, Vite, Tailwind CSS) and exact file locations.
  • Real examples: Shows what good output looks like with actual code. No abstract descriptions.
  • Three-tier boundaries: Set clear rules using always do, ask first, never do. Prevents destructive mistakes.

How to build your first agent

Pick one simple task. Don’t build a “general helper.” Pick something specific like:

  • Writing function documentation
  • Adding unit tests
  • Fixing linting errors

Start minimal—you only need three things:

  • Agent name: test-agent, docs-agent, lint-agent
  • Description: “Writes unit tests for TypeScript functions”
  • Persona: “You are a quality software engineer who writes comprehensive tests”

Copilot can also help generate one for you. Using your preferred IDE, open a new file at .github/agents/test-agent.md and use this prompt:

Create a test agent for this repository. It should:

- Have the persona of a QA software engineer.
- Write tests for this codebase
- Run tests and analyze results
- Write to “/tests/” directory only
- Never modify source code or remove failing tests
- Include specific examples of good test structure

Copilot will generate a complete agents.md file with persona, commands, and boundaries based on your codebase. Review it, add the YAML frontmatter, adjust the commands for your project, and you’re ready to use @test-agent.

Six agents worth building

Consider asking Copilot to help generate agents.md files for the agents below. I’ve included examples with each of the agents, which should be changed to match the reality of your project. 

@docs-agent

One of your early agents should write documentation. It reads your code and generates API docs, function references, and tutorials. Give it commands like npm run docs:build and markdownlint docs/ so it can validate its own work. Tell it to write to docs/ and never touch src/.

  • What it does: Turns code comments and function signatures into Markdown documentation  
  • Example commands: npm run docs:build, markdownlint docs/
  • Example boundaries: Write to docs/, never modify source code

@test-agent

This one writes tests. Point it at your test framework (Jest, PyTest, Playwright) and give it the command to run tests. The boundary here is critical: it can write to tests/ but should never delete a test just because it is failing and the agent cannot fix it. 

  • What it does: Writes unit tests, integration tests, and edge case coverage  
  • Example commands: npm test, pytest -v, cargo test --coverage  
  • Example boundaries: Write to tests/, never remove failing tests unless authorized by user

@lint-agent

A fairly safe agent to create early on. It fixes code style and formatting but shouldn’t change logic. Give it commands that let it auto-fix style issues. This one’s low-risk because linters are designed to be safe.

  • What it does: Formats code, fixes import order, enforces naming conventions  
  • Example commands: npm run lint --fix, prettier --write
  • Example boundaries: Only fix style, never change code logic

@api-agent

This agent builds API endpoints. It needs to know your framework (Express, FastAPI, Rails) and where routes live. Give it commands to start the dev server and test endpoints. The key boundary: it can modify API routes but must ask before touching database schemas.

  • What it does: Creates REST endpoints, GraphQL resolvers, error handlers  
  • Example commands: npm run dev, curl localhost:3000/api, pytest tests/api/
  • Example boundaries: Modify routes, ask before schema changes

@dev-deploy-agent

Handles builds and deployments to your local dev environment. Keep it locked down: only deploy to dev environments and require explicit approval. Give it build commands and deployment tools but make the boundaries very clear.

  • What it does: Runs local or dev builds, creates Docker images  
  • Example commands: npm run test
  • Example boundaries: Only deploy to dev, require user approval for anything with risk

Starter template

---
name: your-agent-name
description: [One-sentence description of what this agent does]
---
You are an expert [technical writer/test engineer/security analyst] for this project.

## Persona
- You specialize in [writing documentation/creating tests/analyzing logs/building APIs]
- You understand [the codebase/test patterns/security risks] and translate that into [clear docs/comprehensive tests/actionable insights]
- Your output: [API documentation/unit tests/security reports] that [developers can understand/catch bugs early/prevent incidents]

## Project knowledge
- **Tech Stack:** [your technologies with versions]
- **File Structure:**
  - `src/` – [what's here]
  - `tests/` – [what's here]

## Tools you can use
- **Build:** `npm run build` (compiles TypeScript, outputs to dist/)
- **Test:** `npm test` (runs Jest, must pass before commits)
- **Lint:** `npm run lint --fix` (auto-fixes ESLint errors)

## Standards
Follow these rules for all code you write:
**Naming conventions:**
- Functions: camelCase (`getUserData`, `calculateTotal`)
- Classes: PascalCase (`UserService`, `DataController`)
- Constants: UPPER_SNAKE_CASE (`API_KEY`, `MAX_RETRIES`)

**Code style example:**
```typescript
// ✅ Good - descriptive names, proper error handling
async function fetchUserById(id: string): Promise<User> {
  if (!id) throw new Error('User ID required');
  
  const response = await api.get(`/users/${id}`);
  return response.data;
}
// ❌ Bad - vague names, no error handling
async function get(x) {
  return await api.get('/users/' + x).data;
}
```

## Boundaries
- ✅ **Always:** Write to `src/` and `tests/`, run tests before commits, follow naming conventions
- ⚠️ **Ask first:** Database schema changes, adding dependencies, modifying CI/CD config
- 🚫 **Never:** Commit secrets or API keys, edit `node_modules/` or `vendor/`


Key takeaways

Building an effective custom agent isn’t about writing a vague prompt; it’s about providing a specific persona and clear instructions.

My analysis of over 2,500 agents.md files shows that the best agents are given a clear persona and, most importantly, a detailed operating manual. This manual must include executable commands, concrete code examples for styling, explicit boundaries (like files to never touch), and specifics about your tech stack. 

When creating your own agents.md, cover the six core areas: commands, testing, project structure, code style, git workflow, and boundaries. Start simple. Test it. Add detail when your agent makes mistakes. The best agent files grow through iteration, not upfront planning.

Now go forth and build your own custom agents to see how they level up your workflow first-hand!


Agents supporting agents.md

  • Codex from OpenAI
  • Amp
  • Jules from Google
  • Cursor
  • Factory
  • RooCode
  • Aider
  • Gemini CLI from Google
  • Kilo Code
  • opencode
  • Phoenix
  • Zed
  • Semgrep
  • Warp
  • Coding agent from GitHub Copilot
  • VS Code
  • Ona
  • Devin from Cognition
  • Coded Agents from UiPath