How To Design Complex Data Tables (+ Figma Kits)

Mike's Notes

Note

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > Smashing Magazine
  • Home > Handbook > 

Last Updated

17/05/2025

How To Design Complex Data Tables (+ Figma Kits)

By: Vitaly Friedman
LinkedIn: 19/06/2024

Architecting a complex data table is quite an adventure. Wonderful work by the Goldman Sachs team.

Complex data tables are difficult to get right. They always come along with filters, sorting, customization options, batch actions, cell states, pagination and a huge amount of data. Their purpose is usually to help people compare data points and find insights — yet navigating a table is often painfully slow and frustrating, especially on mobile.

Let’s explore practical techniques and useful Figma toolkits to help users find and compare the right data faster, without relying on endless horizontal scroll.

Architecting A Complex Data Table

When we start designing a complex data table, we first need to understand what features, states and accessories we actually need. Slava Shestopalov has put together a tree of table features — a practical overview of what goes into complex tables, covering the features, states and accessories that might need to be considered in the design process.


A comprehensive tree of features for a data table. Neatly put together by Slava Shestopalov.

In the design process, we start by observing, collecting and prioritizing user needs. Based on them, we define the full set of complex functionality we need, such as drag-and-drop, resizing, reshuffling or multi-sorting. These features will require separate accessibility considerations, as WCAG 2.2 AA requires all draggable controls to be keyboard-accessible.


The different types of cells, from the (incredible!) Goldman Sachs Design System.

Then, we define the different types of table cells that we need. Some of them will be accessible to everyone, others will have restrictions applied to them. So we discuss logic and permissions, such as read-only, comment-only or editable. We explore filtering, sorting and customization features. We discuss sticky headers and columns. And for each of them, we set default values, presets and templates.
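Sticky headers and columns are usually a matter of `position: sticky` in CSS. A minimal sketch, with illustrative class names (not taken from any particular design system):

```css
/* Wrapper makes the table scrollable; sticky cells pin against it */
.table-wrapper { overflow: auto; max-height: 60vh; }

/* Header row stays visible during vertical scroll */
.data-table thead th {
  position: sticky;
  top: 0;
  z-index: 2;
  background: #fff; /* opaque, so scrolling rows don't show through */
}

/* First column stays visible during horizontal scroll */
.data-table th:first-child,
.data-table td:first-child {
  position: sticky;
  left: 0;
  z-index: 1;
  background: #fff;
}
```

A cell that sits in both the header row and the first column needs the highest `z-index`, so it stays on top along both scroll axes.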


For nested filters, you might consider an overlay with horizontal stacking, instead of a tree.

Eventually, we move to the fine little details of the data table design. Things like truncation, wrapping, stretching and resizing rules. We look at interaction design with validation rules and error messages. Some tables might require very long technical titles or localization, so stress test your design with very long and very short titles — this might also require compact, comfortable and condensed modes.
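Many of these truncation and density rules can be stress-tested directly in CSS. A rough sketch, with illustrative class names:

```css
/* Single-line truncation with an ellipsis */
.cell--truncate {
  max-width: 16ch;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}

/* Alternatively, clamp wrapped text to two lines */
.cell--clamp {
  display: -webkit-box;
  -webkit-line-clamp: 2;
  -webkit-box-orient: vertical;
  overflow: hidden;
}

/* Compact / comfortable / condensed modes via a custom property */
.table--compact     { --row-padding: 4px; }
.table--comfortable { --row-padding: 12px; }
.data-table td      { padding: var(--row-padding, 8px); }
```

Pair truncation with a way to reveal the full value (a detail view or accessible tooltip), since ellipsized text is otherwise lost to the user.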


Data table design with a dedicated "Actions" button might perform better than hover actions.

And whenever possible, try to avoid row hover actions: they often cause errors and rage clicks. Instead, use a standalone "Actions" button, or a few buttons, on each row.

Drawing a table tree diagram like the one pictured above is a good way to document your decisions — and understand the beast that is actually in front of you. A data table might seem like just another regular component, but its complexity is often underrated and its effectiveness is often undermined, especially when it comes to mobile display.

Useful resources:

Complex Data Tables on Mobile

We often assume that customers expect data tables to appear exactly the same on mobile and on desktop. That's not necessarily true. What they do expect is that features that they heavily rely on for their work exist in all environments — but these features don't have to work or look exactly the same way.


Row-column data tables are terribly inefficient on mobile — you might consider cards instead. Example: Goldman Sachs Design System.

In general, row-column data tables are terribly inefficient on mobile — that's where users often struggle, making mistakes and scrolling back and forth to make sure that they are looking at the right piece of data.

Instead, it's a good idea to think about the data alone, rather than its tabular structure. See how to aggregate data and span it across fewer columns. Show only what users really need, then show more on tap. And while doing so, try to leave out unnecessary data and details and eliminate repetition. For example, we could abbreviate dates, long labels, units of measure and currency. Replace statuses and permissions with icons and badges.


Users rarely need all columns and rows at once. We can use drop-downs to navigate and explore data cells in bulk. By Joe Winter.

As for interaction design, expand rows to show details if your data doesn't need much vertical space, and use a drawer when it does, preferably instead of, not in addition to, modal dialogs. However, don’t rely on tooltips or hover to show critical details.

It's worth noting that users rarely navigate through all columns in the table. So let them show and hide columns, for example with a “Columns” button. There, let them also re-arrange, lock and reset columns. You could use tabs above the table to change the view, or use tabs within the table to jump between columns.


Clever: use tabs within the data table to navigate its columns. By Netty Konovalova.

For row actions, you might be better off with a bottom sheet (edit, delete, move). A helpful way to make the content more accessible is by re-grouping data from columns across multiple rows (pivoting). You could also combine columns within vertical accordions (stacked columns) and add a sticky filter in each column to help users navigate faster. Finally, if you do use pagination, show it above and below the data list.

Bottom line: Show only what users really need. Think “card”, not “row”, to present a single record of data. Aggregate and re-group data across the table. You might not always need labels, but keep them available to screen reader users. And most importantly: re-organize, rethink and redesign data, rather than squeezing a multi-column table layout into a narrow mobile space.
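The “card, not row” idea can be prototyped by linearizing the table below a breakpoint and pulling the column labels into the cells. A rough CSS sketch; the selectors, breakpoint and `data-label` attribute are assumptions, not from any of the kits mentioned here:

```css
@media (max-width: 640px) {
  /* Collapse the tabular structure into stacked blocks */
  .data-table, .data-table tbody, .data-table tr, .data-table td {
    display: block;
  }
  .data-table thead { display: none; } /* labels move into the cells */

  /* Each row reads as a card */
  .data-table tr {
    margin-bottom: 1rem;
    padding: 8px;
    border: 1px solid #ddd;
    border-radius: 8px;
  }

  /* Each cell shows its column label, e.g. <td data-label="Status"> */
  .data-table td::before {
    content: attr(data-label);
    display: block;
    font-size: 0.75rem;
    color: #666;
  }
}
```

Note that overriding `display` on table elements can strip table semantics for assistive technology, so keep the labels in the markup and test the result with a screen reader.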

Useful resources:

Data Tables Figma Kits

Designing data tables in Figma from scratch is remarkably tedious and time-consuming. You can get off the ground with a few helpful kits, kindly shared and released by the community:


Data Tables Figma Kit, by Jordan Hughes.

A huge thank you to the contributors, authors and designers who put time, effort and energy into making these resources available for everyone to use!

Naming convention for img files

Mike's Notes

Here is a naming convention for images that the Content Management System will use. This was inspired by a StackOverflow post listed below in resources.

This has not gone into production yet.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

17/05/2025

Naming convention for img files

By: Mike Peters
On a Sandy Beach: 25/06/2024

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Pattern

  • {imageType}-{name}.{imageExtension}

{imageType}

  • icon (e.g., question mark icon for help content)
  • img (e.g., a header image inserted via <img /> element)
  • button (e.g., a graphical submit button)
  • bg (image is used as a background image in CSS)
  • sprite (image is used as a background image in CSS and contains multiple "versions")
{name}

  • Name to come from the design system

{imageExtension}

  • jpg
  • png
  • gif

Examples

  • icon-help.gif
  • img-logo.gif
  • sprite-main_headlines.jpg
  • bg-gradient.gif

Naming convention for assets

Mike's Notes

Here is a standardised directory structure for each Ajabbi website, which the Content Management System (CMS) will automatically render and provision.

This is now in production.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

17/05/2025

Naming convention for assets

By: Mike Peters
On a Sandy Beach: 24/06/2024

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

  • Use the 3-letter language codes from ISO 639-3, e.g. "eng".
  • Use the 2-letter country codes from ISO 3166-1 alpha-2, e.g. "nz".

  • Each localisation directory has its own assets subdirectory structure.

Directory Structure

  • eng/
  • eng-uk/
    • _assets/
      • ...
  • eng-us/
  • fra/
  • fra-ca/
  • ...
  • robots.txt
  • index.html
  • site.manifest
  • favicon.ico
  • sitemap
  • assets/ (for static content used by the client (browser))
    • data/ (xml etc)
    • image/ (photos)
    • lib/ (unmodified libraries)
      • github/
        • [path]
          • [library/project name]
            • vX.Y.Z (version)
      • htmx/
      • jquery.com/
        • jquery/
          • jquery-migrate-3.4.1/ (version)
          • jquery-3.7.1.min/
      • react.dev/
        • ...
    • media/ (video, audio)
    • script/
      • js/
      • ts/
    • style/
      • css/
        • all/
          • component/
          • page/
        • print/
        • screen/
          • component/
          • page/
      • font/
        • adobe/
        • google/
      • img/ (icons)

      WCAG 2.2 is here

      Mike's Notes

      WCAG 2.2 is the new accessibility standard that Ajabbi must work very hard to meet.

      Resources

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > 

      Last Updated

      17/05/2025

      WCAG 2.2 is here

      By: Chris Pycroft
      Intopia: 05/10/2023

      "The World Wide Web Consortium (W3C) has finalised the next version of the Web Content Accessibility Guidelines, known as WCAG 2.2.

      After much anticipation, the latest version of the Web Content Accessibility Guidelines (WCAG) has finally arrived. Version 2.2 has officially been made a ‘Recommendation’ by the World Wide Web Consortium (also known as the W3C), which means it is stable with no further changes. For any organisation wanting to conform to the latest version of WCAG, WCAG 2.2 will be your go-to version.

      It’s the first major update to the Web Content Accessibility Guidelines in nearly five and a half years, with the last update, WCAG 2.1, launched in June 2018.

      So what’s new in WCAG 2.2?

      As a part of the updated Guidelines, there are 9 new success criteria.

      There are 2 new Level A criteria:

      • 3.2.6 Consistent Help
      • 3.3.7 Redundant Entry

      There are 4 new Level AA criteria:

      • 2.4.11 Focus Not Obscured (Minimum)
      • 2.5.7 Dragging Movements
      • 2.5.8 Target Size (Minimum)
      • 3.3.8 Accessible Authentication (Minimum)

      There are 3 new Level AAA criteria.

      • 2.4.12 Focus Not Obscured (Enhanced)
      • 2.4.13 Focus Appearance
      • 3.3.9 Accessible Authentication (Enhanced)
      Most of the new criteria are design-driven requirements.  They also take into consideration the increased use of certain technologies, such as multi-factor authentication.

      There is also one change to an existing success criterion – 4.1.1 Parsing (Level A) has been marked as obsolete and removed. This criterion, released as part of WCAG 2.0, was necessary when assistive technologies needed to directly parse HTML, and parsing errors weren’t handled as well as they are today. As a result, issues that were previously covered by this criterion either no longer exist, or are covered by other success criteria. Notes have also been added to previous versions of WCAG (2.0 and 2.1) to reflect this update.

      Does this mean I need to start thinking about WCAG 2.2?

      The short answer is yes.

      Now that the new version is finalised, the W3C recommends that websites meet WCAG 2.2. Embracing the new requirements also means a more accessible experience for people using your product or service.

      We anticipate that policies and standards that reference WCAG 2.1 or below, such as the European Commission’s Web Accessibility Directive, or EN 301 549, will be updated to reflect WCAG 2.2 in the future. We’ll keep you updated of changes over the coming months. In the meantime, to future-proof your products and services, consider moving to WCAG 2.2 as soon as you can, especially for new content or features.

      The good news is that if you’re already meeting all the requirements of WCAG 2.1 (or are in the process of doing so), you’re already most of the way there. The most common conformance level that organisations typically aim for is Level AA. This means that there are 6 new Level A and AA success criteria that you need to take into consideration. By conforming to WCAG 2.2, you’ll also conform to previous versions of the Guidelines (2.1 and 2.0).

      Where can I find more information about WCAG 2.2?

      There is already some useful information and resources that are available to you, and there’s also plenty more on the way.

      The main source of truth, the World Wide Web Consortium, has published the standard in full, along with supporting documentation to help you understand the guidelines:

      We also have an updated version of our much-loved WCAG Map! The WCAG 2.2 Map is now available and can be used to map out what success criteria you need to consider (see what we did there).

      We’ll also be hosting an Ask Us Anything free webinar on Wednesday 18 October at 1pm AEDT (GMT +11). You’ll have the opportunity to hear from our team about what they think of the updates to WCAG, and ask them any questions about WCAG 2.2.

      We’ll also have more information and helpful resources available over the coming weeks. These will help you understand exactly what the new success criteria are, and how to meet them. Follow us on LinkedIn, X (formerly Twitter), Facebook and YouTube, or subscribe to our newsletter (which you can do at the bottom of this page) to stay up to date. We’ll also be updating our Not-Checklist soon, stay tuned.

      If you’re keen to hop down the rabbit hole and begin the journey to meeting WCAG 2.2, we’re here to help. In fact, we’ve already helped some of our client organisations who have been implementing WCAG 2.2 before it was finalised. Contact us through our website, and someone from our team will get back to you.

      Happy WCAG 2.2 release day!"

      WCAG 2.2 Map



      Tool Options for the Design System

      Mike's Notes

      I need to determine what third-party design and management software to use and how to integrate it with the Ajabbi Design System.

      This is for both future customers and the Ajabbi team.

      These notes came from Vitaly Friedman, editor of Smashing Magazine, a fantastic resource.

      Resources

      • Resource

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > 

      Last Updated

      17/04/2025

      Tool Options for the Design System 

      By: Vitaly Friedman
      Smashing Magazine: 22/06/2024

      Requirements

      Needs to work with design tokens.

      Options

      • Figma
      • Token Studio
      • SuperNova

      Figma

      Notes

      "Figma is a collaborative web application for interface design, with additional offline features enabled by desktop applications for macOS and Windows. The feature set of Figma focuses on user interface and user experience design, with an emphasis on real-time collaboration, utilising a variety of vector graphics editor and prototyping tools. The Figma mobile app for Android and iOS allows viewing and interacting with Figma prototypes in real-time on mobile and tablet devices." - Wikipedia

      Pricing

      • Starter (free)
      • Professional ($15/m per seat)
      • Organisation ($45/m per seat)
      • Enterprise ($75/m per seat)

      Resources

      Token Studio

      Notes

      Tokens Studio for Figma is a Figma Plugin allowing you to integrate Tokens into your Figma designs.

      It gives you reusable tokens that can be used for a whole range of design options, from border radius or spacer units to semantic color and typography styles. It allows you to change tokens and see these changes applied to the whole document or its styles. - Token Studio

      Pricing

      • Free (free)
      • Pro (19 Euro/m per user)

      Resources

      SuperNova

      Notes

      "Supernova connects your design and engineering data in a single design system tool to accelerate and scale your product development."

      "Supernova manages the entire design system lifecycle in one place. It's designed to fit with the way your team already works — without changing tools or maintaining self-built workflows and integrations — to enable your team to build better products.

      Connect Figma files to your Supernova design system, sync design tokens via our Tokens Studio integration, and import variables from Figma via our plugin to ensure you constantly have the most-up-to-date design data in your design system.

      Create advanced documentation using this data to keep documentation in sync whenever changes are made. Finally, connect design and code by automating code delivery with code pipelines to deliver tokens, styles, icons, components and documentation to your codebases." - SuperNova

      Pricing

      • Free (free)
      • Team ($45/m per seat)
      • Company ($75/m per seat)

      Resources

      Alternatives

      Open-source

      • Penpot
      • Lunacy

      Paid

      • Adobe XD
      • UXPin
      • Zeplin
      • LucidChart
      • Miro
      • Framer
      • AxureRP

      The Ontolog Forum

      Mike's Notes

      Some years ago, while searching for examples of how to build temporal databases, I came across Dr. Matthew West. He had worked on and written about an approach called 4-dimensionalism. I discovered archives of his papers and articles to read.

      Then I discovered he was a member of the Ontolog Forum, which looked very useful. So, I asked him if I could join. It has a rich Google Group.

      Resources

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > 

      Last Updated

      17/05/2025

      The Ontolog Forum

      By: Mike Peters
      On a Sandy Beach: 21/06/2024

      Mike is the inventor and architect of Pipi and the founder of Ajabbi.

      Discussion Example


      12/06/2024 Post

      "Guys,

      >> However, IMHO, If the model is a model of the “real world” then is it not a “data” model.  I would rather call things what they are: if it’s a “data model” you’re modelling data structures.  If you’re modelling the “real world” then it is not a “data” model – because it’s not a model of data.  (The term “conceptual data model” should be banished from everyone’s vocabulary!)  If you are modelling the real world, it could be called an “ontology”. 

      Bill and I have been having a long conversation about this offline.

      Bill is , of course, quite right about the mainstream orthodox view of modelling in the IS community. 
      This envisages a starting point where one develops a model of the real world (the model represents/refers to the real world)
      At some stage (the data modelling stage) one then uses exactly the same model as a data model to represent the data structure of the system you are building. Where, of course, some of the data then represents the real world - so a two level model. 

      I have, in the last decade or so, come to think that there is a different and better way to describe what is going on - one that opens up new ways of working.

      The kind of conceptual models we build using things like UML modelling tools, it seems to me, can be regarded as representing the real world, but also (at the same time) showing how this representation can be captured as data.
      (For more on showing see Macbeth, D. (2012). Seeing How It Goes: Paper-and-Pencil Reasoning in Mathematical Practice. Philosophia Mathematica, 20(1), 58–85. https://doi.org/10.1093/philmat/nkr006)
      So the shift from conceptual to data models can be seen as not a shift in representation but different roles for the same model - one of representing the other of showing.
      And that during the SDLC this model evolves becoming a closer and closer match for how it shows the data in the system - and then (with modern MDA tools) it becomes the system.

      Interestingly, this then gives these IS models a similar sense to models elsewhere, e.g. in architecture, where a scale model is built to show what the building would look like.
      Also, closer to the sense of model in science that is explored by e.g. Margaret Morrison and Mary Morgan.

      There is a lot more to say, but just two quick points.
      If one is reverse engineering, so traversing the SDLC in the opposite direction, then this showing perspective is a much better match for the practice we follow in bCLEARer. So we start with a system and then we 'evolve' the whole system to one where the 'conceptual' underpinning (the real world representing) is much clearer and explicit. 
      If one accepts that one is both representing the real world and showing the system, then the current practices of only modelling (representing and showing) entity types at the conceptual modelling stage seems odd - why not model the individual entities - the particulars. ..." - Chris Partridge

      17/06/2024 Post

      "Hi John and other interested parties,

      There was a thread on this forum a while back that ultimately led to my publication of a presentation titled “Understanding Data” to SlideShare, which also has a safer PDF variant on our website.

      Basically, I defined data as observation in reusable form. ..." - Kingsley Idehen

      How to target only IE (any version) within a stylesheet?

      Mike's Notes

      I need to make Ajabbi backwards-compatible with any old versions of IE that are still in use.

      This matters for older people who have kept the same computer for 20 years, and for impoverished countries relying on old hardware.

      Getting the Content Management System (CMS) to do this is a minor issue. It's knowing what code to use. And here it is.

      From StackOverflow some years ago.

      Resources

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > 

      Last Updated

      18/04/2025

      How to target only IE (any version) within a stylesheet?

      By:
      Stack Overflow: 09/02/2015

      Internet Explorer 9 and lower:

      You could use conditional comments to load an IE-specific stylesheet for any version (or combination of versions) that you wanted to target specifically, like the one below, using an external stylesheet.

      <!--[if IE]>
        <link rel="stylesheet" type="text/css" href="all-ie-only.css" />
      <![endif]-->

      However, beginning in version 10, conditional comments are no longer supported in IE.

      Internet Explorer 10 & 11:

      Create a media query using -ms-high-contrast, where you place your IE 10 and 11-specific CSS styles. Because -ms-high-contrast is Microsoft-specific (and only available in IE 10+), it will only be parsed in Internet Explorer 10 and greater.

      @media all and (-ms-high-contrast: none), (-ms-high-contrast: active) {
        /* IE10+ CSS styles go here */
      }

      Microsoft Edge 12+:

      You can use the @supports rule to target Edge 12 and later.

      @supports (-ms-accelerator:true) {
        /* IE Edge 12+ CSS styles go here */ 
      }
      

      Inline rule for IE8 detection: I have one more option, but it only detects IE8 and below.

        /* IE CSS hacks */
        margin-top: 10px\9; /* applies to IE8 and below */
        *margin-top: 10px;  /* applies to IE7 and below */
        _margin-top: 10px;  /* applies to IE6 and below */

      Since you asked about an embedded stylesheet, I think you need to use the media query and conditional comment approaches above for the older versions.

      Visual design rules you can safely follow every time

      Mike's Notes

      I came across this via Smashing Magazine. It has detailed CSS settings to use.

      Resources

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > 

      Last Updated

      17/05/2025

      Visual design rules you can safely follow every time

      By: Mike Peters
      On a Sandy Beach: 19/06/2024

      Mike is the inventor and architect of Pipi and the founder of Ajabbi.

      "You do not have to follow these rules every time. If you have a good reason to break any of them, do. But they are safe to follow every time. ..." - Anthony Hobday

      Ten Things You Need to Know About Your Autistic Employee

      Mike's Notes

      Note

      Resources

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > Teams > Disability

      Last Updated

      17/05/2024

      Ten Things You Need to Know About Your Autistic Employee

      By: Dr. Michelle Garnett and Professor Tony Attwood
      Attwood & Garnett: 18/06/2024

      By Dr. Michelle Garnett and Professor Tony Attwood

      In today’s dynamic and diverse workplace, it is crucial to recognize the unique strengths and perspectives that neurodivergent individuals bring to the table. Autistic employees can be a tremendous asset to any organization, provided they are understood, supported, and valued. Here are ten important considerations for employers to keep in mind when interviewing and employing autistic individuals, grounded in research and a strengths-based approach.

      Focus on Strengths, Not Stereotypes

      Autistic individuals often possess exceptional abilities in various areas, such as single-minded focus, attention to detail, pattern recognition, and ethical and creative problem-solving. For example, many autistic people excel in roles that require precision and analytical thinking, others excel in the visual and dramatic arts, and others in the caring professions. They often have a strong moral compass, and are loyal, hard-working, committed, and compassionate. Recognizing and valuing these strengths in your autistic employee raises the bar for all employees.

      Clear and Direct Communication

      Clear, direct, and unambiguous communication is often the most effective way to interact with autistic individuals. Some autistic people find it challenging to interpret non-verbal cues or implied meanings, whilst others have made an art of it. Others find auditory information a struggle to quickly interpret and later remember. To enhance communication further for some, make it visual. Providing clear instructions and feedback will enhance understanding, performance and motivation.

      Structured and Predictable Environment

      Many people dislike change and uncertainty, but it is important to know that for an autistic person these are significant stressors. Thus, a structured work environment with predictable routines can help autistic employees thrive. Sudden changes or chaotic settings can interfere with work performance because they are so stressful. When possible, give advance notice of changes to schedules or tasks, and maintain a consistent work environment.

      Sensory Considerations

      Many autistic individuals are sensitive to sensory stimuli such as bright lights, loud noises, or strong smells. Be mindful of the sensory environment and make accommodations as needed. This might include offering noise-cancelling headphones, adjusting lighting, or providing a quiet, uncluttered workspace. Offering a retreat space where there is a very minimal sensory load can go a long way to assisting an autistic person to re-calibrate as needed throughout their working hours. Conduct a sensory assessment of the workplace with your employee and regularly check in to ensure that any adjustments are working.

      Inclusive Interview Techniques

      Traditional interview processes tend not to showcase the strengths of autistic candidates. Consider alternative interview methods such as practical assessments, work trials, or having an autistic interviewer on the panel. This allows candidates to demonstrate their abilities in a comfortable setting. If a candidate has disclosed their autism prior to interview, reach out to ask about any adjustments that help them feel more at ease, including sensory adjustments for the setting, like wearing a visor, or providing interview questions in advance.

      Support for Social Interactions

      Social interactions in the workplace can be challenging for both autistic and non-autistic employees. Offer support, such as a mentor, to your autistic employees and organise training in autism for your non-autistic employees. Co-discover team-building activities that are inclusive and respectful of neurodiversity. For example, an autistic employee may not enjoy a weekend away with work colleagues with no space to recharge their battery in solitude, especially if the expectation is nonstop socialising.

      Reasonable Accommodations

      Under different legislation in different countries, employers are required to provide reasonable accommodations to autistic employees. These might include sensory accommodations as above, flexible work hours, remote work options, or specific tools and technologies. However, any accommodations are severely undermined when there is workplace stigma around asking for them or seeing them implemented. Organise training in autism for staff to debunk common myths and misconceptions and to teach the realities of autism, directly fighting negative stigma. Discuss openly with the employee to determine what accommodations are necessary for their success. Check in with other employees about their needs too; many accommodations for autistic people work very well for humans in general.

      Focus on Connection and Well-being

      Autistic individuals may be more prone to anxiety or stress, especially in a work environment that is not accommodating. A non-accommodating work environment tells the person that their concerns are not important, and even worse, not valid. Autistic people are very perceptive of emotional atmosphere. If they feel unsupported, they are likely to feel unsafe, and their well-being will be affected. Promote a culture of connection and well-being by offering resources such as a focus on healthy relationships at work that are driven by caring and respect, access to stigma-free counselling services as needed for all employees, stress management programmes, and creating a supportive, flexible work environment. All employees will benefit.

      Professional Development Opportunities

      Invest in the professional development of autistic employees. Provide opportunities for further training and career advancement. Recognize their potential for growth and offer pathways for them to enhance their skills and advance within the company. Autistic employees, like all employees, suffer stress when there are too many demands or too few. Many autistic people are driven, achievement-oriented, thrive on being challenged and love learning.

      Create an Inclusive Culture

      Fostering an inclusive workplace culture benefits everyone. Educate all employees about neurodiversity and the value it brings to the organization. Encourage empathy, respect, and understanding. Celebrate the contributions of autistic employees and ensure they feel valued and included.

      Conclusion

      Employing autistic individuals is not just about compliance with legal requirements; it is about embracing diversity and reaping the benefits of a diverse workforce. By understanding and accommodating the unique needs of autistic employees, employers can create a more inclusive, productive, and innovative workplace. This approach not only benefits autistic individuals but also enhances the overall organizational culture, leading to greater success and fulfillment for all employees.

      Where to from here?

      We have prepared a half-day training on autism, Autism Working, for employers, autistic and non-autistic employees, autistic people looking for work, and parents and family members. We will be discussing the advantages of autism in the workplace, common challenges, and ways to navigate the challenges successfully.

      Reliably Processing Trillions of Kafka Messages Per Day

      Mike's Notes

      I discovered this by reading the Walmart Global Tech Blog (SubStack) by Ravinder Matte, 15/06/2024

      Resources

      References

      • Reference

      Repository

      • Home > Ajabbi Research > Library >
      • Home > Handbook > 

      Last Updated

      17/05/2025

      Reliably Processing Trillions of Kafka Messages Per Day

      By: Ananth Packkildurai
Data Engineering Weekly 17/06/2024

      Summary

Walmart deploys Apache Kafka, with 25K+ Kafka consumers, across multiple clouds (public and private), supporting business-critical use cases including data movement, event-driven microservices, and streaming analytics. These use cases demand four nines (i.e., 99.99%) of availability and require us to quickly drain any backlogs arising from sudden traffic spikes. At Walmart scale, we have a diverse set of Kafka consumer applications written in multiple languages. This diversity, combined with our reliability requirements, requires consumer applications to adopt best practices to ensure high-availability SLOs. High consumer lag due to Kafka consumer rebalancing is the most common challenge in operationalizing Kafka consumers at scale. In this article, we highlight how Apache Kafka messages are reliably processed at a scale of trillions of messages per day with low cost and elasticity.

      Challenges

      Consumer rebalancing

A frequent problem encountered in production deployments of Kafka has to do with consumer rebalancing. Kafka rebalancing is the process by which Kafka redistributes partitions across consumers to ensure that each consumer is processing a roughly equal number of partitions. This distributes data processing evenly across consumers so that each consumer works as efficiently as possible. Kafka applications run either on containers or on VMs (virtual machines). For this article's purposes, we focus on containers, as they are prevalent in industry today. Kafka consumer applications built as container images run on WCNP (Walmart Cloud Native Platform), an enterprise-grade, multi-cloud container orchestration framework built on top of Kubernetes. Consumer rebalancing can be triggered by multiple causes, including:

• A consumer pod entering a consumer group: this can be caused by K8s deployments, rolling restarts, or automatic/manual scale-outs.
• The Kafka broker believing that a consumer has failed (e.g., if the broker has not received a heartbeat from a consumer within session.timeout.ms): this is triggered if the JVM exits or has a long stop-the-world garbage-collection pause.
• The Kafka broker believing a consumer is stuck (e.g., if the consumer takes longer than max.poll.interval.ms to poll for the next batch of records to consume): this is triggered if processing of the previously polled records exceeds that interval.

While consumer rebalancing provides resiliency in the face of planned maintenance (e.g., code releases), standard operational practices (e.g., manually changing min/max pod settings), and automatic self-healing (e.g., pod crashes, autoscaling), it negatively impacts latency. Given the near real-time nature of commerce today, many Kafka use cases have tight delivery SLAs — these applications suffered from constant lag alarms due to frequent and unpredictable rebalances in production.

      There is no clean way to configure consumers to avoid rebalancing in Kafka today. Although the community provides static consumer membership and co-operative incremental rebalancing, these approaches come with their own challenges.
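The two broker-side triggers above hinge on two standard Kafka consumer settings, session.timeout.ms and max.poll.interval.ms. A minimal sketch of how they are typically set (the property names are standard Kafka consumer configs; the values are illustrative assumptions, not Walmart's):

```python
# Illustrative Kafka consumer settings (values are assumptions, not Walmart's).
# session.timeout.ms: how long the broker waits for a heartbeat before
#   declaring the consumer dead and triggering a rebalance.
# max.poll.interval.ms: how long the consumer may spend between poll() calls
#   before the broker assumes it is stuck and triggers a rebalance.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "session.timeout.ms": 45_000,     # heartbeat-based liveness window
    "max.poll.interval.ms": 300_000,  # 5 minutes between poll() calls
    "enable.auto.commit": False,      # commit offsets explicitly
}
```

Raising max.poll.interval.ms tolerates slower processing but delays detection of genuinely stuck consumers, which is exactly the trade-off that motivates decoupling reading from processing later in this article.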

      Poison pill

Head-of-line (HOL) blocking is a performance-limiting phenomenon that can occur in networking and messaging systems. It happens if a Kafka consumer encounters a message that will never successfully be processed. If message processing results in an uncaught exception thrown to the Kafka consumer thread, the consumer will re-consume the same message batch on the next poll of the broker — predictably, the same batch containing the "poison pill" message will result in the same exception. This loop continues indefinitely until either a code fix is deployed to the Kafka consumer application that skips or correctly processes the problematic message, or the message is skipped manually by advancing the consumer offset. This poison-pill problem is yet another problem associated with in-order processing of partitioned data streams. Apache Kafka does not handle poison pill messages automatically.
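A common mitigation (a standard pattern, not something the article prescribes) is to catch processing failures per message and divert the failing record to a dead-letter queue instead of letting the exception abort the whole batch. A minimal pure-Python sketch, with message processing and the DLQ mocked out:

```python
def process_batch(batch, process, dlq):
    """Process each message; divert failures to the DLQ instead of
    re-raising, so one poison pill cannot block the whole batch."""
    for msg in batch:
        try:
            process(msg)
        except Exception:
            dlq.append(msg)  # park the poison pill for later inspection

def fragile_process(msg):
    # Stand-in for real business logic that chokes on one message.
    if msg == "poison":
        raise ValueError("unprocessable message")

dlq = []
process_batch(["ok-1", "poison", "ok-2"], fragile_process, dlq)
# dlq now holds only the poison message; the rest were processed.
```

Without this guard, the uncaught exception would cause the same batch to be re-polled forever, which is the HOL-blocking loop described above.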

      Cost

There is strong coupling between the partitions in a topic and the consumer threads that read from them. The maximum number of consumers of a topic cannot exceed the number of partitions in that topic. If consumers are unable to keep up with topic flow (i.e., maintain consistently low consumer lag), adding more consumers only helps until every partition is assigned to a dedicated consumer thread. At that point, the number of partitions must be increased to raise the maximum number of consumers. While this might sound like a fine idea, there are general rules on the number of partitions you can add to a broker before needing to vertically scale the broker nodes up to the next biggest size (roughly 4000 partitions per broker). As you can see, a problem with increasing consumer lag results in increasing partitions and potentially also scaling to larger brokers, even though the broker itself may have ample physical resources (e.g., memory, CPU, and storage). This strong coupling between partitions and consumers has long been the bane of engineers who seek to maintain low latency in the face of increasing traffic in Kafka.
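The 4000-partitions-per-broker guideline cited above can be turned into a quick back-of-the-envelope check (the guideline is the article's; the helper function is just illustrative):

```python
import math

PARTITIONS_PER_BROKER = 4000  # rule-of-thumb ceiling cited in the article

def min_brokers(total_partitions, per_broker=PARTITIONS_PER_BROKER):
    """Minimum broker count implied by the partitions-per-broker guideline."""
    return math.ceil(total_partitions / per_broker)

# Doubling partitions to chase consumer lag can force a bigger cluster
# even when the existing brokers have spare CPU and memory:
print(min_brokers(6000))   # 2 brokers
print(min_brokers(12000))  # 3 brokers
```

This is the cost problem in miniature: partition count, not actual resource usage, ends up driving cluster size.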

      Kafka partition scalability

      When there are thousands of pipelines, increasing the number of partitions becomes operationally onerous as it requires coordination among producers, consumers, and platform teams and imposes a small window of downtime. Sudden traffic spikes and large backlog draining both require increases in partitions and consumer pods.

      Solution

      To remedy some of the challenges above (e.g., the Kafka Consumer Rebalancing), the Kafka community has proposed the following Kafka Improvement Proposal: KIP-932: Queues for Kafka.

An alternative path is the Messaging Proxy Service (MPS). MPS decouples Kafka consumption from the constraint of partitions by proxying messages over HTTP to REST endpoints behind which consumers now wait. With the MPS approach, Kafka consumption no longer suffers from rebalancing, while also allowing greater throughput with a lower number of partitions.

      An added benefit of the MPS approach is that application teams no longer must use Kafka consumer clients. This frees any Kafka team from having to chase application teams to upgrade Kafka client libraries.

      Design

      The MPS Kafka consumer consists of two independent thread groups: the Kafka message_reader thread (i.e., a group of 1 thread) and message_processing_writer threads. These thread groups are separated by a standard buffering pattern (pendingQueue). The reader thread writes to a bounded buffer (during the poll) and writer threads read from this buffer.

A bounded buffer also provides control over the relative speed of the reader and writer threads. The message_reader thread pauses the consumer when the pendingQueue reaches its maximum buffer size.
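The reader/writer split with a bounded pendingQueue can be sketched with Python's thread-safe bounded queue; the pause-on-full behavior falls out of the blocking put(). This is a sketch of the pattern only — the real MPS is a Kafka Connect sink, and the names below are taken from the article's description:

```python
import queue
import threading

pending_queue = queue.Queue(maxsize=2)  # small bound to exercise back-pressure
results = []

def reader(messages):
    """Single reader thread: blocks (i.e., 'pauses the consumer')
    whenever pending_queue is full, providing back-pressure."""
    for msg in messages:
        pending_queue.put(msg)  # blocks while the buffer is full
    pending_queue.put(None)     # sentinel: no more messages

def writer():
    """Writer thread: does the slow processing, keeping the reader light."""
    while True:
        msg = pending_queue.get()
        if msg is None:
            break
        results.append(f"processed:{msg}")

r = threading.Thread(target=reader, args=(["a", "b", "c", "d", "e"],))
w = threading.Thread(target=writer)
r.start(); w.start()
r.join(); w.join()
```

A single writer is used here for simplicity; a pool of writers (as in MPS) needs the per-key ordering guard described next.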

This separation of reader and writer threads keeps the reader thread extremely lightweight, so it does not trigger a rebalance operation by exceeding max.poll.interval.ms. The writer threads can then take the time needed to process messages. The following diagram provides a pictorial view of the components and design.

      Pictorial view of the components and design.

      Sequence diagram models the interaction between the components as a sequence of calls.

      The architecture above is composed of the following key components:

      Reader Thread

      The reader thread’s job is to make progress through the inbound topics, applying back-pressure when the PendingQueue is full.

      Order Iterator

      The order-iterator guarantees that keyed messages are processed in order. It iterates through all messages in pendingQueue and leaves the messages (i.e., temporarily skips) if there is already a message with the same key in flight. Skipped messages will be processed in subsequent poll calls once earlier messages with the same key are processed. By ensuring that no more than 1 message per key is in flight, MPS guarantees in-order delivery by key.
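The skip-if-key-in-flight rule can be sketched as a pure function: given the buffered messages and the set of keys currently in flight, select at most one message per key and leave the rest for a later pass (all names here are illustrative, not from the MPS codebase):

```python
def select_dispatchable(pending, in_flight_keys):
    """Return (dispatch, skipped): messages safe to dispatch now,
    at most one per key, skipping any key already in flight."""
    dispatch, skipped = [], []
    busy = set(in_flight_keys)
    for key, payload in pending:
        if key in busy:
            skipped.append((key, payload))  # revisit on a later pass
        else:
            dispatch.append((key, payload))
            busy.add(key)  # this key is now in flight
    return dispatch, skipped

pending = [("k1", "m1"), ("k2", "m2"), ("k1", "m3")]
dispatch, skipped = select_dispatchable(pending, in_flight_keys={"k2"})
# dispatch -> [("k1", "m1")]; skipped -> [("k2", "m2"), ("k1", "m3")]
```

Because at most one message per key is ever in flight, writers can run in parallel across keys while delivery within each key stays in order.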

      Writer Thread

The writer thread is part of a pool that provides greater throughput via parallelism. Its job is to reliably write data to REST endpoints, dead-lettering (DLQ'ing) messages if either retries are exhausted or a non-retryable HTTP response code is received.

      Dead Letter Queue (DLQ)

A DLQ topic can be created in every Kafka cluster. The message_processing_writer thread initially retries messages a fixed number of times with exponential backoff. If all retries fail, the message is put in the DLQ topic. Applications can handle these messages later or discard them. Messages can be placed in this queue when the consumer service has an outage (e.g., timeouts) or when it encounters a poison pill (e.g., a 500 HTTP response).
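The retry-then-DLQ flow can be sketched as follows (the retry count and delays are assumptions, and the HTTP call is mocked as a plain function):

```python
import time

def deliver_with_retries(send, msg, dlq, max_retries=3, base_delay=0.01):
    """Try send(msg) up to max_retries times, sleeping with exponential
    backoff (base_delay * 2**attempt) between failures; on exhaustion,
    park the message in the DLQ and report failure."""
    for attempt in range(max_retries):
        try:
            send(msg)
            return True
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    dlq.append(msg)  # retries exhausted: dead-letter the message
    return False

dlq = []
attempts = {"n": 0}

def flaky_endpoint(msg):
    # Fails twice, then succeeds -- a transient outage.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")

def dead_endpoint(msg):
    raise RuntimeError("permanent failure")

ok = deliver_with_retries(flaky_endpoint, "order-1", dlq)      # recovers
failed = deliver_with_retries(dead_endpoint, "order-2", dlq)   # dead-letters
```

Transient outages are absorbed by the backoff; only messages that keep failing reach the DLQ for later replay or discard.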

      Consumer Service

The Consumer Service is a stateless REST service through which applications process messages. This service contains the business logic that was originally part of the Kafka consumer application. With this new model, Kafka consumption (MPS) can be separated from message processing (Consumer Service). Below, you will find the REST API spec that must be implemented by any Consumer Service:

      Kafka Offset Commit Thread

      Kafka offset committing is implemented as a separate thread (i.e., the offset_commit thread). This thread wakes up at regular intervals (e.g., 1 minute) and commits the latest contiguous offsets which are successfully processed by writer threads.

In the picture above, the offset_commit thread commits offsets 124 and 150 for partitions 0 and 1, respectively.
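"Latest contiguous offset" means the highest offset N such that every offset up to N has been processed; committing past a gap would silently skip the unprocessed message on a restart. A sketch of that computation for one partition (names are illustrative):

```python
def latest_contiguous(committed, processed):
    """Given the last committed offset and the set of offsets processed
    since, return the highest offset that is safe to commit: advance
    only while there is no gap in the processed set."""
    next_offset = committed + 1
    while next_offset in processed:
        next_offset += 1
    return next_offset - 1

# Partition 0: offsets 121-124 are done, 126 is done, but 125 is still
# in flight -- only 124 is safe to commit.
print(latest_contiguous(120, {121, 122, 123, 124, 126}))  # -> 124
```

Committing 126 in the example would lose offset 125 if MPS restarted, which is why the offset_commit thread stops at the first gap.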

      API



      Implementation Details

MPS was implemented as a sink connector on Kafka Connect. The Kafka Connect framework is well suited for multiple reasons:

• Multi-tenancy: multiple connectors can be deployed on a single Kafka Connect cluster
• DLQ handling: Kafka Connect already provides a basic framework for DLQ processing
• Commit flow: Kafka Connect provides convenience methods for commits
• In-built NFRs (Non-Functional Requirements): Kafka Connect provides many non-functional features (e.g., scalability, reliability)

      Conclusion

MPS has eliminated rebalances due to slowness in downstream systems, as it guarantees that the reader thread will put all polled messages into the pendingQueue within the allocated max.poll.interval.ms of 5 minutes. The only rebalances we see are due to Kubernetes pod restarts and exceedingly rare network slowness between the Kafka cluster and MPS. But with small consumer groups, the duration of these cycles is negligible and does not exceed processing SLAs (Service Level Agreements). The MPS service and the Kafka cluster should be hosted in the same cloud and region to reduce network-related issues between them.

Cooperative handling of poison pills, with applications detecting them and notifying MPS through return codes 600 and 700, works as planned.

The cost benefits of this solution are realized in two areas. First, stateless consumer services can scale quickly in the Kubernetes environment and do not have to be scaled up front for the holidays or campaign events. Second, Kafka cluster sizes no longer depend on partition counts; clusters can be truly scaled for throughput, at about 5–10 MB per partition.

Huge improvements have been seen in rebalance-related site issues and in the operational requests in Kafka pipelines driven by the yearly scaling of Kafka clusters for the holidays.

Sudden traffic spikes no longer require scaling Kafka partitions, as stateless consumer services are easily auto-scaled in the Kubernetes environment to handle message bursts.

      Acknowledgements

      This work would not be successful without the tireless aid of many people, some of whom are listed here:

      (Aditya Athalye, Anuj Garg, Chandraprabha Rajput, Dilip Jaiswar, Gurpinder Singh, Hemant Tamhankar, Kamlesh Sangani, Malavika Gaddam, Mayakumar Vembunarayanan, Peter Newcomb, Raghu Baddam, Rohit Chatter, Sandip Mohod, Srikanth Bhattiprolu, Sriram Uppuluri, Thiruvalluvan M. G.)