DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts

Mike's Notes

This excerpt from a recent SemiAnalysis article sheds more light on the recent AI hype. DeepSeek is a significant improvement, but not everything is as it seems.

The full article requires a login to SemiAnalysis.

I have also added some other articles to the references.

References

DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts

By: Dylan Patel, AJ Kourabi, Doug O'Laughlin, Reyk Knuhtsen
SemiAnalysis: January 31, 2025

The DeepSeek Narrative Takes the World by Storm

DeepSeek took the world by storm. For the last week, DeepSeek has been the only topic that anyone in the world wants to talk about. As it currently stands, DeepSeek daily traffic is now much higher than Claude, Perplexity, and even Gemini.

But to close watchers of the space, this is not exactly “new” news. We have been talking about DeepSeek for months (each link is an example). The company is not new, but the obsessive hype is. SemiAnalysis has long maintained that DeepSeek is extremely talented and the broader public in the United States has not cared. When the world finally paid attention, it did so in an obsessive hype that doesn’t reflect reality.

We want to highlight that the narrative has flipped from last month's, when scaling laws were supposedly broken (a myth we dispelled); now algorithmic improvement is said to be too fast, and this too is somehow bad for Nvidia and GPUs.

The narrative now is that DeepSeek is so efficient that we don't need more compute, and everything now has massive overcapacity because of the model changes. While the Jevons paradox is also overhyped, it is closer to reality: the models have already induced demand, with tangible effects on H100 and H200 pricing.

DeepSeek and High-Flyer

High-Flyer is a Chinese hedge fund and an early adopter of AI in its trading algorithms. They realized early the potential of AI in areas outside of finance, as well as the critical insight of scaling, and have continuously increased their supply of GPUs as a result. After experimenting with clusters of thousands of GPUs, High-Flyer invested in 10,000 A100 GPUs in 2021, before any export restrictions. That paid off. As High-Flyer improved, they realized it was time to spin off "DeepSeek" in May 2023 with the goal of pursuing further AI capabilities with more focus. High-Flyer self-funded the company, as outside investors had little interest in AI at the time; the lack of a business model was the main concern. High-Flyer and DeepSeek today often share resources, both human and computational.

DeepSeek has now grown into a serious, concerted effort and is by no means a "side project," as many in the media claim. We are confident that their GPU investments account for more than $500M US dollars, even after considering export controls.

Source: SemiAnalysis, Lennart Heim

The GPU Situation

We believe they have access to around 50,000 Hopper GPUs, which is not the same as 50,000 H100s, as some have claimed. There are different variations of the H100 that Nvidia made in compliance with different regulations (H800, H20), with only the H20 currently available to Chinese model providers. Note that H800s have the same computational power as H100s, but lower network bandwidth. We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore, they have orders for many more H20s, with Nvidia having produced over 1 million of the China-specific GPU in the last 9 months. For more detailed analysis, please refer to our Accelerator Model.

Source: SemiAnalysis

Our analysis shows that the total server CapEx for DeepSeek is almost $1.3B, with a considerable cost of $715M associated with operating such clusters.

DeepSeek has sourced talent exclusively from China, with no regard for previous credentials, placing a heavy focus on capability and curiosity. DeepSeek regularly runs recruitment events at top universities like PKU and Zhejiang, where many of the staff graduated from. Roles are not necessarily pre-defined and hires are given flexibility, with job ads even boasting of access to 10,000s of GPUs with no usage limitations. They are extremely competitive, and allegedly offer salaries of over $1.3 million USD for promising candidates, well above those at big Chinese tech companies. They have ~150 employees, but are growing rapidly.

As history shows, a small, well-funded, and focused startup can often push the boundaries of what's possible. DeepSeek lacks the bureaucracy of places like Google, and since they are self-funded, they can move quickly on ideas. However, like Google, DeepSeek (for the most part) runs their own datacenters, without relying on an external party or provider. This opens up further ground for experimentation, allowing them to make innovations across the stack.

We believe they are the single best "open weights" lab today, beating out Meta's Llama effort, Mistral, and others.

DeepSeek’s Cost and Performance

DeepSeek's price and efficiencies caused the frenzy this week, with the main headline being the "$6M" training cost figure for DeepSeek V3. This is wrong. It is akin to pointing to a specific (and large) part of a bill of materials and attributing it as the entire cost. The pre-training cost is a very narrow portion of the total cost.

Training Cost

We believe the pre-training number is nowhere near the actual amount spent on the model. We are confident their hardware spend is well above $500M over the company's history. To develop new architectural innovations, there is considerable spend during model development on testing new ideas, new architectures, and ablations. Multi-Head Latent Attention, a key innovation of DeepSeek, took several months to develop and cost a whole team's worth of man-hours and GPU hours.

The $6M cost in the paper is attributed to just the GPU cost of the pre-training run, which is only a portion of the total cost of the model. Excluded are important pieces of the puzzle like R&D and the TCO of the hardware itself. For reference, Claude 3.5 Sonnet cost $10s of millions to train, and if that were the total cost Anthropic needed, they would not need to raise billions from Google and tens of billions from Amazon. It is because they have to experiment, come up with new architectures, gather and clean data, pay employees, and much more.

So how was DeepSeek able to have such a large cluster? The lag in export controls is the key, and will be discussed in the export section below.

Closing the Gap - V3’s Performance

V3 is no doubt an impressive model, but it is worth highlighting what it is impressive relative to. Many have compared V3 to GPT-4o and highlight how V3 beats the performance of 4o. That is true, but GPT-4o was released in May of 2024. AI moves quickly, and May of 2024 is another lifetime ago in algorithmic improvements. Further, we are not surprised to see less compute achieving comparable or stronger capabilities after a given amount of time. Inference cost collapsing is a hallmark of AI improvement.

Source: SemiAnalysis

An example is small models that can be run on laptops with performance comparable to GPT-3, which required a supercomputer to train and multiple GPUs for inference. Put differently, algorithmic improvements allow a smaller amount of compute to train and inference models of the same capability, and this pattern plays out over and over again. This time the world took notice because it came from a lab in China. But smaller models getting better is not new.

Source: SemiAnalysis, Artificialanalysis.ai, Anakin.ai, a16z

So far, what we have witnessed with this pattern is that AI labs spend more in absolute dollars to get even more intelligence for their buck. Estimates put algorithmic progress at 4x per year, meaning that for every passing year, 4x less compute is needed to achieve the same capability. Dario Amodei, CEO of Anthropic, argues that algorithmic advancements are even faster and can yield a 10x improvement. As far as inference pricing goes for GPT-3 quality, costs have fallen 1200x.
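As a rough sanity check on those rates, the short Python sketch below compares the implied reductions; the ~4.5-year window is our own assumption (GPT-3-quality inference dating from roughly mid-2020), not a figure from the article.

    # Back-of-envelope check of the cost-decline arithmetic above.
    # Assumption (not from the article): GPT-3-quality inference has existed
    # for roughly 4.5 years as of early 2025.
    years = 4.5

    for label, rate in [("~4x per year (estimate above)", 4), ("~10x per year (Dario's figure)", 10)]:
        implied = rate ** years
        print(f"{label}: implied reduction of ~{implied:,.0f}x over {years} years")

    # The observed ~1200x fall in GPT-3-quality inference pricing lands between
    # these two figures, so the quoted rates are mutually consistent.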

When investigating the cost for GPT-4, we see a similar decrease, although earlier in the curve. The smaller decrease in cost across time can be explained by no longer holding capability constant as in the graph above. In this case, we see algorithmic improvements and optimizations creating a 10x decrease in cost and an increase in capability.

Source: SemiAnalysis, OpenAI, Together.ai

To be clear, DeepSeek is unique in that they achieved this level of cost and capabilities first. They are not unique in having released open weights; prior Mistral and Llama models have done this too. DeepSeek has achieved this level of cost, but do not be shocked if costs fall another 5x by the end of the year.

Is R1’s Performance Up to Par with o1?

On the other hand, R1 is able to achieve results comparable to o1, and o1 was only announced in September. How has DeepSeek been able to catch up so fast?

The answer is that reasoning is a new paradigm with faster iteration speeds and lower hanging fruit with meaningful gains for smaller amounts of compute than the previous paradigm. As outlined in our scaling laws report, the previous paradigm depended on pre-training, and that is becoming both more expensive and difficult to achieve robust gains with.

The new paradigm, focused on reasoning capabilities through synthetic data generation and RL in post-training on an existing model, allows for quicker gains with a lower price. The lower barrier to entry combined with the easy optimization meant that DeepSeek was able to replicate o1 methods quicker than usual. As players figure out how to scale more in this new paradigm, we expect the time gap between matching capabilities to increase.

Note that the R1 paper makes no mention of the compute used. This is not an accident – a significant amount of compute is needed to generate synthetic data for post-training R1. This is not to mention RL. R1 is a very good model, we are not disputing this, and catching up to the reasoning edge this quickly is objectively impressive. The fact that DeepSeek is Chinese and caught up with less resources makes it doubly impressive.

But some of the benchmarks R1 mentions are also misleading. Comparing R1 to o1 is tricky, because R1 specifically does not mention benchmarks where it is not leading. And while R1 matches reasoning performance, it is not a clear winner in every metric, and in many cases it is worse than o1.


Source: (Yet) another tale of Rise and Fall: DeepSeek R1

And we have not mentioned o3 yet. o3 has significantly higher capabilities than both R1 and o1. In fact, OpenAI recently shared o3's results, and the benchmark scaling is vertical. "Deep learning has hit a wall", but of a different kind.

Source: AI Action Summit

Google’s Reasoning Model is as Good as R1

While there is a frenzy of hype for R1, a $2.5T US company released a reasoning model a month earlier, for cheaper: Google's Gemini Flash 2.0 Thinking. This model is available for use, and is considerably cheaper than R1, even with a much larger context length through the API.

On reported benchmarks, Flash 2.0 Thinking beats R1, though benchmarks do not tell the whole story. Google only released 3 benchmarks, so it is an incomplete picture. Still, we think Google's model is robust, standing up to R1 in many ways while receiving none of the hype. This could be because of Google's lackluster go-to-market strategy and poor user experience, but also because R1 is a Chinese surprise.


Source: SemiAnalysis

To be clear, none of this detracts from DeepSeek’s remarkable achievements. DeepSeek’s structure as a fast moving, well-funded, smart and focused startup is why it's beating giants like Meta in releasing a reasoning model, and that's commendable.

Technical Achievements

DeepSeek has cracked the code and unlocked innovations that leading labs have not yet been able to achieve. We expect that any published DeepSeek improvement will be copied by Western labs almost immediately.  

What are these improvements? Most of the architectural achievements specifically relate to V3, which is the base model for R1 as well. Let’s detail these innovations.

Training (Pre and Post)

DeepSeek V3 utilizes Multi-Token Prediction (MTP) at a scale not seen before: these are added attention modules that predict the next few tokens as opposed to a single token. This improves model performance during training and can be discarded during inference. This is an example of an algorithmic innovation that enabled improved performance with lower compute.
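To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea. It assumes simple linear prediction heads, which is a deliberate simplification; DeepSeek's actual MTP modules are richer, and all names and dimensions here are hypothetical.

    import torch
    import torch.nn as nn

    class MultiTokenPredictionHeads(nn.Module):
        """Illustrative sketch of multi-token prediction (MTP).

        The main head predicts the next token as usual; extra heads predict
        tokens further ahead and contribute auxiliary losses during training.
        At inference the extra heads can simply be discarded.
        """

        def __init__(self, hidden_dim: int, vocab_size: int, extra_tokens: int = 2):
            super().__init__()
            self.main_head = nn.Linear(hidden_dim, vocab_size)
            self.extra_heads = nn.ModuleList(
                nn.Linear(hidden_dim, vocab_size) for _ in range(extra_tokens)
            )

        def training_loss(self, hidden, labels):
            # hidden: [batch, seq, hidden_dim]; labels: [batch, seq]
            loss = nn.functional.cross_entropy(
                self.main_head(hidden[:, :-1]).flatten(0, 1), labels[:, 1:].flatten()
            )
            for k, head in enumerate(self.extra_heads, start=2):
                # The k-th extra head predicts the token k positions ahead.
                logits = head(hidden[:, :-k]).flatten(0, 1)
                loss = loss + nn.functional.cross_entropy(logits, labels[:, k:].flatten())
            return loss

        def forward(self, hidden):
            # Inference path: only the ordinary next-token head is used.
            return self.main_head(hidden)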

There are added considerations like doing FP8 precision in training, but leading US labs have been doing FP8 training for some time.

DeepSeek V3 is also a mixture-of-experts model, which is one large model composed of many smaller models that specialize in different things. One struggle MoE models have faced has been how to determine which token goes to which sub-model, or "expert". DeepSeek implemented a "gating network" that routes tokens to the right expert in a balanced way without detracting from model performance. This means routing is very efficient, and only a few parameters are changed per token during training relative to the overall size of the model. This adds to the training efficiency and to the low cost of inference.
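Below is a generic top-k gating sketch in PyTorch that illustrates how a router sends each token to only a couple of experts. This is a textbook-style router under assumed dimensions, not DeepSeek's actual gating network or its load-balancing strategy.

    import torch
    import torch.nn as nn

    class TopKRouter(nn.Module):
        """Generic top-k MoE gating sketch (not DeepSeek's exact router).

        Each token's hidden state is scored against every expert and routed to
        the top-k experts; only those experts' parameters are exercised for that
        token, which is where the training and inference savings come from.
        """

        def __init__(self, hidden_dim: int, num_experts: int, k: int = 2):
            super().__init__()
            self.gate = nn.Linear(hidden_dim, num_experts, bias=False)
            self.k = k

        def forward(self, x):
            # x: [num_tokens, hidden_dim]
            scores = torch.softmax(self.gate(x), dim=-1)        # [tokens, experts]
            weights, expert_ids = scores.topk(self.k, dim=-1)   # route to top-k experts
            weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalise
            return weights, expert_ids

    router = TopKRouter(hidden_dim=16, num_experts=8, k=2)
    tokens = torch.randn(4, 16)
    w, ids = router(tokens)
    print(ids)  # which 2 of the 8 experts each of the 4 tokens is sent to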

Despite concerns that Mixture-of-Experts (MoE) efficiency gains might reduce investment, Dario points out that the economic benefits of more capable AI models are so substantial that any cost savings are quickly reinvested into building even larger models. Rather than decreasing overall investment, MoE's improved efficiency will accelerate AI scaling efforts. The companies are laser focused on scaling models to more compute and making them more efficient algorithmically.             

In terms of R1, it benefited immensely from having a robust base model (v3). This is partially because of the Reinforcement Learning (RL). There were two focuses in RL: formatting (to ensure it provides a coherent output) and helpfulness and harmlessness (to ensure the model is useful). Reasoning capabilities emerged during the fine-tuning of the model on a synthetic dataset. This, as mentioned in our scaling laws article, is what happened with o1. Note that in the R1 paper no compute is mentioned, and this is because mentioning how much compute was used would show that they have more GPUs than their narrative suggests. RL at this scale requires a considerable amount of compute, especially to generate synthetic data.
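To illustrate the flavour of the rule-based reward signals described above (formatting plus usefulness), here is a toy reward function; the tags, weights, and checks are hypothetical and not taken from the R1 paper.

    import re

    def toy_reward(response: str, reference_answer: str) -> float:
        """Toy illustration of rule-based reward shaping for reasoning RL.

        Combines a formatting check (the answer must appear in the expected
        tags so the output is coherent and parseable) with a simple usefulness
        check (the final answer matches a reference). Tags and weights are
        hypothetical, not from the R1 paper.
        """
        reward = 0.0
        match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
        if match:                                   # formatting reward
            reward += 0.2
            if match.group(1).strip() == reference_answer.strip():
                reward += 1.0                       # correctness / usefulness reward
        return reward

    print(toy_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 1.2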

Additionally, a portion of the data DeepSeek used seems to be data from OpenAI's models, and we believe that will have ramifications for policy on distilling from model outputs. This is already against the terms of service, but going forward a new trend might be a form of KYC (Know Your Customer) to stop distillation.

And speaking of distillation, perhaps the most interesting part of the R1 paper was being able to turn non-reasoning smaller models into reasoning ones by fine-tuning them on outputs from a reasoning model. The dataset curation contained a total of 800k samples, and now anyone can use R1's CoT outputs to make a dataset of their own and build reasoning models with the help of those outputs. We may see more small models showcase reasoning capabilities, bolstering the performance of small models.
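A minimal sketch of that distillation recipe might look like the following; the function names and file format are hypothetical, and generate_with_reasoning_model stands in for whatever client you use to call a reasoning model.

    import json

    def build_distillation_dataset(prompts, generate_with_reasoning_model, out_path):
        """Sketch of the distillation recipe described above (names hypothetical).

        For each prompt, collect the reasoning model's chain-of-thought plus
        final answer and store it as a plain supervised fine-tuning example.
        A smaller, non-reasoning model fine-tuned on such pairs can pick up
        reasoning behaviour.
        """
        with open(out_path, "w", encoding="utf-8") as f:
            for prompt in prompts:
                completion = generate_with_reasoning_model(prompt)  # CoT + answer text
                f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

    # Usage sketch: plug in any client for a reasoning model's API.
    # build_distillation_dataset(my_prompts, call_r1, "reasoning_sft.jsonl")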

Multi-head Latent Attention (MLA)

MLA is a key innovation responsible for a significant reduction in the inference price for DeepSeek. The reason is MLA reduces the amount of KV Cache required per query by about 93.3% versus standard attention. KV Cache is a memory mechanism in transformer models that stores data representing the context of the conversation, reducing unnecessary computation.
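A back-of-envelope sketch shows why shrinking the per-token cache matters. The dimensions below are illustrative only, not DeepSeek's actual configuration, so the printed reduction will not match the 93.3% figure exactly; the real number depends on the model's layer count, head layout, and latent size.

    def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
        # Standard attention caches one key and one value vector per head per layer.
        return 2 * layers * kv_heads * head_dim * bytes_per_value

    def mla_cache_bytes_per_token(layers, latent_dim, bytes_per_value=2):
        # MLA-style caching stores a single compressed latent per layer instead
        # of full per-head keys and values (latent size here is illustrative).
        return layers * latent_dim * bytes_per_value

    # Illustrative model shape - NOT DeepSeek's actual configuration.
    standard = kv_cache_bytes_per_token(layers=60, kv_heads=128, head_dim=128)
    latent = mla_cache_bytes_per_token(layers=60, latent_dim=576)
    print(f"standard attention: {standard / 1e6:.2f} MB per token of context")
    print(f"latent-compressed:  {latent / 1e6:.2f} MB per token of context")
    print(f"reduction: {100 * (1 - latent / standard):.1f}%")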

As discussed in our scaling laws article, KV Cache grows as the context of a conversation grows, and creates considerable memory constraints. Drastically decreasing the amount of KV Cache required per query decreases the amount of hardware needed per query, which decreases the cost. However, we think DeepSeek is providing inference at cost to gain market share, not actually making any money. Google Gemini Flash 2.0 Thinking remains cheaper, and Google is unlikely to be offering it at cost. MLA, released in DeepSeek V2 in May 2024, specifically caught the eye of many leading US labs.

DeepSeek has also enjoyed more inference efficiencies with the H20, due to its higher memory capacity and bandwidth compared to the H100. They have also announced partnerships with Huawei, but very little has been done so far with Ascend compute.

We believe the most interesting implications are specifically for margins, and what that means for the entire ecosystem. Below we have a view of the future pricing structure of the entire AI industry, and we detail why we think DeepSeek is subsidizing price, as well as why we see early signs that the Jevons paradox is carrying the day. We comment on the implications for export controls, how the CCP might react to added DeepSeek dominance, and more.


Super Colour Palette

Mike's Notes

Dave Gray from Explorations and Explanations mentioned this great tool in his recent Substack. It allows you to choose a colour palette that can be exported.

Resources

Super Colour Palette


The Four Team Types from Team Topologies

Mike's Notes

More on how to organise IT developers into teams.

Resources

The Four Team Types from Team Topologies

By: IT Revolution
IT Revolution: February 23, 2023

In many organizations, especially at the enterprise level, there are many types of teams, even teams who take on multiple roles. As the organization continues to grow and these team types continue to sprawl, it becomes increasingly hard to visualize the full organizational landscape, and, consequently, to get things done.

One of the most important things to remember is that we can be intentional about how we build and organize teams. We can design our teams and our organization for the results we want (Conway's Law).

So what does this intentional organization look like? The idea of organizing a company of hundreds, thousands, or even tens of thousands of people is daunting.

But, according to the research of Matthew Skelton and Manuel Pais, there are really only four fundamental team types.

What are the Four Team Types?

When used well, these four team topologies are the only team types you need to build and run modern software systems.

  • Stream-Aligned Team
  • Enabling Team
  • Complicated-Subsystem Team
  • Platform Team

As Matthew and Manuel say in their bestselling book Team Topologies: Organizing Business and Technology Teams for Fast Flow,

"When these four team types are combined with effective software boundaries and team interactions, the restriction of these four team types acts as a powerful template for effective organization design." - Team Topologies: Organizing Business and Technology Teams for Fast Flow

Let’s dive into what each of these teams is in more detail.

Stream-Aligned Teams

A “stream” is the continuous flow of work aligned to a single business domain or org capability. A stream-aligned team is aligned to a single, valuable stream of work like a single product or service, a set of features, or even a user journey or user persona. This team is empowered to build and deliver value quickly, safely, and (this is key) independently. There shouldn’t be any hand-offs to other teams to complete parts of the work. They “own” it from beginning to end.

Stream-aligned teams should be the primary team type in an organization. All other team types exist to reduce the burden on stream-aligned teams.

Stream-aligned teams should be close to the customer so they can quickly incorporate feedback and monitor their software in production. This allows the stream-aligned team to react in near real-time and adapt to changes as needed. They are quick, agile, dedicated.

Enabling Teams

An enabling team bridges the capability gap. In Accelerate, we learn that high-performing teams are continuously improving their capabilities to stay ahead of the curve. This is difficult when you have end-to-end ownership of a value stream (as in stream-aligned teams). With all that work, it can be hard to find the time for research, learning, and practising new skills.

Enabling teams are composed of specialists in a given domain of knowledge, which might be more technical, or more product-focused, or any other domain where there is a gap in skills in (part of) the organization. They cross-cut to stream-aligned teams and have the bandwidth for research, experimentation, etc. They bring this knowledge and expertise back to the stream-aligned team.

A successful enabling team should be strongly collaborative in nature. They must work to understand the problems faced by the stream-aligned team and then provide proper guidance. Enabling teams must avoid becoming an “ivory tower” of knowledge. They do not exist to dictate technical choices. Instead, an enabling team helps stream-aligned teams understand and comply with organization-wide constraints. They are the “servant leaders” of the team types.

Complicated-Subsystem Teams

The complicated-subsystem team is responsible for building and maintaining a system that requires heavy specialist knowledge. Each member on the team should be a specialist in that area of knowledge and be able to make changes to the subsystem.

The goal of the complicated-subsystem team is to reduce the cognitive load of stream-aligned teams working on the system. This is more cost-effective than embedding a specialist onto every stream-aligned team, and avoids distracting the stream-aligned team from their main goal of delivering value.

Platform Teams

A platform team enables a stream-aligned team to deliver work with substantial autonomy by providing internal services to reduce their cognitive load. A digital platform, as defined by Evan Bottcher, is

". . . a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination." - https://martinfowler.com/articles/talk-about-platforms.html

The platform team’s knowledge is made available via self-service capabilities on the web or via a programmable API. These should be made easy for the stream-aligned teams to consume, instead of lengthy instruction manuals.

Ease of use is fundamental to successful product teams.

The Benefit of Restricting Team Types

Organizations that are struggling with rapid, sustainable software delivery typically have a wide and ever-expanding group of teams and team types. Usually these teams have poorly defined roles and responsibilities. By restricting teams to just the four fundamental types explored in the post and expanded upon in the book Team Topologies, the organization can focus their time on team interaction patterns that are known to promote flow and deliver value faster.

We’ll explore the three essential team interaction modes in an upcoming post.

Building High-Performing Teams with Team Topologies

Mike's Notes

This thoughtful article about team structure is from Leah Brown, the Managing Editor at IT Revolution. Pipi 9 is specifically built for high-performance DevOps teams building big enterprise SaaS systems.

The book Team Topologies has a reader's guide.

Resources

Building High-Performing Teams with Team Topologies

By: Leah Brown
IT Revolution: January 13, 2025

To achieve success and rise above competitors in 2025, organizations must focus on building high-performance teams. One strategy to do this is to empower small, long-lived teams as the fundamental building blocks of your organization’s design. 

As we looked at in the previous blog in our series on high-performing teams, building high-performance teams has less to do with where your teams sit and more to do with how you build the social circuitry between and within teams. The Team Topologies approach provides a proven framework for designing and evolving these teams to maximize flow and adaptability.

Limit Team Cognitive Load

A core tenet of Team Topologies is that teams should minimize cognitive load to increase flow. This can be achieved by ensuring a team is only responsible for a limited number of domains, projects, software subsystems, products, etc. that match their cognitive capacity. By restricting teams to a maximum of 2-3 “simple” domains or 1 “complicated” domain, organizations can ensure teams have the focus and autonomy to truly master their areas of responsibility. 

This team-first approach to organizational boundaries stands in contrast to the traditional practice of aligning teams to organizational silos or technical specialties. Instead, work is divided into pieces that fit the team’s cognitive load, creating a natural correspondence between the team structure and the larger system or organization architecture.

Establish Clear Team Interactions

Beyond structuring teams, Team Topologies also emphasizes defining well-bounded interactions between teams. Clear and effective communication has long been a differentiator in high-performing organizations, but it can be challenging to achieve in sprawling enterprises. 

One way to build effective communications is to establish "team APIs" that define clearly and visibly how other teams can interact with a given team's code, documentation, and working practices. This also means consciously designing the physical and virtual spaces that enable appropriate levels of collaboration, from high bandwidth within teams to low bandwidth between most teams.

By creating these structured team interactions, organizations can reduce cognitive load, promote autonomy, and ensure smooth handoffs between dependent teams. This paves the way for sustainable, high-velocity flow of value to the customer.

Evolve Team Topologies over Time

Of course, projects and organizational contexts are constantly shifting. Perhaps at no time has that been more true than now. How many times in the past week have you seen or read the phrase “in these rapidly changing times”? The Team Topologies approach embraces this by providing guidance on evolving team structures and interactions in response to changing requirements. This is also a key differentiator of the highest-performing teams and organizations: the ability to adapt to changing situations quickly and confidently.

This can be achieved by splitting a team responsible for too many domains or projects, merging teams with overlapping responsibilities, or introducing new team types like “enabling” or “platform” teams to support the core “stream-aligned” delivery teams.

The key is maintaining a dynamic, adaptive organizational design that keeps pace with the business. Just a year ago, implementing AI in daily operations was still a rarity. But today, organizations are adopting AI into their work at a dizzying pace. Teams and organizations that can adjust and adapt to this new context with skill and confidence will surely outperform those that are built more rigidly.

The Human Element

Ultimately, the Team Topologies approach recognizes that knowledge work, like software delivery, is a deeply human endeavor. By optimizing team size, boundaries, and interactions, organizations can create the conditions for small, autonomous groups to thrive and deliver exceptional results, even in the face of exceptional change. This team-centric focus is the foundation for sustainable, high-performing teams in 2025 and beyond.

Kubernetes Cloud Repatriation Saves Millions for Data Platform Provider

Mike's Notes

This is a good example of the savings possible from using a private cloud. To control costs, the environment, and security, I am considering putting the core of Pipi9 on a private cloud.

Resources

Kubernetes Cloud Repatriation Saves Millions for Data Platform Provider

By: Matt Saunders
InfoQ: January 29, 2025

Yellowbrick, an SQL data platform provider, has significantly reduced costs by moving workloads from the public cloud to its own private Kubernetes-based infrastructure. It has reported an annual saving of $3.9 million by moving its development and testing environments away from AWS, Azure, and Google Cloud Platform.

According to Neil Carson of Yellowbrick, the company had been spending about $6 million per year across the three major cloud providers when the repatriation project began in 2022. Yellowbrick built its new private cloud using hardware that had previously been used for another purpose within the company. The company traditionally sells appliances to run its database product on, and it realised it could reuse these appliances when its customers upgraded and returned the older servers.

'"We thought elasticity in the cloud had to be cheaper than building appliances, but we found out the hard way, it wasn't cheaper. It was much more expensive" - Neil Carson, Yellowbrick

The private cloud solution, named EC3 (Emerald City), uses two types of racks: compute racks and object storage racks. The system employs MinIO for object storage and LINSTOR for persistent block storage, running on a complex networking setup that utilises InfiniBand networking.

The current EC3 deployment consists of over 200 servers returned by their customers, providing over 8,000 vCPUs and about 2 petabytes of object storage. The primary ongoing cost is $50,000 per month in colocation facility fees in Utah. In comparison, Carson estimates that equivalent capacity on AWS would cost them around $375,000 per month.
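A quick arithmetic check shows the monthly figures line up with the headline saving:

    # Monthly figures reported above.
    aws_equivalent_per_month = 375_000   # Carson's estimate for equivalent AWS capacity
    colocation_per_month = 50_000        # primary ongoing cost of the private cloud
    annual_saving = (aws_equivalent_per_month - colocation_per_month) * 12
    print(f"${annual_saving:,} per year")  # $3,900,000 - consistent with the $3.9M figure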

Yellowbrick's cloud spend per month

Image courtesy of Neil Carson, Yellowbrick

The transition wasn't without its challenges. The initial implementation needed dedicated focus, with the equivalent of two full-time engineers working on it for six months. However, now that the cloud is live, ongoing administration needs only a couple of developers spending a few hours weekly on maintenance. The company reports that regular hardware failures occur approximately every couple of weeks, leading to plans for implementing automated problem detection systems.

While Yellowbrick's situation is unusual due to the availability of returned paid-off hardware at effectively zero capital cost, Carson suggests that companies starting from scratch could still see substantial savings. He estimates that the initial capital expenditure for new equipment would be approximately $1.65 million, including $1.3 million for compute, $80,000 for switching and cables, and $270,000 for SSD storage.

The success of EC3 has influenced Yellowbrick's product development. Their third-generation appliances, codenamed Griffin, will be powered by RedHat OpenShift and incorporate lessons learned from building the private cloud infrastructure. The company's experience suggests that Kubernetes has become a game-changer in the infrastructure space. As Carson explains, "All the elasticity, scale up/down, and flexibility that used to require the public cloud can now be done on equipment we own, too."

Carson acknowledges that advocating for on-premises solutions remains controversial in the tech industry. "When I've heard others trying to share these viewpoints, they have been accused of lying, not calling out hidden costs, being backward, etc.," he writes. However, he maintains that for compute-intensive workloads, the financial benefits of repatriating are clear.

The repatriation initiative came to the world's attention with the well-publicised move of David Heinemeier Hansson's HEY and Basecamp infrastructure out of the cloud in 2022, demonstrating that consistent and non-spiky CPU-intensive workloads can often be run at a significant discount on the customer's own equipment.

A recent blog post from Puppet details why some organisations choose to move some workloads away from the public cloud. Cost is still a primary driver, with organisations seeing increased costs across computing, storage, and data transfer. Security and compliance are also big factors, with some choosing to repatriate workloads to avoid having to answer complex questions about data storage locations, access controls, and regulatory compliance across multiple cloud environments.

Why Cloud Repatriation is Trending

Puppet also explains how performance limitations have prompted some companies to look away from public clouds, especially for workloads needing low latency. Also, some organisations are concerned about the risk of vendor lock-in and don't want to depend on specific providers for their software, platform, or infrastructure needs.

The Puppet article also raises technical challenges unique to the public cloud - such as accidental misconfigurations potentially having drastic consequences (citing a 2022 incident where a single AWS S3 bucket misconfiguration exposed millions of sensitive files).

Despite the significant advantage of having a pool of returned servers to repurpose, the success of Yellowbrick's private cloud implementation adds to a growing body of evidence that some companies, particularly those with predictable compute-intensive workloads, may benefit from evaluating alternatives to public cloud services.

Creating Accessible Websites Using the Web Content Accessibility Guidelines

Mike's Notes

A great introduction to WCAG.

Resources

Creating Accessible Websites Using the Web Content Accessibility Guidelines

By: Ben Linders
InfoQ: January 30, 2025

Web accessibility is about making web content available to users with disabilities. Development teams can use the success criteria of the Web Content Accessibility Guidelines to improve accessibility and create an inclusive website.

Joanna Falkowska gave a talk about creating accessible websites at DEV: Challenge Accepted.

There are different kinds of disabilities, Falkowska said. Think of users with sensory limitations (e.g. visual, hearing), physical ones (e.g. missing limbs), neurological diseases (e.g. Parkinson’s disease), or cognitive disability (e.g. Down syndrome).

The WHO estimates that about 16% of the world’s population is affected by significant disability:

"If we add all of the mild ones, as well as groups that usually do not consider themselves disabled but might be facing similar problems, such as the elderly, the number becomes even higher."

Falkowska mentioned that development teams who do not know where to start their accessibility journey may feel overwhelmed by the number of disabilities they should take into consideration. They may have no idea about the limitations different users with disabilities face when opening their web app:

"What they need is a benchmark that gives them clear criteria as to what makes an app accessible."

The Web Content Accessibility Guidelines (WCAG) is a document that serves people who create web apps. It has been drafted to make the web accessible and it gets updated every now and then to keep up with the development of the technology people use, Falkowska said. Just as the offline world can be more or less friendly for people with disabilities, the same way the websites that we browse can be more or less accessible, she added.

The WCAG gives you a set of success criteria that have assigned conformance levels, as Falkowska explained:

The lowest and the most basic one is A. The Web Accessibility Initiative (WAI), who is responsible for drafting and updating WCAG, states that any website we browse should address at least this level of conformance.

AA is a bit more complex, but at the same time it is the level that the authors (and many legal acts) recommend to follow, Falkowska said. The third and generally the most difficult to achieve is AAA. Usually the entities that address triple A will either be governmental/federal offices or companies/associations that serve specific groups of people with disabilities, Falkowska mentioned.

The levels are a bit like the wooden nesting doll "matryoshka", Falkowska said; if you want to address AAA with your website, you also need to cover AA as well as A’s success criteria.

Falkowska mentioned that one of the most commonly missing success criteria tells you to add alternative text to any content that makes sense only if you can perceive it with your eyes. For example, if your website contains an image, a person who cannot see and uses a screen reader will only hear the name of the file, which may just be a meaningless string of numbers and characters. In order to make sure they can perceive this content in a meaningful way, developers add an "alt" attribute in html, Falkowska said. The content of this attribute will not be visible on the website, but the screen reader will read it to a person using assistive technology.

InfoQ interviewed Joanna Falkowska about the success criteria for an accessible website.

InfoQ: Can you give some examples of success criteria?

"Joanna Falkowska: An example would be keyboard focus. It should be possible for the user to navigate all of the interactive elements with a keyboard, which is achieved with the "tabindex" attribute in html. It is beneficial not only for users with visual disabilities, but also to those who cannot use a mouse due to hand tremors, e.g. because of Parkinson’s disease. It is also a very good example of how a success criterion may support users without disability. I cannot count the number of situations when my mouse was discharged and I needed to navigate the web with a keyboard only while waiting for the mouse battery to fill up...

Another example is that of screen orientation (which is mainly important for tablets and smartphones). It should not be limited to landscape or portrait-only mode. Some users may use a mobile device in one orientation only. Think of the users with quadriplegia who have their phone attached to a special handle or a tripod. They cannot move their phone around. We should not lock their display to one specific orientation with the CSS rotate and/or transform property.

One of the newest success criteria discusses authentication issues the users may have. Authentication should not strain short-term memory with puzzles, so we need to make sure that our login feature allows for copy-paste and/or the use of a password manager. Make sure your input fields provide proper "label", "type" and "autocomplete" html attributes. Apart from that, we should no longer require the users to solve the CAPTCHA in which the content needs to be deciphered and then typed into an input field. Object recognition is still allowed at AA level, but if you wish to succeed with AAA, even this type of CAPTCHA should be removed."

How to set Pipi configuration

Mike's Notes

Here are my notes from today's work on the Website Engine (wbs).

Resources

How to set Pipi configuration

By: Mike Peters
01/02/2025

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

I'm manually migrating my first customer's website to Pipi's hosting. This is an excellent product test of the Website Engine (wbs), which sets up website hosting.

The next problem is enabling simple bullet-proof configurations for future customers to set up websites using no-code.

In the future, each of the hundreds of modules will have many configuration options, so a standardised system-wide process is needed.

This morning, I sketched some ideas on how the Admin User Interface (UI) might look.

This afternoon, I figured out that the Namespace Engine (nsp) already has a way to register the interfaces of every engine that is automatically built by the Factory Engine (fac).

I added another Interface Class Type, "Config", and solved the problem.

This allowed the fast addition of these examples of global properties that a website might have as options for the admin.

  • Meta Title
  • Meta Keywords
  • Meta Description
  • Domain Name
  • Default Language, e.g. eng
  • Plugins
  • Default Theme
  • Use References
  • Use See Also
  • Use Keywords
  • etc

Doing this also automatically generates config variable names used in internal messaging.

This would also enable configuration storage in XML or other open formats for interchange purposes and documentation.
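As a rough illustration of what a registration-driven configuration system can look like, here is a short Python sketch; every name in it is hypothetical (Pipi's engines and internal APIs are not public), and it only shows how one registered definition could yield both internal variable names and an XML export.

    import re
    import xml.etree.ElementTree as ET

    # Hypothetical sketch only - none of these names come from Pipi itself.

    def to_variable_name(engine: str, label: str) -> str:
        # "Meta Title" registered under the "wbs" engine -> "wbs.meta_title"
        return f"{engine}.{re.sub(r'[^a-z0-9]+', '_', label.lower()).strip('_')}"

    def register_config(engine: str, labels: list) -> dict:
        # One registration produces the variable names used in internal messaging.
        return {label: to_variable_name(engine, label) for label in labels}

    def export_xml(engine: str, values: dict) -> str:
        # The same registration can drive an open-format export for interchange.
        root = ET.Element("config", attrib={"engine": engine})
        for label, value in values.items():
            option = ET.SubElement(root, "option", attrib={"name": to_variable_name(engine, label)})
            option.text = value
        return ET.tostring(root, encoding="unicode")

    options = register_config("wbs", ["Meta Title", "Domain Name", "Default Language"])
    print(options)
    print(export_xml("wbs", {"Domain Name": "example.com", "Default Language": "eng"}))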

This configuration system will also work for all the other engines.

If you want to become good at system design, then learn these case studies

Mike's Notes

I get Neo Kim's regular system design newsletter. It explains how large, popular systems scale in production by examining their design. Many of the solutions are elegant. This is an excellent newsletter.

Below is a recent note from Neo with links to popular articles. I reformatted the links so they look better on my blog.

The article on URL shortening is my favourite.

Resources

If you want to become good at system design, then learn these case studies

By: Neo Kim
The System Design Newsletter: Jan 24 2025

What would you add to this list?

PS - Join 100,000 and get the powerful system design template (it's free): newsletter.systemdesign…

If you liked this note, restack & help others find it

Building High-Performance Teams in 2025: Beyond the Location Debate

Mike's Notes

A thoughtful article about in-office versus remote from Leah Brown, Managing Editor at IT Revolution. Pipi 9 has been built specifically for high-performance DevOps teams.

Resources

Building High-Performance Teams in 2025: Beyond the Location Debate

By: Leah Brown
IT Revolution: January 6, 2025

The debate over in-office versus remote work misses a fundamental truth: high-performing teams succeed based on how they’re organized, not where they sit. Through extensive research across industries, Gene Kim and Dr. Steven J. Spear found that three key mechanisms consistently enable team excellence: slowing down to speed up, breaking down complexity, and amplifying problems early.

As they explain in their award-winning book Wiring the Winning Organization, the leaders of the highest-performing teams will use these three simple mechanisms to “wire their organization for success instead of mediocrity.” 

1. Slow Down to Speed Up

High-performing teams in 2025 must prioritize solving problems in controlled environments before they appear in production, what Kim and Spear term “slowification.” High-performing teams should look to:

  • Move complex problem-solving offline instead of firefighting during execution.
  • Create dedicated spaces for experimentation and learning.
  • Build standard approaches based on validated solutions.
  • Test new processes in low-stakes environments.

Toyota exemplifies this approach using careful preparation and practice to achieve industry-leading performance. Known as the Toyota Production System, this method of slowing down to solve problems has long been proven to help the highest-performing teams succeed. And it will continue to be a differentiator for high-performing teams in 2025 and beyond.

2. Break Down Complexity 

High-performing organizations like Amazon have transformed their performance by making complex work manageable through what Kim and Spear term “simplification.”

Simplification is the process of making complex work more manageable by:

  • Creating small, self-contained teams that own complete workflows.
  • Defining clear handoffs between specialized functions.
  • Implementing changes incrementally rather than all at once.
  • Designing linear processes with obvious next steps.

Amazon has used these principles to evolve from making only twenty software deployments per year to over 136,000 daily deployments. They achieved this by breaking down monolithic systems into smaller, independent services with clear interfaces.

3. Amplify Problems Early

Drawing from their research of high-performing organizations in manufacturing, healthcare, and technology, Kim and Spear found that great organizations create mechanisms to detect and respond to small issues before they become major disruptions. This “amplification,” as they call it, requires teams to maintain reserve capacity to swarm problems when they occur and share solutions across teams to prevent recurrence down the road.

In other words, high-performing teams:

  • Make problems visible immediately when they occur.
  • Create rapid feedback loops between dependent teams.
  • Maintain reserve capacity to swarm and contain issues.
  • Share solutions across teams to prevent recurrence.

Leading the High-Performing Team

To create and lead your high-performing teams, Kim and Spear recommend starting with what they call a “model line”—a small segment where new approaches can be tested. Their research shows three phases of implementing a model line in any organization:

  • Start Small: Choose one critical workflow, form an initial cross-functional team, and implement basic performance metrics.
  • Expand Thoughtfully: Add supporting capabilities, establish clear team interactions, and build knowledge-sharing mechanisms.
  • Optimize Continuously: Refine team boundaries and interfaces while maintaining focus on outcomes.

The organizations that thrive in 2025 and beyond will be those that create what Kim and Spear call effective “social circuitry”—the processes and norms that enable great collaboration. When teams have well-defined boundaries, clear visibility into work, and mechanisms to coordinate when needed, location becomes irrelevant.

The future belongs to organizations that focus on creating the right conditions for teams to excel, whether in a physical, remote, or hybrid environment. By implementing the three key mechanisms of great social circuitry, leaders can build high-performing teams that consistently deliver exceptional results, regardless of where they sit. 

The evidence presented in Wiring the Winning Organization makes this clear: excellence comes from organizational design, not office design.

Curiosity, Open Source, and Timing: The Formula Behind DeepSeek’s Phenomenal Success

Mike's Notes

This week's Turing Post had an excellent summary of DeepSeek and some valuable links.

The original post is on Turing Post, and a longer version is on HuggingFace. The missing links on this page can be found in the original post.

Turing Post is worth subscribing to.

LM Studio is free for personal use and can run DeepSeek and other LLMs. It runs on Mac, Windows, and Linux. Windows requires 16GB of RAM.

Resources

Curiosity, Open Source, and Timing: The Formula Behind DeepSeek’s Phenomenal Success

By: Ksenia Se
Turing Post: #85 January 27, 2025

How an open-source mindset, relentless curiosity, and strategic calculation are rewriting the rules in AI and challenging Western companies, plus an excellent reading list and curated research collection

When we first covered DeepSeek models in August 2024 (we are opening that article for everyone, do read it), it didn’t gain much traction. That surprised me! Back then, DeepSeek was already one of the most exciting examples of curiosity-driven research in AI, committed to open-sourcing its discoveries. They also employed an intriguing approach: unlike many others racing to beat benchmarks, DeepSeek pivoted to addressing specific challenges, fostering innovation that extended beyond conventional metrics. Even then, they demonstrated significant cost reductions.

“What’s behind DeepSeek-Coder-V2 that makes it so special it outperforms GPT-4 Turbo, Claude-3 Opus, Gemini 1.5 Pro, Llama 3-70B, and Codestral in coding and math?

DeepSeek-Coder-V2, costing 20–50x less than other models, represents a major upgrade over the original DeepSeek-Coder. It features more extensive training data, larger and more efficient models, improved context handling, and advanced techniques like Fill-In-The-Middle and Reinforcement Learning.” (Inside DeepSeek Models)

Although DeepSeek was making waves in the research community, it remained largely unnoticed by the broader public. But then they released R1-Zero and R1.

With that release they crushed industry benchmarks and disrupted the market by training their models at a fraction of the typical cost. But do you know what else they did? Not only did they prove that reinforcement learning (RL) is all you need in reasoning (R1 stands as solid proof of how well RL works), but they also embraced a trial-and-error approach – fundamental to RL – for their own business strategies. Previously overlooked, they calculated this release of R1 meticulously. Did you catch the timing? It was a strategic earthquake that shook the market and left everyone reeling:

  1. As ChinaTalk noticed: “R1's release during President Trump’s inauguration last week was clearly intended to rattle public confidence in the United States’ AI leadership at a pivotal moment in US policy, mirroring Huawei's product launch during former Secretary Raimondo's China visit. After all, the benchmark results of an R1 preview had already been public since November.”
  2. The release happened just one week before the Chinese Lunar New Year (this year on January 29), which typically lasts 15 days. However, the week leading up to the holiday is often quiet, giving them a perfect window to outshine other Chinese companies and maximize their PR impact.

So, while the DeepSeek family of models serves as a case study in the power of open-source development paired with relentless curiosity (from an interview with Liang Wenfeng, DeepSeek’s CEO: “Many might think there's an undisclosed business logic behind this, but in reality, it's primarily driven by curiosity.”), it’s also an example of cold-blooded calculation and triumph of reinforcement learning applied to both models and humans :). DeepSeek has shown a deep understanding of how to play Western games and excel at them. Of course, today’s market downturn, though concerning to many, will likely recover soon. However, if DeepSeek can achieve such outstanding results, Western companies need to reassess their strategies quickly and clarify their actual competitive moats.

Worries about NVIDIA

Of course, we’ll still need a lot of compute – everyone is hungry for it. That’s a quote from Liang Wenfeng, DeepSeek’s CEO: “For researchers, the thirst for computational power is insatiable. After conducting small-scale experiments, there's always a desire to conduct larger ones. Since then, we've consciously deployed as much computational power as possible.”

So, let's not count NVIDIA out. What we can count on is Jensen Huang's knack for staying ahead and finding ways to stay relevant (NVIDIA wasn't started as an AI company, if you remember). But the rise of innovators like DeepSeek could push NVIDIA to double down on openness. Beyond the technical benefits, an aggressive push toward open-sourcing could serve as a powerful PR boost, reinforcing NVIDIA's centrality in the ever-expanding AI ecosystem.

As I was writing these words about NVIDIA, they sent a statement regarding DeepSeek: “DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”

So – to wrap up – the main takeaway from DeepSeek breakthrough is that:

  • open-source and decentralize
  • stay curiosity-driven
  • apply reinforcement learning to everything

For DeepSeek, this is just the beginning. As curiosity continues to drive its efforts, it has proven that breakthroughs come not from hoarding innovation but from sharing it. As we move forward, it’s these principles that will shape the future of AI.

We are reading (it’s all about 🐳)

Here is a collection of superb articles covering everything you need to know about DeepSeek:

Curated Collections

7 Open-source Methods to Improve Video Generation and Understanding

Weekly recommendation from an AI practitioner 👍🏼

To run DeepSeek models offline using LM Studio:

  • Install LM Studio: Download the appropriate version for your operating system from the LM Studio website. Follow the installation instructions provided.
  • Download the DeepSeek Model: Open LM Studio and navigate to the "Discover" tab. Search for "DeepSeek" and select your desired model. Click "Download" to save the model locally.
  • Run the Model Offline: Once downloaded, go to the "Local Models" section. Select the DeepSeek model and click "Load." You can interact with the model directly within LM Studio without an internet connection.

News from The Usual Suspects ©

  • Data Center News
    $500B Stargate AI Venture by OpenAI, Oracle, and SoftBank
    With plans to build massive data centers and energy facilities in Texas, Stargate aims to bolster U.S. AI dominance. Partners like NVIDIA and Microsoft bring muscle to this high-stakes competition with China. Trump supports it, Musk trashes it.

Meta's Manhattan-Sized AI Leap

  • Mark Zuckerberg’s AI ambitions come on a smaller scale (haha) – $65 billion for a data center so vast it could envelop Manhattan. With 1.3 million GPUs powering this, Meta aims to revolutionize its ecosystem and rival America’s AI heavyweights. The era of AI megaprojects is here.
  • Mistral’s IPO Plans: Vive la Résistance – French AI startup Mistral isn’t selling out. With €1 billion raised, CEO Arthur Mensch eyes an IPO while doubling down on open-source LLMs. Positioned as a European powerhouse, Mistral’s independence signals Europe’s readiness to play hardball in the global AI race.
  • SmolVLM: Hugging Face Goes Tiny – Hugging Face introduces SmolVLM, two of the smallest foundation models yet. This open-source release proves size doesn’t matter when efficiency leads the charge, setting new standards for compact AI development.
  • OpenAI's Agent Takes the Wheel – CUA (Computer-Using Agent) redefines multitasking with Operator, seamlessly interacting with GUIs like a digital power user. From downloading PDFs to complex web tasks, it’s the closest we’ve come to a universal assistant. CUA is now in Operator's research preview for Pro users. Blog. System Card.
  • Google DeepMind: A Year in Gemini’s Orbit – They just published an overview of 2024. From Gemini 2.0's breakthroughs in multimodal AI to Willow chip’s quantum strides, innovation soared. Med-Gemini aced medical exams, AlphaFold 3 advanced molecular science, and ALOHA redefined robotics. With disaster readiness, educational tools, and responsible AI initiatives, DeepMind balanced cutting-edge tech with global impact. A Nobel-worthy streak indeed.
  • Cost-Cutting AI with "Light Chips" – Demis Hassabis unveils Google's next move: custom "light chips" designed to slash AI model costs while boosting efficiency. These chips power Gemini 2.0 Flash, with multimodal AI, 1M-token memory, and a "world model" vision for AGI. DeepMind’s edge? Owning every layer of the AI stack, from chips to algorithms.

Top models to pay attention to

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Enhance reasoning in LLMs with multi-stage reinforcement learning, outperforming competitors in benchmarks like AIME 2024 and MATH-500.
  • Kimi K1.5: Scaling Reinforcement Learning with LLMs Scale reasoning capabilities with efficient reinforcement learning methods, optimizing token usage for both long- and short-chain-of-thought tasks.
  • VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Advance image and video understanding with multimodal integration, achieving top results in temporal reasoning and long-video tasks.
  • Qwen2.5-1M Series Support 1M-token contexts with open-source models, leveraging sparse attention and lightning-fast inference frameworks for long-context tasks.

The freshest research papers, categorized for your convenience

There were quite a few TOP research papers this week, we will mark them with 🌟 in each section.

Specialized Architectures and Techniques

  • 🌟 Demons in the Detail: Introduces load-balancing loss for training Mixture-of-Experts models.
  • 🌟 Autonomy-of-Experts Models: Proposes expert self-selection to improve Mixture-of-Experts efficiency and scalability.
  • O1-Pruner: Length-Harmonizing Fine-Tuning: Reduces inference overhead in reasoning models through reinforcement learning-based pruning.

Language Model Reasoning and Decision-Making

  • 🌟 Evolving Deeper LLM Thinking: Explores genetic search methods to enhance natural language inference for planning tasks, achieving superior accuracy.
  • 🌟 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training: Develops a framework for LLMs to self-correct using Monte Carlo Tree Search and iterative refinement.
  • 🌟 Reasoning Language Models: A Blueprint: Proposes a modular framework integrating reasoning methods to democratize reasoning capabilities.
  • Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback: Enhances mathematical reasoning with stepwise binary feedback for more accurate LLM outputs.
  • Test-Time Preference Optimization: Introduces a framework for aligning LLM outputs to human preferences during inference without retraining.

Multi-Agent Systems and Coordination

  • SRMT: Shared Memory for Multi-Agent Lifelong Pathfinding: Demonstrates shared memory use for enhanced coordination in multi-agent systems.
  • Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks: Develops a hierarchical agent framework for mobile assistants with self-evolution capabilities.

Generative and Retrieval-Augmented Models

  • Chain-of-Retrieval Augmented Generation: Presents a stepwise query and reasoning framework for retrieval-augmented generation.
  • Can We Generate Images with CoT?: Integrates Chain-of-Thought reasoning for compositional and iterative image generation.

Multi-Modal and GUI Systems

  • UI-TARS: Pioneering Automated GUI Interaction: Advances vision-based agents for human-like GUI task performance.
  • InternLM-XComposer2.5-Reward: Improves multi-modal reward modeling for text, image, and video alignment.

Robustness, Adaptability, and Uncertainty

  • Trading Inference-Time Compute for Adversarial Robustness: Examines inference-time compute scaling to improve robustness against adversarial attacks.
  • Evolution and the Knightian Blindspot of Machine Learning: Advocates integrating evolutionary principles into machine learning for resilience to uncertainty.

Planning and Execution in AI

  • LLMs Can Plan Only If We Tell Them: Proposes structured state tracking to enhance planning capabilities in LLMs.
  • Debate Helps Weak-to-Strong Generalization: Leverages debate methods to improve model generalization and alignment.

Social and Cognitive Insights

  • Multiple Predictions of Others’ Actions in the Human Brain: Examines neural mechanisms for predicting social behaviors under ambiguity.

AI Infrastructure and Hardware

  • Good Things Come in Small Packages: Advocates Lite-GPUs for scalable and cost-effective AI infrastructure.