Firing up the data centre

Mike's Notes

Work is in progress to move Pipi 9 into the beginnings of the Pipi data centre. Once completed, work will shift to rendering the second version of the workspaces for user testing.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

13/01/2026

Firing up the data centre

By: Mike Peters
On a Sandy Beach: 13/01/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Long planned, Pipi 9's migration to its own data centre is now underway. The job has top priority and should be easily completed in a week. It needs to be completed for Pipi to render on demand.

The existing Ajabbi computer network has been split into two.

  • One is completely isolated from the internet with all wifi and Bluetooth disabled. It is largely housed in a 45U rack. This is the small beginnings of a full data centre to be built out over time.

  • The other is connected to the internet for email, Zoom/Meet/Teams calls, office work, writing documentation, graphics, film editing, etc. This is the beginning of an office network.

Data Centre

A developer's laptop is attached to the isolated rack to work with the servers, with the following minimal developer stack.

  • BoxLang (JRE 21)
  • CFML Server Dev edition
  • QGIS
  • PostgreSQL
  • DBeaver
  • Python
  • Dreamweaver
  • MS Access
  • MS Excel
  • Visual Studio Code
  • Acrobat
  • NoteTab Light

The servers have a different stack.

The data centre will often be turned off. It can be turned on so the hundreds of Pipi Agents can autonomously run batch jobs in preparation for rendering the new workspace UI. Eventually, many racks will be running 24x7x365.

Moving Pipi 9 to a data centre now opens the door to Pipi 10, which builds itself.

Office Network

Several laptops with printers, set up for video communications, and eventually office servers. Development work can also be done here, along with testing; all output is copied to and from the data centre via a clean memory stick.

Mission Control

The future Mission Control UI, connected to the data centre, could be shared via a video feed from a camera pointed at a live monitor, ensuring security against hackers. It could even be a YouTube live stream of the probes for those who don't like to sleep. 😀 Though my cat tells me it is much better to watch squirrels on YouTube.

Scrolling Alone

Mike's Notes

A growing problem in society. The misuse of cloud software is contributing to the problem because of the moral values of its owners: making money at any human cost. This excellent article is copied from After Babel. The original article is full of links.

Resources

References

  • Bowling Alone by Robert Putnam.

Repository

  • Home > Ajabbi Research > Library > Subscriptions > After Babel
  • Home > Handbook > 

Last Updated

12/02/2026

Scrolling Alone

By: Andrew Trousdale and Erik Larson
After Babel: 03/02/2026

Andrew Trousdale: I'm a researcher and designer. My initiatives and projects bridge positive psychology, human-computer interaction, and the creative arts. I run the nonprofit research initiative APOSSIBLE.

Erik J Larson: Author of The Myth of Artificial Intelligence. I write about the limits of technology and the tension between tech and human flourishing.

A brief history of the trade-off between convenience and connection in America

Intro by Zach Rausch:

The Anxious Generation is best understood as a three-act tragedy. Act I begins in the mid-20th century, when new social and entertainment technologies (e.g., air conditioning and television) set in motion a long, gradual collapse of local community. Act II begins in the 1980s, as the loss of local community weakened social trust and helped erode the play-based childhood. Act III begins in the early 2010s, with the arrival of the phone-based childhood that filled the vacuum left behind.

This post, written by Andrew Trousdale and Erik Larson, goes deep into Act I. Andrew is a psychology researcher and human-computer interaction designer who is co-running a project on the psychological tradeoffs of progress. Erik is the author of The Myth of Artificial Intelligence, writes the Substack Colligo, and is completing the MIT Press book Augmented Human Intelligence: Being Human in an Age of AI, due in 2026. Together, they show how the isolation we experience today did not begin with smartphones but began decades earlier, as Americans, often for good and understandable reasons, traded connection for convenience, and place-based relationships for privacy and control.

Tracing these trade-offs across the twentieth century, Andrew and Erik help explain the problem of loneliness we face today, and offer some guidance for how we can turn it around and reconnect with our neighbors. Robert Putnam, who read a recent draft, described it as “easily the best, most comprehensive, and most persuasive piece on the contemporary social capital conundrum I’ve yet read.”

— Zach

Scrolling Alone

By Andrew Trousdale and Erik Larson

Americans today accumulate hundreds, even thousands, of Facebook “friends” and Instagram followers. Yet 35% report having fewer than three close friends, and 17% report having none. A quarter of Americans lack social and emotional support. We’re supposedly more connected than ever, but according to the Surgeon General we are facing an epidemic of loneliness and isolation.

It’s tempting to believe that smartphones and social media were introduced to an ideal society and ruined everything. But the social problems we face today — while linked to contemporary digital technologies — are deeper and more nuanced than that. They originated from 20th century technological and cultural forces that also brought extraordinary benefits. It is only by looking back at these benefits that we can see today’s social problems clearly: as the result of trade-offs we have, for decades, been willing to make.

The post-war period in America was a time of enormous economic progress. Between 1947 and 1970, median family income doubled and home ownership soared. This expansion of the middle class brought with it a growing orientation toward mass comfort and convenience as the measure of everyday progress. The dream of labor-saving technology wasn’t new, but the postwar boom made it newly attainable for millions. Innovations like dishwashers, TVs, air conditioning, and remote controls flooded American homes. The Jetsons — with its push-button meals and moving sidewalks — captured an emerging vision for how technology would make life better.

These technologies did free up time, save money, reduce drudgery, and give us more control over our environments. But, as Robert Putnam first posited in his groundbreaking book Bowling Alone, they also disentangled us from one another — eliminating norms and shared experiences that, however effortful, also provided connection. As we grew accustomed to privacy, efficiency, and ease, maintaining our social lives and communities increasingly became a hassle. Independence replaced interdependence. After more than 70 years of making this trade-off, this is the culture we inherited and participate in daily.

The Convenience vs Connection Trade-off

In 1997, John Lambert received a kidney from Andy Boschma, a fellow bowler from his Tuesday night league in Kalamazoo, Michigan. They weren’t relatives. They weren’t even close friends. They just bowled together once a week and that was enough. Putnam opens Bowling Alone with this story because it captures what we’ve been losing: the kind of trust where casual friends would give you a kidney.

Stories like Lambert and Boschma’s emerged from a world of regular, low-stakes, in-person interaction. In 1964, 55% of Americans believed “most people can be trusted.” As Putnam recounts, the average adult belonged to about two organizations. Family dinners were nightly rituals for half of Americans. Dropping by a neighbor’s house unannounced was normal. This was, by Putnam’s measures, the high-water mark of American civic life.

By 2000, when Putnam published Bowling Alone, that world was already disappearing. Trust had fallen to around 30%. Organizational membership fell sharply. He shows that by the 1990s, Americans were joining organizations at just one-quarter the rate they had in the 1960s, and community meeting attendance had dropped by a third. Hosting friends at home fell by 35%.

Figure 1. “Four Decades of Dwindling Trust, 1960–1999,” from Bowling Alone, showing the decline from 1960 to 2000 in the percentage of people who say “most people can be trusted.” Data from 2000 to 2024 show trust roughly flatlining around 35%.

What happened? Starting in the 1950s, America underwent a wave of changes that looked like unalloyed progress. The 1956 Federal Highway Act funded 41,000 miles of interstate, opening up a suburban frontier where families could afford their own homes with yards, driveways, and privacy. Women entered the workforce en masse, expanding freedom and equality and adding to household incomes. The television — which provided cheap, effortless entertainment — was adopted faster than any technology in history, from 10% of homes in 1950 to 90% by 1959, according to Putnam. Air conditioning made homes comfortable year-round. Shopping migrated from Main Street to climate-controlled malls with better prices and wider selection.

These changes were widely embraced because they made life better for millions of people in countless ways. But as Putnam documents, they quietly eroded community, shifting American life toward comfort, privacy, and control, and away from the places and habits that had held communities together.

Suburbs scattered neighbors across cul-de-sacs designed for privacy over casual interaction. The front porch — where you might wave to a neighbor and end up talking for an hour — gave way to the private backyard deck and the two-car garage. Television privatized entertainment, moving what once happened in theaters, dance halls, and community centers into living rooms where, by the 1990s, the average American adult was watching almost four hours a day, and, Putnam tells us, half of adults usually watched alone. Dual incomes often meant neither parent had time for the PTA meeting or volunteer shift. Local shops on main street closed because they couldn’t compete with the mall.

Generation by generation, the habits of connection weakened while the scope of everyday comfort, privacy, and control grew. Then came the digital revolution — with the internet and smartphones — and these isolating forces accelerated.

Digital technology extends the logic of suburban sprawl: it allows us to live not just physically apart, but entirely in parallel. In the past decade, e-commerce jumped from 7% to 16% of retail while physical stores shuttered. Online grocery sales are growing 28% year over year. Home exercise has surged in popularity. Twenty-eight percent of Americans work from home, up from just 8% in 2019. Across every sphere — shopping, working, exercising, socializing — we’re choosing staying in over going out because we enjoy the privacy and convenience.


Figure 2. From How Couples Meet and Stay Together. It is great that digital tools help people meet romantic partners. The problem visible in this chart is the decline in all other forms of socialization.

Meanwhile productivity technologies are dissolving the boundaries between work and personal life. While work used to have clear boundaries, today, for knowledge workers in particular, a laptop and Wi-Fi mean the office never closes. Work bleeds into every hour, every room. Microsoft’s Work Trend Index reports that “the average employee now sends or receives more than 50 messages outside of core business hours, and by 10 p.m., nearly a third (29%) of active workers dive back into their inboxes.” More than a third of U.S. workers now do gig work, which offers the freedom to work whenever you want. But when you can always be earning, social commitments become harder to justify. Giurge, Whillans, and West argue that “time poverty” — the chronic feeling of having too much to do and not enough time to do it — is increasing and hits affluent knowledge workers hardest. They use time-saving tools not to free up social or leisure time, but to take on more work commitments. These innovations in how we work make us more productive and create earning opportunities. But they also place a round-the-clock demand on our time. And when we optimize for individual productivity, we sacrifice the shared time — after-hours and weekends — that enables community life.

Workplace innovations are consuming social time and bandwidth, and so are the televisions in our pocket. Putnam found that most of the leisure gains since 1965 have gone to screen-based activities rather than face-to-face social ones. He called television “the only leisure activity that seems to inhibit participation outside the home.” And he argued TV didn’t just consume time, it also rewired leisure from shared experience toward solitary consumption.

Whereas television stays in one room, smartphones are with us everywhere — at bus stops, in waiting rooms, at restaurants, and while “watching” our kids at the playground. Americans still watch 3.5 hours of TV daily in addition to 4.7 hours on smartphones. The internet and smartphones didn’t replace television; they stacked on top, crowding out a mix of other activities. Scott Wallsten found that “a cost of online activity is less time spent with other people.” And when Hunt Allcott randomly deactivated people’s Facebook accounts, they got back an average of 60 minutes per day and spent more of it with people in person. As Netflix co-founder Reed Hastings put it, “we compete with sleep.”

Figure 3. Our World in Data chart showing the rise in daily hours spent with digital media in the U.S. Statista found that in 2021 this figure reached over eight hours and has remained there since.

When we spoke with Putnam recently, he said “things are way worse than I thought.” Today, only 30% of Americans socialize on any given day. As of 2023, young people spend 45% more time alone than 15 years earlier. Two-thirds of Americans under 30 believe most people can’t be trusted. According to Sherry Turkle, even time together with others is compromised by our connected devices, which make us less present to those around us.

Figure 4. Our World in Data figure showing the increase in time alone and the decrease in time spent with all other groups among Americans aged 15 to 29.

There’s a reason these tools have saturated our lives. They save us time, make us more productive, free us from drudgery, engage us when we’re bored, connect us when we’re otherwise alone. But for all that technology can do, it is rarely an adequate substitute for physical presence, shared vulnerability, or the willingness to be inconvenienced for the sake of others.

For better and for worse, we built a world where you can work, shop, eat, exercise, learn, and socialize without ever leaving your home, where work and leisure are increasingly things we do alone in front of screens. In other words, we’ve allowed social interaction to become more optional than ever.

The Path Forward

When we asked Robert Putnam what gives him hope, he pointed to history. In The Upswing, he reminds us that Americans faced a similar crisis before. The Gilded Age brought economic inequality, industrialization, and the rise of anonymous urban life. Small-town bonds gave way to tenements and factory floors. Trust collapsed. By the 1890s, social capital had reached historic lows — roughly where it stands today.

The Progressive reformers found this new world unacceptable, but they didn’t try to turn back the clock. Cities and factories were here to stay. Instead, they adapted, creating new forms of connection suited to their changed reality, from settlement houses for anonymous neighborhoods to women’s clubs that built networks of mutual aid. They didn’t reject modernity; they metabolized it, showing up day after day to create new institutions and communities suited to the industrialized world.

Decades ago Neil Postman observed in Amusing Ourselves to Death that we haven’t been conquered by technology — we’ve surrendered to it because we like the stimulation and cheap amusement. More recently, Nicholas Carr concludes in Superbloom that we’re complicit in our loneliness because we embrace these superficial, mediated forms of connection. Like Postman and Carr, the Progressive Era reformers understood where they had agency when technology upended their world. It isn’t in demanding that others fix systems we willingly participate in, nor is it in outright rejecting technologies that deliver real benefits — it’s in changing how we ourselves live with and make use of the tools that surround us.

There are already signs that people are willing to do this. In a small, three-day survey, Talker Research found that 63% of Gen Z now intentionally unplug — the highest rate of any generation — and that half of Americans are spending less time on screens for their well-being, and their top alternative activity is time with friends and family. And they found that two-thirds of Americans are embracing “slow living,” with 84% adopting analog lifestyle choices like wristwatches and paper notebooks that help them unplug. Meanwhile in Eventbrite’s “Reset to Real” survey, 74% of young adults say in-person experiences matter more than digital ones. New devices like the Light Phone, Brick, Meadow, and Daylight Computer signal a growing demand for utility without distraction.

Unplugging isn’t enough on its own. The time and energy we reclaim has to go toward building social connections: hosting the dinner party despite the hassle, staying for coffee after church when you’d rather go home, sitting through the awkward silence, offering or asking for help.

Ultimately, we can’t expect deep social connection in a culture that prioritizes individual ease and convenience. Nor is community something technology can deliver for us. What’s required is a change of culture, grounded in a basic fact of human nature: that authentic connection requires action and effort, and that this action and effort is part of what makes connection fulfilling in the first place.

We can form new rituals and institutions that allow us to adapt to technology, ultimately changing it to our liking. But it starts with the tools we use and the choices we make each day. If we all prioritize the individual comforts and conveniences we’ve grown accustomed to, no one else will restore the community we say we miss. No one else can. If we want deeper relationships and better communities than we have, we’re going to have to put more of our time, effort, and attention into the people around us.

History shows that we can adapt, building communities suited to changing times. The question is: Will we stay in and scroll? Or will we go out and choose one another?

Tying Engineering Metrics to Business Metrics

Mike's Notes

Robust measures of financial health, tied back to engineering work, would be very useful to Ajabbi, a social enterprise that needs to be viable. The default is full transparency unless there is a very good reason not to. The plan is to have Pipi run the measurement process automatically and provide feedback loops.

To do

  • Build these measures into the DevOps Engine.
  • Add items to the workspace dashboard UI

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

11/02/2026

Tying Engineering Metrics to Business Metrics

By: Iccha Sethi
Medium: 26/11/2025

Interests include technology, building team culture, books, and food. Engineering leader.

Most engineering organizations I’ve worked in or led have tracked some form of engineering metrics. These range from simple metrics like uptime and incident count to more complex frameworks like DORA. As an engineering leader, you’ve probably been asked, either by someone within or outside of engineering: Why do these metrics matter? or How do they align with our business goals?

This post is aimed at demystifying some of this. We will cover:

  • Key Business Metrics
  • Lagging and Leading engineering metrics and how they connect to the key business metrics

While this isn’t an exhaustive list of engineering metrics, the goal is to provide a practical framework that you can adapt to your context.

Here is a TL;DR of it; let’s break it down along the way:

Key Business Metrics

Below are some key business metrics that most businesses use:

  • ARR (Annual Recurring Revenue): The total recurring revenue a company expects to receive annually from its customers. (Wall Street Prep)
  • NRR (Net Revenue Retention): A metric that measures the percentage of recurring revenue retained from existing customers over a specific period, accounting for expansions, contractions, and churn. (Planhat)
  • GRR (Gross Revenue Retention): The percentage of recurring revenue retained from existing customers over a specific period, excluding any revenue gained from expansions or upsells. (ChurnZero)
  • CAC (Customer Acquisition Cost): The total cost incurred by a company to acquire a new customer, including marketing and sales expenses. (Cast)

These metrics are lagging indicators, sometimes lagging by as much as 12 months: for example, a customer who churns at the end of a yearly contract only shows up in GRR at that point.
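
To make the difference between NRR and GRR concrete, here is a minimal Python sketch, assuming a simple list of per-customer recurring revenue at the start and end of a period; the customer records and numbers are hypothetical, not data from any real business.

# Minimal sketch of NRR and GRR over one period. The customer records and
# numbers are hypothetical; end_arr is 0 for a churned customer.

customers = [
    {"name": "acme", "start_arr": 12_000, "end_arr": 15_000},  # expansion
    {"name": "globex", "start_arr": 8_000, "end_arr": 6_000},  # contraction
    {"name": "initech", "start_arr": 5_000, "end_arr": 0},     # churn
]

start_total = sum(c["start_arr"] for c in customers)

# NRR counts expansions, contractions, and churn among existing customers.
nrr = sum(c["end_arr"] for c in customers) / start_total

# GRR excludes expansion: each customer contributes at most their starting revenue.
grr = sum(min(c["end_arr"], c["start_arr"]) for c in customers) / start_total

print(f"NRR: {nrr:.0%}")  # 84%
print(f"GRR: {grr:.0%}")  # 72%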

Let us look at some potential Intermediate Outcomes which may impact these key business metrics.

Intermediate Outcomes

High GRR and NRR reflect loyal, satisfied customers who find the product valuable, easy to use (user experience), and reliable (system reliability). These customers are more likely to expand their usage, purchase additional features, and remain long-term advocates for your platform.

Acquiring new customers is generally more expensive than retaining existing ones. Studies indicate that attracting a new customer can cost up to five times more than retaining an existing one. Additionally, the probability of selling to an existing customer ranges between 60–70%, whereas the probability of selling to a new prospect is only 5–20%. These statistics underscore the financial benefits of focusing on customer retention strategies.

To grow the business via ARR and reduce CAC simultaneously, we must prioritize shipping product features quickly (feature velocity) without compromising the factors that sustain GRR and NRR.

Engineering Metrics (Lagging)

There are a number of engineering metrics which are also lagging, but over a much shorter timescale than GRR/NRR/CAC/ARR. Metrics like uptime, time to detect and recover from incidents, performance, support tickets, bugs, and team velocity can be measured over shorter timeframes.

As an engineering leader, I have found that they’re most insightful when reviewed monthly and analyzed for trends over 3–6 months. These can be earlier indicators of unhappy customers and enable teams to take quick action before the customer becomes a churn risk. Some examples include:

  • If support tickets are growing disproportionately to the customer base, or the team is unable to keep up with support-ticket SLAs, it indicates a potentially higher number of bugs in the product, or an unintuitive user experience, leading to unhappy customers.
  • An increasing number of incidents, or high time to detect (TTD) and time to recover (TTR) along with decreasing uptime, means there are periods when the product is unavailable or not working as expected, again impacting customer trust.
  • Slow web app performance means it takes longer to get tasks done and degrades the user experience.
  • Team velocity impacts the ability to ship customer-requested features.
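
As a hedged illustration of how a couple of these lagging signals can be computed from raw incident records, here is a small Python sketch of monthly uptime and mean time to recover (MTTR); the incident data and field names are invented for the example and are not the output of any particular tool.

# Hypothetical sketch: monthly uptime and mean time to recover (MTTR)
# from a list of incident records. Timestamps and fields are invented.
from datetime import datetime

incidents = [
    {"start": datetime(2026, 1, 4, 2, 15), "end": datetime(2026, 1, 4, 3, 5)},
    {"start": datetime(2026, 1, 19, 14, 0), "end": datetime(2026, 1, 19, 14, 40)},
]

month_minutes = 31 * 24 * 60  # January

downtime = sum((i["end"] - i["start"]).total_seconds() / 60 for i in incidents)
uptime = 1 - downtime / month_minutes
mttr = downtime / len(incidents)

print(f"Uptime: {uptime:.3%}")      # 99.798%
print(f"MTTR: {mttr:.0f} minutes")  # 45 minutes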

Engineering Metrics (Leading)

Sometimes even months might be too late to come back and fix something. Luckily, there are a number of best practices, and the metrics tied to those practices, when done right, correlate strongly with the lagging engineering indicators. These metrics, though imperfect in their own ways, are generally a decent real-time indicator of how the lagging indicators will move. Some of these leading indicators include test coverage, PR size, feature flag usage, deployment frequency, and lead time for change. Some of these can be reviewed on a per-pull-request basis, or even daily. Ideally, individual teams or engineers feel a high sense of ownership for these.
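
Two of these leading indicators, deployment frequency and lead time for change, are straightforward to compute once you have commit and deploy timestamps. The following Python sketch shows one way to do it, assuming a hypothetical list of (commit_time, deploy_time) pairs rather than the output of any particular CI system.

# Hypothetical sketch: deployment frequency and lead time for change
# from (commit_time, deploy_time) pairs. The data shape is invented.
from datetime import datetime
from statistics import median

deploys = [
    (datetime(2026, 2, 2, 9, 0), datetime(2026, 2, 2, 15, 30)),
    (datetime(2026, 2, 3, 11, 0), datetime(2026, 2, 4, 10, 0)),
    (datetime(2026, 2, 5, 16, 0), datetime(2026, 2, 6, 9, 0)),
]

# Lead time for change: elapsed time from commit to running in production.
lead_hours = [
    (deployed - committed).total_seconds() / 3600
    for committed, deployed in deploys
]
print(f"Median lead time: {median(lead_hours):.1f} hours")  # 17.0 hours

# Deployment frequency: deploys per week over the observed window.
window_days = (deploys[-1][1] - deploys[0][1]).days or 1
per_week = len(deploys) / window_days * 7
print(f"Deployment frequency: {per_week:.1f} per week")     # 7.0 per week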

Summary

Tying this all together: a short lead time for change means PRs get into production quickly. This is not only great for team and product velocity, because we ship changes quickly and validate them in production sooner, but it also decreases our time to recover during incidents, because a fix can be applied quickly. Lower-impact incidents mean fewer unhappy customers, which helps us maintain our GRR. Similarly, getting features into production more quickly means the product delivers more value sooner, making it easier to gain new customers and increase our ARR.

I hope this post clarifies the connection between engineering and business metrics. The next time someone asks why code coverage or deployment frequency matters to the business, you’ll have the answer — and a framework to back it up! 😀

AI’s Memorization Crisis

Mike's Notes

I agree with this article.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

10/02/2026

AI’s Memorization Crisis

By: Alex Reisner
The Atlantic: 9/01/2026

Alex Reisner is a staff writer at The Atlantic.

Large language models don’t “learn”—they copy. And that could change everything for the tech industry.

Editor’s note: This work is part of AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry.

On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books.

In fact, when prompted strategically by researchers, Claude delivered the near-complete text of Harry Potter and the Sorcerer’s Stone, The Great Gatsby, 1984, and Frankenstein, in addition to thousands of words from books including The Hunger Games and The Catcher in the Rye. Varying amounts of these books were also reproduced by the other three models. Thirteen books were tested.

This phenomenon has been called “memorization,” and AI companies have long denied that it happens on a large scale. In a 2023 letter to the U.S. Copyright Office, OpenAI said that “models do not store copies of the information that they learn from.” Google similarly told the Copyright Office that “there is no copy of the training data—whether text, images, or other formats—present in the model itself.” Anthropic, Meta, Microsoft, and others have made similar claims. (None of the AI companies mentioned in this article agreed to my requests for interviews.)

The Stanford study proves that there are such copies in AI models, and it is just the latest of several studies to do so. In my own investigations, I’ve found that image-based models can reproduce some of the art and photographs they’re trained on. This may be a massive legal liability for AI companies—one that could potentially cost the industry billions of dollars in copyright-infringement judgments, and lead products to be taken off the market. It also contradicts the basic explanation given by the AI industry for how its technology works.

AI is frequently explained in terms of metaphor; tech companies like to say that their products learn, that LLMs have, for example, developed an understanding of English writing without explicitly being told the rules of English grammar. This new research, along with several other studies from the past two years, undermines that metaphor. AI does not absorb information like a human mind does. Instead, it stores information and accesses it.

In fact, many AI developers use a more technically accurate term when talking about these models: lossy compression. It’s beginning to gain traction outside the industry too. The phrase was recently invoked by a court in Germany that ruled against OpenAI in a case brought by GEMA, a music-licensing organization. GEMA showed that ChatGPT could output close imitations of song lyrics. The judge compared the model to MP3 and JPEG files, which store your music and photos in files that are smaller than the raw, uncompressed originals. When you store a high-quality photo as a JPEG, for example, the result is a somewhat lower-quality photo, in some cases with blurring or visual artifacts added. A lossy-compression algorithm still stores the photo, but it’s an approximation rather than the exact file. It’s called lossy compression because some of the data are lost.

From a technical perspective, this compression process is much like what happens inside AI models, as researchers from several AI companies and universities have explained to me in the past few months. They ingest text and images, and output text and images that approximate those inputs.

But this simple description is less useful to AI companies than the learning metaphor, which has been used to claim that the statistical algorithms known as AI will eventually make novel scientific discoveries, undergo boundless improvement, and recursively train themselves, possibly leading to an “intelligence explosion.” The whole industry is staked on a shaky metaphor.

Garfunkel_and_Oates_from_cdn-pastemagazine-com.jpg
Source: Courtesy of Kyle Christy / IFC


Garfunkel_and_Oates_from_stable_diffusion.png
Output from Stable Diffusion 1.4

The problem becomes clear if we look at AI image generators. In September 2022, Emad Mostaque, a co-founder and the then-CEO of Stability AI, explained in a podcast interview how Stable Diffusion, Stability’s image model, was built. “We took 100,000 gigabytes of images and compressed it to a two-gigabyte file that can re-create any of those and iterations of those” images, he said.

One of the many experts I spoke with while reporting this article was an independent AI researcher who has studied Stable Diffusion’s ability to reproduce its training images. (I agreed to keep the researcher anonymous, because they fear repercussions from major AI companies.) Above is one example of this ability: On the left is the original from the web—a promotional image from the TV show Garfunkel and Oates—and on the right is a version that Stable Diffusion generated when prompted with a caption the image appears with on the web, which includes some HTML code: “IFC Cancels Garfunkel and Oates.” Using this simple technique, the researcher showed me how to produce near-exact copies of several dozen images known to be in Stable Diffusion’s training set, most of which include visual residue that looks something like lossy compression—the kind of glitchy, fuzzy effect you may notice in your own photos from time to time.

Karla_Ortiz_from_Karla_Ortiz_com.jpeg
Source: Karla Ortiz
Original artwork by Karla Ortiz (The Death I Bring, 2016, graphite)


Karla_Ortiz_from_stable_diffusion.png
Source: United States District Court, Northern District of California
Output from Stability's Reimagine XL product (based on Stable Diffusion XL)

Above is another pair of images taken from a lawsuit against Stability AI and other companies. On the left is an original work by Karla Ortiz, and on the right is a variation from Stable Diffusion. Here, the image is a bit further from the original. Some elements have changed. Instead of compressing at the pixel level, the algorithm appears to be copying and manipulating objects from multiple images, while maintaining a degree of visual continuity.

As companies explain it, AI algorithms extract “concepts” from training data and learn to make original work. But the image on the right is not a product of concepts alone. It’s not a generic image of, say, “an angel with birds.” It’s difficult to pinpoint why any AI model makes any specific mark in an image, but we can reasonably assume that Stable Diffusion can render the image on the right partly because it has stored visual elements from the image on the left. It isn’t collaging in the physical cut-and-paste sense, but it also isn’t learning in the human sense the word implies. The model has no senses or conscious experience through which to make its own aesthetic judgments.

Google has written that LLMs store not copies of their training data but rather the “patterns in human language.” This is true on the surface but misleading once you dig into it. As has been widely documented, when a company uses a book to develop an AI model, it splits the book’s text into tokens or word fragments. For example, the phrase hello, my friend might be represented by the tokens he, llo, my, fri, and end. Some tokens are actual words; some are just groups of letters, spaces, and punctuation. The model stores these tokens and the contexts in which they appear in books. The resulting LLM is essentially a huge database of contexts and the tokens that are most likely to appear next.
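
Before looking at the map, here is a toy Python sketch of that database-of-contexts idea, built on the hello, my friend example above; the token splits and counts are invented for illustration and do not come from any real model's vocabulary or weights.

# Toy illustration of the "contexts -> most likely next token" idea.
# Token splits and counts are invented; real models learn these from data.
from collections import Counter, defaultdict

# Pretend training text has already been split into tokens.
training_tokens = ["he", "llo", ",", " my", " fri", "end", ".",
                   "he", "llo", ",", " my", " fri", "end", "!",
                   "he", "llo", ",", " my", " dear", " fri", "end", "."]

# Count which token follows each two-token context.
next_token_counts = defaultdict(Counter)
for i in range(len(training_tokens) - 2):
    context = tuple(training_tokens[i:i + 2])
    next_token_counts[context][training_tokens[i + 2]] += 1

# The "language map": for each context, the most likely next token.
language_map = {ctx: counts.most_common(1)[0][0]
                for ctx, counts in next_token_counts.items()}

print(language_map[("he", "llo")])  # ','
print(language_map[(",", " my")])   # ' fri' (seen more often than ' dear')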

The model can be visualized as a map. Here’s an example, with the actual most-likely tokens from Meta’s Llama-3.1-70B:

Flow chart of the most likely next tokens
Source: The Atlantic / Llama

When an LLM “writes” a sentence, it walks a path through this forest of possible token sequences, making a high-probability choice at each step. Google’s description is misleading because the next-token predictions don’t come from some vague entity such as “human language” but from the particular books, articles, and other texts that the model has scanned.

By default, models will sometimes diverge from the most probable next token. This behavior is often framed by AI companies as a way of making the models more “creative,” but it also has the benefit of concealing copies of training text.

Sometimes the language map is detailed enough that it contains exact copies of whole books and articles. This past summer, a study of several LLMs found that Meta’s Llama 3.1-70B model can, like Claude, effectively reproduce the full text of Harry Potter and the Sorcerer’s Stone. The researchers gave the model just the book’s first few tokens, “Mr. and Mrs. D.” In Llama’s internal language map, the text most likely to follow was: “ursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much.” This is precisely the book’s first sentence. Repeatedly feeding the model’s output back in, Llama continued in this vein until it produced the entire book, omitting just a few short sentences.
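
The extraction loop the researchers describe can be sketched in a few lines. This is a hedged illustration of the general feed-the-output-back-in procedure, not their actual code; the greedy_continue function and the tiny lookup table are invented stand-ins for a real model's greedy next-token choice.

# Sketch of the iterative extraction loop described above: start from a short
# prompt, repeatedly append the most likely next token, and feed it back in.

def greedy_continue(prompt_tokens, next_token, max_tokens=50):
    """Repeatedly append the model's most likely next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        nxt = next_token(tokens)  # highest-probability continuation
        if nxt is None:           # no continuation available
            break
        tokens.append(nxt)
    return "".join(tokens)

# Tiny stand-in "model": a lookup keyed on the last two tokens, as in the
# toy language map sketched above.
toy_map = {("Mr", "."): " and", (".", " and"): " Mrs", (" and", " Mrs"): "."}

print(greedy_continue(["Mr", "."], lambda t: toy_map.get(tuple(t[-2:]))))
# -> "Mr. and Mrs."

With a memorized passage, the greedy choices walk straight along the original text, which is how a prompt of a few tokens can unroll into a book's first sentence and beyond.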

Using this technique, the researchers also showed that Llama had losslessly compressed large portions of other works, such as Ta-Nehisi Coates’s famous Atlantic essay “The Case for Reparations.” Prompted with the essay’s first sentence, the model produced more than 10,000 words, or two-thirds of the essay, verbatim. Large extractions also appear to be possible from Llama 3.1-70B for George R. R. Martin’s A Game of Thrones, Toni Morrison’s Beloved, and others.

The Stanford and Yale researchers also showed this week that a model’s output can paraphrase a book rather than duplicate it exactly. For example, where A Game of Thrones reads “Jon glimpsed a pale shape moving through the trees,” the researchers found that GPT-4.1 produced “Something moved, just at the edge of sight—a pale shape, slipping between the trunks.” As in the Stable Diffusion example above, the model’s output is extremely similar to a specific original work.

This isn’t the only research to demonstrate the casual plagiarism of AI models. “On average, 8–15% of the text generated by LLMs” also exists on the web, in exactly that same form, according to one study. Chatbots are routinely breaching the ethical standards that humans are normally held to.

Memorization could have legal consequences in at least two ways. For one, if memorization is unavoidable, then AI developers will have to somehow prevent users from accessing memorized content, as law scholars have written. Indeed, at least one court has already required this. But existing techniques are easy to circumvent. For example, 404 Media has reported that OpenAI’s Sora 2 would not comply with a request to generate video of a popular video game called Animal Crossing but would generate a video if the game’s title was given as “‘crossing aminal’ [sic] 2017.” If companies can’t guarantee that their models will never infringe on a writer’s or artist’s copyright, a court could require them to take the product off the market.

A second reason that AI companies could be liable for copyright infringement is that a model itself could be considered an illegal copy. Mark Lemley, a Stanford law professor who has represented Stability AI and Meta in such lawsuits, told me he isn’t sure whether it’s accurate to say that a model “contains” a copy of a book, or whether “we have a set of instructions that allows us to create a copy on the fly in response to a request.” Even the latter is potentially problematic, but if judges decide that the former is true, then plaintiffs could seek the destruction of infringing copies. Which means that, in addition to fines, AI companies could in some cases face the possibility of being legally compelled to retrain their models from scratch, with properly licensed material.

In a lawsuit, The New York Times alleged that OpenAI’s GPT-4 could reproduce dozens of Times articles nearly verbatim. OpenAI (which has a corporate partnership with The Atlantic) responded by arguing that the Times used “deceptive prompts” that violated the company’s terms of service and prompted the model with sections from each of those articles. “Normal people do not use OpenAI’s products in this way,” the company wrote, and even claimed “that the Times paid someone to hack OpenAI’s products.” The company has also called this type of reproduction “a rare bug that we are working to drive to zero.”

But the emerging research is making clear that the ability to plagiarize is inherent to GPT-4 and all other major LLMs. None of the researchers I spoke with thought that the underlying phenomenon, memorization, is unusual or could be eradicated.

In copyright lawsuits, the learning metaphor lets companies make misleading comparisons between chatbots and humans. At least one judge has repeated these comparisons, likening an AI company’s theft and scanning of books to “training schoolchildren to write well.” There have also been two lawsuits in which judges ruled that training an LLM on copyrighted books was fair use, but both rulings were flawed in their handling of memorization: One judge cited expert testimony that showed that Llama could reproduce no more than 50 tokens from the plaintiffs’ books, though research has since been published that proves otherwise. The other judge acknowledged that Claude had memorized significant portions of books but said that the plaintiffs had failed to allege that this was a problem.

Research on how AI models reuse their training content is still primitive, partly because AI companies are motivated to keep it that way. Several of the researchers I spoke with while reporting this article told me about memorization research that has been censored and impeded by company lawyers. None of them would talk about these instances on the record, fearing retaliation from companies.

Meanwhile, OpenAI CEO Sam Altman has defended the technology’s “right to learn” from books and articles, “like a human can.” This deceptive, feel-good idea prevents the public discussion we need to have about how AI companies are using the creative and intellectual works upon which they are utterly dependent.

Using slide presentations to describe Pipi

Mike's Notes

Thoughts on how to give useful slide presentation talks about Pipi, record them, and make them available on YouTube as a way to explain how Pipi works.

Resources

References

  • Content Management Bible 2nd Ed., by Bob Boiko. Wiley. 2005.

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

09/02/2026

Using slide presentations to describe Pipi

By: Mike Peters
On a Sandy Beach: 09/02/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

I gave a slide talk last night at the regular Open Research Group online meeting about future blog posts being created by a human using a Workspace, transferred to the CMS Engine (cms), processed, and then automatically published to Google Blogger. Creating the slides made me realise the opportunity available to use this format to visually explain the many parts of Pipi simply.

I will give a slide presentation on the Workspace Engine (wsp) at the next meeting. I will also give one on the Workspaces for Screen to the local Film Industry group this month.

Backstory

When I was a young adult, I went around with a group of good people, many of whom have since become lifelong friends, who encouraged me to give some talks. The only problem was that their approach was to write the talk in advance and then read it aloud to the audience. They were all very good at it, and I was hopeless.

  • The first problem was that I found it impossible to write.
  • The second problem was reading out loud what was written. I tripped over the words.

Years later, I had this idea that just talking about something in front of me might work a lot better, a picture, a map, a physical gadget, for example. I have no problems talking about something I understand.

When I became National President of NZERN, I had to give many talks, and that was the method: show slides and just talk about the pictures or diagrams without using notes, unless there was a name or date to remember, often using a whiteboard to draw answers for people who asked questions in a meeting.

I ended up giving hundreds of talks at conferences and workshops across NZ. The longest was 2 1/2 hours, given to the South Island DOC IMU workshop about the NZERN GIS project using ESRI software, and it was highly technical. No notes, just 50 slides.

Using computers like a typewriter has been a tremendous help because of cut-and-paste, which is much easier than shuffling bits of paper. Besides, I use Arial 16pt, which is much easier to read than my handwriting.

Mrs Grammarly

Later, I learned to use assistive technology to help me write. Grammarly Pro rewrites every single sentence that has my name on it, including this post. Grammarly is set on formal British business English. I hope my personal secretary, Mrs Grammarly, is doing a good job.

Big Challenge

Pipi is largely undocumented because it was designed and built visually. There must be thousands of hand coloured drawings on A4 paper, some neatly filed in 50+ 3-hole A4 ring binders, and the rest in many cartons waiting to be filed. Pipi needs to be documented so others can use it. There is a steadily growing interest in Pipi worldwide.

Solutions

  1. Getting Pipi to self-document is well underway, using structured templates that render from hundreds of databases. A rough estimate is that 20,000 web pages of developer technical documentation will be required due to the scale and scope of this enterprise platform.
  2. Setting up a community forum where users can ask questions and provide answers will take the load off me.
  3. I also need to explain verbally the more complicated bits that I find too difficult to write about. Give a slide presentation and record it to share on YouTube.
  4. Use screen capture to record live demos of Pipi in use.
  5. Provide regular Office Hours that can be booked for video chats via Google Meet or Zoom. I'm doing that a lot, and it seems to work.
  6. Record video interviews with the people who wrote most of the articles that I have copied and republished on this engineering blog, On a Sandy Beach. They could be two-way and a chance to discuss some deep issues.
  7. Teaching someone something complex by making it simple is the best way to learn it. So, giving many talks will also help me understand more clearly.

Slide Presentations

Here is a possible list of some overview talks about just one engine as an example. Then there could be more detailed talks on the same subjects. There are hundreds of Agent Engines. Each talk could have about 10-20 slides.

CMS Engine

  • 101 Introduction
  • 102 Content Management System
  • 103 Publication
  • 104 Website
  • 105 Blog
  • 106 Wiki
  • 107 Docs
  • 108 Help
  • 109 Workspace

Next Steps

Once I get into the swing of it, it should get easier. I need to learn to speak more slowly, develop a visual style for the slides, establish a simple slide-naming convention, and address related details. Each slide set will need a webpage for downloading the PDF/PowerPoint/Google Slides, watching the YouTube video, a printable PDF handout, and links to related information.

The recorded slides, talks, and demos could all be organised using the existing Diataxis framework and Learning Objects, which Pipi uses elsewhere.

GIS mapping options

Mike's Notes

Some thoughts about adding Geographic Information Systems (GIS) mapping to Pipi 10, the next major release of Pipi.

Spacetime

An unresolved issue is how to integrate GIS with 4D spacetime. See the work of ontologists Chris Partridge on BORO and NATO, and Matthew West on 4Dism, Shell Oil Refineries, and the ontological foundations behind the UK Digital Twin Project for Built Infrastructure. Chris Partridge also raised a related question in the Ontolog Forum recently.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

09/02/2026

GIS mapping options

By: Mike Peters
On a Sandy Beach: 08/02/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Pipi 4 (2005-2008)

NZERN, with the help of Parker Jones at Eagle Technology, secured an ESRI Conservation GIS grant to add GIS mapping capabilities.

I gave a live demo and presentation of Pipi 4 at an NZ ESRI User Conference held in Wellington. The head of engineering at ESRI was in the audience. Within a few weeks, over NZ$600,000 worth of ESRI software was on its way.

ESRI gave us everything they had: multiple (up to 10) licenses at version 8.2.

  • ArcIMS
  • ArcSDE
  • Workstation
  • All the extensions.
  • Everything!

There was so much boxed software that it came on a pallet.

The plan was to provide free, dedicated, and customised web map hosting for every conservation project in NZ that wanted it. The smallest mapped project was 1500 sq m, scaling up to large landscape-scale, whole catchment projects. Every project was different, so the provided GIS was customised to meet their needs. There was even a visit by an ESRI staff member who proposed enabling NZERN to extend this to conservation efforts in the Pacific Island states, by providing training using ESRI-supplied laptops.

I got to create all the GeoDatabases and hack JTX to run in reverse to manage user map-edits history.

It was going very well, and many individual projects were getting dedicated dynamic web maps with all their data and GIS layers. All labour was donated (ten thousand hours).

QE2 National Trust and other national conservation-related organisations were also interested in using this shared GIS system.

Then the government funding that covered the core annual running costs dried up.

Core costs included:

  • Power
  • Bandwidth
  • Hardware
  • Repairs
  • Software books

Then the Key government came in, followed by the Christchurch Earthquake. What a waste of opportunity for conservation.

After that, governments love to reinvent the wheel, so there have been many well-funded attempts to develop GIS for biodiversity and stream health for community use in NZ. None of them has been as good as Pipi, and most of them disappear after a while. So we are going to do something about that, except it will be available globally, in many human languages, across many industries, and will use open-source GIS software.

Parker Jones, with Bonita, went on to create a GIS for Conservation organisation in NZ, and has done a great job. All power to them.

Pipi 9 (2023 - )

GIS plugins include:

  • Apple Map
  • ArcGIS Map
  • Azure Map
  • Google Map

Pipi 10

Customers will be able to integrate Pipi with their own ESRI GIS account deployments. Pipi GIS will use OGC standards. I have to say here that I love ESRI software, and the Eagle Technology people were great, but it is far too expensive and restrictive for this social-enterprise startup.

Options

Use open-source; it's free, and DIY everything.

Default Option

  • QGIS
  • GeoServer 3
  • GeoNode
  • PostGIS + PostgreSQL

Open Geospatial Consortium (OGC) 

These products are mature and conform to the OGC standards.
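
As a hedged illustration of what OGC conformance buys in practice, here is a minimal Python sketch that queries an OGC API - Features endpoint of the kind pygeoapi or GeoServer can expose; the base URL and collection name are hypothetical placeholders, not real Ajabbi services.

# Minimal sketch of an OGC API - Features query. The endpoint and collection
# name are hypothetical; any conformant server exposes the same
# /collections/{id}/items pattern and returns GeoJSON.
import requests

BASE = "https://gis.example.org/ogcapi"      # hypothetical server
COLLECTION = "conservation_projects"         # hypothetical collection

resp = requests.get(
    f"{BASE}/collections/{COLLECTION}/items",
    params={
        "bbox": "172.5,-43.7,172.8,-43.4",   # lon/lat bounding box
        "limit": 10,
        "f": "json",                         # common convention for GeoJSON output
    },
    timeout=30,
)
resp.raise_for_status()

for feature in resp.json().get("features", []):
    props = feature.get("properties", {})
    print(props.get("name"), feature.get("geometry", {}).get("type"))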

Geospatial Libraries

  • FDO – API (C++, .Net) between GIS application and sources; for manipulating, defining and analysing geospatial data.
  • GDAL/OGR – Library between GIS applications and sources; for reading and writing raster geospatial data formats (GDAL) and simple features vector data (OGR).
  • GeoTools – Open source GIS toolkit (Java); to enable the creation of interactive geographic visualization clients.
  • GEOS – A C++ port of the Java Topology Suite (JTS), a geometry model.
  • MetaCRS – Projections and coordinate system technologies, including PROJ.
  • Orfeo ToolBox (OTB) – Open source tools to process satellite images and extract information.
  • OSSIM – Extensive geospatial image processing libraries with support for satellite and aerial sensors and common image formats.
  • PostGIS – Spatial extensions for the PostgreSQL database, enabling geospatial queries.

Desktop Applications

  • QGIS – Desktop GIS for data viewing, editing and analysis — Windows, Mac and Linux.
  • GRASS GIS – an extensible GIS for image processing and analysing raster, topological vector and graphic data.
  • OSSIM – Libraries and applications used to process imagery, maps, terrain, and vector data.
  • Marble – Virtual globe and world atlas.
  • gvSIG – Desktop GIS for data capturing, storing, handling, analysing and deploying. Includes map editing.
  • uDig

Web Mapping Server

  • MapServer – Fast web mapping engine for publishing spatial data and services on the web; written in C.
  • Geomajas – Development software for web-based and cloud-based GIS applications.
  • GeoServer – Allows users to share and edit geospatial data. Written in Java using GeoTools.
  • deegree – Java framework
  • PyWPS – implementation of the OGC Web Processing Service standard, using Python
  • pygeoapi - A Python server implementation of the OGC API suite of standards for geospatial data.

Web Mapping Client

  • GeoMoose – JavaScript Framework for displaying distributed GIS data.
  • Mapbender – Framework to display, overlay, edit and manage distributed Web Map Services using PHP and JavaScript.
  • MapGuide Open Source – Platform for developing and deploying web mapping applications and geospatial web services. Windows-based, native file format.
  • MapFish – Framework for building rich web-mapping applications based on the Pylons Python web framework.
  • OpenLayers – an AJAX library (API) for accessing geographic data layers of all kinds.

Hosting

The GeoServer and PostGIS + PostgreSQL Geodatabase will need to be deployed in the Pipi Data Centre and used by the spatial agent engine. Providing hosted GIS to customers will require Ajabbi to purchase or lease bare-metal servers to host open-source GIS Web Servers.
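
For the PostGIS side, a spatial query from the agent engine could be as simple as the following Python sketch; the connection details, table, and column names are hypothetical placeholders and stand in for whatever schema Pipi ends up using.

# Hypothetical sketch: querying a PostGIS-enabled PostgreSQL database from
# Python. Connection details, table, and column names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="pipi_gis", user="spatial_agent", password="..."
)

with conn, conn.cursor() as cur:
    # Find projects within 5 km of a point (longitude, latitude), returning
    # geometry as GeoJSON so a web mapping client can render it.
    cur.execute(
        """
        SELECT name, ST_AsGeoJSON(geom)
        FROM projects
        WHERE ST_DWithin(geom::geography,
                         ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                         %s)
        """,
        (172.64, -43.53, 5000),
    )
    for name, geojson in cur.fetchall():
        print(name, geojson[:60], "...")

conn.close()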

GeoServer 3 will be available in Docker. All doable.

Support

Sponsor open-source and pay for support from GeoSolutions, etc.