Tying Engineering Metrics to Business Metrics

Mike's Notes

Robust measures of financial health, tied back to engineering work, would be very useful to Ajabbi, a social enterprise that needs to be viable. The default is full transparency unless there is a very good reason not to. The plan is to have Pipi run the measurement process automatically and provide feedback loops.

To do

  • Build these measures into the DevOps Engine.
  • Add items to the workspace dashboard UI.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

11/02/2026

Tying Engineering Metrics to Business Metrics

By: Iccha Sethi
Medium: 26/11/2025

Engineering leader. Interests include technology, building team culture, books, and food.

Most engineering organizations I’ve worked in or led have tracked some form of engineering metrics. These range from simple metrics like uptime and incident count to more complex frameworks like DORA. As an engineering leader, you’ve probably been asked, either by someone within or outside of engineering: Why do these metrics matter? or How do they align with our business goals?

This post is aimed at demystifying some of this. We will cover:

  • Key Business Metrics
  • Lagging and Leading engineering metrics and how they connect to the key business metrics

While this isn’t an exhaustive list of engineering metrics, the goal is to provide a practical framework that you can adapt to your context.

Here is a TL;DR of it, and let’s break it down along the way:

Key Business Metrics

Below are some key business metrics that most businesses use:

  • ARR (Annual Recurring Revenue): The total recurring revenue a company expects to receive annually from its customers. (Wall Street Prep)
  • NRR (Net Revenue Retention): A metric that measures the percentage of recurring revenue retained from existing customers over a specific period, accounting for expansions, contractions, and churn. (Planhat)
  • GRR (Gross Revenue Retention): The percentage of recurring revenue retained from existing customers over a specific period, excluding any revenue gained from expansions or upsells. (ChurnZero)
  • CAC (Customer Acquisition Cost): The total cost incurred by a company to acquire a new customer, including marketing and sales expenses. (Cast)

These metrics are lagging indicators, sometimes lagging by as much as 12 months: a customer may churn only at the end of their yearly contract, and only then does GRR register it.
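To make the retention definitions concrete, here is a minimal sketch of the arithmetic in Python. The revenue figures are invented for illustration; the formulas simply restate the definitions above.

def retention_metrics(starting_mrr, expansion, contraction, churn):
    # NRR counts expansion revenue from existing customers; GRR deliberately excludes it.
    nrr = (starting_mrr + expansion - contraction - churn) / starting_mrr
    grr = (starting_mrr - contraction - churn) / starting_mrr
    return nrr, grr

# Hypothetical period: $100k starting MRR, $15k expansion, $5k contraction, $8k churn.
nrr, grr = retention_metrics(100_000, 15_000, 5_000, 8_000)
print(f"NRR: {nrr:.0%}, GRR: {grr:.0%}")  # NRR: 102%, GRR: 87%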

Let us look at some potential Intermediate Outcomes which may impact these key business metrics.

Intermediate Outcomes

High GRR and NRR reflect loyal, satisfied customers who find the product valuable, easy to use (user experience), and reliable (system reliability). These customers are more likely to expand their usage, purchase additional features, and remain long-term advocates for your platform.

Acquiring new customers is generally more expensive than retaining existing ones. Studies indicate that attracting a new customer can cost up to five times more than retaining an existing one. Additionally, the probability of selling to an existing customer ranges between 60–70%, whereas the probability of selling to a new prospect is only 5–20%. These statistics underscore the financial benefits of focusing on customer retention strategies.

To grow the business via ARR and reduce CAC simultaneously, we must prioritize shipping product features quickly (feature velocity) without compromising the factors that sustain GRR and NRR.

Engineering Metrics (Lagging)

A number of engineering metrics are also lagging, but over a much shorter timescale than GRR/NRR/CAC/ARR. Metrics like uptime, time to detect and recover from incidents, performance, support tickets, bugs, and team velocity can be measured over shorter timeframes.

As an engineering leader, I have found that they are most insightful when reviewed monthly and analyzed for trends over 3–6 months. They can be earlier indicators of unhappy customers and can enable teams to take quick action before a customer becomes a churn risk. Some examples include:

  • If support tickets grow disproportionately to the customer base, or the team cannot keep up with support-ticket SLAs, that points to a potentially higher number of bugs in the product or an unintuitive user experience, leading to unhappy customers.
  • An increasing number of incidents, or a high time to detect (TTD) or time to recover (TTR), along with decreasing uptime, means there are periods when the product is unavailable or not working as expected, again eroding customer trust.
  • Slow web-app performance means tasks take longer to complete, degrading the user experience.
  • Team velocity impacts the ability to ship customer-requested features.

Engineering Metrics (Leading)

Sometimes even months might be too late to come back and fix something. Luckily, we have a number of best practices, and the metrics tied to those practices, when done right, correlate strongly with the lagging engineering indicators. Though imperfect in their own ways, these metrics are generally decent real-time indicators of potential impact on the lagging ones. Some of these leading indicators include test coverage, PR size, feature-flag usage, deployment frequency, and lead time for change. Some can be reviewed per pull request, or even daily. Ideally, individual teams or engineers feel a high sense of ownership over these.
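As a sketch of how one of these leading indicators could be computed automatically, the snippet below derives lead time for change from merge and deploy timestamps. The event data is invented for illustration; a real pipeline would pull it from the version-control and deployment systems.

from datetime import datetime
from statistics import median

# Hypothetical (merged_at, deployed_at) pairs for recent changes.
changes = [
    (datetime(2026, 2, 1, 9, 0), datetime(2026, 2, 1, 11, 30)),
    (datetime(2026, 2, 2, 14, 0), datetime(2026, 2, 3, 10, 0)),
    (datetime(2026, 2, 4, 8, 15), datetime(2026, 2, 4, 9, 0)),
]

lead_times = [(deployed - merged).total_seconds() / 3600 for merged, deployed in changes]
print(f"median lead time for change: {median(lead_times):.1f} hours")  # 2.5 hours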

Summary

Tying these all together: a short lead time for change means PRs get into production quickly. This is not only great for team and product velocity, because we ship changes quickly and validate them sooner in production, but it also lets us decrease our time to recover during incidents by applying a fix quickly. Lower-impact incidents mean fewer unhappy customers, which helps us maintain our GRR. Similarly, the quicker features reach production, the sooner our product delivers more value, making it easier to gain new customers and increase our ARR.

I hope this post clarifies the connection between engineering and business metrics. The next time someone asks why code coverage or deployment frequency matters to the business, you’ll have the answer — and a framework to back it up! 😀

AI’s Memorization Crisis

Mike's Notes

I agree with this article.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

10/02/2026

AI’s Memorization Crisis

By: Alex Reisner
The Atlantic: 9/01/2026

Alex Reisner is a staff writer at The Atlantic.

Large language models don’t “learn”—they copy. And that could change everything for the tech industry.

Editor’s note: This work is part of AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry.

On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books.

In fact, when prompted strategically by researchers, Claude delivered the near-complete text of Harry Potter and the Sorcerer’s Stone, The Great Gatsby, 1984, and Frankenstein, in addition to thousands of words from books including The Hunger Games and The Catcher in the Rye. Varying amounts of these books were also reproduced by the other three models. Thirteen books were tested.

This phenomenon has been called “memorization,” and AI companies have long denied that it happens on a large scale. In a 2023 letter to the U.S. Copyright Office, OpenAI said that “models do not store copies of the information that they learn from.” Google similarly told the Copyright Office that “there is no copy of the training data—whether text, images, or other formats—present in the model itself.” Anthropic, Meta, Microsoft, and others have made similar claims. (None of the AI companies mentioned in this article agreed to my requests for interviews.)

The Stanford study proves that there are such copies in AI models, and it is just the latest of several studies to do so. In my own investigations, I’ve found that image-based models can reproduce some of the art and photographs they’re trained on. This may be a massive legal liability for AI companies—one that could potentially cost the industry billions of dollars in copyright-infringement judgments, and lead products to be taken off the market. It also contradicts the basic explanation given by the AI industry for how its technology works.

AI is frequently explained in terms of metaphor; tech companies like to say that their products learn, that LLMs have, for example, developed an understanding of English writing without explicitly being told the rules of English grammar. This new research, along with several other studies from the past two years, undermines that metaphor. AI does not absorb information like a human mind does. Instead, it stores information and accesses it.

In fact, many AI developers use a more technically accurate term when talking about these models: lossy compression. It’s beginning to gain traction outside the industry too. The phrase was recently invoked by a court in Germany that ruled against OpenAI in a case brought by GEMA, a music-licensing organization. GEMA showed that ChatGPT could output close imitations of song lyrics. The judge compared the model to MP3 and JPEG files, which store your music and photos in files that are smaller than the raw, uncompressed originals. When you store a high-quality photo as a JPEG, for example, the result is a somewhat lower-quality photo, in some cases with blurring or visual artifacts added. A lossy-compression algorithm still stores the photo, but it’s an approximation rather than the exact file. It’s called lossy compression because some of the data are lost.

From a technical perspective, this compression process is much like what happens inside AI models, as researchers from several AI companies and universities have explained to me in the past few months. They ingest text and images, and output text and images that approximate those inputs.

But this simple description is less useful to AI companies than the learning metaphor, which has been used to claim that the statistical algorithms known as AI will eventually make novel scientific discoveries, undergo boundless improvement, and recursively train themselves, possibly leading to an “intelligence explosion.” The whole industry is staked on a shaky metaphor.

[Image: promotional photo from the TV show Garfunkel and Oates. Source: Courtesy of Kyle Christy / IFC]

[Image: output from Stable Diffusion 1.4]

The problem becomes clear if we look at AI image generators. In September 2022, Emad Mostaque, a co-founder and the then-CEO of Stability AI, explained in a podcast interview how Stable Diffusion, Stability’s image model, was built. “We took 100,000 gigabytes of images and compressed it to a two-gigabyte file that can re-create any of those and iterations of those” images, he said.

One of the many experts I spoke with while reporting this article was an independent AI researcher who has studied Stable Diffusion’s ability to reproduce its training images. (I agreed to keep the researcher anonymous, because they fear repercussions from major AI companies.) Above is one example of this ability: On the left is the original from the web—a promotional image from the TV show Garfunkel and Oates—and on the right is a version that Stable Diffusion generated when prompted with a caption the image appears with on the web, which includes some HTML code: “IFC Cancels Garfunkel and Oates.” Using this simple technique, the researcher showed me how to produce near-exact copies of several dozen images known to be in Stable Diffusion’s training set, most of which include visual residue that looks something like lossy compression—the kind of glitchy, fuzzy effect you may notice in your own photos from time to time.

[Image: original artwork by Karla Ortiz (The Death I Bring, 2016, graphite). Source: Karla Ortiz]

[Image: output from Stability's Reimagine XL product (based on Stable Diffusion XL). Source: United States District Court, Northern District of California]

Above is another pair of images taken from a lawsuit against Stability AI and other companies. On the left is an original work by Karla Ortiz, and on the right is a variation from Stable Diffusion. Here, the image is a bit further from the original. Some elements have changed. Instead of compressing at the pixel level, the algorithm appears to be copying and manipulating objects from multiple images, while maintaining a degree of visual continuity.

As companies explain it, AI algorithms extract “concepts” from training data and learn to make original work. But the image on the right is not a product of concepts alone. It’s not a generic image of, say, “an angel with birds.” It’s difficult to pinpoint why any AI model makes any specific mark in an image, but we can reasonably assume that Stable Diffusion can render the image on the right partly because it has stored visual elements from the image on the left. It isn’t collaging in the physical cut-and-paste sense, but it also isn’t learning in the human sense the word implies. The model has no senses or conscious experience through which to make its own aesthetic judgments.

Google has written that LLMs store not copies of their training data but rather the “patterns in human language.” This is true on the surface but misleading once you dig into it. As has been widely documented, when a company uses a book to develop an AI model, it splits the book’s text into tokens or word fragments. For example, the phrase hello, my friend might be represented by the tokens he, llo, my, fri, and end. Some tokens are actual words; some are just groups of letters, spaces, and punctuation. The model stores these tokens and the contexts in which they appear in books. The resulting LLM is essentially a huge database of contexts and the tokens that are most likely to appear next.
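To see this token splitting in practice, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer (an assumption for illustration; each model family ships its own tokenizer, and the he/llo/my/fri/end split above is a hypothetical example):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the encoding used by several OpenAI models
tokens = enc.encode("hello, my friend")
print(tokens)                                # integer token IDs
print([enc.decode([t]) for t in tokens])     # the text fragment behind each ID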

The model can be visualized as a map. Here’s an example, with the actual most-likely tokens from Meta’s Llama-3.1-70B:

[Image: flow chart of most-likely next-token paths. Source: The Atlantic / Llama]

When an LLM “writes” a sentence, it walks a path through this forest of possible token sequences, making a high-probability choice at each step. Google’s description is misleading because the next-token predictions don’t come from some vague entity such as “human language” but from the particular books, articles, and other texts that the model has scanned.

By default, models will sometimes diverge from the most probable next token. This behavior is often framed by AI companies as a way of making the models more “creative,” but it also has the benefit of concealing copies of training text.

Sometimes the language map is detailed enough that it contains exact copies of whole books and articles. This past summer, a study of several LLMs found that Meta’s Llama 3.1-70B model can, like Claude, effectively reproduce the full text of Harry Potter and the Sorcerer’s Stone. The researchers gave the model just the book’s first few tokens, “Mr. and Mrs. D.” In Llama’s internal language map, the text most likely to follow was: “ursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much.” This is precisely the book’s first sentence. Repeatedly feeding the model’s output back in, Llama continued in this vein until it produced the entire book, omitting just a few short sentences.
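The extraction technique the researchers describe amounts to greedy decoding in a loop: always take the most likely next token and feed it back in. A minimal sketch with the Hugging Face transformers library; the model name is illustrative (the study’s model is gated and very large), and any causal language model loads the same way:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-70B"  # illustrative placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

ids = tok("Mr. and Mrs. D", return_tensors="pt").input_ids
for _ in range(100):  # extend 100 tokens; the researchers looped until the book ended
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax()              # always the single most likely token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))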

Using this technique, the researchers also showed that Llama had losslessly compressed large portions of other works, such as Ta-Nehisi Coates’s famous Atlantic essay “The Case for Reparations.” Prompted with the essay’s first sentence, the model emitted more than 10,000 words, or two-thirds of the essay, verbatim. Large extractions also appear to be possible from Llama 3.1-70B for George R. R. Martin’s A Game of Thrones, Toni Morrison’s Beloved, and others.

The Stanford and Yale researchers also showed this week that a model’s output can paraphrase a book rather than duplicate it exactly. For example, where A Game of Thrones reads “Jon glimpsed a pale shape moving through the trees,” the researchers found that GPT-4.1 produced “Something moved, just at the edge of sight—a pale shape, slipping between the trunks.” As in the Stable Diffusion example above, the model’s output is extremely similar to a specific original work.

This isn’t the only research to demonstrate the casual plagiarism of AI models. “On average, 8–15% of the text generated by LLMs” also exists on the web, in exactly that same form, according to one study. Chatbots are routinely breaching the ethical standards that humans are normally held to.

Memorization could have legal consequences in at least two ways. For one, if memorization is unavoidable, then AI developers will have to somehow prevent users from accessing memorized content, as law scholars have written. Indeed, at least one court has already required this. But existing techniques are easy to circumvent. For example, 404 Media has reported that OpenAI’s Sora 2 would not comply with a request to generate video of a popular video game called Animal Crossing but would generate a video if the game’s title was given as “‘crossing aminal’ [sic] 2017.” If companies can’t guarantee that their models will never infringe on a writer’s or artist’s copyright, a court could require them to take the product off the market.

A second reason that AI companies could be liable for copyright infringement is that a model itself could be considered an illegal copy. Mark Lemley, a Stanford law professor who has represented Stability AI and Meta in such lawsuits, told me he isn’t sure whether it’s accurate to say that a model “contains” a copy of a book, or whether “we have a set of instructions that allows us to create a copy on the fly in response to a request.” Even the latter is potentially problematic, but if judges decide that the former is true, then plaintiffs could seek the destruction of infringing copies. Which means that, in addition to fines, AI companies could in some cases face the possibility of being legally compelled to retrain their models from scratch, with properly licensed material.

In a lawsuit, The New York Times alleged that OpenAI’s GPT-4 could reproduce dozens of Times articles nearly verbatim. OpenAI (which has a corporate partnership with The Atlantic) responded by arguing that the Times used “deceptive prompts” that violated the company’s terms of service and prompted the model with sections from each of those articles. “Normal people do not use OpenAI’s products in this way,” the company wrote, and even claimed “that the Times paid someone to hack OpenAI’s products.” The company has also called this type of reproduction “a rare bug that we are working to drive to zero.”

But the emerging research is making clear that the ability to plagiarize is inherent to GPT-4 and all other major LLMs. None of the researchers I spoke with thought that the underlying phenomenon, memorization, is unusual or could be eradicated.

In copyright lawsuits, the learning metaphor lets companies make misleading comparisons between chatbots and humans. At least one judge has repeated these comparisons, likening an AI company’s theft and scanning of books to “training schoolchildren to write well.” There have also been two lawsuits in which judges ruled that training an LLM on copyrighted books was fair use, but both rulings were flawed in their handling of memorization: One judge cited expert testimony that showed that Llama could reproduce no more than 50 tokens from the plaintiffs’ books, though research has since been published that proves otherwise. The other judge acknowledged that Claude had memorized significant portions of books but said that the plaintiffs had failed to allege that this was a problem.

Research on how AI models reuse their training content is still primitive, partly because AI companies are motivated to keep it that way. Several of the researchers I spoke with while reporting this article told me about memorization research that has been censored and impeded by company lawyers. None of them would talk about these instances on the record, fearing retaliation from companies.

Meanwhile, OpenAI CEO Sam Altman has defended the technology’s “right to learn” from books and articles, “like a human can.” This deceptive, feel-good idea prevents the public discussion we need to have about how AI companies are using the creative and intellectual works upon which they are utterly dependent.

Using slide presentations to describe Pipi

Mike's Notes

Thoughts on how to give useful slide presentation talks about Pipi, record them, and make them available on YouTube as a way to explain how Pipi works.

Resources

References

  • Content Management Bible 2nd Ed., by Bob Boiko. Wiley. 2005.

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

09/02/2026

Using slide presentations to describe Pipi

By: Mike Peters
On a Sandy Beach: 09/02/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

I gave a slide talk last night at the regular Open Research Group online meeting about future blog posts being created by a human using a Workspace, transferred to the CMS Engine (cms), processed, and then automatically published to Google Blogger. Creating the slides made me realise the opportunity available to use this format to visually explain the many parts of Pipi simply.

I will give a slide presentation on the Workspace Engine (wsp) at the next meeting. I will also give one on the Workspaces for Screen to the local Film Industry group this month.

Backstory

When I was a young adult, I went around with a group of good people, many of whom have since become lifelong friends, who encouraged me to give some talks. The only problem was that their approach was to write the talk in advance and then read it aloud to the audience. They were all very good at it, and I was hopeless.

  • The first problem was that I found it impossible to write.
  • The second problem was reading out loud what was written. I tripped over the words.

Years later, I had this idea that just talking about something in front of me might work a lot better: a picture, a map, or a physical gadget, for example. I have no problem talking about something I understand.

When I became National President of NZERN, I had to give many talks, and that was the method: show slides and just talk about the pictures or diagrams without using notes, unless there was a name or date to remember. I often used a whiteboard to draw answers for people who asked questions in a meeting.

I ended up giving hundreds of talks at conferences and workshops across NZ. The longest was 2 1/2 hours, given to the South Island DOC IMU workshop about the NZERN GIS project using ESRI software, and it was highly technical. No notes, just 50 slides.

Using computers like a typewriter has been a tremendous help because of cut-and-paste, which is much easier than shuffling bits of paper. Besides, I use Arial 16pt, which is much easier to read than my handwriting.

Mrs Grammarly

Later, I learned to use assistive technology to help me write. Grammarly Pro rewrites every single sentence that has my name on it, including this post. Grammarly is set on formal British business English. I hope my personal secretary, Mrs Grammarly, is doing a good job.

Big Challenge

Pipi is largely undocumented because it was designed and built visually. There must be thousands of hand-coloured drawings on A4 paper, some neatly filed in 50+ three-hole A4 ring binders, and the rest in many cartons waiting to be filed. Pipi needs to be documented so others can use it. There is steadily growing interest in Pipi worldwide.

Solutions

  1. Getting Pipi to self-document is well underway, using structured templates that render from hundreds of databases. A rough estimate is that 20,000 web pages of developer technical documentation will be required due to the scale and scope of this enterprise platform.
  2. Setting up a community forum where users can ask questions and provide answers will take the load off me.
  3. I also need to explain verbally the more complicated bits that I find too difficult to write about. Give a slide presentation and record it to share on YouTube.
  4. Use screen capture to record live demos of Pipi in use.
  5. Provide regular Office Hours that can be booked for video chats via Google Meet or Zoom. I'm doing that a lot, and it seems to work.
  6. Record video interviews with the people who wrote most of the articles that I have copied and republished on this engineering blog, On a Sandy Beach. They could be two-way and a chance to discuss some deep issues.
  7. Teaching someone something complex by making it simple is the best way to learn it. So, giving many talks will also help me understand more clearly.

Slide Presentations

Here is a possible list of some overview talks about just one engine as an example. Then there could be more detailed talks on the same subjects. There are hundreds of Agent Engines. Each talk could have about 10-20 slides.

CMS Engine

  • 101 Introduction
  • 102 Content Management System
  • 103 Publication
  • 104 Website
  • 105 Blog
  • 106 Wiki
  • 107 Docs
  • 108 Help
  • 109 Workspace

Next Steps

Once I get into the swing of it, it should get easier. I need to learn to speak more slowly, develop a visual style for the slides, establish a simple slide-naming convention, and address related details. Each slide set will need a webpage offering the PDF/PowerPoint/Google Slides download, the YouTube video, a printable PDF handout, and links to related information.

The recorded slides, talks, and demos could all be organised using the existing Diataxis framework and Learning Objects, which Pipi uses elsewhere.

GIS mapping options

Mike's Notes

Some thoughts about adding Geographic Information Systems (GIS) mapping to Pipi 10, the next major release of Pipi.

Spacetime

An unresolved issue is how to integrate GIS with 4D spacetime. See the work of ontologists Chris Partridge on BORO and NATO, and Matthew West on 4Dism, Shell oil refineries, and the ontological foundations behind the UK Digital Twin Project for Built Infrastructure. Chris Partridge also raised a related question in the Ontolog Forum recently.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

09/02/2026

GIS mapping options

By: Mike Peters
On a Sandy Beach: 08/02/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Pipi 4 (2005-2008)

NZERN, with the help of Parker Jones at Eagle Technology, secured an ESRI Conservation GIS grant to add GIS mapping capabilities.

I gave a live demo and presentation of Pipi 4 at an NZ ESRI User Conference held in Wellington. The head of engineering at ESRI was in the audience. In a few weeks, over NZ$600,000 worth of ESRI software was on its way.

ESRI gave us everything they had: multiple (up to 10) licenses at version 8.2.

  • ArcIMS
  • ArcSDE
  • Workstation
  • All the extensions.
  • Everything!
There was so much boxed software that it came on a pallet.

The plan was to provide free, dedicated, and customised web map hosting for every conservation project in NZ that wanted it. The smallest mapped project was 1500 sq m, scaling up to large landscape-scale, whole catchment projects. Every project was different, so the provided GIS was customised to meet their needs. There was even a visit by an ESRI staff member who proposed enabling NZERN to extend this to conservation efforts in the Pacific Island states, by providing training using ESRI-supplied laptops.

I got to create all the GeoDatabases and hack JTX to run in reverse to manage user map-edits history.

It was going very well, and many individual projects were getting dedicated dynamic web maps with all their data and GIS layers. All labour was donated (ten thousand hours).

QE2 National Trust and other national conservation-related organisations were also interested in using this shared GIS system.

Then the government funding that covered the core annual running costs dried up.

Core costs included:

  • Power
  • Bandwidth
  • Hardware
  • Repairs
  • Software books

Then the Key government came in, followed by the Christchurch Earthquake.
What a waste of opportunity for conservation.

Since then, governments have loved to reinvent the wheel, so there have been many well-funded attempts to develop GIS for biodiversity and stream health for community use in NZ. None of them has been as good as Pipi, and most disappear after a while. So we are going to do something about that, except this time it will be available globally, in many human languages, across many industries, and will use open-source GIS software.

Parker Jones, with Bonita, went on to create a GIS for Conservation organisation in NZ, and has done a great job. All power to them.

Pipi 9 (2023 - )

GIS Plugins include

  • Apple Map
  • ArcGIS Map
  • Azure Map
  • Google Map

Pipi 10

Customers will be able to integrate Pipi with their own ESRI GIS account deployments. Pipi GIS will use OGC standards. I have to say here that I love ESRI software, and the Eagle Technology people were great, but it is far too expensive and restrictive for this social-enterprise startup.

Options

Use open-source software: it's free, and we can DIY everything.

Default Option

  • QGIS
  • GeoServer 3
  • GeoNode
  • PostGIS + PostgreSQL

Open Geospatial Consortium (OGC) 

These products are mature and conform to the OGC standards.
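Because the stack speaks OGC standards, clients can stay vendor-neutral. As a minimal sketch, the OWSLib Python library can fetch a map image from any compliant WMS endpoint; the server URL and layer name below are hypothetical.

from owslib.wms import WebMapService  # pip install OWSLib

# Hypothetical GeoServer WMS endpoint; any OGC-compliant server works the same way.
wms = WebMapService("https://maps.example.org/geoserver/ows", version="1.1.1")
print(list(wms.contents))  # layer names advertised by the server

img = wms.getmap(
    layers=["pipi:projects"],           # hypothetical layer
    srs="EPSG:4326",
    bbox=(166.0, -47.5, 179.0, -34.0),  # roughly New Zealand
    size=(800, 600),
    format="image/png",
)
with open("map.png", "wb") as f:
    f.write(img.read())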

Geospatial Libraries

  • FDO – API (C++, .Net) between GIS application and sources; for manipulating, defining and analysing geospatial data.
  • GDAL/OGR – Library between GIS applications and sources; for reading and writing raster geospatial data formats (GDAL) and simple features vector data (OGR).
  • GeoTools – Open source GIS toolkit (Java); to enable the creation of interactive geographic visualization clients.
  • GEOS – A C++ port of the Java Topology Suite (JTS), a geometry model.
  • MetaCRS – Projections and coordinate system technologies, including PROJ.
  • Orfeo ToolBox (OTB) – Open source tools to process satellite images and extract information.
  • OSSIM – Extensive geospatial image processing libraries with support for satellite and aerial sensors and common image formats.
  • PostGIS – Spatial extensions for the PostgreSQL database, enabling geospatial queries.

Desktop Applications

  • QGIS – Desktop GIS for data viewing, editing and analysis — Windows, Mac and Linux.
  • GRASS GIS – an extensible GIS for image processing and analysing raster, topological vector and graphic data.
  • OSSIM – Libraries and applications used to process imagery, maps, terrain, and vector data.
  • Marble – Virtual globe and world atlas.
  • gvSIG – Desktop GIS for data capturing, storing, handling, analysing and deploying. Includes map editing.
  • uDig – User-friendly Desktop Internet GIS, built on the Eclipse platform.

Web Mapping Server

  • MapServer – Fast web mapping engine for publishing spatial data and services on the web; written in C.
  • Geomajas – Development software for web-based and cloud-based GIS applications.
  • GeoServer – Allows users to share and edit geospatial data. Written in Java using GeoTools.
  • deegree – Java framework
  • PyWPS – implementation of the OGC Web Processing Service standard, using Python
  • pygeoapi – A Python server implementation of the OGC API suite of standards for geospatial data.

Web Mapping Client

  • GeoMoose – JavaScript Framework for displaying distributed GIS data.
  • Mapbender – Framework to display, overlay, edit and manage distributed Web Map Services using PHP and JavaScript.
  • MapGuide Open Source – Platform for developing and deploying web mapping applications and geospatial web services. Windows-based, native file format.
  • MapFish – Framework for building rich web-mapping applications based on the Pylons Python web framework.
  • OpenLayers – an AJAX library (API) for accessing geographic data layers of all kinds.

Hosting

The GeoServer and PostGIS + PostgreSQL Geodatabase will need to be deployed in the Pipi Data Centre and used by the spatial agent engine. Providing hosted GIS to customers will require Ajabbi to purchase or lease bare-metal servers to host open-source GIS Web Servers.
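As a sketch of the kind of spatial query the spatial agent engine might run against PostGIS (the table, connection string, and coordinates are hypothetical):

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=pipi_gis user=pipi")  # hypothetical DSN
cur = conn.cursor()

# Find project geometries within 5 km of a point (Christchurch, lon/lat),
# using geography casts so the distance is measured in metres.
cur.execute(
    """
    SELECT name, ST_AsGeoJSON(geom)
    FROM projects
    WHERE ST_DWithin(geom::geography, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s)
    """,
    (172.63, -43.53, 5000),
)
for name, geojson in cur.fetchall():
    print(name, geojson)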

GeoServer 3 will be available in Docker. All doable.
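GeoServer also exposes a REST API that Pipi could use to provision customer workspaces automatically. A minimal sketch with the Python requests library; the local URL, default credentials, and workspace name are assumptions, and the endpoint shape follows current GeoServer REST conventions.

import requests  # pip install requests

# Create a new workspace on a local test instance.
resp = requests.post(
    "http://localhost:8080/geoserver/rest/workspaces",
    json={"workspace": {"name": "pipi"}},
    auth=("admin", "geoserver"),  # GeoServer's well-known defaults; change in production
    timeout=30,
)
resp.raise_for_status()
print("workspace created:", resp.status_code)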

Support

Sponsor open-source and pay for support from GeoSolutions, etc.

Using Google Blogger API v3

Mike's Notes

On a Sandy Beach is a publication of Ajabbi Research.

Changes needed

Enable other people at Ajabbi Research to also contribute to On a Sandy Beach using the Workspaces for Research UI.

It looks very straightforward. This job will be done once the Workspaces for Research are available for researchers to use.

Preparation

Last year, all existing blog posts were reformatted.

New setup

The Pipi CMS Engine (cms) will format, store in a database, and export content to On a Sandy Beach, hosted on Google Blogger, using the Blogger API ver 3.

Either XML/Atom or JSON can be used.

Steps

  1. Import the existing blog posts into the CMS Engine via the Blogger API (a minimal sketch follows this list)
  2. Reformat every post (see 4th resource link below)
  3. Store in the CMS database
  4. Render posts
  5. Export back to Blogger via the Blogger API.
  6. All future posts will be created by a human using a workspace, transferred to the CMS Engine, processed, and then published to Google Blogger.
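As a minimal sketch of step 1, the Google API Python client can list existing posts. The API key and blog ID are placeholders; writes additionally require OAuth, as covered later in this post.

# pip install google-api-python-client
from googleapiclient.discovery import build

# An API key is enough for reading a public blog.
service = build("blogger", "v3", developerKey="YOUR-API-KEY")
posts = service.posts().list(blogId="BLOG-ID", maxResults=20).execute()
for post in posts.get("items", []):
    print(post["title"], post["url"])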

Notes

Here are some initial notes taken from the Google Blogger API documentation and other sites, tweaked using Gemini and rewritten by Grammarly.

These notes will be presented using slides at tomorrow night's online Open Research Group meeting.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

09/02/2026

Using Google Blogger API v3

By: Mike Peters
On a Sandy Beach: 07/02/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Blogger supports manual import and export of blog content via the Blogger dashboard or the Blogger API. The export file format for both methods is Atom, an XML-based format that includes all posts and comments. 

Blogger API for Import/Export

Developers can use the Blogger API (v3 is the current version) to programmatically manage blog content. The export file generated via the manual process uses the same Atom format as the API's feed requests. 

Key Points for API Use

  • Authentication: All operations for private data (including import/export) require authentication, typically using OAuth 2.0.
  • Format: The API uses REST and JSON for standard operations. The specific import/export functions rely on the raw XML/Atom format described in the developer guides.
  • Functionality: The API allows retrieving, creating, updating, and deleting posts, comments, and pages, enabling the creation of custom import/export tools. For example, you can use API clients for Python, Java, or Node.js to manage content programmatically.
  • Custom Import Scripts: Developers can write scripts (e.g., in PHP or Python) to import content from other sources into Blogger via the API, which involves parsing the source data and making POST requests to Blogger API endpoints (see the sketch below).
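Here is a minimal sketch of an authenticated insert using the Python client library and OAuth 2.0; the client-secret file, blog ID, and post body are placeholders.

# pip install google-api-python-client google-auth-oauthlib
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# client_secret.json comes from a Google Cloud OAuth client; hypothetical here.
flow = InstalledAppFlow.from_client_secrets_file(
    "client_secret.json",
    scopes=["https://www.googleapis.com/auth/blogger"],
)
creds = flow.run_local_server(port=0)

service = build("blogger", "v3", credentials=creds)
body = {
    "kind": "blogger#post",
    "title": "Hello from the CMS Engine",   # placeholder content
    "content": "<p>Rendered by Pipi.</p>",
}
service.posts().insert(blogId="BLOG-ID", body=body).execute()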

For detailed API documentation, refer to the Blogger API Developer site.

REST

The Blogger API is a RESTful interface provided by Google that allows developers to integrate Blogger content and functionality into their own applications. It enables programmatic access to resources such as blogs, posts, comments, pages, and users via HTTP requests and JSON. 

Key Features and Concepts

  • Resources: The API revolves around five core resource types:
    • Blogs: The central container for all content and metadata.
    • Posts: The primary published content items, meant to be timely.
    • Comments: User reactions to specific posts.
    • Pages: Static content (e.g., about me, contact info).
    • Users: Represents a non-anonymous person interacting with Blogger, as an author, admin, or reader.
  • Operations: Developers can perform various operations on these resources, including list, get, insert (create), update, patch, and delete.
  • Authentication:
    • Public Data: Requests for public data (e.g., retrieving a public blog post) only require an API key.
    • Private Data: Operations involving private user data (e.g., creating a post, editing a comment) must be authorised using OAuth 2.0 tokens.
  • API Version: The latest recommended version is Blogger API v3. Support for the older v2.0 API ended on September 30, 2024, so applications must be updated to continue functioning. 

How to Access the API

Google recommends using their client libraries, which handle much of the authorisation and request/response processing for you. Libraries are available for a variety of programming languages: 

  • Go
  • Java (get started with the Java client library)
  • JavaScript
  • .NET (install the Google APIs NuGet package)
  • Node.js
  • PHP
  • Python (install the client library using pip install google-api-python-client)
  • Ruby 

Alternatively, you can interact with the API directly using RESTful HTTP requests. The Google APIs Explorer tool lets you test API calls in your browser. 

Within Pipi, these REST calls could be made from:

  • Pipi API Engine (api) using CFML
  • BoxLang API using BX

For full documentation and developer guides, visit the official Blogger API documentation site on Google for Developers. 

Older Data API (v1/v2) Format: Atom XML 

The previous versions of the Blogger API relied heavily on the Atom Publishing Protocol (AtomPub) and Google Data API feeds for managing blog content. The API used standard HTTP methods (GET, POST, PUT, DELETE) to transport Atom-formatted XML payloads. 

Important: Support for the v2.0 Google Data API ended on September 30, 2024, so applications must use the latest version to continue functioning.

Exporting and Accessing Feeds via XML

Blogger still uses Atom XML for blog syndication and content backup: 

  • Public Blog Feed: Every Blogger blog has a public Atom feed, typically at an address like https://yourblogname.blogspot.com/feeds/posts/default.
  • Content Backup: Users can back up their blog's posts and comments as a single .xml file from the Blogger dashboard.

    • Sign in to Blogger.
    • Select your blog.
    • Go to Settings > Manage Blog.
    • Click Back up content and Download XML file. 
    • This downloaded file is in a specific Atom format for import and export. 

REST in the Blogger API

The supported Blogger operations map directly to REST HTTP verbs, as described in Blogger API operations.

The specific formats for Blogger API URIs are:

  • https://www.googleapis.com/blogger/v3/users/userId
  • https://www.googleapis.com/blogger/v3/users/self
  • https://www.googleapis.com/blogger/v3/users/userId/blogs
  • https://www.googleapis.com/blogger/v3/users/self/blogs
  • https://www.googleapis.com/blogger/v3/blogs/blogId
  • https://www.googleapis.com/blogger/v3/blogs/byurl
  • https://www.googleapis.com/blogger/v3/blogs/blogId/posts
  • https://www.googleapis.com/blogger/v3/blogs/blogId/posts/bypath
  • https://www.googleapis.com/blogger/v3/blogs/blogId/posts/search
  • https://www.googleapis.com/blogger/v3/blogs/blogId/posts/postId
  • https://www.googleapis.com/blogger/v3/blogs/blogId/posts/postId/comments
  • https://www.googleapis.com/blogger/v3/blogs/blogId/posts/postId/comments/commentId
  • https://www.googleapis.com/blogger/v3/blogs/blogId/pages
  • https://www.googleapis.com/blogger/v3/blogs/blogId/pages/pageId

The full explanation of the URIs used and the results for each supported operation in the API is summarised in the Blogger API Reference document.

Examples

List the blogs that the authenticated user has access rights to:

  • GET https://www.googleapis.com/blogger/v3/users/self/blogs?key=YOUR-API-KEY

Get information about the code.blogger.com blog, which has blog ID 3213900:

  • GET https://www.googleapis.com/blogger/v3/blogs/3213900?key=YOUR-API-KEY

REST from JavaScript

You can invoke the Blogger API from JavaScript using the callback query parameter and a callback function. When the browser loads the script, the callback function is executed, and the response is passed to it. This approach allows you to write rich applications that display Blogger data without requiring server-side code.

The following example retrieves a post from the code.blogger.com blog, after you replace YOUR-API-KEY with your API key.

<html>
  <head>
    <title>Blogger API Example</title>
  </head>
  <body>
    <div id="content"></div>
    <script>
      // JSONP callback: Blogger invokes this with the requested post resource.
      function handleResponse(response) {
        document.getElementById("content").innerHTML +=
          "<h1>" + response.title + "</h1>" + response.content;
      }
    </script>
    <!-- Loads the post and passes it to handleResponse via the callback parameter. -->
    <script src="https://www.googleapis.com/blogger/v3/blogs/3213900/posts/8398240586497962757?callback=handleResponse&key=YOUR-API-KEY"></script>
  </body>
</html>

Data format

JSON (JavaScript Object Notation) is a common, language-independent data format that provides a simple text representation of arbitrary data structures. For more information, see json.org.

Blogger API operations

You can invoke a number of different methods on collections and resources in the Blogger API, as described in the list below.

  • list – Lists all resources within a collection. (GET on a collection URI.)
  • get – Gets a specific resource. (GET on a resource URI.)
  • getByUrl – Gets a resource, looking it up by URL. (GET with the URL passed in as a parameter.)
  • getByPath – Gets a resource by looking it up by its path. (GET with the path passed in as a parameter.)
  • listByUser – Lists resources owned by a user. (GET on a user-owned collection.)
  • search – Searches for resources based on a query parameter. (GET on a search URL, with the query passed in as a parameter.)
  • insert – Creates a resource in a collection. (POST on a collection URI.)
  • delete – Deletes a resource. (DELETE on a resource URI.)
  • patch – Updates a resource using patch semantics. (PATCH on a resource URI.)
  • update – Updates a resource. (PUT on a resource URI.)

Each resource type supports a subset of these methods, shown below. All list and get operations on private blogs require authentication.

  • Blogs: get, getByUrl, listByUser
  • Posts: list, get, getByPath, search, insert, delete, patch, update
  • Comments: list, get
  • Pages: list, get
  • Users: get