Diffing AOSP Releases
Every time a new Android version lands I want the same narrow thing: which
repositories changed, and which of those changes are worth opening the source
for. AOSP is a few hundred git projects stitched together by a repo manifest,
so “what’s new in Android 17” is never one diff. It’s the union of a few hundred
per-project diffs, almost all of them routine, a handful of them the actual
story.
I got tired of eyeballing the manifest by hand, so for the AOSP Internals
book I wrote a small tool to answer the question
mechanically. It lives in the book repo as tools/manifest_snapshot.py, and its
output lands under manifest-snapshots/. This post is how it works, and how the
mechanical part hands off to the part that can’t be automated: reading the
source.
Why diff by manifest
A repo manifest pins every project to a remote and a revision. After
repo sync on a release branch, each project sits on one commit. So a release
is just a mapping from project path to commit SHA, and the difference between two
releases is four buckets:
- Added — paths in the new release that weren’t in the old one.
- Removed — paths that were in the old release and are gone.
- Moved — same path, different SHA.
- Unchanged — same path, same SHA.
That split is the whole tool. Once you know a project moved, the local git object
store already has both endpoints, so git log old..new gives you the exact
commits that landed there. No network, no guessing. The manifest is the index;
git holds the bodies.
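A minimal sketch of the four-bucket split, assuming each release has already been reduced to a plain dict of project path to commit SHA. The names ManifestDiff and diff_releases are hypothetical and are not taken from tools/manifest_snapshot.py.

```python
from typing import Dict, List, NamedTuple, Tuple


class ManifestDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    moved: List[Tuple[str, str, str]]  # (path, old_sha, new_sha)
    unchanged: List[str]


def diff_releases(old: Dict[str, str], new: Dict[str, str]) -> ManifestDiff:
    """Sort project paths into added / removed / moved / unchanged."""
    added = sorted(new.keys() - old.keys())      # only in the new release
    removed = sorted(old.keys() - new.keys())    # only in the old release
    common = sorted(old.keys() & new.keys())     # present in both
    moved = [(p, old[p], new[p]) for p in common if old[p] != new[p]]
    unchanged = [p for p in common if old[p] == new[p]]
    return ManifestDiff(added, removed, moved, unchanged)
```

Each moved tuple carries both endpoints, which is all a later git log old..new needs.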
snapshot → compare → history
Four subcommands (snap, history, compare, compare-history), three steps.
snap — pin a release
python3 tools/manifest_snapshot.py snap --aosp-root $ANDROID_BUILD_TOP
repo manifest -r --pretty resolves every project to its current HEAD SHA, and
the tool writes that pinned XML plus a small metadata.json under
manifest-snapshots/<branch>/<date>/. One repo manifest call, no history walk,
so it’s quick. A snapshot is the project-to-SHA mapping for one release, frozen
so you can diff against it later.
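As an illustration of what a snapshot boils down to, here is a sketch that reads a pinned manifest into a path-to-SHA dict. It assumes the standard repo manifest layout, where each project element carries name, an optional path that defaults to name, and a revision. The function name load_pinned_manifest is hypothetical, and the real tool may parse the XML differently.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def load_pinned_manifest(xml_text: str) -> Dict[str, Optional[str]]:
    """Map each top-level project's checkout path to its pinned revision."""
    root = ET.fromstring(xml_text)
    default = root.find("default")
    default_rev = default.get("revision") if default is not None else None

    mapping: Dict[str, Optional[str]] = {}
    # findall() only visits direct children, so nested sub-projects are
    # deliberately ignored in this sketch.
    for project in root.findall("project"):
        name = project.get("name")
        path = project.get("path", name)  # path falls back to name
        # A pinned manifest should carry a SHA on every project; the
        # default revision is only a fallback for unpinned input.
        mapping[path] = project.get("revision", default_rev)
    return mapping
```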
The metadata records only portable facts:
{
"schema_version": 1,
"captured_at": "2026-06-19T11:20:56+00:00",
"captured_at_unix": 1781889656,
"default_revision": "android17-release",
"default_remote": "aosp",
"manifest_branch": "android17-release",
"repo_version": "v2.54",
"label": "android17-release",
"notes": ""
}
aosp_root, host, hostname, user, and cwd are rejected by a validator,
not just omitted, so a committed snapshot can’t leak where it was taken.
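A validator of that kind could look roughly like the sketch below. The key list comes from the paragraph above. The function name validate_metadata and the choice of ValueError are assumptions, not the tool's actual code.

```python
FORBIDDEN_KEYS = frozenset({"aosp_root", "host", "hostname", "user", "cwd"})


def validate_metadata(metadata: dict) -> dict:
    """Refuse metadata that would reveal where the snapshot was taken."""
    leaked = sorted(FORBIDDEN_KEYS.intersection(metadata))
    if leaked:
        raise ValueError(
            "metadata contains host-specific keys: " + ", ".join(leaked)
        )
    return metadata
```

Rejecting rather than silently dropping the keys means a caller that starts writing a hostname fails loudly before anything is committed.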
history — every commit, per repo
python3 tools/manifest_snapshot.py history --aosp-root $ANDROID_BUILD_TOP
This walks git log across every non-shallow project and writes one flat file,
manifest-snapshots/_history/<branch>_<date>.history.txt: a header, the skipped
list, then one section per repo with its SHA and full <sha> <subject> log. It’s
slow (minutes) and big (hundreds of MB on a full tree). The reason to bother is
that it captures each release’s commit list on its own, which the next step
needs.
compare — the diff
python3 tools/manifest_snapshot.py compare \
manifest-snapshots/<branchA>/<dateA> \
manifest-snapshots/<branchB>/<dateB> \
--aosp-root $ANDROID_BUILD_TOP
compare sorts projects into added/removed/moved/unchanged, then for each moved
project runs git log --no-merges --pretty=oneline old..new against the local
.repo/projects/<path>.git. compare-history does the same job from two
history files plus the newer snapshot, set-diffing the two SHA lists into “new
in B” and “dropped from A” per repo. Reach for compare-history when the two
sides were captured at slightly different sub-revisions: it keeps the project list
and the commit lists from the same source instead of mixing one release’s SHAs
with another’s log.
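The per-repo set-diff that compare-history performs can be sketched as follows, assuming each history section has been parsed into a list of (sha, subject) pairs. The name diff_commit_lists is hypothetical.

```python
from typing import List, Tuple

Commit = Tuple[str, str]  # (sha, subject)


def diff_commit_lists(
    a: List[Commit], b: List[Commit]
) -> Tuple[List[Commit], List[Commit]]:
    """Return (new_in_b, dropped_from_a) for one repo, keeping log order."""
    shas_a = {sha for sha, _ in a}
    shas_b = {sha for sha, _ in b}
    new_in_b = [c for c in b if c[0] not in shas_a]
    dropped_from_a = [c for c in a if c[0] not in shas_b]
    return new_in_b, dropped_from_a
```

Because both lists come from history files, neither side depends on what the local checkout currently points at.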
Either path produces one directory per comparison:
manifest-snapshots/_compare/<oldrev>-to-<newrev>/
  report.md          navigator: summary, moved-by-group, skipped, added/removed tables
  changes.txt        per-repo commit lists (NEW / DROPPED blocks)
  added-removed.txt  added/removed projects, with full inline history
Three artifacts, two of which you never read whole
The split across those three files is the one design decision I’d defend hardest, and it’s purely practical: one file is meant to be read top to bottom, and the other two never are.
report.md is the navigator. A few hundred KB, safe to read straight
through. It opens with a summary table, then the moved projects bucketed by their
manifest module group, then the skipped list and the added/removed tables. A
moved-projects row looks like this:
## Moved projects by module group
### Group: _ungrouped
| Project | Path | new | dropped | Compare |
|---|---|---|---|---|
| platform/packages/apps/UniversalMediaPlayer | `packages/apps/UniversalMediaPlayer` | 1 | 0 | <https://android.googlesource.com/platform/packages/apps/UniversalMediaPlayer/+log/520a29df..08d2f325> |
Path, commit count, and a Googlesource compare link per project. Start here. It tells you where to look without dropping a single commit body in your lap.
changes.txt is the body, and it’s too big to open. It’s a
kernel-changelog-style dump of every moved project’s commits, tens of MB. Grep the
one repo you care about:
D=manifest-snapshots/_compare/<dir>
grep -n -A 60 '^frameworks/base (' $D/changes.txt | head -80
The section that lands on:
frameworks/base (platform/frameworks/base)
old 45034f0663f960d9ee5fb0a101a4732b71f6e2f4 -> new 94b4c163b7dfe5ce3607f7bb8456f9573f7de57d
-----------------------------------------------------------------
NEW (15196):
e164d9a65f3e... Fix ActiveServicesTest compile failure by adding back import files
9dfed79281a8... Make ContextLogger do all of its work on the executor
b2b726fd471b... Fix privileged permission allowlist lookup in PackageManagerShellCommand
...
15,196 commits in one repo’s NEW block is the whole argument for grepping the
section you want instead of opening the file. Each header is <path> (<name>),
followed by NEW (n): (plus DROPPED (m): for compare-history).
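If a fixed -A window is too blunt, the same idea can be expressed as a streaming filter that stops at the next section header. This is a sketch that assumes every section starts with an unindented <path> (<name>) line and that no commit line has exactly that shape. Both are inferences from the excerpt above, not a documented format, and extract_section is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Assumed header shape: "<path> (<name>)" with nothing else on the line.
HEADER_RE = re.compile(r"^\S+ \(\S+\)$")


def extract_section(lines: Iterable[str], path: str) -> Iterator[str]:
    """Yield one project's section without loading the whole file."""
    prefix = f"{path} ("
    in_section = False
    for raw in lines:
        line = raw.rstrip("\n")
        if HEADER_RE.match(line):
            if in_section:
                break  # reached the next project's header
            in_section = line.startswith(prefix)
        if in_section:
            yield line


# Usage (the path is a placeholder):
# with open("changes.txt", encoding="utf-8", errors="replace") as fh:
#     for line in extract_section(fh, "frameworks/base"):
#         print(line)
```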
added-removed.txt carries the full inline history of each added and removed
project, so you can read a brand-new repo from its first commit. The header for an
added project, here the SDV reference device that showed up in 17:
## ADDED (84)
================================================================
device/google/sdv (device/google/sdv)
sha f4bfa128d5d9fb21adca8430694cc16c90b6144a groups: swcar, pdk
-----------------------------------------------------------------
Same rule: grep the one you’re after.
A concrete example: android-16 → android-17
Running this for the book against the real tree produced
manifest-snapshots/_compare/android-16.0.0_r4-to-android17-release/. The summary
in report.md:
| Category | Count |
|---|---|
| Moved | 696 |
| Added | 84 |
| Removed | 11 |
| Unchanged | 206 |
| New commits (total) | 137470 |
| Dropped commits | 715 |
137,470 commits is not a number anyone reads. The navigator’s job is to make it
irrelevant. You scan the moved-by-group tables, spot the projects with outsized
commit counts (platform/art around 890, build/soong around 1040,
external/angle around 1335), and read the added table for genuinely new
capabilities. That’s where a new external dependency like external/acpica shows
up as a signal that some platform feature arrived, not as code you have to digest
line by line.
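The scan for outsized counts can also be done mechanically. The sketch below pulls the moved-project rows out of report.md and sorts them by new-commit count. It assumes the five-column row layout shown earlier, and MovedRow and parse_moved_rows are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class MovedRow(NamedTuple):
    project: str
    path: str
    new: int
    dropped: int


def parse_moved_rows(report_lines: Iterable[str]) -> List[MovedRow]:
    """Collect moved-project table rows, largest NEW count first."""
    rows = []
    for line in report_lines:
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip header rows, separator rows, and tables of another shape.
        if len(cells) != 5 or not (cells[2].isdigit() and cells[3].isdigit()):
            continue
        rows.append(
            MovedRow(cells[0], cells[1].strip("`"), int(cells[2]), int(cells[3]))
        )
    return sorted(rows, key=lambda r: r.new, reverse=True)
```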
Handing off to source reading
The diff isn’t the answer. It’s the targeting system. It tells you, with no ambiguity, which projects and which commits changed, which is exactly what you need to read the right source instead of wandering the tree.
The workflow I settled on has two phases. First the tool produces the changeset, the three artifacts above, mechanical and reproducible. Then a human, or an agent team, reads source for the high-signal repos: open the actual files in the new release for each project the navigator flagged, and write up what the change is and why it matters, citing real paths.
I optimize for accuracy over coverage. A small writeup where every claim is checked against source beats a broad one that paraphrases commit subjects and quietly gets things wrong. The diff narrows “all of AOSP” down to “these 696 moved repos, ranked by commit volume and grouped by subsystem,” and you read source from there. Never the reverse.
One gap is easy to miss. A project-level diff catches new projects, but the
changes I most want are often new modules added inside a repo that merely moved.
A new system service in frameworks/base never appears as an added project. So
the source pass also greps the NEW commit subjects of the high-signal moved
repos for “add … service/module/manager/daemon” and checks each hit against
source. The diff is a cheap index into where to look; reading source is what makes
a claim trustworthy.
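One way to phrase that subject grep is the sketch below. The regular expression is an assumption about what “add … service/module/manager/daemon” should match; it is deliberately loose, since every hit is checked against source afterwards. It expects lines of the form <sha> <subject>, and candidate_additions is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

# "add", "adds", "added", or "adding" as a whole word, followed later in
# the subject by one of the keywords (also inside CamelCase names).
ADD_RE = re.compile(
    r"\badd(?:s|ed|ing)?\b.*(?:service|module|manager|daemon)",
    re.IGNORECASE,
)


def candidate_additions(commit_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (sha, subject) pairs whose subject looks like a new component."""
    hits = []
    for line in commit_lines:
        sha, _, subject = line.strip().partition(" ")
        if ADD_RE.search(subject):
            hits.append((sha, subject))
    return hits
```

The hits are leads, not findings: a subject that matches still has to be confirmed by opening the file it touched.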
The AI tasks it feeds
I built this for the book because “update the book for Android 17” is a hopeless thing to hand an agent directly. There’s no boundary and no way to check the result, and an LLM asked what changed in AOSP will produce a confident, plausible, wrong answer. The changeset turns that one unbounded request into a pile of small, bounded, checkable tasks. That’s the only reason the AI side works at all.
The tasks the three artifacts drove on the Android 17 pass:
- Prioritize. report.md ranks moved projects by commit volume, so the agent team spends effort where the release actually changed (platform/art, build/soong, the graphics stack) instead of re-reading chapters whose repos barely moved.
- Find what’s new. The added table, plus a grep of the NEW subjects of high-signal moved repos for “add … service/module/manager/daemon”, feeds a gap-analysis pass that finds capabilities with no chapter yet. That’s how the pass turned up the Software Defined Vehicle platform, the NPU Manager module, the LFI in-process sandbox, and a Trusty TEE target for the new Android Desktop form factor. None of those surface if you only re-read the chapters you already have.
- Rewrite, then disbelieve. For each flagged chapter, one agent rewrites the prose against the source the diff pointed at. A separate reviewer agent opens every cited path and checks the claim against this release, and another renders the diagrams to catch the ones that parse cleanly but draw the architecture wrong. The author wants to ship; the reviewer wants to disbelieve. Splitting them is what catches the confident mistake.
The thread through all of it: the diff is the leash. Every task an agent gets is scoped to “these repos changed, here are the commits,” never “all of AOSP,” so its context stays small and its output stays checkable against a specific file. The mechanical changeset is what keeps the AI honest, the same job the verify scripts and source citations do in the harness patterns I wrote about earlier. Without it you’re trusting an agent’s gut feeling about a 137,470-commit diff, which is the one thing you should never do.
The rules that keep it correct and safe
These are the invariants the tool depends on. Break one and either the output is wrong or the run damages your checkout.
- Read-only against the tree, always. Every subcommand runs only repo manifest -r and git log. It never fetches, gcs, commits, or touches refs; snap writes only inside manifest-snapshots/. The test suite asserts this as an invariant.
- Don’t read changes.txt or added-removed.txt whole. Tens of MB each. Grep the one repo section (^<path> (). report.md is the only file you read end to end.
- Don’t repo gc between the two syncs. For full-depth projects, switching the checkout to the newer release leaves the older SHAs in the object store, so compare can run git log old..new offline. A gc or prune in between deletes those objects, and the repo falls back to a Googlesource link instead of a real commit list.
- Keep the snapshot and the history coherent. A snapshot and a history file can sit on slightly different sub-revisions, with different SHAs. Don’t mix one release’s SHAs with another’s commit lists; use compare-history so the project set and the commit lists come from the same place.
- Slugify ref-name revisions. A default revision is often a tag like refs/tags/android-16.0.0_r4. Flatten the slashes so it becomes one safe path component instead of nested directories.
- Decode git output as bytes. Old and imported commit subjects carry raw non-UTF-8 bytes. Decode with errors="replace" or the run dies partway through.
- Skip what you can’t fairly diff. Shallow projects (a manifest clone-depth, or a live .repo/projects/<path>.git/shallow marker) can’t be diffed across a branch switch, so they’re listed as skipped rather than run through git log. manifest-snapshots/ignore-globs.txt filters obvious noise like prebuilts/* and toolchain/*.
- Treat external/* as a signal, not a changelog. A new external/* dependency means a platform capability arrived. Describe what pulled it in and why, not its internal history.
- Verify every source claim independently. Whatever the diff suggests, the writeup earns trust only when a separate pass opens each cited path in the real tree and confirms the claim for that release. The diff points; source confirms.
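Three of those rules are small enough to sketch as helpers. All names here are hypothetical. Stripping the refs/tags/ prefix is an assumption inferred from the android-16.0.0_r4-to-android17-release directory name, and the glob matching assumes fnmatch-style patterns, which may not be how ignore-globs.txt is actually interpreted.

```python
import fnmatch
import re
from typing import Iterable


def slugify_revision(revision: str) -> str:
    """Turn a ref name into a single safe path component."""
    rev = revision
    for prefix in ("refs/tags/", "refs/heads/"):  # assumed prefixes to drop
        if rev.startswith(prefix):
            rev = rev[len(prefix):]
            break
    # Anything outside a conservative character set, slashes included,
    # collapses to a single dash.
    return re.sub(r"[^A-Za-z0-9._-]+", "-", rev).strip("-")


def decode_git_output(raw: bytes) -> str:
    """Decode git output without failing on non-UTF-8 commit subjects."""
    return raw.decode("utf-8", errors="replace")


def is_ignored(path: str, globs: Iterable[str]) -> bool:
    """True if the project path matches any ignore glob."""
    return any(fnmatch.fnmatchcase(path, g) for g in globs)
```

With errors="replace", an undecodable byte becomes U+FFFD in the subject instead of raising UnicodeDecodeError halfway through a long run.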
What it doesn’t do
It’s deliberately small. It doesn’t parse commits semantically, group changes by
feature, or tell you what a change means. That’s the source-reading phase, and I
don’t think it should be automated. It also leans on the local object store: if
you’ve already gc’d, or only have a shallow clone of the older release, moved
projects degrade to Googlesource links, fine for browsing but useless for grep.
And it says nothing about prebuilts or anything else ignore-globs.txt filters
out, on purpose.
Closing
A manifest is a clean, machine-readable index of a release, and a release diff is
a set difference over project-to-SHA mappings plus a git log per moved project.
tools/manifest_snapshot.py does that reproducibly, and the three files under
manifest-snapshots/_compare/ turn “what changed in Android 17” from an
unanswerable 137,470-commit question into a ranked, grepped list of which source
to open next. Finding where is the easy half. Reading the source, with every
claim checked, is still the half that matters.