Claude vs Me - managing vs hacking

2026-04-17

I was using Anki to learn to recognise bird sounds using this fantastic deck. However, there were a couple of problems with the deck, so I decided to create my own. I obtained sounds and images from the Dutch Vogelbescherming, wrote some scripts, and my girlfriend and I spent quite a few evenings practising bird sounds. She told some of her birding friends what we were doing and people got interested. However, there were two issues with this approach:

I do not have the copyright of the audio and images of the Vogelbescherming, and therefore cannot share them.
Non-techies wanted to do this too, but they would have to install Anki, download the shared deck, import it, and use it. Anki is good, but unfortunately its UX is not that friendly.

So, I decided to create a website (https://vogelgeluidjes.nl) which tackled these issues. My experience with LLMs was limited to copy-pasting code to and from chat interfaces, which sucks, so I could use this project to experiment with LLMs as well. I decided to build the website twice: once using Claude Code, and once on my own. I wanted to compare the following things (let's call them our research questions):

How long do both methods take?
How does it feel?
What is the end result, both in quality of the product and in quality / maintainability of the codebase?

Answering the first question fairly is tricky. If I implemented it on my own first, I would have very in-depth knowledge of a good structure and of the difficulties, which would be very beneficial for the Claude implementation. Conversely, if I implemented it with Claude first and studied the implementation, creating it myself would be easier. I ended up with a compromise: implement via Claude first, and don't look at the code details; focus on the high-level structure and on the results. This might influence the other questions, but I think it's fair and I'll explain this later in the section Ignore the code.

Design goals

Simple: easy to deploy and maintain in the long-term
Good UX
Data does not leave your device; no telemetry
No user management, secrets, or personal information
Good a11y; use proper HTML elements, avoid a complicated JS framework

Claude

I wrote down some requirements and a description of the website I had in mind and started a chat with Claude to create a SPEC.md file (Opus 4.6, Extended thinking, 2026-03-10, link to chat). With this spec, I started instructing Claude. This is the first real message I sent¹:

USER | 2026-03-10 12:20

I saved the spec in docs/SPEC.md. The spec refers to birds.json, you can find that in docs/birds.json. It needs some more information on my part: taxonomic groupings, habitat group tags, starter deck species, commonality. For now however, I want you to assume that these will be covered later. I want you to build Phase 1 only. Set up the project, seed script with some placeholder birds, SM-2 in Python with tests, and the review screen with <details>/<summary>. Don't stub Phase 2-4 code, just build what's needed now. Keep the code as simple as possible, take special care to avoid overcomplicating the IndexedDB storage.

I used the following structure when implementing:

1. Ask Claude to implement the next phase in the spec
2. When done, start a new chat with an agent to ask for comprehensive feedback and to check whether the original design goals (in SPEC.md) were being upheld.
3. When done, start a new chat to incorporate this feedback. When I am happy, continue to the next phase, and repeat steps 1-3.

This went quite well. Claude generated a reasonable project structure, and the code looked alright. I refrained from fussing over the code details, and focused on the high-level structure. For the first few messages, I had to remind myself to not try to understand the code it created too well (in order to not taint my brain for Part Two: Me). Within a few minutes, however, this process became automatic: for various reasons, programming with Claude Code teaches you to ignore the code.

Ignore the code

The first thing I noticed was that due to a few aspects of Claude Code's design, I was automatically ignoring the code it created. Claude:

is quite slow. Answering a question and getting (a lot of) code takes 10-60 seconds² while Claude is "thinking" about it. This is longer than the last response-time limit Jakob Nielsen describes: "10 seconds is about the limit for keeping the user's attention". As a result, it is very difficult to stay critical of the output, because I had to put in a bit of effort to return my attention after each message.
outputs a shitload of code that looks decent at first glance. Occasionally the (amount of) code is reasonable, but occasionally it is not, and telling it to do something simpler instead never works. There are two solutions to overcomplicated output:
1. think about the problem, devise a simple solution, tell Claude
2. exit the chat, start a new one with a fresh context, reframe the prompt, pray
When doing the former, it often feels like it would be more productive to start coding without Claude. The latter breaks flow, is quite unpredictable and feels like hyperparameter tuning (and, I imagine, alchemy).

More often than not, I ended up uncritically accepting the huge amounts of code and continuing on. Part of me was also interested in doing this to find out what happens next (I mean, outsourcing programming is part of the sell of AI, right?)

Overcomplications

I know all about overcomplicating software: I started programming with OOP. I figured that Claude would be no different and I prepared myself for this point. My initial instruction focused on simplicity, and SPEC.md also highlighted this in various forms³.

And to credit Claude, it started simple! It did duplicate some core JS functionality in server-side Python code that was never called, but other than that, the structure was reasonably simple.

The problem starts when you start adding features, or worse, modify existing features. This can work decently well with a lot of handholding (focus first on making the change easy, then make the easy change), but by default, when you replace a feature with a simpler feature, the code will not become simpler. Claude will implement the simple feature with the old, complicated code as foundation.

When modifying features, Claude will also happily deviate from prior instructions, either as stated in SPEC.md/CLAUDE.md or stated explicitly in the conversation before. An example is that I wanted part of my website to work with HTML's <details> rather than hiding/showing divs with JavaScript. I had this very specific requirement stated in my spec, yet I had to remind Claude twice about it after it had removed it in favour of a family of divs. Perhaps this problem will go away when models get larger and larger context windows, I do not know.

It is also difficult to stay disciplined about review in these moments. You describe a small feature change, and Claude updates 2 files, removing 12 and adding 25 lines and you're happy, until you realise that it got something wrong. You describe that and it then adds about 30 more lines, which continues for a few iterations until you've spelled out your requirements to the dot, and now end up with a complicated mess. At this point saying "simplify your changes" does not work for me, so what do you do now? Start a new chat with a modified prompt? Code it yourself? Ask it to create a feature.md file with the requirements and start a new chat? Or do you accept it, since it works… 😈

Speed!

The speed with which Claude can output useful code really is amazing, at least on a greenfield project like this. This is especially so with languages that I'm not fluent in (CSS!) and for parts that I do not care too much about. My app is about bird sounds, and I needed bird sounds. xeno-canto.org is a fantastic resource for free bird sounds, and Claude nearly one-shotted a browser extension that enabled me to easily match appropriate bird sounds and bird images.

In the end, Claude did in 7 hours what I did in 16 hours, and Claude had in that time built some features that I didn't end up implementing (because they weren't necessary; something I found out when I actually built and tested it).

A higher level of programming?

This is a possible answer to the problems described in the previous two sections: LLMs are just yet another, higher level of abstraction. It's fine to ignore the code, just like it's fine to ignore the bytecode that compilers produce. Overcomplications are a problem for humans reading the code and do not matter as long as Claude can keep fixing bugs and adding features in the future.

We programmed in assembly before, and now in Python we do not care about registers, heaps or stacks, or even pointers! With LLMs we now just care about markdown, bullet points and the English language. However, there are two problems with this argument:

While LLMs are a higher level of abstraction than, say, Python or JavaScript, the abstraction is leaky in a way that compilers aren't. LLMs are unpredictable and quite frequently wrong. A compiler can be wrong, but in that case it is actually wrong. It violates a contract, or not. You can create a bug report and it should be fixed. This concept of a definite contract does not map to LLMs.
Claude Code is closed-source and Anthropic actively blocks open-source alternatives such as OpenCode. I cannot run local AI models on my poor old laptop. I'll get back to this in Free software.

Free software

As someone who values free software partly because it allows me to hack on software on my own terms, this disempowers me. According to Can I Run AI locally?, my laptop can run no AI models (I have no dedicated GPU), and I don't have the money to purchase a heavy GPU and pay the electricity bill. Using LLMs can be freedom-respecting, but it is hard. I don't like Matlab for this reason, and I don't like this part of current SOTA LLMs. Granted, this is a temporary issue and not completely unique to LLMs, but it is a currently relevant issue.

Subtle bugs

I created an Anki alternative, which is spaced repetition software. Wiki has a good article on it, and Nicky Case has a great game about it, so I won't explain it in detail here. However, the gist is that you learn to memorise flashcards while the software decides when you see which flashcard. You always get a mix of cards you have seen before (to help retain them) and new cards (to learn new things), and the order in which you get them matters.

I did not fully specify this, and Claude just did some arbitrary thing: first the existing cards, and then the new ones. This came up immediately in testing, since it is pretty bad. It is a somewhat easy fix, but highlights a

The code Claude writes is generally good and generally works. However, there are subtle bugs that don't surface until you test your program thoroughly or actually understand the code (which is very difficult; see Ignore the code).
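To make the ordering problem concrete, here is a minimal sketch of one way to mix new cards in among due reviews instead of appending them all at the end. The function name `interleave` and the even-spacing strategy are assumptions made for illustration; this is not taken from Anki or from either implementation of the website.

```python
def interleave(review_cards, new_cards):
    """Spread new cards roughly evenly among review cards.

    Hypothetical strategy for illustration: with R reviews and N new cards,
    a new card is placed after roughly every R / (N + 1) reviews.
    """
    if not new_cards:
        return list(review_cards)
    if not review_cards:
        return list(new_cards)

    queue = []
    step = len(review_cards) / (len(new_cards) + 1)
    next_new = 0  # index of the next new card to place

    for position, card in enumerate(review_cards, start=1):
        queue.append(card)
        # Place a new card each time we pass an evenly spaced slot.
        while next_new < len(new_cards) and position >= (next_new + 1) * step:
            queue.append(new_cards[next_new])
            next_new += 1

    # Any new cards that did not fit a slot go at the end.
    queue.extend(new_cards[next_new:])
    return queue
```

With four due reviews and two new cards, the new cards land in the middle of the session rather than after the last review. Whether even spacing is the right policy is a separate design question; the point is only that the order has to be specified explicitly.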

Unpredictable vs nondeterministic

Me

With the initial Claude version done, I did my best to forget all about it and start creating it on my own. I took more than twice as long to create what Claude did, and if I hadn't been familiar with Flask, it would have taken me much longer. My goal was to compare these two versions in terms of output and process. Recall my three research questions:

How long do both methods take?
How does it feel?
What is the end result, both in quality of the product and in quality / maintainability of the codebase?

Answering question 1 is easy: 7 hours for Claude, 16 hours for myself. Question 2 is more interesting, so let's get into the key difference: programming with Claude feels like being a manager / PO; programming on my own feels like hacking.

Hacker vs manager

Here, I see a hacker as someone who loves to dive deep into something and understand it completely. I for one would still love to create a Lisp interpreter, for example, or my own kernel. I don't consider myself 100% a hacker (I haven't created those things yet), but I have a desire to do so, mainly because it forces me to learn stuff that I take for granted.

A manager on the other hand is someone who just wants to see results. They have a list of requirements and want to see it finished. It doesn't matter if the code is good or if someone understands it (as long as bad code / lack of understanding doesn't hinder future fulfilment of requirements). What matters is the creation of concrete value.

I am a hacker for some parts, and a manager for some. For CSS I'm a manager and I want something that looks good. For other parts, I'm a hacker. I'm building spaced repetition software, and my initial version with Claude Code was created without me knowing how spaced repetition actually works. This makes me feel very uneasy.

Creating my own version forced me to read articles about the Optimum interval or the Forgetting index. I have to understand these concepts, know how they relate to one another (and my goals) and implement them.

Note how in the paragraph above, I called this version "my own version". Intuitively, it feels like the Claude Code version is not "my" program. It seems like the US copyright people would agree with me on this, but legality aside, it only felt like my program once I started coding it.

Ownership

Footnotes

¹ You can view the full Claude Code history at claude-history.txt ↩

² Mind, compared to writing all these characters yourself this is very fast, but compared to other computer operations, it is painfully slow. ↩

³ The full initial SPEC.md ↩

  # BirdSRS — Implementation Specification

A spaced-repetition webapp for memorizing bird sounds of the Netherlands.

---

## 1. Product Overview

### Core concept
Users listen to a bird sound recording, think about which species it is, reveal the answer, then self-rate their recall (Again / Hard / Good / Easy). The app schedules future reviews using SM-2.

### Key principles
- **Local-first**: SRS state lives in the browser (IndexedDB). The server stores a canonical copy for sync, but the app works without network during reviews.
- **Boring technology**: Python + Flask, SQLite, minimal JS. No build step. No SPA framework. `uv` for Python dependency management.
- **Simple auth**: Generated usernames (like Mullvad VPN), no passwords, no email.
- **Accessible and fast**: Semantic HTML (including native `<details>`/`<summary>` for the card reveal), keyboard-navigable, works on slow connections.

### Scale
~250 species, ~750 sound cards total. The full catalog JSON (~100KB) can be loaded at once in the browser — no pagination needed.

---

## 2. User Experience

### 2.1 First visit (home page)

A new user (no data in IndexedDB, no username) sees the **landing/onboarding view** instead of an empty review screen:

1. **Brief explanation**: what this app is ("Learn to recognize Dutch bird sounds with spaced repetition"), how it works (listen → guess → rate → the app schedules your next review).
2. **Starter deck info**: "We've preselected ~15 common birds to get you started: Koolmees, Merel, Roodborst, Vink..." with a note that these are songs of birds you'll likely hear in your garden.
3. **How to add more**: "Want to learn waders or raptors? Head to Explore Birds to browse all ~250 Dutch species and pick the sounds you want to study."
4. **Start button**: prominent "Start Learning" to begin with the starter deck.
5. **Account nudge** (subtle, not blocking): a small note at the bottom — something like: "Want to sync progress across devices? Generate a username — it takes one click, no email needed." This should not be a modal or banner. Think: a single line of muted text with an inline link.

Once the user has any review data, the home page becomes the review screen.

### 2.2 The review screen

This is the primary screen. It must be fast and distraction-free.

**Use `<details>`/`<summary>` as the card flip mechanism.** This provides native accessible expand/collapse without JS, works with keyboard (Enter/Space), and is announced correctly by screen readers. JS enhances it (keyboard shortcuts, auto-advance, audio control) but the basic reveal works without JS.

**Layout (single column, centered, mobile-first):**

```html
<details id="card">
  <summary>
    <!-- Card front -->
    <span class="progress">3 of 12 due</span>
    <button class="play-btn" aria-label="Play bird sound">▶ Play Sound</button>
    <span class="hint">Think about which bird this is, then open to check</span>
  </summary>

  <!-- Card back (revealed) -->
  <div class="answer">
    <button class="play-btn" aria-label="Replay bird sound">▶ Replay</button>

    <h2>Koolmees</h2>
    <p class="subtitle">Great Tit (<em>Parus major</em>)</p>
    <p class="sound-type">Sound type: Song</p>

    <div class="bird-image">
      <!-- Macaulay Library embed -->
      <iframe src="https://macaulaylibrary.org/asset/XXXXXX/embed"
              width="320" height="240"
              title="Photo of Koolmees (Great Tit)"
              loading="lazy"></iframe>
    </div>

    <div class="rating-buttons">
      <button data-rating="1">Again<span class="interval">&lt;1m</span></button>
      <button data-rating="2">Hard<span class="interval">&lt;10m</span></button>
      <button data-rating="3">Good<span class="interval">1d</span></button>
      <button data-rating="4">Easy<span class="interval">4d</span></button>
    </div>
  </div>
</details>
```

**Behavior:**
- Sound auto-plays when the card appears (with a visible play button to replay).
- Keyboard shortcuts: Space = play/replay, Enter = show answer (opens `<details>`), 1/2/3/4 = rate.
- The four rating buttons show the approximate next review interval beneath them.
- After rating, JS closes the `<details>`, replaces the card content, and opens to the front of the next card. No page reload.
- The answer includes one or more Macaulay Library image embeds (via `<iframe>`) to combine visual and auditory modalities for better memorization. The embed URLs are stored per species in the seed data.
- When the session is done: "You're done for today! Next review in X hours." with a link to the card browser.

### 2.3 The card browser ("Explore Birds")

This is where users choose which cards to study.

**Primary grouping: high-level taxonomic groups**, following the structure used in Collins Bird Guide and similar field guides. These are broad, familiar categories that birders already think in:

- Zwanen, Ganzen en Eenden (with subgroups: Zwanen, Ganzen, Eenden)
- Hoenders
- Duikers
- Futen
- Reigers
- Roofvogels
- Steltlopers
- Meeuwen en Sterns
- Duiven
- Uilen
- Spechten
- Zangvogels (with subgroups: Lijsters, Mezen, Vinken, Gorzen, Kwikstaarten, etc.)
- ...

These groups and subgroups are curated by the developer, inspired by the standard ordering in Dutch field guides. Each species belongs to exactly one group (and optionally one subgroup).

**Structure:**

```
┌──────────────────────────────────────┐
│  Search: [________________] 🔍       │
│                                      │
│  View by: [Taxonomic ▾] [Habitat]   │
│                                      │
│  ─── Mezen (Paridae) ──────────     │
│  [+ Add all common sounds]          │
│                                      │
│  ▸ Koolmees (Great Tit)             │
│    [✓ Song] [✓ Call] [☐ Alarm]      │
│                                      │
│  ▸ Pimpelmees (Blue Tit)            │
│    [✓ Song] [☐ Call]                │
│                                      │
│  ─── Lijsters (Turdidae) ──────     │
│  [+ Add all common sounds]          │
│                                      │
│  ▸ Merel (Blackbird)                │
│    [✓ Song] [☐ Call] [☐ Alarm]      │
│                                      │
└──────────────────────────────────────┘
```

**Key UX decisions:**

- **Primary view: taxonomic groups** as described above. Group headers show the Dutch group name with the Latin family in parentheses.
- **"View by" toggle**: switches between the taxonomic view (default) and a habitat/ecological view (Garden, Woodland, Waterbirds, etc.) which uses a tag-based many-to-many mapping. The ecological view is secondary — useful for "I want to learn garden birds" but not the primary navigation.
- **"Add all common sounds" per group**: one-click to activate the primary song + primary call for every species in a group. This is the key action for "the songbirds are returning, I want to learn those."
- **Search** filters across Dutch name, English name, Latin name, and family name. Typing "Pari" surfaces all Paridae. Typing "mees" surfaces all tits.
- **Each species expands** to show its available sound types as individual checkboxes (Song, Call, Alarm call, etc.). Each checkbox = one SRS card. Most species have 2-3 sound types; some (like Great Tit) have 5-6.
- **Status indicators** per card: new (never studied), learning (in progress), due (needs review), with small colored dots.

### 2.4 Navigation

Three views, accessible via a simple top nav or bottom tab bar (on mobile):

1. **Review** — the study screen (default/home; shows onboarding for new users)
2. **Explore** — the card browser
3. **Stats** — simple progress overview (cards learned, streak, upcoming reviews graph)

Plus a settings/account area (accessible from a menu icon) for:
- Sync: shows username, "Sync now" button, last sync time
- Account: generate username, log in with existing username
- About/help

### 2.5 Sync flow

- On every rating action, state is saved to IndexedDB immediately.
- A "Sync" button in the nav (or settings) pushes local state to the server and pulls any changes. Visual indicator shows sync status (synced / unsynced changes / error).
- Auto-sync can happen on page load and periodically (every 5 minutes) if the user is logged in, but never blocks the UI.
- **Conflict resolution**: last-write-wins per card. Since this is single-user (one username = one person), conflicts are rare (only if they use two devices simultaneously without syncing). This is acceptable.

---

## 3. Data Model

### 3.1 Bird data (read-only, shipped with the app)

```sql
-- The canonical bird species list
CREATE TABLE species (
    id              INTEGER PRIMARY KEY,
    dutch_name      TEXT NOT NULL,           -- "Koolmees"
    english_name    TEXT NOT NULL,           -- "Great Tit"
    latin_name      TEXT NOT NULL,           -- "Parus major"
    family_latin    TEXT NOT NULL,           -- "Paridae"
    family_dutch    TEXT NOT NULL,           -- "Mezen"
    sort_order      INTEGER NOT NULL,        -- taxonomic sort order (IOC)
    macaulay_asset_ids TEXT                  -- comma-separated Macaulay Library asset IDs for images
);

-- High-level taxonomic groups (Collins-style)
CREATE TABLE taxonomic_groups (
    id              INTEGER PRIMARY KEY,
    name_dutch      TEXT NOT NULL,           -- "Mezen"
    name_latin      TEXT,                    -- "Paridae" (optional, for display)
    parent_id       INTEGER REFERENCES taxonomic_groups(id),  -- for subgroups
    sort_order      INTEGER NOT NULL
);

CREATE TABLE species_taxonomic_group (
    species_id      INTEGER NOT NULL REFERENCES species(id),
    group_id        INTEGER NOT NULL REFERENCES taxonomic_groups(id),
    PRIMARY KEY (species_id, group_id)
);

-- Ecological/habitat groupings (secondary, many-to-many)
CREATE TABLE habitat_groups (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,           -- "Garden Birds"
    slug            TEXT NOT NULL UNIQUE,    -- "garden"
    sort_order      INTEGER NOT NULL
);

CREATE TABLE species_habitat (
    species_id      INTEGER NOT NULL REFERENCES species(id),
    group_id        INTEGER NOT NULL REFERENCES habitat_groups(id),
    PRIMARY KEY (species_id, group_id)
);

-- Each card = one species + one sound type
CREATE TABLE cards (
    id              INTEGER PRIMARY KEY,
    species_id      INTEGER NOT NULL REFERENCES species(id),
    sound_type      TEXT NOT NULL,           -- "song", "call", "alarm_call", etc.
    sound_label     TEXT,                    -- human-readable label, e.g. "Teacher-teacher call"
    xc_id           INTEGER NOT NULL,        -- xeno-canto recording ID
    is_primary      BOOLEAN NOT NULL DEFAULT 0,  -- primary song or call (for "add all common")
    is_starter      BOOLEAN NOT NULL DEFAULT 0,  -- part of the starter deck
    description     TEXT                     -- optional note about this recording
);

CREATE INDEX idx_cards_species ON cards(species_id);
```

Note: `species_taxonomic_group` is technically one-to-one (each species in exactly one group), but using a junction table keeps the schema consistent and allows for edge cases.

This data is bundled as a SQLite file shipped with the app, populated from a `birds.json` seed file on first deploy.

### 3.2 User data (server-side, synced)

```sql
-- Users identified only by generated username
CREATE TABLE users (
    id              INTEGER PRIMARY KEY,
    username        TEXT NOT NULL UNIQUE,    -- "forest-warbler-7291"
    created_at      TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Which cards a user has activated (chosen to study)
CREATE TABLE user_cards (
    user_id         INTEGER NOT NULL REFERENCES users(id),
    card_id         INTEGER NOT NULL REFERENCES cards(id),
    active          BOOLEAN NOT NULL DEFAULT 1,
    updated_at      TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (user_id, card_id)
);

-- SRS state per card per user
CREATE TABLE reviews (
    user_id         INTEGER NOT NULL REFERENCES users(id),
    card_id         INTEGER NOT NULL REFERENCES cards(id),
    ease_factor     REAL NOT NULL DEFAULT 2.5,
    interval_days   REAL NOT NULL DEFAULT 0,
    repetitions     INTEGER NOT NULL DEFAULT 0,
    due_at          TEXT NOT NULL,           -- ISO 8601 datetime
    last_reviewed   TEXT,                    -- ISO 8601 datetime
    updated_at      TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (user_id, card_id)
);

CREATE INDEX idx_reviews_due ON reviews(user_id, due_at);

-- Review log for stats and potential algorithm improvements
CREATE TABLE review_log (
    id              INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users(id),
    card_id         INTEGER NOT NULL REFERENCES cards(id),
    rating          INTEGER NOT NULL,        -- 1=Again, 2=Hard, 3=Good, 4=Easy
    ease_factor     REAL NOT NULL,           -- ease factor AFTER this review
    interval_days   REAL NOT NULL,           -- interval AFTER this review
    reviewed_at     TEXT NOT NULL DEFAULT (datetime('now'))
);
```

### 3.3 Browser-side storage (IndexedDB)

The browser stores a mirror of the user's `user_cards`, `reviews`, and `review_log` tables. The schema is identical, plus:

```
sync_status: "synced" | "pending"  -- per record
last_sync: ISO 8601 datetime       -- global
```

All writes go to IndexedDB first. Sync pushes `pending` records to the server and pulls the latest state.

---

## 4. SRS Algorithm (SM-2, abstracted)

### 4.1 Interface

Define a clear interface so the algorithm can be swapped later (e.g., to FSRS):

```python
# srs/algorithm.py

from dataclasses import dataclass
from enum import IntEnum

class Rating(IntEnum):
    AGAIN = 1
    HARD = 2
    GOOD = 3
    EASY = 4

@dataclass
class CardState:
    ease_factor: float      # >= 1.3
    interval_days: float    # days until next review
    repetitions: int        # consecutive correct answers

@dataclass
class ReviewResult:
    new_state: CardState
    next_due_delta_days: float  # how many days from now until next review

class SRSAlgorithm:
    """Abstract base. Swap implementations without touching the rest of the app."""
    def review(self, state: CardState, rating: Rating) -> ReviewResult:
        raise NotImplementedError

    def preview_intervals(self, state: CardState) -> dict[Rating, float]:
        """Return the interval for each rating option (for display on buttons)."""
        raise NotImplementedError

class SM2Algorithm(SRSAlgorithm):
    def review(self, state: CardState, rating: Rating) -> ReviewResult:
        ...  # Standard SM-2 implementation

    def preview_intervals(self, state: CardState) -> dict[Rating, float]:
        ...
```

### 4.2 SM-2 logic (for reference)

- **Again (1)**: Reset repetitions to 0, interval to 1 minute (for re-learning within session). Decrease ease factor by 0.2 (minimum 1.3).
- **Hard (2)**: If first review, interval = 1 day. Otherwise, interval = previous interval × 1.2. Decrease ease factor by 0.15.
- **Good (3)**: If first, 1 day. If second, 6 days. Otherwise, interval × ease factor.
- **Easy (4)**: Like Good but multiply interval by an additional 1.3. Increase ease factor by 0.15.
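
The four rules above can be sketched in Python roughly as follows. This is an illustration only and not part of the spec as it was sent to Claude. It assumes that "first review" means `repetitions == 0`, that Hard counts as a successful repetition, that the 1.3 ease floor also applies to Hard, and that the interval is computed with the ease factor from before the update. The free function `review` and the constant names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

MIN_EASE = 1.3
ONE_MINUTE_IN_DAYS = 1 / (24 * 60)


class Rating(IntEnum):
    AGAIN = 1
    HARD = 2
    GOOD = 3
    EASY = 4


@dataclass
class CardState:
    ease_factor: float = 2.5
    interval_days: float = 0.0
    repetitions: int = 0


def review(state: CardState, rating: Rating) -> CardState:
    """Return the next card state for a rating, following the rules above."""
    ease, interval, reps = state.ease_factor, state.interval_days, state.repetitions

    if rating == Rating.AGAIN:
        # Reset progress and re-show the card within the session.
        return CardState(max(MIN_EASE, ease - 0.2), ONE_MINUTE_IN_DAYS, 0)

    if rating == Rating.HARD:
        new_interval = 1.0 if reps == 0 else interval * 1.2
        # Assumption: the ease floor applies here too.
        return CardState(max(MIN_EASE, ease - 0.15), new_interval, reps + 1)

    # Good and Easy share the same base interval.
    if reps == 0:
        new_interval = 1.0
    elif reps == 1:
        new_interval = 6.0
    else:
        new_interval = interval * ease

    if rating == Rating.EASY:
        return CardState(ease + 0.15, new_interval * 1.3, reps + 1)
    return CardState(ease, new_interval, reps + 1)
```

Rating a fresh card Good three times in a row gives intervals of 1 day, 6 days, and 15 days with the default ease factor of 2.5.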

### 4.3 JavaScript mirror

The same algorithm must be implemented in JS for client-side use. Both implementations must be tested against the same set of test vectors to guarantee they produce identical results.

---

## 5. Architecture

### 5.1 Overview

```
┌──────────────┐         ┌──────────────────────┐
│   Browser    │  sync   │   Flask Server        │
│              │◄───────►│                       │
│  HTML/CSS/JS │  JSON   │  /api/sync            │
│  IndexedDB   │         │  /api/register        │
│  SM-2 (JS)   │         │  /api/audio/<xc_id>   │
│  <details>   │         │                       │
│  htmx (nav)  │         │  SQLite (bird data    │
│              │         │   + user data)        │
└──────────────┘         └──────────────────────┘
```

### 5.2 Server (Python + Flask)

**Routes:**

| Route | Method | Description |
|---|---|---|
| `/` | GET | Home / review screen (HTML), or onboarding for new users |
| `/explore` | GET | Card browser (HTML) |
| `/stats` | GET | Stats page (HTML) |
| `/api/register` | POST | Generate username, return it |
| `/api/sync` | POST | Accept local changes, return server state |
| `/api/audio/<xc_id>` | GET | Serve pre-downloaded audio file |
| `/api/birds` | GET | Full bird/card catalog as JSON (for browser cache) |

**Key decisions:**
- Pages are server-rendered HTML. Use htmx for interactions that benefit from it (e.g., toggling cards in the browser, search filtering), but the review screen is pure client-side JS (no round-trip per card flip).
- Audio files: pre-downloaded during the seed step and served as static files from `static/audio/<xc_id>.mp3`. No runtime dependency on xeno-canto.
- Single SQLite database file for both bird data and user data. Simple to backup.

### 5.3 Client-side JS

Keep it small and readable. No framework. Vanilla JS organized into a few modules:

```
static/js/
  srs.js          -- SM-2 algorithm (mirror of Python version)
  storage.js      -- IndexedDB wrapper (get/set card state, queue changes)
  sync.js         -- push/pull sync with server
  review.js       -- review screen logic (play, reveal, rate, next)
  explore.js      -- card browser interactions (supplementing htmx)
  audio.js        -- audio playback helper (preloading, error handling)
```

Total JS should be well under 2000 lines. No build step, no bundler. Use ES modules (`<script type="module">`).

### 5.4 htmx usage

Use htmx for:
- Card browser: filtering by group, search, toggling card activation (PATCH requests that swap HTML fragments).
- Stats page: loading charts/data.
- "View by" toggle in the card browser.

Do NOT use htmx for:
- The review screen. All review logic is client-side JS operating on IndexedDB. Zero network requests during a review session (except audio and image loading).

### 5.5 CSS

A single `style.css` file. No framework. Use CSS custom properties for theming. Mobile-first. Target ~300-500 lines. A small CSS reset (e.g., Andy Bell's modern reset) as a base.

---

## 6. Audio Handling

### 6.1 Source
Xeno-canto recordings, referenced by ID. The developer maintains a curated list of high-quality recordings (one per card) in the seed data.

### 6.2 Storage
Pre-download all audio files during a build/seed step:

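```python
# seed/download_audio.py (sketch: `cards` is assumed to be a list of dicts with an "xc_id" key)
from pathlib import Path
import requests

def download_audio(cards, out_dir=Path("static/audio")):
    for card in cards:
        resp = requests.get(f"https://xeno-canto.org/{card['xc_id']}/download", timeout=30)
        resp.raise_for_status()  # stop on a failed download instead of saving an error page
        (out_dir / f"{card['xc_id']}.mp3").write_bytes(resp.content)
```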

Serve as static files. No runtime dependency on xeno-canto.

### 6.3 Browser caching
Set long `Cache-Control` headers on audio files (they never change for a given xc_id). No service worker needed for v1.
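
As a rough sketch of this caching policy, the helper below picks a `Cache-Control` value from the request path. The function name `cache_control_for`, the one-year `max-age`, and the `immutable` directive are assumptions made for illustration; the spec only asks for "long" headers.

```python
from typing import Optional

# Hypothetical values: one year is a common choice for files that never change.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60
AUDIO_PREFIX = "/static/audio/"


def cache_control_for(path: str) -> Optional[str]:
    """Return a long-lived Cache-Control value for audio files, else None."""
    if path.startswith(AUDIO_PREFIX) and path.endswith(".mp3"):
        # A given xc_id always maps to the same recording, so it is safe to cache.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    return None


# In Flask this could be wired up roughly like so (not executed here):
#   @app.after_request
#   def add_cache_headers(response):
#       value = cache_control_for(request.path)
#       if value:
#           response.headers["Cache-Control"] = value
#       return response
```

Keeping the decision in a plain function makes it testable without starting the app.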

### 6.4 Preloading
When a review session starts, preload the audio for the next 2-3 cards in the queue using `new Audio(url)`.

---

## 7. Sync Protocol

### 7.1 Registration

```
POST /api/register
Response: { "username": "forest-warbler-7291" }
```

Username format: `{adjective}-{bird}-{4 digits}`. Generated server-side from a curated word list. Stored in localStorage on the client.

To "log in" on another device, the user simply enters their username. No password. Acceptable because:
- The only data at risk is SRS progress — low sensitivity.
- Usernames are hard to guess (adjective-bird-4digits = millions of combinations).
- Tradeoff is explicitly toward simplicity.
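
A minimal sketch of how such usernames could be generated server-side, assuming tiny placeholder word lists (`ADJECTIVES` and `BIRDS` are hypothetical; the real curated lists would be much longer). It uses the standard-library `secrets` module so that names are not predictable, and `username_space` shows where the "millions of combinations" estimate comes from.

```python
import secrets

# Hypothetical placeholder word lists, for illustration only.
ADJECTIVES = ["forest", "misty", "golden", "quiet"]
BIRDS = ["warbler", "robin", "wren", "finch"]


def generate_username() -> str:
    """Return a name shaped like `{adjective}-{bird}-{4 digits}`."""
    adjective = secrets.choice(ADJECTIVES)
    bird = secrets.choice(BIRDS)
    digits = secrets.randbelow(10_000)  # 0..9999, zero-padded to 4 digits below
    return f"{adjective}-{bird}-{digits:04d}"


def username_space(adjectives: int, birds: int) -> int:
    """Number of distinct usernames for word lists of the given sizes."""
    return adjectives * birds * 10_000
```

With, say, 100 adjectives and 250 bird names (illustrative sizes), `username_space(100, 250)` gives 250,000,000 possible usernames. The server would still need to check that a freshly generated name is not already taken.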

### 7.2 Sync endpoint

```
POST /api/sync
Headers: X-Username: forest-warbler-7291
Body: {
    "last_sync": "2025-01-15T10:00:00Z",
    "changes": {
        "user_cards": [
            { "card_id": 42, "active": true, "updated_at": "..." },
            ...
        ],
        "reviews": [
            { "card_id": 42, "ease_factor": 2.5, "interval_days": 4.0,
              "repetitions": 3, "due_at": "...", "updated_at": "..." },
            ...
        ],
        "review_log": [
            { "card_id": 42, "rating": 3, "ease_factor": 2.5,
              "interval_days": 4.0, "reviewed_at": "..." },
            ...
        ]
    }
}

Response: {
    "server_time": "2025-01-15T12:00:00Z",
    "changes": {
        "user_cards": [...],
        "reviews": [...]
    }
}
```

**Conflict resolution**: For `reviews` and `user_cards`, compare `updated_at` — latest wins. For `review_log`, append-only (no conflicts).
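
The rule above could be sketched as follows. This is an illustration under assumptions, not the actual server code: the function names are hypothetical, rows are plain dicts keyed by `card_id`, and on an exact timestamp tie the server copy is kept.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def merge_latest_wins(server_rows, client_rows):
    """Merge rows keyed by card_id; the row with the newer `updated_at` wins."""
    merged = {row["card_id"]: row for row in server_rows}
    for row in client_rows:
        current = merged.get(row["card_id"])
        # Assumption: on a tie the server row is kept (strict comparison).
        if current is None or parse_ts(row["updated_at"]) > parse_ts(current["updated_at"]):
            merged[row["card_id"]] = row
    return list(merged.values())


def append_log(server_log, client_log):
    """review_log is append-only: incoming entries are added, never merged."""
    return server_log + client_log
```

The same `merge_latest_wins` would apply to both `reviews` and `user_cards`, since both carry `card_id` and `updated_at`.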

---

## 8. Accessibility

- **Card reveal uses `<details>`/`<summary>`**: native keyboard support (Enter/Space), announced by screen readers as expandable, works without JS.
- All interactive elements are real `<button>` and `<a>` elements (not divs).
- Audio player uses a visible `<button>` with `aria-label="Play bird sound"`. Not relying on autoplay alone.
- Rating buttons have aria-labels: `aria-label="Again — review in 1 minute"`.
- Card browser checkboxes are real `<input type="checkbox">` with associated `<label>`.
- Skip-to-content link on every page.
- Focus management: after rating a card, focus moves to the play button of the next card.
- Sufficient color contrast (WCAG AA minimum).
- Keyboard shortcuts documented in a help modal; they don't conflict with screen reader keys.
- `prefers-reduced-motion`: disable any transitions.
- `prefers-color-scheme`: support dark mode via CSS custom properties.
- Macaulay Library iframes include descriptive `title` attributes.

---

## 9. Project Structure

```
bird-srs/
├── pyproject.toml          # uv/PEP 621 project config
├── uv.lock                 # lockfile
├── app.py                  # Flask app, routes, API endpoints
├── config.py               # Configuration (DB path, audio path, etc.)
├── srs/
│   ├── __init__.py
│   ├── algorithm.py        # SM-2 implementation + abstract interface
│   └── models.py           # DB access functions (no ORM, plain SQL)
├── seed/
│   ├── birds.json          # Canonical bird + card data
│   ├── seed_db.py          # Create/populate SQLite from birds.json
│   └── download_audio.py   # Fetch audio from xeno-canto
├── static/
│   ├── css/
│   │   └── style.css
│   ├── js/
│   │   ├── srs.js
│   │   ├── storage.js
│   │   ├── sync.js
│   │   ├── review.js
│   │   ├── explore.js
│   │   └── audio.js
│   └── audio/              # Pre-downloaded .mp3 files (gitignored)
│       └── 12345.mp3
├── templates/
│   ├── base.html           # Shared layout, nav, head
│   ├── home.html           # Onboarding for new users
│   ├── review.html         # Review screen
│   ├── explore.html        # Card browser
│   ├── explore_partials/   # htmx fragments for card browser
│   │   ├── species_list.html
│   │   └── species_row.html
│   └── stats.html
├── tests/
│   ├── test_algorithm.py   # SM-2 test vectors (shared with JS tests)
│   ├── test_sync.py        # Sync conflict resolution tests
│   └── test_api.py         # API endpoint tests
├── Dockerfile
├── docker-compose.yml
└── README.md
```

---

## 10. Deployment

### Docker

```dockerfile
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .
RUN uv run python seed/seed_db.py
EXPOSE 8000
CMD ["uv", "run", "gunicorn", "-w", "2", "-b", "0.0.0.0:8000", "app:app"]
```

```yaml
# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data                  # SQLite DB persists here
      - ./static/audio:/app/static/audio  # Audio files
    restart: unless-stopped
```

### Backup

```bash
# Cron job: daily SQLite backup
sqlite3 /app/data/bird_srs.db ".backup /backups/bird_srs_$(date +%F).db"
```

SQLite is a single file. Backup = copy the file while nothing is writing to it, or use `.backup` for a safe hot copy.
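
If the `sqlite3` command-line tool is not available where the backup job runs, the same hot copy can be made from Python's standard library. This is a sketch: the function name `backup_database` is an assumption, and the paths are whatever the deployment uses.

```python
import sqlite3
from pathlib import Path


def backup_database(source: Path, target: Path) -> None:
    """Copy a live SQLite database to `target` via SQLite's online backup API."""
    src = sqlite3.connect(source)
    dst = sqlite3.connect(target)
    try:
        # Connection.backup copies a consistent snapshot, even if the app is writing.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A daily job could then call something like `backup_database(Path("/app/data/bird_srs.db"), Path("/backups/bird_srs_2025-01-15.db"))`, with the date filled in at run time.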

### Reverse proxy

Caddy for HTTPS (automatic Let's Encrypt):

```
birdsrs.example.com {
    reverse_proxy localhost:8000
}
```

---

## 11. Dependencies

### Python (via uv)
- Flask
- gunicorn

That's it. No ORM, no migration tool, no task queue. SQLite is in the stdlib.

### JavaScript
- htmx (~14KB gzipped, vendored in static/)
- No other dependencies

### Seed/build time (dev dependency)
- requests (for downloading xeno-canto audio)

---

## 12. Implementation Order (suggested)

### Phase 1: Core review loop
1. Set up project structure with `uv init`, Flask, SQLite schema, seed script with ~15 starter birds.
2. Implement SM-2 in Python (with tests).
3. Build the review screen: server-rendered HTML with `<details>`/`<summary>` + client-side JS for audio playback, rating, and card progression.
4. Implement IndexedDB storage for SRS state.
5. Port SM-2 to JS (test against same vectors as Python version).
6. End-to-end: user can review starter deck entirely client-side.

### Phase 2: Card browser
7. Build the explore page: species list grouped by taxonomic groups (Collins-style).
8. Implement card activation (checkboxes → IndexedDB), including "add all common sounds" per group.
9. Add search/filter and habitat "view by" toggle with htmx.
10. Connect activated cards to the review queue.

### Phase 3: Sync & accounts
11. Registration endpoint (generate username).
12. Sync endpoint (push/pull with last-write-wins).
13. Client-side sync logic.
14. Login on another device.

### Phase 4: Polish
15. Onboarding home page for new users.
16. Stats page.
17. Dark mode.
18. Audio preloading.
19. Keyboard shortcuts.
20. Docker setup and deployment.
21. Macaulay Library image embeds in card answers.

---

## 13. Resolved & Remaining Notes

**Resolved from earlier discussion:**
- ~250 species, ~750 cards. Full catalog loaded at once (no pagination).
- Audio licensing handled by developer.
- Taxonomic groups curated by developer, inspired by Collins Bird Guide.
- Most species have 2-6 sound types (mode 2, median 2-3, mean ~3). Always show the expand control per species, since nearly all have multiple sounds.
- No service worker for v1: the app needs internet to load audio/images, but review logic and SRS state work offline.
- Single recording per card is fine for v1.

**Remaining for the developer:**
- Finalize the `birds.json` seed data (species, cards, xc_ids, Macaulay asset IDs, group assignments).
- Curate the taxonomic group hierarchy and habitat group tags.
- Decide on the exact starter deck species.
- Verify Macaulay Library embed format and terms of use.