Petrarca

A personal system for keeping book knowledge alive

758 curriculum nodes · 6+ months of data · 4,500 atomic claims · 450 historical persons
Petrarca is a deeply instrumented single-user mobile app that tracks knowledge across books, podcasts, and voice recordings using bounded curricula (50–80 nodes per domain), LLM-scored voice free recall, and contextual resurfacing instead of isolated flashcards. I am looking for research collaborators in knowledge assessment, adaptive learning, and tools for thought. Below: the design principles, the workflows, what failed, and the open research questions. Methodological frame: autobiographical design (Neustaedter & Sengers, 2012) — genuine long-term use by the designer-researcher, with honest reflection on limitations.

About five years ago I developed an intense interest in areas where I feel like a latecomer — Greek theater, opera, ancient history, classical culture. The kind of subjects where others grew up immersed in the tradition, and where I’m constantly reading material that assumes prerequisites I don’t have. A book about Sicilian history will mention the Punic Wars as if of course you know what happened, and until recently, I didn’t.

This isn’t an abstract complaint. I read wonderful books — Norwich on Sicily, Holland on Persia, Goldsworthy on Rome — and they feel transformative in the moment. Three months later? I can state the “big idea” but the argument is fuzzy. The names and dates that made it vivid are gone. And this creates a quietly corrosive cycle: the gap between what you understood while reading and what you can reconstruct afterward makes you hesitate before the next book. Why invest twenty hours in something that will evaporate?

Petrarca is my attempt to break that cycle. Named after Francesco Petrarca, the pioneer of systematic reading methods, it’s a mobile app I’ve been building for myself across months of intense prototyping. It tracks what I know across everything I read — books, articles, podcasts — and keeps that knowledge alive through gentle resurfacing, not drilling. The core insight came from Michel Thomas, the language teacher who told his students: “Don’t try to remember. I’ll manage your memory for you.” That freedom from unproductive anxiety actually improves learning — while the system still provides the effortful retrieval practice (voice recall, review cards) that produces durable memory.

I should be clear about what this is not. This is not an app to learn history. I want to spend most of my time reading books, listening to podcasts, watching documentaries — the rich, narrative, authored experiences that make a subject come alive. What I want the app to do is help me solidify that knowledge and keep it. Digital micro-learning sessions bridge to deep-focus reading of physical texts; books provide initial encoding through narrative, the app provides reinforcement and connection. Physical reading is sacred — the system supports it, never replaces it.

* * *

Hooks, Not Facts

The thing I’ve discovered through experience — and that the research confirms — is that reading deeply about a few key points creates vivid anchors that help navigate everything around them. When I read Tom Holland’s account of the Persian Wars, the Battle of Salamis in 480 BC became vivid. And that vividness now anchors all neighboring knowledge. A few years ago, a phrase like “England in 1200” meant nothing to me; now I have hooks to hang it on.

This inverts the conventional learning-tech assumption. Petrarca does not optimize for recall. It optimizes for building and maintaining hooks — anchor points of understanding that make future reading productive. The definition of “done” is not “can recite facts” but “able to place new knowledge.” Having enough framework that new information has somewhere to land.

✦ The Temporal Hook Hierarchy

I’ve found four types of connection, ordered by mnemonic effectiveness (not by historical importance — a historian would rightly prioritize causal reasoning): (1) Anchoring to events I already know (“Archimedes died ~75 years after Alexander the Great”). (2) Same-moment connections (“While Archimedes defended Syracuse, Hannibal was marching through Italy”). (3) Causal chains (“Archimedes’ death during Rome’s siege was a consequence of Hannibal’s invasion”). (4) Cross-domain surprises — but only if I already know the other domain.

The system checks what I already know before generating cross-domain hooks. A connection to something unfamiliar isn’t a hook — it’s noise.
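A minimal sketch of that gate, with hypothetical names (the `KnowledgeLevel` enum, `CandidateHook`, and the `known_level` lookup are illustrative, and the cross-domain threshold is one plausible choice, not the system's actual cutoff):

```python
from dataclasses import dataclass
from enum import IntEnum

class KnowledgeLevel(IntEnum):
    UNKNOWN = 0
    MENTIONED = 1
    ENGAGED = 2
    ANCHORED = 3

@dataclass
class CandidateHook:
    kind: str           # "anchor" | "same_moment" | "causal" | "cross_domain"
    target_entity: str  # the entity the hook connects to

# Hook kinds ordered by mnemonic effectiveness, per the hierarchy above.
KIND_PRIORITY = {"anchor": 1, "same_moment": 2, "causal": 3, "cross_domain": 4}

def usable_hooks(candidates, known_level):
    """Keep only hooks whose target the learner already knows.

    A connection to an unfamiliar entity is noise, not a hook, so
    cross-domain hooks demand more familiarity (threshold illustrative).
    """
    kept = []
    for hook in candidates:
        required = (KnowledgeLevel.ENGAGED if hook.kind == "cross_domain"
                    else KnowledgeLevel.MENTIONED)
        if known_level(hook.target_entity) >= required:
            kept.append(hook)
    return sorted(kept, key=lambda h: KIND_PRIORITY[h.kind])
```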

A small anecdote that crystallized this for me: learning that Hanna Winsnes, a minor Norwegian historical figure, was born in 1789 — the same year as the French Revolution. Neither fact is important on its own, but the connection made both unforgettable. Temporal hooks like these are not trivia. They are the mechanism by which isolated facts become a connected web. Each new anchor makes the next one easier to attach. The Matthew Effect for knowledge: the more you know, the more interesting new knowledge becomes.

Comprehension Before Memory

I spend quite a lot of time thinking about Andy Matuschak’s work on the “mnemonic medium” — embedding review prompts into prose. His Quantum Country experiment showed impressive results: 54 days average retention per question, under 95 minutes of cumulative review time. But his most important contribution was a pivot. In 2023 he wrote: “What seems like a problem of forgetting is sometimes a problem of reading comprehension — never having understood in the first place.”

This resonates deeply, but it also points to a tension. The default flashcard workflow — isolated Q&A pairs without context — struggles with arguments, causal chains, and historiographic debates. Experienced SRS users know this and have developed sophisticated workarounds: cloze deletions within full paragraphs, image occlusion, elaborative cards following Wozniak’s 20 Rules, and SuperMemo’s incremental reading system (which has been embedding cards within source context since 2000). Rawson and Dunlosky’s work on “successive relearning” shows that SRS combined with elaboration does work for complex material.

Petrarca’s contribution is not discovering something SRS practitioners are unaware of. It is automating and systematizing what sophisticated users do manually: maintaining context across sources, evolving questions over repeated encounters, and connecting review to ongoing reading rather than treating it as a separate activity. The system preserves expanding intervals and active retrieval — the parts of SRS that are rock-solid — while replacing isolated cards with contextual resurfacing.

There is also a cautionary tale here. Matuschak has been remarkably open about how building infrastructure consumed the time that should have gone to research. Petrarca takes that lesson seriously: experiments before infrastructure, always.

✦ The Factual Scaffold

Fuzzy-Trace Theory (Brainerd & Reyna) provides the memory architecture I build on. People encode two parallel traces: verbatim (exact words, numbers) and gist (essential meaning). Verbatim decays rapidly; gist persists far longer. The most valuable part of nonfiction reading — frameworks and arguments — is actually the most durable. What fades are the specific names, dates, and evidence that support the gist.

For a self-directed adult learner who has already read the narrative accounts, the bottleneck is often these factual specifics — the scaffold that makes understanding actionable. Reinforcing the scaffold through new contexts and connections helps maintain what the books taught. This is closer to E.D. Hirsch’s position (knowledge as the foundation of literacy) than to Matuschak’s — but the relationship between facts and comprehension is bidirectional, not unidirectional. Neither is sufficient alone.

* * *

How Petrarca Actually Works

The system is organized around curricula — bounded topic maps of 50–80 concepts each, roughly equivalent to what a good university lecturer would decide are the things you walk away understanding. I currently have maps for Ancient Greece (67 nodes), the Roman Republic and Empire (55 nodes), and the History of Sicily (70 nodes, my deepest area). These are generated by Claude Opus — and that choice matters; cheaper models produce meaningless titles like “Cultural Developments in the Hellenistic Period” instead of specific, memorable concepts.

I should note the limitation: an AI-generated topic map is not a curriculum in the sense that historians like Christine Counsell or Michael Fordham use the word. A true curriculum encodes pedagogical sequencing decisions, deliberate encounters with second-order concepts (causation, significance, evidence), and choices about emphasis that reflect historiographic judgment. Petrarca’s maps are closer to an expert’s table of contents than a designed course. That said, the bounded-scope model is quite important. I experimented with fractal world-history trees in an earlier project and they failed completely — 67,000 items across 10,000 nodes at 16 levels of depth, no narrative coherence, no sense of progress. A bounded map works because deliberate scoping creates achievable goals.

Knowledge Atlas — progress across curricula (idealized mockup using real system data)
History of Sicily (70 nodes): 10 anchored · 18 engaged · 14 mentioned
Ancient Greece (67 nodes): 5 anchored · 12 engaged · 20 mentioned
Roman Republic & Empire (55 nodes)
Recent activity: voice recall on Frederick II → 49 facts, 7 nodes · review session: 8 cards completed, 6 correct · microlearning card: Al-Idrisi’s World Map (1154)

What makes the model interesting — and, as far as I can tell, novel in the knowledge tracing literature — is the multi-lens entity system. Archimedes exists once in the shared entity database, but the Sicily curriculum sees “Archimedes the defender of Syracuse” (siege weapons, Roman conquest), a History of Science curriculum would see “Archimedes the mathematician” (buoyancy, pi approximation), and a Greek Civilization curriculum would see “Archimedes the Hellenistic genius.” The same entity participates in different knowledge states across different curricula. When a node lights up in multiple curricula simultaneously, it becomes a nexus point — and those are the richest learning moments. Most knowledge tracing systems assume knowledge components are independent; this design explicitly models cross-domain connections through shared entities.
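To make the multi-lens idea concrete, here is a minimal sketch of the data model (names are illustrative; the real schema lives in SQLite):

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One row per real-world entity, shared across all curricula."""
    entity_id: str
    name: str  # e.g. "Archimedes"

@dataclass
class CurriculumLens:
    """A curriculum-specific view of a shared entity, with its own knowledge state."""
    entity_id: str
    curriculum: str          # e.g. "History of Sicily"
    framing: str             # e.g. "defender of Syracuse"
    facets: list[str] = field(default_factory=list)
    level: str = "unknown"   # unknown | mentioned | engaged | anchored

def nexus_points(lenses: list[CurriculumLens], min_curricula: int = 2) -> set[str]:
    """Entities active beyond 'unknown' in several curricula at once."""
    seen: dict[str, set[str]] = {}
    for lens in lenses:
        if lens.level != "unknown":
            seen.setdefault(lens.entity_id, set()).add(lens.curriculum)
    return {eid for eid, curricula in seen.items() if len(curricula) >= min_curricula}
```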

Currently the system tracks 758 curriculum nodes with 100% date coverage, 450 historical persons, 384 places, and 316 events. Twenty-five cross-curriculum entities already have connections across domains, with 74 node links between them.

Knowledge levels only go up: unknown → mentioned → engaged → anchored. The framing is always positive: “here’s what you already know” rather than “here’s what you’re missing.” There is a real tradeoff here: knowledge does decay, and a monotonic model diverges from reality over time. I’ve chosen to prioritize motivational framing over modeling accuracy — the FSRS-based scheduling underneath still models decay via stability and retrievability, even though the visible knowledge level never drops. Whether this tradeoff is worth it is an open question.
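A sketch of that split between display and scheduling (illustrative names; the exponential form below is the classic simplification in which retrievability is 90% when elapsed time equals stability, whereas the actual FSRS curve is a power function):

```python
import math
from datetime import datetime

LEVELS = ["unknown", "mentioned", "engaged", "anchored"]

def visible_level(current: str, assessed: str) -> str:
    """The level shown to the learner never drops: take the max over history."""
    return max(current, assessed, key=LEVELS.index)

def retrievability(stability_days: float, last_review: datetime, now: datetime) -> float:
    """Scheduling still models decay even though the visible level is monotonic."""
    elapsed_days = (now - last_review).days
    return math.exp(math.log(0.9) * elapsed_days / stability_days)
```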

The Book Companion

The part of Petrarca I’m most excited about is the book companion workflow. As far as I can tell, no existing product connects physical book reading to a broader knowledge system. Readwise, Highlighted, Screvi, Basmo — they all stop at capture. The interesting question is what happens after.

The workflow starts when I add a book — from Kindle sync, an EPUB import, or just a title. The system maps the book’s content to curriculum nodes. When I finish chapters, the system generates review items that connect what I just read to what I already know across all my reading. And here is the key architectural insight that took quite a while to get right: there is one knowledge item per curriculum node, not one per book-chapter. When a second book covers the same topic, it adds a source rather than creating a duplicate.

I discovered this the hard way. The original design created one review item per (curriculum node × book chapter), which meant that “Which city founded Naxos?” appeared seven times in 42 review items. Meanwhile the Athenian Expedition — the most dramatic event in Syracusan history — had zero review items because no chapter strongly mapped to it. After the fix: ~180 knowledge items enriched by multiple sources, not ~600 items where 70% are duplicates.
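The fix is easy to state in code: key knowledge items by curriculum node and let additional books append sources (a sketch with illustrative names):

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeItem:
    """Exactly one review item per curriculum node."""
    node_id: str
    question: str
    sources: list[str] = field(default_factory=list)

items: dict[str, KnowledgeItem] = {}  # keyed by node, not (node, chapter)

def register_chapter_mapping(node_id: str, question: str, source: str) -> None:
    """A second book covering the same node enriches it instead of duplicating it."""
    item = items.get(node_id)
    if item is None:
        items[node_id] = KnowledgeItem(node_id, question, [source])
    elif source not in item.sources:
        item.sources.append(source)
```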

Review card with temporal hook and multi-source attribution (idealized mockup)
HISTORY OF SICILY · engaged
The Siege of Syracuse, 213–212 BC
✦ Temporal Hook: While Archimedes defended Syracuse with his war machines, Hannibal was devastating the Italian peninsula after Cannae — Rome was fighting on two fronts.
What role did Archimedes play in the defense of Syracuse against Rome, and how did the siege end?
Sources: Norwich, Sicily ch. 3 · Goldsworthy, The Fall of Carthage
Rate your recall: Again · Hard · Good · Easy
Also want to know: Archimedes’ inventions · Marcellus’ campaigns · Roman siege tactics

Review questions evolve with repetition. The first time I see a node, it’s basic recall from one source. The second time, a different aspect from a different source. By the third review, it becomes analytical: “The Syracuse book emphasizes Archimedes’ engineering genius; Norwich emphasizes the political context — which framing do you find more illuminating?” These multi-source comparison questions also provide a modest entry point into multiperspectivity — though I’m aware that the system currently focuses on substantive content knowledge rather than the disciplinary second-order concepts (evidence evaluation, significance, change/continuity) that historians like Seixas and Wineburg rightly prioritize. That’s an area for development.
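The progression is deliberately simple; a sketch of the stage selection (stage names illustrative):

```python
def question_stage(repetitions: int) -> str:
    """Escalate the question type as a node is reviewed repeatedly."""
    if repetitions == 0:
        return "basic_recall_single_source"
    if repetitions == 1:
        return "different_aspect_different_source"
    return "analytical_multi_source_comparison"
```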

After grading a review card, “Also want to know” chips appear — entity-based suggestions for expanding sideways. Simple facts route to quick quiz cards, while complex questions trigger full microlearning research cards. This creates a curiosity-driven expansion path that feels like exploring rather than studying.

A practical note: factual questions (dates, events, persons) currently get a scheduling priority boost of +2.0 over significance/connection questions. This reflects my specific situation as a learner whose factual scaffold for ancient history is still developing — it is not a general pedagogical claim that facts should always precede concepts. In history education research, Lee and Ashby’s CHATA project demonstrated that substantive knowledge and second-order thinking develop in tandem, not sequentially.
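In code, the boost is a single constant (a sketch; only the +2.0 value and the fact types come from the live system):

```python
FACTUAL_BOOST = 2.0  # my current setting, not a general pedagogical claim

def scheduling_priority(base: float, question_type: str) -> float:
    """Factual questions jump the queue while my factual scaffold is developing."""
    if question_type in {"date", "event", "person"}:
        return base + FACTUAL_BOOST
    return base  # significance / connection questions keep their base priority
```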

Voice Elicitation

This is the feature I’m most interested in from a research perspective. The idea is simple: go for a walk, and talk about what you know. The system gives you a topic — “Frederick II Stupor Mundi” or “The Arab Conquest of Sicily” — and you speak freely for a minute or two about everything you remember. No prompts, no multiple choice, no structure imposed. Just: tell me what you know.

Voice recall — recording screen (idealized mockup)
HISTORY OF SICILY
Frederick II Stupor Mundi: Holy Roman Emperor who ruled from Palermo, patron of arts and sciences, founder of the University of Naples, led the Sixth Crusade…
Speak freely about what you remember · Recording… 0:47
Up next: The Sicilian School of Poetry · Norman Conquest of Sicily · The Sicilian Vespers (1282)

Recall results — captured facts, missed items, and wonderings (idealized mockup)
Coverage of Frederick II Stupor Mundi: 78% anchored
Ruled from Palermo, spoke six languages · founded the University of Naples in 1224 · negotiated Jerusalem through diplomacy, not war · patron of the Sicilian School of poetry · excommunicated four times by different popes · De Arte Venandi cum Avibus, his scientific treatise on falconry
Beyond the sources: connected Frederick to the Al-Andalus troubadour tradition
Wonderings: “Was Frederick influenced by Islamic philosophy?” · “What happened to Palermo after Frederick died?”

The pipeline behind this is quite rich. The audio goes to Soniox for transcription, passes through a quality gate (rejecting anything under 15 words — accidental button presses), and gets deduplicated by audio fingerprint (I discovered that 35% of my early transcripts were duplicates from mobile retry failures). Then Claude analyzes the transcript against the curriculum node definition and all my book sources for that topic, classifying everything I said into: captured facts, structurally important omissions, interesting connections that go beyond the sources, and — most valuably — wonderings.
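A sketch of the two gates in front of the LLM (a plain content hash stands in for the real audio fingerprint here; names are illustrative):

```python
import hashlib

MIN_WORDS = 15  # quality gate: rejects accidental button presses

seen_fingerprints: set[str] = set()

def accept_recording(audio: bytes, transcript: str) -> bool:
    """Length check, then dedup: mobile retry failures once produced 35% duplicates."""
    if len(transcript.split()) < MIN_WORDS:
        return False
    fingerprint = hashlib.sha256(audio).hexdigest()
    if fingerprint in seen_fingerprints:
        return False
    seen_fingerprints.add(fingerprint)
    return True
```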

Wonderings are the gold. “I wonder if Frederick was influenced by Islamic philosophy?” or “Was there a connection between the troubadours and the Sicilian School?” These get automatically routed to the microlearning card pipeline, where Claude researches them with web search and generates structured educational content. My curiosity becomes a curriculum expansion mechanism.

The results from reprocessing three Sicily voice recordings: 92 facts extracted across 25 curriculum nodes, 25+ quiz questions generated, 15 microlearning wonderings triggered. One recording about Frederick II alone produced 49 facts mapping to 7 nodes. I should note that these are throughput metrics — I have not yet validated the LLM’s scoring against independent human judgment. The most impactful next step would be having a history PhD student independently score a sample of recordings to establish inter-rater reliability. If kappa exceeds 0.7, there is a paper here.
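The reliability check itself is a few lines once the human labels exist (a sketch assuming scikit-learn and a hypothetical per-fact labeling scheme):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-fact labels from the LLM and a human expert over the
# same recordings, e.g. "captured" / "missed" / "beyond_sources".
llm_labels   = ["captured", "captured", "missed", "beyond_sources"]
human_labels = ["captured", "missed",   "missed", "beyond_sources"]

kappa = cohen_kappa_score(llm_labels, human_labels)
print(f"Cohen's kappa: {kappa:.2f}")  # target: > 0.7
```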

What I find most interesting from a research perspective is that this is free recall — the most demanding form of retrieval practice. Unlike cued recall (prompted questions) or recognition (multiple choice), free recall requires the learner to generate their own organizational structure. Research by Karpicke shows that free recall promotes both relational and item-specific processing, while cued recall promotes only item-specific processing.

Some important caveats. The “production effect” from speaking aloud was demonstrated primarily for word lists (MacLeod et al., 2010); the effect size for connected prose is smaller and sometimes null. And free recall is susceptible to retrieval-induced forgetting (Anderson & Bjork, 1994): recalling some facts about Frederick II can actually suppress competing facts that weren’t recalled. Over many sessions, this could systematically strengthen a subset of knowledge while weakening the rest — a confound the system needs to track. More fundamentally, what you recall freely in a single session reflects your current state (recent priming, mood, context) more than your durable knowledge. The gap between “what I said in one recording” and “what I actually know” is substantial.

Nobody has validated this for complex, multi-source historical knowledge. Jeremy Manning at Dartmouth has a Naturalistic Free Recall dataset (229 participants, spoken narratives, automated scoring), but his subjects recall short stories, not months of accumulated reading. Andrew Lan at UMass built GENCAT for LLM-based open-ended knowledge assessment — the closest existing system — but in classroom settings. I’m quite interested in what happens when you combine genuinely rich personal knowledge, unconstrained voice production, and LLM-based analysis over months of longitudinal data.

Microlearning Cards

Microlearning card with sources and follow-ups (idealized mockup)
Al-Idrisi’s World Map for Roger II (1154)
In 1154, the Arab geographer Muhammad al-Idrisi completed the Tabula Rogeriana for King Roger II of Sicily — the most accurate world map of the medieval period. Working from Roger’s court in Palermo, al-Idrisi spent fifteen years compiling reports from travelers and sailors across the Mediterranean.
The original Nuzhat al-Mushtaq manuscript survives in multiple copies. The silver planisphere commissioned by Roger was melted down during a palace revolt in 1161.
Still visible: the Bodleian Library, Oxford, holds the earliest surviving copy; a reconstruction of the silver disc is displayed at the Palazzo dei Normanni in Palermo.
Follow-up rabbit holes: Islamic cartographic tradition · Norman–Arab cultural synthesis · What happened to Palermo after Frederick?

When a wondering from voice elicitation, a curiosity chip after review, or a follow-up question from another microlearning card triggers research, the system generates a structured card. These are not Wikipedia summaries. Each card is required to include specific primary sources (who wrote about this?), material evidence (what survives physically? where can you visit?), and cultural artifacts (art, opera, literature). The follow-up queries go sideways — geography as explanation, counter-narratives, structural causes, transmission history — rather than deeper into what the card already covered.

The “Still Visible” section turned out to be surprisingly compelling — knowing that you can actually go to Oxford and see al-Idrisi’s map, or visit the Palazzo dei Normanni where it was made, creates a different kind of engagement than abstract historical narrative.

Microlearning cards are interleaved into the review stream, but never front-loaded. High-priority cards (from voice wonderings or explicit requests) appear at a ratio of 1 per 3 spaced-repetition cards. Lower-priority ones (follow-ups, entity research) at 1 per 7. Each card generates its own quiz questions, which are independently scheduled via FSRS with expanding intervals. So a wondering expressed during a walk on Tuesday becomes a researched card by Wednesday, its quiz questions enter the review stream by Thursday, and by the following week the knowledge is being actively maintained.
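A sketch of the interleaving (the 1:3 and 1:7 ratios are the live settings; the rest is illustrative):

```python
def build_session(srs_cards: list, high_ml: list, low_ml: list) -> list:
    """Weave microlearning cards into the review stream, never front-loaded:
    one high-priority card per 3 SRS cards, one low-priority per 7."""
    session: list = []
    hi, lo = iter(high_ml), iter(low_ml)
    for i, card in enumerate(srs_cards, start=1):
        session.append(card)
        if i % 3 == 0 and (nxt := next(hi, None)) is not None:
            session.append(nxt)
        if i % 7 == 0 and (nxt := next(lo, None)) is not None:
            session.append(nxt)
    return session
```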

What Didn’t Work

Building a system like Petrarca over months of iteration means accumulating quite a lot of failed approaches alongside the successful ones. I think this section may be the most useful part of this page.

The Fractal Knowledge Graph (Otak)

My earlier project attempted a hierarchical knowledge tree that would recursively split any topic into subtopics. It reached 67,000 items across 10,000 nodes at 16 levels of depth. Auto-splitting at 15 children produced near-synonym branches. The fundamental problem: one tree can’t serve classification, browsing, and navigation simultaneously. Individual claims were too atomic for hierarchy. The whole approach lacked narrative coherence. Bounded curricula replaced it entirely.

Embeddings Beat LLM Judges for Similarity

For comparing claims across articles, I tried multiple approaches. Local embeddings (Nomic-embed-text, 768 dimensions, free) with calibrated cosine thresholds won: AUROC 0.930, 94% accuracy. The LLM judge managed only 78% accuracy, worse than embeddings alone; topic Jaccard overlap managed 50%; and a two-stage embed-then-LLM pipeline added no improvement over embeddings by themselves. The LLM judge adds value only in the narrow 0.68–0.78 ambiguous zone (~5% of claim pairs).
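The surviving design is embeddings-first with a narrow escalation band (a sketch; the thresholds are the calibrated values, everything else is illustrative):

```python
import numpy as np

AMBIGUOUS_LOW, AMBIGUOUS_HIGH = 0.68, 0.78  # calibrated on labeled pairs

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_claim(text_a: str, text_b: str, emb_a: np.ndarray, emb_b: np.ndarray,
               llm_judge) -> bool:
    """Embeddings decide ~95% of pairs outright; only the ambiguous zone
    is escalated to the slower, costlier LLM judge."""
    sim = cosine(emb_a, emb_b)
    if sim >= AMBIGUOUS_HIGH:
        return True
    if sim <= AMBIGUOUS_LOW:
        return False
    return llm_judge(text_a, text_b)  # ~5% of claim pairs land here
```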

Claims vs. Insights: The Ontological Shift

The first version of article processing extracted “claims” — atomic factual assertions. 31% noise, 17% insight rate. The problem was the wrong granularity: “Claude Code can fix lint errors” is technically a claim but carries no intellectual weight. This was not just a quality issue; it was an ontological shift from factual propositions to interpretive takeaways. Switching to insights with a better model brought noise to 0% and insight rate to 91%. The organizing unit should be meaningful, not atomic.

The Complexity Bankruptcy (Session 7)

By session 7, the app had accumulated 15+ features that didn’t work end-to-end. The server pipeline was broken, concepts were wiped, no new content was ingesting. I had to “declare bankruptcy on complexity” and start from a clean foundation. The lesson: experiments before infrastructure, always verify end-to-end.

LLM-Generated Review Questions

Despite having rich structured key_facts data (dates, events, persons per node), every LLM-generated review question was vague: “What characterized the Age of Tyrants?” The fix: generate factual questions deterministically from the structured data (“In what year did Gelon become tyrant of Syracuse?”). Only use the LLM for rich contextual answers and analytical questions. This runs counter to the current trend of LLM-everything, but it is the kind of principle I wish I’d learned earlier: never delegate to an LLM what can be computed from structure.
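A sketch of what “generate deterministically from structure” means in practice (field names and templates are illustrative):

```python
def factual_questions(key_facts: list[dict]) -> list[str]:
    """Generate precise factual questions straight from typed key_facts,
    with no LLM in the loop."""
    templates = {
        "date":   "In what year did {label} happen?",
        "person": "Who was {label}?",
        "event":  "What happened during {label}?",
    }
    questions = []
    for fact in key_facts:
        template = templates.get(fact["type"])
        if template:  # only fact types with a deterministic template
            questions.append(template.format(label=fact["label"]))
    return questions
```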

The Read-Later Feed (Unvalidated)

Alongside the book companion, I’ve been experimenting with a read-later feed that models my knowledge against everything I ingest — articles from email, browser bookmarklets, and Twitter bookmark integration. The system extracts insights from each article, embeds them, and compares them against my knowledge base. One design parameter I’m testing: a “curiosity zone” scoring in which articles with roughly 70% novel content and 30% familiar context rank highest (the zone of proximal development made concrete). Whether this actually predicts engagement, as opposed to merely producing a different ranking, is an open question awaiting validation with reading-time data.
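A sketch of the curiosity-zone shape (the 70/30 target comes from the design above; the triangular peak and the width are arbitrary illustrative choices):

```python
def curiosity_score(novel_fraction: float,
                    target: float = 0.70, width: float = 0.20) -> float:
    """Score peaks when ~70% of an article's insights are novel and ~30%
    land on familiar context; falls off linearly on both sides."""
    return max(0.0, 1.0 - abs(novel_fraction - target) / width)
```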

* * *

Related Work

Petrarca doesn’t exist in isolation. It’s part of a broader set of experiments I’ve been running on mapping knowledge in structured ways, all sharing a common architecture: extract atomic units from unstructured text, embed them, compute novelty and relationships, and present structured navigation.

Alif & Networked Thought

A single-user Arabic learning app I built after Duolingo frustrated me. Sentence-first learning, FSRS scheduling, root-aware stability boosts. Reached ~1,300 words in 8 weeks (~3× the pace of the U.S. Defense Language Institute). The key design principle that carries over: “the best way to find something universal might be to make it perfectly specific first.”

The Hirsch Argument Atlas

A comprehensive mapping of E.D. Hirsch Jr.’s complete intellectual output across 10 books spanning 47 years. 10,312 claims extracted, 656 cross-book arguments tracked, 424 external scholarly findings providing independent context. Hirsch’s thesis (knowledge is the foundation of literacy) directly informs Petrarca’s design — though the system aims for connected understanding rather than recognition-level familiarity.

MDG Programanalyse

An analysis of ~128 Norwegian Green Party local programs: 7,600 proposals extracted, clustered semantically, scored for uniqueness, linked to the national program. The same extract-embed-compare architecture as Petrarca, applied to political text. PCA-whitened embeddings and the lesson that AI extraction beats structural parsing both carried over directly. Technical details.

For Researchers

Petrarca offers something unusual for research: a deeply instrumented single-user system with months of longitudinal data across multiple knowledge domains, combining reading, voice assessment, spaced review, and curriculum-based knowledge tracking. I hold a PhD in learning science and am familiar with the methodological constraints of N=1 research. This is an autobiographical design study (Neustaedter & Sengers, 2012; Lucero, 2018) — genuine long-term use by the designer-researcher. The single-user limitation is real: a learning scientist who built the system himself is radically unrepresentative of most learners. But the longitudinal depth and the instrumentation create opportunities that multi-user studies typically cannot provide.

Here are the three research directions I find most promising:

1. Voice-Based Knowledge Assessment Validity

No validation exists for unconstrained voice dumps as a knowledge assessment modality for complex, multi-source topics. The immediate next step: establish inter-rater reliability between Claude’s scoring and human expert judgment on 10–15 recordings. Then: do coverage scores predict performance on independent assessments? How does recall structure (causal connections, temporal organization) change over time? Manning’s Naturalistic Free Recall work and Lan’s GENCAT system are the closest reference points.

2. Knowledge Structure Analysis via Free Recall

Goldsmith’s finding that the similarity of a student’s knowledge network to the instructor’s network correlates with exam performance (r = 0.74) is the most important single result for Petrarca’s growth measurement. The curriculum graph is the expert network. Zemla’s SNAFU toolkit could model recall as a censored random walk on the knowledge network. Analyzing not just what was recalled but the structural quality — causal language, temporal organization, source attribution — could connect to the SOLO taxonomy used in history education.

3. Type-Differentiated Knowledge Decay

Fuzzy-Trace Theory predicts different decay rates for verbatim (dates, events) versus gist (arguments, significance) knowledge. Petrarca’s key_facts are already typed (date/event/person/connection/significance). Duolingo’s Half-Life Regression model (Settles & Meeder, 2016) could learn different decay rates per type automatically. This has direct implications for scheduling: should dates be reviewed more frequently than causal arguments?
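In HLR notation (Settles & Meeder, 2016), the extension is small; the sketch below simply appends a fact-type indicator to the feature vector so the model can learn a separate half-life contribution per type.

```latex
% Recall probability after a gap of \Delta days, with a learned half-life:
p = 2^{-\Delta / \hat{h}}, \qquad \hat{h}_{\Theta} = 2^{\Theta \cdot \mathbf{x}}
% Type-differentiated decay: append a one-hot fact type
% (date / event / person / connection / significance) to \mathbf{x},
% so \Theta can assign dates a shorter half-life than causal arguments.
```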

Additional directions I’m interested in: natural spacing through related reading (does incidental re-encounter produce measurable retention?), cross-source knowledge integration longitudinally, and temporal hooks as a retention mechanism (29/30 hooks self-rated as useful, but this needs independent validation). If any of these interest you, I would genuinely love to talk.

Limitations

This is a single-user system built and used by its designer — an expert in learning science with deep familiarity with SRS. The findings are preliminary observations, not validated results. The phone mockups on this page are idealized HTML renderings using real system data, not screenshots of the live app. The system currently focuses on substantive historical content knowledge; disciplinary historical thinking (source analysis, evidence evaluation, multiperspectivity) is an acknowledged gap. The knowledge model tracks familiarity and retention but does not model depth of understanding. The LLM-based voice scoring has not been validated against human expert judgment. These are the starting points for research collaboration, not the conclusions.

* * *

A Few Technical Notes

For those interested in the implementation: the frontend is Expo SDK 54 (React Native) with a 2-tab layout (Feed and Library) plus a drawer. The backend runs on a Hetzner VM with nginx, a Python research server, and a 4-hour cron pipeline. SQLite is the canonical data store. All LLM calls go through Claude (Opus for curriculum generation, Sonnet/Haiku for routine tasks) and Gemini (for some extraction pipelines). Embeddings are local via Nomic-embed-text-v1.5 (768 dimensions, calibrated to AUROC 0.930). FSRS-inspired scheduling with stability multipliers (knew: 2.5×, partly: 1.5×, missed: reset to 1.0; engagement-based initial stability from 9 days for skimmed content to 120 days for annotated). Voice transcription uses Soniox.
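For the scheduling specifically, the constants above translate into a few lines (a sketch; the multipliers, the reset, and the 9- and 120-day endpoints are from the live system, everything else is illustrative):

```python
# Grading outcome → stability multiplier; "missed" resets instead.
STABILITY_MULTIPLIER = {"knew": 2.5, "partly": 1.5}

# Engagement at encoding sets initial stability (days); intermediate
# engagement levels fall between these endpoints.
INITIAL_STABILITY_DAYS = {"skimmed": 9.0, "annotated": 120.0}

def next_stability(current_days: float, outcome: str) -> float:
    """FSRS-inspired update: success expands the interval, a miss resets."""
    if outcome == "missed":
        return 1.0
    return current_days * STABILITY_MULTIPLIER[outcome]
```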

Key References

Anderson, M. C. & Bjork, R. A. (1994). Mechanisms of inhibition in long-term memory. In Inhibitory Processes in Attention, Memory, and Language.

Bjork, R. A. & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In From Learning Processes to Cognitive Processes.

Brainerd, C. J. & Reyna, V. F. (2005). The Science of False Memory. Oxford University Press. (Fuzzy-Trace Theory.)

Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect. J. Experimental Psychology: Learning, Memory, and Cognition, 35(6).

Goldsmith, T. E. et al. (1991). Assessing structural knowledge. J. Educational Psychology, 83(1). (r=0.74 network–exam correlation.)

Karpicke, J. D. & Smith, M. A. (2012). Separate mnemonic effects of retrieval practice and elaborative encoding. J. Memory and Language, 67(1).

Kornell, N. & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19(6).

Lucero, A. (2018). Living without a mobile phone: An autoethnography. DIS ’18. (First-person research methods.)

MacLeod, C. M. et al. (2010). The production effect: Delineation of a phenomenon. J. Experimental Psychology: Learning, Memory, and Cognition, 36(3).

Neustaedter, C. & Sengers, P. (2012). Autobiographical design in HCI research. Designing Interactive Systems (DIS ’12).

Rawson, K. A. & Dunlosky, J. (2011). Optimizing schedules of retrieval practice for durable and efficient learning. J. Experimental Psychology: General, 140(3).

Settles, B. & Meeder, B. (2016). A trainable spaced repetition model for language learning. ACL ’16. (Duolingo HLR.)

Wozniak, P. A. & Gorzelanczyk, E. J. (1994). Optimization of repetition spacing in the practice of learning. Acta Neurobiologiae Experimentalis, 54. (Two-component memory model.)

Zemla, J. C. & Austerweil, J. L. (2018). Estimating semantic networks of groups and individuals from fluency data. Computational Brain & Behavior, 1. (SNAFU toolkit.)