Technology · 2 min read

Managing a 1,500-Book Digital Archive: Lessons from the Field

How we built systems to manage, process, and publish from a personal archive of 1,537 books totalling 44+ million words.

The Scale of the Problem

Atharva Inamdar's book archive contains:

  • 1,537 books written between 2007 and 2026
  • 44+ million words of content
  • 19 years of continuous writing output
  • 13 genres from literary fiction to quantum spirituality

Managing this archive is not a creative problem. It is an engineering problem.

The Systems We Built

1. Manuscript Processor

A Node.js script that ingests raw book files (Markdown, DOCX) and outputs structured JSON with:

  • Chapter segmentation (splitting long documents into chapters)
  • Word count calculation at book and chapter level
  • Metadata extraction (title, genre, themes, settings)
  • Quality classification (Hero, Support, Archive tiers)
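The segmentation and word-count steps can be sketched in a few lines of Node.js. This is a minimal illustration, not the actual processor: it assumes chapters are delimited by `## ` Markdown headings, and the function and field names are invented for the example.

```javascript
// Split a Markdown manuscript into chapters on "## " headings
// and compute word counts at the chapter and book level.
function processManuscript(markdown, title) {
  const chapters = [];
  let current = { title: "Front Matter", lines: [] };
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      if (current.lines.length) chapters.push(current);
      current = { title: line.slice(3).trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  chapters.push(current);

  const countWords = (text) => (text.match(/\S+/g) || []).length;
  const chapterData = chapters.map((c) => ({
    title: c.title,
    wordCount: countWords(c.lines.join(" ")),
  }));

  return {
    title,
    wordCount: chapterData.reduce((sum, c) => sum + c.wordCount, 0),
    chapters: chapterData,
  };
}
```

The same structured-JSON output then feeds every downstream system, which is what makes the rest of the tooling possible.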

2. Quality Audit System

Automated quality scoring based on:

  • Book completeness (is it a finished work?)
  • Prose quality indicators (sentence variety, dialogue ratio)
  • Genre classification accuracy
  • Content warning detection

Output: A quality report for each book with actionable editorial notes.
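One indicator from the list above, dialogue ratio, is simple enough to sketch: the share of paragraphs containing quoted speech. The detection heuristic, the tier cutoffs, and the function names below are all assumptions for illustration; only the tier names (Hero, Support, Archive) come from our actual system.

```javascript
// Dialogue ratio: fraction of paragraphs containing quoted speech.
// Matches straight (") or curly (\u201C) opening quotes.
function dialogueRatio(text) {
  const paragraphs = text.split(/\n\s*\n/).filter((p) => p.trim());
  if (!paragraphs.length) return 0;
  const withDialogue = paragraphs.filter((p) => /["\u201C]/.test(p)).length;
  return withDialogue / paragraphs.length;
}

// Map an aggregate quality score to an editorial tier.
// Cutoff values here are invented, not the production thresholds.
function classifyTier(score) {
  if (score >= 0.8) return "Hero";
  if (score >= 0.5) return "Support";
  return "Archive";
}
```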

3. Duplicate Detection

With 1,537 books spanning 19 years, duplication is inevitable — revised versions, renamed titles, partial rewrites. Our detection system uses:

  • Title similarity matching (Levenshtein distance)
  • Opening paragraph fingerprinting
  • Word count clustering (books within 10% of each other)
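Title similarity is the standard Levenshtein edit distance, normalized to a 0 to 1 score. A sketch of that first check (the similarity threshold you would flag on is a tuning choice, not a value from our system):

```javascript
// Levenshtein distance via the classic dynamic-programming table.
function levenshtein(a, b) {
  const m = a.length, n = b.length;
  const dp = Array.from({ length: m + 1 }, (_, i) => [i, ...Array(n).fill(0)]);
  for (let j = 0; j <= n; j++) dp[0][j] = j;
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[m][n];
}

// Normalize to a similarity score: 1 = identical, 0 = nothing shared.
function titleSimilarity(a, b) {
  const dist = levenshtein(a.toLowerCase(), b.toLowerCase());
  return 1 - dist / Math.max(a.length, b.length, 1);
}
```

Pairs that pass the title check then go through the cheaper fingerprint and word-count filters before a human reviews the match.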

4. Publishing Pipeline

From raw manuscript to published book:

  1. Raw book → Markdown conversion
  2. Chapter segmentation → Individual chapter files
  3. Metadata extraction → JSON catalog entry
  4. Quality audit → Editorial classification
  5. ISBN assignment → Catalog integration
  6. Page generation → Individual reader pages
  7. Export generation → EPUB, PDF, BibTeX

This pipeline runs in under 30 seconds for the entire 68-book published catalog.
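The seven steps above can be modeled as a list of stage functions applied in order, each one transforming the book's state and recording its name. The stage bodies below are stubs; in the real pipeline each performs the conversion its name describes.

```javascript
// Run a book through an ordered list of [name, fn] stages,
// threading the state and accumulating a log of completed stages.
function runPipeline(book, stages) {
  return stages.reduce((state, [name, fn]) => {
    const next = fn(state);
    return { ...next, log: [...(state.log || []), name] };
  }, book);
}

// Stage names mirror the published pipeline; bodies are placeholders.
const stages = [
  ["markdown-conversion", (b) => b],
  ["chapter-segmentation", (b) => b],
  ["metadata-extraction", (b) => b],
  ["quality-audit", (b) => b],
  ["isbn-assignment", (b) => b],
  ["page-generation", (b) => b],
  ["export-generation", (b) => b],
];
```

Keeping each stage a pure function of the previous stage's output is what lets the whole catalog rebuild from scratch in seconds.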

5. Editorial Content Generator

The archive doesn't just produce books — it produces editorial content:

  • Daily Pages: One passage per day from the archive (204+ pre-generated)
  • First Lines: Opening sentence of every book (68 entries)
  • Revision Theater: Draft vs. published comparisons (15 examples)
  • Emotional Map: 19-year timeline of writing output by year
  • Reading Guides: Curated pathways through the archive

All generated programmatically from the book data. No manual content creation.
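As one example of that programmatic generation, a Daily Page can be selected deterministically by hashing the date into an index over the pre-generated passages, so the same date always serves the same passage with no scheduling state. The hashing scheme here is an assumption for illustration, not the site's actual method.

```javascript
// Pick a passage deterministically from a date string ("YYYY-MM-DD"):
// a simple polynomial rolling hash, reduced modulo the passage count.
function dailyPage(passages, dateStr) {
  let hash = 0;
  for (const ch of dateStr) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return passages[hash % passages.length];
}
```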

Lessons Learned

  1. Treat content as data: Books are not just creative works. They are data that can be processed, analyzed, and transformed.
  2. Build for scale: Systems designed for 68 books should work for 680. And 6,800.
  3. Automate editorial: If content can be derived from existing data, it should be generated, not written.
  4. Version everything: Every book, every script, every configuration file lives in Git.

— BogaDoga Engineering

BogaDoga Ltd

Publishing & Digital Innovation, London
