Running TinyLlama 1.1B locally on a PowerBook G4 from 2002. Mac OS 9, no internet, installed from a CD.


Hey everyone! I've been working on this for months and today's the day. MacinAI Local is a complete local AI inference platform that runs natively on classic Macintosh hardware, no internet required.

**What makes this different from previous retro AI projects:**

Every "AI on old hardware" project I've seen (llama98.c on Windows 98, llama2.c64 on Commodore 64, llama2 on DOS) ports Karpathy's llama2.c with a single tiny 260K-parameter model. MacinAI Local is a ground-up platform:

* **Custom C89 inference engine:** not a port of llama.cpp or llama2.c. Written from scratch targeting Mac Toolbox APIs and classic Mac OS memory management.
* **Model-agnostic:** runs GPT-2 (124M), TinyLlama, Qwen (0.5B), SmolLM, and any HuggingFace/LLaMA-architecture model via a Python export script. Not locked to one toy model.
* **100M parameter custom transformer:** trained on 1.1GB of Macintosh-specific text (Inside Macintosh, MacWorld, Usenet archives, programming references).
* **AltiVec SIMD optimization:** 7.3x speedup on PowerPC G4. Went from 2.4 sec/token (scalar) down to 0.33 sec/token with Q8 quantization and 4-wide unrolled vector math with cache prefetch.
* **Agentic Mac control:** the model generates AppleScript to launch apps, manage files, open control panels, and automate system tasks. It asks for confirmation before executing anything.
* **Disk paging:** layers that don't fit in RAM get paged from disk, so even machines with limited memory can run inference. TinyLlama 1.1B runs on a machine with 1GB RAM by streaming layers from the hard drive.
* **Speech Manager integration:** the Mac speaks every response aloud using PlainTalk voices.
* **BPE tokenizer:** 8,205 tokens including special command tokens for system actions.

**The demo hardware:** PowerBook G4 Titanium (2002), 1GHz G4, 1GB RAM, running Mac OS 9.2.2.
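For readers unfamiliar with per-group Q8 quantization, here is a minimal scalar sketch in C89 of the general idea behind the Q8 and 4-wide unrolling bullet above. It is not MacinAI Local's actual code: the group size of 32 and the names `q8_quantize` and `q8_dot` are assumptions made for illustration, and the four independent accumulators only imitate a 4-wide unrolled loop in plain C rather than using AltiVec intrinsics or cache prefetch.

```c
#include <assert.h>

/* Assumed group size; the post does not state the one the engine uses. */
#define Q8_GROUP 32

/* Quantize n floats to int8, storing one float scale per group of
   Q8_GROUP values. n must be a multiple of Q8_GROUP. */
void q8_quantize(const float *x, long n, signed char *q, float *scales)
{
    long g, i;
    assert(n % Q8_GROUP == 0);
    for (g = 0; g < n / Q8_GROUP; g++) {
        const float *xg = x + g * Q8_GROUP;
        float amax = 0.0f, scale, inv;
        /* Largest magnitude in the group sets the scale. */
        for (i = 0; i < Q8_GROUP; i++) {
            float a = (xg[i] < 0.0f) ? -xg[i] : xg[i];
            if (a > amax) amax = a;
        }
        scale = amax / 127.0f;
        inv = (scale > 0.0f) ? 1.0f / scale : 0.0f;
        scales[g] = scale;
        for (i = 0; i < Q8_GROUP; i++) {
            float v = xg[i] * inv;
            /* Round to nearest by adding 0.5 toward the sign, then truncate. */
            q[g * Q8_GROUP + i] = (signed char)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}

/* Dot product of a quantized weight row with a float activation vector.
   The inner loop is unrolled four ways with separate accumulators, and
   the group scale is applied once per group. */
float q8_dot(const signed char *q, const float *scales, const float *x, long n)
{
    long g, i;
    float total = 0.0f;
    assert(n % Q8_GROUP == 0);
    for (g = 0; g < n / Q8_GROUP; g++) {
        const signed char *qg = q + g * Q8_GROUP;
        const float *xg = x + g * Q8_GROUP;
        float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
        for (i = 0; i < Q8_GROUP; i += 4) {
            a0 += (float)qg[i]     * xg[i];
            a1 += (float)qg[i + 1] * xg[i + 1];
            a2 += (float)qg[i + 2] * xg[i + 2];
            a3 += (float)qg[i + 3] * xg[i + 3];
        }
        total += scales[g] * ((a0 + a1) + (a2 + a3));
    }
    return total;
}
```

Storing one byte per weight plus one scale per group is what brings a model to roughly one byte per parameter, and independent accumulators are the usual scalar stand-in for what a SIMD unit does in a single vector register.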
**Real hardware performance (PowerBook G4 1GHz, Mac OS 9.2, all Q8):**

|Model|Params|Q8 Size|Tokens/sec|Per token|Notes|
|:-|:-|:-|:-|:-|:-|
|MacinAI Tool v7|94M|107 MB|2.66 tok/s|0.38s|Custom tool model, AppleScript|
|GPT-2|124M|141 MB|1.45 tok/s|0.69s|Text completion|
|SmolLM 360M|360M|394 MB|0.85 tok/s|1.18s|Chat model|
|Qwen 2.5 0.5B|494M|532 MB|0.63 tok/s|1.59s|Best quality|
|TinyLlama 1.1B|1.1B|1.18 GB|0.10 tok/s|9.93s|Disk paging (24.5 min for 113 tok)|

**Technical specs:**

| Spec | Details |
|---|---|
| Language | C89 (CodeWarrior Pro 5) |
| Target OS | System 7.5.3 through Mac OS 9.2.2 |
| Target CPUs | 68000, 68030, 68040, PowerPC G3, G4 |
| Quantization | Float32, Q8_0 (int8 per-group) |
| Architectures | LLaMA-family (RMSNorm/SwiGLU/RoPE) + GPT-2 family (LayerNorm/GeLU/learned pos) |
| Arena allocator | Single contiguous block, 88% of physical RAM, no fragmentation |
| AltiVec speedup | 7.3x over scalar baseline |

**What's next:** Getting the 68040 build running on a 1993 LC 575 / Color Classic Mystic. The architecture already supports it, just need the hardware in hand.

Demo: https://youtu.be/W0kV_CCzTAM

Technical write-up: https://oldapplestuff.com/blog/MacinAI-Local/

Happy to answer any technical questions. I've got docs on the AltiVec optimization journey (finding a CodeWarrior compiler bug along the way), the training pipeline, and the model export process.

Thanks for the read!
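As a footnote on the "Arena allocator" row in the specs: the general technique of carving every allocation out of a single contiguous block can be sketched as a bump allocator. This is a generic C89 illustration under stated assumptions, not the engine's code. The `Arena` type and the `arena_*` function names are hypothetical, and `malloc` stands in for whatever Mac Toolbox memory call the real engine uses to reserve its block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one big block plus a cursor that only moves forward. */
typedef struct {
    unsigned char *base;     /* start of the contiguous block */
    size_t         capacity; /* total bytes reserved */
    size_t         used;     /* bytes handed out so far */
} Arena;

/* Reserve the block once, up front. Returns 1 on success, 0 on failure. */
int arena_init(Arena *a, size_t capacity)
{
    a->base = (unsigned char *)malloc(capacity);
    a->capacity = (a->base != NULL) ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hand out size bytes at an offset aligned to align (a power of two).
   Offsets are aligned relative to base, so align should not exceed the
   alignment malloc itself guarantees. Returns NULL when the arena is full. */
void *arena_alloc(Arena *a, size_t size, size_t align)
{
    size_t start;
    assert(align != 0 && (align & (align - 1)) == 0);
    start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Drop every allocation at once; nothing is freed individually,
   which is why this scheme cannot fragment. */
void arena_reset(Arena *a)
{
    a->used = 0;
}

/* Return the whole block to the system. */
void arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

Because allocations are never released one at a time, there are no holes to manage. The trade-off is that memory can only be reclaimed in bulk, which suits inference, where weights and buffers tend to live for the whole run.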
