Introducing talkie: a 13B vintage language model from 1930
A 13B model trained exclusively on pre-1931 text, built to explore how well a language model can reason, create, and 're-discover' later ideas within a fixed knowledge boundary. It also prompts new discussion of data copyright and model purity.
Simon Willison · Apr 28, 2026
Deep Neural Nets: 33 years ago and 33 years from now
Karpathy reproduces LeCun's 1989 handwritten zip code recognition paper in PyTorch and uses the result to examine how deep learning has progressed over 33 years.
karpathy.github.io · Apr 5, 2026
Gemma 4: Byte for byte, the most capable open models
Google DeepMind's Gemma 4 models emphasize parameter efficiency and support multimodal inputs, a significant step in research on small but capable models.
Simon Willison · Apr 3, 2026
microgpt
Andrej Karpathy's microgpt project demonstrates how to implement a simplified GPT model from scratch in just 200 lines of Python code, revealing a trend towards minimalism in AI development.
Andrej Karpathy · Feb 12, 2026
Adversarial Attacks on LLMs
An overview of adversarial attacks on large language models (LLMs), covering attack types, threat models, and their effect on the safety of generated text, and highlighting significant open challenges in AI safety.
Lilian Weng · Oct 25, 2023