Building HiSLM: What Happens When You Try to Train a Language Model and Learn Everything the Hard Way
The first run crashed in 11 minutes. The second produced a confident but wrong model. The third was okay. Research is mostly the first two.