Qwen3 0.6B From Scratch

Python, PyTorch, LLM, Transformers

Implemented and trained Qwen3 0.6B from scratch on the fineEDU dataset.
