Posted by u/Different_Case_6484
[R] China just released the first SOTA multimodal model trained entirely on domestic chips
Zhipu AI and Huawei just dropped GLM-Image, and the technical details are interesting. It's the first multimodal model trained entirely on Chinese chips (Huawei Ascend 910), from data preprocessing through full-scale training. The architecture is a hybrid combining an autoregressive backbone with a diffusion decoder.

What stands out is the text rendering: it consistently ranks first among open-source models for complex text generation, especially Chinese characters, which most models struggle with. It natively supports resolutions from 1024 to 2048 at arbitrary aspect ratios without additional training, and it handles both text-to-image and image-to-image generation in a single model. API pricing is 0.1 yuan per image (roughly $0.014). GitHub and Hugging Face repos are already up.

This is significant because it proves you can train frontier models without relying on Nvidia hardware. The claimed compute efficiency is 60% better than the H200 in tokens per joule. Whether those benchmarks hold up in practice remains to be seen, but the fact that they pulled this off on domestic hardware is noteworthy.

Edit: For anyone testing this, X-Design also handles multilingual text rendering well. I've been comparing outputs, and both handle complex layouts better than DALL-E 3.
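Edit 2: A few people asked what "autoregressive + diffusion decoder" means in practice, so here's a minimal conceptual sketch in PyTorch. To be clear, this is not GLM-Image's actual code and the release doesn't publish these details in the post above; the module names, sizes, tokenization, and conditioning scheme are all assumptions for illustration. The general idea in this family of hybrids: an autoregressive transformer predicts a sequence of image-latent tokens from the prompt, and a diffusion model then decodes/refines those latents into the final image.

```python
# Conceptual sketch of a hybrid AR + diffusion pipeline (NOT GLM-Image's implementation).
# All shapes, names, and the conditioning scheme are assumptions for illustration.
import torch
import torch.nn as nn


class ARLatentPlanner(nn.Module):
    """Autoregressive stage: predicts discrete image-latent tokens from prompt tokens."""
    def __init__(self, vocab_size=16384, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        # Causal mask so each position only attends to earlier tokens (AR behavior).
        seq_len = token_ids.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.backbone(self.embed(token_ids), mask=mask)
        return self.head(h)  # logits over the next image-latent token


class DiffusionDecoder(nn.Module):
    """Diffusion stage: predicts noise to remove from a latent, conditioned on the AR plan."""
    def __init__(self, latent_dim=64, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, noisy_latent, cond, t):
        # Concatenate noisy latent, AR conditioning, and timestep, then predict the noise.
        t_embed = t.unsqueeze(-1).float()
        return self.net(torch.cat([noisy_latent, cond, t_embed], dim=-1))


if __name__ == "__main__":
    planner = ARLatentPlanner()
    decoder = DiffusionDecoder()
    prompt_tokens = torch.randint(0, 16384, (1, 32))          # stand-in prompt token ids
    plan_logits = planner(prompt_tokens)                       # (1, 32, vocab)
    cond = planner.embed(plan_logits.argmax(-1)).mean(dim=1)   # crude pooled conditioning vector
    noisy = torch.randn(1, 64)                                  # one noisy latent
    t = torch.tensor([10])                                      # diffusion timestep
    print(decoder(noisy, cond, t).shape)                        # torch.Size([1, 64])
```

In a real system the diffusion stage would be a U-Net or DiT operating on 2D latents with a proper noise schedule and sampler, not a toy MLP; the sketch is only meant to show how the AR output can act as conditioning for the diffusion decoder.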