This model (DeepSeek-OCR) aligns remarkably well with what we know about written language and the biology of human reading.
The Visual Word Form Area (VWFA) in the left hemisphere of the brain is where the visual representation of words is transformed into something more meaningful to the organism.
https://en.wikipedia.org/wiki/Visual_word_form_area
DeepSeek-OCR's visual encoding of text (rather than plain text tokenization) appears analogous to what occurs in the VWFA.
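To make the contrast concrete, here is a toy back-of-the-envelope sketch (not the DeepSeek-OCR implementation): a text LLM spends roughly one token per few characters, while an optical approach encodes a whole rendered page into a fixed budget of vision tokens. The ~4 characters per text token and the 100-tokens-per-page budget are illustrative assumptions; the paper reports per-page vision-token budgets on that general order.

```python
# Toy comparison of text-token vs. vision-token counts for one page of text.
# All numbers are rough assumptions for illustration, not measured values.

def text_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Rough text-token count, assuming ~4 characters per token."""
    return max(1, round(len(text) / chars_per_token))

def vision_token_estimate(num_pages: int, tokens_per_page: int = 100) -> int:
    """Vision tokens if each rendered page image is compressed into a
    fixed token budget (assumption: 100 tokens per page)."""
    return num_pages * tokens_per_page

page = "word " * 500                # ~2500 characters, roughly one dense page
t = text_token_estimate(page)       # text-token estimate for the page
v = vision_token_estimate(1)        # vision-token estimate for the same page
print(f"text tokens ~ {t}, vision tokens ~ {v}, compression ~ {t / v:.1f}x")
```

Under these assumptions the optical route uses several times fewer tokens for the same content, which is the compression effect the comment is gesturing at.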
This model may not only prove more powerful than text-based LLMs; it may also lift the curtain of ignorance that has stymied our understanding of how language works and, by extension, how we think and what intelligence actually is.
Kudos to the authors: Haoran Wei, Yaofeng Sun, and Yukun Li – you may have tripped over the Rosetta Stone of intelligence itself! Bravo!
Comments URL: https://news.ycombinator.com/item?id=45717198
Points: 2
# Comments: 2
Source: news.ycombinator.com
