How many Ps are in "Google"? According to Google's AI, there are two. It also claims there is "exactly 1 'r' in the word 'poop'," and it misspelled "journalism" as "j-o-u-r-n-a-d-i-s-m." The AI correctly counted one P in the last name of the U.S. president, but it spelled the name "t-r-p-u-m."
The challenges of Google's AI-driven Search transformation were not difficult to predict. Google has integrated AI into Search before: its AI Overviews previously cited satirical content from sources like The Onion and Reddit, passing along absurd advice such as eating rocks.
As Google intensifies its focus on generative AI within its flagship product, it's no surprise to witness some missteps along the way.
Google acknowledged the issue, stating, "Counting within words has been a known challenge for LLMs, and we're working to fix this particular issue." Such spelling errors may seem trivial, yet they highlight a significant limitation of large language models (LLMs), which are not inherently designed to grasp spelling. It's a common joke that one should ask a new AI model how many 'r's are in "strawberry," as these models often perform at the level of a young child in spelling.
The errors are not limited to misspellings. Google recently fixed a glitch in which searching for "disregard" returned a response, styled like a dictionary definition, that read: "Understood. Let me know whenever you have a new prompt or question!" These blunders are entertaining, but they remain challenging to resolve.
Researchers have previously explained that AI doesn't interpret sentences as coherent units of language composed of words and letters. LLMs rely on transformer models that decompose text into tokens, which can represent full words, syllables, or letters depending on the model. Rather than "reading" in the human sense, the AI translates text into numerical representations, contextualizing them to formulate logical responses.
According to Matthew Guzdial, an AI researcher at the University of Alberta, "LLMs are based on this transformer architecture, which notably is not actually reading text." He elaborated that when a prompt is entered, it is translated into an encoding, and the model does not see the individual letters.
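To make the idea concrete, here is a minimal sketch of why letter counting is easy on raw text but awkward on tokens. Everything in it is an assumption for illustration: the vocabulary, the ID numbers, and the greedy longest-match splitting rule are invented, and production tokenizers (byte-pair encoding, for example) learn their vocabularies from data and split text differently.

```python
# Hypothetical toy vocabulary: token strings mapped to made-up integer IDs.
# Real models use learned vocabularies with tens of thousands of entries.
TOY_VOCAB = {
    "straw": 101,
    "berry": 102,
    "goo": 103,
    "gle": 104,
    # Single-letter fallbacks (a=1 ... z=26) so any lowercase word encodes.
    **{ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)},
}


def tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into token IDs using greedy longest-match (illustrative only)."""
    ids = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {word[pos]!r}")
    return ids


def count_letter(word, letter):
    """Count a letter directly on the characters, which the model never sees."""
    return word.count(letter)


if __name__ == "__main__":
    for word in ("strawberry", "google", "poop"):
        print(word, "->", tokenize(word))
    print("r in strawberry:", count_letter("strawberry", "r"))
```

In this sketch "strawberry" reaches the model as two ID numbers, and nothing in those numbers says how many times "r" appears. A model has to learn that association from training data rather than read it off the input, which is one plausible reason counting letters goes wrong.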
Because this token-based architecture underlies LLMs like the one behind Google's AI Overviews, researchers are not optimistic that the spelling problems will be overcome.
As Sheridan Feucht, a PhD student at Northeastern University, noted, defining what constitutes a "word" for a language model is complex. Even with a consensus on an ideal token vocabulary, models would still likely benefit from "chunking" information further. Because of this ambiguity, a perfect tokenizer may remain elusive.
While the spelling inadequacies of LLMs are not an urgent concern for researchers, they serve as a reminder that AI, despite its remarkable capabilities, is not infallible. This reinforces the importance of verifying AI outputs for accuracy.