I got into a discussion the other day with my friend Eli, who is a frequent T9 texter, about words that have the same T9 codes. He was wondering what the longest pair of words with the same T9 code is.
For those who don’t know, T9 is a text input method for phones that only have a numeric keypad, where each of the digits 2 through 9 stands for three or four letters. You can read more about it here on Wikipedia. But in our discussion, I realized I could easily use the word database I use for my Scrabble iPhone web app to answer this question. I told him I’d do this; I don’t think he believed me. He was wrong.
First, I added a new integer field to the database table of words. This would hold the T9 value for each word. Then I wrote a script to convert all the words in the DB to their T9 counterparts, e.g. IRREFLEXIVE to 47733539483. Then I searched for words that have the same T9 codes.
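For the curious, here is a rough Python sketch of that conversion and grouping step. It is not the original script: the names `t9_code` and `group_by_code` are made up for illustration, it works on a plain list of words instead of a database table, and it keeps each code as a string, not an integer.

```python
from collections import defaultdict

# Standard phone keypad layout: digit -> letters printed on that key.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def t9_code(word):
    """Return the T9 digit string for a word made only of the letters A-Z."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def group_by_code(words):
    """Map each T9 code to its words, keeping only codes shared by 2+ words."""
    groups = defaultdict(list)
    for word in words:
        groups[t9_code(word)].append(word)
    return {code: ws for code, ws in groups.items() if len(ws) > 1}


print(t9_code("IRREFLEXIVE"))  # 47733539483
print(group_by_code(["housemaids", "intrenches", "cat"]))
# {'4687362437': ['housemaids', 'intrenches']}
```

Keeping the code as a string is just a convenience in this sketch; a 15-digit code is too big for a 32-bit integer column, so a database version would need a wider integer or a text field.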
Unfortunately a lot of words in the dictionary are very similar - for example “photosensitizer” and “photosensitizes.” Those are both 15 letter words, but they are basically the same word. So I realized what I wanted is long words that are significantly different from each other. I wrote a script that found all of the longest duplicates, and then compared them character by character for similarities. I then eliminated all the words that only have one character that’s different. Here’s what I found.
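The character-by-character comparison can be sketched in a few lines of Python. Again, this is an illustration and not the script I ran: `letters_different` and `distinct_pairs` are hypothetical names, and the input is assumed to be a list of words that already share one T9 code (and so have the same length).

```python
from itertools import combinations


def letters_different(a, b):
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must be the same length")
    return sum(x != y for x, y in zip(a, b))


def distinct_pairs(words, min_diff=2):
    """Return (word, word, differences) for pairs differing in at least
    min_diff positions, dropping near-duplicates like plural/agent forms."""
    pairs = []
    for a, b in combinations(words, 2):
        diff = letters_different(a, b)
        if diff >= min_diff:
            pairs.append((a, b, diff))
    return pairs


print(letters_different("photosensitizer", "photosensitizes"))  # 1
print(distinct_pairs(["photosensitizer", "photosensitizes"]))   # []
print(distinct_pairs(["compurgations", "constrictions"]))
# [('compurgations', 'constrictions', 5)]
```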
There are only two 15-character pairs of words with the same T9 code and more than one letter different. They are Repeatabilities / Resectabilities (in T9, 737328224548437) and Defectivenesses / Effectivenesses (333328483637737). There is a 14-character pair with three letters different, Gamesomenesses / Handsomenesses (in T9, 42637663637737). It starts to get more satisfying at 13 characters, where you have the T9 code 2667874284667 spelling either Compurgations or Constrictions (five letters different). Most diverse is the ten-character pair Housemaids / Intrenches, with a whopping eight letters different.
Of course that is just T9 pairs. If you go back and search based on number of matches, you can also find some interesting stuff. For example, if you type 22737 into your phone, it might explode. There are 13 different five-letter words for that code: Acres, Bards, Barer, Bares, Barfs, Baser, Bases, Caper, Capes, Cards, Carer, Cares, and Cases. Also, there are 11 words for the six-letter T9 code 727437: Parges, Paries, Pashes, Raphes, Rapids, Rapier, Rasher, Rashes, Sarges, Sashes, and Scries.
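Searching by number of matches is just a count per code. Here is a small Python sketch of that idea, assuming the words are in a plain list; `T9_TABLE` and `most_textonyms` are hypothetical names for this example, not part of the original scripts.

```python
from collections import Counter

# Translation table from each letter to the digit on its keypad key.
T9_TABLE = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "22233344455566677778889999",
)


def most_textonyms(words, top=3):
    """Return the `top` T9 codes with the most words, as (code, count) pairs."""
    counts = Counter(word.upper().translate(T9_TABLE) for word in words)
    return counts.most_common(top)


words = [
    "acres", "bards", "barer", "bares", "barfs", "baser", "bases",
    "caper", "capes", "cards", "carer", "cares", "cases",
    "parges", "paries", "pashes", "raphes", "rapids", "rapier",
    "rasher", "rashes", "sarges", "sashes", "scries",
    "housemaids",
]
print(most_textonyms(words, top=2))  # [('22737', 13), ('727437', 11)]
```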
In T9-speak these are all called “textonyms” - words with the same T9 code. T9 sorts the textonyms by frequency of use in order to give you the best word, but as many frequent T9ers know, it gets things wrong a lot. If I could get a database of word frequency, I could find the longest textonyms with high frequency, and that might be fun.
Or everyone could just get an iPhone.
If you’re interested in the data, here are a couple links to the scripts I used:
Textonym Pairs (only 2 matching words)
Most Textonyms (highest number of matches for a single T9 code)