Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Rare Letter Combinations and Key Chords

DZone's Guide to

Rare Letter Combinations and Key Chords

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

A bigram is a pair of letters. For various reasons—word games, cryptography, user interface development, etc.—people are interested in knowing which bigrams occur most often, and so such information is easy to find. But sometimes you might want to know which bigrams occur least often, and that’s harder to find. My interest is finding safe key-chord combinations for Emacs.

Peter Norvig calculated frequencies for all pairs of letters based on the corpus Google extracted from millions of books. He gives a table that will show you the frequency of a bigram when you mouse over it. I scraped his HTML page to create a simple CSV version of the data. My file lists bigrams, frequencies to three decimal places, and the raw counts: bigram_frequencies.csv. The file is sorted in decreasing order of frequency.

The Emacs key-chord module lets you bind pairs of letters to Emacs commands. For example, if you map a command to jk, that command will execute whenever you type j and k in quick succession. In that case if you want the literal sequence “jk” to appear in a file, pause between typing the j and the k. This may sound like a bad idea, but I haven’t run into any problems using it. It allows you to execute your most frequently used commands very quickly. Also, there’s no danger of conflict since neither basic Emacs nor any of its common packages use key chords.

The table below gives bigrams whose percentage frequency rounds to zero keeping three decimal places. See the data file for details.

Since Q is always followed by U in native English words, it’s safe to combine Q with any other letter. (If you need to type Qatar, just pause a little after typing the Q.) It’s also safe to use any consonant after J and most consonants after Z. (It’s rare for a consonant to follow Z, but not quite rare enough to round to zero. ZH and ZL occur with 0.001% frequency, ZY 0.002% and ZZ 0.003%.)

Double letters make especially convenient key chords since they’re easy to type quickly. JJ, KK, QQ, VV, WW, and YY all have frequency rounding to zero. HH and UU have frequency 0.001% and AA, XX, and ZZ have frequency 0.003%.

Note that the discussion above does not distinguish upper and lower case letters in counting frequencies, but Emacs key chords are case-sensitive. You could make a key chord out of any pair of capital letters unless you like to shout in online discussions, use a lot of acronyms, or write old-school FORTRAN.

Update (2 Feb 2015):

This post only considered ordered bigrams. But Emacs key chords are unordered, combinations of keys pressed at or near the same time. This means, for example, that qe would not be a good keychord because although QE is a rare bigram, EQ is not (0.057%). The file unordered_bigram_frequencies.csv gives the combined frequencies of bigrams and their reverse (except for double letters, in which case it simply gives the frequency).

Combinations of J and a consonant are still mostly good key chords except for JB (0.023%), JN (0.011%), and JD (0.005%).

Combinations of Q and a consonant are also good key chords except for QS (0.007%), QN (0.006%), and QC (0.005%). And although O is a vowel, QO still makes a good key chord (0.001%).

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}