A colleague related the following story: He was taking notes at a meeting that was attended by a fairly large group of people (about 20). As each person made a comment or presented information, he recorded the two-letter initials of the person who spoke. After the meeting was over, he was surprised to find that all of the initials of the people in the room were unique! Nowhere in his notes did he write "JS said..." and later wonder "Was that Jim Smith or Joyce Simpson?"

My colleague asked, "If 20 random people are in a room, do they generally have different initials or is it common for two people to share a pair of initials?" In other words, was his experience typical or a rare occurrence?

The Distribution of Initials at a Large US Software Company

In order to answer that question, it is necessary to know the distribution of initials in his workplace.


Clearly, the distribution of initials depends on the population of the people in the workplace. In some cultures, names that begin with X or Q are rare, whereas in other cultures names that begin with those letters (when phonetically translated into English) are more common.

SAS is a large US software company with a diverse base of employees, so I decided to download the names of 4,502 employees who work with me in Cary, NC, and write a DATA step program that extracts the first and last initials of each name.
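The DATA step itself is not shown in this article. A minimal sketch of how such an extraction might look is given here. It assumes a hypothetical data set called Names with a character variable Name stored as "First [Middle] Last"; both names are illustrative assumptions, and the step would need to change if names are stored differently (for example, as "Last, First").

```sas
/* Sketch only: Names and Name are hypothetical, not from the original program */
data Employees;
   set Names;
   length I1 I2 $ 1;
   I1 = upcase(substr(scan(Name,  1, ' '), 1, 1));  /* first letter of the first word */
   I2 = upcase(substr(scan(Name, -1, ' '), 1, 1));  /* first letter of the last word  */
run;
```

A negative count in the SCAN function counts words from the right, which is why -1 picks out the last name regardless of how many middle names are present.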

You can use the FREQ procedure to compute the frequencies of the first initial (I1), the last initial (I2), and the frequency of the initials taken as a pair. The following statements output the frequency of the initials in decreasing order:

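proc freq data=Employees order=freq;
   tables I1 / out=I1Freq;      /* frequencies of first initials */
   tables I2 / out=I2Freq;      /* frequencies of last initials */
   tables I1*I2 / out=InitialFreq missing sparse noprint;   /* all pairs, including zero counts */
run;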

As an example, I can display the relative frequency for my initials (RW) and the initials of the SAS cofounders, Jim Goodnight and John Sall:

data SASUSER.InitialFreq;
   set InitialFreq;
   Initials = I1 || I2;   /* concatenate first and last initials */
run;

proc print data=SASUSER.InitialFreq(where=(Initials="RW" | Initials="JG" | Initials="JS"));
run;

The initials "JS" are the most common initials in my workplace, with 61 employees (1.35%) having those initials. The initials "JG" are also fairly common; they are the 10th most popular initials. My initials are less common and are shared by only 0.4% of my colleagues.

If you want to conduct your own analysis, you can download a comma-separated file that contains the initials and frequencies.


You can use PROC SGPLOT to display bar charts for the first and last initials.
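The plotting statements are not reproduced in this article. One way to draw such a chart, sketched under the assumption that the I1Freq data set from the PROC FREQ step above contains the usual Percent variable and is already sorted by decreasing frequency, is:

```sas
/* Sketch only: titles and axis labels are illustrative choices */
title "Distribution of First Initials";
proc sgplot data=I1Freq;
   vbar I1 / response=Percent;               /* one bar per initial */
   xaxis discreteorder=data label="First Initial";   /* keep the data set order */
   yaxis label="Percent of Employees";
run;
title;
```

The same statements with I2Freq and I2 would produce the chart for last initials.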

The bar charts show that J, M, S, D, and C are the most common initials for first names, whereas S, B, H, M, and C are the most common initials for last names.

In contrast, U, Q, and X are initials that do not appear often for either first or last names. For first initials, the 10 least popular initials cumulatively occur less than 5% of the time. For last initials, the 10 least popular initials cumulatively occur about 8% of the time.

Clearly, the distribution of initials is far from uniform. However, for the note-taker, the important issue is the distribution of pairs of initials.

The Distribution of Two-Letter Initials

By using the PROC FREQ output, you can analyze the distribution at my workplace of the frequencies of the 26^2 = 676 pairs of initials:

More than 30% of the frequencies are zero. For example, there is no one at my workplace with initials YV, XU, or QX. If you disregard the initials that do not appear, then the quantiles of the remaining observations (expressed as percentages of employees) are as follows: The lower quartile is 0.044. The median is 0.133. The upper quartile is 0.333. Three pairs are much more prevalent than the others. The initials JM, JB, and JS each occur more than 1% of the time.
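Summary statistics of this kind can be computed from the InitialFreq data set that PROC FREQ created. The following is a sketch rather than the original program; it assumes that InitialFreq contains the Count and Percent variables that the OUT= option normally writes, with zero-count pairs included because of the SPARSE option.

```sas
/* Sketch only: quantiles of the percentages for pairs that actually occur */
proc means data=InitialFreq N Q1 Median Q3 Max;
   where Count > 0;        /* disregard pairs that no one has */
   var Percent;
run;
```

Comparing the N reported here with 676 gives the number of pairs that do not appear at all.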

The distribution of two-letter initials is summarized by the following box plot:


Visualizing the Proportions of Two-Letter Initials


With the help of a SAS Global Forum paper that shows how to use PROC SGPLOT to create a heat map, I created a plot that shows the distribution of two-letter initials in my workplace.

When I create a heat map, I often use the quartiles of the response variable to color the cells in the heat map. For these data, I used five colors: white to indicate pairs of initials that are not represented at my workplace, and a blue-to-red color scheme (obtained from colorbrewer.org) to indicate the quartiles of the remaining pairs. Blue indicates pairs of initials that are uncommon, and red indicates pairs that occur frequently.
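The heat map code comes from the downloadable program and is not shown here. As an illustration of the binning idea only, one possible way to assign the nonzero pairs to four roughly equal-sized groups is PROC RANK. This is an assumption about how the grouping could be done, not necessarily the method used for the plot, and the data set and variable names RankedFreq and Quartile are hypothetical.

```sas
/* Sketch only: bin the pairs that occur into four groups by count */
proc rank data=InitialFreq(where=(Count > 0)) out=RankedFreq groups=4;
   var Count;
   ranks Quartile;    /* 0 = least common group, ..., 3 = most common group */
run;
```

Each value of Quartile can then be mapped to one color of the blue-to-red scheme, with white reserved for the pairs whose count is zero. Because many pairs have tied counts, the four groups will not contain exactly the same number of pairs.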

In terms of counts, blue indicates pairs of initials that are shared by either one or two individuals, and red indicates 18 or more individuals.

The heat map shows several interesting features of the distribution of pairs of initials: Although W and N are not unusual first initials (1.7% and 1.4%, respectively) and D and F are not unusual last initials (5.0% and 3.2%, respectively), there is no one at my workplace with the initials ND or WF. There are 89 individuals at my workplace who have a unique pair of initials, including YX, XX, and QZ.

You can download the SAS program that is used to produce the analysis in this article.


The Probability of Matching Initials

Computing the probability that a group of people have similar characteristics is called a "birthday-matching problem" because the most famous example is "If there are N people in a room, what is the chance that two of them have the same birthday?"

In Chapter 13 of my book, Statistical Programming with SAS/IML Software, I examine the birthday-matching problem. I review the well-known solution under the usual assumption that birthdays are uniformly distributed throughout the year, but then go on to compare that solution to the more realistic case in which birthdays are distributed in a manner that is consistent with empirical birth data from the National Center for Health Statistics (NCHS).
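For reference, the well-known uniform solution can be written in closed form. The symbol d is introduced here for illustration: it is the number of equally likely categories, so d = 365 for birthdays, and d = 676 would correspond to the unrealistic assumption that all pairs of initials are equally likely. Assuming that N people are chosen independently, the probability that at least two of them match is:

```latex
P(\text{at least one match}) \;=\; 1 - \prod_{k=0}^{N-1} \left( 1 - \frac{k}{d} \right),
\qquad N \le d .
```

The product is the probability that all N people fall into distinct categories: the first person can have any value, the second must avoid one value, the third must avoid two, and so on.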

Obviously, you can do a similar analysis for the "initial-matching problem." Specifically, you can use the actual distribution of initials at SAS to investigate the question, "What is the chance that two people in a room of 20 randomly chosen SAS employees share initials?" Come back next Wednesday to find out the answer!