What are Globasa's Etymological Stats?

Reply from Summer 2021

Out of around 1,600 root words:

  • English: 44%
  • Spanish: 39%
  • French: 38%
  • German: 33%
  • Russian: 30%
  • Turkish: 28%
  • Indonesian: 25%
  • Hindi: 22%
  • Arabic: 22%
  • Persian: 21%
  • Filipino: 17%
  • Mandarin: 16%
  • Swahili: 13%
  • Korean: 12%
  • Japanese: 12%
  • Vietnamese: 6%
  • Telugu: 6%

It may not seem like it, but I tried to lower the English percentage (and the European percentage in general) as much as possible. These figures were the best I could achieve while following a specific methodology for selecting words (forthcoming on this site). The reality is that the European languages, and English in particular, have been vastly influential. The great majority of the European words selected for Globasa are international words, so they are hard to avoid. Such words are typically common to virtually all of the European languages and have also been borrowed by other languages, such as Filipino (via Spanish), Indonesian and Turkish. Other European languages not listed here, particularly the other Romance, Germanic and Slavic languages, also have large percentages, close to those of the major language in each family (Spanish, German and Russian, respectively).

As you can see, Turkish has a surprisingly high percentage. This is because Turkish is a highly cosmopolitan language that has borrowed extensively both from European languages (primarily via French) and from Arabic and Persian. Similarly, Persian and Indonesian have borrowed extensively from Arabic and Hindi.

I tried as best I could to incorporate as many East Asian words (from Mandarin, Korean, Japanese and Vietnamese) as possible, without artificially or haphazardly selecting Mandarin words. Instead, I relied on words that Mandarin shares with the other Asian languages. There was more common ground than I expected, so I was happy to get Mandarin up to 16%, about 1 in 6 words. Not bad, considering that about 1 in 6 people in the world are Chinese.

Telugu, a Dravidian language from India, tied with Vietnamese for the lowest percentage. I included Telugu and a few other Dravidian languages (which have a great number of speakers, though mostly in India) to help reinforce Hindi and bring it up to 22%, while also helping to lower the percentages for the European languages.