Word Selection Methodology

Methodology for Lexical Development in Globasa

Preliminary Step: Before rushing to expand Globasa's dictionary with a new root word, determine if the desired word can potentially be expressed through an already established root word or through a word formation method (affixing or compounding). Based on that determination, decide whether or not to propose a new root word.

Three-Part Methodology:

(1) Establishing the etymological source for the word

(2) Determining the semantics of the word

(3) Determining the exact form (lettering) of the word

Establishing the Etymological Source

Caveats

The following caveats must be kept in mind during the source selection process. Whenever necessary, an effort should be made to adapt word forms based on the caveats below using the most widely international etymological source. However, if this is not possible, a less widely international source should be used instead.

Never adopt minimal pairs with v and w, s and z, or m and n in word-final position.
Unless there is absolutely no other option and the pair has been thoroughly investigated for potential issues with affixed words, never adopt a minimal pair in which one word is the other plus or minus a consonant or vowel at the beginning or end.
Whenever possible, avoid minimal pairs with l and r, b and p, f and p, c and j, c and x, or h and r, as well as minimal pairs in which one word is the other plus or minus a vowel at the beginning or end.
Whenever possible, avoid any minimal pair: Whenever there is more than one more or less equal word-form option, choose the form that does not create a minimal pair.
Whenever possible, avoid one-syllable words and words longer than three syllables: Whenever there is more than one more or less equal source option, choose the source with two or three syllables, or add an a posteriori vowel to produce a two-syllable word. Also, keep in mind that Globasa favors three-syllable words over two-syllable words.
Whenever possible, avoid words that appear to be affixed: Whenever there is more than one more or less equal word-form option, choose the form that does not appear to be affixed.
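The minimal-pair caveats above can be screened mechanically against the existing root list. The following Python sketch is only an illustration: it compares letters rather than phonemes (assuming Globasa spelling is phonemic), the function names are hypothetical, and the labels it returns ("never", "last resort", "avoid") are shorthand for the wording of the caveats, not official terms.

```python
from typing import Optional, Tuple

# Hypothetical screening helper: compares a candidate root with one existing root.
# Letters stand in for phonemes.
NEVER_PAIRS = [{"v", "w"}, {"s", "z"}]  # never allowed as the only difference
AVOID_PAIRS = [{"l", "r"}, {"b", "p"}, {"f", "p"}, {"c", "j"}, {"c", "x"}, {"h", "r"}]


def single_substitution(a: str, b: str) -> Optional[Tuple[int, str, str]]:
    """Return (index, letter_a, letter_b) if a and b differ in exactly one position."""
    if len(a) != len(b):
        return None
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


def edge_difference(a: str, b: str) -> bool:
    """True if one word equals the other plus one letter at its start or end."""
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    return long_[1:] == short or long_[:-1] == short


def classify(candidate: str, existing: str) -> str:
    """Label how the caveats treat the pair (candidate, existing)."""
    if candidate == existing:
        return "identical"
    sub = single_substitution(candidate, existing)
    if sub:
        index, x, y = sub
        pair = {x, y}
        if pair in NEVER_PAIRS:
            return "never"
        if pair == {"m", "n"} and index == len(candidate) - 1:
            return "never"  # m/n in word-final position
        if pair in AVOID_PAIRS:
            return "avoid"
        return "minimal pair"  # still to be avoided whenever possible
    if edge_difference(candidate, existing):
        return "last resort"  # only after checking affixed words
    return "ok"
```

For example, `classify("pol", "bol")` returns `"avoid"` and `classify("pol", "pul")` returns `"minimal pair"`, while `classify("pole", "bol")` returns `"ok"`, which is consistent with the choice of pole mentioned under the a priori vowel guidelines.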

Source Selection Algorithm

Check the following languages using online sources such as Google Translate, Wiktionary and Wikipedia, and consult print dictionaries when in doubt: English, French, German, Russian, Spanish, Mandarin, Japanese, Korean, Vietnamese, Hindi, Telugu, Arabic, Swahili, Persian, Turkish, Indonesian, Filipino. The Globasa Etymology Helper app may also be useful.
- Select the source with the most language families represented.
  - English, French, German, Russian and Spanish are considered one family.
  - Mandarin, Korean, Japanese, Vietnamese, Hindi, Telugu, Arabic, Swahili, Persian, Turkish and Indonesian are all in different families.
- If there is a tie in number of families represented, the order of priority for source selection is as follows:
  - Any two or more of the East Asian languages: Mandarin, Japanese, Korean, Vietnamese
  - Arabic, supported by any other language or languages (Persian or Swahili, for example)
  - Hindi, supported by any other language or languages (Telugu, Indonesian, or any European language, for example)
  - European languages, supported by any other language or languages (Indonesian or Turkish, for example)
  - Persian and Turkish
- If there is no agreement, do a more thorough search with other parts of speech or with synonyms.
- If there is still no agreement, choose the most appropriate source based on the following order of priority.
  - Mandarin (two-character words only)
  - Vietnamese
  - Telugu
  - Swahili
  - Arabic
  - Hindi
- Keep in mind that the caveats above always trump the source selection guidelines.
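The selection steps above can be approximated in code. This Python sketch is an illustration under several assumptions, not the official procedure: each candidate form is mapped to the set of languages that attest it, languages the guidelines do not classify (such as Filipino) are treated as separate families, "no agreement" is taken to mean that no form is shared by two or more families, the two-character condition for Mandarin is not checked, and the caveats are assumed to have been applied beforehand. All names are hypothetical.

```python
from typing import Dict, Optional, Set

EUROPEAN = {"English", "French", "German", "Russian", "Spanish"}
EAST_ASIAN = {"Mandarin", "Japanese", "Korean", "Vietnamese"}
# Final order of priority when there is still no agreement.
FALLBACK_ORDER = ["Mandarin", "Vietnamese", "Telugu", "Swahili", "Arabic", "Hindi"]


def family(lang: str) -> str:
    """European languages count as one family; every other language is its own.

    Assumption: unclassified languages (e.g. Filipino) are separate families too.
    """
    return "European" if lang in EUROPEAN else lang


def family_count(langs: Set[str]) -> int:
    return len({family(lang) for lang in langs})


def tie_rank(langs: Set[str]) -> int:
    """Position in the tie-break order (lower wins); 5 means no tier applies."""
    supported = len(langs) >= 2
    if len(langs & EAST_ASIAN) >= 2:
        return 0
    if "Arabic" in langs and supported:
        return 1
    if "Hindi" in langs and supported:
        return 2
    if langs & EUROPEAN and langs - EUROPEAN:
        return 3
    if {"Persian", "Turkish"} <= langs:
        return 4
    return 5


def select_source(candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Most families wins, then tie_rank; remaining ties keep dictionary order.

    Returns None when no form is shared by two or more families, meaning a
    wider search (other parts of speech, synonyms) is needed.
    """
    best = max(candidates, key=lambda form: (family_count(candidates[form]),
                                              -tie_rank(candidates[form])))
    return best if family_count(candidates[best]) >= 2 else None


def fallback_source(candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Last resort: the form attested in the highest-priority fallback language."""
    for lang in FALLBACK_ORDER:
        for form, langs in candidates.items():
            if lang in langs:
                return form
    return None
```

Sorting on the key (family count, tie rank) keeps the two rules in their stated order: the tie-break only matters when family counts are equal. With hypothetical forms, `select_source({"A": {"English", "French", "Hindi"}, "B": {"Arabic", "Persian"}})` returns `"B"`, since both have two families and Arabic outranks Hindi in the tie-break.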

Determining Semantics of the Word

In this step, a determination should be made as to whether the new root will be a noun/verb word or an adj/adv word. Another consideration is whether the new root itself will denote a person or tool, or whether the word denoting the person or tool will be derived from the new root.

Part of speech is typically determined by which appears to be more useful (i.e., more frequently used in practice).
In some cases, the part of speech that seems most useful isn't well supported by the source languages; in that case, the part of speech with the greatest support in the source languages is selected.
In some cases, the less commonly used part of speech is selected in order to better accommodate derived words. For example, Globasa has termo (heat) as a noun, rather than something like garme (warm), so that termo can easily be used in compound scientific words such as thermodynamics and so that words like termodo (heated) can be derived more easily, as opposed to garmegido.
In the case of words that denote people, there are specific guidelines for determining the root word. Some words denoting professions are derived with -yen or -kef, while others are root words. Derived profession words are formed from roots denoting a verb/practice, a team or an establishment (but not the professional's office). Otherwise, the word denoting a profession is typically itself a root word: biskopo (bishop), etc.
- verb/practice
  - ato (act) -- atoyen (actor)
  - medis (medicine) -- medisyen (physician)
  - injeneri (engineering) -- injeneriyen (engineer)
  - arkitetur (architecture) -- arkiteturyen (architect)
  - jasusi (espionage) -- jasusiyen (spy)
- team
  - polisi (police) -- polisiyen (police officer)
  - askeri (army) -- askeriyen (soldier)
  - senato (senate) -- senatoyen (senator)
- establishment
  - banko (bank) -- bankoyen (banker)
  - eskol (school) -- eskolkef (principal)
- professional's office (here the profession is the root and the office is derived from it)
  - sekretari (secretary) -- sekretaridom (secretary, secretariat)

Determining Exact Form of the Word

Apply the caveats above.
Try to find a middle ground when creating a blend of the forms found in the various languages. However, keep in mind that words sourced from European languages tend to favor spelling (see details below). Blends should be kept as natural as possible: there should be a difference of only one phoneme from the form in at least one natural language. A difference of two phonemes is also acceptable if the differing phonemes are close to those of that natural-language form.
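The naturalness check above can be approximated with edit distance (Levenshtein distance). In this Python sketch, letters stand in for phonemes, each natural-language form is assumed to be already transcribed into Globasa letters, and the function names and return labels are hypothetical. Whether two differing phonemes are "close" still needs human judgment, so a distance of 2 is only flagged for review.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def assess_blend(blend: str, natural_forms: List[str]) -> str:
    """Compare a blend with the closest natural-language form."""
    distance = min(edit_distance(blend, form) for form in natural_forms)
    if distance <= 1:
        return "ok"
    if distance == 2:
        return "check closeness"  # acceptable only if the phonemes are close
    return "too far"
```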
An effort should be made to avoid noun/verb words beginning with the syllables le-, xa-, nun-, ger-, be- and du- so as to avoid confusion with verb particles.
Select consonants and vowels that are the least common in Globasa.
- All else being more or less equal, choose e over a, e over i, o over any other vowel except u (u over o).
- All else being more or less equal, choose m over n, l over r, h over k, g over k, d over t, p over b.
- However, all else being more or less equal, choose s over z and w over v.
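The particle-clash check and the letter preferences above amount to a prefix test and a lookup table. This Python sketch encodes only the pairs the guidelines state explicitly (no preference is inferred for, say, a versus i); the names are hypothetical. The prefix test ignores syllable boundaries, so it can over-flag: it would flag beykon, whose first syllable is arguably bey- rather than be-, and its results need review.

```python
from typing import Optional

# Word beginnings that could be confused with verb particles.
PARTICLE_STARTS = ("le", "xa", "nun", "ger", "be", "du")

# Stated preferences, "all else being more or less equal".
# Each key is an unordered pair of letters; the value is the preferred letter.
PREFERRED = {
    frozenset("ae"): "e", frozenset("ei"): "e",
    frozenset("ao"): "o", frozenset("eo"): "o", frozenset("io"): "o",
    frozenset("ou"): "u",
    frozenset("mn"): "m", frozenset("lr"): "l", frozenset("hk"): "h",
    frozenset("gk"): "g", frozenset("dt"): "d", frozenset("bp"): "p",
    frozenset("sz"): "s", frozenset("vw"): "w",
}


def may_clash_with_particle(word: str) -> bool:
    """Plain prefix test for noun/verb words; may over-flag."""
    return word.lower().startswith(PARTICLE_STARTS)


def preferred_letter(x: str, y: str) -> Optional[str]:
    """Preferred letter of the pair, or None if the guidelines state no preference."""
    return PREFERRED.get(frozenset((x, y)))
```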
Root words in Globasa tend to end in a vowel, preferably an a posteriori vowel found in at least one of the major languages or language families.
- Examples of a posteriori final vowels: Spanish words ending in -o or -a, Swahili words ending in -i or -u, or Japanese words ending in -u.
- Caveats:
  - Caveat 1: If the word would consist of four or more syllables, the vowel is not added if there is at least one language that has the word ending in a consonant: estrutur, etc.
    - However, noun/verbs that end in -ation/-ate in English always add -a in Globasa, regardless of word length: diskrimina, etc.
    - In addition, words that end in -ia/-io in Spanish, in -aire/-oire in French and in -y in English typically end in -i in Globasa regardless of word length: ordinari, teritori, sekretari, etc.
  - Caveat 2: In a few select, very common words, a final vowel is not added in order to keep the word at two syllables: eskol, alim, muhim, etc. This is now a closed caveat, since all sufficiently common words have been added.
  - Caveat 3: If the etymology of a word includes eight or more languages with the word in question ending in a consonant, the Globasa word does not add a final vowel. For example, see tufan and salun. Compare with safe, taru and jusu.
- An a priori vowel should only be added if the last consonant is one which phonotactic rules do not allow in word-final position (b, c, d, g, h, j, k, p, t, v, z) or to create a two-syllable word if the source word consists of one syllable (for example, see pole, which adds -e to avoid minimal pairs with bol and pul). To select an a priori vowel, use the following guidelines:
  - Indo-European (Romance, Slavic, Germanic, Hindi, Persian): -i for English verbs, otherwise -e.
  - All others (Mandarin, Japanese, Korean and Vietnamese; Arabic; Indonesian and Filipino): -u.
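When no a posteriori vowel is available, the a priori vowel rule reduces to a small decision procedure. This Python sketch assumes the stem is already spelled in Globasa, that each vowel letter counts as one syllable (y and w treated as consonants), and that the caller decides whether the source is Indo-European and whether it is an English verb. Caveats 1 to 3 for a posteriori vowels are not modeled, and the function names are hypothetical.

```python
VOWELS = set("aeiou")
# Consonants that phonotactic rules do not allow in word-final position.
NO_FINAL = set("bcdghjkptvz")


def syllable_count(word: str) -> int:
    """Assumption: each vowel letter is one syllable; y and w are consonants."""
    return sum(ch in VOWELS for ch in word.lower())


def a_priori_vowel(indo_european: bool, english_verb: bool = False) -> str:
    if indo_european:
        return "i" if english_verb else "e"
    return "u"


def add_a_priori_vowel(stem: str, indo_european: bool, english_verb: bool = False) -> str:
    """Append an a priori vowel only when the guidelines call for one."""
    if stem[-1].lower() in VOWELS:
        return stem
    if stem[-1].lower() in NO_FINAL or syllable_count(stem) == 1:
        return stem + a_priori_vowel(indo_european, english_verb)
    return stem
```

For instance, `add_a_priori_vowel("pol", indo_european=True)` gives `"pole"` (one syllable, so -e is added), and `add_a_priori_vowel("eskeyt", indo_european=True, english_verb=True)` gives `"eskeyti"`, matching the form listed for skate.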
The a posteriori or a priori initial vowel e- is added to initial consonant clusters not allowed by Globasa phonotactics: espageti, Esrilanka, Exkiperi, etc.
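The initial e- rule can be sketched as a check on the first two letters. This Python sketch is an assumption-laden illustration: it only treats the cluster type seen in the examples (s- or x- followed by a consonant other than y or w) as disallowed, since the full phonotactic rules are not reproduced here, and the function name is hypothetical.

```python
VOWELS = set("aeiou")
# Assumption: only s- or x- plus a consonant is treated as a disallowed onset.
CLUSTER_STARTERS = set("sx")
GLIDES = set("yw")


def add_initial_e(word: str) -> str:
    """Prefix e- when the word starts with a (assumed) disallowed cluster."""
    if len(word) < 2:
        return word
    first, second = word[0].lower(), word[1].lower()
    if first in CLUSTER_STARTERS and second not in VOWELS and second not in GLIDES:
        prefix = "E" if word[0].isupper() else "e"  # keep proper-noun capitals
        return prefix + word[0].lower() + word[1:]
    return word
```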

Words Sourced from European Languages

Words sourced from European languages tend to favor spelling of both consonants and vowels.
The consonants ⟨c⟩, ⟨g⟩, and ⟨s⟩, in particular, remain intact if there is at least one European language which pronounces them as /tʃ/, /g/ and /s/ respectively: centro, geo, visita, etc.
This rule of thumb is especially important when it comes to the way vowels are rendered in languages that have borrowed European words via English. For example, the word for rum is rendered as /ram/ in many languages, as an approximation of English /rʌm/. In spite of this /a/ being more prevalent, Globasa retains ⟨u⟩ in the original spelling: rum.
Words that end in -tion or -(s)sion in English and their equivalents in other languages are rendered as follows: two-syllable words end in -syon; longer words end in -si. However, verbs that end in -ate in English take neither -syon nor -si, and simply end in -a.
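The ending rule can be written as a short function. In this Python sketch the caller supplies the already-adapted stem and the syllable count of the source word; that the count refers to the source word is an assumption, though for examples such as nation and revolution the source and Globasa counts agree. Function and parameter names are hypothetical, and the outputs illustrate the rule rather than being dictionary lookups.

```python
def render_tion(stem: str, source_syllables: int, ate_verb: bool = False) -> str:
    """Render an English -tion/-(s)sion word (or its equivalents) in Globasa.

    stem: the adapted part before -tion/-sion, e.g. "na" for nation.
    source_syllables: syllable count of the source word (assumption).
    ate_verb: True if the concept is expressed by an English verb in -ate.
    """
    if ate_verb:
        # -ate verbs take neither -syon nor -si; the word simply ends in -a.
        return stem if stem.endswith("a") else stem + "a"
    if source_syllables == 2:
        return stem + "syon"
    return stem + "si"
```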

Words Sourced from English

Words sourced primarily from English are rendered in Globasa based on the following rule of thumb: If an English phoneme can be rendered faithfully with Globasa's inventory, pronunciation is favored. Otherwise, the Globasa form favors the English spelling. Caveats: /oʊ/, as in post, note and boat, is rendered as ⟨o⟩; /ʊ/, as in foot, is rendered as ⟨u⟩; /z/ as in visit is rendered as ⟨s⟩. To further illustrate:

Spelling is favored for so-called short vowels as well as for single vowels reduced to /ə/: Alaska, Nevada, Misisipi, Boston, Konetikut, etc.
Spelling is favored for ⟨er⟩, ⟨ir⟩ and ⟨ur⟩ pronounced as /ɜr/: hamburger (hamburger), hamuster (hamster), eskirti (skirt), etc.
Spelling is favored for ⟨au⟩: Awgusta (Augusta)
Pronunciation is favored for most vowel clusters: Finiks (Phoenix), Wudrow (Woodrow), Tenesi (Tennessee)
Pronunciation is favored for ⟨are⟩: Delawer (Delaware)
Pronunciation is favored for so-called long vowels, except for long o, in which case spelling is favored: beykon (bacon), eskeyti (skate), Aydaho (Idaho), Yuta (Utah); long o: Ohayo (Ohio), etc.
Pronunciation is favored for most consonants, except s when pronounced as /z/, in which case spelling is favored: Misuri (Missouri)
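The English rule of thumb can be sketched as a segment-by-segment choice between pronunciation and spelling. This Python sketch assumes the English word has already been split into aligned (spelling, phoneme) segments in simplified IPA; the phoneme table is a hypothetical partial mapping, and adjustments such as ⟨au⟩ becoming ⟨aw⟩, the initial e- and final vowels are not handled.

```python
from typing import List, Tuple

# Hypothetical partial table: English phonemes that Globasa renders faithfully.
FAITHFUL = {
    "b": "b", "d": "d", "f": "f", "g": "g", "h": "h", "k": "k", "l": "l",
    "m": "m", "n": "n", "p": "p", "r": "r", "s": "s", "t": "t", "v": "v",
    "w": "w", "ʃ": "x", "tʃ": "c", "dʒ": "j", "j": "y",
    "eɪ": "ey", "aɪ": "ay", "iː": "i",
}
# Caveats stated in the guidelines.
CAVEATS = {"oʊ": "o", "ʊ": "u", "z": "s"}


def render_english(segments: List[Tuple[str, str]]) -> str:
    """Favor pronunciation where the table allows it; otherwise keep the spelling."""
    out = []
    for spelling, phoneme in segments:
        if phoneme in CAVEATS:
            out.append(CAVEATS[phoneme])
        elif phoneme in FAITHFUL:
            out.append(FAITHFUL[phoneme])
        else:
            # Short vowels, reduced /ə/, /ɜr/ and similar fall back to spelling.
            out.append(spelling.lower())
    return "".join(out)
```

With bacon segmented as b, a /eɪ/, c /k/, o /ə/, n, the sketch yields beykon; Ohio and Tennessee come out as ohayo and tenesi, matching the forms above apart from capitalization.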

Words Sourced from Mandarin

Words sourced from Mandarin are typically blended with their Japanese, Korean and Vietnamese forms. An attempt is usually made to simplify Mandarin phonotactics when supported by the other East Asian languages, particularly Japanese.

Pinyin -ang is often rendered as ⟨o⟩ in Globasa, as seen in Japanese.
A consonant plus glide is often reduced to a consonant.
Likewise, diphthongs are often reduced to a simple vowel.

Words Sourced from Arabic

The diphthong /au̯/ is often reduced to ⟨o⟩, as seen in Swahili or Persian: soti, etc.
The diphthong /aj/ is sometimes reduced to ⟨e⟩, as seen in Swahili: rehani, etc.
Unless there is a different naturalistic option supported by Persian, Swahili, Turkish, Indonesian or Hindi, noun/verb words sourced from Arabic typically end in -u, as seen in the present tense of Arabic verbs.
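The Arabic adjustments above are tendencies ("often", "sometimes"), so a sketch is better framed as a candidate generator than as a converter. This Python sketch assumes a simple Latin transliteration as input (for example sawt, taken here as a simplified transliteration of the Arabic source of soti), keeps both reduced and unreduced variants, and lets the caller supply an ending attested in Persian, Swahili, Turkish, Indonesian or Hindi. The function name and parameters are hypothetical.

```python
from typing import Optional, Set

VOWELS = set("aeiou")


def arabic_candidates(translit: str, noun_verb: bool = True,
                      attested_ending: Optional[str] = None) -> Set[str]:
    """Generate candidate Globasa forms from a simple Arabic transliteration."""
    forms = {translit}
    # /au/ is often reduced to o (as in Swahili or Persian): keep both options.
    forms |= {f.replace("aw", "o") for f in forms}
    # /aj/ is sometimes reduced to e (as in Swahili): keep both options.
    forms |= {f.replace("ay", "e") for f in forms}
    if noun_verb:
        # Default -u, unless a naturalistic ending from another language is supplied.
        ending = attested_ending or "u"
        forms = {f if f[-1] in VOWELS else f + ending for f in forms}
    return forms
```

Given sawt, the default call returns sawtu and sotu; supplying the -i ending seen in Swahili yields soti among the candidates.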