
· 41 min read
Adam Kamil Gola

Proposal for a refreshed version of the "Scientific Interslavic" alphabet and its practical applications.

Overview of the old Scientific Interslavic alphabet and proposed changes

The old "Scientific Interslavic" alphabet (found here: http://steen.free.fr/interslavic/nms.html), as devised by Jan van Steenbergen, provided a unique insight into the evolution and divergence of the Slavic languages while also being highly exhaustive and comprehensive. However, it was retired well before the merger of Slovianski and Novoslověnsky, so some more recent changes may have rendered parts of it outdated as of 2020. Last but not least, Jan approached it with the view that "creating strictly national flavorizations is not necessary, and probably pointless", which limited the comprehensiveness of the alphabet somewhat. In this paper, I will propose adjustments and extensions to the alphabet, as well as practical uses for it, intended to ensure both that this invention does not go to waste and that it is not misused.

Purpose and uses/advantages of the Scientific Alphabet:

  1. Provides a full, uncompromising source code not only for regional, but also for national flavorizations: with the right knowledge it is possible (and straightforward) to derive any modern form of any word just by reading it as written in this alphabet
  2. Contains all necessary etymological and flavorization information for readers at a glance
  3. Helps in learning national languages by demonstrating, to the greatest possible degree, the connections between Proto-Slavic and the modern Slavic languages, as well as between the languages themselves
  4. Helps explain every, or almost every, change and process that the modern languages have undergone
  5. Doesn't need to conform or align with MS Plus or the Standard Orthography, since it will not be in everyday use; for that reason it can be safely expanded or changed at any time without causing confusion or damage (as long as every letter has a proper explanation behind it)

The alphabet in its latest known revision is as follows (quoting from Jan's article).

a á å b c ç ć č d ḓ ď đ e é è ė ě ę f g h i í ì ı j ĵ k l ľ ŀ ĺ m n ň o ò œ p r ř ṙ ŕ s ś š t ṱ ť u ų ù v y ý z ʒ ź ž

Let's analyze the letters and letter groups in light of their continued relevance. Letters of the standard alphabet, as well as most of today's MS Plus ("Etymological Alphabet"), will be skipped.

á é í ý

á é í ý -- these letters represent adjective endings, and their value lies in flavorization to the East Slavic languages. á corresponds to either "a" or "aja" (sing. fem.), é to "e" or "ije/yje/yja" and the like (plur., non-masc.), í to "i" or "ij" (sing. masc.) and ý to "y" or "yj" (also sing. masc.). This is very useful for deriving word endings, since these are quite regular in all Slavic languages, and all that would be required is a very simple match table for the East Slavic languages. Russian, Belarusian and Ukrainian each treat adjectives somewhat differently: in Russian both the masculine and the feminine forms are long, while Belarusian and Ukrainian each have one short and one long form, distributed in opposite ways in the two languages. This could also be adapted for a Rusyn flavorization if the know-how were provided. The graphical form is acceptable, since it is reminiscent of the Czech/Slovak long vowels, which share a connection in meaning with the short vs. long adjective endings. Alternatively, these vowels could be spelled with a macron (ā ē ī ȳ) so as to eliminate ambiguity about the meaning of the diacritic, since the acute already denotes softened/palatalized consonants. Of these four letters, normally only í and ý would see dictionary use, since adjectives are customarily given in the masculine; the remaining letters would appear only if the Interslavic Dictionary were extended with declensions in the full scientific alphabet, and could otherwise be noted as analogous in an annotation.
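A minimal sketch of such a match table might look as follows. It assumes that only the final letter of the word carries the ending, that the output is a Latin transliteration rather than Cyrillic, and that the stem is left untouched; the function name `east_adjective_ending` and the language codes are hypothetical. The plural é is left out, since its reflexes vary more.

```python
import unicodedata

# Hypothetical match table: Scientific ending letter -> Latin transliteration
# of the corresponding ending in each East Slavic language.
ADJECTIVE_ENDINGS = {
    "\u00e1": {"ru": "aja", "uk": "a", "be": "aja"},  # á: sing. fem.
    "\u00fd": {"ru": "yj", "uk": "yj", "be": "y"},    # ý: sing. masc., hard stem
    "\u00ed": {"ru": "ij", "uk": "ij", "be": "i"},    # í: sing. masc., soft stem
}


def east_adjective_ending(word: str, lang: str) -> str:
    """Replace a final á/ý/í with the ending used in `lang`; stem unchanged."""
    word = unicodedata.normalize("NFC", word)  # decomposed input matches too
    if word and word[-1] in ADJECTIVE_ENDINGS:
        return word[:-1] + ADJECTIVE_ENDINGS[word[-1]][lang]
    return word  # no adjective ending letter: leave the word as it is
```

For example, "nový" gives "novyj" for Russian and Ukrainian and "novy" for Belarusian, while "dobrá" gives "dobraja" for Russian and Belarusian and "dobra" for Ukrainian.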

œ -- this letter is for flavorizing neuter -e vs. neuter -o Slavic languages (for example "dobre" vs. "dobro"), and no change appears to be required. In fact, this is one of those letters that could serve an important role even in MS Plus, if we assumed that the e/o difference is important for intelligibility. At any rate, a "middle-ground" pronunciation would be very simple here, though redundant with the strong yer (an o umlaut or schwa). This letter also forms a continuum of sorts with the letters described above, since it could also distinguish the short and long neuter adjective forms (-oje in RU, -aje in BE); it could therefore be rendered with a macron for consistency, since it covers both options at the same time: œ̄. So œ̄ can flavorize to anything in the range -o, -e, -oje, -aje. But plain œ without a macron should also be preserved for declension, e.g. dobrœgo -> dobrogo or dobrego, since the longer version could cause issues in RU and BE ("dobrojego" etc.). So the form with the macron would be used for words in the nominative, and the one without it for all other cases.
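The nominative/oblique split could be sketched as below. The per-language values are assumptions drawn from the forms mentioned above (RU доброе/доброго, BE добрае, PL dobre/dobrego, plus UK and Shtokavian), given in Latin transliteration; the name `flavorize_oe` is hypothetical, and other changes such as g > h are not handled. Because œ̄ is written as œ followed by a combining macron (U+0304), the macron form has to be matched before plain œ.

```python
import re
import unicodedata

OE_LONG = "\u0153\u0304"   # œ̄: nominative neuter adjective ending
OE_SHORT = "\u0153"        # œ: the same vowel inside oblique case endings

# Hypothetical reflexes (Latin transliteration) per language.
LONG_REFLEX = {"ru": "oje", "be": "aje", "uk": "e", "pl": "e", "sr": "o"}
SHORT_REFLEX = {"ru": "o", "be": "a", "uk": "o", "pl": "e", "sr": "o"}

# Alternation order matters: the two-code-point œ̄ is tried before bare œ.
OE_PATTERN = re.compile(re.escape(OE_LONG) + "|" + re.escape(OE_SHORT))


def flavorize_oe(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    return OE_PATTERN.sub(
        lambda m: LONG_REFLEX[lang] if m.group(0) == OE_LONG else SHORT_REFLEX[lang],
        word,
    )
```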

ç, ʒ -- "for cases of second palatalisation of kv/gv + front vowel (> kv/gv in West Slavic, cv/zv elsewhere)". These denote an important dialectal split in the Slavic languages: the second regressive palatalization across an intervening *v. Interestingly, though, the number of words where this occurs is very small, and they are easily learnable by heart. Still, these letters can be very useful for national flavorizations. One suggested change: I would replace ʒ with z̧, to preserve the analogy with the c with cedilla used for the other letter; ʒ also too closely resembles the IPA symbol for the sound of Czech "ž" or of English "s/z" in "pleasure" or "seizure".
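Flavorizing these two letters comes down to a West/non-West switch. The sketch below assumes the proposed z̧ spelling (z + combining cedilla, U+0327), a hypothetical set of West Slavic language codes, and the example spelling çvět for "flower"; the later g > h step of the H-Slavic languages (Czech hvězda) is left to a separate rule.

```python
import unicodedata

WEST_SLAVIC = {"pl", "cs", "sk"}   # hypothetical language codes

C_CEDILLA = "\u00e7"               # ç
Z_CEDILLA = "z\u0327"              # z̧ (no precomposed form exists)


def flavorize_kv_gv(word: str, lang: str) -> str:
    """ç/z̧ -> k/g in West Slavic, c/z elsewhere (before v + front vowel)."""
    word = unicodedata.normalize("NFC", word)
    west = lang in WEST_SLAVIC
    word = word.replace(Z_CEDILLA, "g" if west else "z")
    return word.replace(C_CEDILLA, "k" if west else "c")
```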

ć, đ -- while these letters are perfect for MS Plus, as they are naturally occurring graphemes, the Scientific alphabet is not restricted by such considerations, and we want to avoid ć misrepresenting the function of the acute. For that reason, I propose a consistent pair, ĉ and d̂, for rendering these in Scientific.

ě -- the Latin letter for the yat in Standard and MS Plus. For consistency, it should be rendered as something different in Scientific, such as ĕ: the acute denotes softening/palatalization and the háček denotes palato-alveolar or retroflex sibilants, while the yat is technically neither, so for perfect accuracy a recognizable but different diacritic can be used. Of course, Slavists might prefer the actual symbol for the yat, much as they tend to write yers in Cyrillic, but that might not be optimal for computer code.

ḓ, ṱ -- for the tl/dl sequences, preserved only in West Slavic and lost elsewhere. This is an often-requested addition; some would even like to see it in MS Plus. It is still definitely valid and needed. The vanishing d could also be used in words where Proto-Slavic had a "dz" sound that evolved into plain "z", though a different letter would be required there, due to its different origins and its need for a different flavorization rule (all languages except Macedonian get "z", while Macedonian and Old Church Slavonic get "dz"). The flavorization will not be perfect either way, but it will at least show the Proto-Slavic root and its evolution (the new letter is proposed below).
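As a sketch, assuming a hypothetical set of West Slavic codes and illustrative spellings such as myḓlo ("soap") and pleṱl ("braided"), which are examples rather than dictionary entries, the rule keeps the dental in West Slavic and drops it elsewhere:

```python
import unicodedata

WEST_SLAVIC = {"pl", "cs", "sk"}   # hypothetical language codes

D_CIRCUMFLEX_BELOW = "\u1e13"      # ḓ
T_CIRCUMFLEX_BELOW = "\u1e71"      # ṱ


def flavorize_dl_tl(word: str, lang: str) -> str:
    """Keep ḓ/ṱ as d/t in West Slavic; drop them everywhere else."""
    word = unicodedata.normalize("NFC", word)
    keep = lang in WEST_SLAVIC
    word = word.replace(D_CIRCUMFLEX_BELOW, "d" if keep else "")
    return word.replace(T_CIRCUMFLEX_BELOW, "t" if keep else "")
```

Only the dental is handled here; vowel length (Czech mýdlo) or the Russian ё in плёл would come from other rules.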

è, ò -- the front and back strong yers, ь and ъ respectively. These are staple letters of MS Plus; they affect pronunciation and tend to disappear in inflected forms. They have been redesigned as ė, ȯ in MS Plus, so the same change should be made here for consistency.

ė -- the letter that forms a pair with å and acts as a placeholder for the different reflexes of the -ere- and -ele- sequences. Unlike å, though, there is no clear-cut idea of how it should affect pronunciation, so it would be cumbersome to use in MS Plus. In terms of flavorization, however, it can be hugely useful, so it is still an important part of the Slavic "source code". Since there is no need to make this letter presentable or easily renderable, and since ė now represents one of the yers, it would be better to change it to a grapheme matching å: an "e" combined with the ring diacritic, e̊, which ė was meant to approximate in the first place (though here the ring does not convey the "o" element of pronunciation that it does in å).

ì -- specifically for the infinitive ending "-tì". Extremely useful for all sorts of flavorization, particularly national ones, where an algorithm could simply convert "-tì" to any national infinitive ending, such as "-t", "-ť", "-ć", "-ti" etc.
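A sketch of that conversion, assuming Latin-script output (with ť standing in for Russian -ть and "ty" for Ukrainian -ти), hypothetical language codes, and no flavorization of the stem itself:

```python
import unicodedata

# Hypothetical national reflexes of the Scientific infinitive ending "-tì".
INFINITIVE_ENDINGS = {
    "isv": "ti", "cs": "t", "sk": "\u0165", "pl": "\u0107", "be": "\u0107",
    "ru": "\u0165", "uk": "ty", "sr": "ti", "sl": "ti",
}
SCIENTIFIC_INFINITIVE = "t\u00ec"  # "tì"


def flavorize_infinitive(verb: str, lang: str) -> str:
    verb = unicodedata.normalize("NFC", verb)
    if verb.endswith(SCIENTIFIC_INFINITIVE):
        return verb[: -len(SCIENTIFIC_INFINITIVE)] + INFINITIVE_ENDINGS[lang]
    return verb  # not an infinitive in -tì: leave unchanged
```

For instance, "znatì" becomes "znać" for Polish, "znat" for Czech and "znaty" for Ukrainian; stem changes such as Czech vowel length are outside the scope of this rule.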

ď, ľ, ň, ř, ś, ť, ź -- the softened consonants. In light of the recent changes to the orthography, these should be changed to d́, ĺ, ń, ŕ, ś, t́, ź, respectively. Ironically, an orthography using such nonstandard letter + diacritic combinations is perfect for the scientific alphabet, while naturally occurring letters such as ď, ť, ľ are better for MS Plus, which, unlike this alphabet, is still very usable in everyday writing (and it would be a painful waste to deliberately make it less so). Since acutes now represent softened consonants throughout the scientific alphabet, new letters have to be found for syllabic r and l (as discussed right below).

ṙ, ŕ, ŀ, ĺ -- for the hard and soft syllabic liquids r/l. Since the letters with acutes will now be used strictly for softened consonants, and macrons for long adjective endings, combining these two diacritics could create a solution that is both consistent and understandable. For this scientific alphabet we are bound neither by tradition nor by ease of use, so our only concern should be utility. That gives us, in the order of the letters at the beginning of this paragraph: r̄, r̄́, l̄, l̄́. This preserves the letters' connection to their regular equivalents, while the macron denotes that they are "long", i.e. syllabic. Though l̄/l̄́ always convert to ȯl/ȯlj in MS Plus, the additional distinction between regular soft/softened letters and their syllabic variants is useful to keep for other algorithm-based flavorizations.

ĵ -- for cases where most Slavic languages have the contraction -aje- > -a-. This is very useful for flavorizing the East Slavic vs. West/South Slavic verb conjugations: the algorithm could simply convert Interslavic sequences such as -aĵų, -ěĵų to -am, -ěm and the like where needed. I propose changing the grapheme to ȷ (dotless j), so that it does not clash with any other diacritic used here. This also makes the letter resemble ı, which fulfills a somewhat similar function.

ı -- for the tense yer with an adjacent j. Very useful, since it could help eliminate confusion in Interslavic itself, while also containing valuable information for regional or even national flavorizations. While it always appears as a softened consonant + j in MS Plus, in the standard alphabet the lack of this distinction can be tricky (as in the standard Cyrillic spelling of verbal noun endings: -нје instead of the expected -ње). Not to mention that it is very useful for flavorizing to national languages, as it can convert to the various national endings, in particular those of verb-derived nouns, which are especially conservative in the East Slavic branch.
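The contraction rule for the first person singular might be sketched as follows. It accepts both the current ĵ (U+0135) and the proposed ȷ (U+0237), assumes a hypothetical set of East Slavic codes that keep the uncontracted form, and renders the nasal ų simply as u there; everything else about the verb is left as written.

```python
import re
import unicodedata

EAST_SLAVIC = {"ru", "uk", "be", "rue"}  # hypothetical language codes
# a or ě, then ĵ/ȷ, then ų (U+0173) at the end of the word
CONTRACTIBLE = re.compile("([a\u011b])[\u0135\u0237]\u0173$")


def first_person_singular(verb: str, lang: str) -> str:
    verb = unicodedata.normalize("NFC", verb)
    if lang in EAST_SLAVIC:
        # Uncontracted: -aȷų -> -aju, -ěȷų -> -ěju
        return CONTRACTIBLE.sub(r"\1ju", verb)
    # Contracted: -aȷų -> -am, -ěȷų -> -ěm
    return CONTRACTIBLE.sub(r"\1m", verb)
```

Verbs without the marked sequence (e.g. nesų) pass through unchanged, so the rule can be applied blindly to every first person singular form.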

ù -- useful for marking places where "u" or "v" are both reflexes of the Greek upsilon in the au/eu diphthongs, which was historically written with the Greek-derived Cyrillic letter izhitsa (Ѵ ѵ). Using it for flavorization would require some generalization, though, since some languages render it unpredictably, as either "u" or "v", seemingly without rhyme or reason. The cause is that the Greek diphthongs "αυ/ευ" were pronounced with a vowel sound in ancient times, but later came to be pronounced "av/ev", as they still are in modern Greek. Some languages are more predictable in this respect than others, so the letter is still useful. In a retro-styled flavorization such as (Old) Church Slavonic, it could flavorize to the izhitsa. The letter itself could be changed to ũ, so that each diacritic keeps a unique function.
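Taken together, the renamings proposed above amount to a mechanical conversion from the old letters to the revised ones. A minimal sketch follows; the function name `old_to_new` is hypothetical, and the mapping covers only the changes proposed in this section. Using `str.translate` applies all substitutions simultaneously, which matters here because several proposals form chains (old è becomes ė while old ė becomes e̊; old ľ becomes ĺ while old ĺ becomes l̄́): sequential `replace` calls would clobber one another.

```python
import unicodedata

# Old Scientific letter -> proposed replacement. Combining sequences are used
# where no precomposed character exists.
OLD_TO_NEW = str.maketrans({
    "\u0107": "c\u0302",        # ć -> ĉ
    "\u0111": "d\u0302",        # đ -> d̂
    "\u0292": "z\u0327",        # ʒ -> z̧
    "\u011b": "e\u0306",        # ě -> ĕ (yat)
    "\u00e8": "e\u0307",        # è -> ė (front strong yer)
    "\u00f2": "o\u0307",        # ò -> ȯ (back strong yer)
    "\u0117": "e\u030a",        # ė -> e̊ (-ere-/-ele- placeholder)
    "\u010f": "d\u0301",        # ď -> d́
    "\u013e": "l\u0301",        # ľ -> ĺ
    "\u0148": "n\u0301",        # ň -> ń
    "\u0159": "r\u0301",        # ř -> ŕ
    "\u0165": "t\u0301",        # ť -> t́
    "\u1e59": "r\u0304",        # ṙ -> r̄ (hard syllabic r)
    "\u0155": "r\u0304\u0301",  # old ŕ -> r̄́ (soft syllabic r)
    "\u0140": "l\u0304",        # ŀ -> l̄ (hard syllabic l)
    "\u013a": "l\u0304\u0301",  # old ĺ -> l̄́ (soft syllabic l)
    "\u0135": "\u0237",         # ĵ -> ȷ
    "\u00f9": "u\u0303",        # ù -> ũ
})


def old_to_new(text: str) -> str:
    # NFC first so that decomposed input matches the precomposed keys above;
    # NFC again so that e.g. l + acute comes out as the single character ĺ.
    text = unicodedata.normalize("NFC", text)
    return unicodedata.normalize("NFC", text.translate(OLD_TO_NEW))
```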

Proposed changes and extensions to it

I have had several ideas on how to extend the functionality of this alphabet, mostly for even more accurate and precise flavorization, for instance at the national level, where Jan feared or refused to tread. While the letters themselves are not strictly necessary for automatic flavorization (simply knowing which processes have occurred, by deriving them from the OCS forms, can do the same job), they are still useful for marking these rules in writing with a single symbol. This allows, for instance, dictionary readers to learn to recognize and predict these processes, and by extension to learn to derive national flavorizations very quickly. In other words, these letters (and, for that matter, the originally introduced ones) can be seen as "flavorization rules rendered as single symbols". Here are the newly proposed letters:

ɨ, ɨ̄ -- these letters help flavorize to languages that follow different i/y rules than Interslavic. Interslavic always places "y" after what it considers hard consonants and "i" after what it considers soft ones, but what counts as a hard or soft consonant varies between languages. Most consonants are treated consistently across the languages, but some, like k, p, č, š and ž, can be treated differently. This letter therefore introduces a third rule, which doesn't override the Interslavic convention itself, but helps flavorize more accurately when such conflicting cases are encountered. The variant with the macron is for adjective endings and supplements the long adjective-ending letters discussed earlier, acting both as the carrier of this rule and as the long/short adjective ending designator. The variant without the macron is for all other uses.

Examples:

Scientific: svěžɨ̄ -- Isl: svěži -- Ru: свежий -- Cz: svěží -- Pl: świeży -- Uk: свіжий -- Be: свежы
Scientific: slovjańskɨ̄ -- Isl: slovjańsky -- Ru: славянский -- Cz: slovanský -- Pl: słowiański -- Uk: слов'янський -- Be: славянскі
Scientific: čɨtatì -- Isl: čitati -- Ru: читать -- Sk: čítať -- Pl: czytać -- Uk: читати -- Be: чытаць

The conflicting combinations would be -pɨ, -šɨ, -žɨ, -čɨ, -rɨ / -kɨ̄, -pɨ̄, -šɨ̄, -žɨ̄, -čɨ̄, and each would require a different rule in the algorithm. In such conflicting cases, the letter usually chosen in Interslavic could be replaced by ɨ or ɨ̄, which would flavorize to the usual Interslavic letter (-ky, -py, -ši, -ži, -či, -ri and so on), but differently into the national languages, based on their separate rules.
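A sketch of such per-language rules follows. The consonant sets are assumptions based on each language's standard spelling conventions (Russian writes и after к, г, х, ж, ч, ш; Polish writes ki/gi but czy/szy/ży; Czech writes ky but či/ši/ži), and they cover only k, g, h, č, š, ž. The output is a Latin stand-in in which "y" is the hard vowel (ы, Ukrainian и) and "i" the soft one (Russian и, і); any macron on ɨ̄ is left in place for the adjective-ending stage. The name `resolve_barred_i` is hypothetical.

```python
import unicodedata

BARRED_I = "\u0268"  # ɨ

# Hypothetical sets: consonants after which each language writes the soft "i".
# After any other consonant, ɨ is written as the hard "y".
SOFT_AFTER = {
    "isv": set("\u010d\u0161\u017e"),      # č š ž
    "ru": set("kgh\u010d\u0161\u017e"),    # k g h č š ž
    "pl": set("kg"),
    "cs": set("\u010d\u0161\u017e"),
    "uk": set(),                           # Ukrainian и ("y") after all of these
    "be": set("kgh"),
}


def resolve_barred_i(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    out: list[str] = []
    for ch in word:
        if ch == BARRED_I:
            prev = out[-1] if out else ""
            out.append("i" if prev in SOFT_AFTER[lang] else "y")
        else:
            out.append(ch)
    return "".join(out)
```

With these sets, svěžɨ yields svěži for Russian but svěžy for Polish, and slovjanskɨ yields slovjansky for Czech but slovjanski for Belarusian, matching the examples above.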

ṽ -- this is only for dealing with the variation in the preposition "v", meaning "in". It is rendered as "v" in most languages, including Interslavic itself, but as "у/u" in Belarusian and Shtokavian (Serbo-Croatian). In Slovene it is written "v" as well, but pronounced as "u".

ḍ -- while this letter would behave almost identically to ḓ, it does not belong to a pair and denotes an altogether different thing: it would mark the vanished "d" of the /dz/ sound in some Proto-Slavic words, which has been preserved to this day only in Macedonian (alongside other, innovated uses of "dz" there). This corresponds to the shift from the OCS/Macedonian letter and sound "ѕ" to "з". The letter would be used in historical, not innovative, words, so its usefulness for flavorizing to Macedonian would not be perfect; to be fair, though, flavorizations are not meant to reflect national word forms perfectly anyway. This letter could be used in words such as ḍzly, ḍzvěŕ, ḍzmij and other historical words with the /dz/ sound (not including noun declension, which will be handled by another letter). Interestingly, this letter could be combined with z̧ in the word "ḍz̧vězda" to produce a fully historical form that demonstrates both the shift from /dz/ to /z/ and the later zv/gv split, allowing accurate flavorization both to Macedonian and to all the remaining Slavic languages.
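A sketch of how ḍ (U+1E0D) could combine with z̧ in ḍz̧vězda, under the same assumptions as the other examples here: hypothetical language codes, Latin-script output, and no treatment of the yat or of g > h.

```python
import unicodedata

WEST_SLAVIC = {"pl", "cs", "sk"}   # hypothetical language codes
D_DOT_BELOW = "\u1e0d"             # ḍ: vanished d of Proto-Slavic /dz/
Z_CEDILLA = "z\u0327"              # z̧: g in West Slavic, z elsewhere


def flavorize_dz(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    # ḍ survives only in Macedonian (ѕ = dz); elsewhere the d is lost.
    word = word.replace(D_DOT_BELOW, "d" if lang == "mk" else "")
    return word.replace(Z_CEDILLA, "g" if lang in WEST_SLAVIC else "z")
```

Applied to ḍz̧vězda, this gives dzvězda for Macedonian, gvězda for Polish and zvězda for Russian, i.e. the three historical outcomes the combined spelling is meant to encode.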

l̯ -- the oft-requested epenthetic "l", a staple of the East Slavic and western South Slavic languages. It would appear in words such as zeml̯ja, dobavl̯yati etc., and would flavorize to remain in Russian, Belarusian, Ukrainian, Rusyn, Shtokavian and Slovenian, and to vanish in Polish, Czech, Slovak, Bulgarian and Macedonian.
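A minimal sketch, assuming l̯ is written as l + combining inverted breve below (U+032F) and using hypothetical language codes for the languages listed above:

```python
import unicodedata

L_EPENTHETIC = "l\u032f"   # l̯
KEEPS_EPENTHETIC_L = {"ru", "be", "uk", "rue", "sr", "sl"}  # hypothetical codes


def flavorize_epenthetic_l(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    return word.replace(L_EPENTHETIC, "l" if lang in KEEPS_EPENTHETIC_L else "")
```

Only the epenthetic l is resolved: zeml̯ja gives zemlja for Russian and zemja for Macedonian (земја), while the Polish or Czech stem changes would need their own rules.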

ł -- this letter's role would be to allow flavorization into languages where the final consonant of the masculine past tense form diverged from the Old Slavonic hard "l". It will flavorize to L/Л in most Slavic languages, but in Shtokavian it will become O (bio), in Ukrainian В (був), in Belarusian Ў (быў), and in Polish it will remain ł (był). In Polish, "ł" always corresponds to the hard Slavic L, but its pronunciation has evolved to match the Belarusian and Ukrainian letters used in this position (the "lazy l" sound, which also appears in, for example, Bulgarian and Slovenian, though in different situations). Ironically, all the languages where this phoneme diverges in the masculine revert to the hard L in the feminine and plural.

k̦ g̦ -- special variants of k and g, written to help flavorize certain feminine noun declensions. Interslavic simplifies noun declension by eliminating the consonant alternations that many, if not most, natural languages still have, such as Czech ruka - ruce / noha - noze. Old Church Slavonic had a noga -> nodzě alternation, so the algorithm could react to k̦ě/g̦ě clusters and flavorize them into national forms such as ce/dze/ze. Most cases of "kě" in the dictionary are in fact expressions containing the dative of a word ending in -ka, with one exception: "kěšenj", where the shift to "ce" wouldn't apply for the West Slavic languages. There are no words at all in the dictionary containing the "gě" sequence in nominative forms, so perhaps these letters are not needed for flavorization at all, and it would be enough for the algorithm simply to look for kě/gě sequences. Still, the letters could be valuable for conveying that such consonant alternations occur in some Slavic languages in declension, and they rule out any future conflict if the dictionary is expanded with words containing kě/gě sequences in their roots.
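Both rules can be sketched as simple substitutions. The sketch assumes hypothetical language codes, Latin-script output (v for Ukrainian в, ŭ for Belarusian ў), that ł is word-final only in the masculine past tense, and combining-comma spellings for k̦ and g̦ (k/g + U+0326). Stem vowels are not adjusted, so Shtokavian comes out as "byo" rather than "bio".

```python
import unicodedata

# Reflex of word-final ł (masculine past tense); "l" for unlisted languages.
FINAL_L = {"sr": "o", "uk": "v", "be": "\u016d"}  # ŭ for Belarusian ў

# Reflexes of k̦ě / g̦ě; unlisted languages keep the unshifted kě / gě.
VELAR_SHIFT = {
    "pl": {"k": "ce", "g": "dze"},
    "cs": {"k": "ce", "g": "ze"},
    "ru": {"k": "ke", "g": "ge"},
}
COMMA_BELOW = "\u0326"


def flavorize_past_l(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    if lang == "pl":
        return word  # Polish keeps ł everywhere
    if word.endswith("\u0142"):
        word = word[:-1] + FINAL_L.get(lang, "l")
    return word.replace("\u0142", "l")  # feminine/plural: back to hard l


def flavorize_velar(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    for velar in ("k", "g"):
        marked = velar + COMMA_BELOW + "\u011b"  # k̦ě / g̦ě
        if lang in VELAR_SHIFT:
            word = word.replace(marked, VELAR_SHIFT[lang][velar])
        else:
            word = word.replace(marked, velar + "\u011b")
    return word
```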

g, h, ǥ, ħ -- another serious simplification in Interslavic is having just one kind of h and one kind of g, just like Serbian. This is perfectly sufficient for everyday use (even at the MS Plus level), but for the scientific alphabet it is a huge missed opportunity. The distinction both provides interesting etymological information and can greatly improve the quality of flavorizations. The letters would be used as follows:

g -- "common Slavic g": the "g" that differs depending on whether we are dealing with a G-Slavic or an H-Slavic language. It is the "g" that will occur most often, even in Scientific Interslavic, and it flavorizes to "h" for Czech, Slovak, Belarusian, Ukrainian and Rusyn, and to "g" for the remaining Slavic languages; note that in the Cyrillic-based languages it always becomes the letter Гг. A notable example of both reflexes of this letter coexisting is the pair of names Bogdan and Bohdan, both of which occur in Poland.

h -- "common Slavic h", the one to which the Cyrillic letter Хх corresponds. It flavorizes to "ch" in West Slavic, "h" in South Slavic and "х" in any Cyrillic-based language. Found in native Slavic words.

ǥ -- the "foreign g": this will generally appear as "g" in any Slavic language, even the H-Slavic ones, since it represents non-native words, such as borrowings from Latin, Greek or English. The H-Slavic languages with consistent and predictable reflexes of this letter as "g" are Czech, Slovak and Rusyn (which uses the letter Ґґ for it). Ukrainian also has the letter ґ, but its use there follows different rules and is neither consistent nor predictable, so ǥ will in fact most often appear as plain Гг in that language, as it does in all the other Cyrillic-writing languages, which happen to be G-Slavic. Belarusian is the simplest in this regard, since its Гг is a sound reminiscent of a voiced /x/ and covers both types of "g", placing the language right on the line between G-Slavic and H-Slavic. The distinction is therefore most useful for flavorizing into Czech, Slovak or Rusyn, but it tells everyone something about a word's provenance. Examples include words such as psiholoǥ, ǥravitacija, ǥalvanizacija, loǥika.

ħ -- the "foreign h": the "h" from foreign loanwords, mostly Greek, but also French, English, German or even Turkic in some cases. In Interslavic, it would mostly apply to words of Greek origin, such as ħipnoza, ħorizont, ħalucinacija, ħarmonija, ħidraũličny, but also to words of Germanic origin, such as ħak, or French ones, such as ħumor, ħotel. The reflexes of this h in the different Slavic languages are fairly predictable, but vary with the language of origin. Greek ones are the easiest: they consistently appear as "h" in all Latin-based languages, as "г" in East Slavic (even taking on the Russian pronunciation of "г" as "g"), and as "х" in Cyrillic-based South Slavic. Ones coming from French often follow the same rules, with the sole exception of Russian, which tends to omit them (ħotel -> отель; ħumor -> юмор), so the Greek rule could be used for all languages with very little risk. In practice, this flavorization option is probably most useful for Polish, known for its h/ch distinction, though Polish "samo h" words are borrowed from many different languages, to which the above rules may not apply (including some genuinely Slavic words borrowed directly from Ukrainian without adjusting their phonology, such as "hoży" or "huśtawka"; these are not present in Interslavic, though, so they are not an issue). Czechs and Slovaks could also benefit from learning how this "h" differs from their H-Slavic reflex of "g", as could users of any other Slavic language.
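The four rules above can be condensed into one lookup. This is a sketch under several assumptions: hypothetical language codes, Latin-script stand-ins for the Cyrillic languages ("h" for х and for the Ukrainian/Belarusian г, "g" for the Russian г), the Greek rule for ħ in all cases (so Russian отель-type omissions are not produced), and lowercase input only.

```python
import unicodedata

H_SLAVIC = {"cs", "sk", "uk", "be", "rue"}   # hypothetical language codes
WEST_SLAVIC = {"pl", "cs", "sk"}


def gh_reflex(letter: str, lang: str) -> str:
    if letter == "g":          # common Slavic g
        return "h" if lang in H_SLAVIC else "g"
    if letter == "h":          # common Slavic h
        return "ch" if lang in WEST_SLAVIC else "h"
    if letter == "\u01e5":     # ǥ, foreign g: read as h via Гг in uk/be
        return "h" if lang in {"uk", "be"} else "g"
    if letter == "\u0127":     # ħ, foreign h: Russian г is g, h/х elsewhere
        return "g" if lang == "ru" else "h"
    return letter


def flavorize_gh(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    return "".join(gh_reflex(ch, lang) for ch in word)
```

Because each character is mapped independently, Czech can turn g into h and native h into ch in the same pass without the two rules interfering.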

v̑ -- a placeholder for flavorizing the different genitive plural endings of masculine/neuter nouns. The algorithm would read it together with the "o" in "-ov̑" sequences and flavorize it into the national languages accordingly. This makes the flavorization possible without touching "-ov" sequences outside these endings, which are very numerous in Interslavic. So -ov̑ would flavorize to -ów, -ov, -ů, -ов, -оў etc.
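A sketch, assuming v̑ is written as v + combining inverted breve (U+0311), hypothetical language codes, and Latin stand-ins ("ov" for -ов, "oŭ" for -оў):

```python
import unicodedata

OV_MARKED = "ov\u0311"   # -ov̑: genitive plural placeholder
GEN_PL = {"pl": "\u00f3w", "cs": "\u016f", "sk": "ov", "ru": "ov", "be": "o\u016d"}


def flavorize_gen_pl(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    # Only the marked sequence is touched; an ordinary "ov" elsewhere stays.
    return word.replace(OV_MARKED, GEN_PL.get(lang, "ov"))
```

In "ostrovov̑", for example, only the final marked sequence changes, giving Czech "ostrovů" while the "ov" of the stem is left alone.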

ꭢ -- used together with "j" to represent the distinction between the East Slavic initial "o" and the "je" of the remaining Slavic languages. For instance, words such as "jꭢdin" and "jꭢzero" would flavorize to "odin" and "ozero" in East Slavic flavorizations and to "jedin" and "jezero" in the others. This ligature was chosen because œ already serves another purpose.
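A minimal sketch, assuming ꭢ is U+AB62 (LATIN SMALL LETTER OPEN OE), that it only occurs word-initially after j, and a hypothetical set of East Slavic codes:

```python
import unicodedata

J_OPEN_OE = "j\uab62"                     # jꭢ
EAST_SLAVIC = {"ru", "uk", "be", "rue"}   # hypothetical language codes


def flavorize_initial_je(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    if word.startswith(J_OPEN_OE):
        return ("o" if lang in EAST_SLAVIC else "je") + word[len(J_OPEN_OE):]
    return word
```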

í -- Standard Interslavic and MS Plus are liberal as to whether plain "i" softens the preceding consonant or not. However, most Northern languages have traditional preferences in this regard, and í could mark such cases, recommending a soft pronunciation. The diacritic was chosen because this is a simple softening, and the acute is already used for soft letters, so it is consistent. In terms of flavorization this is most useful for Polish and Belarusian, since in these two languages softening actually changes the preceding consonant entirely (e.g. t > ć, ц). This letter would be best used in cases such as the word kost́, which is pronounced soft in the nominative singular but written "kosti" in the nominative plural and some other singular cases. The orthography on its own gives no hint that consistency should be maintained and the word actually pronounced "kost́i"; this letter could solve the problem: kostí. It will also enable flavorization into Polish and Belarusian. In Polish in particular, the sequence "ti" doesn't exist in native words and is usually rendered as "ty", so the traditional spelling "kosti" would result in the hard rendering "kosty". Conversely, the word "tigr" will be correctly rendered as "tygr", and not "cigr", which would have been the case had í been used. Since the letter affects pronunciation, it could even be used in MS Plus without major issues, as an auxiliary letter marking where a soft pronunciation is intended.
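For Polish, the t-related cases above could be sketched as an ordered set of rewrites. The assumptions: only t is handled (together with the s + t cluster, as in kości), the analogous d > dzi rule is omitted, and the regex alternation is ordered so that the longer match wins. The name `soften_ti_polish` is hypothetical.

```python
import re
import unicodedata

# Ordered: the longer "stí" must be tried before "tí".
POLISH_TI = {"st\u00ed": "\u015bci", "t\u00ed": "ci", "ti": "ty"}
TI_PATTERN = re.compile("|".join(map(re.escape, POLISH_TI)))


def soften_ti_polish(word: str) -> str:
    word = unicodedata.normalize("NFC", word)
    return TI_PATTERN.sub(lambda m: POLISH_TI[m.group(0)], word)
```

This reproduces the contrast described above: kostí gives kości, while the unmarked kosti gives the hard kosty and tigr gives tygr.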

é -- In some languages, including Polish, Belarusian and Russian but not Ukrainian, there are instances of a softening "e" that originates neither from the yat nor from an iotated e. Other languages simply have a hard "e" there, and é could be used to mark such instances; the diacritic is once again consistent with its function. Examples include the word "né", which is pronounced soft, as "nie", in Polish, Belarusian, Slovak and Russian (cf. the famous word "нет"), while all the other languages have a hard "e" there. Another example is the word "véĺmi", which many languages render with a hard "e", but Belarusian, Russian and Polish with softness, sounding like "věĺmi" (the word is archaic in Russian and Polish, though). Other examples: -té (Polish, Belarusian: "-cie"), véliky, nébo, zélénj etc. Note that in unstressed positions in Belarusian, the orthographic akańje turns é into "я", as in the word "вялікі". In Slovak, on the other hand, only the né, té, dé sequences are pronounced soft, while others, such as vé and zé, are pronounced hard. This is reflected only in pronunciation and not in orthography, so it is not an issue when flavorizing to Slovak. This letter could also be used in MS Plus without problems if desired, though its impact on intelligibility is minimal.

ë, ȅ, e̋ -- these letters could be used to mark the transformation of e/ie into io, with ë covering the way it happened in Russian and ȅ the way it happened in Polish: these shifts affect the same sounds, but generally occur in different places. Belarusian mostly follows the Russian rules here, though it also uses the letter in foreign words, where Russian would use either "ио" or "йо", and has some further idiosyncrasies of its own, which probably don't need to be compensated for to achieve a good enough flavorization. However, there are cases where this shift occurred in the same place in Ru/Be and Pl, and for these we could use a third letter, e̋. Interestingly, the flavorization works slightly differently in each case:

1) The Russian ë flavorizes directly to "e" in all languages other than Russian and Belarusian, and it is always stressed. For example, the word "партнёр" doesn't turn into "partnier" in Polish, but simply "partner", as it does in all the other languages bar Belarusian. One exception would be the word "пёс", which outside Russian and Belarusian flavorizes according to the strong soft yer, as "pės"; that word would probably need an exception in the strong yer's coding, or a two-stage flavorization algorithm that first derives the yer's reflex and then deals with the Russian sound shift.

2) The common e̋ flavorizes to the "io" sound in all three languages, but to plain "e" outside them. An example is the word "name̋t" (tent), which flavorizes to "namiot/намёт" in PL/BE/RU (the word is uncommon in Russian), but to "namet" in all others.

3) ȅ, the e/o alternation that is part of the so-called Lechitic umlaut, flavorizes to "io" in Polish, é in Russian and Belarusian and "e" in all others. Example: "sȅdlo" will flavorize to "siodło" in Polish, "седло"/"сядло" in RU/BE and "sedlo" in all others.

Of note, though, is one exception: the sequence rȅ flavorizes to "ro" in Polish (a simplification of what used to be "rzo", i.e. historical soft r + o), but in all other languages it flavorizes as per the yat, i.e. to the reflexes of rĕ. A two-condition flavorization, if you will. So the word "srȅdína" will flavorize to "sródzina" in Polish, but will give a reflex of "srědina" in all the other languages.
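A sketch of the three letters plus the rȅ exception, limited to Polish, Russian and a default for the remaining languages. Assumptions: Latin-script output with ë standing for Russian ё, ȅ as the precomposed U+0205, e̋ as e + combining double acute (U+030B), the yat reflex of rȅ left as "rě" for a later yat stage, and the Russian/Belarusian softening of ȅ reduced to plain "e". The name `flavorize_io` is hypothetical.

```python
import re
import unicodedata

E_DOUBLE_GRAVE = "\u0205"       # ȅ: Lechitic (Polish-only) e > o
E_DOUBLE_ACUTE = "e\u030b"      # e̋: shift shared by Polish and Russian
E_DIAERESIS = "\u00eb"          # ë: Russian-only e > o
R_YAT = "r\u011b"               # rě: placeholder for a later yat stage

REFLEXES = {
    "pl": {"r" + E_DOUBLE_GRAVE: "ro", E_DOUBLE_GRAVE: "io",
           E_DOUBLE_ACUTE: "io", E_DIAERESIS: "e"},
    "ru": {"r" + E_DOUBLE_GRAVE: R_YAT, E_DOUBLE_GRAVE: "e",
           E_DOUBLE_ACUTE: E_DIAERESIS, E_DIAERESIS: E_DIAERESIS},
}
DEFAULT = {"r" + E_DOUBLE_GRAVE: R_YAT, E_DOUBLE_GRAVE: "e",
           E_DOUBLE_ACUTE: "e", E_DIAERESIS: "e"}
# "rȅ" is listed first so that the exception wins over the general ȅ rule.
IO_PATTERN = re.compile("|".join(map(re.escape, DEFAULT)))


def flavorize_io(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    table = REFLEXES.get(lang, DEFAULT)
    return IO_PATTERN.sub(lambda m: table[m.group(0)], word)
```

Other letters (the Polish ł in siodło, the í in srȅdína) are deliberately left for their own rules, which keeps each step small and composable.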

þ -- the Latin letter thorn, which stands for the [θ] sound and, by virtue of that, corresponds to the Cyrillic letter fita (Ѳ ѳ), which in turn comes from the Greek theta. While most words with the Greek theta (reliably rendered in English as "th", pronunciation and all) ended up with "t" in virtually all Slavic languages, Greek proper names tended towards the "f" sound in East Slavic. Thus "Theodore" became "Teodor" or "Todor" in all non-Eastern languages, while in the East it became "Feodor" and later "Fiodor". The same goes for "Themisto": "Temisto" outside the East, "Femisto" or "Fimisto" in the East (the "i" vs. "e" difference comes from the changing pronunciation of the Greek eta, from which И derives: Ancient Greek had an "e" sound there, which later merged into an "i" sound). This divide came about because such names were originally written with the theta-derived fita, which, however, was pronounced "f" right from the start. There are plenty of inconsistencies among the East Slavic languages, where one language opts for "f" and another for "t" in a given word, but the general rules are enough to make the flavorization believable with just three simple rules:

  • concepts such as þeokracija, þanatofobija, aũþor, maþematika, nafþalina, þema etc.: þ in such words (þa/þe and the like) always and everywhere flavorizes to t (ta/te)

  • proper names of places and people, such as Þeodor, Aþiny, Þemisto, Kiþeron etc.: the sequences þa/þe/þi flavorize to fa/fe/fi in East Slavic and to ta/te/ti everywhere else; this could be treated as a list of exceptions to the rule above, which would be the general one (covering many more words)

  • when flavorizing to Old Church Slavonic, Proto-Slavic or anything archaic, simply replace þ with fita (Ѳ ѳ) in all cases
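The three rules could be sketched as follows. The exception list holds only the names mentioned above and is purely illustrative; a real list would be longer. The language codes, including "ocs" for Old Church Slavonic, are hypothetical, and other letters in the words (such as ũ) are left untouched.

```python
import unicodedata

EAST_SLAVIC = {"ru", "uk", "be", "rue"}   # hypothetical language codes
# Illustrative exception list: proper names that take f in East Slavic.
PROPER_NAMES = {"\u00feeodor", "a\u00feiny", "\u00feemisto", "ki\u00feeron"}


def flavorize_thorn(word: str, lang: str) -> str:
    word = unicodedata.normalize("NFC", word)
    if lang == "ocs":
        lower, upper = "\u0473", "\u0472"   # archaic: fita (ѳ, Ѳ) everywhere
    elif lang in EAST_SLAVIC and word.casefold() in PROPER_NAMES:
        lower, upper = "f", "F"             # East Slavic proper names
    else:
        lower, upper = "t", "T"             # general rule
    return word.replace("\u00fe", lower).replace("\u00de", upper)
```

Keeping the names as an explicit exception list, rather than, say, keying on capitalization, mirrors the suggestion above that the general t rule covers far more words.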

n︠j︡, l︠j︡, r︠j︡ -- while d́, ĺ, ń, ŕ, ś, t́, ź are softened consonants, i.e. altered versions of originally hard consonants, there are also consonants that were soft from the very beginning. The letters s, z, ś, ź exist only in hard and softened versions, while the remaining softened consonants additionally have etymologically, inherently soft counterparts. The difference between "inherently soft" and "softened" is in fact quite simple:

-- "inherently soft" consonants are those that were soft in Proto-Slavic independently of the soft yer; they can also be called "iotated". Examples: korľь (ISL: krålj), koňь (ISL: konj). While these words do end in the soft yer, you will notice that the consonant preceding the yer is already written as a soft letter (ľ, ň), meaning that the softness comes from iotation, not from the yer. The word "moře" doesn't have a yer at the end, yet the softness of its "r" is plainly inherent. t︠j︡ and d︠j︡ are not part of this family, for the simple reason that they became ĉ and d̂ respectively and then went their separate ways in the Slavic languages: the Interslavic "noć" is plainly Proto-Slavic "noťь", while the Interslavic "sađa" was "*saďa" in Proto-Slavic. While spelling these consistently with the rest of the soft consonants would be etymologically accurate, they have diverged so significantly that a separate set of letters is more sensible.

-- "softened" consonants are so called because they often accompany a hard form that becomes soft through the addition of a suffix (as in ISL sila -> siĺny, borti -> borьba -> boŕba). In fact, though, the best predictor for flavorization is the presence of the soft yer after the hard consonant in Proto-Slavic, which also accounts for the suffix rule. For instance, there is "sila", but "silьnъ": in the latter, the final hard yer turned into the "y" sound, while the middle soft yer disappeared, turning the preceding hard "l" into a soft one. But there are also words where no suffix is needed and the root form itself tells us all we need to know. MSL has "kost́", but the Proto-Slavic word is *kostь, meaning that the softness comes from the yer, so the consonant counts as "softened", not "inherently soft", for our purposes. The practical consequences are as follows:

-- the inherently soft (iotated) consonants have been preserved in pretty much all Slavic languages, though rj much less so in the South (Slovenian being a notable exception). This is reflected in Interslavic as lj, nj, rj. The case of r︠j︡ is somewhat exceptional: it has stayed in the orthography (though it is not considered a digraph or part of the alphabet, because there is no single-letter equivalent for it in Cyrillic), but it is omitted word-finally even where etymology would demand it. There don't seem to be many such cases, though. MS Plus has "cesaŕ", but the Proto-Slavic form is "cěsãřь", which suggests that the spelling should be "cěsarj" and that this should extend to the standard orthography. But again, because word-final soft "r" is mostly "softened" and therefore absent from pretty much all South Slavic languages, probably due to convergent evolution, Interslavic skips it too for simplicity's sake and allows an optional "ŕ" in MS Plus, particularly since in pretty much all other cases this is etymologically correct anyway ("lěkarь" suggests a yer-softened "r").

-- the "softened" consonants (those with yers) are skipped entirely in standard Interslavic and are simply rendered as hard consonants, as is the case in most South Slavic languages and quite often also in Czech. They are written in MS Plus and rendered to varying degrees in most East and West Slavic languages. Current Interslavic notably has some mistakes in this pattern: the dictionary word for "day" is "dėnj", but the etymological form is "dьnь", meaning a softened "n", and yet Interslavic renders the "nj" as though it were iotated and therefore inherent to all languages. We have "den", "dan" and "dan" in Czech, Shtokavian and Slovenian, respectively, with no softness there, which can be confusing, so the correct form of the word would be "den"/"dėń". Likewise the word "myslь", which Interslavic as of today still renders as "myslj", even though the softness doesn't extend to the South (see Shtokavian "misao", not "mislj"); therefore the form "mysĺ" would be more correct.
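Because the difference between iotated and yer-softened consonants comes down to what stands in the Proto-Slavic form, it can be sketched mechanically. The Python below is only a minimal illustration under stated assumptions: `to_scientific` is a hypothetical name, the input is a Proto-Slavic spelling using ľ, ň, ř, ť, ď and the yers ь/ъ, and only word-final yers are handled. Medial yers (silьnъ), strong yers (dьnь) and metathesis (korľь -> krål︠j︡) are deliberately left out; a real converter would need all of them.

```python
import unicodedata

# Iotated ("inherently soft") Proto-Slavic consonants -> Scientific letters.
# The arc is written with U+FE20/U+FE21 (combining ligature left/right half).
IOTATED = {
    "\u013e": "l\ufe20j\ufe21",  # ľ -> l︠j︡
    "\u0148": "n\ufe20j\ufe21",  # ň -> n︠j︡
    "\u0159": "r\ufe20j\ufe21",  # ř -> r︠j︡
    "\u0165": "\u0109",          # ť -> ĉ (old tj reflex)
    "\u010f": "d\u0302",         # ď -> d̂ (old dj reflex)
}
# Hard consonants that take an acute when a following soft yer vanishes.
SOFTENABLE = set("dlnrstz")
SOFT_YER, HARD_YER = "\u044c", "\u044a"  # ь, ъ


def to_scientific(ps_word: str) -> str:
    """Hypothetical sketch: Proto-Slavic spelling -> Scientific consonants.

    Only word-final yers are handled; medial and strong yers are untouched.
    """
    word = unicodedata.normalize("NFC", ps_word)
    for src, dst in IOTATED.items():
        word = word.replace(src, dst)
    if word.endswith(SOFT_YER):
        stem = word[:-1]
        if stem and stem[-1] in SOFTENABLE:
            # Softened consonant: the vanished ь leaves an acute behind.
            stem = stem[:-1] + stem[-1] + "\u0301"
        # After an already soft consonant the ь is simply dropped.
        word = stem
    elif word.endswith(HARD_YER):
        word = word[:-1]
    return unicodedata.normalize("NFC", word)
```

For example, `to_scientific("koňь")` gives kon︠j︡ while `to_scientific("kostь")` gives kost́, matching the split between the two families described above.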

Because letters with carons are used for something else in this alphabet, I will not propose the haček letters for this in the dictionary alphabet (though they would be a perfectly fine addition to MS Plus or expanded Middle Slavic for this very purpose, since they are already used for Proto-Slavic). Instead I will use digraphs with a ligature arc above, to make sure each is treated as a single entity; these reference the system currently used in MS and MS Plus. So the system would be:

  • n︠j︡, l︠j︡, r︠j︡ where the Proto-Slavic form has ň, ľ, ř, rendered as nj, lj, rj in Interslavic: kon︠j︡, krål︠j︡, mor︠j︡e etc.

  • t́, d́, ń, ĺ, ŕ, ś, ź where the consonant is softened by a vanished soft yer: before suffixes and in places where the strong soft yer disappeared via declension (orųd́je, primoŕje, boŕba, bulgaŕsky, pit́je, svat́ba), but also in some roots such as kost́, medvěd́, želųd́, ryś, knęź and so on. These will often be rendered in East Slavic (with some exceptions, such as Belarusian and Ukrainian not having ŕ) and Polish, more rarely in Slovak, even more rarely in Czech, and they will be mostly or completely absent in South Slavic. Having this distinction could also be a chance for someone to bring back the natural haček soft letters and use them alongside the softened ones in everyday writing, even replacing the digraphs (polje -> poľe, morje -> moře, but siĺny stays siĺny and boŕba stays boŕba).
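Going the other way, the Standard spelling can be derived from these letters by simple reduction. A minimal sketch, assuming only the conventions described above: the acute of a softened consonant is dropped, the arc digraphs become plain nj, lj, rj, and a word-final r︠j︡ is written as a plain r. The function name `to_standard` is hypothetical, and every other letter (ė, å, ų and so on) is passed through untouched, so this is not a full Standard converter.

```python
import unicodedata

ACUTE = "\u0301"
TIE_LEFT, TIE_RIGHT = "\ufe20", "\ufe21"  # the two halves of the ligature arc
SOFTENABLE = set("dlnrstz")
FINAL_RJ = "r" + TIE_LEFT + "j" + TIE_RIGHT


def to_standard(sci_word: str) -> str:
    """Hypothetical sketch: Scientific soft consonants -> Standard Interslavic."""
    chars = unicodedata.normalize("NFD", sci_word)
    if chars.endswith(FINAL_RJ):
        # Word-final r︠j︡ is written as a plain r in the standard orthography.
        chars = chars[: -len(FINAL_RJ)] + "r"
    out: list[str] = []
    for ch in chars:
        if ch == ACUTE and out and out[-1] in SOFTENABLE:
            continue  # softened consonant (ŕ, t́, ...) -> plain hard consonant
        if ch in (TIE_LEFT, TIE_RIGHT):
            continue  # n︠j︡ -> nj, l︠j︡ -> lj, r︠j︡ -> rj
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

So kon︠j︡ becomes konj and boŕba becomes borba; the acute is only stripped after the seven softenable consonants, so a letter like é keeps its mark.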

    υ -- This is basically the modern-day Greek letter upsilon (also known as ypsilon outside the Anglosphere). It is used for what I call the "i-based izhitsa". There has been a letter for the "u/v-based izhitsa" ever since the beginning of Jan's Scientific Interslavic, but I feel that a separate izhitsa for the "i" sound might be helpful for an automatic flavorizer, since the phonological rules involved are so very different (izhitsa/upsilon has many phonological uses in Greek and OCS). This time it is for situations involving upsilon's iotacism (υ pronounced like "i"), i.e. its vowel use. An example is the name Kiril itself: it comes from Greek Κύριλλος (Kýrillos), and while upsilon and iota are pronounced identically in Greek today, they have clearly flavorized somewhat differently into the Slavic languages, and using upsilon is what would let us handle this flavorization with an algorithm. So, for example, with the Scientific Interslavic rendering "Kυril":
    -- the upsilon would become "y" in West Slavic: Kyryl/Kyril. We could even go as far as to treat the entire "kυ" cluster separately and make it render as "cy" in West Slavic, which would give us Cyryl/Cyril (the second Polish "y" is unrelated; it is due to the fact that Polish requires the hard "y" after "r").
    -- it would become "i" in all other Slavic families, with "kυ" turning into "ci" in Slovene, "ći" in Shtokavian, and "ki" elsewhere.
    -- it would turn directly into an izhitsa when producing an OCS flavorization.
    The rules here would of course have to be evaluated by a trained Slavist with knowledge of the Greek reflexes, though luckily the number of such words isn't very high.
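The upsilon rules above are naturally expressed as ordered rewrite rules, where the longer "kυ" rule has to run before the bare "υ" rule. The sketch below is hypothetical throughout: the language codes, the name `render_upsilon`, and the blunt Polish "ri" -> "ry" step standing in for the hard "y" after "r" are my own assumptions, the output is Latin-script only, and the OCS izhitsa case is left out.

```python
import unicodedata

# Ordered (pattern, replacement) rules per target language; "kυ" must come
# before "υ" so the longer cluster is rewritten first. Codes are assumptions.
UPSILON_RULES = {
    "pl": [("kυ", "cy"), ("υ", "y"), ("ri", "ry")],  # crude stand-in: y after r
    "cs": [("kυ", "cy"), ("υ", "y")],
    "sk": [("kυ", "cy"), ("υ", "y")],
    "sl": [("kυ", "ci"), ("υ", "i")],
    "sh": [("kυ", "ći"), ("υ", "i")],
}
DEFAULT_RULES = [("kυ", "ki"), ("υ", "i")]  # every other Slavic language


def render_upsilon(word: str, lang: str) -> str:
    """Apply the υ rules for one target language, in order (hypothetical)."""
    for pattern, replacement in UPSILON_RULES.get(lang, DEFAULT_RULES):
        word = word.replace(pattern, replacement)
    return unicodedata.normalize("NFC", word)
```

Here `render_upsilon("kυril", "pl")` yields cyryl and `render_upsilon("kυril", "sh")` yields ćiril. A real rule set would need a narrower Polish rule, since Polish does keep "ri" in some loanwords (e.g. "Maria").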

    ø -- the ó/ů/ô/і source-code letter. This letter deals with one of the most unpredictable, variable and difficult-to-get-right transformations in all Slavic languages, so I doubt the letter on its own would be enough for truly accurate renderings: multiple extra conditions would have to be coded in, and possibly some extra variants of the letter itself if it ends up being too general. What this letter denotes is the transformation of certain "o" sounds (historically "long o") in Polish, Czech, Slovak and Ukrainian; these are always "o" in all other languages. The rules are extremely vague and inconsistent (or at least appear so without study), but the word "moj" is a good example where this reflex applies in all four. So "møj" would render as:
    -- "mój" in Polish (pronounced "muj"),
    -- "můj" in Czech (pronounced "muj" with a long "u" sound),
    -- "môj" in Slovak (pronounced "muoj"),
    -- "мій" in Ukrainian (pronounced "mij"),
    -- "moj" in all other languages.
    Another common example is the word "køn︠j︡", which renders as "koń" (archaic "kóń"), "kůň", "kôň" and "кінь", respectively. Interestingly, this sound shift fairly consistently disappears, turning back into a simple "o", when the word is declined: konia, koně, koňa, коня. But the consistency is not perfect; compare "mojego", "mého", "мойого (моєго)", whereas Slovak has "môjho" (but "mojím"). There are other inconsistencies: for example, the Polish "ó" and Ukrainian "і" can seem closer to each other than the Czech "ů" and Slovak "ô", as suggested by the reflexes of the word "spøsøb", PL/UK: sposób/спосіб, but CZ/SK: způsob/spôsob, possibly hinting that this letter could be split into two. However, there are also cases where Ukrainian is unique, like the word "під", where all the other languages including PL/CZ/SK simply have "pod", possibly hinting at an opportunity for a third split or at least a more thorough algorithm.
Polish formerly had "ó" in some places consistent with the reflexes in other languages, but it turned into a simple "o" in the modern language (as in the "kóń" vs "koń" example); however, I haven't been able to find any information about a form "pód", which means this reflex is possibly unique to Ukrainian. As mentioned at the start, this would be the hardest and most work-intensive flavorization to get working with high accuracy, but this can probably be determined by trained Slavists. Overall, though, the process was fairly similar in all these languages and occurred for similar reasons, so perhaps a single letter to mark it in dictionary writing might be enough, provided a comprehensive enough description of the processes is also given.
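A first approximation of the ø letter is a per-language reflex table plus a switch for declined forms. This is a sketch under explicit assumptions, not a working rule set: `render_o` and its `declined` flag are hypothetical, Ukrainian is given in Latin transliteration (i for і), and it reproduces the known gaps; for example, it would still put ó into the Polish reflex of køn︠j︡, where modern Polish has "koń", and it cannot produce Slovak "môjho".

```python
import unicodedata

# Reflexes of ø in the four languages that shift it (Ukrainian transliterated).
O_REFLEXES = {"pl": "ó", "cs": "ů", "sk": "ô", "uk": "i"}


def render_o(word: str, lang: str, declined: bool = False) -> str:
    """Replace ø with its reflex for one language (hypothetical sketch).

    declined=True models the tendency of the shift to revert to plain "o"
    in declined forms; exceptions such as Slovak "môjho" are not handled.
    """
    reflex = "o" if declined else O_REFLEXES.get(lang, "o")
    return unicodedata.normalize("NFC", word.replace("ø", reflex))
```

With this, `render_o("møj", "cs")` gives můj, while `render_o("møjego", "pl", declined=True)` gives mojego.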

    With these changes, we get the following 84-letter scientific alphabet (up from the old 59):

    a ā å b c ç ĉ č d ḓ ḍ d́ d̂ dž e ē ė é ë ȅ e̋ e̊ ĕ ę f þ g g̦ ǥ h ħ i í ī ɨ ɨ̄ ì ı j ȷ k k̦ l l︠j︡ ĺ ł l̄ l̄́ l̯ m n n︠j︡ ń o ȯ œ œ̄ ꭢ ø p r r︠j︡ ŕ r̄ r̄́ s ś š t ṱ t́ u ų ũ v v̑ ṽ υ y ȳ z z̧ ź ž
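Several of these letters consist of more than one Unicode code point (d́, l̄́, dž, and the arc digraphs such as n︠j︡, which use the combining ligature halves U+FE20/U+FE21), so any software built on the alphabet would first need to split a word into letters. A minimal sketch, assuming greedy longest-match against the inventory above after NFC normalization; `tokenize` is a hypothetical helper, not part of any existing tool:

```python
import unicodedata

# The 84 letters listed above, separated by spaces.
ALPHABET = (
    "a ā å b c ç ĉ č d ḓ ḍ d́ d̂ dž e ē ė é ë ȅ e̋ e̊ ĕ ę f þ g g̦ ǥ h ħ i í ī ɨ ɨ̄ ì ı j ȷ "
    "k k̦ l l︠j︡ ĺ ł l̄ l̄́ l̯ m n n︠j︡ ń o ȯ œ œ̄ ꭢ ø p r r︠j︡ ŕ r̄ r̄́ s ś š t ṱ t́ u ų ũ "
    "v v̑ ṽ υ y ȳ z z̧ ź ž"
)
LETTERS = {unicodedata.normalize("NFC", letter) for letter in ALPHABET.split()}
MAX_LEN = max(len(letter) for letter in LETTERS)


def tokenize(word: str) -> list[str]:
    """Split a Scientific Interslavic word into letters by greedy longest match."""
    text = unicodedata.normalize("NFC", word.lower())
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            end = i + n
            # A match must not strand combining marks that follow it.
            if text[i:end] in LETTERS and not (
                end < len(text) and unicodedata.combining(text[end])
            ):
                tokens.append(text[i:end])
                i = end
                break
        else:
            # Unknown letter: keep the base character with its combining marks.
            end = i + 1
            while end < len(text) and unicodedata.combining(text[end]):
                end += 1
            tokens.append(text[i:end])
            i = end
    return tokens
```

Greedy matching always reads d + ž as the single letter dž; a word where the two belong to different morphemes would need extra information, so this is a simplification.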

Now every diacritic fulfils a different function, and consistency is maintained:
-- the macron denotes length (morphological or syllabic);
-- the ring denotes an -oro-/-ere- sequence;
-- the cedilla denotes the kv-cv/gv-zv shift;
-- the circumflex signifies the old tj/dj reflexes;
-- the haček denotes a palato-alveolar or retroflex sibilant;
-- the circumflex below denotes the vanishing t/d from tl/dl sequences;
-- the dot below indicates de-affrication from /dz/ to /z/;
-- the dot above indicates the strong yer;
-- the acute denotes softening/palatalization;
-- the breve denotes yat;
-- the ogonek means historical nasality;
-- the comma below denotes a consonant shift after declension;
-- the stroke through the letter's body means different flavorization because of foreign origin;
-- the grave means the infinitive ending;
-- no diacritic at all means optional morphological lengthening;
-- the diaeresis and other types of double accent denote the e/ie -> io shift;
-- the inverted breve below shows the epenthesis of /l/;
-- the tilde above means the variable reflex of upsilon/izhitsa in loanwords with the au/eu diphthong, as well as the reflexes of the preposition "v";
-- the inverted breve above shows the different reflexes of the "-ov" declensional suffix;
-- the diagonal strike-through means reflexes of the Slavic hard L in the past tense of masculine singular verbs;
-- the ligature arc denotes inherent softness of a consonant;
-- the middle dash denotes different rules for i/y spelling;
-- the diagonal slash denotes unpredictable evolutions of the long "o" sound.
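Since each diacritic has a single function, a letter's etymological information can be read off mechanically by decomposing it with Unicode NFD and looking up each combining mark; the stroked letters (ł, ɨ, ø, ħ, ǥ) have no decomposition and need their own table. The sketch below is hypothetical (`describe` and the short labels are my own paraphrases of the list above), and assigning ħ and ǥ to the "foreign origin" stroke is my assumption. It follows the list literally, so a vowel acute such as é is also reported as softening.

```python
import unicodedata

# Combining mark -> function, paraphrasing the list above.
MARKS = {
    "\u0304": "length",                                  # macron
    "\u030a": "-oro-/-ere- sequence",                    # ring above
    "\u0327": "kv-cv/gv-zv shift",                       # cedilla
    "\u0302": "old tj/dj reflex",                        # circumflex
    "\u030c": "palato-alveolar or retroflex sibilant",   # caron (haček)
    "\u032d": "vanishing t/d of tl/dl",                  # circumflex below
    "\u0323": "de-affrication /dz/ -> /z/",              # dot below
    "\u0307": "strong yer",                              # dot above
    "\u0301": "softening/palatalization",                # acute
    "\u0306": "yat",                                     # breve
    "\u0328": "historical nasality",                     # ogonek
    "\u0326": "consonant shift after declension",        # comma below
    "\u0300": "infinitive ending",                       # grave
    "\u0308": "e/ie -> io shift",                        # diaeresis
    "\u030b": "e/ie -> io shift",                        # double acute
    "\u030f": "e/ie -> io shift",                        # double grave
    "\u032f": "epenthetic /l/",                          # inverted breve below
    "\u0303": "variable upsilon/izhitsa or 'v' reflex",  # tilde
    "\u0311": "'-ov' suffix reflex",                     # inverted breve above
    "\ufe20": "inherent softness",                       # ligature arc (left half)
}
# Stroked letters have no Unicode decomposition, so they are listed whole.
STROKED = {
    "\u0142": "hard L in the masculine past tense",  # ł
    "\u0268": "i/y spelling rules",                  # ɨ
    "\u00f8": "unpredictable long o",                # ø
    "\u0127": "foreign origin",                      # ħ (assumed)
    "\u01e5": "foreign origin",                      # ǥ (assumed)
}


def describe(letter: str) -> list[str]:
    """Return the function of each diacritic on a letter (hypothetical sketch)."""
    decomposed = unicodedata.normalize("NFD", letter)
    functions = [STROKED[decomposed[0]]] if decomposed[0] in STROKED else []
    functions += [MARKS[mark] for mark in decomposed[1:] if mark in MARKS]
    return functions
```

A lookup like this is roughly what a per-letter "Word Properties" explanation could be generated from, with the human-written descriptions attached to each label.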

Suggested practical applications

There are two main applications this scientific system could have. It should be stressed, however, that this alphabet is not intended for everyday use, and there would be no encouragement whatsoever for anyone to use it actively outside of linguistic studies, etymological demonstrations and the like.

The first use would be an automatic flavorization tool with a multitude of options: all varieties of Interslavic, general language groups, and even national languages (which the newly proposed letter inventory should make possible). Of course, this would take considerable effort, since not only single letters but also letter combinations would have to be scripted for conversion depending on the target language. I'm willing to help write down the reflex tables if needed (and if there's interest in this solution).
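One way to cover varieties, language groups and national languages with a single tool is to let each target inherit its reflex table from its group, and the group from a common base, overriding only what differs. Rules are then applied by greedy longest match, so that combinations take priority over single letters. The sketch below assumes exactly that. The hierarchy, the codes and the few rules in it are illustrative placeholders, not real reflex tables, and `flavorize` is a hypothetical name.

```python
import unicodedata

# Hypothetical target hierarchy: national language -> group -> base.
PARENT = {
    "pl": "west", "cs": "west", "sk": "west",
    "sl": "south", "sh": "south",
    "ru": "east", "uk": "east",
    "west": "base", "south": "base", "east": "base",
}
# Illustrative reflex tables (a tiny subset, not real reflex tables).
RULES = {
    "base": {"υ": "i", "ø": "o"},
    "west": {"υ": "y", "kυ": "cy"},
    "pl": {"ø": "ó"}, "cs": {"ø": "ů"}, "sk": {"ø": "ô"},
    "uk": {"ø": "i"},  # Latin transliteration of Ukrainian і
    "sl": {"kυ": "ci"}, "sh": {"kυ": "ći"},
}


def merged_rules(target: str) -> dict[str, str]:
    """Combine tables from base down to the target; specific entries win."""
    chain = [target]
    while chain[-1] != "base":
        chain.append(PARENT.get(chain[-1], "base"))
    rules: dict[str, str] = {}
    for node in reversed(chain):
        rules.update(RULES.get(node, {}))
    return rules


def flavorize(word: str, target: str) -> str:
    """Rewrite a Scientific word using greedy longest-match rules."""
    rules = merged_rules(target)
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(word):
        for n in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            out.append(word[i])  # no rule: copy the letter unchanged
            i += 1
    return unicodedata.normalize("NFC", "".join(out))
```

With these toy tables, `flavorize("møj", "uk")` gives mij and `flavorize("kυril", "west")` gives cyril. A group code works as a target in the same way as a national one.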

The second use of the alphabet is something I have proposed before: an addendum to the fantastic effort that is the Interslavic-Dictionary (https://interslavic-dictionary.com/). Its functionality is growing by the day thanks to the passion of its authors, and my idea was to incorporate the Scientific Alphabet as an extension to this dictionary in the form of another pop-up category, perhaps called Word Properties. Early mock-ups of the functionality, which I created in MS Paint a while ago, can be seen below:

(Images: five early MS Paint mock-ups of the proposed functionality.)

Today the Dictionary has evolved and, for example, shows pop-ups in the lower right corner for every word. "Word Properties" could be added as another pop-up in blue text in much the same fashion, and the word rendered in Scientific Interslavic would then pop up, with each letter explained in detail and example reflexes in other Slavic languages given (this should be done by someone really knowledgeable; the text I proposed should be treated just as a placeholder until someone who actually knows what they're doing does it properly :D).

This would put Scientific Interslavic into optional, passive use: optional knowledge for the curious, presented in a way that doesn't impose on the uninterested. The benefits of this solution could be immense for learning not only Interslavic but the natural languages as well: knowing the rules and how to apply them gives great predictive power when it comes to unknown words and constructions in the living Slavic languages, which mostly tend to be consistent in their reflexes.

Proposed slight change of approach to MS Plus in light of the above changes

Recently, the Commission has rebranded MS Plus as "the etymological alphabet" and started actively discouraging its everyday use, even though the alphabet contains only the letters essential for middle-ground pronunciation and, at best, some Interslavic-specific morphological help. In fact, MS Plus consists of fewer letters than real-life Slovak, and arguably has better justification for all the letters it contains than Slovak does. Because of that, it doesn't even begin to scratch the surface when it comes to completeness of etymological information, and therefore makes for a fairly rudimentary "etymological" alphabet at best. It is, for example, entirely insufficient for reasonable-quality flavorizations, even regional ones, much less national ones.

That said, it is actually well suited for everyday use thanks to the many advantages it has over the plain Standard script, and discouraging its use seems like a big waste as well.

Due to its middle-ground nature between Standard and Scientific, leaning towards utility rather than being a source code like Scientific, I think some minor adjustments should be made to it (or rather, some changes reversed for ease of use):
-- Since consistency is not as essential as in "pure Scientific", more usable and natural letters borrowed from the national languages could be used here without reservation. For instance, it would serve everyone better if d́, ĺ, t́ were written using the naturally occurring letters, as before: ď, ľ, ť. The same goes for ć/đ (changed in Scientific, but they should be preserved in Plus), and arguably for the strong yers ò/è: they are easier to type, render in more fonts than their dotted equivalents, and look more aesthetically pleasing and natural in a text. It is my strong belief that the possibly unintentional effect of MS Plus previously having letters from pretty much all natural Slavic Latin scripts is not to be underestimated, as it helps create an emotional connection for any Latin-writing Slav, who gets to use letters they write every day in their own language. Forcing a shift of MS Plus towards an artificial, scientific system (with letter + diacritic combinations rather than naturally occurring letters, for instance) deprives us of this advantage while offering little in return, and only needlessly discourages the use of MS Plus. A proper Scientific Interslavic alphabet would remedy this issue: it would give us nigh-perfect flavorization potential and the deepest knowledge about any word one could hope to gain, while allowing MS Plus to remain what it used to be, basically just a "rich variant of Interslavic". That said, I recognize that the current orthography is already gaining traction and backtracking on it may introduce a lot of confusion anyway, so I don't feel as strongly about this part now (November 2021) as I did when first writing this article.
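Under this proposal, producing MS Plus from Scientific text is a plain substitution of the letters just named. A minimal sketch, assuming the mapping d́ -> ď, t́ -> ť, ĺ -> ľ, ĉ -> ć, d̂ -> đ, ė -> è, ȯ -> ò and nothing else. `to_ms_plus` is a hypothetical name, and all other Scientific letters pass through unchanged, so this covers only the adjustments in this paragraph.

```python
import unicodedata

# Scientific -> MS Plus substitutions proposed above (keys in NFC form).
MS_PLUS = {
    "d\u0301": "\u010f",  # d́ -> ď
    "t\u0301": "\u0165",  # t́ -> ť
    "\u013a": "\u013e",   # ĺ -> ľ
    "\u0109": "\u0107",   # ĉ -> ć
    "d\u0302": "\u0111",  # d̂ -> đ
    "\u0117": "\u00e8",   # ė -> è (strong yer)
    "\u022f": "\u00f2",   # ȯ -> ò (strong yer)
}


def to_ms_plus(sci_word: str) -> str:
    """Hypothetical sketch: apply the proposed MS Plus letter substitutions."""
    word = unicodedata.normalize("NFC", sci_word)
    for src, dst in MS_PLUS.items():
        word = word.replace(src, dst)
    return word
```

Letters such as ń, ŕ, ś and ź are not in the table, so "dėń" comes out as "dèń".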

If these ideas do gain acceptance, I'm willing to do much of the work related to rewriting words in Scientific for derivation to MS Plus and standard MS for the dictionary, and even creating rules for flavorization algorithms. I'm that much of a geek. So let me know what you think!