Category Archives: Neat Stuff

First Steps

(See Digital, Interactive, and Topical Galilean Aramaic Dictionary for the background to this post.)

So how can I get this off the ground?

Figuring out how to tackle this project proved an interesting series of events. When making a general, practical dictionary for people to learn important words, the first question was, “What words does one choose?” The obvious answer seemed to be, “The words that are of the highest frequency in the corpus.” These would be the words that a student would come across the most, and therefore be of most immediate use.

So a few years back I collated the Concordance listings on the Comprehensive Aramaic Lexicon for all of the texts listed in the Palestinian Aramaic corpus.

The Concordance mode merely scans over the requested lemma file (the way that the CAL internally represents documents with lexical tagging), tallies up each instance of the word, and then sorts them in alphabetical order. If one simply collates each of these generated concordances (some 30 or so documents for JPA) and sorts them by frequency, you’re left with a list of nearly 9,400 words for the corpus in order of “popularity.” (I’ll probably post the full frequency list on the dictionary website later.)

Thousands of words are great for print dictionaries, but for a visual dictionary, it was a bit much. The distribution was also extremely skew (as it is for virtually all languages) with many words up front having huge attestation, trailing off into a very long tail of rarely used words, finally ending with a long line of singletons.

Attestation: TOTAL Number:
≥1000 65
≥100 and <1000 469
≥10 and <100 1960
=1 3229

N = 9379

As such, the list needed to be pruned back a bit. I decided to adopt the following two criteria:

  1. The first set of words must be nouns. (This is a visual dictionary, and nouns are easier to illustrate. All verbs, adverbs, prepositions, etc. were tossed from the list.)
  2. An individual word needs to appear at least 5 times in the corpus. (This cut off the aforementioned long tail of some 5784 sparsely attested words.)

Between those two criteria, it brought the list from many thousands of words, down to a “mere” 1,700. This was still a bit much for the initial dictionary in the amount of time I have to complete it.

Additionally, among those ~1,700 words, a large number of them were still tricky to illustrate because they were:

  • Abstract (like “knowledge” or “name” or “obligation”), or
  • Religious jargon (like “Mishnah” or “Torah” etc.), or
  • Otherwise better suited to a separate unit or set in context with its other members (numbers, family, etc.)

A single image slide could not provide sufficient context for these words, so pulling them all out, I was left with a list of about 600 “easily illustratable” words.

This is doable!

The Next Steps:

The List

My next step from here is going to be formatting this list in a readable form for the project’s website. When I start implementing the dataset, this will serve as the “checklist” towards completion and also aid with any crowd sourcing efforts.

Each word needs to have its gloss and orthography checked against the Galilean corpus (sometimes lemma forms diverge, since most lemmas are based off of Eastern Aramaic forms – I’ll put together a list of links), and be broken down into syllable and letter chunks:

(Mockup of multiple spoken hover states. Highlighting, transliteration, and sound would happen in real time depending on where the user hovers the mouse or – if on a mobile device – taps on the word.)

Each word also needs to have its audio recorded.

Once the list is posted, I’ll be sending out a request for help finding images. The images need to be public domain, or otherwise have their copyright released in such a way that they can be used for educational purposes. When this project is done, I’m going to make the source code available for other educators so that they can build their own datasets for different languages, and I want the images to be part of that.

The Test Set

While the full list percolates, I’ll need to compile a small subset of the list – perhaps just a few dozen words – to be the test set. This is what I will use to check to see how the audio will work and to later use as a “dummy” set to implement the interface.

The Audio Chunks

This is going to be, perhaps, the most difficult part.

I’ll need to compile a list of all possible single letter-vowel and syllable combinations and record audio for each one, and then develop some schema to store them so that the software can make use of them.

Luckily, due to the restricted vowel inventory of Galilean, this is a much more attainable task than if it were another dialect. For letter-vowel pairs, it’s roughly 120 combinations (and since that’s doable, that’s where I’ll start). With full syllables, however, I may be looking at 2,500 possible combinations total. Ugh… First things first, though.

The Interface

Finally, with the test set in hand, I’ll start working on the actual code driving the visual interface based off of the initial mockups. This, I anticipate, is going to be one of the easier and fun bits to get done, but when I do sit down to it I’m going to post another update about the design process.

User Testing

This is where everyone else comes in. Once I have a prototype up and running, I need you – yes YOU, reader on the Internet – to help me test it, break it, and reform it stronger. With every successive wave of testing, it will become a better tool.

Wish me luck. 🙂


Digital, Interactive, and Topical Galilean Aramaic Dictionary

So, I was alluding to a Faculty Research Grant over here at RVCC that I applied for back in February, and before the summer I got the good news that I was approved! 🙂

What is it for? A Digital, Interactive, and Topical Galilean Aramaic Dictionary project that I will be constructing over the course of the summer. This post is serving to motivate me to get it done.

Here’s an excerpt from the proposal that was accepted:

Faculty Research Grant Proposal:
Digital, Interactive, & Topical Galilean Aramaic Dictionary
Steve Caruso, MLIS – Computer Science Department

Aramaic is a family of languages that is part of the Northwest Semitic group with a written history that stretches back over 3,000 years and is related to Akkadian, Ugaritic, Amharic, Hebrew, and Arabic. Among other things, it has served as the language of the ancient Aramean kings, the official language of the western half of Darius I’s empire, one of the languages that helped spread Buddhism under Ashoka, and has strong influences upon both the writing system and vocabulary of Classical Arabic.

A Galilean Aramaic inscription in Kursi, near the Sea of Galilee. | Photo credit: Jennifer Munro

Galilean Aramaic, a Western Aramaic language, is of special importance within both Judaism where it was the language of the Jerusalem Talmud (and a large body of other of Rabbinic works) and within Christianity as it was the everyday language of Jesus of Nazareth and his earliest followers. Despite that, Galilean has proven to be one of the more obscure and misunderstood dialects due to systemic – albeit well-intentioned – corruption to its corpus over the centuries, involving the layering of Eastern scribal “corrections” away from genuine Western dialect features. To this day there is no easily accessible grammar[1] or fully articulated syntax, and due to the academic predisposition towards viewing Aramaic languages through an Eastern Aramaic lens, assessing vocabulary with appropriate orthographical and dialectical considerations has proven difficult.[2]

It is that last problem that my proposal for a Faculty Research Grant seeks to remediate. Over the course of the last 10 years or so, I have been compiling a topical lexical reference of the Galilean dialect comprising all words that appear in the corpus over five times with the intention of building a web-based, interactive dictionary. It will serve as both a learning tool as well as a reference work for both academics and laymen grappling with the dialect.

(Mockup of a flashcard screen, to be implemented in HTML5/Canvas and/or Pixi.js.)
(Mockup of multiple spoken hover states. Highlighting, transliteration, and sound would happen in real time depending on where the user hovers the mouse or – if on a mobile device – taps on the word.)

The system that I use seeks to address difficulties that have been handled poorly in other works, including a more appropriate vocalization system (the early 5-vowel “Palestinian” vocalization system from antiquity, rather than the more expansive Tiberian or Babylonian systems which do not match Galilean phonology), and more genuinely Galilean/Western Aramaic orthography.



[1] One reliable grammar by Michael Sokoloff is in Hebrew and uses a standardized orthography, where the other reliable grammar by Steven Fassberg is not for the faint of heart (it is far too technically-oriented for laymen and –arguably – even some experts). All other grammars published to date (Dalman, Stevenson, Levias, etc.) are based upon a corrupt or inconsistent corpus.

[2] For a fuller handling of the problems facing the Galilean dialect, see E.Y. Kutcher’s “Studies in Galilean Aramaic” (Bar-Illan University, 1976).

So you can see why I’m a bit excited.

I’ve already set up some webspace at RVCC for it, which you can find here:

Topical Galilean Aramaic Dictionary

As you can see, it’s very sparse at the moment, but I’ll be posting updates and taking feedback right here on

If anyone would like to help out with data entry, sourcing images, or testing, feel free to email me. There’s plenty of work to be done before the summer is out.


My God, my God, why have you forsaken me?

Jesus on the Cross

Most churches do a Passion Play this time of year, re-enacting the final moments of Jesus up to and including the crucifixion. Most of these Passion Plays tend to include Jesus’ final words as recorded in Matthew and Luke which appear in most Bibles transliterated as:

“Eloi! Eloi! Lama sabachthani?”

“How the heck do you pronounce *that*?” I am asked often enough. “Eh-loy eh-loy llama sab-ach!-thane-y?”

And my answer is: You don’t.

In truth, this phrase has been subject to a game of telephone, which started in Aramaic and twisted its way through Greek, and some German spelling conventions, before landing in English.

This phrase is an Aramaic translation of the beginning of Psalm 22, “My God, my God, why hast thou forsaken me? Why art thou so far from helping me, from the words of my groaning?”

As we can see from extant translations in other Aramaic dialects, in Jesus’ native Galilean Aramaic, it was most likely rendered:

אלהי אלהי למה שבקתני
əlahí əlahí ləmáh šəvaqtáni

(The funny upside-down e signifies a shewa, a vowel kind of like the a in “above.”)

When the Gospel writers were compiling their work in Greek, they ran into some interesting problems. Mainly that the Greek writing system had no way to express some of these sounds. It ended up with this (or something like it, as there is some variation from manuscript to manuscript):

ελοι ελοι λαμα σαβαχθανι
elü elü lama saḇaḥṯani

e-loo e-loo lema savakhthani

  1. In Greek, there was not a sufficient 1 to 1 relationship with Aramaic vowels. Galilean’s ə (shewa) and its open vowel a (patah) were under many circumstances differentiated solely by emphasis and were slightly colored depending upon what sounds fell nearby. In trying to approximate them, the Greek scribe chose what sounded the closest based upon Greek vocalization.
  2. The Greek alphabet has no way to indicate an “h” sound in the middle of a word, only at the beginning. So the “h” sounds  in əlahi disappeared, and there was an unintended consequence: The two letters ο (omicron) and ι (iota) when placed together formed a diphthong, similar to the nasalized eu in French. In truth, if the diphthong were broken and the two vowels spoken separately with an “h” in the middle, they are very good approximations to the original.
  3. There was also no way to express an sh sound (above š) so it was replaced with what was closest: σ (sigma, an “s” sound).
  4. There was no “q” sound, which in Aramaic is a guttural “k” in the very back of the throat. It was replaced with χ (chi, a sound like clearing your throat).
  5. And finally, the particular quality of the t was closer to their θ (theta) than to their τ (tau), so it was replaced with the former, softer sound.

Now when the Bible was translated into English, it went through yet another transliteration… but this time from the Greek. It looked (for the most part) like this:

Eloi, Eloi! Lama sabachthani?

How did we arrive at this from the Greek? Greek transliteration into English made use of the following conventions:

  1. Again, Greek vowels aren’t at all 1:1 with English vowels — they represented different sounds — but their cognates in transliteration were very well established.
    • ε and η → e, 
    • ο and ω→o,  
    • ι→i,
    • α→a,  
    • υ→y or u, 
    • etc. 
  2. The use of these transliterations actually broke up the οι diphthong in reading — so that was a step back in the right direction.
  3. The letter χ (ḥ, chi) is, like in German transliteration or Scottish, rendered as “ch,” as that digraph ch in makes a similar sound.
  4. The letter θ (theta) is transliterated as “th” as that’s the closest sound in English, although the quality of it is not nearly as breathy.

So there you have it.


Getting a Better Quantitative Grip on Galilean Verb Forms


I’m trying to get a better idea of which verbal stems/forms are used for each root in the Galilean grammar I am working on. Further down the list from this screenshot, it’s curious how certain rare forms (such as Poel, Palel, and various Quadraliterals like Palpel, etc.) show up as the base form for common roots, or how roots common under one stem in one dialect are just as common in Galilean, but use a different stem.