First Steps

(See Digital, Interactive, and Topical Galilean Aramaic Dictionary for the background to this post.)

So how can I get this off the ground?

Figuring out how to tackle this project proved to be an interesting process. When making a general, practical dictionary for people to learn important words, the first question was, “What words does one choose?” The obvious answer seemed to be, “The words that are of the highest frequency in the corpus.” These would be the words that a student would come across the most, and would therefore be of the most immediate use.

So a few years back I collated the Concordance listings on the Comprehensive Aramaic Lexicon for all of the texts listed in the Palestinian Aramaic corpus.

The Concordance mode simply scans the requested lemma file (the way that the CAL internally represents documents with lexical tagging), tallies up each instance of every word, and then sorts the results in alphabetical order. If one collates each of these generated concordances (some 30 or so documents for JPA) and sorts the combined tallies by frequency, one is left with a list of nearly 9,400 words for the corpus in order of “popularity.” (I’ll probably post the full frequency list on the dictionary website later.)
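The collation step can be sketched in a few lines of Python. This is only an illustration: it assumes each concordance has already been parsed into a mapping of lemma to count, and the lemma names below are placeholders rather than CAL data. The CAL's actual lemma file format is not covered here.

```python
from collections import Counter

def collate_concordances(concordances):
    """Merge per-document lemma tallies into one corpus-wide tally."""
    total = Counter()
    for tally in concordances:
        total.update(tally)  # adds to the counts of lemmas already seen
    return total

def frequency_list(total):
    """Return (lemma, count) pairs, most frequent first; ties alphabetical."""
    return sorted(total.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical tallies for two documents (placeholder lemmas, not CAL data).
doc_a = {"lemma_a": 12, "lemma_b": 3, "lemma_c": 1}
doc_b = {"lemma_a": 7, "lemma_c": 4, "lemma_d": 1}

corpus = collate_concordances([doc_a, doc_b])
ranked = frequency_list(corpus)
print(ranked)
```

Sorting on the negated count with the lemma as a tie-breaker keeps the output stable from run to run, which helps when the list is later used as a checklist.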

Thousands of words are great for print dictionaries, but for a visual dictionary, that is a bit much. The distribution was also extremely skewed (as it is for virtually all languages), with many words up front having huge attestation, trailing off into a very long tail of rarely used words, and finally ending with a long line of singletons.

Attestation        Total number of words
≥1000              65
≥100 and <1000     469
≥10 and <100       1960
=1                 3229

N = 9379
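A tally like the one above can be produced from a list of per-word counts. The following is a rough sketch, not the method actually used for the post. Note that the four rows listed sum to 5,723 of the 9,379 words, so the sketch adds a "≥2 and <10" band (an assumption, not a row from the table) to make the bands cover every count.

```python
from collections import Counter

# Band labels, highest attestation first. "≥2 and <10" is an added
# assumption so that every positive count falls into some band.
BANDS = ["≥1000", "≥100 and <1000", "≥10 and <100", "≥2 and <10", "=1"]

def attestation_band(count):
    """Map a single word's attestation count to its band label."""
    if count < 1:
        raise ValueError("attestation counts must be at least 1")
    if count >= 1000:
        return BANDS[0]
    if count >= 100:
        return BANDS[1]
    if count >= 10:
        return BANDS[2]
    if count >= 2:
        return BANDS[3]
    return BANDS[4]

def tally_bands(counts):
    """Count how many words fall into each band, in BANDS order."""
    tally = Counter(attestation_band(c) for c in counts)
    return {band: tally[band] for band in BANDS}

# Hypothetical counts for nine words (illustrative only).
sample = [1500, 250, 120, 40, 12, 5, 2, 1, 1]
band_counts = tally_bands(sample)
for band, n in band_counts.items():
    print(f"{band:<16} {n}")
```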

As such, the list needed to be pruned back a bit. I decided to adopt the following two criteria:

  1. The first set of words must be nouns. (This is a visual dictionary, and nouns are easier to illustrate. All verbs, adverbs, prepositions, etc. were tossed from the list.)
  2. An individual word needs to appear at least 5 times in the corpus. (This cut off the aforementioned long tail of some 5784 sparsely attested words.)

Together, those two criteria brought the list from many thousands of words down to a “mere” 1,700. That was still a bit much for the initial dictionary, given the amount of time I have to complete it.
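The two criteria amount to a simple filter over the frequency list. Here is a minimal sketch, assuming each entry carries a part-of-speech tag; the `Lemma` record and the `"N"` tag for nouns are hypothetical names chosen for illustration, not the CAL's own tagging scheme.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lemma:
    headword: str
    pos: str    # part-of-speech tag; "N" for noun is an assumed label
    count: int  # attestations across the whole corpus

def prune(lemmas, min_count=5):
    """Keep only nouns attested at least `min_count` times."""
    return [l for l in lemmas if l.pos == "N" and l.count >= min_count]

# Placeholder entries (not real corpus data).
entries = [
    Lemma("noun_common", "N", 240),
    Lemma("noun_rare", "N", 3),     # dropped: fewer than 5 attestations
    Lemma("verb_common", "V", 900), # dropped: not a noun
    Lemma("noun_edge", "N", 5),     # kept: exactly at the threshold
]

kept = prune(entries)
print([l.headword for l in kept])
```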

Additionally, among those ~1,700 words, a large number of them were still tricky to illustrate because they were:

  • Abstract (like “knowledge” or “name” or “obligation”), or
  • Religious jargon (like “Mishnah” or “Torah” etc.), or
  • Otherwise better suited to a separate unit or set in context with its other members (numbers, family, etc.)

A single image slide could not provide sufficient context for these words, so after pulling them all out, I was left with a list of about 600 “easily illustratable” words.

This is doable!

The Next Steps:

The List

My next step from here is going to be formatting this list in a readable form for the project’s website. When I start implementing the dataset, this will serve as the “checklist” towards completion and also aid with any crowdsourcing efforts.

Each word needs to have its gloss and orthography checked against the Galilean corpus (sometimes lemma forms diverge, since most lemmas are based on Eastern Aramaic forms – I’ll put together a list of links), and be broken down into syllable and letter chunks:

(Mockup of multiple spoken hover states. Highlighting, transliteration, and sound would happen in real time depending on where the user hovers the mouse or – if on a mobile device – taps on the word.)
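One way the syllable and letter chunks might be stored, and how a hover or tap position could be mapped back to a chunk, is sketched below. Everything here is hypothetical: the class names, the clip identifiers, and the Latin-letter placeholder word, which is not a form checked against the Galilean corpus.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Chunk:
    text: str      # the letter(s) as displayed
    translit: str  # transliteration shown on hover
    audio: str     # identifier of the clip to play (hypothetical)

@dataclass
class Syllable:
    chunks: List[Chunk]

@dataclass
class Word:
    gloss: str
    syllables: List[Syllable]

def chunk_at(word: Word, offset: int) -> Optional[Tuple[int, int]]:
    """Return (syllable index, chunk index) for a character offset,
    or None if the offset falls outside the word."""
    pos = 0
    for si, syllable in enumerate(word.syllables):
        for ci, chunk in enumerate(syllable.chunks):
            end = pos + len(chunk.text)
            if pos <= offset < end:
                return si, ci
            pos = end
    return None

# Placeholder word split as "ma" + "l" / "ka" (illustrative only).
word = Word(
    gloss="(placeholder)",
    syllables=[
        Syllable([Chunk("ma", "ma", "clip_ma"), Chunk("l", "l", "clip_l")]),
        Syllable([Chunk("ka", "ka", "clip_ka")]),
    ],
)

print(chunk_at(word, 2))
```

Keeping syllables as groups of letter chunks means the same record can drive both highlight levels: hovering resolves to a chunk, and its parent syllable is available from the first index.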

Each word also needs to have its audio recorded.

Once the list is posted, I’ll be sending out a request for help finding images. The images need to be public domain, or otherwise have their copyright released in such a way that they can be used for educational purposes. When this project is done, I’m going to make the source code available for other educators so that they can build their own datasets for different languages, and I want the images to be part of that.
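The per-word tasks described in this section (gloss and orthography checks, chunking, audio, image) lend themselves to a simple checklist record. The field names below are assumptions made for illustration, not a format the project has settled on.

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    headword: str
    gloss_checked: bool = False
    orthography_checked: bool = False
    chunked: bool = False
    audio_recorded: bool = False
    image_found: bool = False

    def remaining(self):
        """Names of the tasks still open for this word."""
        return [name for name, done in vars(self).items()
                if name != "headword" and not done]

    def is_complete(self):
        return not self.remaining()

def progress(items):
    """Return (completed words, total words)."""
    return sum(item.is_complete() for item in items), len(items)

# Placeholder entries (illustrative only).
items = [
    ChecklistItem("word_1", True, True, True, True, True),
    ChecklistItem("word_2", gloss_checked=True, audio_recorded=True),
]

print(progress(items))
print(items[1].remaining())
```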

The Test Set

While the full list percolates, I’ll need to compile a small subset of the list – perhaps just a few dozen words – to be the test set. I will use it to check how the audio will work and, later, as a “dummy” set for implementing the interface.
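If the test set is drawn at random rather than hand-picked (an assumption; the post does not say which), a seeded sample keeps the subset reproducible between runs. The word names and the size of 36 below are placeholders.

```python
import random

def pick_test_set(words, size=36, seed=0):
    """Draw a reproducible random subset of `words`, sorted for readability."""
    if size > len(words):
        raise ValueError("test set cannot be larger than the word list")
    rng = random.Random(seed)  # local generator; global state is untouched
    return sorted(rng.sample(words, size))

# Placeholder list standing in for the ~600 illustratable words.
words = [f"word_{i:03d}" for i in range(600)]

test_set = pick_test_set(words)
print(len(test_set), test_set[:3])
```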

The Audio Chunks

This is going to be, perhaps, the most difficult part.

I’ll need to compile a list of all possible single letter-vowel and syllable combinations and record audio for each one, and then develop some schema to store them so that the software can make use of them.

Luckily, due to the restricted vowel inventory of Galilean, this is a much more attainable task than if it were another dialect. For letter-vowel pairs, it’s roughly 120 combinations (and since that’s doable, that’s where I’ll start). With full syllables, however, I may be looking at 2,500 possible combinations total. Ugh… First things first, though.
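To get a feel for the size of the recording job, the combinations can be enumerated and mapped to file names. This sketch rests on loud assumptions: 22 consonant placeholders (one per letter of the Aramaic alphabet), a five-vowel placeholder inventory, only open (CV) and closed (CVC) syllables, and an invented `audio/...` naming scheme. The real Galilean inventory and syllable rules would change the totals.

```python
from itertools import product

# Placeholders only: c01..c22 stand in for the 22 letters, and the
# five vowels are an assumed inventory, not a claim about Galilean.
CONSONANTS = [f"c{i:02d}" for i in range(1, 23)]
VOWELS = ["a", "e", "i", "o", "u"]

def letter_vowel_keys():
    """Every consonant + vowel pair, e.g. 'c01_a'."""
    return [f"{c}_{v}" for c, v in product(CONSONANTS, VOWELS)]

def closed_syllable_keys():
    """Every consonant + vowel + consonant syllable, e.g. 'c01_a_c02'."""
    return [f"{c1}_{v}_{c2}"
            for c1, v, c2 in product(CONSONANTS, VOWELS, CONSONANTS)]

def audio_index(keys, folder):
    """Map each key to a (hypothetical) audio file path."""
    return {key: f"audio/{folder}/{key}.wav" for key in keys}

cv_index = audio_index(letter_vowel_keys(), "cv")
cvc_index = audio_index(closed_syllable_keys(), "cvc")

print(len(cv_index), len(cvc_index), len(cv_index) + len(cvc_index))
# prints: 110 2420 2530
```

Under these placeholder inventories the totals land in the same neighborhood as the rough figures above, but that is a consequence of the assumptions chosen, not a confirmation of how those figures were reached.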

The Interface

Finally, with the test set in hand, I’ll start working on the actual code driving the visual interface, based on the initial mockups. This, I anticipate, is going to be one of the easier and more fun bits to get done, and when I do sit down to it I’m going to post another update about the design process.

User Testing

This is where everyone else comes in. Once I have a prototype up and running, I need you – yes YOU, reader on the Internet – to help me test it, break it, and reform it stronger. With every successive wave of testing, it will become a better tool.

Wish me luck. 🙂


4 thoughts on “First Steps”

  1. Steve C.,
    It sounds like you have a valuable project with the Digital, Interactive, and Topical Galilean Aramaic Dictionary. I will be interested to know how one makes the distinction between Galilean Aramaic and Babylonian Aramaic in the lexicon. I am trying to confirm or deny the theory that the Gospels, or at least the synoptic Gospels, are Galilean Aramaic while the Pauline Epistles may be Babylonian Aramaic in light of Saul’s upbringing and pre-Christian activities. What started this question was where Jesus referred to Deuteronomy 6:4-5, specifying both heart and mind, while both can be understood in one word in the Hebrew and perhaps Babylonian Aramaic.
    Steve M.

    1. Steve M.,

      As a matter of “what’s Galilean and what’s not” I’ve gone over this at some length over at .

      The difficulty of sorting through what is and isn’t Galilean on the CAL is that the CAL’s paradigm for lemmas is based on Eastern Aramaic (which is how most lexicographers do it because Eastern Aramaic languages are the most prolific, but it can cause some problems when Western Aramaic languages need to be “shoehorned in” to fit).

      Luckily, the CAL has a bunch of dialect data and citations in its listings. It even links to an early digital version of Sokoloff’s Dictionary which is presently the most comprehensive work on JPA/Galilean there is.

      While it would make a lot of sense for Paul’s Aramaic *context* to be Eastern, and there are plenty of places where one could argue it shows through in his writings, it’s fairly certain that his letters were penned in Greek. I think that Jack’s comment below has outlined the sense of where the linguistic landscape currently lies.


      1. Steve C,
        Thanks to both you and Jack for your kind responses.
        I see a clear focus on an accurate rendering of the Galilean Aramaic words of Yeshua/Jesus into English.
        The query that I am working on is: why did Yeshua/Jesus render Deuteronomy 6:5 from the Hebrew with the addition of the Aramaic word for “Mind”? In the Navy I served a tour in England, so I am aware of how a given word in English can be used differently in different places. I note that the Hebrew word translated into English as “Heart” in Deuteronomy 6:5 could also be translated as “mind” or “thought” (consider Deuteronomy 4:39, Deuteronomy 7:17, Deuteronomy 28:28, and Deuteronomy 29:18 in the Septuagint). My theory is that the corresponding Galilean Aramaic word that is translated into English as “Heart” in the Gospel accounts (Matthew 22:37, Mark 12:30, and Luke 10:27) may not as easily be used to mean “Mind”. If that were the case, that could be the reason for Yeshua/Jesus explicitly adding the Galilean Aramaic word for “Mind”, rather than a simple interpretive focus on the Greek culture. I am disinclined to suggest that Yeshua/Jesus was adding to revelation in this case because He was not contested on this point by the scholars noted in the Gospel passages. Perhaps Sokoloff’s Dictionary will aid.
        Steve M

  2. Steve, the only words in the Gospels that were in Aramaic were the words of Yeshua/Jesus. auMatthew shows indications of being Greek-speaking only, probably using a Greek translation of sayings material. Mark’s first language was Aramaic, and he likely used his own Aramaic notebook of Jesus’ sayings, translated them to Greek, and wrote the narrative in Greek. auLuke likely knew Aramaic, translated sayings material to Greek, and wrote the narrative in Greek. The author of John (John of Ephesus) used, IMO, a smaller Aramaic “proto-Gospel” and translated “Proto-John” to Greek and fleshed his larger Gospel around it. Fortunately, the Aramaic “proto-John” is easily taken from the larger Greek Gospel. My point is there should be no need to translate originally Greek narratives to Aramaic.
