Exercise 2.1 - Hyphenating Finnish (150p)

Introduction

In this exercise you will write a program that hyphenates Finnish text.

Don't worry - you don't have to understand Finnish at all ... it might even be an advantage as you don't carry any preconceptions about how things should be.

The program you are required to write will also demonstrate some ideas of inheritance in action. Note how polymorphism allows new hyphenation rules to be added with only small changes to the rest of the program code.

Learning goals and reminders

  • Simple String processing.
  • Writing code that uses inheritance (although we merely extend three classes).
    Base class - abstract class - class
    HyphenationRule ← FinnishHyphenationRule ← ConsonantRule

Points (150p altogether)

Each of the hyphenation rules is worth ~33p. (100p altogether)
Test coverage: max 50 points. Your own tests are run against your own code.


Implementation instructions

Hyphenating Finnish is fairly simple. Most words can be hyphenated based on three simple rules.

In this exercise we will add new functionality to the class RuleBasedHyphenator, which hyphenates any given text based on the currently active hyphenation rules. Altering the rule set is simple and only requires writing new classes that implement the required interface HyphenationRule. You should implement three of these classes - one for each rule.

HyphenationRule

HyphenationRule is an interface having only one method to be implemented.

next_hyphen(word, previous_hyphen_at)

For a given word, this method returns the next possible point of hyphenation for this specific rule, starting from the given position previous_hyphen_at, which typically is the previous hyphenation point found. If the rule cannot be applied from that position, the method returns the error code HyphenationRule.NOT_APPLICABLE. For example in the word "kiusaus" the consonant rule cannot be applied from position 3 as the first vowel found, "a", is followed by consonants, but the consonants are not followed by any vowels as required by the rule.

The returned hyphenation point is the index of the letter following the hyphen. For the word "kissa" these points would be at places 0, 3 and 5. /kis/sa/
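The index convention above can be illustrated with a small sketch. The function insert_hyphens is a hypothetical helper, not part of the code templates; it only shows how a list of hyphenation points such as 0, 3 and 5 maps back to a hyphenated word.

```python
def insert_hyphens(word, points):
    """Join the pieces of `word` delimited by consecutive hyphenation points.

    `points` is assumed to start at 0 and end at len(word), as in the
    "kissa" example (0, 3 and 5).
    """
    pieces = [word[start:end] for start, end in zip(points, points[1:])]
    return "-".join(pieces)


print(insert_hyphens("kissa", [0, 3, 5]))  # kis-sa
```

Each point is the index of the letter that follows the hyphen, so slicing between consecutive points yields the syllables directly.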

FinnishHyphenationRule

FinnishHyphenationRule is an abstract class which does not provide an implementation for the method next_hyphen in the HyphenationRule interface. The class does, however, provide very useful methods for its subclasses to use.

ConsonantRule, VowelRule and DiphthongRule (submit these)

In these three classes you should implement the missing next_hyphen method. The implementation will vary according to which rule you are implementing. Use methods from the superclass (FinnishHyphenationRule) wherever needed.
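The three-level hierarchy could look roughly like the sketch below. This is an illustration only: the value -1 for NOT_APPLICABLE, the vowel set, the helper first_vowel_index and the toy class NeverApplicableRule are all assumptions made for this example; the real helper methods are the ones found in the provided FinnishHyphenationRule.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    """Interface: a single method plus the shared error code."""

    NOT_APPLICABLE = -1  # assumed value; the template defines the real one

    @abstractmethod
    def next_hyphen(self, word, previous_hyphen_at):
        """Return the next hyphenation point or NOT_APPLICABLE."""


class FinnishHyphenationRule(HyphenationRule):
    """Abstract: adds helpers but leaves next_hyphen unimplemented."""

    VOWELS = "AEIOUYÄÖ"  # assumed vowel set, capital letters only

    def first_vowel_index(self, word, start):  # hypothetical helper
        """Index of the first vowel at or after `start`, or NOT_APPLICABLE."""
        for index in range(start, len(word)):
            if word[index] in self.VOWELS:
                return index
        return self.NOT_APPLICABLE


class NeverApplicableRule(FinnishHyphenationRule):
    """Toy concrete rule, only here to show where next_hyphen goes."""

    def next_hyphen(self, word, previous_hyphen_at):
        return HyphenationRule.NOT_APPLICABLE


rule = NeverApplicableRule()
print(rule.first_vowel_index("KISSA", 0))  # 1
print(rule.next_hyphen("KISSA", 0))        # -1
```

Because next_hyphen stays abstract in the middle class, only the concrete rule classes can be instantiated, and each of them inherits the helpers for free.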

RuleBasedHyphenator

RuleBasedHyphenator does the actual hyphenation. It tries to apply the hyphenation rules in the order they were added until it finds the rule that is applicable and provides the nearest hyphenation point. Each hyphenation point found is added to a list of integers.

The method next_hyphen (which you will implement in the three classes previously mentioned) will always get as its input a single word written in CAPITAL letters. This means that you don't have to worry about whitespace, case, etc.
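As a rough sketch of how such a driver loop might work, the code below asks every rule for a point and keeps the nearest one. It is not the provided RuleBasedHyphenator: the function hyphenation_points, the toy class EveryThirdLetterRule and the value -1 for NOT_APPLICABLE are assumptions made for this example, and the real class may resolve ties or word ends differently.

```python
NOT_APPLICABLE = -1  # assumed value of HyphenationRule.NOT_APPLICABLE


class EveryThirdLetterRule:
    """Toy rule (not Finnish): suggests a hyphen three letters ahead."""

    def next_hyphen(self, word, previous_hyphen_at):
        candidate = previous_hyphen_at + 3
        return candidate if candidate < len(word) else NOT_APPLICABLE


def hyphenation_points(word, rules):
    """Collect hyphenation points by always taking the nearest suggestion."""
    points = [0]
    while True:
        found = [rule.next_hyphen(word, points[-1]) for rule in rules]
        # Only accept points strictly ahead, so the loop always terminates.
        usable = [p for p in found if p != NOT_APPLICABLE and p > points[-1]]
        if not usable:
            break
        points.append(min(usable))
    if points[-1] != len(word):
        points.append(len(word))  # the end of the word closes the last syllable
    return points


print(hyphenation_points("KISSA", [EveryThirdLetterRule()]))  # [0, 3, 5]
```

Adding a new rule only means passing one more object with a next_hyphen method; the loop itself does not change, which is the point of the polymorphic design.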

Hyphenation Rules

Hyphenating Finnish mainly follows the following rules:

Consonant rule: If a vowel in a syllable is followed by one or more consonants, and these are in turn followed by one or more vowels, a syllable boundary (hyphenation point) is placed right before the last of those consonants.

Examples:

lef-fas-sa ki-vaa kah-del-le:
tra-giik-kaa se-kä hork-ka-ti-lo-ja
greip-pi
on

Implement this rule in the class ConsonantRule
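One possible reading of the consonant rule is sketched below as a standalone function, not as the ConsonantRule class itself. The function name, the vowel set and the value -1 for NOT_APPLICABLE are assumptions for this example. Skipping the whole run of vowels before looking for consonants is an interpretation based on examples such as tra-giik-kaa.

```python
VOWELS = "AEIOUYÄÖ"   # assumed vowel set, capital letters only
NOT_APPLICABLE = -1   # assumed value of HyphenationRule.NOT_APPLICABLE


def consonant_rule_next_hyphen(word, previous_hyphen_at):
    """Sketch of the consonant rule for a word in CAPITAL letters."""
    i = previous_hyphen_at
    n = len(word)
    # 1. Find the first vowel of the syllable.
    while i < n and word[i] not in VOWELS:
        i += 1
    # 2. Skip the run of vowels that starts there.
    while i < n and word[i] in VOWELS:
        i += 1
    consonants_start = i
    # 3. Skip the consonants that follow the vowels.
    while i < n and word[i] not in VOWELS:
        i += 1
    # No consonants at all, or no vowel after them: the rule gives up.
    if i == consonants_start or i == n:
        return NOT_APPLICABLE
    # The hyphen goes right before the last consonant.
    return i - 1


print(consonant_rule_next_hyphen("LEFFASSA", 0))  # 3
print(consonant_rule_next_hyphen("LEFFASSA", 3))  # 6
print(consonant_rule_next_hyphen("KUN", 0))       # -1
```

Applied repeatedly from 0, 3 and 6, the sketch reproduces lef-fas-sa and then gives up, since the last "a" is not followed by any consonants.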

All the hyphens in the examples above come from the consonant rule, applied either at the beginning of the word or at an index following another hyphen. Note that for the word "kun", for example (requesting the next hyphen from index 0), we can find the first vowel 'u' and the consonant after it, but as there are no vowels after the consonant the rule cannot be applied and the result is HyphenationRule.NOT_APPLICABLE.

Vowel rule: If the first vowel(* in a syllable is followed by another vowel, they have a boundary between them, unless...
a) The two vowels are the same (a long vowel).
b) The latter of the two vowels is "i" (a diphthong ending with i).
c) The vowel pair is one of the following: au, eu, ey, ie, iu, ou, uo, yö, äy or öy (the rest of the diphthongs).

Examples:

"-" the vowel rule has to suggest this hyphen to be placed (starting position after the previous hyphen or at the beginning of the word)
"+" the vowel rule should return NOT_APPLICABLE if it is applied to the previous hyphenation point. (that is - the rule can never find the positions marked by +, but instead the rule would give up.)

lu-en+to Aa+si-an kää+pi-ö+puo+lu-eis+ta

For example the word "Aasian" can be used to demonstrate how the rule works. If we request the next hyphen from index 0, the first and second vowels are the same → NOT_APPLICABLE.
Starting from index 2 (letter s) the first vowel and the one following it (i, a) are not the same, do not form a diphthong and the latter of the two is not i, so a hyphenation point is found at index 4.

Implement this rule in the class VowelRule
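The vowel rule might be sketched as follows, again as a standalone function and not as the VowelRule class itself. The function name, the vowel set, the constant OTHER_DIPHTHONGS and the value -1 for NOT_APPLICABLE are assumptions for this example.

```python
VOWELS = "AEIOUYÄÖ"   # assumed vowel set, capital letters only
NOT_APPLICABLE = -1   # assumed value of HyphenationRule.NOT_APPLICABLE
# Exception c): the diphthongs that do not end with "i".
OTHER_DIPHTHONGS = {"AU", "EU", "EY", "IE", "IU", "OU", "UO", "YÖ", "ÄY", "ÖY"}


def vowel_rule_next_hyphen(word, previous_hyphen_at):
    """Sketch of the vowel rule for a word in CAPITAL letters."""
    first = previous_hyphen_at
    # Find the first vowel of the syllable.
    while first < len(word) and word[first] not in VOWELS:
        first += 1
    second = first + 1
    # The first vowel must be directly followed by another vowel.
    if second >= len(word) or word[second] not in VOWELS:
        return NOT_APPLICABLE
    pair = word[first] + word[second]
    if pair[0] == pair[1]:        # a) long vowel
        return NOT_APPLICABLE
    if pair[1] == "I":            # b) diphthong ending with i
        return NOT_APPLICABLE
    if pair in OTHER_DIPHTHONGS:  # c) the rest of the diphthongs
        return NOT_APPLICABLE
    return second                 # the boundary falls between the two vowels


print(vowel_rule_next_hyphen("AASIAN", 0))  # -1
print(vowel_rule_next_hyphen("AASIAN", 2))  # 4
```

Only the first vowel pair after the starting index is examined; if it fails any of the three exceptions, the function gives up and does not search further into the word.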

*) NOTE! Do not try to look for a new syllable if the rule failed with the first one found. All the rules should be applied with the given syllable (given starting index) only.

Diphthong rule: If the first two vowels in the syllable form a diphthong or a long vowel, and this is followed by one more vowel, there is a hyphenation point after the diphthong/long vowel.

Examples:

"-" the diphthong rule should suggest this hyphen to be placed.
"+" the diphthong rule should not suggest this hyphen but return NOT_APPLICABLE when applied to the preceding syllable.

raa-is+tu+nut maa-il+ma, liu-ot+ti+met lau-an+tai+na tau-ot+ta leu-an al+la

Implement this rule in the class DiphthongRule
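A minimal sketch of the diphthong rule is given below, as a standalone function and not as the DiphthongRule class itself. The function name, the vowel set, the constant OTHER_DIPHTHONGS and the value -1 for NOT_APPLICABLE are assumptions for this example. Treating vowel pairs that end with "i" as diphthongs follows exception b) of the vowel rule.

```python
VOWELS = "AEIOUYÄÖ"   # assumed vowel set, capital letters only
NOT_APPLICABLE = -1   # assumed value of HyphenationRule.NOT_APPLICABLE
OTHER_DIPHTHONGS = {"AU", "EU", "EY", "IE", "IU", "OU", "UO", "YÖ", "ÄY", "ÖY"}


def is_vowel_at(word, index):
    """True if `index` is inside the word and holds a vowel."""
    return index < len(word) and word[index] in VOWELS


def diphthong_rule_next_hyphen(word, previous_hyphen_at):
    """Sketch of the diphthong rule for a word in CAPITAL letters."""
    first = previous_hyphen_at
    # Find the first vowel of the syllable.
    while first < len(word) and word[first] not in VOWELS:
        first += 1
    # The syllable must contain three vowels in a row.
    if not (is_vowel_at(word, first + 1) and is_vowel_at(word, first + 2)):
        return NOT_APPLICABLE
    pair = word[first:first + 2]
    long_vowel = pair[0] == pair[1]
    diphthong = pair[1] == "I" or pair in OTHER_DIPHTHONGS
    if long_vowel or diphthong:
        return first + 2  # the hyphen goes right after the pair
    return NOT_APPLICABLE


print(diphthong_rule_next_hyphen("RAAISTUNUT", 0))  # 3
print(diphthong_rule_next_hyphen("RAAISTUNUT", 3))  # -1
```

In "RAAISTUNUT" the long vowel "AA" is followed by "I", which gives raa-is; from index 3 the "I" is followed by a consonant, so the sketch gives up.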

*) NOTE! Do not try to look for a new syllable if the rule failed with the first one found. All the rules should be applied with the given syllable (given starting index) only.

Exceptions: There are exceptions, which you just have to know or guess. (not required here)

The exceptions are typically loan words from other languages or compound words. For example the word "demokratia" would have a boundary (by the consonant rule) in the wrong position (demok - ratia). Using the same rule on the compound word "kaivosaukko" would create a boundary between o and s (kaivo - saukko), which is probably not the idea. (The meaning changes, because it becomes a new compound word.)


Hints

HINT! In the class FinnishHyphenationRule there is a group of helper methods that make solving this exercise a lot easier.

HINT! Notice that all the rules require you to find the first vowel. That's a good starting point...

HINT! Note that you don't have to know Finnish to find test words. Words such as "zxzxeuook" or "eäio" can be used to test the methods as well.

Code Templates

This exercise contains so many classes that it is easier to download them as one zipped Eclipse project.

  1. hyphenation.zip
    A zipped archive containing everything you need. You can import this as an Eclipse project by selecting
    File->Import->Existing projects into workspace->Archive file
    Browse for this package and press Finish.

    If you want, you can unzip it yourself or download the files separately (below).

  2. Files in the package described above.

Instructions

Submit the source code for the classes you implemented and for the test class Test in test.py.

You can assume that all the classes that you were told not to submit are present.