In this exercise you will write a program that hyphenates Finnish text.
Don't worry: you don't have to understand Finnish at all. It might even be an advantage, as you won't carry any preconceptions about how things should be.
The program you are required to write will also demonstrate some ideas of inheritance in action. Note how polymorphism allows new hyphenation rules to be added with only small changes to the rest of the program code.
Class hierarchy (base class ← abstract class ← concrete class):
HyphenationRule ← FinnishHyphenationRule ← ConsonantRule (likewise VowelRule and DiphthongRule)
Each of the three hyphenation rules is worth about 33 points (100 points altogether).
Test coverage is worth at most 50 points. It is measured by running your own tests against your own code.
Hyphenating Finnish is fairly simple. Most words can be hyphenated based on three simple rules.
In this exercise we will add new functionality to the class RuleBasedHyphenator, which hyphenates any given text based on the currently active hyphenation rules.
Altering the rule set is simple and only requires writing new classes that implement the required interface HyphenationRule. You should implement three of these classes, one for each rule.
HyphenationRule is an interface with only one method to implement: next_hyphen(word, previous_hyphen_at).
For a given word, this method returns the next possible hyphenation point for this specific rule, starting from the given position previous_hyphen_at, which is typically the previous hyphenation point found. If the rule cannot be applied from that position, the method returns the error code HyphenationRule.NOT_APPLICABLE. For example, in the word "kiusaus" the consonant rule cannot be applied from position 3: the first vowel found, "a" (together with the "u" after it), is followed by the consonant "s", but that consonant is not followed by any vowel as the rule requires.
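A minimal sketch of what this interface might look like in Python, assuming it is written as an abstract base class and that NOT_APPLICABLE is the sentinel value -1. Both are assumptions; use the definitions shipped with the exercise. StubRule is a hypothetical placeholder that only shows the shape a rule class takes and how a caller checks for NOT_APPLICABLE.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    """Sketch of the rule interface; the exercise package ships the real one."""

    # Assumed sentinel value; the real constant may differ.
    NOT_APPLICABLE = -1

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        """Return the index where the next syllable of `word` starts,
        searching from `previous_hyphen_at`, or NOT_APPLICABLE."""


class StubRule(HyphenationRule):
    """Hypothetical placeholder: never applicable, only shows the contract."""

    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        return HyphenationRule.NOT_APPLICABLE


result = StubRule().next_hyphen("KIUSAUS", 3)
if result == HyphenationRule.NOT_APPLICABLE:
    print("rule not applicable from index 3")
```

Because next_hyphen is abstract, HyphenationRule itself cannot be instantiated; only subclasses that implement the method can.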
FinnishHyphenationRule is an abstract class that does not provide an implementation for the method next_hyphen of the HyphenationRule interface. The class does, however, provide very useful methods for its subclasses to use.
In these three classes you should implement the missing next_hyphen method. The implementation will vary according to which rule you are implementing. Use methods from the superclass (FinnishHyphenationRule) wherever needed.
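As a rough illustration, an abstract helper class along these lines keeps next_hyphen abstract while sharing Finnish-specific helpers with its subclasses. The helper names is_vowel and first_vowel_from, the vowel set and the -1 sentinel are assumptions made for this sketch; the real FinnishHyphenationRule provides its own methods, so check their names before relying on them. VowelProbe is a hypothetical throwaway subclass.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    NOT_APPLICABLE = -1  # assumed sentinel value

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int: ...


class FinnishHyphenationRule(HyphenationRule):
    """Still abstract: next_hyphen is inherited unimplemented.

    The helper names are hypothetical; the exercise package provides its own.
    """

    VOWELS = frozenset("AEIOUYÄÖ")

    def is_vowel(self, ch: str) -> bool:
        return ch in self.VOWELS

    def first_vowel_from(self, word: str, start: int) -> int:
        """Index of the first vowel at or after `start`, or -1 if there is none."""
        for i in range(start, len(word)):
            if self.is_vowel(word[i]):
                return i
        return -1


class VowelProbe(FinnishHyphenationRule):
    """Throwaway subclass used only to try out the inherited helpers."""

    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        return self.NOT_APPLICABLE


probe = VowelProbe()
print(probe.first_vowel_from("KIUSAUS", 3))  # 4 (the "A")
```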
RuleBasedHyphenator does the actual hyphenation. It tries to apply the hyphenation rules in the order they were added until it finds the rule that is applicable and provides the nearest hyphenation point. Each hyphenation point found is added to a list of integers.
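One reading of this loop, sketched below: at each step every rule is asked for its next point from the current position, the nearest applicable point wins (ties go to the rule added first), and the search continues from that point. HyphenatorSketch, add_rule, hyphenation_points and FixedPointsRule are hypothetical names, not the real RuleBasedHyphenator API; FixedPointsRule is a stub that returns preset points so the loop can be shown without a real rule.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    NOT_APPLICABLE = -1  # assumed sentinel value

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int: ...


class FixedPointsRule(HyphenationRule):
    """Hypothetical stub rule: returns the first preset point after the start."""

    def __init__(self, points):
        self.points = sorted(points)

    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        for p in self.points:
            if p > previous_hyphen_at:
                return p
        return self.NOT_APPLICABLE


class HyphenatorSketch:
    """Hypothetical stand-in for the core loop of RuleBasedHyphenator."""

    def __init__(self):
        self.rules = []

    def add_rule(self, rule: HyphenationRule) -> None:
        self.rules.append(rule)

    def hyphenation_points(self, word: str) -> list:
        points, pos = [], 0
        while True:
            best = HyphenationRule.NOT_APPLICABLE
            for rule in self.rules:  # tried in the order they were added
                candidate = rule.next_hyphen(word, pos)
                # Skip inapplicable rules and points that would not move forward.
                if candidate == HyphenationRule.NOT_APPLICABLE or candidate <= pos:
                    continue
                # Keep the nearest point; on a tie the earlier rule wins.
                if best == HyphenationRule.NOT_APPLICABLE or candidate < best:
                    best = candidate
            if best == HyphenationRule.NOT_APPLICABLE:
                return points
            points.append(best)
            pos = best

    def hyphenate(self, word: str) -> str:
        pieces, start = [], 0
        for p in self.hyphenation_points(word):
            pieces.append(word[start:p])
            start = p
        pieces.append(word[start:])
        return "-".join(pieces)


h = HyphenatorSketch()
h.add_rule(FixedPointsRule([6]))
h.add_rule(FixedPointsRule([3]))
print(h.hyphenation_points("LEFFASSA"))  # [3, 6]
print(h.hyphenate("LEFFASSA"))           # LEF-FAS-SA
```

Choosing the nearest point matters once real rules compete. For "LUENTO" from index 0, a consonant rule that skips the vowel group "UE" would suggest index 4, while the vowel rule suggests index 2; the expected lu-en+to needs the nearer one.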
The method next_hyphen (which you will implement in the three classes mentioned above) always gets as its input a single word written in CAPITAL letters. This means that you don't have to worry about whitespace, case, etc.
Hyphenating Finnish mainly follows these rules:
Consonant rule: If a vowel in a syllable is followed by one or more consonants, and these are in turn followed by a vowel, a syllable boundary (hyphenation point) is right before the last of those consonants.
Examples:
lef-fas-sa ki-vaa kah-del-le: tra-giik-kaa se-kä hork-ka-ti-lo-ja greip-pi on
Implement this rule in the class ConsonantRule
All the hyphens in the example above come from the consonant rule when it is applied to the beginning of the word or to any index following another hyphen. Note that, for example, for the word "kun" (requesting the next hyphen from index 0) we can find the first vowel "u" and the consonant after it, but as there is no vowel after the consonant, the rule cannot be applied and the result is HyphenationRule.NOT_APPLICABLE.
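A possible ConsonantRule, sketched under the same assumptions as before (hypothetical helpers is_vowel and first_vowel_from, sentinel -1). It finds the first vowel from the starting index, skips the rest of that vowel group, walks over the consonant cluster and, if a vowel follows, returns the index of the last consonant. Skipping the vowel group is an interpretation; it is what makes "greippi" come out as greip-pi. hyphenate_with is a hypothetical helper for trying a single rule on its own.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    NOT_APPLICABLE = -1  # assumed sentinel value

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int: ...


class FinnishHyphenationRule(HyphenationRule):
    # Hypothetical helpers standing in for the ones the exercise provides.
    VOWELS = frozenset("AEIOUYÄÖ")

    def is_vowel(self, ch: str) -> bool:
        return ch in self.VOWELS

    def first_vowel_from(self, word: str, start: int) -> int:
        for i in range(start, len(word)):
            if self.is_vowel(word[i]):
                return i
        return -1


class ConsonantRule(FinnishHyphenationRule):
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        i = self.first_vowel_from(word, previous_hyphen_at)
        if i < 0:
            return self.NOT_APPLICABLE
        # Skip the rest of the vowel group (e.g. "EI" in "GREIPPI").
        while i < len(word) and self.is_vowel(word[i]):
            i += 1
        # Walk over the consonant cluster that follows the vowels.
        j = i
        while j < len(word) and not self.is_vowel(word[j]):
            j += 1
        # Need at least one consonant, and a vowel right after the cluster.
        if j == i or j == len(word):
            return self.NOT_APPLICABLE
        return j - 1  # the boundary goes right before the last consonant


def hyphenate_with(rule: HyphenationRule, word: str) -> list:
    """Hypothetical helper: apply one rule repeatedly from the start of the word."""
    points, pos = [], 0
    while (nxt := rule.next_hyphen(word, pos)) != rule.NOT_APPLICABLE:
        points.append(nxt)
        pos = nxt
    return points


print(hyphenate_with(ConsonantRule(), "HORKKATILOJA"))  # [4, 6, 8, 10]
print(ConsonantRule().next_hyphen("KUN", 0))            # -1
```

The same sketch also returns NOT_APPLICABLE for "KIUSAUS" from index 3, matching the example given for the interface.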
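Vowel rule: If the first vowel of a syllable and the vowel following it are not the same, do not form a diphthong, and the latter of the two is not i, a syllable boundary is between them.
Examples ("-" marks boundaries found by this rule, "+" boundaries found by the other rules):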
lu-en+to Aa+si-an kää+pi-ö+puo+lu-eis+ta
For example, the word "Aasian" can be used to demonstrate how the rule works. If we request the next hyphen from index 0, the first and second vowels are the same ("aa"), so the result is NOT_APPLICABLE.
Starting from index 2 (the letter s), the first vowel and the one following it ("i" and "a") are not the same, do not form a diphthong, and the latter of the two is not i, so a hyphenation point is found at index 4.
Implement this rule in the class VowelRule
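A VowelRule sketch along the same lines. The diphthong list is an assumption (the standard Finnish diphthongs, treated as diphthongs in every position), as are the helper names and the -1 sentinel; the exercise's FinnishHyphenationRule may define diphthongs differently, so prefer its own check. The rule only looks at the first vowel of the given syllable and the letter after it, and never goes looking for a later syllable.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    NOT_APPLICABLE = -1  # assumed sentinel value

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int: ...


class FinnishHyphenationRule(HyphenationRule):
    # Hypothetical helpers; the exercise's class provides its own versions.
    VOWELS = frozenset("AEIOUYÄÖ")
    # Assumed list of Finnish diphthongs, treated as diphthongs everywhere.
    DIPHTHONGS = frozenset({
        "AI", "EI", "OI", "UI", "YI", "ÄI", "ÖI",
        "AU", "EU", "IU", "OU", "EY", "IY", "ÄY", "ÖY",
        "IE", "UO", "YÖ",
    })

    def is_vowel(self, ch: str) -> bool:
        return ch in self.VOWELS

    def is_diphthong(self, pair: str) -> bool:
        return pair in self.DIPHTHONGS

    def first_vowel_from(self, word: str, start: int) -> int:
        for i in range(start, len(word)):
            if self.is_vowel(word[i]):
                return i
        return -1


class VowelRule(FinnishHyphenationRule):
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        i = self.first_vowel_from(word, previous_hyphen_at)
        # Only the first vowel of this syllable is considered; it must be
        # directly followed by another vowel.
        if i < 0 or i + 1 >= len(word) or not self.is_vowel(word[i + 1]):
            return self.NOT_APPLICABLE
        first, second = word[i], word[i + 1]
        if first == second or self.is_diphthong(first + second) or second == "I":
            return self.NOT_APPLICABLE
        return i + 1  # the boundary falls between the two vowels


rule = VowelRule()
print(rule.next_hyphen("AASIAN", 0))  # -1 (long vowel "AA")
print(rule.next_hyphen("AASIAN", 2))  # 4
```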
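NOTE! Do not try to look for a new syllable if the rule fails with the first one found. All the rules should be applied to the given syllable (the given starting index) only.
Diphthong rule: If a long vowel or a diphthong is followed by a vowel, a syllable boundary is between them.
Examples: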
raa-is+tu+nut maa-il+ma, liu-ot+ti+met lau-an+tai+na tau-ot+ta leu-an al+la
Implement this rule in the class DiphthongRule
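The diphthong rule can be sketched the same way: find the first vowel of the syllable, check that it starts a long vowel or a diphthong, and check that another vowel follows that pair. As in the other sketches, the helper names, the diphthong list and the -1 sentinel are assumptions, not the exercise's actual definitions.

```python
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    NOT_APPLICABLE = -1  # assumed sentinel value

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int: ...


class FinnishHyphenationRule(HyphenationRule):
    # Hypothetical helpers; the exercise's class provides its own versions.
    VOWELS = frozenset("AEIOUYÄÖ")
    # Assumed list of Finnish diphthongs, treated as diphthongs everywhere.
    DIPHTHONGS = frozenset({
        "AI", "EI", "OI", "UI", "YI", "ÄI", "ÖI",
        "AU", "EU", "IU", "OU", "EY", "IY", "ÄY", "ÖY",
        "IE", "UO", "YÖ",
    })

    def is_vowel(self, ch: str) -> bool:
        return ch in self.VOWELS

    def is_diphthong(self, pair: str) -> bool:
        return pair in self.DIPHTHONGS

    def first_vowel_from(self, word: str, start: int) -> int:
        for i in range(start, len(word)):
            if self.is_vowel(word[i]):
                return i
        return -1


class DiphthongRule(FinnishHyphenationRule):
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        i = self.first_vowel_from(word, previous_hyphen_at)
        # Need a vowel pair starting at the first vowel plus one more letter.
        if i < 0 or i + 2 >= len(word):
            return self.NOT_APPLICABLE
        pair = word[i:i + 2]
        long_vowel = pair[0] == pair[1]
        if not (long_vowel or self.is_diphthong(pair)):
            return self.NOT_APPLICABLE
        # The long vowel or diphthong must be followed by yet another vowel.
        if not self.is_vowel(word[i + 2]):
            return self.NOT_APPLICABLE
        return i + 2  # the boundary falls right after the vowel pair


rule = DiphthongRule()
words = ["RAAISTUNUT", "MAAILMA", "LIUOTTIMET", "LAUANTAINA", "TAUOTTA", "LEUAN"]
print([rule.next_hyphen(w, 0) for w in words])  # [3, 3, 3, 3, 3, 3]
```

From index 5 of "LAUANTAINA" this sketch returns NOT_APPLICABLE, because the diphthong "AI" is followed by a consonant; that boundary (tai+na) is left to the consonant rule.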
NOTE! Do not try to look for a new syllable if the rule fails with the first one found. All the rules should be applied to the given syllable (the given starting index) only.
There are exceptions to these rules, typically loan words from other languages or compound words. For example, the consonant rule would put a boundary in the wrong position in the word "demokratia" (demok-ratia). Applying the same rule to the compound word "kaivosaukko" would create a boundary between o and s (kaivo-saukko), which is probably not the idea: the meaning changes, because it becomes a different compound word.
HINT! In the class FinnishHyphenationRule there is a group of helper methods that make solving this exercise a lot easier.
HINT! Notice that all the rules require you to find the first vowel. That's a good starting point...
HINT! Note that you don't have to know Finnish to find test words. Words such as "zxzxeuook" or "eäio" can be used to test the methods as well.
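Following the hint, a test class can use made-up words to hit each branch of a rule. This unittest sketch tests a compact copy of the vowel rule (inheriting directly from HyphenationRule so the file runs on its own, with the same assumed diphthong list and -1 sentinel); in your own test.py you would import your real classes instead of defining them.

```python
import unittest
from abc import ABC, abstractmethod


class HyphenationRule(ABC):
    NOT_APPLICABLE = -1  # assumed sentinel value

    @abstractmethod
    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int: ...


class VowelRule(HyphenationRule):
    """Compact stand-in for your VowelRule so this file is self-contained."""

    VOWELS = frozenset("AEIOUYÄÖ")
    DIPHTHONGS = frozenset({"AI", "EI", "OI", "UI", "YI", "ÄI", "ÖI", "AU", "EU",
                            "IU", "OU", "EY", "IY", "ÄY", "ÖY", "IE", "UO", "YÖ"})

    def next_hyphen(self, word: str, previous_hyphen_at: int) -> int:
        # Index of the first vowel at or after the start, or -1.
        i = next((k for k in range(previous_hyphen_at, len(word))
                  if word[k] in self.VOWELS), -1)
        if i < 0 or i + 1 >= len(word) or word[i + 1] not in self.VOWELS:
            return self.NOT_APPLICABLE
        a, b = word[i], word[i + 1]
        if a == b or a + b in self.DIPHTHONGS or b == "I":
            return self.NOT_APPLICABLE
        return i + 1


class Test(unittest.TestCase):
    def setUp(self):
        self.rule = VowelRule()

    def test_nonsense_word_splits_between_different_vowels(self):
        self.assertEqual(self.rule.next_hyphen("EÄIO", 0), 1)

    def test_second_vowel_i_blocks_the_rule(self):
        self.assertEqual(self.rule.next_hyphen("EÄIO", 1), HyphenationRule.NOT_APPLICABLE)

    def test_diphthong_blocks_the_rule(self):
        self.assertEqual(self.rule.next_hyphen("ZXZXEUOOK", 0), HyphenationRule.NOT_APPLICABLE)

    def test_long_vowel_blocks_the_rule(self):
        self.assertEqual(self.rule.next_hyphen("AASIAN", 0), HyphenationRule.NOT_APPLICABLE)

    def test_rule_applies_from_later_start(self):
        self.assertEqual(self.rule.next_hyphen("AASIAN", 2), 4)
```

Saved as test.py, the tests can be run with python -m unittest test.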
This exercise contains so many classes that it is easier to download them as one zipped Eclipse project.
To import it, choose File->Import->Existing projects into workspace->Archive file, then browse for this package and press Finish.
RuleBasedHyphenator: a rule-based hyphenation engine (do not submit this)
HyphenationRule: a base class describing hyphenation rules (do not submit this)
FinnishHyphenationRule: an abstract base class for the classes you will implement; provides some methods for handling Finnish (do not submit this)
ConsonantRule: the consonant rule (SUBMIT THIS)
VowelRule: the vowel rule (SUBMIT THIS)
DiphthongRule: the diphthong rule (SUBMIT THIS)
Test: the test class for this exercise (SUBMIT THIS)
Submit the source code for the classes you implemented and for the test class Test in test.py.
You can assume that all the classes that you were told not to submit are present.