TT Source Model - Training and Test Set of Antonym Collection
I. Introduction
A collection of antonym pairs (aPairs) from various sources on the internet was established to find the characteristics and patterns of antonyms. Some sources contain duplicate aPairs. Two aPairs with the same antonyms in either order, such as [absence|presence] and [presence|absence], are considered the same aPair and counted as 1 unique aPair. In addition, antonyms in aPairs are lowercased and restricted to single words. Multiword aPairs, such as [already|not yet] or [none of|a lot of], are removed from the collection. The source websites and the number of unique aPairs collected from each are shown in Table 1.
Table 1. Sources of the training and test set
ID | Source | No. of Unique aPairs |
---|---|---|
1 | Sherwood School | 449 |
2 | Proof Reading Services | 418 |
3 | Enchanted Learning | 324 |
4 | 7ESL | 339 |
5 | English Grammar Here | 321 |
6 | Synonyms Antonyms | 301 |
7 | SLP Lesson Plans | 251 |
8 | ESL Forums | 198 |
9 | My English Tutors | 170 |
10 | Love To Know | 167 |
11 | Your Dictionary | 159 |
12 | Classic Thesaurus | 100 |
13 | Power Thesaurus | 100 |
14 | Smart Words | 9 |
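The normalization rules in the introduction (lowercase, single words only, [x|y] and [y|x] counted once) can be sketched as follows. This is a minimal illustration, not the TtSet code: the class `APairNormalizer`, its methods, and the choice of an alphabetically ordered "a|b" key are assumptions made for the example. A multiword antonym is detected here simply as one containing whitespace.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of aPair normalization; not the actual TtSet implementation.
public class APairNormalizer {
    // Returns a canonical "a|b" key for an aPair, or null if the aPair is dropped
    // (an antonym is empty or contains whitespace, i.e. is multiword).
    static String normalize(String ant1, String ant2) {
        String a = ant1.trim().toLowerCase(Locale.ROOT);
        String b = ant2.trim().toLowerCase(Locale.ROOT);
        if (a.isEmpty() || b.isEmpty() || a.matches(".*\\s.*") || b.matches(".*\\s.*")) {
            return null;
        }
        // Alphabetical order makes [x|y] and [y|x] produce the same key.
        return (a.compareTo(b) <= 0) ? a + "|" + b : b + "|" + a;
    }

    // Collects unique aPairs, keeping first-seen order.
    static Set<String> uniquePairs(String[][] rawPairs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String[] p : rawPairs) {
            String key = normalize(p[0], p[1]);
            if (key != null) {
                unique.add(key);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        String[][] raw = {
            {"Absence", "presence"},
            {"presence", "absence"},
            {"already", "not yet"},
            {"none of", "a lot of"},
            {"hot", "cold"}
        };
        Set<String> unique = uniquePairs(raw);
        System.out.println(unique.size() + " " + unique); // 2 [absence|presence, cold|hot]
    }
}
```

Storing the ordered key in a set is what turns the per-source raw counts into unique aPair counts; the multiword examples from the introduction are discarded before they reach the set.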
II. Design
A program is developed to:
Please see the design documents for more details.
III. Implementation
The Java source code is located in the TtSet directory:
Algorithm:
The source of each collected aPair is identified by a program (AntObj.java) as follows:
The algorithm for identifying an SD (suffixD) aPair is described as follows:
The algorithm for identifying a PD (prefixD) aPair is described as follows:
Co-occurrences in a corpus: these are aPairs retrieved by co-occurrence patterns from a corpus. Our first attempt uses terms that co-occur in MEDLINE.
Semantic opposites in corpora: these are aPairs retrieved from a semantic network. An aPair that does not belong to any of the above sources is assigned SN (semantic network). Patterns for this source are yet to be developed.
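The source assignment above can be sketched as a single classifier that tries SD, PD and CC in turn and falls back to SN. This is a hedged illustration only, not AntObj.java: the class `APairSourceClassifier`, the suffix pair ful/less, the negation prefixes (un, in, im, il, ir, dis, non), the SD, PD, CC check order, and the precomputed `coOccurringPairs` set standing in for the MEDLINE co-occurrence step are all assumptions.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of aPair source assignment; not the AntObj.java algorithm.
public class APairSourceClassifier {
    enum Source { SD, PD, CC, SN }

    // Illustrative opposite-suffix pairs (assumed list).
    private static final String[][] SUFFIX_PAIRS = { {"ful", "less"} };
    // Illustrative negation prefixes (assumed list).
    private static final List<String> NEG_PREFIXES =
        List.of("un", "in", "im", "il", "ir", "dis", "non");

    // True if x ends with sx, y ends with sy, and the remaining stems are equal and non-empty.
    private static boolean sameStem(String x, String y, String sx, String sy) {
        if (!x.endsWith(sx) || !y.endsWith(sy)) {
            return false;
        }
        String stemX = x.substring(0, x.length() - sx.length());
        String stemY = y.substring(0, y.length() - sy.length());
        return !stemX.isEmpty() && stemX.equals(stemY);
    }

    // SD: both antonyms share a stem and carry opposite suffixes, e.g. [careful|careless].
    static boolean isSuffixD(String a, String b) {
        for (String[] sp : SUFFIX_PAIRS) {
            if (sameStem(a, b, sp[0], sp[1]) || sameStem(b, a, sp[0], sp[1])) {
                return true;
            }
        }
        return false;
    }

    // PD: one antonym is the other with a negation prefix added, e.g. [happy|unhappy].
    static boolean isPrefixD(String a, String b) {
        for (String p : NEG_PREFIXES) {
            if (a.equals(p + b) || b.equals(p + a)) {
                return true;
            }
        }
        return false;
    }

    // Tries SD, PD, then CC (lookup in a precomputed set of "a|b" keys); otherwise SN.
    static Source classify(String a, String b, Set<String> coOccurringPairs) {
        if (isSuffixD(a, b)) {
            return Source.SD;
        }
        if (isPrefixD(a, b)) {
            return Source.PD;
        }
        String key = a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
        if (coOccurringPairs.contains(key)) {
            return Source.CC;
        }
        return Source.SN;
    }

    public static void main(String[] args) {
        Set<String> coOccurring = Set.of("cold|hot"); // hypothetical corpus result
        System.out.println(classify("careful", "careless", coOccurring)); // SD
        System.out.println(classify("happy", "unhappy", coOccurring));    // PD
        System.out.println(classify("hot", "cold", coOccurring));         // CC
        System.out.println(classify("absence", "presence", coOccurring)); // SN
    }
}
```

Keeping SN as the final fallback mirrors the rule that an aPair not belonging to any other source is assigned SN; the suffix and prefix lists would need to be replaced with whatever AntObj.java actually uses.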