TT Source Model - Training and Test Set of Antonym Collection
I. Introduction
A collection of antonym pairs (aPairs) from various sources on the internet was established to study the characteristics and patterns of antonyms. Some sources contain duplicate aPairs. aPairs that differ only in word order, such as [absence|presence] and [presence|absence], are considered the same aPair and counted as one unique aPair. In addition, all antonyms in aPairs are lowercased single words; multiword aPairs, such as [already|not yet] or [none of|a lot of], are removed from the collection. The source websites of this training and test set and the number of unique aPairs from each are shown in Table 1.
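The normalization rules above (lowercasing, dropping multiword aPairs, and treating [absence|presence] and [presence|absence] as one aPair) can be sketched in Java as follows. This is a minimal sketch, not the TtSet code: it assumes raw pairs arrive as `word1|word2` strings, and the class `APairNormalizer` and its methods are hypothetical names. Order-insensitive duplicates are merged by putting the two words in alphabetical order before adding them to a set.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: normalizes raw antonym pairs and collects unique aPairs.
 * Assumption: each raw pair is a "word1|word2" string.
 */
final class APairNormalizer {

    private APairNormalizer() {}

    /**
     * Returns the canonical key of a pair, or null if the pair is dropped.
     * Both words are lowercased, pairs with a multiword term are removed,
     * and the words are sorted so [presence|absence] becomes "absence|presence".
     */
    static String canonicalKey(String rawPair) {
        String[] parts = rawPair.split("\\|", -1);
        if (parts.length != 2) {
            return null; // malformed input: not exactly two terms
        }
        String a = parts[0].trim().toLowerCase(Locale.ROOT);
        String b = parts[1].trim().toLowerCase(Locale.ROOT);
        // drop empty terms and multiword terms such as "not yet"
        if (a.isEmpty() || b.isEmpty() || a.contains(" ") || b.contains(" ")) {
            return null;
        }
        return (a.compareTo(b) <= 0) ? a + "|" + b : b + "|" + a;
    }

    /** Collects the unique canonical aPairs of one source, in first-seen order. */
    static Set<String> uniquePairs(List<String> rawPairs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String raw : rawPairs) {
            String key = canonicalKey(raw);
            if (key != null) {
                unique.add(key);
            }
        }
        return unique;
    }
}
```

For example, the input `["absence|presence", "Presence|Absence", "already|not yet"]` yields a single unique aPair, `absence|presence`.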
ID | Source | No. of unique aPairs |
---|---|---|
1 | Sherwood School | 449 |
2 | Proof Reading Services | 418 |
3 | Enchanted Learning | 324 |
4 | 7ESL | 339 |
5 | English Grammar Here | 321 |
6 | Synonyms Antonyms | 301 |
7 | SLP Lesson Plans | 251 |
8 | ESL Forums | 198 |
9 | My English Tutors | 170 |
10 | Love To Know | 167 |
11 | Your Dictionary | 159 |
12 | Classic Thesaurus | 100 |
13 | Power Thesaurus | 100 |
14 | Smart Words | 9 |
II. Design
A program is developed to:
Please see design documents for more details.
III. Implementation
The Java source code is located in the TtSet directory:
Algorithm:
The source type of each collected aPair is identified by a program (AntObj.java) as follows:
The algorithm for identifying an SD (suffixD) aPair is described as follows:
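As a rough illustration only, assuming SD (suffixD) refers to derivational antonyms in which both words share a stem and end in opposing suffixes (for example [careful|careless]), such a check could be sketched as below. The class `SuffixDCheck` and its suffix list are hypothetical and are not taken from AntObj.java.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of an SD (suffixD) check.
 * Assumption: an aPair is SD when both words share a non-empty stem and
 * end in one of the opposing suffix pairs below (illustrative list only).
 */
final class SuffixDCheck {

    /** Illustrative opposing suffix pairs: careful|careless, employer|employee. */
    private static final List<String[]> SUFFIX_PAIRS = List.of(
            new String[] {"ful", "less"},
            new String[] {"er", "ee"});

    private SuffixDCheck() {}

    static boolean isSuffixD(String word1, String word2) {
        String a = word1.toLowerCase(Locale.ROOT);
        String b = word2.toLowerCase(Locale.ROOT);
        for (String[] sp : SUFFIX_PAIRS) {
            // check both word orders: [careful|careless] and [careless|careful]
            if (sharesStem(a, b, sp[0], sp[1]) || sharesStem(b, a, sp[0], sp[1])) {
                return true;
            }
        }
        return false;
    }

    /** True if x = stem + s1 and y = stem + s2 for the same non-empty stem. */
    private static boolean sharesStem(String x, String y, String s1, String s2) {
        if (!x.endsWith(s1) || !y.endsWith(s2)) {
            return false;
        }
        String stemX = x.substring(0, x.length() - s1.length());
        String stemY = y.substring(0, y.length() - s2.length());
        return !stemX.isEmpty() && stemX.equals(stemY);
    }
}
```

Pure string matching of this kind accepts any stem, so a production check would likely need additional filtering, for example a lexicon lookup on the stem.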
The algorithm for identifying a PD (prefixD) aPair is described as follows:
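Similarly, assuming PD (prefixD) refers to aPairs where one word is the other preceded by a negating prefix (for example [happy|unhappy] or [legal|illegal]), a minimal sketch is shown below. The class `PrefixDCheck` and the prefix list are hypothetical illustrations, not the prefixes used by AntObj.java.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of a PD (prefixD) check.
 * Assumption: an aPair is PD when one word equals a negating prefix
 * followed by the other word (illustrative prefix list only).
 */
final class PrefixDCheck {

    /** Illustrative negating prefixes: unhappy, illegal, impossible, disagree. */
    private static final List<String> NEG_PREFIXES =
            List.of("un", "in", "im", "il", "ir", "dis", "non");

    private PrefixDCheck() {}

    static boolean isPrefixD(String word1, String word2) {
        String a = word1.toLowerCase(Locale.ROOT);
        String b = word2.toLowerCase(Locale.ROOT);
        // check both word orders: [happy|unhappy] and [unhappy|happy]
        return hasNegPrefix(a, b) || hasNegPrefix(b, a);
    }

    /** True if derived == prefix + base for one of the negating prefixes. */
    private static boolean hasNegPrefix(String base, String derived) {
        if (base.isEmpty()) {
            return false;
        }
        for (String p : NEG_PREFIXES) {
            if (derived.equals(p + base)) {
                return true;
            }
        }
        return false;
    }
}
```

This accepts pairs that are not true antonyms, such as [famous|infamous], so in practice a filter (for example a curated exception list) would probably be needed.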
Co-occurrences in a corpus: aPairs retrieved from a corpus by co-occurrence patterns. Our first attempt uses terms that co-occur in MEDLINE.
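The co-occurrence idea can be illustrated by counting how often two words appear together in contrastive patterns. This is a sketch under stated assumptions: the pattern templates (such as "X or Y" and "from X to Y") are hypothetical stand-ins, since the MEDLINE patterns are not given here, and a small in-memory list of sentences stands in for the corpus. The class `CoOccurrenceCounter` is likewise hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: counts matches of illustrative contrastive patterns
 * for a word pair over a list of sentences (a stand-in for a corpus).
 */
final class CoOccurrenceCounter {

    /** "%1$s" and "%2$s" are replaced by the regex-quoted words of the pair. */
    private static final List<String> TEMPLATES = List.of(
            "\\b%1$s or %2$s\\b",
            "\\b%1$s and %2$s\\b",
            "\\b%1$s versus %2$s\\b",
            "\\bfrom %1$s to %2$s\\b",
            "\\bneither %1$s nor %2$s\\b");

    private CoOccurrenceCounter() {}

    /** Total number of pattern matches for the pair, in either word order. */
    static int count(String word1, String word2, List<String> sentences) {
        String a = Pattern.quote(word1.toLowerCase(Locale.ROOT));
        String b = Pattern.quote(word2.toLowerCase(Locale.ROOT));
        int total = 0;
        for (String template : TEMPLATES) {
            Pattern forward = Pattern.compile(String.format(template, a, b));
            Pattern backward = Pattern.compile(String.format(template, b, a));
            for (String s : sentences) {
                String text = s.toLowerCase(Locale.ROOT);
                total += countMatches(forward, text) + countMatches(backward, text);
            }
        }
        return total;
    }

    /** Counts non-overlapping matches of p in text. */
    private static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }
}
```

Words are passed through `Pattern.quote` so that characters in a term are matched literally rather than interpreted as regex syntax.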
Semantic opposites in corpora: aPairs retrieved from a semantic network. An aPair that does not belong to any of the sources above is assigned SN (semantic network). Patterns for this category are yet to be developed.
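The fallback rule above (an aPair that matches no other source is assigned SN) amounts to trying each source test in turn and returning SN when none matches. The sketch below assumes the tests are applied in a fixed order with the first match winning; that ordering, the class `SourceTypeAssigner`, and the use of `BiPredicate` tests are illustrative assumptions rather than the design of AntObj.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/**
 * Hypothetical sketch of source-type assignment: registered tests are tried
 * in registration order, the first match wins, and an aPair that matches
 * none of them is tagged SN (semantic network).
 */
final class SourceTypeAssigner {

    private final Map<String, BiPredicate<String, String>> tests = new LinkedHashMap<>();

    /** Registers a test under a tag; tests run in registration order. */
    SourceTypeAssigner addTest(String tag, BiPredicate<String, String> test) {
        tests.put(tag, test);
        return this;
    }

    /** Returns the tag of the first matching test, or "SN" if none matches. */
    String assign(String word1, String word2) {
        for (Map.Entry<String, BiPredicate<String, String>> e : tests.entrySet()) {
            if (e.getValue().test(word1, word2)) {
                return e.getKey();
            }
        }
        return "SN"; // fallback: the aPair belongs to none of the other sources
    }
}
```

Keeping the tests pluggable means the concrete SD, PD, and corpus checks can be supplied as lambdas or method references without changing the fallback logic.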