The SPECIALIST Lexicon

CC Source Model - Co-occurrence in Corpus (MEDLINE)

I. Introduction

The co-occurrence hypothesis is one of the most popular approaches to antonym identification [Charles & Miller, 1989; Fellbaum, 1995; Tesfaye & Paradis, 2015]. In this Co-occurrence in Corpus (CC) model, we first enhanced the co-occurrence patterns of previous research [Justeson & Katz, 1991] to identify 10 co-occurrence patterns. These patterns are derived from a collection of 1000 antonyms from the internet domain [Lu, 2021]. The MEDLINE n-gram set [Lu, 2015] is used as the corpus. The patterns take the form [X keyword Y], where the keywords are: -and-, -or-, -to-, -versus-, -than-, -vs-, -from-, -nor-, -and/or- and -as well as-. High-frequency co-occurring terms in the corpus (the MEDLINE n-gram set) that match these patterns, are not Lexicon synonyms [Lu, 2017], have CUIs, and meet the STI rules are retrieved as aPair candidates, such as [above|below|prep], [accept|reject|verb], [sick|well|adj] and [birth|death|noun]. Both the frequency in MEDLINE (word count) and the frequency across the keywords (pattern count) are taken into consideration during this process.

II. Design

Two MEDLINE n-gram files are used for this model:

  • 3-gram.2024.30.core: for [X keyword Y], where keywords are: -and-, -or-, -to-, -versus-, -than-, -vs-, -from-, -nor-, -and/or-.
  • 5-gram.2024.30.core: for [X as well as Y], since the three-word keyword needs a 5-gram.
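The split between the two files follows from the keyword length: a pattern [X keyword Y] spans the keyword's tokens plus two grams. A minimal Java sketch of the middle-keyword check is shown here, assuming each n-gram arrives as an array of lowercase tokens split on spaces; the class and method names (`NgramKeyword`, `middleKeyword`) are hypothetical and are not taken from the actual source files.

```java
import java.util.Set;

/**
 * Hypothetical sketch: returns the keyword formed by the middle gram(s) of an
 * n-gram, or null when the n-gram does not fit [X keyword Y].
 * Assumes tokens are already lowercased and split on spaces.
 */
public class NgramKeyword {
    // Single-token keywords, looked up in the 3-gram file.
    static final Set<String> KEYWORDS_3GRAM = Set.of(
            "and", "or", "to", "versus", "than", "vs", "from", "nor", "and/or");
    // The only multi-token keyword, looked up in the 5-gram file.
    static final String KEYWORD_5GRAM = "as well as";

    static String middleKeyword(String[] tokens) {
        if (tokens.length == 3 && KEYWORDS_3GRAM.contains(tokens[1])) {
            return tokens[1];
        }
        if (tokens.length == 5) {
            // Join grams 2..4; the first and last grams are X and Y.
            String middle = String.join(" ", tokens[1], tokens[2], tokens[3]);
            if (middle.equals(KEYWORD_5GRAM)) {
                return middle;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(middleKeyword("internal and/or external".split(" ")));   // and/or
        System.out.println(middleKeyword("abnormal as well as normal".split(" "))); // as well as
        System.out.println(middleKeyword("normal values in".split(" ")));           // null
    }
}
```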

Examples of the derived patterns are listed below; please see the design documents for details:

Ant-1       Ant-2       Co-occurrence Examples (frequency|n-gram)
normal      abnormal
  • 11160|normal and abnormal
  • 2387|normal nor abnormal
  • 1917|normal or abnormal
  • 463|abnormal and normal
  • 385|normal from abnormal
  • 243|normal versus abnormal
  • 159|normal to abnormal
  • 125|abnormal or normal
  • 69|abnormal as well as normal
external    internal
  • 15160|internal and external
  • 6836|external and internal
  • 1667|internal or external
  • 898|external or internal
  • 184|internal versus external
  • 164|internal as well as external
  • 124|internal to external
  • 122|internal, and external
  • 116|internal and/or external
  • 114|external to internal
...
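The per-pair totals behind the word count and pattern count can be sketched in Java as follows. This sketch assumes each example line has the count|phrase form of the lines above, keys each pair alphabetically so both word orders merge, strips trailing punctuation such as the comma in "internal, and external", and reads "pattern count" as the number of distinct keywords; all of these choices, and the `PairAggregator` name, are illustrative assumptions rather than the CC model's actual code. Lines are assumed to be well-formed (at least three tokens after the bar).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch: totals "frequency|n-gram" lines per unordered pair.
 * Word count = summed frequency; pattern count = distinct keywords (assumed).
 */
public class PairAggregator {
    static final class Stats {
        long wordCount;                                // summed n-gram frequency
        final Set<String> keywords = new HashSet<>();  // distinct middle keywords
    }

    static Map<String, Stats> aggregate(List<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            long freq = Long.parseLong(line.substring(0, bar).trim());
            String[] tokens = line.substring(bar + 1).trim().split("\\s+");
            // Strip trailing punctuation, e.g. "internal," -> "internal".
            String x = tokens[0].replaceAll("\\p{Punct}+$", "");
            String y = tokens[tokens.length - 1].replaceAll("\\p{Punct}+$", "");
            String keyword = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length - 1));
            // Alphabetical key so both word orders fall into the same pair.
            String key = x.compareTo(y) <= 0 ? x + "|" + y : y + "|" + x;
            Stats s = result.computeIfAbsent(key, k -> new Stats());
            s.wordCount += freq;
            s.keywords.add(keyword);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "11160|normal and abnormal", "2387|normal nor abnormal",
                "1917|normal or abnormal", "463|abnormal and normal",
                "385|normal from abnormal", "243|normal versus abnormal",
                "159|normal to abnormal", "125|abnormal or normal",
                "69|abnormal as well as normal");
        aggregate(lines).forEach((pair, s) ->
                System.out.println(pair + " words=" + s.wordCount + " keywords=" + s.keywords.size()));
        // prints: abnormal|normal words=16908 keywords=7
    }
}
```

Keying on the unordered pair is what lets "normal and abnormal" and "abnormal and normal" count toward the same candidate.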

We observed the following from the table above:

  • Most of these aPairs fall into the collocation pattern [Ant-1 keyword Ant-2], with the keyword, such as “and”, “or”, “versus” or “to”, in the middle of the 3-gram.
  • Some aPairs, such as calm|excited and buyer|seller, do not co-occur in the MEDLINE n-grams. Possible reasons are:
    • The MEDLINE n-gram set does not cover these aPairs. In such cases, we suggest applying this co-occurrence model to another corpus to find collocation patterns.
    • These aPairs cannot be derived by the collocation model. In such cases, we suggest further research focusing on semantics. These types of aPairs are categorized with the source [SN] (semantic in corpus).
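For the first suggestion above, a corpus of running text has no n-gram boundaries, so the patterns would have to be matched directly in sentences. The following Java sketch does this with a regular expression; the class name `FreeTextPatterns`, the lowercasing, and the restriction to alphabetic words are assumptions. It only finds raw pattern matches (for example, "was to be" also matches the -to- pattern), so the antonym criteria of the CC model would still have to be applied afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: finds [X keyword Y] matches in running text.
 * Each hit is reported as "X|keyword|Y"; overlapping hits are kept.
 */
public class FreeTextPatterns {
    // The match sits inside a lookahead so that "a and b or c" yields both
    // "a|and|b" and "b|or|c". Multi-word keywords are listed first for readability.
    private static final Pattern PATTERN = Pattern.compile(
            "(?=\\b([a-z]+) (as well as|and/or|versus|than|from|and|nor|vs|or|to) ([a-z]+)\\b)");

    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            hits.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("Normal and abnormal or missing values"));
        // prints: [normal|and|abnormal, abnormal|or|missing]
    }
}
```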

III. Implementation

The Java source code is in the Medline directory:

  • GetAntCandFrom3GramPatMid.java
  • GetAntCandFrom5GramPatMid.java

Algorithm:

  • Go through all n-grams (N = 3 or 5) and take the normalized (coreterm) first and last grams as antonym candidates; the middle word(s) are used as the keyword.
  • Check whether the middle word(s) match one of the keywords.
  • Check whether the normalized first and last grams meet the antonym criteria:
    • have EUIs (in the Lexicon)
    • single words
    • have the same POS
    • not words that are invalid as antonyms in the CC model, such as "the", "a", "which", "not", etc.
    • not synonyms
    • have CUIs
    • have STIs that are either the same or form a legal STI pair
      Legal STI pairs were derived from the tagged aPair candidates: STI pairs that occur 10 or more times among canonical aPairs are legal. The report is in the file: ${ANTONYM}/${YEAR}/output/Analysis/antCand.data.tag.cuiSti.rpt.
      STI-1                         STI-2                         Frequency
      T033|Finding                  T080|Qualitative Concept      38
      T033|Finding                  T121|Pharmacologic Substance  10
      T033|Finding                  T169|Functional Concept       19
      T033|Finding                  T170|Intellectual Product     11
      T033|Finding                  T184|Sign or Symptom          15
      T078|Idea or Concept          T080|Qualitative Concept      10
      T080|Qualitative Concept      T081|Quantitative Concept     13
      T080|Qualitative Concept      T082|Spatial Concept          10
      T080|Qualitative Concept      T121|Pharmacologic Substance  10
      T080|Qualitative Concept      T169|Functional Concept       37
      T121|Pharmacologic Substance  T169|Functional Concept       10
  • Convert the aPair candidates to base form (citation form).
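The candidate checks above can be sketched in Java as follows (Java 16+ for the record). The `Term` record, the partial invalid-word list, the "a|b" alphabetical pair keys, and the reading of "same STIs or a legal STI pair" as "any shared STI, or any legal pair between the two STI sets" are hypothetical stand-ins for the Lexicon and UMLS look-ups that GetAntCandFrom3GramPatMid.java and GetAntCandFrom5GramPatMid.java perform; they are not the actual implementation. The `minFreq` parameter corresponds to the frequency threshold used to build the STI table above. Conversion to base form is left out.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the CC-model candidate checks. Term, the invalid-word
 * list and the "a|b" key format are illustrative stand-ins, not Lexicon APIs.
 */
public class CandidateFilter {
    /** A normalized gram after (hypothetical) Lexicon and UMLS look-ups. */
    record Term(String word, boolean hasEui, String pos, Set<String> cuis, Set<String> stis) {}

    // Partial list only; the text ends its examples with "etc.".
    static final Set<String> INVALID_WORDS = Set.of("the", "a", "which", "not");

    /** Order-independent key for a pair of words or STIs. */
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** Keeps STI pairs (keys already in pairKey order) whose frequency reaches minFreq. */
    static Set<String> legalStiPairs(Map<String, Integer> pairFreq, int minFreq) {
        Set<String> legal = new HashSet<>();
        pairFreq.forEach((pair, freq) -> {
            if (freq >= minFreq) legal.add(pair);
        });
        return legal;
    }

    /** True if some STI is shared or some STI combination is a legal pair. */
    static boolean stiOk(Set<String> a, Set<String> b, Set<String> legal) {
        for (String s : a) {
            for (String t : b) {
                if (s.equals(t) || legal.contains(pairKey(s, t))) return true;
            }
        }
        return false; // also false when either side has no STIs
    }

    static boolean isCandidate(Term x, Term y, Set<String> synonymPairs, Set<String> legalSti) {
        return x.hasEui() && y.hasEui()                                   // in the Lexicon
                && !x.word().contains(" ") && !y.word().contains(" ")     // single words
                && x.pos().equals(y.pos())                                // same POS
                && !INVALID_WORDS.contains(x.word())                      // not invalid words
                && !INVALID_WORDS.contains(y.word())
                && !synonymPairs.contains(pairKey(x.word(), y.word()))    // not synonyms
                && !x.cuis().isEmpty() && !y.cuis().isEmpty()             // have CUIs
                && stiOk(x.stis(), y.stis(), legalSti);                   // STI rule
    }
}
```

Building the legal STI set once and passing it in keeps the per-n-gram check to set look-ups.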

IV. References

  • Walter G. Charles, George A. Miller, Contexts of antonymous adjectives, Applied Psycholinguistics, Vol 10, 1989, 357-375
  • Christiane Fellbaum, Co-Occurrence and Antonymy, International Journal of Lexicography, Vol 8, No 4, Oxford University Press, 1995, 281-303
  • Debela Tesfaye, Carita Paradis, On the use of antonyms and synonyms from a domain perspective, Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, 150-154
  • John S. Justeson, Slava M. Katz, Co-occurrences of Antonymous Adjectives and Their Contexts, Computational Linguistics, Vol 17, No 1, Association for Computational Linguistics, 1991, 1-19