Multiword Candidates Generation Processes:
SpVar Matcher with Frequency in the Distilled Medline N-gram Set
N-grams that match an SpVar pattern are a good source of multiword candidates. More than 10 SpVar types were developed to identify spVars in a given corpus.
For example, the terms
- bloodpressure
- blood pressure
- blood-pressure
- tradeoff
- trade off
- trade-off
are in a corpus and match the spVar types (SVT_SPACE|SVT_PUNC_DASH) in the spVar model. Thus, they are good candidates for LMWs.
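The matching idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual 08.MatcherSpVar program: the function names `spvar_key`, `spvar_types`, and `find_spvar_groups` are hypothetical, and only the two spVar types named above (SVT_SPACE and SVT_PUNC_DASH) are modeled. Terms that collapse to the same key once spaces and dashes are removed are treated as spVars of each other.

```python
import re
from collections import defaultdict

# The two spVar types named in the text; the string values are illustrative.
SVT_SPACE = "SVT_SPACE"
SVT_PUNC_DASH = "SVT_PUNC_DASH"


def spvar_key(term):
    """Collapse a term to a lowercase key that ignores spaces and dashes."""
    return re.sub(r"[\s-]+", "", term.lower())


def spvar_types(term):
    """Return which of the two illustrated variation types a term exhibits."""
    types = set()
    if " " in term:
        types.add(SVT_SPACE)
    if "-" in term:
        types.add(SVT_PUNC_DASH)
    return types


def find_spvar_groups(ngrams):
    """Group n-grams that differ only by spaces or dashes.

    A group is kept only if it has at least two distinct forms and at
    least one of them is a multiword (space-separated) form.
    """
    groups = defaultdict(set)
    for term in ngrams:
        groups[spvar_key(term)].add(term)
    return {
        key: sorted(terms)
        for key, terms in groups.items()
        if len(terms) > 1 and any(" " in t for t in terms)
    }


if __name__ == "__main__":
    corpus = [
        "bloodpressure", "blood pressure", "blood-pressure",
        "tradeoff", "trade off", "trade-off",
        "heart rate",  # no variant in the corpus, so it forms no group
    ]
    for key, variants in find_spvar_groups(corpus).items():
        print(key, variants)
```

In this sketch, "heart rate" is dropped because no other form of it appears in the corpus, while the six example terms fall into two groups whose space-separated members are the multiword candidates.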
A frequency filter (WC) is added to this list for frequency analysis:
Matcher SpVar: Steps 60-61A (08.MatcherSpVar)
- Some candidates are automatically tagged [AUTO_YES|AUTO_NO]
- The highest-frequency strategy should be applied
- Not as productive as expected; not used from 2016 onward.
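The source does not spell out how the highest-frequency strategy assigns the automatic tags. One possible reading, shown here purely as an assumption, is that within a group of spVars the variant with the highest WC is tagged AUTO_YES and the others AUTO_NO. The function name `auto_tag` and the counts in the usage example are hypothetical.

```python
def auto_tag(group_counts):
    """Tag the most frequent variant AUTO_YES and all others AUTO_NO.

    group_counts maps each variant in one spVar group to its word count (WC).
    Ties are broken by comparing the terms themselves, so the result is
    deterministic. This rule is an assumed reading of the strategy.
    """
    if not group_counts:
        return {}
    best = max(group_counts, key=lambda term: (group_counts[term], term))
    return {
        term: ("AUTO_YES" if term == best else "AUTO_NO")
        for term in group_counts
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from any n-gram set.
    counts = {"blood pressure": 900, "blood-pressure": 40, "bloodpressure": 5}
    print(auto_tag(counts))
```

Candidates tagged this way would presumably still need manual review, which fits the Tag [Y|N] note recorded for the 2015 set.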
Generated files:
Distilled MEDLINE nGram Set | Candidate Files | Status | Notes
---|---|---|---
2015 | | Done | Tag [Y\|N]
2016+ | N/A | Postponed due to limited resources |