
Free Stop Word Lists in 23 Languages

Stop words (or stopwords) are words that carry little content or relevant semantics, so Natural Language Processing (NLP) systems filter them out. Search engines use stop word lists to improve search queries. Google’s FAQ gives a short explanation here. A stop word list consists mostly of some basic combination of letters and numbers as well as [...]
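As a rough illustration of the filtering step, here is a minimal Python sketch. The `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up for this example, and the list is deliberately tiny; a real list would be loaded from one of the language files.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "or", "the", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    print(remove_stop_words("The cat is in the hat"))
```

Using a set keeps each membership check cheap, which matters once the list grows to a few hundred entries per language.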

Information Mapping Project (INFOMAP)

The INFOMAP project is an older but nevertheless interesting introduction to semantic vector space models. The related software is freely available. It uses a combination of approaches but mostly relies on Schütze’s Automatic word sense discrimination work. However, it does not use context vectors and concentrates on an SVD-compressed HAL matrix.

Hyperspace Analogue to Language (HAL) Introduction

Also known as semantic memory, HAL was developed by Kevin Lund and Curt Burgess at the University of California, Riverside. You can download the corresponding paper, Producing high-dimensional semantic spaces from lexical co-occurrence, in PDF format.
The basic premise the work relies on is that words with similar meaning repeatedly occur closely (also known as [...]
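To make the co-occurrence premise concrete, here is a minimal Python sketch of a windowed, distance-weighted co-occurrence count. The function name `hal_cooccurrence`, the default window of 10, and the linear weighting (`window - distance + 1`) are assumptions chosen for illustration; the excerpt above does not specify them, so treat this as a sketch rather than the authors' implementation.

```python
from collections import defaultdict


def hal_cooccurrence(tokens, window=10):
    """Count weighted co-occurrences of each word with the words before it.

    Keys are (word, preceding_word) pairs. Closer neighbours get a larger
    weight: window for the adjacent word, down to 1 at the window's edge.
    The weighting scheme is an assumption made for this sketch.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the token sequence
            counts[(word, tokens[j])] += window - distance + 1
    return dict(counts)


if __name__ == "__main__":
    matrix = hal_cooccurrence(["the", "horse", "raced", "past"], window=2)
    for pair, weight in sorted(matrix.items()):
        print(pair, weight)
```

Each word's row of weights can then be read as a vector, and words whose vectors are close are candidates for having similar meaning.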