A downloadable project

This project was a piece of coursework for my Natural Language Engineering module, in which we investigated the Distributional Hypothesis: do words that appear in similar contexts tend to have similar meanings?

Natural Language Engineering is a fascinating field, as it highlights the inconsistencies and intricacies of written language: humans can interpret the correct meaning of words from context with relative ease, whereas computers struggle to do the same.

For security reasons, I have hidden the candidate number, as I am unsure whether it should be released or visible to external viewers.

The project used the Reuters finance corpus, with each student allocated a random section so that results were unique among students. The words in this section were compared against the WordNet definitions in a variety of ways to gather details such as the 1000 most frequent words that also have a noun sense in WordNet (as seen in Q2).
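As a rough illustration of the Q2 step, here is a minimal sketch of picking the most frequent words that also have a noun sense. It is not the notebook's code: the function name `most_frequent_nouns`, the toy tokens, and the `has_noun_sense` predicate are all assumptions made for this example. In practice the predicate could be backed by NLTK's WordNet interface, as noted in the comment.

```python
from collections import Counter
from typing import Callable, Iterable, List


def most_frequent_nouns(
    tokens: Iterable[str],
    has_noun_sense: Callable[[str], bool],
    k: int = 1000,
) -> List[str]:
    """Return the k most frequent alphabetic tokens that pass the noun check."""
    # Lowercase and drop punctuation/number tokens before counting.
    counts = Counter(t.lower() for t in tokens if t.isalpha())
    # most_common() sorts by descending frequency; ties keep first-seen order.
    return [w for w, _ in counts.most_common() if has_noun_sense(w)][:k]


# Toy stand-ins (hypothetical data, not the Reuters corpus).
# With NLTK and its WordNet data installed, the predicate could instead be:
#   lambda w: bool(wordnet.synsets(w, pos=wordnet.NOUN))
toy_tokens = "the bank raised rates and the bank cut rates as shares fell".split()
toy_nouns = {"bank", "rates", "shares"}

top_nouns = most_frequent_nouns(toy_tokens, toy_nouns.__contains__, k=2)
print(top_nouns)
```

Passing the noun check in as a function keeps the counting logic separate from WordNet, so the sketch runs without any corpus downloads.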

Overall, the coursework aimed to investigate the correlation between semantic similarity according to WordNet and distributional similarity at different context window sizes. Semantic similarity was scored with the Lin measure, and its correlation with distributional similarity was computed using Spearman's Rank Correlation Coefficient and plotted across a range of context window sizes. The results changed little between window sizes and showed no real correlation, so for my algorithms and the given data sample, the distributional method was not a good approximation of WordNet similarity. The conclusion ends with a discussion of further testing that could improve the experiment.
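The experiment described above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not the notebook's implementation: it uses raw co-occurrence counts with cosine similarity as the distributional measure, the toy tokens and word pairs are invented, and the `lin_scores` values are hypothetical placeholders (real Lin scores would come from WordNet with an information-content file, for example via NLTK's `Synset.lin_similarity`).

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens, window):
    """Count the words seen within `window` positions either side of each word."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy data (hypothetical): a tiny token list and made-up Lin scores per pair.
toy_tokens = "shares rose as the bank cut rates while stocks rose as the bank held rates".split()
pairs = [("shares", "stocks"), ("bank", "rates"), ("shares", "rates")]
lin_scores = [0.9, 0.3, 0.2]  # placeholders, NOT real WordNet values

correlations = {}
for window in (1, 2, 5):
    vecs = context_vectors(toy_tokens, window)
    distributional = [cosine(vecs[a], vecs[b]) for a, b in pairs]
    correlations[window] = spearman(lin_scores, distributional)
    print(window, correlations[window])
```

Plotting `correlations` against window size gives the kind of curve the coursework examined; a flat curve near zero would match the finding reported here.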

This coursework achieved a mark of 95/100.

Download

Colab notebook (external): https://colab.research.google.com/drive/1jP_Ej4M7fd7xJzAEaCakiZud-gQxbQB3?usp=sharing
NLEassignment2.ipynb (221 kB)

Install instructions

Open the downloaded file using Google Colab, or follow the URL provided above to view the notebook online.
