SEMILAR: A Semantic Similarity Toolkit

Citing SEMILAR

Rus, V., Lintean, M., Banjade, R., Niraula, N., and Stefanescu, D. (2013). SEMILAR: The Semantic Similarity Toolkit. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, Sofia, Bulgaria.

Citing the SEMILAR Corpus and SEMILAT Tool

Vasile Rus, Mihai Lintean, Cristian Moldovan, William Baggett, Nobal Niraula, Brent Morgan, The SIMILAR Corpus: A Resource to Foster the Qualitative Understanding of Semantic Similarity of Texts, In Semantic Relations II: Enhancing Resources and Applications, The 8th Language Resources and Evaluation Conference (LREC 2012), May 23-25, Istanbul, Turkey.

Extended Reference section for SEMILAR API

Rus, Vasile, Nobal Niraula, and Rajendra Banjade. "Similarity Measures Based on Latent Dirichlet Allocation." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2013. 459-470.

Rus, Vasile, and Mihai Lintean. "A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics."

Pedersen, T., Patwardhan, S., and Michelizzi, J. (2004). WordNet::Similarity - Measuring the Relatedness of Concepts, In the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pp. 1024-1025, July 25-29, 2004, San Jose, CA (Intelligent Systems Demonstration).

Xuan-Hieu Phan, Le-Minh Nguyen, and Susumu Horiguchi. Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections. In Proc. of The 17th International World Wide Web Conference (WWW 2008), pp.91-100, April 2008, Beijing, China.

Corley, C. and Mihalcea, R. (2005). Measuring the semantic similarity of texts. In Proceedings of ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment.

Michael Denkowski and Alon Lavie, "Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems", Proceedings of the EMNLP 2011 Workshop on Statistical Machine Translation, 2011.

Papineni, Kishore, et al. "BLEU: a method for automatic evaluation of machine translation." Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2002.

Landauer, Thomas K., Peter W. Foltz, and Darrell Laham. "An introduction to latent semantic analysis." Discourse processes 25.2-3 (1998): 259-284.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.

Gabrilovich, Evgeniy, and Shaul Markovitch. "Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis." IJCAI. Vol. 7. 2007.

Church, Kenneth Ward, and Patrick Hanks. "Word association norms, mutual information, and lexicography." Computational linguistics 16.1 (1990): 22-29.

Related References to Semantic Similarity Assessment

Androutsopoulos, Ion and Prodromos Malakasiotis. 2010. A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research, 38:135-187.

Banerjee, S. and T. Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 805-810.

Barzilay, Regina and Lillian Lee. 2003. Learning to paraphrase: An unsupervised approach using multiple-sequence alignment. In HLT-NAACL 2003: Main Proceedings, pages 16-23.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.

Brockett, Chris and William B. Dolan. 2005. Support vector machines for paraphrase identification and corpus construction. In Proceedings of the 3rd International Workshop on Paraphrasing, pages 1-8.

Burges, Christopher J. C. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167, June.

Collins, Michael. 1996. A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz.

Corley, Courtney and Rada Mihalcea. 2005. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment. Ann Arbor, MI.

Dagan, Ido, Oren Glickman, and Bernardo Magnini. 2005. The PASCAL recognising textual entailment challenge. In Proceedings of the PASCAL Challenge Workshop on Recognizing Textual Entailment.

Das, Dipanjan and Noah A. Smith. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of the Joint Conference of the Annual Meeting of the ACL and the International Joint Conference on NLP, Singapore, August.

Dolan, Bill, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th International Conference on Computational Linguistics (COLING).

Fernando, Samuel and Mark Stevenson. 2008. A semantic similarity approach to paraphrase detection. In Proceedings of the Computational Linguistics UK (CLUK 2008).

Finch, Andrew, Young Sook Hwang, and Eiichiro Sumita. 2005. Using machine translation evaluation techniques to determine sentence-level semantic equivalence. In Proceedings of the 3rd International Workshop on Paraphrasing (IWP2005).

Graesser, Arthur C., Andrew Olney, Brian C. Haynes, and Patrick Chipman. 2005. AutoTutor: A cognitive system that simulates a tutor that facilitates learning through mixed-initiative dialogue. In Cognitive Systems: Human Cognitive Models in Systems Design. Mahwah: Erlbaum.

Graesser, Arthur C., Phanni Penumatsa, Matthew Ventura, Zhiqiang Cai, and Xiangen Hu, 2007. Handbook of Latent Semantic Analysis, chapter Using LSA in AutoTutor: Learning through Mixed-initiative Dialogue in Natural Language, pages 243-262. Lawrence Erlbaum Associates.

Heilman, Michael and Noah A. Smith. 2010. Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Proceedings of the NAACL/HLT, Los Angeles, US.

Iordanskaja, Lidija, Richard Kittredge, and Alain Polguere, 1991. Natural Language Generation in Artificial Intelligence and Computational Linguistics, chapter Lexical selection and paraphrase in a meaning-text generation model, pages 293-312. Kluwer Academic Publishers, Norwell, MA, USA.

Jurafsky, Daniel and James H. Martin. 2002. Speech and Language Processing. Prentice Hall Series in Artificial Intelligence. Prentice Hall, 2nd edition, May.

Kate, Rohit J. 2008. A dependency-based word subsequence kernel. In Proceedings of EMNLP.

Kozareva, Zornitsa and Andrés Montoyo, 2006. Advances in Natural Language Processing: Lecture Notes in Computer Science, volume 4139, chapter Paraphrase Identification on the basis of Supervised Machine Learning Techniques, pages 524-533. Springer-Verlag Berlin Heidelberg.

Landauer, Thomas K., Danielle S. McNamara, Simon Dennis, and Walter Kintsch. 2007. Handbook of Latent Semantic Analysis. Mahwah, NJ: Erlbaum.

Li, Yuhua, David McLean, Zuhair A. Bandar, James D. O'Shea, and Keeley Crockett. 2006. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8):1138-1150.

Lintean, Mihai, Vasile Rus, and Arthur C. Graesser. 2008. Using dependency relations to decide paraphrasing. In Proceedings of the Society for Text and Discourse Conference.

Lodhi, Huma, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. 2002. Text classification using string kernels. Journal of Machine Learning Research, 2:419-444.

Malakasiotis, Prodromos. 2009. Paraphrase recognition using machine learning to combine similarity measures. In Proceedings of the ACL-IJCNLP, Suntec, Singapore, August.

Manning, Christopher D. and Heinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.

McCarthy, Philip M. and Danielle S. McNamara. 2009. User-language paraphrase corpus challenge. Online at https://umdrive.memphis.edu/pmmccrth/public/Paraphrase Corpus/Paraphrase site.htm.

McCarthy, Philip M., Vasile Rus, Scott A. Crossley, Arthur C. Graesser, and Danielle S. McNamara. 2008. Assessing forward-, reverse-, and average-entailer indices on natural language input from the intelligent tutoring system, iSTART. In Proceedings of the Twenty-First International FLAIRS Conference.

Miller, George A. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39-41.

Park, Eui-Kyu, Dong-Yul Ra, and Myung-Gil Jang. 2005. Techniques for improving web retrieval effectiveness. Information Processing and Management, 41(5):1207-1223.

Patwardhan, Siddharth. 2003. Incorporating Dictionary and Corpus Information into a Context Vector Measure of Semantic Relatedness. Master's thesis, University of Minnesota, Duluth, August.

Patwardhan, Siddharth, Satanjeev Banerjee, and Ted Pedersen. 2003. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLING-03), pages 241-257.

Pedersen, Ted, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet::Similarity - measuring the relatedness of concepts. In Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pages 38-41, Boston.

Qiu, Long, Min-Yen Kan, and Tat-Seng Chua. 2006. Paraphrase recognition via dissimilarity significance classification. In Proceedings of EMNLP, pages 18-26.

Ramage, Daniel, Anna N. Rafferty, and Christopher D. Manning. 2009. Random walks for text semantic similarity. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, Suntec, Singapore.

Rinaldi, Fabio, James Dowdall, Kaarel Kaljurand, and Michael Hess. 2003. Exploiting paraphrases in a question answering system. In Proceedings of the 2nd International Workshop on Paraphrasing, pages 25-32. Sapporo, Japan.

Rus, Vasile, Philip M. McCarthy, Mihai C. Lintean, Danielle S. McNamara, and Arthur C. Graesser. 2008a. Paraphrase identification with lexico-syntactic graph subsumption. In Proceedings of the Twenty-First International FLAIRS Conference.

Rus, Vasile, Philip M. McCarthy, Danielle S. McNamara, and Arthur C. Graesser. 2008b. A study of textual entailment. International Journal on Artificial Intelligence Tools, 17(4):659-685.

Vapnik, Vladimir N. 1998. Statistical Learning Theory. Wiley-Interscience, September.

Wan, Stephen, Mark Dras, Robert Dale, and Cecile Paris. 2006. Using dependency-based features to take the para-farce out of paraphrase. In Proceedings of ALTW.

Webster, Jonathan J. and Chunyu Kit. 1992. Tokenization as the initial phase in NLP. In Proceedings of the 14th Conference on Computational Linguistics - Volume 4, pages 1106-1110, Morristown, NJ, USA. Association for Computational Linguistics.

Weeds, Julie, David Weir, and Bill Keller. 2005. The distributional similarity of sub-parses. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 7-12, Ann Arbor, Michigan, June. Association for Computational Linguistics.

Witten, Ian H. and Eibe Frank. 2005. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2nd edition.

Wu, Dekai. 2005. Recognizing paraphrases and textual entailment using inversion transduction grammars. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 25-30. Ann Arbor.

Zanzotto, Fabio Massimo, Marco Pennacchiotti, and Alessandro Moschitti. 2009. A machine learning approach to textual entailment recognition. Natural Language Engineering, 15(4):551-582.

Zhang, Yitao and Jon Patrick. 2005. Paraphrase identification by text canonicalization. In Proceedings of the Australasian Language Technology Workshop.