Towards an open software architecture for interleaved knowledge and natural language processing
Authors: Marcus Spies
Keywords: cognitive computing; knowledge processing; semantic technologies; natural language processing
Abstract:
Many existing integrations of natural language processing (NLP) and knowledge-based systems build on ‘shallow’ NLP. Prominent examples are term-frequency-based document retrieval and document topic extraction systems. Recent progress in NLP, however, has brought ‘deep’ processing features such as sentential parsing and semantic dependency analysis to a highly mature level for a substantial set of widely spoken languages. Specifically, ‘deep’ NLP outputs that go beyond conventional syntax trees allow for better interoperability with other information or knowledge management (IM/KM) components, e.g. those using semantic technologies or statistical learning approaches. This paper shows the emerging importance of interleaved knowledge and language processing in the context of cognitive computing. Building blocks for an open architecture for deep NLP applications are introduced and discussed.
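The interoperability claim can be made concrete with a small example. The Python sketch below is illustrative only and is not the architecture proposed in the paper. It assumes Stanford-style typed dependencies written as relation(governor-index, dependent-index), a hypothetical namespace `http://example.org/nlp#` and a hypothetical helper `to_ntriples`, and it maps each dependency to one RDF statement in N-Triples syntax, i.e. a form that semantic-technology components can consume directly.

```python
from typing import NamedTuple

# Hypothetical namespace, used only for illustration.
NS = "http://example.org/nlp#"


class Dep(NamedTuple):
    """One Stanford-style typed dependency: relation(governor, dependent)."""
    relation: str
    governor: str
    dependent: str


def to_ntriples(deps):
    """Map each typed dependency to one RDF statement in N-Triples syntax.

    The governor becomes the subject, the relation name the predicate and
    the dependent the object. Tokens are assumed to be unique word-index
    pairs such as "acquired-2", so they can serve directly as IRI suffixes.
    """
    lines = []
    for d in deps:
        s = f"<{NS}{d.governor}>"
        p = f"<{NS}{d.relation}>"
        o = f"<{NS}{d.dependent}>"
        lines.append(f"{s} {p} {o} .")
    return lines


# Dependencies for the sentence "IBM acquired a startup".
deps = [
    Dep("nsubj", "acquired-2", "IBM-1"),
    Dep("dobj", "acquired-2", "startup-4"),
    Dep("det", "startup-4", "a-3"),
]

for line in to_ntriples(deps):
    print(line)
```

Using the governor as subject and the relation name as predicate is only one simple mapping choice; a fuller integration would additionally type the tokens and link them to ontology concepts before handing the triples to a knowledge management component.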