20-00-0433-iv Natural Language Processing and the Web

Course offering details

Instructors: Prof. Dr. phil. Iryna Gurevych

Event type: Integrated Course

Org-unit: Dept. 20 - Computer Science

Displayed in timetable as: NLP and the Web

Hours per week: 4

Language of instruction: German

Min. | Max. participants: - | -

Course Contents:
The web contains more than 10 billion indexable pages, which can be retrieved via keyword search queries. The lecture presents natural language processing (NLP) methods for automatically processing large amounts of unstructured text from the web, and examines how web data can serve as a resource for other NLP tasks.

Key topics:

- Processing unstructured web content
- NLP basics: tokenization, part-of-speech tagging, stemming, lemmatization, chunking
- UIMA: principles and applications
- Web content and its characteristics, including diverse genres such as personal websites, news sites, blogs, forums, and wikis
- The web as a corpus – innovative use of the web as a very large, distributed, interlinked, growing, and multilingual corpus
- NLP applications for the web
- Introduction to information retrieval
- Web information retrieval and natural language interfaces
- Web-based question answering
- Mining Web 2.0 sites such as Wikipedia and Wiktionary
- Quality assessment of web content
- Multilingualism
- Internet of services: service retrieval
- Sentiment analysis and community mining
- Paraphrases, synonyms, semantic relatedness

Literature:
- Kai-Uwe Carstensen, Christian Ebert, Cornelia Endriss, Susanne Jekat, Ralf Klabunde: Computerlinguistik und Sprachtechnologie. Eine Einführung. 3rd edition. Heidelberg: Spektrum, 2009. ISBN: 978-3-8274-2023-7. http://www.linguistics.rub.de/CLBuch/
- T. Götz, O. Suhre: Design and implementation of the UIMA Common Analysis System, IBM Systems Journal 43(3): 476–489, 2004.
- Adam Kilgarriff, Gregory Grefenstette: Introduction to the Special Issue on the Web as Corpus, Computational Linguistics 29(3): 333–347, 2003.
- Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval, Cambridge: Cambridge University Press, 2008. ISBN: 978-0-521-86571-5. http://nlp.stanford.edu/IR-book/

Prerequisites:
Basic knowledge of algorithms and data structures
Programming experience in Java

Small group(s)
This course is divided into the following small groups:
  • Natural Language Processing and the Web - Ü 01

    Prof. Dr. phil. Iryna Gurevych

    Th, 24. Oct. 2019 [16:15]-Th, 13. Feb. 2020 [17:55]

Appointments
No.  Date                From   To     Room       Instructors
1    Tue, 15. Oct. 2019  09:50  11:30  S202/C120  Prof. Dr. phil. Iryna Gurevych
2    Tue, 22. Oct. 2019  09:50  11:30  S202/C120  Prof. Dr. phil. Iryna Gurevych
3    Tue, 29. Oct. 2019  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
4    Tue, 5. Nov. 2019   09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
5    Tue, 12. Nov. 2019  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
6    Tue, 19. Nov. 2019  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
7    Tue, 26. Nov. 2019  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
8    Tue, 3. Dec. 2019   09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
9    Tue, 10. Dec. 2019  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
10   Tue, 17. Dec. 2019  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
11   Tue, 14. Jan. 2020  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
12   Tue, 21. Jan. 2020  09:50  11:30  S202/C120  Prof. Dr. phil. Iryna Gurevych
13   Tue, 28. Jan. 2020  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
14   Tue, 4. Feb. 2020   09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych
15   Tue, 11. Feb. 2020  09:50  11:30  S214/24    Prof. Dr. phil. Iryna Gurevych