Tag der Computerlinguistik

From FachschaftSprachwissenschaft
Revision as of 20:36, 14 June 2008 by Kilian


The Day of Computational Linguistics will be held on Saturday, June 21, 2008, and will inform students from nearby high schools and universities about our course program. After an introduction to computational linguistics in general and to Tübingen's ISCL in particular, attendees will be free to gather information at several info sections, each devoted to one particular facet of CL. The event is currently being organized by the Fachschaft members; if you would like to join the preparations, you are very welcome to do so.

Date and Place

The Open Door Day will take place on Saturday, June 21, 2008 in the Seminar für Sprachwissenschaft (SfS), Wilhelmstraße 19, Tübingen.

  • Rooms: we have room 1.13 and maybe 1.01. (The lecture halls are already booked by others.)

Schedule (tentative)

(as of June 10th; the deadline for objections is June 18th)

  • 10:00 Visitors arrive, gathering and talking
  • 10:20 - 11:00 Welcome talk by Prof. Dr. Erhard Hinrichs
  • 11:00 - 11:30 1.13: Talk by Anas about algorithms
  • 11:30 - 12:00 1.13: Talk by Caroline (and Johannes?) about introductory maths
  • 12:00 - 13:00ish Lab session with Marie: programming
  • 13:00 - 14:00 Lunch Break
  • 14:00 - 15:00 1.13: Talk by EML people (everybody, attend!)
  • 15:00 - 15:30 1.13: Talk by Niels about corpora and lexicography; Lab: Talk by Laura about her internship *iff* EML people don't already cover that (still waiting for abstract)
  • 15:30 - 16:00 1.13: Talk by Magdalena about NLP-aided sentiment detection; Lab: Talk by Kilian (and Johannes?) about LACrIMoSA
  • 16:00 - 16:30 1.13: Talk by Aleks about his internship
  • 16:30 - 17:00 1.13: Talk by Anne about her internship
  • 17:00 Visitors leave

Items marked as happening in the lab are optional, except for Marie's programming tutorial. Whether we can run two things in parallel depends on whether the group of visitors is big enough to split up without making us look ridiculous.

Poster

Here are both a PNG version of the poster and the original SVG made with Inkscape. Please use the SVG if you are going to make any changes to the poster, and the PNG if you only want to look at it.

Drawing.png Drawing.svg

Program

Talks by the Faculty

  • Prof. Dr. Erhard Hinrichs
  • Marie Hinrichs
  • Sam Featherson - no confirmation yet

Software Fair

  • Kilian: LaCrIMoSA
  • One of Nomi, Tanya, Plamena, and Anas: Passivator (laptop needed)

Talks by Students (?)

  • Kilian and maybe Johannes about LACrIMoSA?
  • Magdalena about NLP-aided sentiment detection
  • Maria about IR and semantics
  • Niels about corpora and lexicography
  • Caroline about introductory maths
  • Nomi? Anas said that you said that you might do something - what and how much?

Don't have time, but would talk if nobody else is found:

  • Aleks about his internship (he promised something "tactile" - what exactly is the topic, again?)
  • Laura about her internship (taxonomy from Wikipedia, at EML - maybe not a good idea if the EML people tell the same stuff)
  • Anne: about her internship (stemming and OCR)

All the people listed under the various info sections might be asked to turn their contributions into short talks, too.

Talk by EML

EML Research gGmbH

  • Place: Not known yet

Info Sections

Each section will give a short intro and is to be manned by two of us. Please volunteer.

Linguistics

Volunteers: Anonymous, Anonymous

Presentation of intriguing examples, most likely from German, since most attendees are going to be German. Ideas include:

  • Collection of marked sentences in Sternefeld 2006 - initiate discussion about their grammaticality (e.g. "weil es wird aufhören können zu regnen" vs. "weil es hätte aufhören müssen zu regnen", "den Kuchen bäckt die Mutter und isst der Franz" vs. "den Kuchen bäckt die Mutter und isst der Franz Kaugummi")
  • Presenting ambiguities in languages
  • Show how different languages can be (there is an excellent example of Chinese weirdness here). We should only mention languages that people can actually learn in Tübingen. Good candidates for weirdness are certainly Old Irish and Nahuatl. We could present one language from every major typological category (e.g. Turkish for agglutinative, Icelandic or Old Irish for inflectional, Chinese for isolating, and Nahuatl for (moderately) polysynthetic).

On the basis of these examples, we can try to justify bracketing patterns and tree structures and present them.

Mathematics

Volunteers: Johannes, Caroline?

The station to convince the mathematically-minded of our program.

At this station, people will be able to play around with a few mathematical concepts and tools that we use every day. It is hard to assess how much mathematical background people will have, so we should be prepared to explain everything from scratch. Offering a broad overview rather than a few little gems should help if some parts turn out to be harder to follow than expected, and it also minimizes the risk of boring the audience.

I know that I am probably proposing way too much here. Please tell me which of these ideas you consider suitable, or suggest additional ones.

On the whole, I suggest concentrating on three major topics:

1. Theoretical Computer Science

  • demonstrate finite-state technology by means of a transducer that encodes some fancy morphological rules, preferably something German such as subjunctive inflection or plural forms for certain noun classes; perhaps use a graphical tool to project the FST onto a wall and let it process random strings?
  • explain the canonical "S --> NP VP" style toy CFG and discuss how it describes a language (introduce notions such as syntactic structure, derivation, ambiguity etc.)
  • take this toy CFG to introduce CYK parsing and let people fool around a bit with it
  • explain why it is not wise to simply try out all alternatives until the solution is found; this could be a good way of introducing complexity classes
  • mention some undecidable problems and point out intuitively why they are undecidable
  • create some confusion and mystery about NP-completeness and the P=NP problem
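For the CYK demo, here is a minimal Python sketch, assuming a hypothetical toy grammar that is already in Chomsky normal form (all rules, words and function names below are made up for illustration). Instead of trying out all analyses one by one, the chart stores for every span how many ways each category can cover it, so the two readings of the classic PP-attachment sentence are counted without being enumerated separately.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (B, C) -> set of A for rules A -> B C.
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"): {"PP"},
}
# Hypothetical lexicon: word -> set of preterminal categories.
LEXICON = {
    "John": {"NP"}, "saw": {"V"}, "the": {"Det"}, "a": {"Det"},
    "man": {"N"}, "telescope": {"N"}, "with": {"P"},
}

def cyk_count(words, start="S"):
    """Count the parse trees of `words` rooted in `start` (0 = not in the language)."""
    n = len(words)
    # chart[i][j][A] = number of ways category A can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                left, right = chart[i][k], chart[k][j]
                for (b, nb), (c, nc) in product(left.items(), right.items()):
                    for a in BINARY.get((b, c), ()):
                        chart[i][j][a] += nb * nc
    return chart[0][n][start]

print(cyk_count("John saw the man with the telescope".split()))  # 2 (PP attachment)
print(cyk_count("John saw the man".split()))                     # 1
```

The three nested loops over span length, start position and split point make the algorithm cubic in sentence length (times the grammar size), which could serve as the contrast to naively trying out all alternatives.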

2. Logic

  • introduce the basic set-theoretic notions and state some common-sense theorems
  • informally introduce basic predicate logic (boolean connectives, quantifiers etc.)
  • demonstrate how useful FOL is for expressing facts about objects and their relations ("model theory")
  • introduce the canonical scope ambiguity example (ExAy vs AxEy) to motivate its use in formal semantics
  • maybe show the Peano axiomatization for natural numbers (not really CL-related, but nice to discuss notions like axioms, models etc.)
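To illustrate the scope ambiguity model-theoretically, a small Python sketch could evaluate both readings of "Every student read some book" in a made-up model (the domain, the names and the relation are invented for the example). The quantifiers map directly onto Python's all() and any().

```python
# Hypothetical toy model for "Every student read some book".
STUDENTS = {"anna", "ben"}
BOOKS = {"syntax", "semantics"}
READ = {("anna", "syntax"), ("ben", "semantics")}  # pairs (reader, book)

def every_some(students, books, read):
    """Surface scope: ∀x (student(x) → ∃y (book(y) ∧ read(x, y)))."""
    return all(any((x, y) in read for y in books) for x in students)

def some_every(students, books, read):
    """Inverse scope: ∃y (book(y) ∧ ∀x (student(x) → read(x, y)))."""
    return any(all((x, y) in read for x in students) for y in books)

print(every_some(STUDENTS, BOOKS, READ))  # True
print(some_every(STUDENTS, BOOKS, READ))  # False
```

In this model each student read a different book, so the ∀x∃y reading is true while the ∃y∀x reading is false; adding one book that everybody read would make both readings true.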

3. Discrete Mathematics

  • introduce graphs and especially trees, explaining how to formalize them
  • introduce the concepts of recursion and induction by proving some trivial property of trees
  • combinatorics, e.g. "How many ways are there to bracket an expression?"
  • some illustrative example of combinatorial explosion, perhaps with some hints on how to avoid it
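For the bracketing question, a short Python sketch could enumerate all full binary bracketings recursively and compare the count with the Catalan numbers, the standard answer: n + 1 factors can be bracketed in C_n = (2n choose n) / (n + 1) ways. The recursion mirrors the recursive definition of binary trees (a tree is either a leaf or a pair of subtrees), which is also the structure an induction proof over trees would follow. The function names are our own.

```python
from math import comb

def bracketings(items):
    """Return all full binary bracketings of a sequence, as nested strings."""
    if len(items) == 1:
        return [items[0]]                      # a single item is a leaf
    result = []
    for k in range(1, len(items)):             # split point: left part is items[:k]
        for left in bracketings(items[:k]):
            for right in bracketings(items[k:]):
                result.append(f"({left} {right})")
    return result

def catalan(n):
    """Closed form C_n = (2n choose n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

print(bracketings(["a", "b", "c"]))  # ['(a (b c))', '((a b) c)']
print(catalan(9))                    # 4862 bracketings for 10 items
```

The counts grow roughly by a factor of four per additional item, which could serve as the example of combinatorial explosion; chart-based methods such as CYK avoid listing all of them.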

Text Mining

Volunteers: Kilian, Maria

How do search engines work? What's a (linguistic) corpus? Ideas:

  • Present an annotated corpus with a cool interface (latest SPLICR alpha version maybe)
  • Automatic text mining (possible demo application: WERTi)
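To answer "How do search engines work?", one core idea is the inverted index. Below is a minimal Python sketch, assuming simple word-character tokenization, no ranking and made-up example documents; it is not how SPLICR or WERTi work internally, and real engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split into runs of word characters (Unicode-aware, so 'Tübingen' stays whole)."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Boolean AND query: ids of the documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {  # hypothetical example documents
    1: "Computational linguistics in Tübingen",
    2: "Corpus linguistics and lexicography",
    3: "Tübingen is a university town",
}
index = build_index(docs)
print(search(index, "linguistics"))           # {1, 2}
print(search(index, "Tübingen linguistics"))  # {1}
```

The point for visitors: the engine does not read all documents at query time; it looks up precomputed term lists and intersects them.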

Programming

Volunteers: Aleks, Anonymous

This section will give visitors a short introduction to computer science as practised in CL. It will contain an introduction to systematic problem solving (probably algorithms, though people have voted to put that into the mathematics/logic section), including but not limited to:

  • Object Oriented programming
  • Presentation of typical homework assignments or projects (passivator)
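As a taste of object-oriented programming in the spirit of the Passivator homework, here is a toy Python sketch. It is not the actual Passivator; the class, the mini-lexicon and the restriction to simple present-tense clauses with a singular object are assumptions made for illustration. The class bundles the data (subject, verb, object) with the operations on it.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicon: 3rd person singular verb -> past participle.
PARTICIPLES = {"writes": "written", "reads": "read", "sees": "seen"}

def capitalize_first(text: str) -> str:
    """Uppercase only the first character (str.capitalize would lowercase the rest)."""
    return text[:1].upper() + text[1:]

@dataclass
class Clause:
    """A simple transitive clause: subject, verb, object."""
    subject: str
    verb: str
    obj: str

    def active(self) -> str:
        return f"{capitalize_first(self.subject)} {self.verb} {self.obj}."

    def passive(self) -> str:
        if self.verb not in PARTICIPLES:
            raise ValueError(f"no participle known for {self.verb!r}")
        participle = PARTICIPLES[self.verb]
        return f"{capitalize_first(self.obj)} is {participle} by {self.subject}."

clause = Clause("Marie", "writes", "a tutorial")
print(clause.active())   # Marie writes a tutorial.
print(clause.passive())  # A tutorial is written by Marie.
```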

Algorithms

Volunteers: Anas, Anonymous

This was Anas's idea: a way to explain algorithms without showing or using any "scary" code.

  • Sorting and search algorithms could be turned into an activity game. Two teams try to sort a chaotic array of objects in as few steps as possible; they can either follow one of the standard algorithms or rely on human intuition. Starting from the results, one could then introduce notions such as amortized analysis, divide-and-conquer, worst-case behaviour and average-case behaviour.
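For us organizers (not for the visitors), a small Python sketch could help prepare the follow-up discussion by counting comparisons, which play the role of the "steps" in the game. It contrasts insertion sort with merge sort (divide-and-conquer) on a reversed list, a worst case for insertion sort; the choice of these two algorithms is ours, not part of the original plan.

```python
def insertion_sort(items):
    """Return (sorted copy, number of comparisons)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:                 # move a[j] left until it is in place
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return a, comparisons

def merge_sort(items):
    """Divide and conquer: return (sorted copy, number of comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, cl = merge_sort(items[:mid])
    right, cr = merge_sort(items[mid:])
    merged, comparisons = [], cl + cr
    i = j = 0
    while i < len(left) and j < len(right):   # merge the two sorted halves
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

data = list(range(16, 0, -1))        # 16, 15, ..., 1
print(insertion_sort(data)[1])       # 120
print(merge_sort(data)[1])           # 32
```

On a reversed list of 16 items, insertion sort needs 16 · 15 / 2 = 120 comparisons, while merge sort needs 32 here and stays below n log2 n = 64 for any ordering of 16 items.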

Food and Drinks

During the whole program, or at least most of it, food and drinks will be served in the hall. The faculty will pay for this as well.

  • Place: Somewhere in the hall. I think the first floor makes sense.

Fachschaft Workspace

Ideas / Sources

There seems to be a very nice introduction to CL on the pages of CL in Stuttgart. Anyone willing to share a link? Also, Hubert Truckenbrodt's lecture notes for the introduction to phonology and Ede Zimmermann's lecture notes for the introduction to semantics are very easy to understand and contain many good examples.

Open tasks or TODO

This is the section for all the small things that we can or have to do. Volunteers for these tasks should contact Desi as soon as possible for more information.

  • Talk to the Tübingen press (television, newspapers, radio...): find a contact and ask whether they could cover us as news
  • Make labels with the names of all of us (SFS)
  • Organize food and drinks
  • Send around posters by email - done by Maria (2nd semester)
  • Stick posters around in Tübingen - done by Iliana
  • Guest Book - done by Desi
  • Flyers and info material to take away, containing information such as:
    • Application deadline for the ISCL program (July 15)
    • Necessary documents for the application
    • Some FAQs from the SfS webpage
    • Information about CL in general (should be more than on the poster)
    • Contact data (email, webpage, ...)
  • Some promotional giveaways ("Werbegeschenke") would also be nice to have
  • Orientation sheets (maps and posters showing the way to the different rooms)