TAU2014/Ćwiczenia 1

Redmine

We will use Redmine for our classes. Please:

  • log in to Redmine (just log in; nothing interesting will happen for the time being),
  • tell me or send me an e-mail so that I can add you to the project.

Task M (Zadanie M)

Choose one translation direction. You can choose any direction, provided that:

  • it is not enpl (English-Polish), since this direction will get special treatment during our classes,
  • a parallel corpus is available for the language pair in OPUS, the open parallel corpus.

After choosing the language direction, go to Redmine and add an issue (“Zadanie”) with the subject as follows (Spanish-Polish is used as an example):

[espl] - Task M

(Use two-letter ISO 639-1 language codes.) While adding the issue, set “assigned to” to yourself.

Each direction can be taken by at most one person! After adding the task, make sure nobody else added a task for the same direction a moment before you. If somebody did, sorry, you have to choose another direction.

Note that, for example, espl is a different direction from ples. (We are choosing directions, not language pairs.)
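The naming rules above can be sketched in a few lines of Python. This is only an illustration: the function names `direction_code` and `issue_subject` are made up for this example and have nothing to do with Redmine itself, and the check only verifies that each code is two ASCII letters, not that it is a real ISO 639-1 code.

```python
def direction_code(source, target):
    """Join two two-letter language codes into a direction code such as 'espl'."""
    source, target = source.lower(), target.lower()
    for code in (source, target):
        # Only the shape is checked here, not membership in ISO 639-1.
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"not a two-letter language code: {code!r}")
    if source == target:
        raise ValueError("source and target languages must differ")
    if source + target == "enpl":
        raise ValueError("enpl is reserved for the classes, choose another direction")
    return source + target


def issue_subject(source, target):
    """Build the issue subject in the format used on this page."""
    return f"[{direction_code(source, target)}] - Task M"


print(issue_subject("es", "pl"))  # [espl] - Task M
print(issue_subject("pl", "es"))  # [ples] - Task M
```

The order of the arguments matters: Spanish-to-Polish and Polish-to-Spanish give two different subjects, so they count as two separate tasks.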

And the task is: train a basic Moses translator for your direction (you don’t have to use a large parallel corpus, 100K sentences is OK) and evaluate the results with BLEU. Show me the working system and discuss (on the Redmine task page) what is good and what is bad about your translator. Give some interesting or funny examples. Note: you don’t have to tweak the system, just train the basic one; it does not have to be very good.
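To give a feeling for what a BLEU score measures, here is a minimal Python sketch of corpus-level BLEU with a single reference per sentence. It assumes whitespace-tokenized input, n-grams up to length 4 with uniform weights, and no smoothing; the function names are invented for this example. It is meant for understanding the metric only. For the actual evaluation you would normally rely on an established scorer (the Moses distribution includes a `multi-bleu.perl` script), and the numbers may differ slightly because of tokenization.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0..1), one reference per hypothesis, no smoothing."""
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # n-grams produced by the system, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # An n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the whole score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


system_output = ["the cat sat on the mat", "there is a dog in the garden"]
reference = ["the cat sat on the mat", "a dog is in the garden"]
print(f"BLEU = {100 * corpus_bleu(system_output, reference):.2f}")
```

Scores are usually reported multiplied by 100, as in the last line. Because the counts are summed over the whole test set before the precisions are computed, the result is not the same as averaging sentence-level scores.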

Conditions:

  • this task is obligatory: you cannot do any other task until you complete Task M,
  • the task should be completed no later than at the 4th meeting (October 29th); if it is completed then, you get 30 points,
  • if the task is completed at the 3rd meeting (October 22nd), you get 35 points,
  • if the task is completed later, you get 0 points, but it is still obligatory!

(Arch)Linux

We are going to use Linux during our classes (no, using Windows for Natural Language Processing is a bad idea, don’t even try).

For our 2nd meeting (October 15th), you should have Linux installed on your computer, or at least be ready for it (i.e. have chosen a Linux distribution, prepared a partition, etc.). I can help you with that.

Arch Linux will be the preferred distribution:

  • KISS, lightweight, bleeding edge, the Arch Way
  • a powerful package manager (Pacman); it’s dead easy to create a package (much, much easier than in Ubuntu/Debian)
  • lots of Natural Language Processing/Machine Translation tools packaged (more than for Ubuntu)

Installing Arch Linux on your machine is recommended (unless you have a specific reason to use something else, please choose Arch Linux; it’ll be easier for our classes).

Useful links:

What’s next?

If you already have Linux, you can proceed to [[Ćwiczenia 2]].