Relay the delay!
Note: you may make at most 8 submissions to this problem. This is because there is no public/private split of the test data.
Zimmer the swimmer has a tiring job: he works as a delay-relayer. This means that he has to swim
between all cities in Germany to keep them informed about
potential delays in the national train network. Although Zimmer
likes swimming in general, this task has gotten way out of hand
in the last couple of years with Zimmer being forced to deliver
the news of delays more efficiently than Santa Claus delivers
presents (even though he, in contrast to Mr. Claus, doesn’t
have a magical sleigh!!). The number of delays has been
growing in the last couple of years and Zimmer is now starting
to give up on the work he once loved. Sadly, Zimmer has
therefore failed to label a couple ($3,500,000$) of delays over the last year
and a half. Zimmer wants you to write an AI model that, given
the train station and the scheduled arrival time of the train
at the station, guesses the delay of the train. To help you, he
has given you all the other data, including the delays of the ones he
did label.
Of course, not all trains are delayed; they could also be
early!
Input
The attachments contain the following files:
- data.zip is a zip file that contains the following files:
  - train.csv - training data points consisting of the train station, the timestamp when the train should have arrived, and how big the delay was (in minutes).
  - test.csv - test data, consisting of only the train station and timestamp when the train should have arrived for a given train.
  - license.txt - a license we need to provide to use this data.
- baseline.ipynb - A basic solution to the task. It appends its own source code to the submission.
- print_source_code.py - A utility script to print your source code as a comment, useful for including your solution in the submission file. Does not work inside a Jupyter Notebook.
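As one possible way to start working with these files, the sketch below fits a mean delay per station from rows shaped like train.csv. The column names `station` and `delay` are assumptions (the statement does not list the CSV header), the tiny in-memory CSV is made-up illustration data, and this is not the approach taken in baseline.ipynb.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: the statement does not give the CSV header.
STATION_COL = "station"
DELAY_COL = "delay"


def fit_station_means(rows):
    """Return (per-station mean delay, global mean delay) from labelled rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        station = row[STATION_COL]
        sums[station] += float(row[DELAY_COL])
        counts[station] += 1
    total = sum(counts.values())
    global_mean = sum(sums.values()) / total if total else 0.0
    means = {s: sums[s] / counts[s] for s in sums}
    return means, global_mean


def predict(station, means, global_mean):
    """Use the station's mean; fall back to the global mean for unseen stations."""
    return means.get(station, global_mean)


# Made-up sample standing in for train.csv.
sample = io.StringIO("station,delay\nA,2\nA,4\nB,-1\n")
means, global_mean = fit_station_means(csv.DictReader(sample))
```

With the real data, the `io.StringIO` object would be replaced by the opened train.csv file, and the two column constants adjusted to match its actual header. A mean is a natural first guess here because, for a fixed group of trains, it is the constant that minimizes the squared error.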
Output
For each train record (train station, time) in test.csv, output a line containing a single integer: the guessed delay for that train.
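A minimal sketch of producing output in this shape, assuming the predictions are held as a list of numbers in the same order as the rows of test.csv. The function name `format_predictions` is hypothetical, and the sketch assumes that early trains are represented by negative delays.

```python
def format_predictions(predictions):
    """Round each predicted delay to the nearest integer, one value per line."""
    return "".join(f"{int(round(p))}\n" for p in predictions)


# Made-up predictions; a negative value stands for an early train (assumption).
output_text = format_predictions([1.2, -3.7, 0.0])
```

Rounding is needed because a model trained to minimize squared error typically produces fractional guesses, while the output must be integers.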
Scoring
Your solution will be evaluated based on how accurately you guessed the delays compared to the actual ones.
The scoring is calculated as follows:
- For each train, your error is the squared difference between your guess and the real delay.
- More exactly, assume that $x$ is your guess and that $y$ is the real delay. Your error for that train is $(x-y)^2$.
- Your overall error $S$ is the mean of all individual errors: the total error divided by the number of trains.
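The error computation described above can be sketched as follows, for example to estimate $S$ locally on a held-out part of train.csv. The function name `mean_squared_error` and the sample numbers are illustrative only.

```python
def mean_squared_error(guesses, actual):
    """Compute S: the mean of (x - y)^2 over all trains."""
    if len(guesses) != len(actual):
        raise ValueError("guesses and actual must have the same length")
    if not guesses:
        raise ValueError("at least one train is required")
    total_error = sum((x - y) ** 2 for x, y in zip(guesses, actual))
    return total_error / len(guesses)


# Made-up example: individual errors are 0, 4 and 9, so S = 13 / 3.
s = mean_squared_error([1, 2, 3], [1, 4, 0])
```

Because the errors are squared, a single guess that is far off costs much more than several guesses that are slightly off.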
Your final points are determined by comparing your error $S$ to a base error (note: NOT the error achieved by the baseline solution) and a best error:
- Base Error: $90$
- Best Error: $80$
The score is calculated using the formula:
\[ \text{Points} = 100 \cdot \max \left(0,\; \min \left(1,\; \frac{90 - S}{90 - 80}\right)\right) \]
This means:
- If your error $S$ is $\ge 90$, you get 0 points.
- If your error $S$ is $\le 80$, you get 100 points.
- Otherwise, you get a score between 0 and 100 proportional to your improvement over the base error.
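As a quick check of the formula, the following sketch converts an error $S$ into points using the base and best errors stated above. The function and constant names are illustrative.

```python
BASE_ERROR = 90.0  # errors at or above this give 0 points
BEST_ERROR = 80.0  # errors at or below this give 100 points


def points(s):
    """Map an overall error S to points, clamped to the range 0..100."""
    fraction = (BASE_ERROR - s) / (BASE_ERROR - BEST_ERROR)
    return 100.0 * max(0.0, min(1.0, fraction))


# An error halfway between base and best earns half the points.
halfway = points(85.0)
```

For instance, an error of $85$ gives $100 \cdot \frac{90 - 85}{10} = 50$ points, and each further reduction of the error by $1$ is worth $10$ more points until $S$ reaches $80$.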
Testing
During the competition, your solution will be scored on all of the test data. Your score during the competition is the same as your final score.
