The competition is focused on innovative approaches to identifying job duplicates, with the aim of more accurately estimating the number of currently open jobs on the market. Certain aspects of the competition have therefore been left open so that teams can explore different, more effective approaches.
We would advise teams to focus on the actual content. For example, two job postings can have different titles and descriptions (in terms of wording) but advertise the same job position, whereas positions with similar descriptions in different countries are considered different job positions, since they would be filled by different people.
In order to aid competitors in creating solutions, we are providing the following content descriptions:
Full duplicates represent the posting of the same job position as two separate entries. Two job advertisements can be considered as full duplicates when all of the values are the same.
Below are some examples:
Partial duplicates are those where one of the fields is "missing". For instance, if one job advertisement has a “missing” company name, but all other information is the same as in the second job advertisement, then the two job advertisements are partial duplicates. If more than one characteristic is missing, then they are non-duplicates. In other words, two job advertisements are considered partial duplicates if they describe the SAME job position AND one job advertisement is missing a characteristic of the other one.
As partial duplication is not a transitive relationship, it might
be the case that 1 is deemed to be a partial duplicate of 2 and 2
is a partial duplicate of 3 – whereas 1 does not meet the
criterion to be a partial duplicate of 3 – in which case only
1,2,PARTIAL
2,3,PARTIAL
would be submitted.
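The non-transitivity described above can be illustrated with a minimal sketch. It rests on assumptions that are not part of the competition rules: advertisements are modelled as dictionaries, a missing field is represented as `None`, the field names are hypothetical, and exact equality of the remaining fields stands in for "describes the same job position".

```python
from itertools import combinations

MISSING = None  # assumption: a missing field is represented as None


def is_partial_duplicate(ad_a, ad_b):
    """True if exactly one field is missing on one side and all other fields match."""
    if ad_a.keys() != ad_b.keys():
        return False
    missing_fields = 0
    for field in ad_a:
        a, b = ad_a[field], ad_b[field]
        if a == b:
            continue
        if a is not MISSING and b is not MISSING:
            return False  # a genuine difference, not a missing value
        missing_fields += 1
    return missing_fields == 1


# Hypothetical advertisements: 1 lacks the company, 3 lacks the location.
ads = {
    1: {"title": "Data Engineer", "company": MISSING, "location": "Berlin"},
    2: {"title": "Data Engineer", "company": "Acme", "location": "Berlin"},
    3: {"title": "Data Engineer", "company": "Acme", "location": MISSING},
}

pairs = [
    f"{i},{j},PARTIAL"
    for i, j in combinations(sorted(ads), 2)
    if is_partial_duplicate(ads[i], ads[j])
]
print("\n".join(pairs))
```

Under these assumptions, advertisements 1 and 3 differ in two fields, so only the pairs 1,2 and 2,3 are reported.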
Temporal duplicates are two job advertisements posted at two different times.
Semantic duplicates describe the same job position but are worded differently.
For example, if the title, description and other fields of Job#1 carry the same semantic information as the title, description and other fields of Job#2, the two jobs are semantic duplicates.
The full duplicate is the least specific type, the semantic duplicate is more specific, and the temporal and partial duplicates are the most specific types.
If multiple labels are possible for a pair, please submit the more specific one, as follows:
FULL and SEMANTIC ⇾ SEMANTIC
FULL and TEMPORAL ⇾ TEMPORAL
FULL and PARTIAL ⇾ PARTIAL
SEMANTIC and TEMPORAL ⇾ TEMPORAL
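The four rules above can be encoded as a small lookup table. This is only a sketch: the function name `resolve_label` is hypothetical, and combinations that the rules do not list (for example PARTIAL and TEMPORAL) are deliberately rejected rather than guessed.

```python
# Only the combinations stated in the rules above are encoded.
RESOLUTION = {
    frozenset({"FULL", "SEMANTIC"}): "SEMANTIC",
    frozenset({"FULL", "TEMPORAL"}): "TEMPORAL",
    frozenset({"FULL", "PARTIAL"}): "PARTIAL",
    frozenset({"SEMANTIC", "TEMPORAL"}): "TEMPORAL",
}


def resolve_label(candidates):
    """Return the single label to submit for a pair with several candidate labels."""
    labels = frozenset(candidates)
    if len(labels) == 1:
        return next(iter(labels))  # nothing to resolve
    if labels in RESOLUTION:
        return RESOLUTION[labels]
    # Assumption: combinations not covered by the stated rules are left to the team.
    raise ValueError(f"no rule stated for combination: {sorted(labels)}")


print(resolve_label(["FULL", "SEMANTIC"]))
print(resolve_label(["SEMANTIC", "TEMPORAL"]))
```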
Transitivity is not inferred, so the teams need to provide all individual pairs they deem to be duplicates.
For instance, assuming that the team considers the following IDs to be full duplicates: [1, 2, 3, 4] then every pair has to be explicitly indicated, as follows:
1,2,FULL
1,3,FULL
1,4,FULL
2,3,FULL
2,4,FULL
3,4,FULL
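Because every pair has to be listed, a group of n duplicates expands into n(n-1)/2 lines. A minimal sketch of that expansion follows; the helper name `expand_group` is hypothetical, and the `id,id,LABEL` line layout is taken from the example above.

```python
from itertools import combinations


def expand_group(ids, label):
    """List every unordered pair in a group of duplicates as 'id1,id2,LABEL'."""
    return [f"{a},{b},{label}" for a, b in combinations(sorted(ids), 2)]


lines = expand_group([1, 2, 3, 4], "FULL")
print("\n".join(lines))
```

For the group [1, 2, 3, 4] this produces the six lines shown above.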
In the dataset, we provide the job advertisements in the form they are found on the web. We do not format the title or body of the advertisement, as a result of which the body can, for example, contain newline characters.
The number of records in the dataset is 112 006. Each team is required to parse the data set correctly.
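Embedded newlines are a common reason for a wrong record count when a file is split line by line. The sketch below assumes, purely for illustration, that the data is a CSV file with quoted fields; the column names `id`, `title` and `description` and the sample rows are hypothetical and do not describe the actual dataset.

```python
import csv
import io

# Hypothetical sample: the description of the first record spans two lines.
SAMPLE = (
    "id,title,description\n"
    '1,Data Engineer,"Build pipelines.\nMaintain ETL jobs."\n'
    '2,Nurse,"Night shifts"\n'
)


def read_ads(handle):
    """Parse CSV records; quoted fields may contain newline characters."""
    return list(csv.DictReader(handle))


rows = read_ads(io.StringIO(SAMPLE))
print(len(rows))                # number of records, not number of text lines
print(repr(rows[0]["description"]))
```

A simple sanity check after parsing the real file is to compare the number of parsed records with the 112 006 stated above. When reading from disk, Python's `csv` documentation recommends opening the file with `newline=""`.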
The teams can make up to 10 successful submissions, with the intention of enabling them to improve their results by adjusting their approach. The purpose of the leaderboard is to enable the teams to (1) compare their results with those of other teams and (2) compare their own results between submissions, thereby improving their method to reach better results. For each team, the submission which yields the best result will be in the running for receiving one or more awards.
Yes, the European Statistics Awards for Nowcasting are open for participants from all continents. Please refer to the eligibility overview sections of the three currently ongoing competitions:
Unlike a traditional hackathon, teams do not get any particular data to use for the European Statistics Awards for Nowcasting. To edge out your competitors, you are free to use whichever auxiliary data you think are the most predictive ones – but finding them will be a challenge!
You can only submit nowcasts for countries. To maximise your chances to win, we encourage you to submit for as many countries as possible.
If you did not register by 29 September, you will not be able to submit any nowcast for September 2022. However, all teams that submit six consecutive nowcasts are still in the running (please refer to the Country Score as defined in the Glossary), so you can still take part in the EU nowcasting awards competition even if you sign up in October or November.
Still, for models of equal predictive value, teams that begin submitting for September have a somewhat better chance of achieving a higher accuracy score, since it is the best 6-month streak of nowcasts that counts!
To join an existing team:
Only team leaders can designate the members of a team. If you are
a group of colleagues organising a team together, please make
sure that the team leader adds you to the team – that is the
only way for you to join it.
To add members to an existing team (if you are the team leader):
If you wish to have more than 1 member, you may participate with up
to 4 other individuals. After logging into statistics-awards.eu,
go to Settings (top right-hand corner). In the text box
“Team members”, add the first name and last name (surname) of each
member, together with their email, nationality and country of residence.
Please make sure that each member is added on a new line in the text box.
To create a new team:
You may form your own team, with you being the only member, by registering at:
https://statistics-awards.eu/accounts/signup/.
You are not obliged to make a submission for September 2022. Your first submission can be as late as November 2022. However, you are strongly encouraged to make your submissions as soon as possible, as this will increase your chances of success and lower the probability that a submission will be invalid due to the late publication of the April 2022 European statistics for a time series.
To get a higher accuracy score, you might modify the model that you use for an entry. Please note, however, that if you are in the running for the reproducibility awards, then the best integrity scores are awarded if you consistently apply the same model.
In order to make your submission, create a zip file containing
Zip the files directly, not in a folder.
Next,
The system returns an error in case you are uploading the json file instead of a zip file. In the zip file, the files have to be directly in the root, not in a subfolder.
For further details, see how to submit.
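One way to make sure the files end up in the root of the archive, and not in a subfolder, is to build the zip file programmatically. The following is a sketch only: the function name `build_submission` is hypothetical, and the files to include are whatever the submission instructions require.

```python
import zipfile
from pathlib import Path


def build_submission(files, zip_path):
    """Write the given files into the root of a zip archive, dropping any folders."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            path = Path(file)
            # arcname=path.name stores only the file name, not its directory
            archive.write(path, arcname=path.name)
    return Path(zip_path)
```

Calling `build_submission(["work/entry.json", "work/description.txt"], "submission.zip")` (hypothetical file names) stores the two files as `entry.json` and `description.txt` at the top level of the archive.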
Yes, the submission must be made in JSON format – but the JSON files must be embedded in a zip file.
For further details, see how to submit.
Yes – in order to be eligible to compete for the Accuracy award, each monthly submission must be accompanied by an Accuracy_Approach_description file that outlines the approach. You do not need to change it between submissions if you are using the same approach. If you adjust your method between submissions, then the Accuracy_Approach_description file should be updated to reflect this.
Only the last submission counts – it supersedes all previous submissions.
Before the first release of European statistics for a given reference month, the leaderboard
Once the European statistics have been released (and the entries have been evaluated) for the time series in question, the leaderboard
Questions can be sent to the competition info email at any time during the competition. All questions will be answered in a timely manner. Those relevant for other competitors will be added to the FAQ. However, in order to avoid bias, questions that arrive within the 3 days before a submission deadline will be answered after that deadline. Following each monthly deadline, the answering of questions will resume normally.
New teams are confirmed one day after registration. If a registration is made on the day of a registration deadline, it is still regarded as meeting the competition rules and all eligibility requirements, even though the confirmation is issued on the following day.