Multinational Enterprise (MNE) Groups Data Discovery challenge:
Identify web sources with MNE Group data
Multinational enterprise groups play a major role in the European economy. In all EU and EFTA countries, they contribute substantially to the production of goods and services, employment and investments. Due to their importance, they are closely monitored by the National Statistical Institutes and Eurostat. According to the data of the Euro Groups Register (the European statistical business register on MNE groups created by the European Statistical System and managed by Eurostat), for the reference year 2022, MNE groups employed over 47 million people in EU-EFTA countries. This means that around 28 % of people employed in Europe worked for a multinational enterprise group. The majority (82 %) of them worked in a small number of large multinational enterprise groups.
The goal of the Multinational Enterprise Group Data Discovery Challenge is to develop approaches that automatically identify sources of annual financial data on the World Wide Web for MNE Groups.
Participants will receive a list of 200 MNE Groups. The discovered sources of financial data and reports should be as recent as possible, credible and trustworthy, and contain as much financial data as possible.
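For illustration only, the sketch below shows one possible shape of an automated discovery pipeline in Python: build a search query for each MNE Group, collect candidate URLs through whichever search API or crawler a team chooses (the search_web placeholder is not provided by the challenge), and rank the candidates with a simple heuristic. The keywords, the scoring rule and all function names are assumptions made for this example, not part of the challenge specification.

# Illustrative sketch of automated discovery of financial-report sources.
# search_web() is a placeholder for whatever search API or crawler a team
# chooses; the scoring heuristic and keywords are assumptions for this example.
import re

def search_web(query: str, max_results: int = 10) -> list[str]:
    """Placeholder: return candidate URLs for a query via your chosen search API."""
    raise NotImplementedError("plug in a search API or crawler here")

def score_url(url: str) -> int:
    """Crude heuristic: prefer PDFs and investor-relations style paths."""
    score = 0
    if url.lower().endswith(".pdf"):
        score += 2
    if re.search(r"investor|annual[-_ ]?report|financial", url, re.IGNORECASE):
        score += 1
    return score

def discover_sources(group_name: str, year: int = 2023) -> list[str]:
    """Return candidate report URLs for one MNE Group, best candidates first."""
    query = f"{group_name} annual report {year}"
    candidates = search_web(query)
    return sorted(set(candidates), key=score_url, reverse=True)

A real submission would of course replace the placeholder with an actual search or crawling component and a more robust ranking of the candidate sources.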
Timeline
Teams of up to 5 members are invited to register by 22 April 2025. All registered teams will receive the dataset on 23 April 2025 and begin competing simultaneously.
The competition will run for one month until 23 May 2025. Teams will have additional time until 30 June 2025 to submit a description of their developed approach.
Important Dates
Competition opening for registrations:
- 18 March 2025
Registration deadline:
- 22 April 2025
Data provision:
- 23 April 2025 – Data provided to all registered teams simultaneously
Submission deadlines:
- 23 May 2025 – Final Accuracy award submission deadline
- 30 June 2025 – Final Reusability and Innovativeness documentation submission deadline
Awards
Accuracy Award
First Prize EUR 10 000
Second Prize EUR 5 000
Third Prize EUR 3 000
Reusability Award
First Prize EUR 10 000
Second Prize EUR 5 000
Third Prize EUR 3 000
Innovativeness Award
First Prize EUR 5 000
Second Prize EUR 3 000
Third Prize EUR 1 000
Multinational Enterprise (MNE) Groups Data Extraction challenge:
Extract the MNE Group data
Multinational enterprise groups play a major role in the European economy. In all EU and EFTA countries, they contribute substantially to the production of goods and services, employment and investments. Due to their importance, they are closely monitored by the National Statistical Institutes and Eurostat. According to the data of the Euro Groups Register (the European statistical business register on MNE groups created by the European Statistical System and managed by Eurostat), for the reference year 2022, MNE groups employed over 47 million people in EU-EFTA countries. This means that around 28 % of people employed in Europe worked for a multinational enterprise group. The majority (82 %) of them worked in a small number of large multinational enterprise groups.
The goal of the Multinational Enterprise Group Data Extraction Challenge is to develop approaches that automatically extract important annual financial data of MNE Groups.
Participants will receive a list of 200 MNE Groups. The extraction of MNE Group financial data should be done in an automated way; the data should be as recent as possible, credible and trustworthy, and extracted in the correct format.
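Purely as an illustration of what an automated extraction step could look like, the sketch below downloads a single page and pulls out crude label/value/unit matches with a regular expression. The target labels, the pattern and the output fields are assumptions for this example; the actual required output format is defined by the challenge documentation, and the example URL is hypothetical.

# Illustrative sketch of automated extraction from a single web page.
# The target labels and the regular expression are assumptions for this
# example; the challenge defines the real required output format.
import re
import requests
from bs4 import BeautifulSoup

FIGURE_PATTERN = re.compile(
    r"(net turnover|revenue|total assets)\D{0,40}([\d.,]+)\s*(million|billion)?",
    re.IGNORECASE,
)

def extract_figures(url: str) -> list[dict]:
    """Download a page and return crude (label, value, unit) matches."""
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    return [
        {"label": label.lower(), "value": value, "unit": unit or ""}
        for label, value, unit in FIGURE_PATTERN.findall(text)
    ]

# Example call (hypothetical URL):
# print(extract_figures("https://example.com/annual-report-2023"))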
Timeline
Teams of up to 5 members are invited to register by 22 April 2025. All registered teams will receive the dataset on 23 April 2025 and begin competing simultaneously.
The competition will run until 15 June 2025. Teams will have additional time until 30 June 2025 to submit a description of their developed approach.
Important Dates
Competition opening for registrations:
- 18 March 2025
Registration deadline:
- 22 April 2025
Data provision:
- 23 April 2025 – Data provided to all registered teams simultaneously
Submission deadlines:
- 15 June 2025 – Final Accuracy award submission deadline
- 30 June 2025 – Final Reusability and Innovativeness documentation submission deadline
Awards
Accuracy Award
First Prize EUR 10 000
Second Prize EUR 5 000
Third Prize EUR 3 000
Reusability Award
First Prize EUR 10 000
Second Prize EUR 5 000
Third Prize EUR 3 000
Innovativeness Award
First Prize EUR 5 000
Second Prize EUR 3 000
Third Prize EUR 1 000
THE CLASSIFICATION OF OCCUPATIONS FOR ONLINE JOB ADVERTISEMENTS CHALLENGE - The second round of the European Statistics Awards for Web Intelligence
Online job advertisements contain various types of information, including a job description, information about the company looking to hire, job benefits, requirements for job seekers, etc. In order to calculate meaningful statistics, given the data collection method and the size of the online job advertisement datasets, occupational class labels must be assigned to these advertisements. Within the WI CLASSIFICATION CHALLENGE, teams will compete using advanced modelling techniques to develop an efficient and robust automated solution for correctly assigning class labels.
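Purely as an illustration of the kind of baseline a team might start from, the sketch below trains a TF-IDF plus linear classifier with scikit-learn. The example texts and the occupation codes are invented placeholders; the real challenge dataset and occupational taxonomy are provided only to registered teams.

# Illustrative baseline for occupation classification of job ad texts.
# The texts and labels below are placeholders, not challenge data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Senior software developer, Python and cloud experience required",
    "Registered nurse for hospital night shifts",
]
labels = ["2512", "2221"]  # placeholder ISCO-style occupation codes

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["Backend engineer needed, experience with APIs"]))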
The second round of European Statistics Awards for Web Intelligence will begin in June 2024 with registrations open until 15 July 2024.
Timeline
The competition will begin on 1 June 2024 and will run for four months until 30 September 2024. The deadline for registration is 15 July 2024.
Awards
Accuracy Award
First Prize EUR 10 000
Second Prize EUR 5 000
Third Prize EUR 3 000
Reusability Award
First Prize EUR 10 000
Second Prize EUR 5 000
Third Prize EUR 3 000
Innovativeness Award
First Prize EUR 5 000
Second Prize EUR 3 000
Third Prize EUR 1 000
Teams
Teams comprising a maximum of five individuals with diverse backgrounds and expertise in programming and web intelligence are eligible to participate in the competition. This contest presents an exceptional chance to apply your understanding of classification modelling in a real-world setting and potentially receive up to EUR 10 000 for developing the most accurate model. If your team secures the top spot for all three awards, you could earn up to EUR 25 000 in this round.
The Web Intelligence - Deduplication Challenge
The winners of the Web Intelligence - Deduplication Challenge have been announced.
As part of the European Statistics Awards Programme, this challenge aims to stimulate innovation in the area of Web Intelligence for European statistics, focusing on the identification of potential duplicate job postings on websites as a basic condition for producing high-quality statistics from online job advertisements.
Frequently asked questions - Discovery and Extraction Challenges
Are we allowed to share the MNE data that we received on 23 April?
The MNE data are not confidential, so yes, they may be shared and submitted to third-party systems.
Are we allowed to use LLMs?
The use of LLMs is allowed. However, a full description of the developed approach is needed in order to be eligible to compete for a prize.
Simply prompting an LLM is not considered development of an algorithm-based approach that automatically identifies public web sources of annual financial data of MNE Groups, nor of an algorithm-based approach that automatically extracts financial data of MNE Groups.
We want to just compete for the Accuracy Award. Can we skip sending in full documentation?
No, the description of your team’s developed approach is required in order to be eligible to receive any of the competition prizes.
There are multiple deadlines for Reusability and Innovativeness submissions. Which is the one that we need to keep?
The Reusability and Innovativeness documentation can be submitted at any phase of the competition, as long as the final deadline for this submission (30 June 2025) is met.
How many submissions can my team make before the deadline?
Your team is encouraged to make early submissions with dummy data in order to ensure that the required files are in the correct format and that no technical issues arise once you make your final submission.
Each submission made ‘overwrites’ the previous submission. However, please take care to thoroughly check your final submission, as the technical limitation is a maximum of 1 submission per UTC calendar day until the submission deadline.
Is a fully automated URL discovery (within the extraction code) mandatory for the EXTRACTION challenge?
For the EXTRACTION challenge, teams can use any URL irrespective of how the URL was identified (manually or automatically).
Automated URL identification is the goal of the DISCOVERY challenge. Therefore, if a team wants to make a submission for the Discovery challenge, the URLs must be identified in an automated way.
If the same team is participating in the EXTRACTION challenge, automated identification of the URLs is not a prerequisite. The URLs used for the EXTRACTION challenge can:
- be identified using the algorithms of the DISCOVERY challenge
- consist of URLs obtained by other means (URLs already on file with the team members, manually extracted URLs, etc.)
- or be a combination of both.
My team has made a submission and the submission has received only 1 point. Is there something wrong with my submission?
The submissions for the Discovery and Extraction Challenges will not be scored automatically during the submission phase. The scores will be calculated outside of the platform by the evaluation committee after the submission deadline.
The 1 point displayed on the right-hand side is a technical idiosyncrasy of the leaderboard and can be disregarded; it has no relation whatsoever to the score ultimately assigned to the team's submissions.
Frequently asked questions - Deduplication and Classification Challenges
Under which legal system is the NDA signed?
As a rule, EU law applies. The implementation of the terms of use shall be governed by Luxembourg law; the courts in Luxembourg shall have sole jurisdiction to hear any disputes.
In the event of a dispute (e.g. a breach of the NDA), the Commission can take action by filing a complaint or by reporting the breach to the police on the basis of national legislation.
Is it permissible to use the eTranslation tool from the European Commission?
The current NDA does not foresee such a possibility, and using eTranslation would mean losing control over the data, thus breaking the provisions of the NDA.
Moreover, the eTranslation service is not available to everybody, so it would disrupt the level playing field for the other competitors.
Does the question regarding data security issues and the use of the eTranslation tool extend to using other third-party APIs, for instance Google Translate or OpenAI?
Yes, it does. Sending the job advertisement text to third-party API servers makes it accessible to those third parties, which violates the terms of the NDA.
Considering the terms of the NDA, are teams expected to develop their solutions locally on their own machines, or does the possible restriction on third-party APIs extend to spinning up remote GPU machines for model training?
No, you don't have to develop your solution locally. You can use cloud infrastructure as long as access to the data is restricted to those who have signed the NDA. This means you are responsible for ensuring the security of the cloud resources you use. You must control who can access the data and ensure that data transmission between your local machines and the cloud is secure, such as through encryption.
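For example, one possible (not prescribed) way to protect the data in transit is to encrypt the dataset locally before uploading it to a cloud machine and to decrypt it only on machines covered by the NDA. The library choice and file names in the sketch below are illustrative assumptions, not organiser requirements.

# One possible approach (not prescribed by the organisers): symmetric
# encryption of the dataset before it is uploaded to cloud storage.
# File names are placeholders for this example.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # keep this key off shared or public cloud storage
fernet = Fernet(key)

with open("job_ads.csv", "rb") as f:
    encrypted = fernet.encrypt(f.read())

with open("job_ads.csv.enc", "wb") as f:
    f.write(encrypted)

# On the NDA-covered cloud machine, decrypt with the same key:
# original = Fernet(key).decrypt(open("job_ads.csv.enc", "rb").read())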
Our team has made 2 failed and 1 successful submission. The performance ranking states that we've made 3 submissions. Are we still able to make 9 more successful submissions?
Failed submissions DO NOT count towards the submission limits. Only valid, successful submissions are counted.
We will periodically make corrections on the performance ranking page to adjust for failed attempts.
Even if the performance ranking currently shows a total number of submissions that includes failed attempts, we will check the total number of VALID submissions during the evaluation phase and disregard all FAILED attempts, ensuring that each team is allowed 10 VALID submissions.