
Funding Opportunity





Data Corpora for Artificial Intelligence

Deutsche Forschungsgemeinschaft (DFG)

The DFG is supporting the development of data corpora for training artificial intelligence (AI).

The DFG’s Committee on Scientific Library Services and Information Systems (AWBI) is responding to needs identified by the research community in connection with the Call for Ideas to Support AI in Research through Information Infrastructures. This call invites proposals for projects aimed at preparing and providing data as a foundation for the development or advancement of AI in research.

Background

Methods used in the context of artificial intelligence – such as machine learning (ML) and text and data mining (TDM) – are becoming increasingly relevant in many areas of digital research practice and in the provision of scientific information. Such methods are used, for instance, to analyse and process large volumes of data and for language processing and generation. The development and use of these methods rely on a wide range of multimodal data types, such as factual data, measurement data, behavioural data, monolingual and multilingual texts, image data, synthetic data, digitisation and indexing data. The quality and availability of these data types can vary significantly, and their aggregation and cleansing often require considerable effort. As such, there is a need within the research community for systematically prepared, curated, annotated and aggregated data corpora.

Objectives of the Call for Proposals

This call aims to support the establishment and development of high-quality, extensive data corpora that can serve as a robust and reliable foundation for the development/advancement and application of AI methods in research. The future use of methods and applications based on the funded corpora may take place both in research and within scientific information infrastructures. The quality, scope and composition of the corpora must be tailored to the relevant needs and must enable research beyond individual projects or locations, or contribute to improving the provision of scientific information. The data corpora should be provided in accordance with established principles and standards (such as FAIR and CARE) and via existing information infrastructures, in particular the National Research Data Infrastructure (NFDI).

Scope of Funding

The purpose of funding is to support the compilation and extension of data corpora for AI. Projects may also include the following aspects:

  • conception of selection and quality criteria, implementation of quality assurance measures
  • reuse, adaptation and in particular application of data cleansing, aggregation, annotation, curation and harmonisation procedures
  • provision of the compiled or extended data corpus through existing scientific information infrastructures

Proposal Submission

Proposals must be submitted in English so as to enable international review. Proposals must be structured in accordance with the Proposal Preparation Instructions – Project Proposals in the Area of Scientific Library Services and Information Systems (DFG form 12.01). In addition, applicants must consider the specific details of this call and the Guidelines and Supplementary Instructions (DFG form 12.14) relating to the “Information Infrastructures for Research Data” funding programme. The following instructions correspond to the LIS Proposal Preparation Instructions.

Project Description (LIS Proposal Preparation Instructions, Part B)

Please explicitly address the following aspects in your project description:

  • Explain in detail how and why the planned corpus will support the development/advancement and/or application of AI methods in research or within scientific information infrastructures. What existing data resources are already available in this area? What potential lies in aggregating and systematically preparing these data into a curated corpus?
  • Demonstrate the need for the proposed corpus. Describe one or more use cases to illustrate the research questions or improvements in AI-based information provision that will be enabled through the planned data corpus. What key tasks and activities are necessary to build the corpus? Describe how you will ensure that the corpus is appropriately prepared for its intended use.
  • Explain in detail the composition of the corpus and justify your selection: Which data are to be selected? From which sources will these data be collated? Where relevant, address the issue of bias and how you plan to handle it.
  • Outline the criteria you will use to assess data quality. Explain how these criteria align with commonly accepted standards. To what extent will FAIR and CARE principles be applied?
  • Specify the desired quality and format of the data. Describe the initial quality of the data and the targeted standard after preparation. What methods and measures will be used to achieve the targeted quality?
  • Provide details of how the corpus will remain accessible to researchers in the long term. If updates are expected after the project ends (e.g. due to planned versioning), explain how ongoing curation will be ensured.
  • Where available, the corpus should be made accessible via relevant subject-specific infrastructures, such as NFDI consortia or specialised information services (FID). In which structure (centralised or decentralised) will the corpus be made available on completion of the project? How will the corpus be technically and/or organisationally integrated into existing information infrastructures? Wherever possible and appropriate, cooperation with partners should be arranged.

Measures to Meet Funding Requirements and Handle Project Results (LIS Proposal Preparation Instructions Part B, 4.3)

  • The compiled data corpus is to be published under a licence that allows free use for research purposes. The chosen licence must be specified in the project description.
  • Availability of and access to the corpus should be as open as possible for scientific users. If open access cannot be granted, this must be explained in detail in the proposal. In general, access modalities for users must be clearly described.

Please confirm that the following actions will be taken:

  • Key interim results will be published after the first year of the project.
  • The data corpus will be made findable in both disciplinary and cross-disciplinary directories, registries and the like.
  • The corpus will be documented according to recognised quality standards.
  • Please also confirm explicitly that no duplicate funding is involved and that the corpus is not already being compiled under another project.

Duration

Projects may run for a maximum of two years.

Deadline

Your proposal must be submitted to the DFG in English by 30 July 2025. Proposal submission is exclusively via the elan portal for the purpose of recording proposal-related data and transmitting documents in a secure manner.

If this is the first time you are submitting a proposal to the DFG, please note that you must register in the elan portal before you can submit your proposal. You must do so by 23 July 2025. You will normally receive confirmation of your registration by the next working day.

An informal letter of intent is requested by 28 May 2025. Please use the form linked under “Further Information” for this purpose.

All selected project owners will be required to attend a joint kick-off workshop in the first half of 2026, which will support networking and dialogue between the funded projects.



Sponsor Institute/Organizations: Deutsche Forschungsgemeinschaft (DFG)

Sponsor Type: Corporate

Address: German Research Foundation, Kennedyallee 40, 53175 Bonn, Germany. Telephone: +49 (228) 885-1. Telefax: +49 (228) 885-2777


Grant

Letter of Intent Deadline: May 28, 2025

Final Deadline: Jul 30, 2025

Funding Amount: $454,228
