OpenAI Launches Data Partnership Program to Collect Unique Organizational Data

OpenAI has announced the launch of the OpenAI Data Partnerships program, aimed at acquiring unique datasets from third-party organizations to train its AI models. This initiative seeks to attract extensive databases, including those not publicly available.

A key feature of the program is its breadth: the data does not have to be quantitative or in text format, and OpenAI is also open to images, audio, and video. The company emphasizes that it is looking for data on any topic and in any language, as long as it “expresses human intent.” Such human-centered data is expected to help improve tools such as automatic speech recognition, which transcribes spoken words. It is also expected to improve the GPT-4 Turbo model, enabling it to give users more complex and meaningful responses.

OpenAI states that it has already begun working with interested organizations, including the government of Iceland. For example, the company is already training its models to better understand queries in Icelandic.

To participate in the program, organizations need to submit a form on the company’s website and share information about the type and size of their data. One option is to contribute to an open-source archive, in which case the materials will become publicly available. Alternatively, organizations can submit data privately through OpenAI’s own channel, which will focus on training “finely tuned custom models.” The company stresses, however, that it is not seeking datasets containing confidential or personal information.

Recently, OpenAI introduced a more powerful and cost-effective version of its language model, GPT-4 Turbo. It is already available via the API and has a knowledge cutoff of April 2023. The company stated that the new version is up to three times cheaper for developers: $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens.
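
To make the quoted prices concrete, here is a minimal sketch of how a developer might estimate the cost of a single request. It uses only the two per-1,000-token prices stated above; the function name `estimate_cost` and the token counts in the example are illustrative assumptions, not part of any OpenAI API.

```python
# Prices quoted in the article, in US dollars per 1,000 tokens.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars of one request.

    Hypothetical helper for illustration only; real billing may differ.
    """
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# Example: a request with 2,000 input tokens and 500 output tokens
# (made-up numbers chosen for easy arithmetic).
print(f"${estimate_cost(2000, 500):.4f}")
```

At these rates, the example request works out to 2 × $0.01 for the input plus 0.5 × $0.03 for the output, or about three and a half cents.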
