OpenAI asks contractors to upload past work to assess the performance of AI agents

OpenAI is asking third-party contractors to upload real assignments and tasks they have performed at their current or former workplaces so the company can evaluate the performance of its next-generation AI models, according to records from OpenAI and the training data company Handshake AI obtained by WIRED.

OpenAI appears to have launched the project as part of its efforts to establish a human performance baseline for various tasks, which can then be compared with AI models. In September, OpenAI launched a new evaluation process that measures the performance of its AI models against human professionals across a wide range of industries. OpenAI says this is a key indicator of its progress toward achieving AGI, or an AI system that outperforms humans at most economically valuable tasks.

“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks,” reads one of OpenAI’s confidential documents. “Take existing pieces of long-term or complex work (hours or days+) that you’ve done in your occupation and turn each into a task.”

An OpenAI project presentation viewed by WIRED asks contractors to describe tasks they have done in current or past jobs and to upload real examples of that work. Each example should be “a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo,” the presentation notes. OpenAI says people can also share fabricated work examples created to show how they would realistically respond in specific scenarios.

OpenAI and Handshake AI declined to comment.

According to OpenAI’s presentation, real-world tasks have two components: the task request (what a person’s boss or coworker told them to do) and the task deliverable (the actual work they produced in response to that request). The company’s instructions repeatedly stress that the examples contractors share should reflect “real, on-the-job work” that the person has “actually done.”

One example in the OpenAI presentation outlines a task from a “Senior Lifestyle Manager at a luxury concierge company for ultra-high-net-worth individuals.” The goal is to “prepare a short, 2-page PDF draft of a 7-day yacht trip overview to the Bahamas for a family who will be traveling there for the first time.” The example includes additional details about the family’s interests and what the itinerary should look like. The “experienced human deliverable” shows what the contractor would then upload: a real Bahamas itinerary created for a client.

OpenAI instructs contractors to remove corporate intellectual property and personally identifiable information from the work files they upload. Under a section labeled “Important reminders,” OpenAI tells the workers to “remove or anonymize any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details).”

One of the documents viewed by WIRED mentions a ChatGPT tool called “Superstar Scrubbing” that provides instructions on how to remove confidential data.

Evan Brown, an intellectual property lawyer with Neal & McDevitt, tells WIRED that AI labs that receive confidential information from contractors at this scale could be subject to trade secret misappropriation claims. Contractors who offer documents from their previous workplaces to an AI company, even after scrubbing them, risk violating their previous employers’ nondisclosure agreements or exposing trade secrets.

“The AI lab is putting a lot of trust in its contractors to decide what is and isn’t confidential,” Brown says. “If they do let something slip through, are the AI labs really taking the time to determine what is and isn’t a trade secret? It seems to me that the AI lab is putting itself at great risk.”
