- Experience
- Any
- Salary
- USD 80 – USD 110 / year
- Openings
- 1
- Posted
- 4 hours ago
- Work mode
- Work from home
- Education
- PhD
- Eligibility
- Current or retired professors and PhD candidates in STEM or professional disciplines based in the United States.
- Resume
- Required to apply
Job description
Role overview
This contract role is for academics and research professionals in the United States who want to contribute to a frontier model evaluation program. The work centers on improving next-generation large language model systems across technical and professional subject areas.
What you will do
- Create demanding benchmark tasks based on your academic or professional expertise and make sure they reflect real-world use cases.
- Develop Python-based problem sets that can be executed, clearly specified, and backed by test cases for agent-style workflows.
- Review model responses to spot weaknesses in reasoning, logic, and problem solving across complex scenarios.
- Produce gold-standard answers and evaluation rubrics that enable consistent assessment.
- Study system behavior to identify capability gaps and recurring failure patterns in advanced reasoning tasks.
- Work with subject-matter experts from STEM and quantitative fields to raise the quality and rigor of evaluations.
Requirements
- You should be a current or retired professor, or a PhD candidate, in a STEM or professional field such as computer science, mathematics, physics, engineering, statistics, economics, finance, law, or a closely related area.
- A strong academic record from a leading university or an equivalent research setting is expected.
- You need practical Python skills used in research, academic work, or a professional environment.
- You should be able to create executable problem-solving tasks and computational workflows.
- Prior exposure to benchmarking, structured evaluation, or research-based task design is an advantage.
- Strong analytical judgment is important for checking logical validity and understanding system behavior.
- You must be able to work on your own and maintain a steady schedule of at least 30 hours per week on weekdays.
Additional information
This position is a W-2 contingent role based in the United States. The stated pay range is $80 to $110 per year, and the expected workload is 30+ hours per week. To apply, use the easy-apply process.