Role Overview
Responsible for planning and executing MiroMind’s data production and annotation projects (text / image / audio / video / multimodal). This includes organizing and managing internal and external annotation teams and vendors, establishing standards and processes, and balancing schedule, quality, cost, and compliance.
Key Responsibilities
Project Planning & Delivery
Break down data requirements based on model training goals (task types, scale, coverage, difficulty, and priority). Develop milestones, budgets, and resource plans.
Manage schedules, risks, and dependencies across concurrent data projects to ensure on-time delivery aligned with training needs.
Team & Vendor Management
Build and mentor annotation and quality inspection teams (full-time / part-time / crowdsourced / vendors), including scheduling, performance, and incentive management.
Manage vendor onboarding and evaluation (bidding, SLA, pricing, delivery quality), as well as cost and contract management.
Standards & Processes (SOPs)
Define and iterate on annotation guidelines, label taxonomies, edge cases, and decision trees; maintain operations manuals and case libraries.
Design layered quality control (self-check, peer review, expert sampling), gold standard sets, and re-review workflows to continuously reduce rework rates.
Quality & Data Governance
Establish quality metrics: gold standard accuracy, IAA (e.g., Cohen’s kappa / Krippendorff’s alpha), coverage, noise rate, PII leakage rate, etc.
Apply methods such as active learning, hard example mining, weak supervision, and LLM-as-judge to drive a “data flywheel” for continuous refinement and augmentation.
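For candidates less familiar with the IAA metrics named above, here is a minimal sketch of Cohen's kappa for two annotators labeling the same items. The function name and the sample labels are illustrative assumptions, not part of MiroMind's tooling; in practice a library implementation would likely be used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators used one identical label for every item.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for illustration only.
annotator_1 = ["cat", "cat", "dog", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog"]
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is usually preferred over plain accuracy when label distributions are skewed.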