Box - Prodigy Integration and Automation
Box and Prodigy complement each other in enterprise AI workflows: Box serves as the secure system of record for documents, images, and other unstructured content, while Prodigy turns that content into high-quality labeled training data for machine learning teams. Integrating the two platforms lets organizations move content from governed storage into annotation workflows, then return the labeled outputs to controlled repositories for reuse, auditability, and downstream model development.
Data flow: Box to Prodigy
Teams can store raw documents, scanned forms, images, or text files in Box and automatically route selected folders or files into Prodigy for labeling. This is useful for regulated organizations that need a controlled handoff from content management to AI training without exposing files through ad hoc downloads or email.
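As an illustration of the handoff step, the following is a minimal sketch that assumes the selected Box files have already been fetched as text by whatever Box client the team uses. The record fields (`id`, `name`, `text`) and the function name `to_prodigy_tasks` are hypothetical. The output follows Prodigy's JSONL input convention of one JSON object per line with a `text` key and an optional `meta` dictionary.

```python
import json


def to_prodigy_tasks(box_files):
    """Convert fetched Box file records into Prodigy-style JSONL text.

    `box_files` is a hypothetical list of dicts with "id", "name", and
    "text" keys; it stands in for content already downloaded from Box.
    """
    lines = []
    for record in box_files:
        text = record["text"].strip()
        if not text:
            # Skip empty files so annotators never see blank tasks.
            continue
        task = {
            "text": text,
            # Keep the Box file ID so labels can be traced to the source.
            "meta": {"box_file_id": record["id"], "source": record["name"]},
        }
        lines.append(json.dumps(task, ensure_ascii=False))
    return "\n".join(lines)
```

The returned string can be written to a `.jsonl` file and used as the source for a Prodigy recipe. Carrying the Box file ID in `meta` is one way to keep every label traceable to its governed source file.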
Data flow: Box to Prodigy
Organizations can use Box as the intake repository for customer-submitted documents such as claims, invoices, contracts, or identity documents, then push those files into Prodigy for annotation. Domain experts can label fields, categories, or document types to create training data for automation models.
Data flow: Bi-directional
Prodigy's active learning can identify the most informative samples to label next, while Box can store the broader corpus of unannotated content and the selected review sets. As models improve, newly prioritized files or excerpts can be written back to Box for tracking and audit purposes, creating a controlled feedback loop between data storage and annotation.
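The selection idea can be illustrated with a simplified uncertainty-sampling sketch. This is not Prodigy's internal sorter. It assumes a binary classifier that has already scored each item between 0 and 1, and the field names (`box_file_id`, `score`) and the function `select_for_review` are hypothetical.

```python
def select_for_review(scored_items, batch_size):
    """Return the `batch_size` items whose scores are closest to 0.5.

    A score near 0.5 means the model is least certain, so labeling
    those items first is likely to be the most informative.
    """
    ranked = sorted(scored_items, key=lambda item: abs(item["score"] - 0.5))
    return ranked[:batch_size]


# Hypothetical scores for four files held in Box.
items = [
    {"box_file_id": "a", "score": 0.95},
    {"box_file_id": "b", "score": 0.52},
    {"box_file_id": "c", "score": 0.10},
    {"box_file_id": "d", "score": 0.40},
]
review_set = select_for_review(items, 2)
```

The IDs in `review_set` could then be recorded in Box as the selected review set for that labeling round, giving auditors a record of what was prioritized and when.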
Data flow: Box to Prodigy, then Prodigy to Box
Box can be used to securely share source content with internal teams and approved external partners, while Prodigy handles the actual annotation work. Completed labels, review notes, and training exports can then be stored back in Box for stakeholder review, version control, and approval workflows.
Data flow: Prodigy to Box
After annotation, Prodigy can export labeled datasets, JSON files, or training artifacts to Box, which then serves as the authoritative repository for model training inputs. This gives organizations a secure archive of dataset versions, label sets, and supporting documentation for audits, reproducibility, and model governance.
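One way to make archived dataset versions verifiable is to store a small manifest next to each export. The sketch below assumes the export is JSONL text in which labels appear either as a top-level `label` key or inside `spans` entries. The manifest fields, the file-naming scheme, and the function `build_manifest` are hypothetical choices for illustration, not a Box or Prodigy convention.

```python
import hashlib
import json


def build_manifest(dataset_name, version, jsonl_text):
    """Summarize an exported JSONL dataset for archiving alongside it."""
    records = [
        json.loads(line) for line in jsonl_text.splitlines() if line.strip()
    ]
    labels = set()
    for record in records:
        if "label" in record:
            labels.add(record["label"])
        for span in record.get("spans", []):
            if "label" in span:
                labels.add(span["label"])
    return {
        "dataset": dataset_name,
        "version": version,
        "file_name": f"{dataset_name}-v{version}.jsonl",
        # The checksum lets reviewers confirm the archived file is unchanged.
        "sha256": hashlib.sha256(jsonl_text.encode("utf-8")).hexdigest(),
        "example_count": len(records),
        "labels": sorted(labels),
    }
```

Uploading the manifest with the dataset file gives reviewers a checksum, an example count, and the label set for each version without opening the data itself.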
Data flow: Prodigy to Box
When annotators flag ambiguous, low-confidence, or policy-sensitive items in Prodigy, those cases can be exported to Box for formal review and escalation. Box Relay workflows can route exceptions to legal, compliance, or senior subject matter experts for approval before the data is accepted into the final training set.
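The exception split can be sketched as a filter over exported annotations. This example assumes flagged items carry `"flagged": true` in the export and that skipped items have an `answer` of `"ignore"`. Treating both as escalation cases is a hypothetical policy, as is the function name `split_exceptions`.

```python
import json


def split_exceptions(jsonl_text):
    """Separate exported annotations into cleared items and exceptions."""
    cleared, exceptions = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumed policy: flagged or ignored items need formal review.
        if record.get("flagged") or record.get("answer") == "ignore":
            exceptions.append(record)
        else:
            cleared.append(record)
    return cleared, exceptions
```

The `exceptions` list could be written to a dedicated Box folder that triggers the review workflow, while `cleared` proceeds toward the final training set.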
Data flow: Bi-directional
Box can manage retention, legal hold, and access policies for raw and labeled datasets, while Prodigy supports the operational labeling stage. Together, they provide a governed lifecycle from source content to annotated dataset to archived training record, which is important for regulated AI initiatives.
Data flow: Box to Prodigy and Prodigy to Box
Business teams can curate source content in Box, such as contracts, support tickets, or quality inspection images, and hand it off to ML teams for annotation in Prodigy. Once labels are complete, the resulting dataset and documentation can be returned to Box so product, operations, and governance teams can review what was trained and approve downstream use.