Amazon S3 - Prodigy Integration and Automation
Data flow: Amazon S3 → Prodigy
Store large volumes of source files in Amazon S3, such as images, PDFs, audio clips, or text corpora, and let Prodigy pull only the subsets needed for labeling. This gives data science and operations teams a single, governed repository for raw training data while Prodigy handles the annotation workflow.
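As a minimal sketch of this pull pattern, the snippet below lists objects under an S3 prefix with boto3 and wraps each text file in the JSONL task shape that Prodigy's text recipes consume. The bucket, prefix, and helper names are illustrative assumptions, and it presumes boto3 is installed with AWS credentials configured.

```python
def to_prodigy_task(text, meta=None):
    """Wrap raw text in the JSONL task shape Prodigy's text recipes expect."""
    task = {"text": text}
    if meta:
        task["meta"] = meta
    return task

def stream_s3_subset(bucket, prefix):
    """Yield Prodigy tasks from text objects under an S3 prefix.

    boto3 is imported lazily (assumption: installed, credentials configured)
    so the pure helper above stays usable without AWS access.
    """
    import boto3
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            # Keep the S3 key in meta so every annotation traces back to its source file.
            yield to_prodigy_task(body.decode("utf-8"), meta={"s3_key": obj["Key"]})
```

Serialized with `json.dumps` one task per line, this stream can be piped into a recipe such as `prodigy textcat.manual my_dataset -` so annotators only ever see the sampled subset.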
Data flow: Prodigy → Amazon S3
Once annotation is complete, export labeled datasets, review outputs, and training-ready files from Prodigy to Amazon S3 for downstream model training, audit retention, or sharing with other teams. This creates a durable handoff between labeling and machine learning pipelines.
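One way to sketch this handoff: shell out to Prodigy's `db-out` command to export a dataset as JSONL, then upload the bytes with boto3. The function and bucket names are assumptions; it presumes the `prodigy` CLI is on the PATH and AWS credentials are configured.

```python
import subprocess

def count_annotations(jsonl_bytes):
    """Count non-empty lines in a JSONL export (one annotation per line)."""
    return sum(1 for line in jsonl_bytes.splitlines() if line.strip())

def export_dataset_to_s3(dataset, bucket, key):
    """Export a Prodigy dataset with `db-out` and push it to S3.

    Assumes the `prodigy` CLI and boto3 are installed and credentials exist.
    """
    out = subprocess.run(
        ["prodigy", "db-out", dataset], check=True, capture_output=True
    )
    import boto3  # lazy import: the counting helper above needs no AWS deps
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=out.stdout)
    return count_annotations(out.stdout)
```

Returning the annotation count gives the calling pipeline a cheap sanity check that the export was not empty before training kicks off.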
Data flow: Amazon S3 → Prodigy → Amazon S3
Use Amazon S3 as the master repository for unlabeled data and let Prodigy continuously sample the next best records for annotation based on model uncertainty or active learning rules. Once labels are produced, write them back to Amazon S3 to refresh the training set for the next iteration.
Data flow: Amazon S3 → Prodigy
Organizations with product images, inspection photos, or visual search assets can store image libraries in Amazon S3 and stream them into Prodigy for bounding box, classification, or segmentation tasks. This is especially useful for retail, manufacturing, and logistics teams managing large image volumes.
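For image tasks, a common approach is to hand Prodigy temporary presigned URLs rather than the raw files, so the annotation web app can load images straight from a private bucket. A hedged sketch, assuming boto3 and configured credentials; the helper names are illustrative.

```python
def image_task(url, key):
    """Build a Prodigy image task: recipes like image.manual read the `image` field."""
    return {"image": url, "meta": {"s3_key": key}}

def presigned_image_tasks(bucket, keys, expires=3600):
    """Yield image tasks whose URLs expire after `expires` seconds.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3
    s3 = boto3.client("s3")
    for key in keys:
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires,
        )
        yield image_task(url, key)
```

Because the URLs expire, the bucket can stay private while annotators work, which matters for inspection photos and other sensitive image assets.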
Data flow: Amazon S3 → Prodigy
Use Amazon S3 to store emails, support tickets, contracts, chat logs, or scanned documents, then feed those files into Prodigy for entity recognition, text classification, or relation annotation. This helps legal, customer service, and analytics teams build structured datasets from unstructured content.
Data flow: Prodigy → Amazon S3 → Prodigy
Store completed annotation batches in Amazon S3 for review, audit, or secondary validation, then reload corrected files into Prodigy for rework when needed. This supports multi-stage review processes where subject matter experts, QA teams, and data scientists collaborate on label quality.
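The reload leg of this review loop can be sketched as: download the corrected batch from S3, sanity-check that it is still valid JSONL, then load it into a Prodigy dataset with the `db-in` command. Function names and the key layout are assumptions; it presumes boto3, the `prodigy` CLI, and AWS credentials are all available.

```python
import json
import subprocess
import tempfile

def is_valid_jsonl(data):
    """Return True if every non-empty line parses as JSON."""
    for line in data.splitlines():
        if line.strip():
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return False
    return True

def reload_batch(bucket, key, dataset):
    """Pull a corrected batch from S3 and import it via `prodigy db-in`.

    Assumes boto3 and the Prodigy CLI are installed with credentials set up.
    """
    import boto3
    with tempfile.NamedTemporaryFile(suffix=".jsonl") as tmp:
        boto3.client("s3").download_file(bucket, key, tmp.name)
        with open(tmp.name, "rb") as f:
            if not is_valid_jsonl(f.read()):
                raise ValueError(f"{key} is not valid JSONL")
        subprocess.run(["prodigy", "db-in", dataset, tmp.name], check=True)
```

Validating before import keeps a half-edited review file from silently corrupting the rework dataset.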
Data flow: Amazon S3 → Prodigy and Prodigy → Amazon S3
Use Amazon S3 as the shared distribution layer for global teams working on the same labeling program. Regional teams can pull assigned datasets into Prodigy, annotate independently, and publish results back to Amazon S3 for consolidation and downstream model training.
Data flow: Prodigy → Amazon S3
Store each labeled dataset version from Prodigy in Amazon S3 with clear naming conventions, timestamps, and project identifiers. This gives ML teams a reliable history of training data used for each model release and supports reproducibility, rollback, and governance requirements.
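A naming convention like the one described can be captured in a single pure helper. This is one possible layout, not a prescribed standard; the project, dataset, and release identifiers are placeholders.

```python
from datetime import datetime, timezone

def versioned_key(project, dataset, model_release, ext="jsonl"):
    """Compose a self-describing S3 key: project / dataset / release / UTC stamp.

    A UTC timestamp in the filename keeps exports sortable and makes each
    model release traceable to the exact training data it used.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return f"{project}/{dataset}/{model_release}/{stamp}.{ext}"
```

For example, `versioned_key("fraud-detection", "labels-v2", "release-17")` yields a key such as `fraud-detection/labels-v2/release-17/2024-01-01T120000Z.jsonl`, so listing one prefix shows the full data history for a release.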