s3DataDistributionType
Specifies how training data stored in Amazon S3 is distributed to the training instances of a training job. Valid values:
FullyReplicated
- The entire dataset is replicated on each training instance. This is suitable for smaller datasets and algorithms that require access to the complete dataset.
ShardedByS3Key
- The dataset is distributed across training instances based on Amazon S3 key names. This is suitable for larger datasets and distributed training scenarios where each instance processes a subset of the data.
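
The difference between the two values can be illustrated with a small, self-contained Python sketch. It is not the service's implementation: the function `distribute_keys`, the object keys, and the round-robin assignment used for ShardedByS3Key are all assumptions made for illustration, and the actual key-to-instance assignment may differ.

```python
from typing import Dict, List


def distribute_keys(
    keys: List[str], instance_count: int, distribution_type: str
) -> Dict[int, List[str]]:
    """Illustrative model of which S3 objects each training instance receives."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if distribution_type == "FullyReplicated":
        # Every instance receives its own copy of the full key list.
        return {i: list(keys) for i in range(instance_count)}
    if distribution_type == "ShardedByS3Key":
        # Assumption: round-robin over sorted keys. The real assignment
        # performed by the service may differ.
        ordered = sorted(keys)
        return {i: ordered[i::instance_count] for i in range(instance_count)}
    raise ValueError(f"unknown distribution type: {distribution_type}")


# Hypothetical object keys under a training prefix.
keys = [f"train/part-{n:04d}.csv" for n in range(6)]

replicated = distribute_keys(keys, 2, "FullyReplicated")
sharded = distribute_keys(keys, 2, "ShardedByS3Key")

print(len(replicated[0]), len(replicated[1]))  # each instance sees all 6 keys
print(sharded[0])  # instance 0 sees one subset of the keys
print(sharded[1])  # instance 1 sees the remaining keys
```

In this model, sharding only balances well when the number of objects is at least as large as the number of instances, which is one reason ShardedByS3Key tends to suit datasets split into many objects.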