dataFormat
The format of your training data:
COMPREHEND_CSV
: A CSV file that supplements your training documents. The CSV file contains information about the custom entities that your trained model will detect. The required format of the file depends on whether you are providing annotations or an entity list. If you use this value, you must provide your CSV file by using either the Annotations or EntityList parameter. You must provide your training documents by using the Documents parameter.
AUGMENTED_MANIFEST
: A labeled dataset that is produced by Amazon SageMaker Ground Truth. This file is in JSON lines format. Each line is a complete JSON object that contains a training document and its labels. Each label annotates a named entity in the training document. If you use this value, you must provide the AugmentedManifests parameter in your request.
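Because an augmented manifest is JSON lines (one complete JSON object per line), a quick local sanity check before uploading can parse it line by line. The sketch below is a hypothetical helper, not part of any AWS SDK; it only checks that each non-blank line is a JSON object. The "source" and "my-labels" keys in the usage example are illustrative placeholders, not the exact attribute names Ground Truth produces for your labeling job.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_augmented_manifest(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON lines text into a list of objects (hypothetical helper).

    Each non-blank line must be one complete JSON object; anything else
    raises ValueError (json.JSONDecodeError is a subclass of ValueError).
    """
    records: List[Dict[str, Any]] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            # Tolerate blank lines, such as a trailing newline at end of file.
            continue
        obj = json.loads(line)
        if not isinstance(obj, dict):
            raise ValueError(
                f"line {line_no}: expected a JSON object, got {type(obj).__name__}"
            )
        records.append(obj)
    return records


# Usage with made-up records; the key names are placeholders.
sample = [
    '{"source": "Jane works at Example Corp.", "my-labels": {"annotations": []}}',
    '{"source": "John lives in Seattle.", "my-labels": {"annotations": []}}',
]
records = parse_augmented_manifest(sample)
print(len(records))  # 2
```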
If you don't specify a value, Amazon Comprehend uses COMPREHEND_CSV as the default.
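The rules above (the COMPREHEND_CSV default, Documents plus Annotations or EntityList for CSV input, and AugmentedManifests for Ground Truth input) can be checked client-side before sending a request. This is a minimal sketch under stated assumptions: check_input_data_config is a hypothetical helper, the top-level keys reuse the parameter names from this reference, and the {"S3Uri": ...} nesting and bucket names are illustrative assumptions to verify against the full API reference. It only checks that at least one of Annotations or EntityList is present; it does not enforce that they are mutually exclusive.

```python
from typing import Any, Dict

VALID_FORMATS = {"COMPREHEND_CSV", "AUGMENTED_MANIFEST"}


def check_input_data_config(config: Dict[str, Any]) -> str:
    """Return the effective data format, or raise ValueError when a
    parameter required by that format is missing (hypothetical helper)."""
    # An unspecified DataFormat falls back to the documented default.
    data_format = config.get("DataFormat", "COMPREHEND_CSV")
    if data_format not in VALID_FORMATS:
        raise ValueError(f"unknown DataFormat: {data_format!r}")

    if data_format == "COMPREHEND_CSV":
        # CSV input: training documents plus an annotations or entity-list CSV.
        if "Documents" not in config:
            raise ValueError("COMPREHEND_CSV requires Documents")
        if "Annotations" not in config and "EntityList" not in config:
            raise ValueError("COMPREHEND_CSV requires Annotations or EntityList")
    else:
        # Ground Truth input: at least one augmented manifest.
        if not config.get("AugmentedManifests"):
            raise ValueError("AUGMENTED_MANIFEST requires AugmentedManifests")
    return data_format


# Usage: DataFormat is omitted, so the default applies.
csv_config = {
    "Documents": {"S3Uri": "s3://amzn-s3-demo-bucket/docs/"},
    "EntityList": {"S3Uri": "s3://amzn-s3-demo-bucket/entities.csv"},
}
print(check_input_data_config(csv_config))  # COMPREHEND_CSV
```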