DocumentClassifierInputDataConfig
The input properties for training a document classifier.
For more information about the input file format, see Preparing training data in the Amazon Comprehend Developer Guide.
Properties
A list of augmented manifest files that provide training data for your custom model. An augmented manifest file is a labeled dataset that is produced by Amazon SageMaker Ground Truth.
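An augmented manifest is a JSON Lines file, with one JSON object per line. The sketch below parses such a file into (text, labels) pairs so you can inspect it before training. The `source` key for the document text and the label attribute name `my-labels` are assumptions for illustration; check your own Ground Truth output for the actual keys. The `read_manifest` helper is hypothetical.

```python
import json

# Hypothetical label attribute name; in practice it is whatever attribute
# your Ground Truth labeling job wrote.
LABEL_ATTRIBUTE = "my-labels"

# Two illustrative JSON Lines records. The "source" key is an assumption,
# not a guaranteed schema.
SAMPLE_MANIFEST = "\n".join([
    json.dumps({"source": "The battery died after two days.", LABEL_ATTRIBUTE: ["COMPLAINT"]}),
    json.dumps({"source": "Great service, thanks!", LABEL_ATTRIBUTE: ["PRAISE"]}),
])


def read_manifest(text, label_attribute):
    """Parse an augmented manifest (one JSON object per line) into (text, labels) pairs."""
    examples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        examples.append((record["source"], list(record[label_attribute])))
    return examples


for doc, labels in read_manifest(SAMPLE_MANIFEST, LABEL_ATTRIBUTE):
    print(labels, doc)
```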
The format of your training data:
Provides configuration parameters to override the default actions for extracting text from PDF documents and image files.
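As a hedged sketch, the helper below builds a dictionary shaped like this override, using the key names and values from the boto3 Comprehend request syntax (`DocumentReadAction`, `DocumentReadMode`, `FeatureTypes`). Treat those names as assumptions and confirm them against your SDK's reference; the `reader_config` helper itself is hypothetical and only assembles the structure.

```python
def reader_config(analyze, feature_types=("TABLES",)):
    """Build a DocumentReaderConfig-shaped dict (hypothetical helper, boto3-style keys assumed)."""
    if analyze:
        # Textract AnalyzeDocument: extracts text plus the requested features (e.g. TABLES, FORMS).
        return {
            "DocumentReadAction": "TEXTRACT_ANALYZE_DOCUMENT",
            "DocumentReadMode": "FORCE_DOCUMENT_READ_ACTION",
            "FeatureTypes": list(feature_types),
        }
    # Textract DetectDocumentText: plain text extraction with the service default read mode.
    return {
        "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
        "DocumentReadMode": "SERVICE_DEFAULT",
    }


print(reader_config(analyze=True, feature_types=("TABLES", "FORMS")))
```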
The S3 location of the training documents. This parameter is required in a request to create a native document model.
The type of input documents for training the model. Provide plain-text documents to create a plain-text model, and provide semi-structured documents to create a native document model.
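The two properties above interact: a native document model (semi-structured input) needs the S3 location of the training documents. The sketch below checks that rule on a dictionary shaped like the input configuration. The key names (`DocumentType`, `Documents`, `S3Uri`), the type values, and the bucket path are assumptions based on the boto3 request syntax, and `check_input_config` is a hypothetical helper, not part of any SDK.

```python
PLAIN_TEXT = "PLAIN_TEXT_DOCUMENT"             # plain-text model (assumed value)
SEMI_STRUCTURED = "SEMI_STRUCTURED_DOCUMENT"   # native document model (assumed value)


def check_input_config(config):
    """Return a list of problems in an input-config-shaped dict; an empty list means OK."""
    problems = []
    doc_type = config.get("DocumentType")
    if doc_type is not None and doc_type not in (PLAIN_TEXT, SEMI_STRUCTURED):
        problems.append(f"unknown DocumentType: {doc_type!r}")
    if doc_type == SEMI_STRUCTURED:
        # A native document model requires the S3 location of the training documents.
        documents = config.get("Documents") or {}
        if not documents.get("S3Uri"):
            problems.append("native document model requires Documents.S3Uri")
    return problems


native = {"DocumentType": SEMI_STRUCTURED,
          "Documents": {"S3Uri": "s3://amzn-s3-demo-bucket/train/"}}  # hypothetical bucket
print(check_input_config(native))
print(check_input_config({"DocumentType": SEMI_STRUCTURED}))
```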
Indicates the delimiter used to separate labels when training a multi-label classifier. The default delimiter is a pipe (|). You can use a different character as the delimiter, provided it is an allowed character, by specifying it under Delimiter for labels. If the training documents use a delimiter other than the default or the one you specify, the labels on that line are combined into a single unique label, such as LABELLABELLABEL.
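To see why a mismatched delimiter collapses several labels into one, the sketch below splits the label column of a training line on the configured delimiter. It assumes a two-column CSV line with the labels first and the document text second; `parse_labels` is a hypothetical local helper, not the service's parser. It only models how many labels result: this sketch keeps the unmatched delimiter characters inside the combined label, whereas the exact form the service produces (for example, LABELLABELLABEL) may differ.

```python
import csv
import io

DEFAULT_DELIMITER = "|"


def parse_labels(csv_line, delimiter=DEFAULT_DELIMITER):
    """Split the label column of a (labels, text) CSV line on the given delimiter."""
    row = next(csv.reader(io.StringIO(csv_line)))
    return row[0].split(delimiter)


print(parse_labels("COMEDY|MYSTERY,A detective gets lost."))       # default delimiter matches
print(parse_labels("COMEDY~MYSTERY,A detective gets lost.", "~"))  # custom delimiter matches
print(parse_labels("COMEDY~MYSTERY,A detective gets lost."))       # mismatch: one combined label
```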