Class TextExtractorOptions
Represents text extraction options for the Documentize.TextExtractor plugin.
public sealed class TextExtractorOptions : PdfExtractorOptions, IPluginOptions
Inheritance
object ← PdfExtractorOptions ← TextExtractorOptions
Implements
IPluginOptions
Inherited Members
- PdfExtractorOptions.AddInput(IDataSource)
- PdfExtractorOptions.Inputs
- PdfExtractorOptions.OperationName
- object.GetType()
- object.ToString()
- object.Equals(object?)
- object.Equals(object?, object?)
- object.ReferenceEquals(object?, object?)
- object.GetHashCode()
Examples
This example demonstrates how to extract the text content of a PDF document.
// create a TextExtractor object to extract the PDF contents
using (TextExtractor extractor = new TextExtractor())
{
    // create a TextExtractorOptions object to set the TextFormattingMode (Pure, or Raw, which is the default)
    TextExtractorOptions extractorOptions = new TextExtractorOptions(TextExtractorOptions.TextFormattingMode.Pure);
    // add the input file path to the data sources (inputPath is the path to the source PDF)
    extractorOptions.AddInput(new FileDataSource(inputPath));
    // perform the extraction
    ResultContainer resultContainer = extractor.Process(extractorOptions);
    // get the extracted text from the ResultContainer object
    string textExtracted = resultContainer.ResultCollection[0].ToString();
}
Remarks
The Documentize.TextExtractorOptions object is used to set the Documentize.TextExtractorOptions.TextFormattingMode and other options for the text extraction operation. It also inherits methods for adding the data (files, streams) that represent the input PDF documents.
Constructors
TextExtractorOptions(TextFormattingMode)
Initializes a new instance of the Documentize.TextExtractorOptions class with the specified text formatting mode.
public TextExtractorOptions(TextExtractorOptions.TextFormattingMode formattingMode)
Parameters
formattingMode
TextExtractorOptions.TextFormattingMode: Text formatting mode value.
TextExtractorOptions()
Initializes a new instance of the Documentize.TextExtractorOptions class with the default text formatting mode, 'Raw'.
public TextExtractorOptions()
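The two constructors can be compared with a short sketch. It uses only the members documented on this page and assumes that the project references the Documentize assembly and that the types are in the Documentize namespace. The comparison against Raw reflects the documented default and has not been verified here against a specific library version.

```csharp
using System;
using Documentize;

// Parameterless constructor: documented to select the 'Raw' formatting mode.
var defaultOptions = new TextExtractorOptions();

// Explicit constructor: request the 'Pure' formatting mode instead.
var pureOptions = new TextExtractorOptions(TextExtractorOptions.TextFormattingMode.Pure);

// FormattingMode is read-only, so the mode is fixed at construction time.
bool usesRawByDefault =
    defaultOptions.FormattingMode == TextExtractorOptions.TextFormattingMode.Raw;

Console.WriteLine($"Default options use Raw mode: {usesRawByDefault}");
Console.WriteLine($"Explicit options use mode: {pureOptions.FormattingMode}");
```

Because FormattingMode exposes only a getter, switching modes means constructing a new options object rather than mutating an existing one.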
Properties
FormattingMode
Gets the text formatting mode.
public TextExtractorOptions.TextFormattingMode FormattingMode { get; }
Property Value
TextExtractorOptions.TextFormattingMode
OperationName
Returns the name of the operation.
public override string OperationName { get; }
Property Value
string
Namespace: Documentize
Assembly: Documentize.dll