Class PdfExtractor
Represents the Documentize.PdfExtractor plugin, used to extract text, images, and form data from PDF documents.
public static class PdfExtractor
Inheritance
object → PdfExtractor
Inherited Members
- object.GetType()
- object.MemberwiseClone()
- object.ToString()
- object.Equals(object?)
- object.Equals(object?, object?)
- object.ReferenceEquals(object?, object?)
- object.GetHashCode()
Examples
The following example demonstrates how to extract the text content of a PDF document.
// Create ExtractTextOptions object to set instructions
var options = new ExtractTextOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = PdfExtractor.ExtractText(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();

The following example demonstrates how to extract the text content of a PDF document with a specific TextFormattingMode.
// Create ExtractTextOptions object to set TextFormattingMode
var options = new ExtractTextOptions(TextFormattingMode.Pure);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = PdfExtractor.ExtractText(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();

The following example demonstrates how to extract images from a PDF document.
// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output Directory path
options.AddOutput(new DirectoryDataSource("path_to_results_directory"));
// Perform the process
var results = PdfExtractor.ExtractImages(options);
// Get the path of the first extracted image
var imageExtracted = results.ResultCollection[0].ToFile();

The following example demonstrates how to extract images from a PDF document to streams, without an output directory.
// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// No output is set, so results are written to streams
// Perform the process
var results = PdfExtractor.ExtractImages(options);
// Get the stream of the first extracted image
var ms = results.ResultCollection[0].ToStream();
// Copy the data to a file for demonstration purposes
ms.Seek(0, SeekOrigin.Begin);
using (var fs = File.Create("test_file.png"))
{
    ms.CopyTo(fs);
}

The following example demonstrates how to export form field values to a CSV file.
// Create ExtractFormDataToDsvOptions object to set instructions
var options = new ExtractFormDataToDsvOptions(',', true);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileDataSource("path_to_result_csv_file.csv"));
// Perform the process
PdfExtractor.ExtractFormData(options);

Methods
ExtractFormData(ExtractFormDataToDsvOptions)
Extracts form data from a PDF document.
public static ResultContainer ExtractFormData(ExtractFormDataToDsvOptions options)
Parameters
options (ExtractFormDataToDsvOptions): An options object containing instructions for the operation.
Returns
ResultContainer: An object containing the result of the operation.
Exceptions
Thrown if options is not set.
ExtractImages(ExtractImagesOptions)
Extracts images from a PDF document.
public static ResultContainer ExtractImages(ExtractImagesOptions options)
Parameters
options (ExtractImagesOptions): An options object containing instructions for the operation.
Returns
ResultContainer: An object containing the result of the operation.
Exceptions
Thrown if options is not set.
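The image examples above read only the first entry of ResultCollection, while a document with several images is likely to yield several results. A minimal sketch that saves every extracted image might look like the following. It assumes that ResultCollection can be enumerated with foreach (the examples index it like a list) and that each entry exposes the same ToFile() method used above; the class ExtractAllImagesDemo and the helper SaveAllImages are hypothetical names introduced for illustration.

```csharp
using System;
using Documentize;

class ExtractAllImagesDemo
{
    // Hypothetical helper: extracts every image from one PDF and returns how many were written
    public static int SaveAllImages(string pdfPath, string outputDirectory)
    {
        var options = new ExtractImagesOptions();
        options.AddInput(new FileDataSource(pdfPath));
        options.AddOutput(new DirectoryDataSource(outputDirectory));

        var results = PdfExtractor.ExtractImages(options);

        int count = 0;
        // Assumption: ResultCollection is enumerable, with one entry per extracted image
        foreach (var result in results.ResultCollection)
        {
            // ToFile() gives the path of an image written to the output directory
            Console.WriteLine(result.ToFile());
            count++;
        }
        return count;
    }

    static void Main()
    {
        int saved = SaveAllImages("path_to_your_pdf_file.pdf", "path_to_results_directory");
        Console.WriteLine($"Extracted {saved} image(s)");
    }
}
```

Counting inside the loop, rather than relying on a collection property, keeps the sketch limited to the enumeration it already assumes.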
ExtractText(ExtractTextOptions)
Extracts text from a PDF document.
public static ResultContainer ExtractText(ExtractTextOptions options)
Parameters
options (ExtractTextOptions): An options object containing instructions for the operation.
Returns
ResultContainer: An object containing the result of the extraction.
Exceptions
Thrown if options is not set.
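When text is needed from several PDF files, one option is to call ExtractText once per file, so that each call's text sits at index 0 exactly as in the examples above. The sketch below uses only the types and calls shown on this page; the class ExtractTextFromManyDemo, the helper ExtractAll, and the file names first.pdf and second.pdf are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Documentize;

class ExtractTextFromManyDemo
{
    // Hypothetical helper: runs ExtractText once per file and maps each path to its extracted text
    public static Dictionary<string, string> ExtractAll(IEnumerable<string> pdfPaths)
    {
        var texts = new Dictionary<string, string>();
        foreach (var path in pdfPaths)
        {
            // A fresh options object per file keeps each file's text at ResultCollection[0]
            var options = new ExtractTextOptions(TextFormattingMode.Pure);
            options.AddInput(new FileDataSource(path));
            var results = PdfExtractor.ExtractText(options);
            texts[path] = results.ResultCollection[0].ToString();
        }
        return texts;
    }

    static void Main()
    {
        var texts = ExtractAll(new[] { "first.pdf", "second.pdf" });
        foreach (var entry in texts)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value.Length} characters");
        }
    }
}
```

Creating a new ExtractTextOptions per file avoids depending on how a single options object orders results when several inputs are added.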
Namespace: Documentize
Assembly: Documentize.dll