Class PdfExtractor

Information

Represents the Documentize.PdfExtractor plugin. Used to extract text, images, form data, and properties (metadata) from PDF documents.

public static class PdfExtractor

Inheritance

object → PdfExtractor

Inherited Members

Methods

Extract(ExtractTextOptions)

Extracts text from a PDF document.

public static string Extract(ExtractTextOptions options)

Parameters

  • options ExtractTextOptions: The options object containing instructions for the operation.

Return value

string : The extracted text.

Examples

This example demonstrates how to extract text from a PDF file.

// Create ExtractTextOptions object to set input file path
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf");
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

This example demonstrates how to extract text from a PDF stream.

// Open the input stream; the using declaration disposes it when it goes out of scope
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractTextOptions object to set the input stream
var options = new ExtractTextOptions(stream);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

This example demonstrates how to extract text from a PDF document with a specific TextFormattingMode.

// Create ExtractTextOptions object to set input file path and TextFormattingMode
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

This example demonstrates how to extract text from a PDF file in the most compact form.

// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure));

Exceptions

ArgumentException

Thrown if the options are not set.
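
The documented exception can be handled as in the following minimal sketch. It assumes the Documentize package is referenced and that the placeholder path points to a real PDF; the IOException handler is an addition for the file-opening step and is not part of the Documentize contract.

```csharp
using System;
using System.IO;
using Documentize;

// Placeholder path (assumption): replace with a real PDF file.
const string inputPath = "path_to_your_pdf_file.pdf";

try
{
    // The using declaration disposes the stream once extraction is done.
    using var stream = File.OpenRead(inputPath);
    var options = new ExtractTextOptions(stream);
    string text = PdfExtractor.Extract(options);
    Console.WriteLine(text);
}
catch (ArgumentException ex)
{
    // Documented for Extract when the options are not set.
    Console.Error.WriteLine($"Invalid options: {ex.Message}");
}
catch (IOException ex)
{
    // Raised by File.OpenRead, for example when the file is missing.
    Console.Error.WriteLine($"Cannot read input: {ex.Message}");
}
```

Catching ArgumentException also covers its subclasses, such as ArgumentNullException, so the handler does not depend on which of them the library actually throws.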

Extract(ExtractImagesOptions)

Extracts images from a PDF document.

public static ResultContainer Extract(ExtractImagesOptions options)

Parameters

  • options ExtractImagesOptions: The options object containing instructions for the operation.

Return value

ResultContainer : An object containing the result of the operation.

Examples

This example demonstrates how to extract images from a PDF document.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// Set output Directory path
options.AddOutput(new DirectoryData("path_to_results_directory"));
// Perform the process
var results = PdfExtractor.Extract(options);
// Get the path to the first extracted image
var imageExtracted = results.ResultCollection[0].ToFile();

This example demonstrates how to extract images from a PDF document to streams, without an output directory.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// No output is set, so the results are written to streams
// Perform the process
var results = PdfExtractor.Extract(options);
// Get the stream of the first result
var ms = results.ResultCollection[0].ToStream();
// Copy the data to a file for demonstration
ms.Seek(0, SeekOrigin.Begin);
using (var fs = File.Create("test_file.png"))
{
    ms.CopyTo(fs);
}
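
The copy step above can be wrapped in a small reusable helper. The sketch below uses only the .NET base class library; StreamSaver and SaveToFile are hypothetical names introduced for illustration and are not part of Documentize. The file extension you choose should match the actual image format, which this page does not specify.

```csharp
using System;
using System.IO;

public static class StreamSaver
{
    // Writes the full contents of a stream to a file, overwriting any existing file.
    public static void SaveToFile(Stream source, string path)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        // Rewind only when the stream supports seeking; otherwise copy from the current position.
        if (source.CanSeek)
        {
            source.Seek(0, SeekOrigin.Begin);
        }

        using var target = File.Create(path);
        source.CopyTo(target);
    }
}
```

With this helper, the demo copy becomes a single call such as StreamSaver.SaveToFile(ms, "test_file.png").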

Exceptions

ArgumentException

Thrown if the options are not set.

Extract(ExtractFormDataToDsvOptions)

Extracts form data from a PDF document.

public static ResultContainer Extract(ExtractFormDataToDsvOptions options)

Parameters

  • options ExtractFormDataToDsvOptions: The options object containing instructions for the operation.

Return value

ResultContainer : An object containing the result of the operation.

Examples

This example demonstrates how to export form values to a CSV file.

// Create ExtractFormDataToDsvOptions object to set instructions
var options = new ExtractFormDataToDsvOptions(',', true);
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileData("path_to_result_csv_file.csv"));
// Perform the process
PdfExtractor.Extract(options);

This example demonstrates how to export form values to a TSV file by setting properties.

// Create ExtractFormDataToDsvOptions object to set instructions
var options = new ExtractFormDataToDsvOptions();
// Set the delimiter to a tab character
options.Delimiter = '\t';
// Add field names to the result
options.AddFieldName = true;
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileData("path_to_result_tsv_file.tsv"));
// Perform the process
PdfExtractor.Extract(options);

Exceptions

ArgumentException

Thrown if the options are not set.

Extract(ExtractPropertiesOptions)

Extracts properties from a PDF document.

public static PdfProperties Extract(ExtractPropertiesOptions options)

Parameters

  • options ExtractPropertiesOptions: The options object containing instructions for the operation.

Return value

PdfProperties : An object containing the result of the operation.

Examples

This example demonstrates how to extract properties (FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages) from a PDF file.

// Create ExtractPropertiesOptions object to set input file
var options = new ExtractPropertiesOptions("path_to_your_pdf_file.pdf");
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var filename = pdfProperties.FileName;
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

This example demonstrates how to extract properties (Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages) from a PDF stream.

// Open the input stream; the using declaration disposes it when it goes out of scope
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractPropertiesOptions object to set the input stream
var options = new ExtractPropertiesOptions(stream);
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

This example demonstrates how to extract properties from a PDF file in the most compact form.

// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(new ExtractPropertiesOptions("path_to_your_pdf_file.pdf"));

Exceptions

ArgumentException

Thrown if the options are not set.

Namespace: Documentize
Assembly: Documentize.dll

March 13, 2026