Class PdfExtractor

Information

Represents the Documentize.PdfExtractor plugin. Used to extract text, images, form data, and properties (metadata) from PDF documents.

public static class PdfExtractor

Inheritance

object → PdfExtractor

Inherited Members

Methods

Extract(ExtractTextOptions)

Extracts text from a PDF document.

public static string Extract(ExtractTextOptions options)

Parameters

  • options ExtractTextOptions: An options object containing the instructions for the operation.

Returns

string: The extracted text.

Examples

The example shows how to extract text content from a PDF file.

// Create ExtractTextOptions object to set input file path
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf");
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

The example shows how to extract text content from a PDF stream.

// Open the PDF as a stream (disposed at the end of the enclosing scope)
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractTextOptions object to set input stream
var options = new ExtractTextOptions(stream);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

The example shows how to extract text content from a PDF document using TextFormattingMode.

// Create ExtractTextOptions object to set input file path and TextFormattingMode
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

The example shows how to extract text from a PDF file in the most concise way possible.

// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure));

Exceptions

ArgumentException
If the options are not set.

Extract(ExtractImagesOptions)

Extracts images from a PDF document.

public static ResultContainer Extract(ExtractImagesOptions options)

Parameters

  • options ExtractImagesOptions: An options object containing the instructions for the operation.

Returns

ResultContainer: An object containing the result of the operation.

Examples

The example shows how to extract images from a PDF document.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// Set output Directory path
options.AddOutput(new DirectoryData("path_to_results_directory"));
// Perform the process
var results = PdfExtractor.Extract(options);
// Get path to image result
var imageExtracted = results.ResultCollection[0].ToFile();

The example shows how to extract images from a PDF document to streams, without an output folder.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// No output is set, so the results are written to streams
// Perform the process
var results = PdfExtractor.Extract(options);
// Get Stream
var ms = results.ResultCollection[0].ToStream();
// Copy data to file for demo
ms.Seek(0, SeekOrigin.Begin);
using (var fs = File.Create("test_file.png"))
{
    ms.CopyTo(fs);
}

Exceptions

ArgumentException
If the options are not set.

Extract(ExtractFormDataToDsvOptions)

Extracts form data from a PDF document.

public static ResultContainer Extract(ExtractFormDataToDsvOptions options)

Parameters

  • options ExtractFormDataToDsvOptions: An options object containing the instructions for the operation.

Returns

ResultContainer: An object containing the result of the operation.

Examples

The example shows how to export form field values to a CSV file.

// Create ExtractFormDataToDsvOptions object to set instructions
var options = new ExtractFormDataToDsvOptions(',', true);
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileData("path_to_result_csv_file.csv"));
// Perform the process
PdfExtractor.Extract(options);

The example shows how to export form field values to a TSV file and set the export properties.

// Create ExtractFormDataToDsvOptions object to set instructions
var options = new ExtractFormDataToDsvOptions();
// Set delimiter
options.Delimiter = '\t';
// Add field names to the result
options.AddFieldName = true;
// Add input file path
options.AddInput(new FileData("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileData("path_to_result_tsv_file.tsv"));
// Perform the process
PdfExtractor.Extract(options);
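The exported file can be read back with plain .NET. The sketch below uses a hypothetical helper, `ReadDsv`, which is not part of Documentize. It assumes the first row holds the field names (as when `AddFieldName` is `true`), that each following row holds values, and that values are never quoted and never contain the delimiter; check these assumptions against a real export before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Documentize). Assumes: first row = field names,
// later rows = values, no quoting, no delimiter characters inside values.
static List<Dictionary<string, string>> ReadDsv(TextReader reader, char delimiter)
{
    var rows = new List<Dictionary<string, string>>();
    string? headerLine = reader.ReadLine();
    if (headerLine is null)
        return rows; // empty input: no header, no rows

    string[] header = headerLine.Split(delimiter);
    string? line;
    while ((line = reader.ReadLine()) is not null)
    {
        if (line.Length == 0)
            continue; // skip blank lines
        string[] values = line.Split(delimiter);
        var row = new Dictionary<string, string>();
        // Map each field name to its value; a missing trailing value becomes ""
        for (int i = 0; i < header.Length; i++)
            row[header[i]] = i < values.Length ? values[i] : string.Empty;
        rows.Add(row);
    }
    return rows;
}

// Demo on hypothetical in-memory data; for a real export, pass
// File.OpenText(...) with the output path used above.
var sample = "Name\tEmail\nAlice\talice@example.com\n";
var records = ReadDsv(new StringReader(sample), '\t');
Console.WriteLine(records[0]["Email"]); // prints alice@example.com
```

For the CSV example, pass `','` as the delimiter instead of `'\t'`. If an export can contain quoted values, a dedicated CSV parser is a safer choice than splitting on the delimiter.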

Exceptions

ArgumentException
If the options are not set.

Extract(ExtractPropertiesOptions)

Extracts properties from a PDF document.

public static PdfProperties Extract(ExtractPropertiesOptions options)

Parameters

  • options ExtractPropertiesOptions: An options object containing the instructions for the operation.

Returns

PdfProperties: An object containing the result of the operation.

Examples

The example shows how to extract the properties (FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages) from a PDF file.

// Create ExtractPropertiesOptions object to set input file
var options = new ExtractPropertiesOptions("path_to_your_pdf_file.pdf");
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var filename = pdfProperties.FileName;
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

The example shows how to extract the properties (Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages) from a PDF stream.

// Open the PDF as a stream (disposed at the end of the enclosing scope)
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractPropertiesOptions object to set input stream
var options = new ExtractPropertiesOptions(stream);
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

The example shows how to extract the properties of a PDF file in the most concise way possible.

// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(new ExtractPropertiesOptions("path_to_your_pdf_file.pdf"));

Exceptions

ArgumentException
If the options are not set.

Namespace: Documentize
Assembly: Documentize.dll
