When converting a PDF to images and then back to a PDF ready for OCR using PdfToImageConverter, the quality of the resulting PDF is terrible.
I have attached the original PDF (original.pdf) and the version converted to images and back to PDF (convertedtoimage.pdf). This is the conversion code being used:
PdfToImageConverter imageConverter = new PdfToImageConverter();
// Load the PDF document as a stream.
FileStream stream = new FileStream(filePath, FileMode.Open);
imageConverter.Load(stream);
// Create a new PDF document to store the converted images.
PdfDocument doc = new PdfDocument();
// Iterate through each page of the input PDF, convert to image, and add to the new document
for (int i = 0; i < imageConverter.PageCount; i++)
{
    // Convert PDF to Image.
    Stream outputStream = imageConverter.Convert(i, true, false);
    // Create a PdfBitmap from the converted image stream.
    PdfBitmap pdfImage = new PdfBitmap(outputStream);
    // Create a new PdfSection to hold the page.
    PdfSection section = doc.Sections.Add();
    // Set Margins
    section.PageSettings.Margins.All = 0;
    // Set the page size to match the image.
    section.PageSettings.Size = new Syncfusion.Drawing.SizeF(pdfImage.PhysicalDimension.Width, pdfImage.PhysicalDimension.Height);
    // Add a new page to section.
    PdfPage page = section.Pages.Add();
    // Obtain the graphics context for the current PDF page.
    PdfGraphics graphics = page.Graphics;
    // Draw the converted image onto the PDF page.
    graphics.DrawImage(pdfImage, 0, 0, page.Size.Width, page.Size.Height);
}
// Save the new document with converted images to a memory stream
MemoryStream file = new MemoryStream();
doc.Save(file);
//Close the document.
doc.Close(true);
Hi Martin,
We are currently analyzing the reported behavior with the details you provided. We will share further details on November 21st, 2024.
Regards,
Irfana J.
Hi Martin,
With PdfToImageConverter for .NET, you can export a PDF page as an image at a custom resolution. The Convert overload used in the snippet below also lets you specify the number of tile rows and columns. Setting both counts to 1 renders the entire page as a single image, and together with a higher ScaleFactor (2f in the snippet) this significantly increases the image quality.
For your reference, we have attached a sample implementation and code snippet demonstrating this approach.
PdfToImageConverter imageConverter = new PdfToImageConverter();
imageConverter.ScaleFactor = 2f;
// Load the PDF document as a stream.
FileStream stream = new FileStream(filePath, FileMode.Open);
imageConverter.Load(stream);
// Create a new PDF document to store the converted images.
PdfDocument doc = new PdfDocument();
// Iterate through each page of the input PDF, convert to image, and add to the new document
for (int i = 0; i < imageConverter.PageCount; i++)
{
    float zoomFactor = 1;
    int tileXCount = 1;
    int tileYCount = 1;
    int tileX = 0;
    int tileY = 0;
    // Convert PDF to Image.
    Stream outputStream = imageConverter.Convert(i, zoomFactor, tileXCount, tileYCount, tileX, tileY);
}
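As a rough way to judge how much scaling a page needs before OCR, the effective resolution of a rendered page can be estimated from its pixel width and the page width in points (PDF user space is 72 points per inch). The helper below is only a sketch: `RenderDpi`, `EffectiveDpi`, and `ScaleForTarget` are hypothetical names and not part of the Syncfusion API, and the 300 DPI target mentioned afterwards is a commonly cited OCR guideline rather than a library requirement.

```csharp
using System;

// Hypothetical helper (not a Syncfusion type): plain arithmetic for
// reasoning about the resolution of a page rendered to an image.
public static class RenderDpi
{
    // Effective DPI of an image that spans a page dimension.
    // imagePixels: image width (or height) in pixels.
    // pagePoints:  matching page width (or height) in points, 72 per inch.
    public static double EffectiveDpi(int imagePixels, double pagePoints)
    {
        if (pagePoints <= 0)
            throw new ArgumentOutOfRangeException(nameof(pagePoints));
        return imagePixels / (pagePoints / 72.0);
    }

    // Multiplier needed to go from the current DPI to a target DPI.
    public static double ScaleForTarget(double currentDpi, double targetDpi)
    {
        if (currentDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(currentDpi));
        return targetDpi / currentDpi;
    }
}
```

For illustration, a US Letter page is 612 points wide, so an image 816 pixels wide gives `EffectiveDpi(816, 612)` = 96, and an image 1632 pixels wide gives 192. If the aim is roughly 300 DPI, `ScaleForTarget(96, 300)` suggests a multiplier of about 3.1. The pixel widths here are made-up inputs, so measure the actual image produced for your own document before picking a ScaleFactor value.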
Regards,
Irfana J