How to remove blank image

5 Replies
2 Participants

Created by
Ambrogio Brambilla

Platform
ASP.NET Web Forms

Control
PDF

Created On
Sep 14, 2017 09:39 AM UTC

Last Activity On
Sep 20, 2017 09:42 AM UTC


Hi.

We are working on a project that manages PDF files created by a scanner.

Some documents contain empty pages, and we want to remove them. Unfortunately, an empty page still contains an image (a blank one).

How can we recognize that a page is empty, so that we can remove it?

We tried your image-extraction example, but it does not help here because img.length is 1 for these pages.

5 Replies

Chinnu Muniyappan Syncfusion Team September 15, 2017 09:23 AM UTC

Hi Ambrogio,

Thank you for contacting Syncfusion support.

We can identify blank images by using OCRProcessor. We have created a simple sample that exports the images from the PDF document and processes each of them with OCRProcessor; if OCRProcessor returns null or an empty string for every image on a page, that page is marked as blank. Please refer to the code snippet and sample below for more details.

//tesseractBinariesPath and tessdataPath are fields set elsewhere in the sample.
private bool IsBlankPage(PdfLoadedPage lpage)
{
    bool isBlankPage = true;

    //Extract images
    Image[] images = lpage.ExtractImages();

    //A page with no images, or whose images contain no text, is treated as blank
    foreach (Image img in images)
    {
        if (!PerformOCR(img as Bitmap))
        {
            isBlankPage = false;
            break;
        }
    }

    return isBlankPage;
}

//Returns true when OCR finds no text in the image
private bool PerformOCR(Bitmap img)
{
    //Create a new OCR processor
    using (OCRProcessor processor = new OCRProcessor(tesseractBinariesPath))
    {
        //Set language.
        processor.Settings.Language = Languages.English;

        //Perform OCR
        string text = processor.PerformOCR(img, tessdataPath);

        //Whitespace-only output (e.g. just line breaks) is treated as empty as well
        return string.IsNullOrWhiteSpace(text);
    }
}

Sample Link: http://www.syncfusion.com/downloads/support/forum/132658/ze/WFSample-2081739903
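
The snippet above identifies blank pages but does not show the removal step. A minimal sketch, assuming a helper of our own (the `PageRemoval` class and `RemoveMatchingPages` method are hypothetical, not part of Syncfusion): it walks the pages from last to first, so removing one page never shifts the indexes of the pages that are still to be checked.

```csharp
using System;

// Hypothetical helper, not Syncfusion API.
public static class PageRemoval
{
    // Removes every page index for which isBlank(index) returns true and
    // returns how many pages were removed. Iterating backwards keeps the
    // lower indexes valid after each removal.
    public static int RemoveMatchingPages(int pageCount, Func<int, bool> isBlank, Action<int> removeAt)
    {
        int removed = 0;
        for (int i = pageCount - 1; i >= 0; i--)
        {
            if (isBlank(i))
            {
                removeAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```

With a loaded document named `loadedDocument`, the call would look like `PageRemoval.RemoveMatchingPages(loadedDocument.Pages.Count, i => IsBlankPage(loadedDocument.Pages[i] as PdfLoadedPage), loadedDocument.Pages.RemoveAt);`, followed by saving the document. This assumes the page collection exposes `RemoveAt(int)`; please check it against the documentation for your version.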

Please let us know if you have any concerns.

Regards,

Chinnu

Ambrogio Brambilla September 15, 2017 09:40 AM UTC

Hi.

Thanks for the answer.

A question about your answer:

If the image does not contain any text (e.g. a picture), PerformOCR will return an empty value, and we will remove a good page, not only the empty ones.

Chinnu Muniyappan Syncfusion Team September 18, 2017 10:32 AM UTC

Hi Ambrogio,

Thank you for your update.

Yes, if the image does not contain any text, PerformOCR returns empty text. At present, we do not have an image-processing library, so we suggest identifying empty images by examining each image pixel individually. Please refer to the code snippet below for more details.

1. Process all the image pixels.

2. If a pixel is colored (neither pure white nor pure black), the image is considered not empty and the process stops.

3. If at least 25% of the pixels are black, the image is also considered not empty.

//Requires: System.Drawing
private bool IsEmptyImage(Bitmap image)
{
    bool isEmpty = true;
    int blackPixelCount = 0;

    //If at least 25% of the pixels are black, it is not an empty image.
    long blackPixelRange = (long)image.Width * image.Height * 25 / 100;

    for (int i = 0; i < image.Width; i++)
    {
        for (int j = 0; j < image.Height; j++)
        {
            Color color = image.GetPixel(i, j);

            if (color.R == 255 && color.G == 255 && color.B == 255)
            {
                //Skip the white pixels
            }
            else if (color.R == 0 && color.G == 0 && color.B == 0)
            {
                //Count the black pixels
                blackPixelCount++;
                if (blackPixelCount >= blackPixelRange)
                {
                    isEmpty = false;
                    break;
                }
            }
            else
            {
                //Colored pixel
                isEmpty = false;
                break;
            }
        }

        //Stop scanning once the image is known not to be empty
        if (!isEmpty)
            break;
    }

    return isEmpty;
}
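
The three rules listed above can also be written without GDI+, which makes the heuristic easy to test on synthetic data. This is a minimal sketch, not Syncfusion code: the `BlankPageHeuristic` class is hypothetical, pixels are passed as plain (R, G, B) tuples, and the reply's 25% black threshold is kept as the default of a `blackPercent` parameter. The threshold is computed by multiplying before dividing, so it is not truncated early.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not Syncfusion API): the reply's three rules on plain (R, G, B) values.
public static class BlankPageHeuristic
{
    // Returns true when every pixel is pure white or pure black and fewer than
    // blackPercent % of totalPixels are black.
    public static bool IsEmpty(IEnumerable<(byte R, byte G, byte B)> pixels,
                               long totalPixels, int blackPercent = 25)
    {
        long blackPixelRange = totalPixels * blackPercent / 100;
        long blackPixelCount = 0;

        foreach (var (r, g, b) in pixels)     // Rule 1: visit every pixel
        {
            if (r == 255 && g == 255 && b == 255)
                continue;                     // White pixel: ignore
            if (r != 0 || g != 0 || b != 0)
                return false;                 // Rule 2: colored pixel, stop early
            if (++blackPixelCount >= blackPixelRange)
                return false;                 // Rule 3: black threshold reached
        }
        return true;
    }
}
```

Because `blackPercent` is a parameter, the threshold can be tuned; sparse text pages may have far fewer than 25% black pixels, so it is worth checking against your own scans.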

Please try the above workaround and let us know the results.

Regards,

Chinnu

Ambrogio Brambilla September 19, 2017 07:35 AM UTC

Hi. Thanks for your help.

Your solution works well, but it is very slow: it takes 1 minute to process a 42-page PDF file.

Chinnu Muniyappan Syncfusion Team September 20, 2017 09:42 AM UTC

Hi Ambrogio,

Yes, processing every pixel with the Bitmap.GetPixel method takes a noticeable amount of time. Bitmap.LockBits avoids this by giving direct access to the raw pixel data, so we suggest using Bitmap.LockBits to avoid the performance issue. Please refer to the code snippet below for more details.

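//Requires: System, System.Drawing, System.Drawing.Imaging, System.Runtime.InteropServices
private bool IsEmpty(Bitmap image)
{
    Rectangle bounds = new Rectangle(0, 0, image.Width, image.Height);

    //Work on a 24bpp copy (3 bytes per pixel in B, G, R order) if the image uses another format, e.g. 1bpp scans
    bool isCopy = image.PixelFormat != PixelFormat.Format24bppRgb;
    Bitmap source = isCopy ? image.Clone(bounds, PixelFormat.Format24bppRgb) : image;

    //ReadOnly is enough because the pixels are not modified
    BitmapData bmpData = source.LockBits(bounds, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    int stride = Math.Abs(bmpData.Stride);
    byte[] rgbValues = new byte[stride * image.Height];
    try
    {
        // Copy the RGB values (including the padding at the end of each row) into the array.
        Marshal.Copy(bmpData.Scan0, rgbValues, 0, rgbValues.Length);
    }
    finally
    {
        // Unlock the bits and release the temporary copy.
        source.UnlockBits(bmpData);
        if (isCopy)
            source.Dispose();
    }

    //If at least 25% of the pixels are black, it is not an empty image.
    long blackPixelRange = (long)image.Width * image.Height * 25 / 100;
    long blackPixelsCount = 0;

    //Check one pixel (3 bytes) at a time and skip the row padding bytes
    for (int y = 0; y < image.Height; y++)
    {
        for (int x = 0; x < image.Width; x++)
        {
            int p = y * stride + x * 3;
            byte b = rgbValues[p], g = rgbValues[p + 1], r = rgbValues[p + 2];

            if (r == 255 && g == 255 && b == 255)
                continue;                       //White pixel
            if (r != 0 || g != 0 || b != 0)
                return false;                   //Colored pixel
            if (++blackPixelsCount >= blackPixelRange)
                return false;                   //Too many black pixels
        }
    }
    return true;
}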
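
Scanned pages may contain sensor noise or compression artifacts, so "white" pixels are not always exactly 255. The per-pixel scan over the copied LockBits buffer can be separated from GDI+ and given an optional tolerance, which also makes it testable on a hand-built buffer. This is a sketch under stated assumptions: the buffer holds 24bpp pixels in B, G, R order (the in-memory layout of `PixelFormat.Format24bppRgb`), `stride` is the positive row length in bytes, and the `ScanBuffer` class and its `tolerance` parameter are hypothetical additions, not part of the reply. With `tolerance = 0` it applies the same rules as the reply.

```csharp
// Hypothetical helper (not Syncfusion API): per-pixel scan of a 24bpp buffer with row padding.
public static class ScanBuffer
{
    // tolerance widens "white" to channels >= 255 - tolerance and
    // "black" to channels <= tolerance.
    public static bool IsEmpty(byte[] buffer, int width, int height, int stride,
                               int blackPercent = 25, int tolerance = 0)
    {
        long blackPixelRange = (long)width * height * blackPercent / 100;
        long blackPixelCount = 0;
        int whiteMin = 255 - tolerance;

        for (int y = 0; y < height; y++)
        {
            int rowStart = y * stride;               // Padding after width * 3 bytes is never read
            for (int x = 0; x < width; x++)
            {
                int p = rowStart + x * 3;
                byte b = buffer[p], g = buffer[p + 1], r = buffer[p + 2];

                if (r >= whiteMin && g >= whiteMin && b >= whiteMin)
                    continue;                        // (Near-)white pixel
                if (r > tolerance || g > tolerance || b > tolerance)
                    return false;                    // Colored or gray pixel
                if (++blackPixelCount >= blackPixelRange)
                    return false;                    // Black threshold reached
            }
        }
        return true;
    }
}
```

Separating the scan from LockBits keeps the GDI+ part (lock, copy, unlock) small and lets the classification be checked without creating a Bitmap.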

Please try the above workaround and let us know the results.

Regards,

Chinnu
