
How to remove blank image

Thread ID: 132658
Created: Sep 14, 2017 05:39 AM
Updated: Sep 20, 2017 05:42 AM
Platform: ASP.NET Web Forms
Replies: 5
Tags: PDF
Ambrogio
Asked On September 14, 2017 10:16 AM

Hi.
We are working on a project that manages PDF files created by a scanner.
We have documents with empty pages and we want to remove them. Unfortunately, an empty page still 'includes' a blank image.
How can we recognize that a page is empty (so we can remove it)?
We tried your image extraction example, but it does not help because img.length is 1 (the empty page still contains one image).


Chinnu Muniyappan [Syncfusion]
Replied On September 15, 2017 05:23 AM

Hi Ambrogio, 
 
Thank you for contacting Syncfusion support. 
 
We can identify blank images by using OCRProcessor. We have created a simple sample that exports the images from the PDF document and processes each exported image with OCRProcessor; if OCRProcessor returns a null or empty string, the page is marked as blank. Please refer to the code snippet below and the sample for more details. 
 
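// Requires System.Drawing plus the Syncfusion PDF and OCR assemblies.
// Assumes scanned pages, where all content is stored as images: a page with real
// (vector) text but no images is also reported as blank.
private bool IsBlankPage(PdfLoadedPage lpage)
        {
            //Extract images
            Image[] images = lpage.ExtractImages();

            //A page without any image is treated as blank
            if (images == null || images.Length == 0)
                return true;

            foreach (Image img in images)
            {
                Bitmap bitmap = img as Bitmap;

                //Keep the page if an image is not a bitmap or if OCR finds text in it
                if (bitmap == null || !PerformOCR(bitmap))
                    return false;
            }

            //OCR found no text in any image on the page
            return true;
        }

//Returns true when OCR finds NO text in the image.
//tesseractBinariesPath and tessdataPath are fields holding the Tesseract paths.
private bool PerformOCR(Bitmap img)
        {
            //Create a new OCR processor
            using (OCRProcessor processor = new OCRProcessor(tesseractBinariesPath))
            {
                //Set language.
                processor.Settings.Language = Languages.English;

                //Perform OCR
                string text = processor.PerformOCR(img, tessdataPath);

                //Treat whitespace-only output (e.g. line breaks) as no text as well
                return string.IsNullOrWhiteSpace(text);
            }
        }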
 
 
Please let us know if you have any concerns. 
 
Regards, 
Chinnu 


Ambrogio
Replied On September 15, 2017 05:40 AM

Hi.

Thanks for the answer.

A question about your answer:

If the image doesn't include text (e.g. a picture), PerformOCR will return an empty value and we will remove a good page (not only the empty ones).



Chinnu Muniyappan [Syncfusion]
Replied On September 18, 2017 06:32 AM

Hi Ambrogio,  
Thank you for your update. 
 
Yes, if the image does not contain any text, PerformOCR returns empty text. At present we do not provide an image-processing library, so we suggest identifying empty images by examining the pixels of each image individually. Please refer to the code snippet below for more details. 
1. Process all of the image pixels. 
2. If a pixel has color data (it is neither pure white nor pure black), the image is considered not empty and processing stops. 
3. If at least 25% of the pixels are black, the image is also considered not empty. 
 
// Requires: using System.Drawing;
private bool IsEmptyImage(Bitmap image)
        {
            int blackPixelCount = 0;

            //If 25% or more of the pixels are black, the image is not empty.
            //(Dividing by 4 avoids (w * h / 100) * 25, which rounds down to 0 below 100 pixels.)
            int blackPixelRange = (image.Width * image.Height) / 4;

            for (int x = 0; x < image.Width; x++)
            {
                for (int y = 0; y < image.Height; y++)
                {
                    Color color = image.GetPixel(x, y);

                    if (color.R == 255 && color.G == 255 && color.B == 255)
                    {
                        //Skip the white pixels
                        continue;
                    }

                    if (color.R == 0 && color.G == 0 && color.B == 0)
                    {
                        //Count the black pixels; reaching the limit means the page has content
                        blackPixelCount++;
                        if (blackPixelCount >= blackPixelRange)
                            return false;
                    }
                    else
                    {
                        //Colored (or gray) pixel: the image is not empty
                        return false;
                    }
                }
            }
            return true;
        }
     
 
Please try the above workaround and let us know the results. 
Regards, 
Chinnu 


Ambrogio
Replied On September 19, 2017 03:35 AM

Hi. Thanks for your help.

Your solution works well, but it is very slow: it takes 1 minute to process a 42-page PDF file.


Chinnu Muniyappan [Syncfusion]
Replied On September 20, 2017 05:42 AM

Hi Ambrogio,   
 
Yes, processing every pixel with Bitmap.GetPixel takes time, because each call goes through GDI+ separately. We can overcome this by using Bitmap.LockBits, which gives direct access to the pixel buffer so all pixels can be read in a single pass. We suggest using Bitmap.LockBits to avoid the performance issue. Please refer to the code snippet below for more details. 
 
 
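// Requires: using System; using System.Drawing; using System.Drawing.Imaging;
//           using System.Runtime.InteropServices;
private bool IsEmpty(Bitmap image)
        {
            Rectangle bounds = new Rectangle(0, 0, image.Width, image.Height);

            // Lock the bits as 32bpp ARGB so that every pixel is exactly 4 bytes (B, G, R, A)
            // whatever the source format is (24bpp, indexed or 1bpp scans are converted).
            // ReadOnly is enough because the pixels are not modified.
            BitmapData bmpData = image.LockBits(bounds, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

            //If 25% or more of the pixels are black, the image is not empty.
            long blackPixelRange = (long)image.Width * image.Height / 4;
            long blackPixelsCount = 0;

            // One scan line of pixel data (the padding at the end of each row is not copied).
            byte[] row = new byte[image.Width * 4];

            try
            {
                for (int y = 0; y < image.Height; y++)
                {
                    // Copy one row; Stride can include padding bytes or be negative.
                    Marshal.Copy(IntPtr.Add(bmpData.Scan0, y * bmpData.Stride), row, 0, row.Length);

                    // Count pixels, not bytes: each pixel is 4 bytes (B, G, R, A).
                    for (int i = 0; i < row.Length; i += 4)
                    {
                        byte b = row[i], g = row[i + 1], r = row[i + 2];

                        if (r == 255 && g == 255 && b == 255)
                            continue;          // white pixel
                        if (r != 0 || g != 0 || b != 0)
                            return false;      // colored pixel: not empty
                        if (++blackPixelsCount >= blackPixelRange)
                            return false;      // too many black pixels: not empty
                    }
                }
            }
            finally
            {
                // Unlock the bits, also when returning early.
                image.UnlockBits(bmpData);
            }
            return true;
        }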
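The checks above accept only pure white (255) and pure black (0). If your scans contain slight noise (for example off-white paper or compression artifacts), a single pixel of 254 is enough to report a blank page as not empty. Below is a minimal sketch of a tolerant variant; it is not part of the Syncfusion API, and `BlankPageCheck`, `IsBlankBgra`, `tolerance` and `maxBlackFraction` are hypothetical names. It assumes the pixels were already copied into a 32bpp BGRA byte buffer with a positive stride (for example with Bitmap.LockBits and PixelFormat.Format32bppArgb). Because it works on a plain byte array, it can be unit-tested without System.Drawing.

```csharp
using System;

// Hypothetical helper (not a Syncfusion API): applies the "white / black / colored"
// rule from this thread to a 32bpp BGRA buffer, with an optional noise tolerance.
public static class BlankPageCheck
{
    // stride:           bytes per row (assumed positive; may exceed width * 4 because of padding).
    // tolerance:        0..127; how far a channel may be from 255 (white) or 0 (black).
    //                   0 reproduces the exact-match rule used above.
    // maxBlackFraction: share of black pixels at which the image counts as content (25% above).
    public static bool IsBlankBgra(byte[] pixels, int width, int height, int stride,
                                   int tolerance = 0, double maxBlackFraction = 0.25)
    {
        if (pixels == null) throw new ArgumentNullException(nameof(pixels));
        if (width < 0 || height < 0 || stride < width * 4)
            throw new ArgumentException("Invalid dimensions or stride.");
        if (pixels.Length < (long)stride * height)
            throw new ArgumentException("Buffer is smaller than stride * height.");
        if (tolerance < 0 || tolerance > 127)
            throw new ArgumentOutOfRangeException(nameof(tolerance));
        if (maxBlackFraction < 0 || maxBlackFraction > 1)
            throw new ArgumentOutOfRangeException(nameof(maxBlackFraction));

        long blackLimit = (long)Math.Ceiling((double)width * height * maxBlackFraction);
        long blackCount = 0;

        for (int y = 0; y < height; y++)
        {
            int rowStart = y * stride;              // padding after width * 4 bytes is skipped
            for (int x = 0; x < width; x++)
            {
                int i = rowStart + x * 4;           // memory order: B, G, R, A (alpha ignored)
                byte b = pixels[i], g = pixels[i + 1], r = pixels[i + 2];

                bool isWhite = r >= 255 - tolerance && g >= 255 - tolerance && b >= 255 - tolerance;
                bool isBlack = r <= tolerance && g <= tolerance && b <= tolerance;

                if (isWhite) continue;                          // background
                if (!isBlack) return false;                     // colored or gray: content
                if (++blackCount >= blackLimit) return false;   // too much black: content
            }
        }
        return true;
    }
}
```

With `tolerance = 0` it applies the same rule as the snippets above. A small value such as 10 to 20 is only a starting point to tune against your own scans, not a tested recommendation.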
 
 
 
Please try the above workaround and let us know the results. 
 
Regards, 
Chinnu 

