
How to remove blank image

Hi.
We are working on a project that manages PDF files created by a scanner.
Some documents have empty pages that we want to remove. Unfortunately, a scanned empty page still 'includes' a blank image.
How can we recognize that a page is empty so that we can remove it?
We tried your image extraction example, but it does not help because img.length is 1 (the blank scan itself is extracted as an image).


5 Replies

CM Chinnu Muniyappan Syncfusion Team September 15, 2017 09:23 AM UTC

Hi Ambrogio, 
 
Thank you for contacting Syncfusion support. 
 
We can identify blank images by using OCRProcessor. We have created a simple sample that exports the images from the PDF document and processes each exported image with OCRProcessor; if OCRProcessor returns null or an empty string, the page is marked as blank. Please refer to the code snippet and sample below for more details. 
 
private bool IsBlankPage(PdfLoadedPage lpage)
{
    // Extract all images on the page.
    Image[] images = lpage.ExtractImages();

    // The page is blank only if every image yields no OCR text.
    // A page with no images at all is also treated as blank.
    foreach (Image img in images)
    {
        // PerformOCR returns true when no text was recognized.
        if (!PerformOCR(img as Bitmap))
            return false;
    }
    return true;
}
 
private bool PerformOCR(Bitmap img)
{
    // Returns true when OCR finds no text in the image.
    // tesseractBinariesPath and tessdataPath are assumed to be defined elsewhere in the class.
    using (OCRProcessor processor = new OCRProcessor(tesseractBinariesPath))
    {
        // Set the language.
        processor.Settings.Language = Languages.English;

        // Perform OCR.
        string text = processor.PerformOCR(img, tessdataPath);

        // Treat null, empty, and whitespace-only results as "no text".
        return string.IsNullOrWhiteSpace(text);
    }
}
 
 
Please let us know if you have any concerns. 
 
Regards, 
Chinnu 



AB Ambrogio Brambilla September 15, 2017 09:40 AM UTC

Hi.

Thanks for the answer.

A question about your answer:

If the image doesn't include text (e.g. a picture), PerformOCR will return an empty value and we will remove a good page, not only the empty ones.




CM Chinnu Muniyappan Syncfusion Team September 18, 2017 10:32 AM UTC

Hi Ambrogio,  
Thank you for your update. 
 
Yes, if the image does not contain any text, PerformOCR returns empty text. At present, we do not have an image manipulation library for processing images, so we suggest identifying empty images by examining each pixel individually. Please refer to the code snippet below for more details. 
1. We process all the pixels of the image. 
2. If a pixel is colored (neither white nor black), we consider the image not empty and stop processing. 
3. If 25% or more of the pixels are black, we also mark the image as not empty. 
 
private bool IsEmptyImage(Bitmap image)
{
    int blackPixelCount = 0;

    // If 25% or more of the pixels are black, the image is not empty.
    // Dividing by 4 avoids the truncation in (pixels / 100) * 25, which is 0 for images under 100 pixels.
    long blackPixelRange = (long)image.Width * image.Height / 4;

    for (int i = 0; i < image.Width; i++)
    {
        for (int j = 0; j < image.Height; j++)
        {
            Color color = image.GetPixel(i, j);

            // Skip white pixels.
            if (color.R == 255 && color.G == 255 && color.B == 255)
                continue;

            // Any pixel that is neither white nor black means the image has content.
            if (color.R != 0 || color.G != 0 || color.B != 0)
                return false;

            // Count black pixels and stop once the threshold is reached.
            blackPixelCount++;
            if (blackPixelCount >= blackPixelRange)
                return false;
        }
    }
    return true;
}
     
 
Please try the above workaround and let us know the details. 
Regards, 
Chinnu 



AB Ambrogio Brambilla September 19, 2017 07:35 AM UTC

Hi. Thanks for your help.

Your solution works well, but it is very slow: it takes 1 minute to process a 42-page PDF file.



CM Chinnu Muniyappan Syncfusion Team September 20, 2017 09:42 AM UTC

Hi Ambrogio,   
 
Yes, processing every pixel with the Bitmap.GetPixel method takes a considerable amount of time. You can overcome this with the Bitmap.LockBits method, which gives direct access to the pixel buffer, so we suggest using it to avoid the performance issue. Please refer to the code snippet below for more details. 
 
 
// Requires: using System.Drawing; using System.Drawing.Imaging; using System.Runtime.InteropServices;
private bool IsEmpty(Bitmap image)
{
    Rectangle bounds = new Rectangle(0, 0, image.Width, image.Height);

    // Lock read-only and request 24bpp RGB so that every pixel is exactly 3 bytes (B, G, R).
    BitmapData bmpData = image.LockBits(bounds, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    int stride = Math.Abs(bmpData.Stride);
    byte[] rgbValues = new byte[stride * image.Height];
    try
    {
        // Copy the pixel data (including any row padding) into the array.
        Marshal.Copy(bmpData.Scan0, rgbValues, 0, rgbValues.Length);
    }
    finally
    {
        // Unlock the bits.
        image.UnlockBits(bmpData);
    }

    // If 25% or more of the pixels are black, the image is not empty.
    long blackPixelRange = (long)image.Width * image.Height / 4;
    long blackPixelsCount = 0;

    for (int y = 0; y < image.Height; y++)
    {
        int row = y * stride; // rows may be padded, so index by stride rather than width * 3
        for (int x = 0; x < image.Width; x++)
        {
            int i = row + x * 3;
            byte b = rgbValues[i], g = rgbValues[i + 1], r = rgbValues[i + 2];

            if (r == 255 && g == 255 && b == 255)
                continue; // white pixel

            if (r != 0 || g != 0 || b != 0)
                return false; // colored or gray pixel

            if (++blackPixelsCount >= blackPixelRange)
                return false; // too many black pixels
        }
    }
    return true;
}
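
To connect the check to the original goal of removing the empty pages, the loop below is a minimal sketch rather than a tested sample. RemoveBlankPages and its parameters are hypothetical names. The calls on PdfLoadedDocument (Pages, RemoveAt, Save, Close) are assumed from Syncfusion's documented PDF API and should be verified against the version you have installed. As in IsBlankPage, a page without any images is treated as blank.

```csharp
using System;
using System.Drawing;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

public static class BlankPageRemover
{
    // Hypothetical helper, not part of the Syncfusion API.
    // isEmptyImage is the per-image check, for example the IsEmpty method above.
    public static int RemoveBlankPages(string inputPath, string outputPath,
                                       Func<Bitmap, bool> isEmptyImage)
    {
        PdfLoadedDocument document = new PdfLoadedDocument(inputPath);
        int removed = 0;
        try
        {
            // Walk backwards so that removing a page does not shift
            // the indexes of the pages still to be visited.
            for (int i = document.Pages.Count - 1; i >= 0; i--)
            {
                PdfLoadedPage page = document.Pages[i] as PdfLoadedPage;
                if (page == null)
                    continue;

                bool blank = true;
                foreach (Image img in page.ExtractImages())
                {
                    Bitmap bmp = img as Bitmap;
                    if (bmp == null || !isEmptyImage(bmp))
                    {
                        blank = false;
                        break;
                    }
                }

                // Assumption: keep at least one page so the saved file stays valid.
                if (blank && document.Pages.Count > 1)
                {
                    document.Pages.RemoveAt(i);
                    removed++;
                }
            }
            document.Save(outputPath);
        }
        finally
        {
            document.Close(true);
        }
        return removed;
    }
}
```

A call such as BlankPageRemover.RemoveBlankPages("scan.pdf", "scan-clean.pdf", IsEmpty) would then write a copy without the blank pages; the file names are placeholders.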
 
 
 
Please try the above workaround and let us know the results. 
 
Regards, 
Chinnu 
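
A note on scanned input: blank pages from a scanner are often not pure white (paper tint, dust, compression noise), so a check that requires every pixel to be exactly 255 or 0 may reject them. The following is a minimal sketch of a tolerant variant, not something proposed in the replies above. IsNearlyBlank, whiteThreshold, and maxInkFraction are hypothetical names, and the default values are illustrative assumptions that would need tuning on real scans. It works on a raw 24bpp buffer such as the one copied out with LockBits.

```csharp
using System;

public static class BlankImageCheck
{
    // pixels: 24bpp rows in B, G, R byte order, with 'stride' bytes per row
    //         (the stride may include padding after the last pixel of a row).
    // whiteThreshold: channel values at or above this count as "paper" (assumed default).
    // maxInkFraction: fraction of darker pixels tolerated as noise (assumed default).
    public static bool IsNearlyBlank(byte[] pixels, int width, int height, int stride,
                                     byte whiteThreshold = 240,
                                     double maxInkFraction = 0.005)
    {
        if (width <= 0 || height <= 0)
            return true;

        long inkLimit = (long)(maxInkFraction * width * height);
        long ink = 0;

        for (int y = 0; y < height; y++)
        {
            int row = y * stride;
            for (int x = 0; x < width; x++)
            {
                int i = row + x * 3;

                // A pixel counts as "ink" if any channel is darker than the threshold.
                bool isInk = pixels[i] < whiteThreshold
                          || pixels[i + 1] < whiteThreshold
                          || pixels[i + 2] < whiteThreshold;

                // Stop early once the noise allowance is exceeded.
                if (isInk && ++ink > inkLimit)
                    return false;
            }
        }
        return true;
    }
}
```

Because the function only reads a byte array, it can be exercised without a PDF or a Bitmap, which makes it easier to tune the two thresholds on sample scans.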

