Trying to quickly generate PDFs that may have 1000+ pages.

I am trying to take a PDF form, fill it out, and then add it as a page to a single document. This process would be repeated for every set of data associated with one file (which could have anywhere from 100 to 10,000+ records to be turned into a PDF).

I have tried the following:

Filling out the form and importing all pages into a PDF that holds all pages. Then, after all pages are added, flattening and saving the PDF with all pages. This is fast for filling and importing but very slow for flattening and saving.

Filling out the form and importing 50 pages into one pdf. Once the 50 are flattened and saved (this does not take that long), I then try to merge the pdfs into one. This works but then once we get up in the 1000 page range, the merge starts to take a long time.

Filling out the form and then trying to create a template page from the filled-out form and drawing it on the output PDF. This is fast, but I cannot see how to make the form data stay when the PDF template page is created (if this is even possible).

Populating a class that holds all the data that needs to go on the form, along with the coordinates where each piece of text should be drawn on the PDF. This seems effective and fast, but it is time consuming to set up the position of every field that needs to be filled in on the PDF.


I believe my answer may be a mix of some of the above, and I was wondering whether any help can be given for this issue.


6 Replies

GK Gowthamraj Kumar Syncfusion Team August 19, 2021 12:27 PM UTC

Hi Chris,  
  
Thank you for contacting Syncfusion support. 
 
Filling out the form and importing all pages into a pdf that holds all pages. Then after all pages are added flattening and saving the pdf with all pages. This is fast on the filling and importing but very very slow on saving and flattening. 
Yes. Saving and flattening a large PDF document with the PDF library will take a considerable amount of time.
 
Filling out the form and importing 50 pages into one pdf. Once the 50 are flattened and saved (this does not take that long), I then try to merge the pdfs into one. This works but then once we get up in the 1000 page range, the merge starts to take a long time. 
You can manage memory while merging large PDF documents. Setting the EnableMemoryOptimization property of the PdfLoadedDocument to true reduces memory usage when its instance is closed. If the document has more content, the merge will take a considerable amount of time. Please share the input form document with form data so that we can look into improving the performance.
 
Filling out the form and then trying to create a template page from the filled out form and drawing it on the output pdf. This is fast but I can not see how to get the data on the form to also stay when the pdf template page is created (if this is even possible). 
 
Populating a class that holds all the data that needs to go on the form and also holds the coordinates to where on the pdf the text should be drawn. This seems to be effective and fast but time consuming to set up the location of every position data needs to be filled on the pdf. 
 
Please share more details about this approach or the exact requirement, such as the complete code snippet (form template) and the output document, so that we can analyze it on our end and assist you further.
 
 
Regards,  
Gowthamraj K 



CO Chris O August 19, 2021 06:10 PM UTC

Thanks for the input on the options. We ended up also trying another option that we are planning to go with due to processing time.


We are going to take a PDF page that contains the form to be filled out, and give every field a location specifying the x and y coordinates where its text needs to go. The strings are then drawn onto the template page for each record. This seems to be quick, and saving is fast. After I get the data into an object that represents it, I loop through each page of the template form: I draw the page on the output PDF and then populate it. Populating takes the data in the object, along with the location and page where each value needs to go, and draws it on the form. Since the main template is already drawn on the output PDF, I just continue to the next record and repeat until done.
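A minimal sketch of the data side of this approach, with no PDF library involved. The names `FieldPlacement`, `RecordRenderer`, and `DrawRecordPage` are hypothetical and not taken from the thread, and the actual drawing is left to a caller-supplied callback, which is an assumption about how the code might be organized.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of where one field's text goes on the template form.
public sealed class FieldPlacement
{
    public string FieldName { get; }
    public int PageIndex { get; }   // zero-based page of the template form
    public float X { get; }         // horizontal position on the page
    public float Y { get; }         // vertical position on the page

    public FieldPlacement(string fieldName, int pageIndex, float x, float y)
    {
        FieldName = fieldName;
        PageIndex = pageIndex;
        X = x;
        Y = y;
    }
}

public static class RecordRenderer
{
    // Calls drawText(text, x, y) for every placement on the given template page
    // for which the record holds a non-empty value. Returns the number of strings drawn.
    public static int DrawRecordPage(
        IReadOnlyDictionary<string, string> record,
        IEnumerable<FieldPlacement> layout,
        int pageIndex,
        Action<string, float, float> drawText)
    {
        int drawn = 0;
        foreach (FieldPlacement placement in layout)
        {
            if (placement.PageIndex != pageIndex) continue;
            if (!record.TryGetValue(placement.FieldName, out string value)) continue;
            if (string.IsNullOrEmpty(value)) continue;

            drawText(value, placement.X, placement.Y);
            drawn++;
        }
        return drawn;
    }
}
```

In the setup described here, the callback would presumably wrap a draw-string call on the output page's graphics object once the template page has been drawn. Keeping the layout in one list means the coordinates are entered once per form rather than once per record.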


If there is anything that may be more optimal than the above approach, please let me know.


Regarding the last option: what was tried was filling out a PDF form (this form has many different fields). After the form was filled out (using the form's fields), the code below would run. It takes the pages from the filled form and makes each one a template page. The CreateTemplate function seemed to remove the fields along with the data in them, instead of keeping the data without the fields (this was an attempt to get around the slowness of flattening). Once all pages were added, we would save the PDF.

foreach (PdfLoadedPage page in form.Pages)
{
    var pageTemplate = outputPdf.Pages.Add();
    pageTemplate.Graphics.DrawPdfTemplate(page.CreateTemplate(), Syncfusion.Drawing.PointF.Empty, new Syncfusion.Drawing.SizeF(page.Size.Width, page.Size.Height));
}



GK Gowthamraj Kumar Syncfusion Team August 20, 2021 01:06 PM UTC

Hi Chris, 

Thank you for the details.

We suggest you follow the below steps to improve the performance. 

  1. Load the form document.
  2. Fill the form fields, then flatten and save the document.
  3. Load the flattened document again and import it into the new document.
 
Please find a simple code snippet for your reference.

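using System.IO;
using System.Drawing;           // PointF, SizeF (Syncfusion.Drawing on .NET Core)
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class Program
{
    static void Main(string[] args)
    {
        // Output document that collects the flattened pages.
        PdfDocument document = new PdfDocument();

        // Fill and flatten the form, then reload the flattened result.
        Stream filledForm = FillPDFForm("form.pdf");
        PdfLoadedDocument ldoc = new PdfLoadedDocument(filledForm);

        // Draw each flattened page onto a new page of the output document.
        foreach (PdfLoadedPage page in ldoc.Pages)
        {
            var pageTemplate = document.Pages.Add();
            pageTemplate.Graphics.DrawPdfTemplate(page.CreateTemplate(), PointF.Empty, new SizeF(page.Size.Width, page.Size.Height));
        }

        document.Save("sample.pdf");
        document.Close(true);
        ldoc.Close(true);
        filledForm.Dispose();
    }

    // Fills the first text box field, flattens the form, and returns the saved document as a stream.
    private static Stream FillPDFForm(string fileName)
    {
        PdfLoadedDocument doc = new PdfLoadedDocument(fileName);
        (doc.Form.Fields[0] as PdfLoadedTextBoxField).Text = "sample";
        doc.Form.Flatten = true;
        MemoryStream stream = new MemoryStream();
        doc.Save(stream);
        doc.Close(true);
        stream.Position = 0;   // rewind so the caller can read from the start
        return stream;
    }
}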
 
Please try the above suggestion on your side and let us know the results.

Regards, 
Gowthamraj K 



CO Chris O August 20, 2021 08:01 PM UTC

I have implemented the above, mixed with the other options. I now create PDFs of 50 pages each, up to the number of pages I need, making sure to flatten and save them to a drive. Then, in a separate step, I load each PDF's data into a stream and go through each page, drawing it on an output PDF as a template. I save and close the output PDF at the end.
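The batching step can be sketched independently of the PDF library. This is only an illustration of splitting the records into groups of 50. The `Batching` class and `InBatches` method are hypothetical names, not part of the thread's code or of any library.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits the records into consecutive batches of at most batchSize items.
    // The last batch may be smaller. Batches are produced lazily as the caller enumerates.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> records, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (T record in records)
        {
            batch.Add(record);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit whatever is left over after the last full batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each batch would then be filled, flattened, and saved as its own intermediate PDF before the final step that draws every page onto the output document.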


This seems to be consistent, but compared to placing the text by coordinates, the PDF file size for the same number of records is much larger. The PDF that was not generated from a form is only 538 KB, whereas the one generated by the new method is 11 MB. Both PDFs have 50 pages. This is quite a jump considering I am using the same create-template function. For a PDF with 299 pages, the difference is 1.7 MB versus 70 MB. Is there a way, after flattening, to remove a lot of this extra data from the PDFs?



GK Gowthamraj Kumar Syncfusion Team August 23, 2021 10:58 AM UTC

Hi Chris, 
 
Thank you for your update.

We are currently trying to reproduce the reported issue on our end, and we will provide further details by August 24th, 2021.

Meanwhile, please share the input document, output document, and complete code snippet or a simple sample so that we can reproduce the issue on our end, analyze it, and assist you further.
 
Regards, 
Gowthamraj K 



GK Gowthamraj Kumar Syncfusion Team August 24, 2021 02:30 PM UTC

Hi Chris, 
 
Thank you for your update. 
 
We have checked the reported issue with the provided details on our end. Flattening the document removes the existing form fields and replaces them with graphical objects that resemble the form fields and cannot be edited. Our library uses incremental update to modify PDF document content: IncrementalUpdate writes the updated objects after the original document content. You can instead restructure the entire PDF document by setting incremental update to false.

//Disable the incremental update  
document.FileStructure.IncrementalUpdate = false;   


If the document is still too large, you can compress it using our compression support. Please refer to the link below for more information:
UG: https://help.syncfusion.com/file-formats/pdf/working-with-compression 

Please let us know if you need any further assistance with this. 

Regards, 
Gowthamraj K 

