Best practice - merging a mix of existing PDFs, and filling out PDF templates in memory using C# .NET Core and Azure blob storage

Hello, I am going to try to explain this at a high level since I am fairly certain I'll have to rewrite a bunch of existing code and I am not sure providing the existing code at this point will be helpful.
I have a C# .NET Core application deployed in Azure, utilizing PDF files and a PDF template stored in Azure blob storage. I didn't write the SyncFusion code that creates a report, but I am looking at fixing an issue we have where it seems the larger the report, the more likely it is to error out without giving clear detail why. It does consistently error out 4 minutes into the report running, though.
The report: our users provide personal details in our application and can add up to 10 PDF attachments to an application.
When we create a report, we do the following from the API side:
- Use some business logic to filter out users based on requirements before using any logic to create the PDF
- Have a placeholder byte array that represents the final report.
This is the primary logic I am looking for advice on for creating the report itself - which is one enormous PDF:
- Right now, as we loop through the users, we start by filling out a PDF template for the user matching key-value pairs of their personal details (coming from a SQL DB) to the template fields. We do validation to make sure the keys match so it errors out if they don't. When done, we flatten the PDF and merge it into the final report array.
- We look to see if that user included PDF attachments. If they did, we loop through and merge each one one-by-one to the final report.
- We then go to the next user, fill out that template again and flatten it, append it, look for their 0-10 attachments, merge them one-by-one, and continue on.
At the end then, we have an initial page for each user with that flattened, filled-out template, followed by 0-10 of their attachments.
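The per-user loop described above might look roughly like the following sketch. The `User` record, `Fields` dictionary, and `Attachments` list are hypothetical stand-ins for the real data model; `PdfLoadedDocument`, `Form.Flatten`, `TryGetField`, and `PdfDocumentBase.Merge` are Syncfusion.Pdf APIs, but this is an illustration of the flow, not the poster's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

// Hypothetical data model standing in for the real one.
public record User(Dictionary<string, string> Fields, List<byte[]> Attachments);

static byte[] BuildReport(IEnumerable<User> users, byte[] templateBytes)
{
    var finalDoc = new PdfDocument { EnableMemoryOptimization = true };
    var openDocs = new List<PdfLoadedDocument>();
    try
    {
        foreach (var user in users)
        {
            // Fill the template with this user's key-value pairs,
            // erroring out if a key has no matching template field.
            var template = new PdfLoadedDocument(templateBytes);
            openDocs.Add(template);
            foreach (var (key, value) in user.Fields)
            {
                if (!template.Form.Fields.TryGetField(key, out PdfLoadedField field))
                    throw new InvalidOperationException($"Template field '{key}' not found.");
                if (field is PdfLoadedTextBoxField textBox)
                    textBox.Text = value;
            }
            template.Form.Flatten = true;          // flatten the filled form
            PdfDocumentBase.Merge(finalDoc, template);

            // Append this user's 0-10 attachments one by one.
            foreach (byte[] attachmentBytes in user.Attachments)
            {
                var attachment = new PdfLoadedDocument(attachmentBytes);
                openDocs.Add(attachment);
                PdfDocumentBase.Merge(finalDoc, attachment);
            }
        }

        using var output = new MemoryStream();
        finalDoc.Save(output);
        return output.ToArray();
    }
    finally
    {
        // Close source documents only after the final report is saved.
        foreach (var doc in openDocs)
            doc.Close(true);
        finalDoc.Close(true);
    }
}
```

Note that every intermediate document stays in memory until the final save, which is the pattern the questions below are probing.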
Something we've noticed is a big performance issue with our current code. We have an example where the final report totals 194 pages (the flattened templates plus attachments), and it takes between 2 1/2 and almost 4 minutes to generate when running our code locally, but it outright errors out when run from Azure - it seems to time out or exhaust memory when running in Azure.
What advice can you give to make sure this process works well for each of these points:
1.) Should we be storing the filled-out, flattened PDF template in Azure before involving it in the process to make the final report? Or is it unlikely this is part of our performance issue? Right now, that filled-out PDF only exists in memory/the final byte array. We do not save it separately. It only exists in the context of this report.
2.) Is there a best practice for loading/downloading PDFs from Azure to work with your libraries? Our code right now doesn't seem to be using SyncFusion-related libraries at those steps. If there are attachments for the user, it looks like we're using just Microsoft classes for getting the blob container, container reference, block blob reference for the actual file, and then using DownloadToByteArray of the CloudBlockBlob for the PDFs, and we are doing that each time. I've seen some super short code examples on the site where you just use DirectoryInfo to get all the PDFs in a folder. I think that could work with our existing setup, since we do store the attachments for a given user in one folder, but tying that into the Azure setup is less clear to me.
3.) What exceptions and tools do you have in the SyncFusion libraries that could help us pinpoint where we're going wrong with performance in building this report? I've been trying to hunt those down. I've added a few catches for PdfException but am a bit at a loss as to how to more clearly discover where else our code might be inefficient.
4.) Based off the context of what we're trying to do, is there any other advice you can give for fixing our process so it doesn't time out? We do close the PdfLoadedDocuments after Saves, and I believe that's where you get a possible benefit if you use this setting:
document.EnableMemoryOptimization = true;
But I've noticed with our existing process, document.EnableMemoryOptimization = true does not result in any real consistent time difference in how long the same report is generated.
5.) Is there any point where you'd say we should actually break up this report into more reports? e.g. what is the upper limit for your product to merge documents without timing out? Would it be something like 200, 400, 1,000 PDFs?
From what I've seen on the best practices pages, these are important:
- With bigger PDFs and more PDFs, not opening everything into memory at once - combining bit by bit - seems to be important for performance.
- Use document.EnableMemoryOptimization = true; if it seems to help reduce time - but when I tested just this with our existing code, I saw no noticeable difference in performance.
Thanks for any help. I can also provide additional detail and code snippets for any of what I am describing above if it helps.

3 Replies

RB Ravikumar Baladhandapani Syncfusion Team April 15, 2021 04:29 PM UTC

Hi Megan, 

1.) Should we be storing the filled-out, flattened PDF template in Azure before involving it in the process to make the final report? Or is it unlikely this is part of our performance issue? Right now, that filled-out PDF only exists in memory/the final byte array. We do not save it separately. It only exists in the context of this report. 
Holding the documents in byte arrays (memory streams) increases peak memory usage and hurts performance. 
Instead, if possible, save each document to a FileStream as a temporary file on the server, merge the documents from those files, and upload the final result to Azure at the end. This reduces peak memory usage and improves performance. 
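A minimal sketch of that suggestion, assuming the field-filling code already exists. `Save(Stream)`, `Close(true)`, and `Form.Flatten` are Syncfusion.Pdf APIs; the temp-path handling is illustrative:

```csharp
using System.IO;
using Syncfusion.Pdf.Parsing;

// Fill and flatten the template, then write it straight to a temporary
// file via FileStream instead of keeping it in a byte array.
static string SaveFilledTemplateToTempFile(byte[] templateBytes)
{
    string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");

    var template = new PdfLoadedDocument(templateBytes);
    // ... fill the form fields here, as the existing code already does ...
    template.Form.Flatten = true;

    using (var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write))
        template.Save(fileStream);
    template.Close(true);

    return path;  // merge later from this file, then delete it
}
```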
2.) Is there a best practice for loading/downloading PDFs from Azure to work with your libraries? Our code right now doesn't seem to be using SyncFusion-related libraries at those steps. If there are attachments for the user, it looks like we're using just Microsoft classes for getting the blob container, container reference, block blob reference for the actual file, and then using DownloadToByteArray of the CloudBlockBlob for the PDFs, and we are doing that each time. I've seen some super short code examples on the site where you just use DirectoryInfo to get all the PDFs in a folder. I think that could work with our existing setup, since we do store the attachments for a given user in one folder, but tying that into the Azure setup is less clear to me. 
Our library does not provide any API to upload to or download from Azure storage. You can continue using the Microsoft Azure Storage classes to upload and download the files. 
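For example, with the `CloudBlockBlob` classes already in use, each attachment could be streamed straight to a temporary file instead of a byte array. The container and blob names are illustrative; `GetBlockBlobReference` and `DownloadToFileAsync` are part of the Microsoft Azure Storage Blob client library:

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob;

// Stream an attachment blob to a temp file instead of calling
// DownloadToByteArray, so the whole PDF is never buffered in memory.
static async Task<string> DownloadAttachmentAsync(CloudBlobContainer container, string blobName)
{
    CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
    string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");

    // Streams directly to disk; memory use stays small regardless of PDF size.
    await blob.DownloadToFileAsync(path, FileMode.Create);
    return path;
}
```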
3.) What exceptions and tools do you have in the SyncFusion libraries that could help us pinpoint where we're going wrong with performance in building this report? I've been trying to hunt those down. I've added a few catches for PdfException but am a bit at a loss as to how to more clearly discover where else our code might be inefficient. 
 
Kindly share the code snippet for the filling and merging process in your project; it will help us analyze and pinpoint the performance issue. 
Please share the following details so we can validate the issue: 
  1. Code snippet
  2. Sample
  3. Which technology you used to deploy the project
4.) Based off the context of what we're trying to do, is there any other advice you can give for fixing our process so it doesn't time out? We do close the PdfLoadedDocuments after Saves, and I believe that's where you get a possible benefit if you use this setting: 
document.EnableMemoryOptimization = true; 
But I've noticed with our existing process, document.EnableMemoryOptimization = true does not result in any real consistent time difference in how long the same report is generated. 
 
As it stands, your process already follows the recommended steps for optimizing performance. Kindly share the code snippet so we can see which operations you are performing; it will help us analyze this further. 
5.) Is there any point where you'd say we should actually break up this report into more reports? e.g. what is the upper limit for your product to merge documents without timing out? Would it be something like 200, 400, 1,000 PDFs? 
From what I've seen on the best practices pages, these are important: 
- With bigger PDFs and more PDFs, not opening everything into memory at once - combining bit by bit - seems to be important for performance. 
- Use document.EnableMemoryOptimization = true; if it seems to help reduce time - but when I tested just this with our existing code, I saw no noticeable difference in performance. 
 
In our PDF library, merging speed and performance depend on the environment's memory size (RAM) and processing speed. 
 
If the final document will contain more than 500 or 1,000 pages, we recommend merging about 10 documents at a time, saving each batch to a temporary folder, and finally merging the batches into a single PDF. This reduces peak memory usage and improves performance. 
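That batching idea could be sketched like this. The file handling and batch size are illustrative; `PdfDocumentBase.Merge`, `PdfLoadedDocument`, and `EnableMemoryOptimization` are Syncfusion.Pdf APIs:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

// Merge a handful of PDF files into one output stream.
static void MergeFiles(IEnumerable<string> files, Stream output)
{
    using var merged = new PdfDocument { EnableMemoryOptimization = true };
    var sources = new List<PdfLoadedDocument>();
    foreach (string file in files)
    {
        var source = new PdfLoadedDocument(File.ReadAllBytes(file));
        sources.Add(source);
        PdfDocumentBase.Merge(merged, source);
    }
    merged.Save(output);
    // Close source documents only after the merged batch has been saved.
    foreach (var source in sources)
        source.Close(true);
}

// Merge in batches of ~10, flush each batch to a temp file, then merge
// the batch files into the final report. Only one small batch is ever
// held in memory at a time.
static void MergeInBatches(IReadOnlyList<string> sourceFiles, string outputPath, int batchSize = 10)
{
    var batchPaths = new List<string>();

    // Pass 1: merge small batches and write each one to disk.
    for (int i = 0; i < sourceFiles.Count; i += batchSize)
    {
        string batchPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");
        using (var batchStream = new FileStream(batchPath, FileMode.Create))
            MergeFiles(sourceFiles.Skip(i).Take(batchSize), batchStream);
        batchPaths.Add(batchPath);
    }

    // Pass 2: merge the batch files into the single final PDF.
    using (var finalStream = new FileStream(outputPath, FileMode.Create))
        MergeFiles(batchPaths, finalStream);

    foreach (string batchPath in batchPaths)
        File.Delete(batchPath);
}
```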
 

Regards, 
Ravikumar. 



MA Megan Anderson April 16, 2021 04:25 PM UTC

#1 This should work for us. I already have ideas for implementing it, and I think that'll be my next step for troubleshooting. 

For #3 and #4, if I attach the code in a file, does it become available for anyone to download, or does it just become available to SyncFusion support? Just want to make sure I clean it up enough before submitting if it becomes available to public via this forum too. 

To answer the other part: our code for this is part of a larger C# .NET Core API deployed to an Azure App Service under the Basic tier. We deploy via CI/CD pipelines. 



GK Gowthamraj Kumar Syncfusion Team April 19, 2021 04:33 PM UTC

Hi Megan, 
 
Could you please report this issue, along with the input PDF files, through our Direct Trac Developer Support System (https://www.syncfusion.com/support/directtrac/incidents)? It is secure, and you can take advantage of the expertise of a dedicated support engineer and a guaranteed response time; we hope you will take advantage of this system. If you have already reported it, please ignore this. 
 
Regards, 
Gowthamraj K 

