I am evaluating the PDF control for redaction purposes. When I call ExtractText on a page, I see all the text values I need to redact. When I then use FindText to get a bounding area for each redaction, it only finds the items if they are separated vertically by 10-15 pixels. On closely spaced lines it finds every other item, and those it redacts correctly. Running multiple passes does not help; the skipped items are still not found. Any suggestions? Could it be the font size the document uses? The fonts are CourierStd (Type 1), ArialMT (TrueType), and Microsoft Sans Serif (TrueType).
The font size is probably 6 or 7.
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Linq;
using System.Text.RegularExpressions;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;

PdfLoadedDocument loadedDocument = new PdfLoadedDocument("../../Input/test.pdf");
PdfFont font = new PdfTrueTypeFont(new Font("CourierStd", 6));
foreach (PdfLoadedPage page in loadedDocument.Pages)
{
    // Extract the page text and find every value that needs to be redacted
    string extractedText = page.ExtractText();
    var acctNr6 = Regex.Matches(extractedText, @"4865(\d{12})", RegexOptions.IgnorePatternWhitespace);
    var fText = acctNr6.Cast<Match>().Where(x => x.Value != "0000000000000000").Select(match => match.Value).Distinct().ToList();
    // Find the actual location of each value that needs redacting.
    // pageRes is defined in code not shown here; its Key is the page index and its Value the strings to find.
    List<MatchedItem> oRetDat = new List<MatchedItem>();
    var res = loadedDocument.FindText(pageRes.Value.ToList(), pageRes.Key, out oRetDat);
    foreach (MatchedItem oRect in oRetDat)
    {
        RectangleF redactionBound = new RectangleF(oRect.Bounds.X, oRect.Bounds.Y,
            oRect.Bounds.Width, oRect.Bounds.Height - 2);
        // Cover the value with a white box, then draw only its last four characters on top
        page.Graphics.DrawRectangle(PdfBrushes.White, redactionBound);
        page.Graphics.DrawString(oRect.Text.Replace('-', ' ').Substring(oRect.Text.Length - 4, 4), font, PdfBrushes.Black, redactionBound.X, redactionBound.Y);
    }
}
// Save once, after every page has been processed
loadedDocument.Save("RedactedPDF.pdf");
loadedDocument.Close(true);
Process.Start("RedactedPDF.pdf");
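
In case it helps narrow things down, here is the matching and masking step on its own, with no PDF library involved. It is only a sketch: the AccountMasking class and its method names are made up for illustration and are not part of the PDF control. It uses the same pattern as the loop above, so it can be run against the extracted text to confirm which values the regex returns before FindText is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class AccountMasking
{
    // Same pattern as in the post: "4865" followed by 12 digits.
    private static readonly Regex AccountPattern = new Regex(@"4865\d{12}");

    // Returns the distinct account numbers found in the extracted text.
    public static List<string> FindAccountNumbers(string text)
    {
        return AccountPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }

    // Returns the last four characters of a value, which is the text
    // drawn over the white box. Shorter values are returned unchanged.
    public static string LastFour(string value)
    {
        string cleaned = value.Replace('-', ' ');
        return cleaned.Length <= 4 ? cleaned : cleaned.Substring(cleaned.Length - 4, 4);
    }
}
```

For example, AccountMasking.FindAccountNumbers("Acct 4865123412341234 and 4865123412341234, also 4865000011112222") should return the two distinct values, and AccountMasking.LastFour("4865123412341234") should give "1234". If every expected value shows up here but FindText still skips some, the problem is in the location lookup rather than in the matching.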