
How to Search Text in a PDF using Regular Expression & Add Hyperlink over It

David Zondray · Feb. 25, 15 · Code Snippet
This technical tip shows how .NET developers can search for text inside a PDF file using a regular expression and add hyperlinks over the matches in their .NET applications. To find a phrase and add a hyperlink over it, first pass the regular expression as a parameter to the TextFragmentAbsorber constructor, then assign a TextSearchOptions object that specifies whether the search phrase is treated as a regular expression. Next, accept the absorber on a page so that the matching phrases are collected in its TextFragments collection, and loop through the matches to get their rectangular dimensions, change the foreground color to blue (optional, to make the text look like a hyperlink), and create a link using the PdfContentEditor class's CreateWebLink(..) method. Finally, save the updated PDF using the Save method of the PdfContentEditor object.
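Before running the pattern against a PDF, it can help to check what it matches on plain text. The sketch below uses the standard System.Text.RegularExpressions classes rather than Aspose.Pdf, so it assumes that the absorber interprets this simple pattern the same way the .NET regex engine does. The RegexPreview class, the FindMatches helper, and the sample string are hypothetical and are not part of the original tip.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexPreview
{
    // Returns every substring of text that matches pattern, in order of appearance.
    public static List<string> FindMatches(string text, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample text, not taken from the PDF used in the samples.
        string sample = "Download: setup.exe  Data: none  Document: guide.pdf  Directory: c:/pdftest";

        // "D[a-z]{7}:" means: a capital D, exactly seven lowercase letters, then a colon.
        foreach (string hit in FindMatches(sample, "D[a-z]{7}:"))
        {
            Console.WriteLine(hit);
        }
        // Prints "Download:" and "Document:".
        // "Data:" has too few letters and "Directory:" has too many.
    }
}
```

Each string returned here corresponds to one TextFragment that the absorber would be expected to report, under the assumption stated above.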
//The following code snippets show how to search for text inside a PDF file using a regular expression and add hyperlinks over the matches.

//C# Code Sample

//namespaces used by this snippet
using System;
using Aspose.Pdf.Facades;
using Aspose.Pdf.Text;

//create absorber object to find all matches of the regular expression
TextFragmentAbsorber absorber = new TextFragmentAbsorber("D[a-z]{7}:");
//Enable regular expression search
absorber.TextSearchOptions = new TextSearchOptions(true);
//create the content editor
PdfContentEditor editor = new PdfContentEditor();
// bind source PDF file
editor.BindPdf("c:/pdftest/25741-out.pdf");
//accept the absorber for the page
editor.Document.Pages[1].Accept(absorber);

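//empty dash pattern and line-ending arrays passed to CreateLine
int[] dashArray = { };
String[] LEArray = { };
//System.Drawing color passed to CreateWebLink and CreateLine
System.Drawing.Color blue = System.Drawing.Color.Blue;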

//loop through the fragments
foreach (TextFragment textFragment in absorber.TextFragments)
{
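    //color the matched text blue so that it looks like a hyperlink (optional)
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue;
    //convert the fragment's rectangle to integer coordinates, slightly padded
    System.Drawing.Rectangle rect = new System.Drawing.Rectangle((int)Math.Round(textFragment.Rectangle.LLX),
        (int)Math.Round(textFragment.Rectangle.LLY), (int)Math.Round(textFragment.Rectangle.Width + 2),
        (int)Math.Round(textFragment.Rectangle.Height + 1));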
    Enum[] actionName = new Enum[2] { Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_AttachFile, Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_ExtractPages };
    //add a web link over the fragment's rectangle on page 1
    editor.CreateWebLink(rect, "http://www.aspose.com", 1, blue, actionName);
    //draw a line just below the text so that it looks underlined
    editor.CreateLine(rect, "", (float)textFragment.Rectangle.LLX + 1, (float)textFragment.Rectangle.LLY - 1,
        (float)textFragment.Rectangle.URX, (float)textFragment.Rectangle.LLY - 1, 1, 1, blue, "S", dashArray, LEArray);
}
//Save & Close the document
editor.Save("c:/pdftest/TextReplaced_with_Links.pdf");
editor.Close();
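
The rectangle passed to CreateWebLink is a System.Drawing.Rectangle with integer coordinates, while a text fragment's rectangle uses floating-point values. The sketch below isolates that conversion so it can be checked without a PDF. It assumes the fragment's width and height equal URX - LLX and URY - LLY, and it keeps the sample's padding of 2 on the width and 1 on the height. The LinkRectSketch class, the ToLinkRectangle helper, and the coordinates are hypothetical.

```csharp
using System;
using System.Drawing;

public static class LinkRectSketch
{
    // Builds an integer rectangle from floating-point corner coordinates
    // (lower-left x/y and upper-right x/y), padded as in the sample code.
    public static Rectangle ToLinkRectangle(double llx, double lly, double urx, double ury)
    {
        int x = (int)Math.Round(llx);
        int y = (int)Math.Round(lly);
        int width = (int)Math.Round(urx - llx + 2);   // 2 units of horizontal padding
        int height = (int)Math.Round(ury - lly + 1);  // 1 unit of vertical padding
        return new Rectangle(x, y, width, height);
    }

    public static void Main()
    {
        // Hypothetical fragment bounds, chosen only for illustration.
        Rectangle rect = ToLinkRectangle(100.25, 700.75, 160.25, 712.75);
        Console.WriteLine($"{rect.X}, {rect.Y}, {rect.Width}, {rect.Height}");
        // Prints: 100, 701, 62, 13
    }
}
```

Rounding every coordinate, rather than truncating some of them, keeps the link area aligned with the text when a coordinate has a fractional part of one half or more.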

//VB.NET Code Sample

'create absorber object to find all instances of the input search phrase
Dim absorber As Aspose.Pdf.Text.TextFragmentAbsorber = New Aspose.Pdf.Text.TextFragmentAbsorber("D[a-z]{7}")
'Enable regular expression search
absorber.TextSearchOptions = New Aspose.Pdf.Text.TextSearchOptions(True)
'create the content editor
Dim editor As Aspose.Pdf.Facades.PdfContentEditor = New Aspose.Pdf.Facades.PdfContentEditor()
' bind source PDF file
editor.BindPdf("c:/pdftest/25741-out.pdf")
'accept the absorber for the page
editor.Document.Pages(1).Accept(absorber)

Dim dashArray() As Integer = {}
Dim LEArray() As String = {}
Dim blue As System.Drawing.Color = System.Drawing.Color.Blue

'loop through the fragments
For Each textFragment As Aspose.Pdf.Text.TextFragment In absorber.TextFragments

    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue
    Dim rect As System.Drawing.Rectangle = New System.Drawing.Rectangle(CType(textFragment.Rectangle.LLX, Integer),
        CType(Math.Round(textFragment.Rectangle.LLY), Integer), CType(Math.Round(textFragment.Rectangle.Width + 2), Integer),
        CType(Math.Round(textFragment.Rectangle.Height + 1), Integer))
    Dim actionName() As System.Enum = New System.Enum() {Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_AttachFile, Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_ExtractPages}
    editor.CreateWebLink(rect, "http://www.aspose.com", 1, blue, actionName)
    editor.CreateLine(rect, "", CType(textFragment.Rectangle.LLX + 1, Single), CType(textFragment.Rectangle.LLY - 1, Single),
        CType(textFragment.Rectangle.URX, Single), CType(textFragment.Rectangle.LLY - 1, Single), 1, 1, blue, "S", dashArray, LEArray)
Next

'Save & Close the document
editor.Save("c:/pdftest/TextReplaced_with_Links.pdf")
editor.Close()
