Rick Becker asked:
How to edit/modify text in PDF document with ITextSharp

Greetings;

I have a multi-page PDF file that is a string of Shipping Labels. The labels have Line Items that reflect the items purchased by our customers. Each line item contains our 'ItemSKU', 'ItemDescription' and 'ItemQuantity'.

The ItemSKU is a 'Parent SKU', which I need to replace with a substitute SKU if a given product is out of stock. The labels are generated by a separate process that produces a single PDF file with multiple labels/pages. I process this PDF file with a completely different process, which determines whether or not the 'Parent SKU' needs to be replaced with a substitute SKU.

I have gone through many iterations but have not been successful in creating a working program. The following code is what I tried last, and it seems to hold much promise, but it fails after the first page/label has been processed. I believe that a better set of eyes might be able to spot my mistakes. Any and all help is appreciated. Thanks in advance,

Public Function MyReplacePDFText() As Integer

        Dim psStamp As PdfStamper = Nothing 'PDF Stamper Object
        Dim pcbContent As PdfContentByte = Nothing 'Read PDF Content

        Dim strSource As String
        Dim strDest As String

        Dim strSearch As String
        Dim scCase As StringComparison

        scCase = StringComparison.OrdinalIgnoreCase

        strSource = MultiQuantManagerForm.txtPDFFileName.Text
        strDest = TempDataStorageFolder + "\TestTextReplacement.pdf"


        If File.Exists(strSource) Then 'Check If File Exists

            Dim pdfFileReader As New PdfReader(strSource) 'Read Our File

            psStamp = New PdfStamper(pdfFileReader, New FileStream(strDest, FileMode.Create)) 'Read Underlying Content of PDF File


            For intCurrPage As Integer = 1 To pdfFileReader.NumberOfPages 'Loop Through All Pages

                Dim lteStrategy As LocTextExtraction.LocTextExtractionStrategy = New LocTextExtraction.LocTextExtractionStrategy 'Read PDF File Content Blocks

                pcbContent = psStamp.GetUnderContent(intCurrPage) 'Look At Current Block

                'Determine Spacing of Block To See If It Matches Our Search String
                lteStrategy.UndercontentCharacterSpacing = pcbContent.CharacterSpacing
                lteStrategy.UndercontentHorizontalScaling = pcbContent.HorizontalScaling

                'Trigger The Block Reading Process
                Dim currentText As String = PdfTextExtractor.GetTextFromPage(pdfFileReader, intCurrPage, lteStrategy)

                '#############################################################################
                ' RRB Added Code:
                ' Use my previous code to extract the ItemSKU from this PDF page. Replace
                ' That sku with a dummy sku for now.. More code needed to find the Correct SKU
                '################################################################################
                'CustomerSKU
                strSearch = ""
                strSearch = ExtractThisFieldFromPDF(MultiQuantManagerForm.txtPDFFileName.Text, intCurrPage, "ItemSKU")
                '################################################################################

                'Determine Match(es)
                Dim lstMatches As List(Of iTextSharp.text.Rectangle) = lteStrategy.GetTextLocations(strSearch, scCase)

                Dim pdLayer As PdfLayer 'Create New Layer
                pdLayer = New PdfLayer("Overrite", psStamp.Writer) 'Enable Overwriting Capabilities

                'Set Fill Colour Of Replacing Layer
                pcbContent.SetColorFill(iTextSharp.text.BaseColor.BLACK)

                Dim list = DirectCast(lstMatches, IList)

                For Each rctRect As iTextSharp.text.Rectangle In lstMatches 'Loop Through Each Match

                    pcbContent.Rectangle(rctRect.Left, rctRect.Bottom, rctRect.Width, rctRect.Height) 'Create New Rectangle For Replacing Layer

                    pcbContent.Fill() 'Fill With Colour Specified

                    pcbContent.BeginLayer(pdLayer) 'Create Layer

                    pcbContent.SetColorFill(iTextSharp.text.BaseColor.BLACK) 'Fill Layer

                    pcbContent.Fill() 'Fill Underlying Content

                    Dim pgState As PdfGState 'Create GState Object
                    pgState = New PdfGState()

                    pcbContent.SetGState(pgState) 'Set Current State

                    pcbContent.SetColorFill(iTextSharp.text.BaseColor.WHITE) 'Fill Letters

                    pcbContent.BeginText() 'Start Text Replace Procedure

                    pcbContent.SetTextMatrix(rctRect.Left, rctRect.Bottom) 'Get Text Location

                    'Set New Font And Size
                    pcbContent.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 9)

                    pcbContent.ShowText("AMAZING!!!!") 'Replacing Text

                    pcbContent.EndText() 'Stop Text Replace Procedure

                    pcbContent.EndLayer() 'Stop Layer replace Procedure


                Next
                pdfFileReader.Close() 'Close File
            Next

            psStamp.Close() 'Close Stamp Object

        End If

    End Function


Scott Fell:

I have done something similar in C#.

It is a little confusing. Let me see if I can get it straight. You said, "it fails after the First page/label has been processed." What is failing? Are you getting any type of error message, or is the label just not progressing?

You also said, "The Labels are generated via a separate process where a PDF file with multiple labels/pages"  Is the code you posted the separate process?
Rick Becker (asker):

Hi Sir... How are you? I hope you are doing well.

Let me clarify a couple of things first so that you don't get the wrong impression and think that I wrote all that code. I pulled about 90% from different postings and tried about a 'Million' of them before 'Giving Up'.

Anyway this piece 'Looked' to me like it would do what I wanted so I decided to post it here and get some help getting it to work...

So that being said: I use a PDF label file that has 7 labels in it (I am also using Visual Studio 2017). When running it in Debug mode I add several breakpoints and monitor the text buffer at several points. As it progresses through the first loop it 'Appears' to be performing the desired 'Replace' function; however, at the beginning of the second loop it fails with the error 'cannot continue with a closed pdfFileReader' (not the exact error, but close). It looks like the 'reader' is being closed well in advance of the end of the loop...

Hope this helps a bit, I look forward to working with you ...

Rick
BTW, I am not 'Stuck' on this piece of code... I just want/Need something that works
Something is not right with your sales process that generates the pdf. It should put in the substitute SKU, or suggest it to the order taker so they can ask the customer whether an equivalent product is acceptable. Why you are printing from a pdf also amazes me.

We have a pdf with 7 labels, and any of them might need text replaced: anywhere from 1 item up to a maximum of 7 items.
How I would code it is as follows:
open the pdf and collect the items into an array
keep another array with the text to be replaced and the replacement text

for item = 1 to 7
   if replaced(item) is not null or an empty string
      do the replace
   endif
next
transform the objects back to pdf, save the pdf
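For reference, the decision step of that outline maps fairly directly onto VB.NET. The sketch below only decides which SKU should be printed for each item; `ResolveSkus` and the parent-to-substitute dictionary are hypothetical names, not part of Rick's code, and reading/writing the PDF is deliberately left out.

```vbnet
Imports System
Imports System.Collections.Generic

Module SkuReplacementSketch
    ' Hypothetical helper (illustration only): given the SKUs found on the labels
    ' and a map of parent SKU -> substitute SKU, return the SKU to print per item.
    ' Items with no entry (or an empty entry) in the map keep their original SKU.
    Function ResolveSkus(foundSkus As IList(Of String),
                         substitutes As IDictionary(Of String, String)) As List(Of String)
        Dim result As New List(Of String)()
        For Each sku As String In foundSkus
            Dim replacement As String = Nothing
            If substitutes.TryGetValue(sku, replacement) AndAlso Not String.IsNullOrEmpty(replacement) Then
                result.Add(replacement)   ' "do the replace"
            Else
                result.Add(sku)           ' leave the original SKU alone
            End If
        Next
        Return result
    End Function
End Module
```

Keeping the lookup separate from the PDF code means it can be checked on its own before any drawing is attempted.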
Hi sir... Well, it is not as simple as I alluded to. Suffice it to say that there are many steps in the process that lead to this one; modifying the SKU at the last possible moment before shipping is the route we chose to take. But thanks for your concern and observation....

Now I must embarrass myself. I know very little about how to work with PDF and PDF documents. I am still learning, and I know that it is not easy to write syntactically correct code to manipulate PDF documents. I do OK with .txt files, but I am a complete novice when it comes to .pdf file generation/manipulation.

The logic in your outline is 100% correct; the problem is I don't know how to get there... Can you detect any syntax errors in the code I presented??

Rick
What you have in your code is not a syntax error but a logic error. It would appear that you are closing the pdf before it has gone through each and every label; try moving your close PDF outside of that loop (the pdfFileReader.Close() line currently sits inside the page loop).
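A minimal sketch of the loop structure this implies, assuming iTextSharp 5.x: the reader is opened once, every page is visited, and only then are the stamper and reader closed. `StampAllPages` is a hypothetical name and the per-page find/replace work is left as a comment. The sketch also uses `GetOverContent` rather than the question's `GetUnderContent`: anything drawn on the under-content layer sits beneath the existing page content, so a box drawn there cannot hide the original SKU.

```vbnet
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module StampAllPagesSketch
    ' Hypothetical helper (illustration only): shows where Close() belongs.
    Sub StampAllPages(sourcePath As String, destPath As String)
        Dim reader As New PdfReader(sourcePath)
        Using output As New FileStream(destPath, FileMode.Create)
            Dim stamper As New PdfStamper(reader, output)
            For pageNum As Integer = 1 To reader.NumberOfPages
                ' Over-content canvas for this page: the cover box and new SKU text would be drawn here.
                Dim canvas As PdfContentByte = stamper.GetOverContent(pageNum)
                ' ... find the SKU on this page and draw its replacement on canvas ...
                ' Do NOT call reader.Close() here: the next iteration still needs the reader.
            Next
            stamper.Close()   ' writes the output file; must come after the page loop
        End Using
        reader.Close()        ' released only after every page has been processed
    End Sub
End Module
```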
I pulled about 90% from different postings and tried about a 'Million' of them before 'Giving Up'.

This makes sense and I have been there before. The more frustrated I get trying something, the faster I keep making changes, the faster I get errors, and the circle goes on. I seem to get to this point when I have an idea and start coding right away, thinking I have what I want to do in my head.

The best thing to do now is to sketch out what you want to do on a napkin, then translate that to a more formal flow chart that you can share along with some bullet-point explanations. Sometimes just doing that will help to get your logic and coding in order. Next, start translating that to some pseudo code, or a mix of VB and pseudo code where you are not sure of the syntax. I think David has somewhat given you a start here https://www.experts-exchange.com/questions/29177414/How-to-edit-modify-text-in-PDF-document-with-ITextSharp.html?anchorAnswerId=43057973#a43057973.

With all that said, I think your next step is to provide a flowchart, some bullet points and expected output and as much code as you can on your own and note one area at a time where you are stuck. For now, it may just be working out the logic/program flow.
Hi sir..

I already tried that ...  I wanted to include the entire function in its original form without all of my 'Hacking' so that someone might be able to look at it and say ... "OH you are doing THAT and you should be doing THIS..."

I was hoping that someone would review the code and offer specific suggestions....

I have another piece of code that also shows promise, but the problem with it is that it does not dig out the specific XY location of the text that I want to change. The code I presented does that and creates an Image box to hide the original text before adding the new text to the output. This other piece of code does not do that, and the XY locations are 'Hard Coded' in..
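For locating text without hard-coded coordinates, iTextSharp 5.x lets you plug your own `IRenderListener` into `PdfReaderContentParser`. The sketch below is a hypothetical `TextLocator` class (not part of iTextSharp or of Rick's code) that records the box of any text chunk containing the search string. It assumes unrotated text and that the SKU is drawn by a single text-showing operation; it returns the box of the whole chunk, so a PDF that draws the SKU and description in one chunk would need extra trimming.

```vbnet
Imports System
Imports System.Collections.Generic
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

' Hypothetical listener (illustration only): collects the bounding box of every
' text chunk whose text contains the search string.
Public Class TextLocator
    Implements IRenderListener

    Private ReadOnly _search As String
    Public ReadOnly Matches As New List(Of iTextSharp.text.Rectangle)()

    Public Sub New(search As String)
        _search = search
    End Sub

    Public Sub RenderText(info As TextRenderInfo) Implements IRenderListener.RenderText
        If info.GetText().IndexOf(_search, StringComparison.OrdinalIgnoreCase) < 0 Then Return
        ' For unrotated text: descent-line start = lower-left, ascent-line end = upper-right
        Dim lowerLeft As Vector = info.GetDescentLine().GetStartPoint()
        Dim upperRight As Vector = info.GetAscentLine().GetEndPoint()
        Matches.Add(New iTextSharp.text.Rectangle(lowerLeft(Vector.I1), lowerLeft(Vector.I2),
                                                  upperRight(Vector.I1), upperRight(Vector.I2)))
    End Sub

    ' The remaining callbacks are not needed for this purpose.
    Public Sub BeginTextBlock() Implements IRenderListener.BeginTextBlock
    End Sub

    Public Sub EndTextBlock() Implements IRenderListener.EndTextBlock
    End Sub

    Public Sub RenderImage(info As ImageRenderInfo) Implements IRenderListener.RenderImage
    End Sub
End Class
```

Typical use would be `Dim parser As New PdfReaderContentParser(reader)` followed by `Dim locator = parser.ProcessContent(pageNum, New TextLocator(sku))`, then reading `locator.Matches` for that page.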

Rick
Rick,

What I am hearing from you are multiple issues.  

"The following Code is what i tried last and seem to hold much promise but it fails after the First page/label has been processed."  

To me this seems like everything is ok except the subsequent pages are failing. You later said, "Replace' function however at the beginning of the second loop it fails with the error 'cannot continue with a closed pdfFileReader' (Not exact error but close').." and David I think provided you with the fix, " try moving your close PDF outside of that loop"

In addition, the code you showed in your question is not the same as your 'hacked' code.

Now you are saying you have another piece of code, but it does not dig out the specific locations.

Can you see that this is getting confusing? It seems like there are now three different pieces of code that all do something different. And with the error you have mentioned, it is not clear whether it comes from the code in your example or from the hacked code. It is really best to work on just one issue at a time and provide a working test case that is free of any unnecessary code if possible, and the code you present should be the code that generated the error. Otherwise it will be hard for the experts to assist.

Going back to what you originally posted, I do think David has your answer. Give that a try and see what happens.  https://www.experts-exchange.com/questions/29177414/How-to-edit-modify-text-in-PDF-document-with-ITextSharp.html?anchorAnswerId=43058033#a43058033
OK 1 at a time...

(1) "David I think provided you with the fix, 'try moving your close PDF outside of that loop'"

my response:  "I already tried that ... "

(2) "In addition, the code you showed in your question is not the same as your 'hacked' code."

Answer: "I wanted to include the entire function in its original form without all of my 'Hacking' so that someone might be able to look at it and say ... "OH you are doing THAT and you should be doing THIS..."

(3) "Now you are saying you have another piece of code but does not dig out the specific locations. "

From Original Post:  "I have gone through many iterations but have not been successful in creating a working program."


I did not just grab a piece of code and throw it out to EE. I spent many days working on many varied solutions... It was only after I was unable to get a working solution that I posted to EE. The code I posted is in its original form, which I was hoping someone would scrutinize, as it had a feature that keeps me from having to hard code each text location....

I will pursue other avenues...
I understand. I'm sorry if what I said came off the wrong way. I am Just trying to establish something specific to work on for you and show what I am interpreting.

As an example, "I already tried that" is not very specific, because you are not showing the exact code that was tried and you already said the code you posted is not the same as your actual version. This makes troubleshooting difficult.
ASKER CERTIFIED SOLUTION (Rick Becker)
"As example, "I already tried that" is not very specific because you are not showing the exact code that was tried and you already said the code you posted is not the same as your actual version. This makes trouble shooting difficult."

Yes Scott, but I also stated that the code I posted did not work correctly either, let alone anything that I hacked... Again, I was hoping someone would look at what I posted and say something like "Hey You Dummy... you did THAT and not THIS..."

And YES, I am NOT the Sharpest Tack in the Box... but by Brute Strength I can get things done. I would just like to do better sometimes and not act like a bull in a china shop... but Oh Well...

Rick
Hi Rick,

I know I am late to the party but if you want me to have a look at your codebase, I can try to steal some time this weekend and check it out via a remote session (of course if that is OK with you.).

Let me know.

Regards,
Chinmay.
Scott, how do I send replies to you and Chinmay since this is now closed??
Uh OH... I guess I had to send that last comment.... Silly Me..

Hi Chinmay... Thanks for the offer of working with me this weekend, but unfortunately I will not have very much time on Saturday. I may have some time on Sunday, but I would not expect anyone to 'Work' on Sunday... regardless of your beliefs, it should be a day of rest with no commitments... So if we can get together sometime Sunday, that is great; if not, then sometime next week is just fine..

Rick...
Hi Rick,

No worries. EE is not Work for me ;) so technically it is fine. I am in the IST timezone, so we will have to align time zones a bit. Please give me a couple of time slots.

Regards,
Chinmay.
Hi Chinmay...

Ok we are almost 12 hours apart.. it is already Midnight your time..  so to make it easy on you maybe we could try about 7 o'clock... That would put it PM your time and AM my time tomorrow... does that sound reasonable??

BTW how are you and everyone around you doing with this Pandemic?? I hope that you ALL are doing well...

Rick
Yupp. 7 PM my time works for me.
Things are chaotic, but slowly getting back on track.
OK Sir.. chat more later..

Rick
Hi Chinmay... So how do we proceed??
BTW, I use Skype and a free collaboration package called UltraViewer; it is a TeamViewer look-alike..
Will connect on Skype initially. When would you like to start?
My skype handle is genolizer
I am ready most anytime now... just call when you are ready
Hi All...
I am adding this to (1) thank Chinmay for his assistance and (2) post the final solution, just in case someone runs across this and needs to do something similar. My requirement was to be able to overwrite some SKU text on an existing PDF shipping label. All earlier attempts had various problems, and nothing came close to being a 'Reasonable' solution. This solution, while NOT being a 100% 'Clean' solution, at least does a very reasonable job of providing a 'Good' shipping label with all the desired 'Changes'.

The 'Answer' was to add a PDF 'Text Box/Field' over the text that I wished to replace. This resulted in the original text being BLOCKED OUT, allowing for the NEW TEXT to be displayed with the original 'Bleeding' through...
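For comparison, the same "cover it up" idea can be done without a form field by drawing directly on the page's over-content, which is roughly what the original question code was attempting. This is an alternative sketch under stated assumptions (iTextSharp 5.x, Helvetica Bold at 9 pt, a 2 pt baseline offset chosen by eye), not the method used in the final function below; `CoverAndWrite` is a hypothetical name. Like the text-field approach, it only hides the old SKU visually: the original text is still in the page content and can still be selected or extracted.

```vbnet
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module CoverAndStampSketch
    ' Hypothetical helper (illustration only): paint an opaque white box over
    ' "area" on the given page, then write newText on top of it.
    Sub CoverAndWrite(stamper As PdfStamper, pageNum As Integer,
                      area As iTextSharp.text.Rectangle, newText As String)
        Dim canvas As PdfContentByte = stamper.GetOverContent(pageNum)

        ' 1. White box over the old text (SaveState/RestoreState keep the fill colour local)
        canvas.SaveState()
        canvas.SetColorFill(BaseColor.WHITE)
        canvas.Rectangle(area.Left, area.Bottom, area.Width, area.Height)
        canvas.Fill()
        canvas.RestoreState()

        ' 2. Replacement text; the +2 baseline offset is an assumption to tune per label
        Dim bf As BaseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED)
        canvas.SetColorFill(BaseColor.BLACK)
        canvas.BeginText()
        canvas.SetFontAndSize(bf, 9)
        canvas.SetTextMatrix(area.Left, area.Bottom + 2)
        canvas.ShowText(newText)
        canvas.EndText()
    End Sub
End Module
```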

The Following is the Full Function 'Laid Bare' for All to see and  make use of.. Again.. Thanks Chinmay.

Public Function NewNewBlockOutMultiSKU() As Integer
        'Path to where you want the file to output
        Dim outputFilePath As String
        'Path to where the pdf you want to modify is
        Dim inputFilePath As String

        inputFilePath = MultiQuantManagerForm.txtPDFFileName.Text.ToString
        outputFilePath = Path.Combine(Path.GetDirectoryName(inputFilePath), Path.GetFileNameWithoutExtension(inputFilePath) & "_IMB.pdf")

        MultiQuantManagerForm.Cursor = Cursors.WaitCursor
        Try

            Using inputPdfStream As Stream = New FileStream(inputFilePath, FileMode.Open, FileAccess.Read, FileShare.Read)

                Using outputPdfStream As Stream = New FileStream(outputFilePath, FileMode.Create, FileAccess.Write, FileShare.ReadWrite)

                    Using outputPdfStream2 As Stream = New FileStream(outputFilePath, FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite)

                        Dim NumPages As Integer

                        'Opens the unmodified PDF for reading
                        Dim reader = New PdfReader(inputPdfStream)
                        NumPages = reader.NumberOfPages

                        'Creates a stamper to put an image on the original pdf
                        Dim stamper = New PdfStamper(reader, outputPdfStream) With {
                                            .FormFlattening = True,
                                            .FreeTextFlattening = True
                                        }

                        'stamper.AcroFields.GenerateAppearances = True

                        'Creates an image that is the size I need to hide the text I'm interested in removing
                        'Dim image As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(New Bitmap(90, 14), iTextSharp.text.BaseColor.WHITE) 'single sku label
                        'Dim image As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(New Bitmap(52, 10), iTextSharp.text.BaseColor.RED) ' multi sku label
                        Dim image As iTextSharp.text.Image
                        'Sets the position that the image needs to be placed (ie the location of the text to be removed)
                        'image.SetAbsolutePosition(9, 250) Single Sku Label
                        'image.SetAbsolutePosition(8, 272) ' multiple sku labels

                        Dim CurrentSKU As String
                        Dim RealItemSKU As String
                        Dim strText As String
                        Dim lines() As String

                        Dim StartPositionIndex As Integer
                        Dim PageNum As Integer
                        Dim RtnVal As Integer
                        Dim intI As Integer

                        Dim XPosition As Double
                        Dim YPosition As Double
                        Dim SingleOffset As Double
                        Dim DoubleOffset As Double
                        Dim ThisOffset As Double

                        Dim TitleSpanCount As Integer
                        Dim SkuXYPosition As String
                        Dim SkuXYRectangleArray() As String

                        Dim AbsoluteXPosition As Integer
                        Dim AbsoluteYPosition As Integer
                        Dim AbsoluteWidth As Integer
                        Dim AbsoluteHeight As Integer

                        Dim NextBoxNum As Integer


                        Dim its As ITextExtractionStrategy
                        'Dim its As ITextExtractionStrategy = New iTextSharp.text.pdf.parser.LocationTextExtractionStrategy()

                        MultiQuantManagerForm.ProgressBar2.Value = 0
                        MultiQuantManagerForm.ProgressBar2.Maximum = NumPages
                        MultiQuantManagerForm.lblCurrentPage.Text = 0
                        MultiQuantManagerForm.lblMaxItems.Text = NumPages
                        MultiQuantManagerForm.lblMaxItems.Refresh()

                        NextBoxNum = 0
                        For PageNum = 1 To NumPages

                            Dim pdfPageContents As PdfContentByte = stamper.GetUnderContent(PageNum)

                            MultiQuantManagerForm.ProgressBar2.Value = PageNum
                            MultiQuantManagerForm.lblCurrentPage.Text = PageNum
                            MultiQuantManagerForm.lblCurrentPage.Refresh()

                            StartPositionIndex = -1
                            its = New iTextSharp.text.pdf.parser.LocationTextExtractionStrategy()

                            strText = ""
                            strText = PdfTextExtractor.GetTextFromPage(reader, PageNum, its)
                            strText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default], Encoding.UTF8, Encoding.[Default].GetBytes(strText)))
                            lines = strText.Split(vbLf)

                            ' OK, I need to determine the starting location of the first
                            ' SKU on each page. Once I know the starting point I can
                            ' loop through the remainder of the page and either replace the
                            ' current SKU or just leave it alone
                            'if startindex = 11; if startindex = 2 if startindex = 1

                            'I need to compensate for a blank line
                            If strText = "" Then Continue For

                            For intI = 0 To lines.Length - 1
                                NextBoxNum = NextBoxNum + 1
                                'do the SKU lookup and replacement here if possible

                                If Regex.IsMatch(lines(intI), "^[0-9][0-9][0-9]" + Chr(173) + "[0-9][0-9][0-9][0-9]") Then

                                    CurrentSKU = lines(intI).Substring(0, 9)


                                    'get the xy location
                                    'Now get its XY Location
                                    'Get the XY location of the sku just found
                                    SkuXYPosition = ""
                                    SkuXYPosition = ExtractXYLocationOfSingleSKUPDF(inputFilePath, PageNum, CurrentSKU)

                                    If Not SkuXYPosition.ToString = "" Then

                                        SkuXYRectangleArray = SkuXYPosition.Split("|")

                                        'image = iTextSharp.text.Image.GetInstance(New Bitmap(52, 12), iTextSharp.text.BaseColor.RED)
                                        image = iTextSharp.text.Image.GetInstance(New Bitmap(CInt(SkuXYRectangleArray(2)), CInt(SkuXYRectangleArray(3))), iTextSharp.text.BaseColor.WHITE)

                                        'image.SetAbsolutePosition(XPosition, YPosition)

                                        image.SetAbsolutePosition(CInt(SkuXYRectangleArray(0)), CInt(SkuXYRectangleArray(1)))
                                        stamper.GetOverContent(PageNum).AddImage(image, True)

                                        '#####################################################################
                                        ' OK TextBox Code works.. Now look for a Parent SKU and get a replacement
                                        ' SKU if available... Before Adding the Text Box with Text
                                        '#####################################################################
                                        CurrentSKU = CurrentSKU.Replace(Chr(173), "-")
                                        RealItemSKU = ""
                                        RealItemSKU = MyUserSelectStatement("vainventorytable", "customlabel", "where parentsku = '" + CurrentSKU + "'", True)

                                        ''comment out the following code until we go into production
                                        'If Not RealItemSKU = "" Then
                                        'RtnVal = MySQLCopyFromToTable("vainventorytable", "vacompletedorders", "customlabel", RealItemSKU.ToString)
                                        'RtnVal = MyDeleteFromStatement("vainventorytable", "customlabel", RealItemSKU.ToString)
                                        'Else
                                        'RtnVal = MySQLCopyFromToTable("vainventorytable", "vacompletedorders", "customlabel", CurrentSKU.ToString)
                                        'RtnVal = MyDeleteFromStatement("vainventorytable", "customlabel", CurrentSKU.ToString)
                                        'End If

                                        If Not RealItemSKU = "" Then CurrentSKU = RealItemSKU

                                        '#####################################################################
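                                        ' iTextSharp's Rectangle takes corner coordinates:
                                        ' New Rectangle(llx, lly, urx, ury) = (left X, bottom Y, right X, top Y).
                                        ' AbsoluteWidth (62) is therefore passed as the absolute right edge (urx),
                                        ' and AbsoluteHeight is the absolute top edge (Y + SKU height + 3).

                                        ' NOTE: If necessary, add offsets to the following values to tweak the position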
                                        AbsoluteXPosition = CInt(SkuXYRectangleArray(0)) - 1
                                        AbsoluteYPosition = CInt(SkuXYRectangleArray(1))
                                        AbsoluteWidth = 62
                                        AbsoluteHeight = CInt(SkuXYRectangleArray(1)) + CInt(SkuXYRectangleArray(3)) + 3

                                        Dim bf = BaseFont.CreateFont(BaseFont.TIMES_BOLD, BaseFont.CP1252, False)
                                        Dim tf = New TextField(stamper.Writer, New Rectangle(AbsoluteXPosition, AbsoluteYPosition, AbsoluteWidth, AbsoluteHeight), "nexTextField" + NextBoxNum.ToString) With {
                                            .Alignment = Element.ALIGN_JUSTIFIED_ALL,
                                            .BackgroundColor = Nothing,
                                            .BorderColor = BaseColor.WHITE,
                                            .BorderStyle = PdfBorderDictionary.STYLE_SOLID,
                                            .BorderWidth = 0,
                                            .DefaultText = CurrentSKU,
                                            .Font = bf,
                                            .TextColor = BaseColor.BLACK,
                                            .FontSize = 10,
                                            .MaxCharacterLength = 12,
                                            .Options = TextField.REQUIRED Or TextField.READ_ONLY,
                                            .Rotation = 0,
                                            .Text = CurrentSKU
                                        }

                                        stamper.AddAnnotation(tf.GetTextField(), PageNum)
                                        '######################################################################
                                        'Remove to here if it fails
                                        '#####################################################################

                                        image = Nothing
                                        bf = Nothing
                                        tf = Nothing

                                    End If
                                End If
                            Next

                            'clear the last iTextSharp.text.pdf.parser.LocationTextExtractionStrategy()
                            its = Nothing

                        Next
                        'Adds the image to the output pdf                   
                        'Creates the first copy of the outputted pdf
                        stamper.Close()

                        'Opens our outputted file for reading
                        Dim reader2 = New PdfReader(outputPdfStream2)

                        '' The following can be added to Encrypts the outputted PDF to make it not allow Copy or Pasting
                        'PdfEncryptor.Encrypt(reader2, outputPdfStream2, Nothing, Encoding.UTF8.GetBytes("test"), PdfWriter.ALLOW_PRINTING, True)

                        'Creates the outputted final file
                        reader2.Close()
                        reader.Close()
                    End Using

                End Using

            End Using

        Catch ex As Exception
            'MsgBox(ex.Message)
            MultiQuantManagerForm.Cursor = Cursors.Default
        End Try
        'Copy New File to the old file name
        'File.Copy(outputFilePath, inputFilePath, True)

        'Now display it
        MultiQuantManagerForm.WebBrowser2.Navigate(outputFilePath.ToString)
        MultiQuantManagerForm.txtPDFFileName.Text = outputFilePath.ToString

        MultiQuantManagerForm.Cursor = Cursors.Default
    End Function

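Two details in the function above are easy to trip over: the label text uses a soft hyphen (Chr(173), U+00AD) inside the SKU, and iTextSharp's `Rectangle` constructor takes corner coordinates `(llx, lly, urx, ury)` rather than a width and height. The helpers below isolate both so they can be checked without a PDF. The module and function names are hypothetical, and the `x|y|width|height` format is inferred from how the function splits the return value of `ExtractXYLocationOfSingleSKUPDF`, whose source is not shown.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module SkuLineHelpers
    ' Same test as the Regex in the function above: three digits, a soft hyphen
    ' (U+00AD), then four digits at the start of the extracted line.
    Private ReadOnly SkuPattern As New Regex("^[0-9]{3}\u00AD[0-9]{4}")

    ' Hypothetical helper: True if the extracted line starts with a SKU.
    Function StartsWithSku(line As String) As Boolean
        Return SkuPattern.IsMatch(line)
    End Function

    ' Hypothetical helper: soft hyphen -> "-" before the database lookup.
    Function NormalizeSku(sku As String) As String
        Return sku.Replace(ChrW(&HAD), "-"c)
    End Function

    ' Hypothetical helper: converts "x|y|width|height" (format assumed) into the
    ' corner values Rectangle expects: {llx, lly, urx, ury}.
    Function ToCorners(xywh As String) As Single()
        Dim parts() As String = xywh.Split("|"c)
        Dim x As Single = Single.Parse(parts(0), CultureInfo.InvariantCulture)
        Dim y As Single = Single.Parse(parts(1), CultureInfo.InvariantCulture)
        Dim w As Single = Single.Parse(parts(2), CultureInfo.InvariantCulture)
        Dim h As Single = Single.Parse(parts(3), CultureInfo.InvariantCulture)
        Return New Single() {x, y, x + w, y + h}
    End Function
End Module
```

With corners computed this way, the text field's right edge follows the SKU's measured width instead of the fixed absolute value 62 used above; which of the two lines up better on a given label layout would need to be checked against real output.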