How do I fix the memory spike and get my C# Windows Form Application to run as required.

Currently creating a C# Windows Form Application.  The purpose of this is to get all pdf files from a directory that have more than 2 pages and contain the string"assembly". These two variables are default settings in controls on the form. The page count comes from a numerical up and down control named "nudNOP" and the string "assembly" is the default text from the textbox control named "txtSearch". I use an iTextSharp addon to read through the pdf files and find the string "assembly" in the bool funtion shown here:

public bool GetTextFromPDF(String PdfFileName)
        {
            PdfReader oReader = new PdfReader(PdfFileName);
            int _nop = oReader.NumberOfPages;
            string sstr = txtSearch.Text;

            for (int page = 1; page <= _nop; page++)
            {
                ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
                sOut = PdfTextExtractor.GetTextFromPage(oReader, page, its).ToString();

                if (sOut.Contains(sstr) == true)
                {
                    isFound = true;
                }
                else
                {
                    isFound = false;
                }
            }
            return isFound;
        }

Open in new window



To execute the code I use a button to run the following sub

public void TheData()
        {
            //Declare Directory Path for Search Tool
            sDir = Properties.Settings.Default.sDir;
            dDir = Properties.Settings.Default.dDir;

            //String Array of all Files found in root Directory Path
            var pdfFiles = Directory.GetFiles(sDir, "*.pdf", SearchOption.TopDirectoryOnly);

            //Loop through each of the found files
            foreach (var pdfFile in pdfFiles)
            {
                //Variables
                string fName = pdfFile.ToString();
                string trimmed = System.IO.Path.GetFileName(fName);
                PdfReader pr = new PdfReader(fName);
                int nop = pr.NumberOfPages;

                //for each found file, get any file that has a greater value than numeric 
                //up and down control value and a string value from the txtSearch TextBox
                int pdfnop = Convert.ToInt32(nudNOP.Value);
                if (!(nop <= pdfnop))
                {
                    if (GetTextFromPDF(fName) == true)
                    {
                        MessageBox.Show(fName.ToString().ToUpper());
                    }
                }
            }
        }

Open in new window


Before I added the iTextSharp addon, and the second if statement the Messagebox would popup as it should finding only pdf files that had 3 or more pages. So I know this part of the sub works. It has something to do with the iTextSharp ITextExtractionStrategy. Just don't know enough about this addon to determine my error. Can anyone in this forum help with this? The Problem I'm having is that when I run the code the memory usage ramps up and stays ramped for well over 6 minutes until I stop it. I use MS VS2017 Community, it has a memory usage snapshot tool which I used and this is the result. I sorted it by Byte Size Descending. When the memory is ramped up the message box doesn't show the app window just sits there and nothing happens other than the memory usage ramping.

Screenshot of memory usage snapshot result sorted by Byte Size Descending
***Update***
It has been over 30 minutes and still no change in situation. Must be stuck in an infinate loop.
LVL 1
Steve WilliamsProduct Design EngineerAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Chinmay PatelChief Technical NinjaCommented:
Hi Steve,

Can you tweak your code as shown below, I have not tested the code but want to check couple of things.

If you have too many PDF files or they are huge and you do not dispose itextsharp objects then they will create a mess in the memory.

 {
            // Check if we have a Path available or not / maybe add validations here
            //Declare Directory Path for Search Tool
            //string sDir = Properties.Settings.Default.sDir;
            // dDir = Properties.Settings.Default.dDir;

            string sDir = string.Empty;
            try
            {
                // Get the Directory from Settings
                DirectoryInfo directory = new DirectoryInfo(sDir);

                FileInfo[] files = directory.GetFiles("*.pdf", SearchOption.TopDirectoryOnly);

                foreach (FileInfo pdfFile in files)
                {
                    using (PdfReader reader = new PdfReader(pdfFile.FullName))
                    {
                        using (PdfDocument document = new PdfDocument(reader))
                        {
                            if (document.GetNumberOfPages() < 2)// I have hard coded it here, you should get it from your dropdown
                            {
                                return;
                            }
                            for (int i = 0; i < document.GetNumberOfPages(); i++)
                            {
                                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                                PdfPage page = document.GetPage(i + 1);

                                string pageContent = PdfTextExtractor.GetTextFromPage(page);

                                if (pageContent.Contains("assembly")) // You would want this from your textbox
                                {
                                    // Do your stuff here
                                }

                            }

                        }

                    }
                }
            }
            catch (Exception ex)
            {
                ShowException(ex);
            }
        }

Open in new window



Regards,
Chinmay.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Steve WilliamsProduct Design EngineerAuthor Commented:
@Chinmay Patel Thank you for taking the time to look at my problem. I really appreciate it. Below is a screenshot of most of your code in action. There are a few errors.
Screenshot of TheData()
I don't understand the reason for:
PdfPage page = document.GetPage(i + 1);

Open in new window

looks like your trying to increment to the next page. Doesn't the for loop do this with the "i++"
TheData_broken.PNG
0
Steve WilliamsProduct Design EngineerAuthor Commented:
Thanks for the help, I used Garbage Collection which made all the difference. there are about 6600 files in the directory. The app runs for a very long time and finds a corrupted pdf which I have to fix before I can proceed and then I have to clear all my found data and restart the app.  But I don't think it gets any better than this. So Thanks and you get the points.
0
Cloud Class® Course: Microsoft Office 2010

This course will introduce you to the interfaces and features of Microsoft Office 2010 Word, Excel, PowerPoint, Outlook, and Access. You will learn about the features that are shared between all products in the Office suite, as well as the new features that are product specific.

Chinmay PatelChief Technical NinjaCommented:
Great that your issue was resolved. Keep in mind that when you use Using () {} Code block it is automatically garbage collected once it has completed execution so you don't have to do manual garbage collection.

Also I was wondering why you have to restart the entire process? is your data cumulative? couldn't you mark the PDF as corrupted and proceed forward with other files?
0
Steve WilliamsProduct Design EngineerAuthor Commented:
Chinmay, yes eventually I will add the bypass into the application with error logging so I can go back and correct all the files in bulk instead of at each error.
0
Steve WilliamsProduct Design EngineerAuthor Commented:
UPDATE: Here is the working Code:

public void TheData()
        {
            sDir = Properties.Settings.Default.sDir;
            dDir = Properties.Settings.Default.dDir;
            lDir = Properties.Settings.Default.lDir;
            if (!(Directory.Exists(sDir) == true))
            {
                Directory.CreateDirectory(sDir);
            }

            DirectoryInfo dir = new DirectoryInfo(sDir);
            FileInfo[] pdfFiles = dir.GetFiles("*.pdf", SearchOption.TopDirectoryOnly);
            int nop = Convert.ToInt32(nudNOP.Value);

            foreach (FileInfo pdfFile in pdfFiles)
            {
                using (PdfReader reader = new PdfReader(pdfFile.FullName))
                {
                    string fFile = Convert.ToString(pdfFile.Name);
                    lblName.Text = fFile;

                    using (StringWriter output = new StringWriter())
                    {
                        Int32 rnop = reader.NumberOfPages;
                        string src = System.IO.Path.Combine(sDir, fFile);
                        string dst = System.IO.Path.Combine(dDir, fFile);

                        //check to see if file has been processed by reading the log file, if it has skip it and process next file
                        string fnlog = System.IO.Path.Combine(lDir, "filename_processed_log.txt");
                        using (TextReader tr = new StreamReader(fnlog))
                        {
                            string txtContent = tr.ReadToEnd();

                            if (txtContent.Contains(fFile) == true)
                            {
                                tr.Close();
                                continue;
                            }
                            tr.Close();

                            //append processed file name to log
                            File.AppendAllText(fnlog, fFile + Environment.NewLine);

                            if ((File.Exists(dst) == true))
                            {
                                continue;
                            }

                            if (rnop <= nop)
                            {
                                continue;
                            }
                            WaitNSeconds(2);
                            for (int i = 1; i <= reader.NumberOfPages; i++)
                                try
                                {
                                    output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));
                                    string _output = output.ToString();
                                    string sstr = txtSearch.Text;

                                    if (_output.Contains(sstr, StringComparison.OrdinalIgnoreCase) == true)
                                    {
                                        if (!(File.Exists(dst) == true))
                                        {
                                            File.Copy(src, dst);
                                            continue;
                                        }
                                        else
                                        {
                                            continue;
                                        }
                                    }
                                    int x = 0;
                                }
                                catch (Exception ex)
                                {
                                    string exstr = ex.ToString();
                                    string theMessage = fFile + "|||" + exstr + Environment.NewLine;
                                    Log.Message(theMessage);
                                }
                            continue;
                        }
                    }
                }
            }
            MessageBox.Show("All Done");
        }
        private void bthStartSearch_Click(object sender, EventArgs e)
        {
            Cursor.Current = Cursors.WaitCursor;
            TheData();
            Cursor.Current = Cursors.Default;
        }
    }

Open in new window

1
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
.NET Programming

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.