  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 76

Copy html file with reference

Hi All,
I have a requirement to copy an html file into a specific folder if it contains a pdf file name.
For example...
in "Temp" folder
Temp->Html folder contains two txt files and multiple html files.
1.txt
     t1.pdf
     t2.pdf
Check for t1.pdf in the html files and copy each html file that contains it into the "download" folder.
Check for t2.pdf in the html files and copy each html file that contains it into the "download" folder.
.....

Can you please provide a reference or a sample Perl script to achieve this?
Thanks,
Shail

Asked by: Shailesh Shinde

SStory commented:
You can check whether the names appear in the files by doing:
     #!/bin/bash
     files = `grep -l t[1-2].pdf /Temp/Html/*.html`

This stores the names of all matching files in a bash variable. This isn't Perl, but it can easily be done in bash. Just take those files and copy them to the desired directory. Watch out for bash substitution.
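
For illustration, the copy step and the substitution caveat could be sketched as follows. This is only a sketch under assumptions: the scratch directories and the sample files a.html and b.html are made up for the example and stand in for /Temp/Html and the target folder. The pattern is quoted so the shell cannot glob-expand it, and the dot is escaped so it matches a literal ".".

```shell
#!/bin/bash
# Hypothetical layout: scratch directories stand in for /Temp/Html and the target folder.
src=$(mktemp -d)
dest=$(mktemp -d)
printf '<a href="t1.pdf">one</a>\n' > "$src/a.html"
printf '<p>no link here</p>\n'      > "$src/b.html"

# Quote the pattern so the shell does not treat [1-2] as a glob,
# and escape the dot so it matches "." rather than any character.
# grep -l prints only the names of the files that match.
grep -l 't[1-2]\.pdf' "$src"/*.html | while IFS= read -r file; do
    cp -- "$file" "$dest/"
done

ls "$dest"
```

Reading the names line by line keeps file names with spaces intact, which an unquoted variable full of names would not.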

tel2 commented:
Hi SStory,
I don't think it will work with spaces around the '='.
Instead of this:
    files = `grep -l t[1-2].pdf /Temp/Html/*.html`
I think it should be this:
    files=`grep -l t[1-2].pdf /Temp/Html/*.html`

Hi Shailesh,
What do you mean by "Temp->Html"?
Do you mean "Temp/Html" or what?

wilcoxon commented:
This should do what you want...
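#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);

my $dir  = shift or die "Usage:  $0 src_dir dest_dir\n";
my $dest = shift or die "Usage:  $0 src_dir dest_dir\n";

# Collect the plain files in the source directory.
opendir my $dh, $dir or die "could not open dir $dir: $!";
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

# Read the pdf names (one per line) from every .txt file.
my @pdfs;
foreach my $txt (grep /\.txt$/, @files) {
    open my $in, '<', "$dir/$txt" or die "could not open $dir/$txt: $!";
    while (my $line = <$in>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
        push @pdfs, $line if length $line;
    }
    close $in;
}
die "no pdf names found in the .txt files in $dir\n" unless @pdfs;

# Build one alternation; quotemeta makes "." match a literal dot.
my $pdfrx = join '|', map { quotemeta($_) } @pdfs;

# Copy each html file that mentions any of the pdf names.
foreach my $html (grep /\.html$/, @files) {
    open my $in, '<', "$dir/$html" or die "could not open $dir/$html: $!";
    my $found = grep { /\b(?:$pdfrx)\b/ } <$in>;
    close $in;
    next unless $found;
    copy("$dir/$html", "$dest/$html") or die "could not copy $dir/$html: $!";
}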

SStory commented:
tel2,

Look as I might, I don't see any quotes around my =. I do have backticks around the command to be executed. I also don't see any real difference between my line as you showed it and yours:
    files = `grep -l t[1-2].pdf /Temp/Html/*.html`

"I think it should be this:"
    files=`grep -l t[1-2].pdf /Temp/Html/*.html`

the only difference I notice is that you took out a space before the =

I actually tested my solution and based upon what the OP seems to have said, it seemed to work fine.

wilcoxon commented:
SStory, there are two problems with your provided solution for what the original poster asked:
  • The solution is not in perl.
  • You do not read the names of the pdf files from the text file (you hard-code t1 and t2).
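
The second point could be handled in bash as well. The following is a minimal sketch, not a tested replacement for the Perl script: it assumes each .txt file lists one pdf name per line (possibly indented), and the scratch directories and sample files are hypothetical. It relies on the standard grep options -F (fixed strings), -f (read patterns from a file) and -l (print matching file names).

```shell
#!/bin/bash
# Hypothetical layout: scratch directories stand in for Temp/Html and "download".
src=$(mktemp -d)
dest=$(mktemp -d)
printf '  t1.pdf\n  t2.pdf\n'         > "$src/1.txt"
printf '<a href="t2.pdf">two</a>\n'   > "$src/a.html"
printf '<a href="t3.pdf">three</a>\n' > "$src/b.html"

# Trim leading/trailing whitespace and drop blank lines,
# leaving one literal pdf name per line.
names=$(mktemp)
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$src"/*.txt | grep -v '^$' > "$names"

# Stop if no names were found.
[ -s "$names" ] || { echo "no pdf names found" >&2; exit 1; }

# Copy every html file that contains at least one of the listed names.
grep -l -F -f "$names" "$src"/*.html | while IFS= read -r file; do
    cp -- "$file" "$dest/"
done

ls "$dest"
```

Because -F matches substrings, a name such as t1.pdf would also match inside cat1.pdf; the \b anchors in the Perl script guard against that.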

SStory commented:
wilcoxon, OK.  I do agree it is not in perl.  I don't see where the OP mentions the file name being in a text file...but maybe you are a better interpreter than I.  If you are correct then it would need a slight modification.  I did want the OP to realize that it could easily be done without Perl if my interpretation was correct.

wilcoxon commented:
I agree the OP's question could be clearer about where the pdf names come from and what the txt files are for.
  • He mentions 2 txt files but then only lists 1.
  • Are the t1.pdf and t2.pdf listed in the txt file or are they actual files also found in the directory?
    • If listed in the txt file, should all pdfs listed in any txt file be used to check the html files?
    • If in the directory, should all pdfs found in the dir be checked in the html files?  If so, what are the txt files for?

Shailesh Shinde, can you answer the above questions so we make sure we are providing you with the solution you want?

tel2 commented:
Hi SStory,

> "Look as I might, I don't see any quotes around my =."
I didn't say there were quotes around your =.  I put my own quotes around it because I was quoting a portion of your code.  When I said:
> "I don't think it will work with spaces around the '='.
I was referring to the 'spaces' around it.

> "I do have backticks around the command to be executed. I also do see any difference in what you showed me from mine and then yours:
    files = `grep -l t[1-2].pdf /Temp/Html/*.html`
I think it should be this:
    files=`grep -l t[1-2].pdf /Temp/Html/*.html`
the only difference I notice is that you took out a space before the ="

Yes, and I also removed a space after it.

If I run your command (with spaces) in bash, I get this error:
   files: command not found
but if I remove the spaces, I get no error.
It has always been my understanding & experience that shells like sh, ksh & bash don't allow spaces on either side of equals signs in assignments like this, and the above error message supports that theory.  So I don't know how it could be working for you.  Any ideas how it could be...anyone?
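
The behaviour described above can be reproduced with a small sketch. It assumes no command called "files" exists on the system, and the value a.html is just a placeholder. With spaces, bash reads "files" as a command name and passes "=" to it as an argument; without spaces, the line is an assignment.

```shell
#!/bin/bash
# With spaces: bash tries to run a command called "files" with "=" as its
# first argument. Run it in a subshell and capture the error message.
status=0
err=$( (files = "a.html") 2>&1 ) || status=$?
echo "with spaces:    exit status $status, message: $err"   # status is typically 127 (command not found)

# Without spaces: a plain variable assignment.
files="a.html"
echo "without spaces: files=$files"
```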

Shailesh Shinde (Localization Engineering & Automation, author) commented:
Hi,
Sorry for the delay in my comments.
Yes, t1.pdf and t2.pdf are listed in the .txt file, and they need to be checked against the html files in the Temp->Html folder. If t1.pdf is found in any html file, copy that html file into another folder.

Thanks,
Shail

Shailesh Shinde (Localization Engineering & Automation, author) commented:
Hi,
I will test wilcoxon's perl script for my requirement.

Thanks,
Shail

tel2 commented:
Hi again Shailesh,

I know I've asked these questions before, but I haven't seen answers yet, so I'm going to ask them again:
    What do you mean by "Temp->Html"?
    Do you mean "Temp/Html" or what?

SStory commented:
tel2: Sorry, my bad. Somehow I saw "quotes"
You are right it didn't need spaces. even though in a bash script file.

tel2 commented:
Thanks SStory.

No worries, all is forgiven.

There's still something I don't understand about this, though.  When you say:
> "You are right it didn't need spaces. even though in a bash script file."
I don't think it's a matter of 'need'.  To me that implies they are optional, but from my experience, it just won't work if you have any spaces around the '='.  Do you agree?

> "I actually tested my solution and based upon what the OP seems to have said, it seemed to work fine."
So are you saying you get no errors when you have spaces around the '='?

Shailesh Shinde (Localization Engineering & Automation, author) commented:
Hi,
sorry for delay.
Temp->Html is actually a folder path
There is temp folder and html folder is nested in this temp folder.

Thanks,
Shail

SStory commented:
tel2: no, it didn't work for me either.

wilcoxon commented:
Did you ever test my perl script?  Did it meet your needs?  If not, what is not working?

tel2 commented:
Hi Shailesh,

> "Temp->Html is actually a folder path
> There is temp folder and html folder is nested in this temp folder."

If the Html folder is inside the Temp folder, then you represent it like this in UNIX/Linux:
    Temp/Html
That is how your commands (like "mkdir", "cd", etc) will refer to it, too.
Please don't represent that like Temp->Html in future, as that will confuse some people...like me.  Looks like you have some kind of pointer or link.
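
As a small illustration of the slash notation (a sketch only; the scratch base directory is hypothetical so that no real folders are touched):

```shell
#!/bin/bash
# Hypothetical scratch location for the example.
base=$(mktemp -d)

# Create Temp and the Html folder nested inside it in one step.
mkdir -p "$base/Temp/Html"

# Change into the nested folder using the same slash-separated path.
cd "$base/Temp/Html" || exit 1

# Show the last two path components of the current directory.
nested=$(basename "$PWD")
parent=$(basename "$(dirname "$PWD")")
echo "$parent/$nested"
```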


Hi SStory,
> "tel2: no, it didn't work for me either."
OK, but if it didn't work for you, then what did you mean by this?:
> "I actually tested my solution and based upon what the OP seems to have said, it seemed to work fine."

wilcoxon commented:
There are certainly responses so this should not be deleted without assigning points.

The OP said he would test the script I provided but then never responded with if it worked or what any issues were.

wilcoxon commented:
2) Accept comments as solution

https:#41751469 appears to be the only comment with a solution

tel2 commented:
I agree with wilcoxon's (totally unbiased) suggestion...     8)
...even if his link to his post doesn't work properly.  I think it should read: https:#a41751469

Shailesh Shinde (Localization Engineering & Automation, author) commented:
Hi All,
I have checked the Workspace for this question to accept multiple solutions. However, that option is not available because the question has already been accepted with the solution detailed below:

wilcoxon
Accepted Solution 2016-08-10 at 21:04:59, ID: 41751469

Thanks,
Shail