Matching a random pattern with one common character

Hi, I have a file with a large number of character conversion errors: all non-ASCII characters were converted to question marks ("?"), so it contains many strings such as Jos?, Company?s, ???????ahs-dhdh, The???hdh--dhd?, etc. The length of each string varies, as does the number of question marks in it.

Is there a regular expression (or set of expressions) I can use in a Perl script that will match any string with x number of characters and at least one question mark in it? Thanks
Asked by: hadrons
1 Solution
 
jmcg (Owner) commented:
Perhaps this little snippet will get you started.

use strict;
use warnings;

my @TestStrings = ("NoMatch", "Jos?", "Company?s", "???????ahs-dhdh", "The???hdh--dhd?");
for my $str (@TestStrings) {
    # A literal question mark has to be escaped inside a regex.
    printf "%s: %s\n", $str, ($str =~ m/\?/ ? "matched" : "not matched");
}

The part of your question I'm not sure I understand is the "x number of characters". Using the above script, you can decide how to split your data into individual strings, then check each one for a question mark. The results look like:
NoMatch: not matched
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched
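
One possible reading of "x number of characters" is an exact length. A minimal sketch under that assumption follows; the helper name has_qmark_with_length is hypothetical and not part of the original answer. It combines a lookahead for the question mark with a length quantifier in a single pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str is exactly $len characters long
# and contains at least one literal question mark.
sub has_qmark_with_length {
    my ($str, $len) = @_;
    # (?=.*\?) looks ahead for a "?" anywhere in the string;
    # .{$len} then requires exactly $len characters between \A and \z.
    return $str =~ m/\A(?=.*\?).{$len}\z/s ? 1 : 0;
}

for my $s ("NoMatch", "Jos?", "Company?s") {
    printf "%s: %s\n", $s, (has_qmark_with_length($s, 4) ? "matched" : "not matched");
}
```

If a minimum length is wanted instead, the quantifier could be written as .{$len,} with the rest of the pattern unchanged.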

Another approach might look something like the following:
my $TestData = "NoMatch Jos? Company?s ???????ahs-dhdh The???hdh--dhd?";
# Capture each run of word characters, "?" and "-" that contains at least one "?".
for ($TestData =~ m/([\w\?\-]*\?[\w\?\-]*)/g) {
    printf "%s: %s\n", $_, "matched";
}

Here, you're pulling the strings of interest out of a large batch of data. The character class [\w\?\-] can be expanded if there are other characters you want considered part of your strings. The results leave out the non-matching first string and look like:
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched
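
As a rough illustration of expanding the character class, the sketch below adds an apostrophe to it. That extra character is an assumption about the data, and the names $token_char and damaged_tokens are hypothetical. Keeping the class in one qr// variable means it only has to be changed in one place.

```perl
use strict;
use warnings;

# Assumption: besides word characters, "?" and "-", an apostrophe may also
# appear inside a string of interest. Adjust this class to suit the real data.
my $token_char = qr/[\w?'-]/;

# Hypothetical helper: return every token in $text that contains at least one "?".
sub damaged_tokens {
    my ($text) = @_;
    # In list context, m//g returns the captured group of every match.
    return $text =~ m/($token_char*\?$token_char*)/g;
}

my @found = damaged_tokens("NoMatch Jos? Company?s O?Brien's ???????ahs-dhdh");
print "$_\n" for @found;
```

Inside a character class the question mark needs no backslash, and a hyphen placed last is taken literally, so [\w?'-] matches the same characters as [\w\?\-] plus the apostrophe.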

I don't envy whoever has the task of trying to make sensible back-substitutions for the lost characters.
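
For whoever does take on that task, a tally of the distinct damaged strings might be a useful starting point. The following is only a sketch: the helper name tally_damaged and the sample text are hypothetical, and it reuses the same character class as above.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each damaged token occurs in $text.
sub tally_damaged {
    my ($text) = @_;
    my %count;
    # Same pattern as above: word characters, "?" and "-", with at least one "?".
    $count{$_}++ for $text =~ m/([\w?-]*\?[\w?-]*)/g;
    return \%count;
}

my $tally = tally_damaged("Jos? met Jos? at Company?s office");
# List the most frequent tokens first, breaking ties alphabetically.
for my $tok (sort { $tally->{$b} <=> $tally->{$a} || $a cmp $b } keys %$tally) {
    printf "%-12s %d\n", $tok, $tally->{$tok};
}
```

Sorting by frequency puts the strings that would repay a manual correction most at the top of the list.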
hadrons (Author) commented:
Hi, a few days ago I thought I hit the best solution button, but it may not have gone through. The solution worked great; thanks