Solved

Matching a random pattern with one common character

Posted on 2016-11-18
Last Modified: 2016-11-22
Hi, I have a file with a large number of character conversion errors: all non-ASCII characters were converted to question marks ("?"), so there are many instances of strings such as Jos?, Company?s, ???????ahs-dhdh, The???hdh--dhd?, etc. The length of each string varies, as does the number of question marks in it.

Is there a regular expression (or set of expressions) I can use in a Perl script that will match any string with x number of characters and at least one question mark in it? Thanks
Question by:hadrons

Accepted Solution

by:
jmcg earned 500 total points
ID: 41893815
Perhaps this little snippet will get you started.

my @TestStrings = ("NoMatch", "Jos?", "Company?s", "???????ahs-dhdh", "The???hdh--dhd?");
for (@TestStrings) {
    # Report whether the string contains a literal question mark.
    printf "%s: %s\n", $_, ($_ =~ m/\?/ ? "matched" : "not matched");
}



The part of your question I'm not sure I understand is the "x number of characters". With the above script, you decide how to split your data into individual strings, then check each one for a question mark. The results look like:
NoMatch: not matched
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched
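
If "x number of characters" means a minimum length, one possible reading is sketched below. It splits a line on whitespace and reports a string only when it is at least that long and contains a question mark. The helper name damaged_token and the threshold of 5 are made up for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (name and signature are assumptions): returns 1 when
# $token is at least $min_len characters long and contains at least one
# literal question mark, otherwise 0.
sub damaged_token {
    my ($token, $min_len) = @_;
    return (length($token) >= $min_len && $token =~ m/\?/) ? 1 : 0;
}

my $line = "NoMatch Jos? Company?s ???????ahs-dhdh The???hdh--dhd?";

# split ' ' breaks the line on runs of whitespace.
for my $token (split ' ', $line) {
    printf "%s: %s\n", $token,
        damaged_token($token, 5) ? "matched" : "not matched";
}
```

Setting the minimum length to 1 gives the same matches as the plain m/\?/ test above.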



Another approach might look something like the following:
my $TestData = "NoMatch Jos? Company?s ???????ahs-dhdh The???hdh--dhd?";
# Extract every run of word characters, "?" and "-" that contains at least one "?".
for ($TestData =~ m/([\w\?\-]*\?[\w\?\-]*)/g) {
    printf "%s: %s\n", $_, "matched";
}


In this case, you're pulling out the strings of interest from a large batch of data. The character class [\w\?\-] can be expanded if there are other characters you want considered part of your strings. In this case, the results leave out that first non-matched string and look like:
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched
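
As a rough sketch of expanding the character class, the version below also treats apostrophes and periods as part of a string and reads the data line by line so each hit can be reported with its line number. The extra characters, the helper name find_damaged, and the sample data are assumptions for illustration; an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Assumption: apostrophes and periods also belong to a string.
# "?" needs no escape inside a character class, and "-" is literal
# when it comes last.
my $chars = qr/[\w?'.-]/;

# Hypothetical helper: in list context, returns every run of $chars
# characters in $text that contains at least one question mark.
sub find_damaged {
    my ($text) = @_;
    return $text =~ m/($chars*\?$chars*)/g;
}

# Made-up sample data; replace the in-memory handle with the real file.
my $data = "Jos? owns O?Brien's shop\nnothing wrong here\ncaf? menu v1.?\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    for my $hit (find_damaged($line)) {
        # $. holds the current line number of the last-read filehandle.
        printf "line %d: %s\n", $., $hit;
    }
}
close $fh;
```

Knowing the line number of each hit should make the later manual clean-up easier to track.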


I don't envy whoever has the task of trying to make sensible back-substitutions for the lost characters.

Author Comment

by:hadrons
ID: 41897954
Hi, a few days ago I thought I hit the best solution button, but it may not have gone through. The solution worked great; thanks.