
How to get the length of string containing HTML Entities using Perl?

I have a string variable containing a mix of English text and HTML entities for Japanese characters. I would like to get the length of the string, counting each HTML entity as 2 and each standard English character as 1.

I would like to use a regular expression to match and count the &#xxxxx; entities. How can I get this done?

For example, $str contains abc&#xxxxx;123&#xxxxx;xyz

Normally I would get a length of 25, but I would like to get a length of 11 instead.
1 Solution
You might try:

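my $copy = $str;             # work on a copy so $str itself is left untouched
$copy =~ s/&#\w+;/aa/g;      # replace each numeric entity with a two-character placeholder
print length($copy), "\n";   # entities now count as 2 and other characters as 1 (13 for the example)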
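The same substitution idea can be wrapped in a small reusable sub. This is a sketch, not the answerer's code: the name weighted_length is hypothetical, the pattern &#\w+; is an assumption meant to cover decimal (&#12354;) and hex (&#x3042;) entities as well as the &#xxxxx; placeholder, and the /r flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: entities count as 2, every other character as 1.
# Assumes Perl 5.14+, where s///r returns a modified copy and leaves
# the original string unchanged.
sub weighted_length {
    my ($s) = @_;
    # Swap each numeric entity for a two-character placeholder, then
    # take the length of the resulting copy.
    return length($s =~ s/&#\w+;/aa/gr);
}

my $str = 'abc&#xxxxx;123&#xxxxx;xyz';
print weighted_length($str), "\n";   # 13
print "$str\n";                      # $str still holds the entities
```

Using /r avoids the explicit copy step, so the caller's string is never modified.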


my $length = () = $str =~ /&#\w+;|./gs;  # each entity, or any single character, is one match
print "$length\n";                       # 11 for the example string
Do you want a length of 11 or 13?
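Both readings can be handled by one routine that takes the width of an entity as a parameter. This is a sketch under assumptions: the sub name entity_aware_length and its $entity_width argument are hypothetical, and the pattern &#\w+; is assumed to match the numeric entities in the string.

```perl
use strict;
use warnings;

# Hypothetical helper: counts each numeric HTML entity as $entity_width
# characters and every other character as 1.
sub entity_aware_length {
    my ($s, $entity_width) = @_;
    my $entities = () = $s =~ /&#\w+;/g;   # number of entities in the string
    (my $rest = $s) =~ s/&#\w+;//g;        # the text with all entities removed
    return length($rest) + $entities * $entity_width;
}

my $str = 'abc&#xxxxx;123&#xxxxx;xyz';
print entity_aware_length($str, 1), "\n";  # 11: each entity counts once
print entity_aware_length($str, 2), "\n";  # 13: each entity counts twice
```

With a width of 1 it reproduces the match-counting answer; with a width of 2 it matches the substitution answer.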
