• Status: Solved

CF Function to get number of bytes in a UTF-8 string?

Hi

We are extending our application to support multilingual characters, and we've upgraded to CFMX and Oracle 9i (built with the UTF-8 character set). So each character from my CF form will take 3 bytes when I insert it into Oracle (for Japanese Kanji), and 2 bytes for accented European characters.

To make sure I don't send Oracle a string longer than the database field, I'd have to check and trim the string on the CF side. On the Oracle side I can check it using LENGTHB and trim it using SUBSTRB, but I can't find the equivalent CF function (the CF Len function always returns the number of characters, not the number of bytes).

Any help will be greatly appreciated.

Thanks
Greg
Asked by: netuser1976
 
Seth_Bienek commented:

Hi Greg,

There's not really an easy way to do this without consuming a lot of resources on your server.

Basically, you'd need to look at the input from the form and check whether any of the characters are at or above Unicode code point U+0800 (3 bytes each in UTF-8), or between U+0080 and U+07FF (2 bytes each), or whether they are all below U+0080, meaning the data is single-byte ASCII (which, by the way, is kind of a misnomer, since ASCII is actually only 7 bits, not 8).

You could write a UDF to do this, but like I said, I think it would be resource-intensive since you have to check every character in your string to see if it is an extended character.
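To make the per-character check concrete, here is a minimal Java sketch of that loop (Java rather than CFML, since CFMX strings are java.lang.String objects; the class and method names are hypothetical). It walks the string by code point and adds 1, 2, 3 or 4 bytes according to the UTF-8 ranges; characters above U+FFFF, which UTF-8 encodes in 4 bytes, are included for completeness. It assumes well-formed text with no unpaired surrogates.

```java
public class Utf8Length {
    /**
     * Counts the UTF-8 bytes needed for s by classifying each code point.
     * Assumes well-formed input (no unpaired surrogates).
     */
    public static int utf8ByteLength(String s) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                bytes += 1;            // U+0000..U+007F: ASCII
            } else if (cp < 0x800) {
                bytes += 2;            // U+0080..U+07FF: e.g. accented Latin letters
            } else if (cp < 0x10000) {
                bytes += 3;            // U+0800..U+FFFF: e.g. Japanese Kanji
            } else {
                bytes += 4;            // U+10000 and above (a surrogate pair in Java)
            }
            i += Character.charCount(cp);  // skip 1 or 2 UTF-16 code units
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(utf8ByteLength("abc"));           // 3
        System.out.println(utf8ByteLength("caf\u00e9"));     // 5
        System.out.println(utf8ByteLength("\u6f22\u5b57"));  // 6
    }
}
```

The loop is linear in the string length, so for form fields of a few thousand characters the cost is small.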

Maybe a regular expression is the way to go...?

Good luck,

Seth

netuser1976 (Author) commented:
Yeah, I agree. It would be nice if Macromedia gave us a function to do this, like LenB in VB.

Seth_Bienek commented:
Maybe there is a native Java class or method you could use?

Check your Java documentation and see if anything jumps out...
 
PE_CF_DEV commented:
I believe this will work:

 <cfset varlength=len(variable.getBytes('UTF-8'))>
This should return the length of the string in bytes once it is encoded as UTF-8.
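The CF snippet works because CFMX strings are java.lang.String objects, so getBytes can be called on them directly. For comparison, here is the same check as a minimal plain-Java sketch (hypothetical class and method names). It uses the StandardCharsets.UTF_8 overload, which needs Java 7 or later; the getBytes("UTF-8") form in the CF code takes a charset name instead and declares a checked UnsupportedEncodingException in Java.

```java
import java.nio.charset.StandardCharsets;

public class ByteLength {
    /** Number of bytes s occupies when encoded as UTF-8. */
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        System.out.println(s.length());    // 4 (characters, like CF's Len on the string)
        System.out.println(utf8Bytes(s));  // 5 (bytes: the accented letter takes 2)
    }
}
```

Encoding the whole string allocates a temporary byte array, but the JVM's encoder does the per-character work, so this is usually simpler and at least as fast as a hand-written loop.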

netuser1976 (Author) commented:
Thanks, PE_CF_DEV. That worked.

Thanks Seth for your thoughts.

netuser1976 (Author) commented:
Is there any way to trim the string based on the byte count?

Something like Mid(string, 1, 4000), but where 4000 is a byte count rather than a character count (the equivalent of Oracle's SUBSTRB)?
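One way to trim by byte count, sketched here in Java, is to walk the string by code point, add up each character's UTF-8 size, and stop before the first character that would push the total past the limit. Unlike cutting the byte array at 4000 and decoding it, this never splits a multi-byte character in half. The class and method names are hypothetical, the text is assumed to be well-formed (no unpaired surrogates), and the same loop could in principle be ported to a CF UDF.

```java
public class Utf8Truncate {
    /**
     * Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes,
     * without splitting a character or a surrogate pair.
     */
    public static String truncateToUtf8Bytes(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // UTF-8 size of this code point: 1, 2, 3 or 4 bytes
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;  // the next character would exceed the limit
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String s = "\u6f22\u5b57abc";  // 2 Kanji (3 bytes each) + 3 ASCII letters = 9 bytes
        System.out.println(truncateToUtf8Bytes(s, 4).length());  // 1 (only the first Kanji fits)
        System.out.println(truncateToUtf8Bytes(s, 7).length());  // 3 (both Kanji plus "a")
    }
}
```

For the original case, the call would be truncateToUtf8Bytes(value, 4000) before the insert, assuming the Oracle column limit is 4000 bytes.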