Solved

# Querying for Roman Numerals

Posted on 2009-04-28
2,050 Views
Last Modified: 2013-12-18
I'm trying to build a Query using Oracle (10g) to identify all fields with a roman numeral - and put that roman numeral in a new cell:

Field                        Roman Numeral
MGM XXIX:              XXIX
MGM XIX, One         XIX

Roman Numerals are limited to I, V, X, L, C, D, M

I haven't found a function for this but...
I'm thinking that the query may be written where a word within a field contains (V, X, L, C, D, M) only - but will need to account for roman numerals like 'VI' -- where there is an I. Also, I think including the 'I' will result in words as 'DIM', 'MID', 'LID' - not what I would want. So - would need to search for I, II, III, IV separately.

Roman Numerals won't appear in the first part of a piece of text... and they may be proceeded by a comma (MGM XIX, one)

Thanks in advance
0
Question by:kwieckii
20 Comments

LVL 73

Expert Comment

may be proceeded by a comma (MGM XIX, one)

that example shows roman numeral being followed by a comma not preceeded.

is preceeding also allowed? or just following?
0

LVL 73

Expert Comment

assuming it's always whitespace roman-numeral then , or nothing how about this...

SELECT field, REGEXP_REPLACE(REGEXP_SUBSTR(field, ' [IVXLCDM]+,?'), '[ ,]', NULL) roman_numeral
FROM your_table
0

LVL 73

Expert Comment

a simpler version...

SELECT field, ltrim(REGEXP_SUBSTR(field, ' [IVXLCDM]+')) roman_numeral
FROM your_table
0

LVL 73

Expert Comment

don't really need the comma check
0

LVL 73

Expert Comment

Do note, the above can give false positives.

it only checks for a contiguous set of letters that couldbe used to make a roman numeral
it doesn't actually check if it is.

'I LIKE DILL PICKLES' will return DILL  as a roman numeral even though it isn't.

0

Author Comment

Would there be a way to not consider the other characters? - (to try to get rid of the false positives)?
For example - if there's a B or F in the string - don't put the value in the field...
using:
SELECT field, ltrim(REGEXP_SUBSTR(field, ' [IVXLCDM]+')) roman_numeral
FROM your_table

The word = IMMOBILIEN - returned IMM  - just trying to understand the query.

Thanks
0

LVL 73

Expert Comment

can you post some more example input strings so I can see what all the various delimiters are...

whitespace, comma, colon, end of field, etc.
0

LVL 47

Expert Comment

http://forums.devshed.com/python-programming-11/roman-numeral-to-arabic-converter-601261.html

Here they have tried to convert Roman in Arabac

This code could be translated in PL/SQL and used to check if the substring is a real Roman
0

Author Comment

Here's a few.. please let me know if you'd like more:

with x as
(
select 'ABC CORPORATION LIMITED' CoName from dual union all
select 'ABC CORP XII LTD' CoName from dual union all
select 'ABC CORP XL, LTD' CoName from dual

)
select
CoName,
ltrim(REGEXP_SUBSTR(CoName, ' [IVXLCDM]+')) roman_numeral
from x

Thanks again!!
0

LVL 73

Expert Comment

using oracle's built in features it's quite easy to write a roman numeral converter (this was a fun exercise,  thanks for the question!)
Note, the function only supports up to 3999 because after that Oracle can't support the conversion from number to roman value.
If you need higher than that then we'll have to write our own conversion from number to roman too.
However, in practice, most roman numerals aren't supported beyond 3999 because 4000 "should" be M followed by some new character representing 5000 but there isn't one.
I have seen converters that simply repeat M's as far as needed and the code below could be modified to do that if you wanted.

however, using this function it makes it easy to eliminate false positives.
I'm not sure what you wanted for  "ABC CORPORATION LIMITED"  since it has no roman numerals, I assume you want those excluded

Try this... (I extended your WITH to include some more examples)

WITH x AS (SELECT   'MGM XXIX:' coname FROM DUAL
UNION ALL
SELECT   'MGM XIX, One' coname FROM DUAL
UNION ALL
SELECT   'MGM XIX' coname FROM DUAL
UNION ALL
SELECT   'MGM D XIX' coname FROM DUAL
UNION ALL
SELECT   'MGM DDDDD XIX' coname FROM DUAL
UNION ALL
SELECT   'ABC CORPORATION LIMITED' coname FROM DUAL
UNION ALL
SELECT   'ABC CORP XII LTD' coname FROM DUAL
UNION ALL
SELECT   'ABC CORP XL, LTD' coname FROM DUAL)
SELECT   coname, roman_numeral
FROM   (SELECT   coname,
REGEXP_REPLACE(REGEXP_SUBSTR(coname, ' [IVXLCDM]+([ ,:]|\$)', 1, n),
'[^IVXLCDM]',
NULL)
roman_numeral,
n
FROM   x, (    SELECT   LEVEL n
FROM   DUAL
CONNECT BY   LEVEL < 10))
WHERE   roman_value(roman_numeral) IS NOT NULL
``````CREATE OR REPLACE FUNCTION roman_value(p_roman_str VARCHAR2)

RETURN INTEGER

IS

v_integer   NUMBER;

BEGIN

SELECT   SUM(z)

INTO   v_integer

FROM   (SELECT   DECODE(p,

'IV',

4,

'IX',

9,

'XL',

40,

'XC',

90,

'CD',

400,

'CM',

900,

'I',

1,

'V',

5,

'X',

10,

'L',

50,

'C',

100,

'D',

500,

'M',

1000)

z

FROM   (SELECT   REGEXP_SUBSTR(p_roman_str,

'IV|IX|XL|XC|CD|CM|I|V|X|L|C|D|M',

1,

n)

p

FROM   (    SELECT   LEVEL n

FROM   DUAL

CONNECT BY   LEVEL <= 15))

WHERE   p IS NOT NULL);

IF TO_CHAR(v_integer, 'fmRN') = UPPER(TRIM(p_roman_str)) THEN

RETURN v_integer;

ELSE

RETURN NULL;

END IF;

END roman_value;
``````
0

Author Comment

Thanks - I wish I could run this code - I don't have privileges in the db to create functions.  Is there anyway to change this to a select only --- otherwise, I think I'm going to have to pay someone off to run this for me.

I included ABC Corporation Limited - because with the first query - the C was retrieved as the roman numeral.

I can't think of a time where there would be over 3999 - if there is - most likely would be part of the text - or a mistake.

Thanks!!
0

LVL 73

Accepted Solution

try this...
``````WITH x AS (SELECT   'MGM XXIX:' coname FROM DUAL

UNION ALL

SELECT   'MGM XIX, One' coname FROM DUAL

UNION ALL

SELECT   'MGM XIX' coname FROM DUAL

UNION ALL

SELECT   'MGM D XIX' coname FROM DUAL

UNION ALL

SELECT   'MGM DDDDD XIX' coname FROM DUAL

UNION ALL

SELECT   'ABC CORPORATION LIMITED' coname FROM DUAL

UNION ALL

SELECT   'ABC CORP XII LTD' coname FROM DUAL

UNION ALL

SELECT   'ABC CORP XL, LTD' coname FROM DUAL)

SELECT   coname, roman_numeral

FROM   (  SELECT   coname, roman_numeral, SUM(DECODE(p,

'IV',

4,

'IX',

9,

'XL',

40,

'XC',

90,

'CD',

400,

'CM',

900,

'I',

1,

'V',

5,

'X',

10,

'L',

50,

'C',

100,

'D',

500,

'M',

1000))

z

FROM   (SELECT   coname,

roman_numeral,

REGEXP_SUBSTR(roman_numeral, 'IV|IX|XL|XC|CD|CM|I|V|X|L|C|D|M', 1, n)

p

FROM   (SELECT   coname,

REGEXP_REPLACE(

REGEXP_SUBSTR(coname, ' [IVXLCDM]+([ ,:]|\$)', 1, i),

'[^IVXLCDM]',

NULL

)

roman_numeral,

i

FROM   x, (    SELECT   LEVEL i

FROM   DUAL

CONNECT BY   LEVEL < 10)),

(    SELECT   LEVEL n

FROM   DUAL

CONNECT BY   LEVEL <= 15))

GROUP BY   coname, roman_numeral)

WHERE   TO_CHAR(z, 'fmRN') = UPPER(TRIM(roman_numeral))
``````
0

Author Comment

That works pretty good -- very slow (running against 2 million records).  It took about 23 seconds to return the first 20k.

Some of the false positives now...
Washington DC            DC
A & C Co                      C
A & D Co                      C

Do you think the easiest way to eliminate the C only (probably D and DC also) value would be to not select it after it's calculated or before?... (SELECT * FROM Roman_Numeral_Table where Roman_Numeral <> 'C')

ABC CORPORATION LIMITED didn't return in the results - (with a blank roman numeral)

Thanks again.
0

LVL 73

Expert Comment

well, that "could" work

you've got some special circumstances there because you have valid roman numerals that are proper delimited but you're eliminating them by "human" recognition
not by parsing rules.

Depending on your indexing it might be to identify special strings and strip them from the input before you start

select * from your table where coname not like 'Washington DC%' and coname not like 'A & C % and coname not like 'A & D %'

or, good roman numerals might follow the bad ones then  strip out those substrings (this will make it slower but accurate)

select regexp_replace(coname,'(Washington DC)|(A & C)|(A &D)',null) coname from your table
0

LVL 73

Expert Comment

you can remove the UPPER and TRIM functions too, those are carry overs from the function
0

LVL 73

Expert Comment

you can also make it faster by reducing the counter loops
The first "connect by level" counter is 1..10 so it can check up to 10 candidate strings.  If you'll never have that many, reduce it.

Similarly the roman numeral counter 1..15 can be reduced if you'll have smaller numbers. it can be reduced 14 if you'll never go over 3887
13 if you'll never go over 2887,  12 if you'll never go over 1887, 11 if never over 887, 10 if never over 387, 9 if never over 287, 8 if not over 187,
7 if not over 87,  and so on
0

Author Comment

Thanks for the quick response!!  I'm still trying to understand how this works:

When I change the Level i counter to 1, I receive the same results - but very quick
(the time on 10k records with level i = 10 was 20 seconds - compared to 1.8 seconds where level i = 1).... What does the Level i counter do?

Is there a way to retrieve the rows where no roman numerals are identified in the data?
0

LVL 73

Expert Comment

The first "connect by level" counter is 1..10 so it can check up to 10 candidate strings.  If you'll never have that many, reduce it.

it's for counting substrings that look like they "might" be roman numerals.

DDDDD for instance,  has all roman numerals in it but it's not valid.

So if you had "My test string DDDDD  XXIX"

DDDDD would be 1
XXIX would be 2

the DDDDD would be thrown out later

by setting it to 1 you are effectively saying never check beyond the first substring that looks like it might be valid.
if it's that's ok, fine just make sure of your data

0

Author Closing Comment

Thanks again!! I always pick up something new with your answers (besides the original question)
0

LVL 73

Expert Comment

glad I could help
0

## Join & Write a Comment Already a member? Login.

### Suggested Solutions

From the Oracle SQL Reference (http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/queries006.htm) we are told that a join is a query that combines rows from two or more tables, views, or materialized views. This article provides a glimpsâ€¦
From implementing a password expiration date, to datatype conversions and file export options, these are some useful settings I've found in Jasper Server.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
This video shows how to recover a database from a user managed backup

#### 746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

#### Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!