<

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x

Ye Olde Generic Parsing Module

Published on
11,462 Points
1,862 Views
1 Endorsement
Last Modified:
Awarded
BRIEF HISTORY

From time to time during my journey through the "IT" realm, I have been faced with the necessity to code a program or module to parse some source text and produce an array containing the tokens, operators and delimiters thereof, the ultimate goal being that of performing some type of analysis of the original sentence.

All started during the dinosaur era when I was tasked with coding a program to translate one flavor of “Business Basic” to another flavor of “Business Basic”. And surely there was the need to code a parsing routine to produce a structure that could be analyzed and cross-referenced in order to automatically produce the equivalent statements in the target syntax.

Some years passed by and during a project to design a relational database we needed to create a metadata catalog from the source COBOL-based legacy application’s record definitions, working-storage areas, CICS screens, programs and other related structures . Consequently a parsing module was required.

More years passed by, until recently I needed to analyze arithmetic formulas using the Shunting Yard algorithm and yet another parsing program was coded to go.

OVERVIEW

According to Wikipedia, the term ”parsing” is used to refer to the formal analysis by a computer of a sentence or string of words and breaking it down into its constituents, resulting in a parse tree showing their syntactic relation to each other.

In the market today you may find multiple “parser” programs, but all are integrated into some kind of application, compiler or program which primary focus is on the lexical analysis of the source text based on some syntax and grammatical rules.

The Text2Token PL/SQL package I am sharing is the result of an attempt to code a simple program that will take text input and build a data structure useful for any type of analysis (or whatever). And which functionality based on user-defined operators and delimiters is limited to the task of splitting this sentence into its constituents.

MODULE Components
Variables, Constants and Arrays
Name		Type		Description
--------------- --------------- ---------------------------------
Operator	VARCHAR2(30)	Default operators: '!^*/=+-&|'
Op_Association	VARCHAR2(30)	Operator associations: 'RRLLLLLLL'
Op_Precedence	VARCHAR2(30)	Operator precedence: '443322210'
Quote		VARCHAR2(5)	Quote characters: '"'''
Blank		VARCHAR2(5)	Space and tab characters: ' '||CHR(9)
Comma		CHAR(1)		Comma character: ','
Tok#		PLS_INTEGER	Count of Tokens
Ops#		PLS_INTEGER	Count of Operators
Res#		PLS_INTEGER	Count of Result elements

Text_Array 	TABLE OF VARCHAR2	Main type definition for arrays

Ops		Text_Array	Array of operators
Tokens		Text_Array	Tokenized source
Results		Text_Array	Result array

Open in new window

SUBPROGRAMS
Subprogram	Description
--------------  ----------------------------------------------
Initialize	Setup initial options
Tokenize 	Return array of string constituents
Parse_Csv	Return field values of delimited string
Shunting Yard	Return Shunting Yard array

Open in new window

INITIALIZE Procedure

This procedure initializes the following processing options:

Option		   Description
---------------    ---------------------------------------------------
Operator	   Change/Set the default list of operators
Op_Association	   Change/Set the corresponding association of operators
Op_Precedence	   Change/Set the operator precedence
Use_Quotes	   Enable quoted strings to be tokenized regardless of
		   embedded delimiters
Discard_Blanks	   Ignore spaces and tabs (exclude blanks from result array) 
Include_Operators  Includes operators in the result array
Debug_On	   Enables debugging messages

Open in new window

Syntax
Text2Token.INITIALIZE (
		  P_Operator		VARCHAR2 DEFAULT NULL
		, P_Op_Association	VARCHAR2 DEFAULT NULL
		, P_Op_Precedence	VARCHAR2 DEFAULT NULL
		, P_Include_Operators	BOOLEAN DEFAULT TRUE
		, P_Use_Quotes		BOOLEAN DEFAULT TRUE
		, P_Discard_Blanks	BOOLEAN DEFAULT TRUE
		, P_Debug		BOOLEAN DEFAULT FALSE);

Open in new window

TOKENIZE Function

This is the main engine that separates the source text into its various components according to the optional operators or delimiters.

Syntax

Text2Token.Tokenize (
	 P_Source_Text	VARCHAR2
	, P_Delimiters	VARCHAR2 DEFAULT NULL );

Returns 
	Text2Token.Tokens%TYPE;

Open in new window


PARSE_CSV Function

Calls the Tokenize function and return an array containing the fields from the delimited source string.

Syntax

Text2Token.Parse_Csv (
	  P_ P_Source_Text	 VARCHAR2
	, P_Delimiters	VARCHAR2 DEFAULT Comma );

Returns 
	Text2Token.Results%TYPE;

Open in new window

SHUNTING_YARD Function

Calls the Tokenize function and returns an array containing the Shunting Yard array of elements.

Syntax

Text2Token.Shunting_Yard (
	  P_ P_Source_Text	 VARCHAR2
	, P_Delimiters	VARCHAR2 DEFAULT Comma );

Returns 
	Text2Token.Results%TYPE;

Open in new window


Here is the code:

text2token-pkg.sql

EXAMPLE

 
SQL> DECLARE
  2      V_Text                  VARCHAR2 ( 1000 );
  3      V_Results               Text2token.Results%TYPE;
  4
  5      PROCEDURE Print_Result ( P_Ttl VARCHAR2 )
  6      IS
  7      BEGIN
  8          DBMS_OUTPUT.Put_Line ( '***** ' || P_Ttl || ' Results *****'||CHR(10)||'String ['||V_Text||']' );
  9
 10          FOR I IN 1 .. V_Results.COUNT
 11          LOOP
 12              DBMS_OUTPUT.Put_Line ( TO_CHAR ( I, '000.' ) ||'  '|| V_Results ( I ) );
 13          END LOOP;
 14      END;
 15  BEGIN
 16      Text2token.Initialize ( );
 17      V_Text           := '3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3';
 18      V_Results        := Text2token.Tokenize ( V_Text );
 19      Print_Result ( 'Tokenize' );
 20
 21      Text2token.Initialize ( );
 22      V_Results        := Text2token.Shurting_Yard ( V_Text );
 23      Print_Result ( 'Shurting_Yard' );
 24
 25      Text2token.Initialize ( );
 26      V_Text           := 'We are,the people,out fishing,with,"O''Brien, Elka",at the lake.';
 27      V_Results        := Text2token.Parse_Csv ( V_Text );
 28      Print_Result ( 'Parse_Csv' );
 29
 30  END;
 31  /
***** Tokenize Results *****
String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
001.  3
002.  +
003.  4
004.  *
005.  2
006.  /
007.  (
008.  1
009.  -
010.  5
011.  )
012.  ^
013.  2
014.  ^
015.  3
***** Shurting_Yard Results *****
String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
001.  3
002.  4
003.  2
004.  *
005.  1
006.  5
007.  -
008.  2
009.  3
010.  ^
011.  ^
012.  /
013.  +
***** Parse_Csv Results *****
String [We are,the people,out fishing,with,"O'Brien, Elka",at the lake.]
001.   We are
002.   the people
003.   out fishing
004.   with
005.   O'Brien, Elka
006.   at the lake.

PL/SQL procedure successfully completed.

Open in new window

1
Author:MikeOM_DBA
  • 3
4 Comments
LVL 29

Author Comment

by:MikeOM_DBA
Ok thanks,
I'll remove the disclaimer.
0
 

Administrative Comment

by:Eric AKA Netminder
Mike,

Nice, handy article. It has been published, and has been awarded EE-Approved status besides.

Congratulations!

ericpete
Page Editor
0
LVL 29

Author Comment

by:MikeOM_DBA
Thanks!
The source code seems to be missing -- added the source code.
0
LVL 29

Author Comment

by:MikeOM_DBA
Corrected "Shunting_Yard" and other typos in the PL/SQL code.

   text2token-pkg.sql
0

Featured Post

Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

This video shows how to Export data from an Oracle database using the Original Export Utility.  The corresponding Import utility, which works the same way is referenced, but not demonstrated.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month