Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Ye Olde Generic Parsing Module

Published:
Updated:
BRIEF HISTORY

From time to time during my journey through the "IT" realm, I have been faced with the necessity to code a program or module to parse some source text and produce an array containing the tokens, operators and delimiters thereof, the ultimate goal being that of performing some type of analysis of the original sentence.

All started during the dinosaur era when I was tasked with coding a program to translate one flavor of “Business Basic” to another flavor of “Business Basic”. And surely there was the need to code a parsing routine to produce a structure that could be analyzed and cross-referenced in order to automatically produce the equivalent statements in the target syntax.

Some years passed by and during a project to design a relational database we needed to create a metadata catalog from the source COBOL-based legacy application’s record definitions, working-storage areas, CICS screens, programs and other related structures . Consequently a parsing module was required.

More years passed by, until recently I needed to analyze arithmetic formulas using the Shunting Yard algorithm and yet another parsing program was coded to go.

OVERVIEW

According to Wikipedia, the term ”parsing” is used to refer to the formal analysis by a computer of a sentence or string of words and breaking it down into its constituents, resulting in a parse tree showing their syntactic relation to each other.

In the market today you may find multiple “parser” programs, but all are integrated into some kind of application, compiler or program which primary focus is on the lexical analysis of the source text based on some syntax and grammatical rules.

The Text2Token PL/SQL package I am sharing is the result of an attempt to code a simple program that will take text input and build a data structure useful for any type of analysis (or whatever). And which functionality based on user-defined operators and delimiters is limited to the task of splitting this sentence into its constituents.

MODULE Components
Variables, Constants and Arrays
Name		Type		Description
                      --------------- --------------- ---------------------------------
                      Operator	VARCHAR2(30)	Default operators: '!^*/=+-&|'
                      Op_Association	VARCHAR2(30)	Operator associations: 'RRLLLLLLL'
                      Op_Precedence	VARCHAR2(30)	Operator precedence: '443322210'
                      Quote		VARCHAR2(5)	Quote characters: '"'''
                      Blank		VARCHAR2(5)	Space and tab characters: ' '||CHR(9)
                      Comma		CHAR(1)		Comma character: ','
                      Tok#		PLS_INTEGER	Count of Tokens
                      Ops#		PLS_INTEGER	Count of Operators
                      Res#		PLS_INTEGER	Count of Result elements
                      
                      Text_Array 	TABLE OF VARCHAR2	Main type definition for arrays
                      
                      Ops		Text_Array	Array of operators
                      Tokens		Text_Array	Tokenized source
                      Results		Text_Array	Result array

Open in new window

SUBPROGRAMS
Subprogram	Description
                      --------------  ----------------------------------------------
                      Initialize	Setup initial options
                      Tokenize 	Return array of string constituents
                      Parse_Csv	Return field values of delimited string
                      Shunting Yard	Return Shunting Yard array

Open in new window

INITIALIZE Procedure

This procedure initializes the following processing options:

Option		   Description
                      ---------------    ---------------------------------------------------
                      Operator	   Change/Set the default list of operators
                      Op_Association	   Change/Set the corresponding association of operators
                      Op_Precedence	   Change/Set the operator precedence
                      Use_Quotes	   Enable quoted strings to be tokenized regardless of
                      		   embedded delimiters
                      Discard_Blanks	   Ignore spaces and tabs (exclude blanks from result array) 
                      Include_Operators  Includes operators in the result array
                      Debug_On	   Enables debugging messages

Open in new window

Syntax
Text2Token.INITIALIZE (
                      		  P_Operator		VARCHAR2 DEFAULT NULL
                      		, P_Op_Association	VARCHAR2 DEFAULT NULL
                      		, P_Op_Precedence	VARCHAR2 DEFAULT NULL
                      		, P_Include_Operators	BOOLEAN DEFAULT TRUE
                      		, P_Use_Quotes		BOOLEAN DEFAULT TRUE
                      		, P_Discard_Blanks	BOOLEAN DEFAULT TRUE
                      		, P_Debug		BOOLEAN DEFAULT FALSE);

Open in new window

TOKENIZE Function

This is the main engine that separates the source text into its various components according to the optional operators or delimiters.

Syntax

Text2Token.Tokenize (
                      	 P_Source_Text	VARCHAR2
                      	, P_Delimiters	VARCHAR2 DEFAULT NULL );
                      
                      Returns 
                      	Text2Token.Tokens%TYPE;

Open in new window


PARSE_CSV Function

Calls the Tokenize function and return an array containing the fields from the delimited source string.

Syntax

Text2Token.Parse_Csv (
                      	  P_ P_Source_Text	 VARCHAR2
                      	, P_Delimiters	VARCHAR2 DEFAULT Comma );
                      
                      Returns 
                      	Text2Token.Results%TYPE;

Open in new window

SHUNTING_YARD Function

Calls the Tokenize function and returns an array containing the Shunting Yard array of elements.

Syntax

Text2Token.Shunting_Yard (
                      	  P_ P_Source_Text	 VARCHAR2
                      	, P_Delimiters	VARCHAR2 DEFAULT Comma );
                      
                      Returns 
                      	Text2Token.Results%TYPE;

Open in new window


Here is the code:

text2token-pkg.sql

EXAMPLE

 
SQL> DECLARE
                        2      V_Text                  VARCHAR2 ( 1000 );
                        3      V_Results               Text2token.Results%TYPE;
                        4
                        5      PROCEDURE Print_Result ( P_Ttl VARCHAR2 )
                        6      IS
                        7      BEGIN
                        8          DBMS_OUTPUT.Put_Line ( '***** ' || P_Ttl || ' Results *****'||CHR(10)||'String ['||V_Text||']' );
                        9
                       10          FOR I IN 1 .. V_Results.COUNT
                       11          LOOP
                       12              DBMS_OUTPUT.Put_Line ( TO_CHAR ( I, '000.' ) ||'  '|| V_Results ( I ) );
                       13          END LOOP;
                       14      END;
                       15  BEGIN
                       16      Text2token.Initialize ( );
                       17      V_Text           := '3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3';
                       18      V_Results        := Text2token.Tokenize ( V_Text );
                       19      Print_Result ( 'Tokenize' );
                       20
                       21      Text2token.Initialize ( );
                       22      V_Results        := Text2token.Shurting_Yard ( V_Text );
                       23      Print_Result ( 'Shurting_Yard' );
                       24
                       25      Text2token.Initialize ( );
                       26      V_Text           := 'We are,the people,out fishing,with,"O''Brien, Elka",at the lake.';
                       27      V_Results        := Text2token.Parse_Csv ( V_Text );
                       28      Print_Result ( 'Parse_Csv' );
                       29
                       30  END;
                       31  /
                      ***** Tokenize Results *****
                      String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
                      001.  3
                      002.  +
                      003.  4
                      004.  *
                      005.  2
                      006.  /
                      007.  (
                      008.  1
                      009.  -
                      010.  5
                      011.  )
                      012.  ^
                      013.  2
                      014.  ^
                      015.  3
                      ***** Shurting_Yard Results *****
                      String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
                      001.  3
                      002.  4
                      003.  2
                      004.  *
                      005.  1
                      006.  5
                      007.  -
                      008.  2
                      009.  3
                      010.  ^
                      011.  ^
                      012.  /
                      013.  +
                      ***** Parse_Csv Results *****
                      String [We are,the people,out fishing,with,"O'Brien, Elka",at the lake.]
                      001.   We are
                      002.   the people
                      003.   out fishing
                      004.   with
                      005.   O'Brien, Elka
                      006.   at the lake.
                      
                      PL/SQL procedure successfully completed.

Open in new window

1
2,691 Views

Comments (3)

Author

Commented:
Ok thanks,
I'll remove the disclaimer.

Author

Commented:
Thanks!
The source code seems to be missing -- added the source code.

Author

Commented:
Corrected "Shunting_Yard" and other typos in the PL/SQL code.

   text2token-pkg.sql

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.