SQL (Structured Query Language) is designed to be used in conjunction with relational database products as a means of working with sets of data. SQL consists of data definition, data manipulation, and procedural elements. Its scope includes data insert, query, update and delete, schema creation and modification, and data access control.


Recursive SQL is one of the most fascinating, powerful, and yet dangerous features offered by many modern databases today. It is written using a Common Table Expression (CTE), first introduced in the ANSI SQL:1999 standard. The first CTE implementations began appearing around 2006-7, and support arrived in MySQL as of version 8.0 and in MariaDB as of version 10.2.2.


Common Table Expression (the WITH RECURSIVE … clause)

Recursive SQL is achieved by using a Common Table Expression, or CTE, which is a temporary named result set derived from a simple query and defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement. A CTE is often used to improve the readability of SQL code by splitting it into simpler, more readable sections. One notable feature of a CTE is the ability to reference the named result set recursively, hence recursive SQL.
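Before going recursive, it helps to see a plain CTE acting as a temporary named result set. The sketch below is a minimal, hypothetical example; it uses SQLite through Python purely so it can run anywhere, but the WITH syntax is essentially the same in MySQL 8.0 and MariaDB 10.2+ (table and column names here are invented for illustration).

```python
import sqlite3

# Toy schema, purely illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10.0), (2, 25.0), (3, 40.0)])

# The CTE 'big_orders' is a temporary named result set defined once,
# then referenced like an ordinary table in the outer SELECT.
rows = con.execute("""
    WITH big_orders AS (
        SELECT id, amount FROM orders WHERE amount > 20
    )
    SELECT COUNT(*) FROM big_orders
""").fetchone()
print(rows[0])  # 2 orders over 20
```

The same query could be written with a subquery in the FROM clause; the CTE form simply gives the intermediate result a name, which is exactly the readability benefit described above.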


With recursive SQL queries you can achieve things you might not have imagined possible, and at surprising speed. You can solve many types of business problems and even reduce some complex SQL or application logic to a single recursive SQL call to the database.

Some useful applications of a recursive CTE include finding gaps in data, building organization charts, and generating test data.

What I like most about recursive SQL queries is the ability to produce large amounts of test data in seconds or a few minutes, depending on the complexity of the data and the available database and system resources. Using a recursive CTE we can produce hundreds, thousands, or millions of records in a very short time, limited only by available database memory and other system resources. In my experience, a recursive query produces test data far faster than any other test-data procedure I have seen.
Also, I have seen recursive queries run in several minutes where other queries over huge amounts of data took days.


The word recursive says it all. You have a query that repeatedly calls itself, with a starting point and, crucially, an ending point (a fail-safe exit, as I call it). If you don't have a fail-safe exit, or your recursive formula steps past it, you are in deep trouble: the query will go into an infinite loop, causing very high CPU and log utilization, which in turn leads to memory and/or storage exhaustion. If your query goes haywire you must think fast and stop it. If you are unable to do so, alert your DBA immediately so he or she can keep the database from choking by killing the runaway thread.
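The fail-safe pattern is simply a counter column plus a WHERE clause in the recursive member. A minimal sketch, again using SQLite via Python only so it is runnable (the pattern is identical in MySQL 8.0, which additionally enforces a server-side cap via its cte_max_recursion_depth variable):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The "level" column is the counter; the WHERE clause in the recursive
# member is the fail-safe exit that guarantees termination.
rows = con.execute("""
    WITH RECURSIVE counter(level) AS (
        SELECT 1                        -- starting point (anchor member)
        UNION ALL
        SELECT level + 1 FROM counter   -- recursive call
        WHERE level < 10                -- fail-safe exit: recursion stops here
    )
    SELECT level FROM counter
""").fetchall()
print(len(rows))  # 10 rows, levels 1 through 10
```

Remove the WHERE clause and the query has no stop signal, which is precisely the infinite-loop scenario described above.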


Examples

Let's look at some simple examples that show how easy and effective it can be.


with RECURSIVE mydates (level, nextdate) as (
    select 1 level, FROM_UNIXTIME(RAND()*2147483647) nextdate from DUAL
    union all
    select level+1, FROM_UNIXTIME(RAND()*2147483647) nextdate
    from mydates
    where level < 100000
)
SELECT nextdate from mydates;


Here we produce 100k random dates. The level column is our counter and fail-safe exit for leaving the recursive query safely. Line 2 is our starting point, lines 4-5 are the recursive call, and the ending point is in the WHERE clause on line 6. The last line (line 8) executes the recursive query and retrieves the data.

The above query executed in 0.84 sec and returned 100k rows. Note that even if you put a LIMIT 10 on the query, as shown below, it will first produce the 100k rows and then limit the result to the first 10 rows, taking the same 0.84 sec.


with RECURSIVE mydates (level, nextdate) as (
    select 1 level, FROM_UNIXTIME(RAND()*2147483647) nextdate from DUAL
    union all
    select level+1, FROM_UNIXTIME(RAND()*2147483647) nextdate
    from mydates
    where level < 100000
)
SELECT nextdate from mydates LIMIT 10;


You can also create a table, and thereby physically store the data, by wrapping the CTE query like this.


create table somedates as (
    with RECURSIVE mynumbers (level, nextdate) as (
        select 1 level, FROM_UNIXTIME(RAND()*2147483647) nextdate from DUAL
        union all
        select level+1, FROM_UNIXTIME(RAND()*2147483647) nextdate
        from mynumbers
        where level < 100000
    )
    SELECT nextdate from mynumbers
);

This query ran on my laptop in 2.93 seconds storing 100k rows of dates in a table called somedates.
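The same materialize-into-a-table pattern can be sketched in runnable form. The example below uses SQLite through Python as a stand-in (SQLite's date() function replaces FROM_UNIXTIME/RAND, and the table and column names mirror the article's), but the CREATE TABLE ... AS wrapping works the same way in MySQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Materialize the CTE output into a physical table with CREATE TABLE ... AS.
# Sequential dates are used here instead of random ones for determinism.
con.execute("""
    CREATE TABLE somedates AS
    WITH RECURSIVE mynumbers(level, nextdate) AS (
        SELECT 1, date('2020-01-01')
        UNION ALL
        SELECT level + 1, date(nextdate, '+1 day')
        FROM mynumbers
        WHERE level < 1000
    )
    SELECT nextdate FROM mynumbers
""")
(count,) = con.execute("SELECT COUNT(*) FROM somedates").fetchone()
print(count)  # 1000 rows stored
```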


Another simple example is a "gaps in data" solution, which generates the last 1000 dates up to the current date and compares them against the orders table to find all dates that have no orders, showing the newest date first.

For this example I created 1 million rows with random order dates using recursive SQL in 32.95 seconds, created an index on (orderdate, orderid) in 9.12 seconds, deleted thousands of random rows to create gaps, and ran the query below.


with RECURSIVE mydates (level, nextdate) as (
    select 1 level, ADDDATE(CURRENT_DATE, INTERVAL -1000 DAY) nextdate from DUAL
    union all
    select level+1, ADDDATE(nextdate, INTERVAL 1 DAY) nextdate
    from mydates
    where level < 1000
)
SELECT md.nextdate
from mydates md
left join myorders od on md.nextdate = od.order_date
where od.order_date is null
order by md.nextdate desc;


This query ran on my test data, looking for gaps in the last 1000 days from the current date, and returned 68 dates in 0.05 seconds.
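The gap-finding logic above can be verified end to end on a tiny, made-up data set. The sketch below uses SQLite via Python (its date() function stands in for MySQL's ADDDATE) with a four-day window and one deliberately missing order date:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myorders (order_date TEXT)")
# Hypothetical orders: January 3rd is deliberately missing.
con.executemany("INSERT INTO myorders VALUES (?)",
                [("2024-01-01",), ("2024-01-02",), ("2024-01-04",)])

# Generate a contiguous calendar with a recursive CTE, LEFT JOIN it to the
# orders, and keep only the calendar dates with no matching order.
gaps = con.execute("""
    WITH RECURSIVE mydates(level, nextdate) AS (
        SELECT 1, date('2024-01-01')
        UNION ALL
        SELECT level + 1, date(nextdate, '+1 day')
        FROM mydates
        WHERE level < 4
    )
    SELECT md.nextdate
    FROM mydates md
    LEFT JOIN myorders od ON md.nextdate = od.order_date
    WHERE od.order_date IS NULL
    ORDER BY md.nextdate DESC
""").fetchall()
print(gaps)  # [('2024-01-03',)]
```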


Conclusion

Here I have talked briefly about recursive queries and shown how useful and powerful they can be for solving problems in a short, effective way. However, when writing a recursive query, be VERY careful NOT to forget the fail-safe exit, and double-check that the calculation used in the fail-safe exit actually produces a stop signal.

0
Independent Software Vendors: We Want Your Opinion
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

This month, Experts Exchange sat down with resident SQL expert, Jim Horn, for an in-depth look into the makings of a successful career in SQL.
7
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
2
 

Expert Comment

by:Hari Shankar
Comment Utility
Great Article. Thank u Vitor
1
An alternative to the "For XML" way of pivoting and concatenating result sets into strings, and an easy introduction to "common table expressions" (CTEs). Being someone who is always looking for alternatives to "work your data", I came across this solution, while refreshing my recursive CTE skills.
0
It is possible to export the data of a SQL Table in SSMS and generate INSERT statements. It's neatly tucked away in the generate scripts option of a database.
0
 

Expert Comment

by:Gregorio Méndez
Comment Utility
Mr. Strauss, what a good tip, thanks a lot for your contribution
1
In part one, we reviewed the prerequisites required for installing SQL Server vNext. In this part we will explore how to install Microsoft's SQL Server on Ubuntu 16.04.
2
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
0
A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return different results based on different supplied parameters.
0
I'm trying, I really am. But I've seen so many wrong approaches involving date(time) boundaries I despair about my inability to explain it. I've seen quite a few recently that define a non-leap year as 364 days, or 366 days and the list goes on.

Put bluntly: Do NOT use "between ... and" (see "Beware of Between" too)

Instead: prescribe what you what using combinations of Greater Than, Less Than and Equals

ABSOLUTELY do NOT: try to fiddle with any upper boundaries, just use Less Than - it's that simple. Do NOT subtract a minute, or a second, or a millisecond. Do NOT subtract anything from the boundary, it isn't needed. Use " less than the next time unit" (next day, next month, next year).

Right, now I've got into all sorts of twists here, especially for sql server related folks, because they have a such myriad of favourite convert, datediff, dateadd combinations and approaches to this.

Stop. Why the complexity? First principle, don't apply functions to data unless it's absolutely needed. Well, for the purposes of evaluating a date(time) field: convert isn't needed, dateadd isn't needed, datediff isn't needed. DON'T do this ON THE DATA just to decide if the data is less than some other date(time) measure.

OK, some of this off my chest.

For the curious, here are some variants of deciding if some datetime values is in June or July. Not rocket science, but…
1
 
LVL 49

Author Comment

by:PortletPaul
Comment Utility
this was a blog, when we had those things, I made it into a draft article so I didn't lose it completely. I like the blog capability btw, still think it was a silly idea to remove it. It was drafted then and I really have no time for articles at the moment.

I'm comfortable the way it is. If others aren't I'm also OK with just making into a draft I can refer to at some waayyyyy distant time when I feel EE articles are something I will revisit.

Congo reference is due to some unwelcome changes in the profile, I may change it from time to time. Changed.
0
Tableau
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
1
 
LVL 16

Author Comment

by:Brian Pringle
Comment Utility
Tableau does not have a good set of business days.  It will calculate today and yesterday or a range of dates relative to today, but calculating based on another anchor date is not intuitive.  There are numerous questions on the Tableau Web site asking for this.  The reason that we are not using an actual table to store the dates is that we may not always have business on the weekend and other times might.  The intention of this procedure is to let the system figure out when the last business date was from the data rather than a set calendar.

The data set has a large amount of data that gets imported during each business day.  I can query the database to get the distinct list of dates and then calculate from there.  The LOD functions in Tableau are the closest thing to getting that information.
0
 
LVL 66

Expert Comment

by:Jim Horn
Comment Utility
Very well written.  Voting Yes.
0
Industry Leaders: We Want Your Opinion!
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
1
 

Expert Comment

by:alsatham hussain
Comment Utility
i can easily understand.

Thanks
0
By Mark Wills

Formatting dates and times has aways presented a few challenges in T-SQL. Seems most other languages support a FORMAT command.

Well, time to rejoice because (finally) MS SQL 2012 (and more recent) now supports the FORMAT function. One little "gotcha" is you need .Net Framework because it leverages CLR. Which also means that it cannot be remoted (which is nothing new for CLR runtime).

So, with that out of the way, and for a large majority of T-SQL requirements, it is like any other system function. Probably easiest to show you :)

First, some dummy data. Create an order table with date and price columns, then populate :-
CREATE TABLE EE_tbl_Orders (CUSTOMER varchar(20),ORDER_DATE datetime, ORDER_PRICE decimal(10,4));


INSERT INTO EE_tbl_Orders (CUSTOMER, ORDER_DATE, ORDER_PRICE) VALUES ('Fred','2015-08-20 20:30:40',1234.5678);

Open in new window


The structre of the command is FORMAT ( value, format_str [], culture ] ) and can be found at MSDN Now, [culture] is optional and I tend to leave it out so it can default to the way my server / systems are set up.

So, with our Order Table, let us experiment with our new command...

SELECT Customer, FORMAT(Order_Date,'yyyy MMM dd'), Order_Price
FROM EE_tbl_Orders


-- Results

-- Fred      2015 Aug 20      1234.5678

Open in new window


Now, we can also do something with that price as well (and note the rounding)...

SELECT Customer, FORMAT(Order_Date,'yyyy MMM dd'), FORMAT(Order_Price,'C')
FROM EE_tbl_Orders
 
-- Results
-- Fred      2015 Aug 20      $1,234.57

Open in new window

8
 
LVL 66

Expert Comment

by:Jim Horn
Comment Utility
Darn handy, especially for those that are stuck supporting 2008R2 databases.  Voted Yes.
0
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within REPLACE functions). Over time this can be quite unreadable. It can also become a maintenance chore where this is not a single one-off effort.

This happened in a recent question here on EE. I'd like to share the solution that came out of that thread. I have written a function called udf_CleanDataFn. It can be invoke within any query as follows:
 
SELECT dbo.udf_CleanDataFn(YourColumn, @Options)

@Options is a varchar(255) field, which can accept any of the following "options":
 
  • ltrim - This will left-trim the value in YourColumn.
  • rtrim - This will right-trim the value in YourColumn.
  • trim - This will both left- and right-trim the value.
  • nocomma - This will remove any and all comma's.
  • nospace - This will remove any and all spaces.
  • nopunc - This will remove all standard punctuations (!@#$% etc)
  • nonum - This will remove all numbers (0-9).
  • noalpha - This will remove all alpha characters (A-Z and a-z).
  • alphaonly - This will remove everything except alpha characters.
  • numonly - This will remove everything except numbers.

Following are a few working examples that will demonstrate the different results . This first example takes out only the commas:
 
SELECT dbo.udf_CleanDataFn('This !  is *  a  13,21  @  test', 'nocomma')
Result:
1
 

Expert Comment

by:Nick Sawe
Comment Utility
ok thanks, i will give it a go. thanks for sharing this by the way !
0
 
LVL 20

Author Comment

by:dsacker
Comment Utility
You're welcome. If it proves helpful, please click the "Helpful" button. :)
0
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:
 

The problem is that RFC822 date formats do not always fit within SQL Server's valid date formats. The CONVERT/CAST functions within SQL Server will only convert a limited variety of date formats. When trying to convert RFC822 date formats, some of the RFC822 accepted formats will result in the well-known T-SQL error:
 
Msg 241, Level 16, State 1, Line 1 
Conversion failed when converting date and/or time from character string.

According to the RFC822: Standard for ARPA Internet Text Messages, as long as a date adheres to the constraints in section 5 of the RFC822 protocols, it is a valid RFC822 date. Many RFC822 Dates fail to convert into a T-SQL datetime format.

For instance, the following is a valid RFC822 date:
 
Wed, 27 Sep 2006 21:49:19 GMT

This format (and a number of other RFC822 formats) will not convert to a datetime type. In addition SQL Server does not inherently recognize nor convert time zone codes. (See footnote 1 at the end of this article for a list of valid RFC822 date formats.)

I have written a function called …
1
 
LVL 20

Author Comment

by:dsacker
Comment Utility
Appreciate that. And thanks to Eric. He gave me some good pointers along the way.
0
 
LVL 66

Expert Comment

by:Jim Horn
Comment Utility
I ran the code blocks and the result was excellent.  Nice job.
0
PL/SQL can be a very powerful tool for working directly with database tables. Being able to loop will allow you to perform more complex operations, but can be a little tricky to write correctly. This article will provide examples of basic loops along with their Java loop equivalents to make creating useful PL/SQL loops much easier.
 

While

Under many circumstances a while(true) loop isn't the most useful tool but a basic while loop in PL/SQL can be very useful for performing repeating actions such as multiple simple insert statements.

Java


while (true)
{
   // do something
   if (exit_condition)
      break;
}

Open in new window


PL/SQL


DECLARE
BEGIN
   LOOP
      // do something
      IF exit_condition THEN
         EXIT;
      END IF;
   END LOOP;
END;
/

Open in new window



A more common while loop with a given breaking condition can be a cleaner way to perform similar tasks.

Java


while (condition)
{
   // do something
}

Open in new window


PL/SQL


DECLARE
BEGIN
   WHILE condition LOOP
      // do something
   END LOOP;
END;
/

Open in new window



For

A basic for loop can more clearly perform a distinct number of iterations of your statements.  I've often used this to test code-enforced thresholds by inserting the maximum number of rows into a table to ensure my code reacts correctly.

Java


for (int i=0; i>length; i++)
{
   // do something
}

Open in new window


PL/SQL


DECLARE
BEGIN
   FOR i IN 1..length LOOP
      // do something
   END LOOP;
END;
/

Open in new window


Java


for (int i=length; i<0; i--)
{
   // do something
}

Open in new window


PL/SQL


DECLARE
BEGIN
   FOR i IN REVERSE 1..length LOOP
      // do something
   END LOOP;
END;
/

Open in new window



Possibly the most useful loop is a for loop performed on a list of objects. This will allow you to do things like performing similar insert statements using different source data, allowing you to have more dynamic statements.

Java


for (object o : objects)
{
   // do something
}

Open in new window


PL/SQL

int objects:


Open in new window

2
Confronted with some SQL you don't know can be a daunting task.
It can be even more daunting if that SQL carries some of the old secret codes used in the Ye Olde query syntax, such as:
(+)     as used in Oracle;     *=     =*    as used in Sybase & SQL Server.

This bright new age uses ANSI standard syntax for joins, such as INNER JOIN or LEFT OUTER JOIN.
So how does one get rid of those awful old secret codes and use the far better, more standard, ANSI join syntax?

The following is my attempt to explain step by step how I do it. Please note that while this article uses an Oracle example, the process applies to Ye Olde SQL regardless of vendor.

We will start with the following SQL. This will be new and unfamiliar to you; Well it was new and unfamiliar to me also (as I didn't invent it) but for this article that is perfect. You do not need to understand the tables/fields for this exercise, Also, we care only about the FROM and WHERE clauses here.
select *
from proj.busrepts busr, proj.pbase pb, proj.WQuota wq, finance.pedproj pedp, finance.qiiview qii 
where busr.projectid = pedp.projectid(+) and 
busr.program = pedp.pedprogram  and 
busr.patnum   =  pb.fileno(+) and 
busr.curtype like '%Self%' and 
busr.wq_statusid = wq.statusid(+) and 
busr.DOS >= TO_DATE('01-JUL-2010','DD-MON-YYYY') and 
busr.balance >5  and 
(qii.address not in ('1 Pudding Lane', 'Diagon Alley') and qii.address not like '%Home%' 
and qii.address not like '%HOME%' and qii.address not like 

Open in new window

2
 
LVL 34

Expert Comment

by:Mike Eghtebas
Comment Utility
As time goes on, the information discussed here will be used more often because the SQL syntaxes has to be updated. Thank you for this article. I will come back to this article over and over again because this is not for one sitting.

Mike
0
If you find yourself in this situation “I have used SELECT DISTINCT but I’m getting duplicates” then I'm sorry to say you are using the wrong SQL technique as it only does one thing which is: produces whole rows that are unique. If the results you are getting still have “duplicates” then you are expecting something else that SELECT DISTINCT does not do. In truth, “select distinct” is a rather dull animal with minimal decision making powers.
 
The most common solution to this “I still got duplicates” is to use GROUP BY instead, but please read on. (or, cheat and click this.)
The first thing to recognize about SELECT DISTINCT is that overall it’s a bit boring. It’s isn’t a function so if you are using a query like this:
 
SELECT DISTINCT (column1), column2, ... column40
You will be disappointed to learn that those parentheses around column1 do NOT work to “make column1 unique, then by magic decide how to simplify the other columns”.  

And: With or without the parentheses, SELECT DISTINCT does not know that you really want it to concentrate on column1 or (column1 and column2) ignoring the other columns.

SELECT DISTINCT is a “row operator”, not a function, and not magic. It will evaluate ALL of the columns that you have listed and remove rows that have exactly the same values in all …
5
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
thanks for the link/reference to my article :)
0
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common with most definitions is that BETWEEN equates to:

[expression] >= [low boundary] and [expression] <= [high boundary]

Note there are 2 equal signs! This means the [low boundary] and [high boundary] are BOTH included. Also note there is a definite order, [low boundary] must be given first or there will not be any matches at all.

This 'inclusive' definition may not agree with your generally held interpretation of "between". For example if I drive my car between 2 trees I don't want to include the trees (else I have tree damage and/or car damage). But if I catch a jet between New York and London, both cities are included (to catch the jet and to land the jet). So a "car between trees" is different to a "jet between cities"; in SQL 'between' is used in the 'inclusive' sense of "jet between cities".

Below we examine BETWEEN in more depth using a set of queries, and of course we need some data for this:sample data, 10 rows, "raw_data" = 10.1, Category  is 1 char (A to Z) and some datesThis first query is deliberately 'odd', it asks for category between ‘A’ and ‘A’ and this may not seem logical (if you hang on to a "car between trees" viewpoint). Let’s test it:
select
id, category, raw_data
from sample
where 
   category BETWEEN 'A' AND 

Open in new window

22
 
LVL 52

Expert Comment

by:Vitor Montalvão
Comment Utility
Excellent article. Gave my vote of course.
Cheers
0
 
LVL 26

Expert Comment

by:Nick67
Comment Utility
There's also the fun and hellacious problem of how dates are displayed in Access.
You can make a field up as DateTime and format it as "dd-mmm-yyyy"
It all looks good.  But it is stored as a Double with a Long integer value based on the number of days elapsed since 30-Dec-1899 and the decimal as a fraction of the day as the time.
Now, add some values to it using Date() and others using Now() and watch the fun ensue as you use BETWEEN...AND and get some records in at the  'boundary' and others out.
Those entered via Date() will be dd-mmm-yyyy 12:00:00 AM and will be in.
Those enter via Now() will be dd-mmm-yyyy and some time component and be out.

That trips up a lot of folks!
0

Introduction

Hopefully the following mnemonic and, ultimately, the acronym it represents is common place to all those reading: Please Excuse My Dear Aunt Sally (PEMDAS). Briefly, though, PEMDAS is used to signify the order of operations being Parenthesis (or brackets), Exponents (or roots which can be expressed as a fractional exponent 1/n), Multiplication/Division and Addition/Subtraction.

If you have ever doubted how Math can be used in everyday life, then hopefully this is a good example. I think most forget when dealing with technology or challenges in general, it is all about approaching the situation with a definitive list of known truths and methodologies how to solve issues. You don't have to have the answers going in, just keep it simple and apply what you know like Math rules. SQL syntax is no different and so we will explore in this article how PEMDAS can aid programmers in avoiding some pitfalls or ensuring proper results through being very explicit about the order in which code is evaluated.

If you do this successfully, it should make even complex formulas in your SQL syntax easy to produce.

So let's get started...

During the editing of an article some months back, the topic of order of operations came up for discussion and there the data looked as similar to the below:

Open in new window

6
 
LVL 60

Author Comment

by:Kevin Cross
Comment Utility
Yes, true.  I guess I am trained to think Exponents instead of Orders | Indices as well as my use of parenthesis, but BODMAS or BIDMAS works just fine! *smile*

Thanks for reading!
0
 
LVL 18

Expert Comment

by:Simon
Comment Utility
Thanks Kevin. Good article. I think that when I went to school they weren't much into mnemonics. I like both PEMDAS and BODMAS, but the precedence is just second nature. I very rarely get it wrong, but I don't consciously think about it.
0
Concerto's Cloud Advisory Services
LVL 5
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

As they say in love and is true in SQL: you can sum some Data some of the time, but you can't always aggregate all Data all the time!

1. Introduction

By the end of this Article it is my intention to bring the meaning and value of the above quote to those who chose to read this whether as a beginning SQL programmer or an intermediate one inexperienced with this little tidbit of SQL syntax: conditional aggregation.

Conditional aggregation is simply the use of aggregates under one or more conditions, thus, potentially altering your results of functions like SUM(), COUNT(), etcetera in a particular column while allowing you to analyze your record set as a whole.

In other words, consider the following data:
----------------+-------+------------
expert          |isMale |primaryZone
----------------+-------+------------
acperkins       |   1   | MSSQL2K5
aneeshattingal  |   1   | MSSQL2K5
angeliii        |   1   | MSSQLSVR
Helen_Feddema   |   0   | MSACCESS
matthewspatrick |   1   | SQLSYNTX
mbizup          |   0   | MSACCESS
mwvisa1         |   1   | SQLSYNTX
ScottPletcher   |   1   | MSSQLSVR
sdstuber        |   1   | ORACLEDB
slightwv        |   1   | ORACLEDB
----------------+-------+------------

Open in new window


««setup»»
To reproduce the above data, you can execute SQL similar to the below T-SQL example created for MS SQL Server.
(table structure -- create statement)
create table SQLExperts(
   expert varchar(50) primary key, 
   isMale bit, 
   primaryZone char(8)
);

Open in new window


(sample data -- insert statement)
insert into SQLExperts(expert, isMale, primaryZone)
select 'mbizup', 0, 'MSACCESS' union 
select 'Helen_Feddema', 0, 'MSACCESS' union
select 'matthewspatrick', 1, 'SQLSYNTX' union
select 'mwvisa1', 1, 'SQLSYNTX' union
select 'angeliii', 1, 'MSSQLSVR' union
select 'ScottPletcher', 1, 'MSSQLSVR' union
select 'acperkins', 1, 'MSSQL2K5' union
select 'aneeshattingal', 1, 'MSSQL2K5' union
select 'sdstuber', 1, 'ORACLEDB' union
select 'slightwv', 1, 'ORACLEDB'
;

Open in new window


To start, if we wanted to know how many Experts are on the list we would simply use:
select count(*) as cnt from SQLExperts;

Open in new window


Now, what if we wanted to know how many female Experts are on the list?
Then we would add a WHERE conditional clause to the query:
select count(*) as cnt 
from SQLExperts
where isMale = 0;

Open in new window

This works, but what is the percent of female Experts to the total?

2. Conditional Aggregates: Basics

Without conditional aggregates, to answer the previous question you would need a secondary query to get the total to do the percentage calculation.

This code look familiar?

Open in new window

21
 
LVL 61

Expert Comment

by:mbizup
Comment Utility
Kevin,

Great article!  

A few suggestions:

1. In general, it is a good practice to avoid embedding VBA functions in your Access queries, and it usually can be done.  They slow things down (Domain Aggregate functions are the most notorious for this).

I haven't actually benchmarked  this query using iif (which I don't think would have nearly the same performance hit as a domain aggregate function):

select sum(iif(isMale=0, 1, 0)) * 100.0 / count(*)
from SQLExperts;

Open in new window



But it can be written equivalently without embedded VBA as:

SELECT Sum(-1 * Not IsMale)*100/Count(*) AS [Female Experts]
FROM SQLExperts;

Open in new window



2. "NULLIF function if available on your SQL platform to generate nulls ..."
Do you want to include an alternative for other platforms as a sidenote?

select sum(-1 * (primaryZone <> 'MSACCESS')) as cnt
from SQLExperts;

Open in new window



3.  Under the Pivot section, this is an option for systems (such as Access) that don't support CASE:
select PrimaryZone
     , Sum(-1 * Not [IsMale]) as Female
     , Sum(-1 * [IsMale]) as Male
from SQLExperts
group by primaryZone;
 

Open in new window

0
 
LVL 60

Author Comment

by:Kevin Cross
Comment Utility
Thanks, Miriam, that is great feedback!
0
In database programming, custom sort order seems to be necessary quite often, at least in my experience and time here at EE.

Within the realm of custom sorting is the sorting of numbers and text independently (i.e., treating the numbers as numbers instead of character strings, which causes '9' to sort after '10').  I won't get into the debate here on whether this is an appropriate way to store data, but will presume we can't change it and so must deal with how to sort correctly.
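The problem is easy to reproduce. A quick sketch using SQLite from Python, with an invented single-column table (the article's MySQL UDF approach handles mixed numeric/text data more robustly than this bare cast):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (val TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("9",), ("10",), ("2",)])

# Plain text ordering compares character by character: '10' < '2' < '9'.
as_text = [r[0] for r in con.execute("SELECT val FROM t ORDER BY val")]

# Casting in the ORDER BY treats the strings as numbers.
as_num = [r[0] for r in
          con.execute("SELECT val FROM t ORDER BY CAST(val AS INTEGER)")]

print(as_text)  # ['10', '2', '9']
print(as_num)   # ['2', '9', '10']
```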

With that in mind, the following article was written to show how important an IsNumeric() function can be in solving this problem, and specifically how to do this in MySQL Server.

Since this can be useful to MySQL users of all experience levels, it goes through details for beginners; therefore, for intermediate/advanced users who simply want to cut to the solution, I have organized the article into the following sections to make it easy to jump around:

The Data.
What's wrong with the natural sort order?
Custom Sorting in MS SQL Server: IsNumeric() Primer.
MySQL IsNumeric() Workaround.
MySQL IsNumeric() User Defined Function (UDF).
Custom Sorting in MySQL: Using IsNumeric() UDF.

Before we get started.
Please note that, although experience can be varied, all will need to have MySQL Server, a tool such as MySQL Query Browser, and a basic knowledge of how to execute SQL statements as pre-requisites.

Download MySQL Community Server and GUI Tools

Expert Comment

by:dbbishop
Something you need to be aware of, in SQL Server, the value '1234E32' will pass the ISNUMERIC() check (returns 1), but will fail the CAST() if casting to any data type other than FLOAT. Any string of numbers with either a single 'E' or 'D' in the string will be interpreted as scientific notation.
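The same trap exists outside SQL Server: most numeric parsers accept a lone 'E' as scientific notation. A quick Python illustration of why such a string looks numeric to a float parser yet fails an integer conversion:

```python
# '1234E32' reads as scientific notation to a float parser,
# mirroring SQL Server's ISNUMERIC() returning 1 for it.
s = "1234E32"
print(float(s))        # parses fine: 1.234e+35
try:
    int(s)             # but an integer conversion rejects it
except ValueError as e:
    print("int() failed:", e)
```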

Expert Comment

by:skullnobrains
Likewise in MySQL:

select 3e2 REGEXP '^300$';
+--------------------+
| 3e2 REGEXP '^300$' |
+--------------------+
|                  1 |
+--------------------+
1 row in set (0.00 sec)

... but if you quote the '3e2', it won't happen.
by Mark Wills

PIVOT is a great facility and solves many an EAV (Entity - Attribute - Value) type transformation where we need the information held as data within a column to become columns in their own right. Now, in some cases that is relatively easy to do, other times it can be a challenge (and that is why we are here).

Let's have a quick look at the PIVOT function...

SELECT
          <display_column_list>

FROM

          (SELECT <source_columns> as Column_Source
                         ,<column_to_be_aggregated> as Column_Value
                         ,<column_with_new_column_names> as Column_List
           FROM <datasource> ) as DataSource

PIVOT
          (<aggregate_function>(Column_Value)  FOR  Column_List  IN
          ([<new_column_1_heading>],[<new_column_2_heading>],...,[<new_column_N_heading>]) ) PivotTable

ORDER BY <column_number_or_name>;

That looks pretty straightforward, except for one or two small details:

1) First up, we need to know the <display_column_list>
    easy enough: just do a Select * instead and problem solved (except that the explicit list is what controls display sequence)

2) Secondly, we need to know and hard code the new column headings as in :
     ([<new_column_1_heading>],[<new_column_2_heading>],...,[<new_column_N_heading>])

And that last point is often the big challenge. Not so bad if we are doing something static like "months in a year", just list out those months in sequence, and make sure you can cast the…
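For engines without PIVOT (MySQL, SQLite, and others), the template above reduces to hard-coded conditional aggregates, which makes the "know the column headings up front" point very visible. A sketch with an invented EAV-style sales table, run through SQLite from Python:

```python
import sqlite3

# Hypothetical rows-to-columns example; SQLite has no PIVOT keyword,
# so each new column is a SUM over a CASE expression.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("North", "Jan", 10), ("North", "Feb", 20),
                 ("South", "Jan", 5),  ("South", "Feb", 15)])

rows = con.execute("""
    SELECT region,
           SUM(CASE WHEN month = 'Jan' THEN amount ELSE 0 END) AS Jan,
           SUM(CASE WHEN month = 'Feb' THEN amount ELSE 0 END) AS Feb
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('North', 10, 20), ('South', 5, 15)]
```

The hard-coded 'Jan'/'Feb' cases are exactly the hard-coded column headings the article is talking about; making them dynamic requires building the SQL string at runtime.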

Author Comment

by:Mark Wills
Yeah, should be...

Would add in another definition for the totals (similar to @columns).

But then we can no longer simply cheat with "select * from"; we would have to add in that new computed column, and probably a parameter to decide when/if to use it.

If you are feeling adventurous, there is a slightly more involved Article that should be able to do that for you : http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/A_4256-Dynamic-Pivot-Procedure-without-the-Pivot-function.html

Cheers,
Mark

Expert Comment

by:Wil999
Thanks for the quick reply Mark, I will look into the article!
