Query comparing two tables with different format needed.

bfuchs
bfuchs used Ask the Experts™
on
Hi Experts,

I need to compare two files containing same data with different format.

They have following columns

Patients Name
Caregiver name
Visit Date
Visit time from
Visit Time to

would need to get a list of all unmatched records.
its only considered a match if all 5 fields are the same.

Attached sample of those files, where record in both files is considered a match.

Here are the differences, one of the files has the following
1- first and last name concatenated into one field.
2- has data in parenthesis after the names which should be ignored.
3- format of time is 1459-2304
Visit-Report.xls
V_TestHHASchedule.xls
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
Is this data stored in Access or in SQL Server?
i.e. which database syntax do you want for the comparison?

what date/time is "1459-2304"  in yyyy-mm-dd hh:mm format ? (i.e. how is this compared to a normal date/time?)
hi,
currently data is coming from 3rd party in Excel format, can import anywhere...
would prefer Access if possible...
1459-2304 is HHMM-HHMM
Thanks,
Ben
EE Topic Advisor
Most Valuable Expert 2014
Awarded 2013
Commented:
Visit-Report.xls
+-------+--------------+------------+--------------------------------------+----------------------------------+-----------------+-----------+--------------+-----------+----------+--------------+---------------+-------------+---------+------------+--------+------------+-------------+----+----------+---------------+---------+
| RowID | AltPatientID | MedicaidNo |             PatientName              |             AideName             | CoordinatorName | VisitDate | ScheduleTime | VisitTime | Duration | ContractName | ServiceCode_1 | PayrateCode | PayRate | Discipline | Billed | InvoiceNum | InvoiceDate | TS | TotalHrs | TotalDuration | Field22 |
+-------+--------------+------------+--------------------------------------+----------------------------------+-----------------+-----------+--------------+-----------+----------+--------------+---------------+-------------+---------+------------+--------+------------+-------------+----+----------+---------------+---------+
|     1 |              |            | test last p  test first p (WW-21345) | test c last test c first (11710) |                 | 19-Nov-18 |              | 1459-2304 | 08:05    |              |               |             |         |            |        |            |             |    |          |               |         |
+-------+--------------+------------+--------------------------------------+----------------------------------+-----------------+-----------+--------------+-----------+----------+--------------+---------------+-------------+---------+------------+--------+------------+-------------+----+----------+---------------+---------+

V_TestHHASchedule.xls
+---------------------+--------------------+-------------------+----------------------+---------------------+-------------------+------------------------------+--------------------+----------------------------+--------------------------+-------------------------+-------------------------+--------------------+----------------------+--------------------+----------------------+-----------------+
| Schedule_ScheduleID | Patients_FirstName | Patients_LastName | Caregivers_FirstName | Caregivers_LastName | Caregivers_Gender | Caregivers_TimeAttendancePIN | Schedule_VisitDate | Schedule_ScheduleStartTime | Schedule_ScheduleEndTime | Schedule_VisitStartTime |  Schedule_VisitEndTime  | Schedule_IsDeleted | Patients_AdmissionID | Schedule_PatientID | Schedule_CaregiverID | Patients_TeamID |
+---------------------+--------------------+-------------------+----------------------+---------------------+-------------------+------------------------------+--------------------+----------------------------+--------------------------+-------------------------+-------------------------+--------------------+----------------------+--------------------+----------------------+-----------------+
|                     | test first p       | test last p       | test c first         | test c last         |                   |                              | 2018-11-19 00:00   |                            |                          | 11/19/2018  14:59:00 PM | 11/19/2018  23:04:00 PM |                    |                      |                    |                      |                 |
+---------------------+--------------------+-------------------+----------------------+---------------------+-------------------+------------------------------+--------------------+----------------------------+--------------------------+-------------------------+-------------------------+--------------------+----------------------+--------------------+----------------------+-----------------+

Open in new window

I believe it will be easier to concatenate the already separated columns rather than attempting the reverse.
I also suggest a similar approach to the time (compare using the hhmm-hhmm format).
Along these lines in SQL Server 2012 or later.
SELECT
    Patients_LastName + ' ' + Patients_FirstName     AS PatientName
  , Caregivers_LastName + ' ' + Caregivers_FirstName AS AideName
  , format( Schedule_VisitDate, 'dd-MMM-yy' )        AS VisitDate
  , format (Schedule_VisitStartTime, 'hhmm-' + FORMAT (Schedule_VisitEndTime, 'hhmm') AS VisitTime
FROM V_TestHHASchedule

EXCEPT

SELECT
    RTRIM( LEFT( PatientName, CHARINDEX( ' (', PatientName + ' (' ) ) )
  , AideName
  , VisitDate
  , VisitTime
FROM Visit-Report

Open in new window

@PortletPaul,
Wonder why I did not get an email notifying your post, neither did it showed up under notifications, perhaps something is malfunctioning in EE.
Along these lines in SQL Server 2012 or later.
So you mean its not compatible with earlier versions of SQL?
Currently I use SQL 2008, what entails to convert it to that version?
Thanks,
Ben
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
the format() function was introduced in SQL2012 (which is 6 years ago already)

Instead of FORMAT() you have to use CONVERT() instead which isn't that simple to do I'm afraid, but its because your db version is old.
see: https://www.experts-exchange.com/articles/12315/SQL-Server-Date-Styles-formats-using-CONVERT.html?headerLink=workspace_article
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
For the string columns, I would use CHARINDEX() in SQL or INSTR() in Access and check if those columns (from V_TestHHASchedule) EXISTS in the VISITS-REPORT WHERE charindex(Patients_Firstname,Patientname) > 0 and charindex(Patients_Lastname,Patientname) > 0  (etc).

Access has INSTR but (more logically) the search string is first and the item being searched for is second ie INSTR(patientname,patients_firstname) > 0 and INSTR(patientname,patients_lastname) > 0  (etc)

Same again for caregiver (checking if it is contained within Aidename) and any other string.

Now for VisitDate, I would use datediff() e.g. in SQL Server : datediff(d,Schedule_VisitDate,visitdate) = 0
In Access : datediff("d",Schedule_VisitDate,visitdate) = 0

So that leaves us with the time....

And going to have to convert to the VisitTime format

In Access :   Format(Schedule_VisitStartTime,"hhnn")+"-"+Format(Schedule_VisitEndTime,"hhnn")  = VisitTime

and while we can use Format in SQL Server - it only happened from SQL 2012 and is basically the same as Access (use single quotes)

In SQL2008 : replace(convert(varchar(5),Schedule_VisitStartTime,108)+'-'+convert(varchar(5),Schedule_VisitEndTime,108),':','')

So, the basic approach in either SQL Server or Access will be :

Select *
from  V_TestHHASchedule V
WHERE EXISTS (SELECT NULL FROM VISITS-REPORT R WHERE <the above conditions hold true>)

and that will show rows from  V_TestHHASchedule that are also in Visits_Report

to show non-matching, then use : NOT EXISTS

Does that make sense ?
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018
Commented:
So, SQL Server code :
SELECT V.*
FROM V_TestHHASchedule AS V
WHERE Exists (SELECT NULL FROM [Visit_Report] R 
                       WHERE  charindex(V.patients_firstname,R.patientname) > 0 
                       and charindex(V.patients_lastname,R.patientname) > 0
                       and charindex(V.CareGivers_lastname,R.AideName) > 0
                       and charindex(V.CareGivers_firstname,R.AideName) > 0
                       and datediff(d,V.Schedule_VisitDate,R.visitdate) = 0
                       and replace(convert(varchar(5),V.Schedule_VisitStartTime,108)+'-'+convert(varchar(5),V.Schedule_VisitEndTime,108),':','') = R.VisitTime

);

Open in new window

And the Access query would be  :
SELECT V.*
FROM V_TestHHASchedule AS V
WHERE Exists (SELECT NULL FROM [VISIT REPORT] R 
                       WHERE  INSTR(R.patientname,V.patients_firstname) > 0 
                       and INSTR(R.patientname,V.patients_lastname) > 0
                       and INSTR(R.AideName,V.CareGivers_lastname) > 0
                       and INSTR(R.AideName,V.CareGivers_firstname) > 0
                       and datediff("d",V.Schedule_VisitDate,R.visitdate) = 0
                       and  Format(V.Schedule_VisitStartTime,"hhnn")+"-"+Format(V.Schedule_VisitEndTime,"hhnn")  = R.VisitTime

);

Open in new window

@Mark,
Just testing your SQL version, and no records comes up.
when changing it to not exists then all records come up.
also we must use the not exists as I'm looking for the unmatched ones...
Thanks,
Ben
@Mark,

The time conversion is definitely a not functioning properly.

see below

SELECT V.Schedule_VisitStartTime,V.Schedule_VisitEndTime,replace(convert(varchar(5),V.Schedule_VisitStartTime,108)+'-'+convert(varchar(5),V.Schedule_VisitEndTime,108),':','')
FROM V_TestHHASchedule AS V

Open in new window

11/10/2018 7:00:00 AM      11/10/2018 7:00:00 PM      11/10-11/10
11/17/2018 7:15:00 AM      11/17/2018 7:00:00 PM      11/17-11/17
11/2/2018 7:00:00 AM      11/2/2018 9:00:00 AM      11/2/-11/2/
11/2/2018 2:00:00 PM      11/2/2018 7:00:00 PM      11/2/-11/2/
11/9/2018 7:15:00 AM      11/9/2018 9:00:00 AM      11/9/-11/9/
11/9/2018 1:45:00 PM      11/9/2018 7:15:00 PM      11/9/-11/9/
11/9/2018 9:00:00 AM      11/9/2018 1:45:00 PM      11/9/-11/9/
11/3/2018 7:00:00 AM      11/3/2018 7:00:00 PM      11/3/-11/3/
11/8/2018 7:00:00 AM      11/8/2018 2:00:00 PM      11/8/-11/8/
11/8/2018 2:00:00 PM      11/8/2018 7:00:00 PM      11/8/-11/8/
11/2/2018 9:00:00 AM      11/2/2018 2:00:00 PM      11/2/-11/2/
11/16/2018 7:15:00 AM      11/16/2018 9:00:00 AM      11/16-11/16

Thanks,
Ben
Thanks experts,
I ended up using a combination of both.

SELECT V.*
FROM V_TestHHASchedule AS V
WHERE  not Exists (SELECT NULL FROM [VisitReport] R
                       --WHERE  charindex(V.patients_firstname,R.patientname) > 0
                       where RTRIM( LEFT( r.PatientName, CHARINDEX( ' (', r.PatientName + ' (' ) ) ) = RTRIM( LEFT( v.patients_lastname, CHARINDEX( ' (', v.patients_lastname + ' (' ) ) ) + ' ' + RTRIM( LEFT( v.patients_firstname, CHARINDEX( ' (', v.patients_firstname + ' (' ) ) )
                       and RTRIM( LEFT( r.aideName, CHARINDEX( ' (', r.aideName + ' (' ) ) ) = RTRIM( LEFT( v.caregivers_lastname, CHARINDEX( ' (', v.caregivers_lastname + ' (' ) ) ) + ' ' + RTRIM( LEFT( v.caregivers_firstname, CHARINDEX( ' (', v.caregivers_firstname + ' (' ) ) )

                       and datediff(d,V.Schedule_VisitDate,R.visitdate) = 0
                     
                       and replace(convert(varchar(5),V.Schedule_VisitStartTime,108)+'-'+convert(varchar(5),V.Schedule_VisitEndTime,108),':','') = R.scheduled

Thank you!
Hi Experts,

One fix if you dont mind...
The file V_TestHHASchedule when its midnight it comes with only a date portion without the times.
How can that be converted to 00:00 so it will match the other file?

Thanks,
Ben
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
It all depends on datatypes.

I too found that the time portion was mucking up after importing the spreadsheet.

Found that it was coming in as varchar. A simple change to the structure of the table (keeping data in place) :
alter table V_TestHHASchedule alter column schedule_visitdate datetime
alter table V_TestHHASchedule alter column schedule_visitstarttime datetime
alter table V_TestHHASchedule alter column schedule_visitendtime datetime

Open in new window

And everthing worked as expected....

It does depend a bit on how that table is structured and populated.

This is how I imported :
set dateformat MDY

select * into V_TestHHASchedule
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml;Database=c:\mrwtest\ee\V_TestHHASchedule.xls;HDR=yes', 'SELECT * FROM [V_TestHHASchedule$]') as a

go

select * from INFORMATION_SCHEMA.columns where table_name = 'V_TestHHASchedule'

go

alter table V_TestHHASchedule alter column schedule_visitdate datetime
alter table V_TestHHASchedule alter column schedule_visitstarttime datetime
alter table V_TestHHASchedule alter column schedule_visitendtime datetime

go

Open in new window

By way of comparison, look at what happens when convert works with a string vs a datetime :
select replace(convert(varchar(5),'2018-12-12 07:08:09.123',108)+'-'+convert(varchar(5),'2018-12-12 08:09:10.123',108),':','')

select replace(convert(varchar(5),cast('2018-12-12 07:08:09.123' as datetime),108)+'-'+convert(varchar(5),cast('2018-12-12 08:09:10.123' as datetime),108),':','')

Open in new window

In the string date, the STYLE CODE of 108 basically tells Convert how to read the string. When it is a datetime, the style code of 108 tells converrt how to format the datetime.

Hope that helps....
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
Life is simpler when the format() function becomes available.
(Hint Hint)
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
In another thread....

Ben has said they have to comply with the lowest common denominator - SQL 2008

It was strongly suggested to plan upgrades to SQL 2012 or more recent.

There is so much more to offer in SQL2012+

And by the way, if it was a varchar column, then the format(myvarchar,'date/time specifiers') would fail in SQL Server.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial