Query comparing two tables with different format needed.

Hi Experts,

I need to compare two files containing same data with different format.

They have following columns

Patients Name
Caregiver name
Visit Date
Visit time from
Visit Time to

would need to get a list of all unmatched records.
its only considered a match if all 5 fields are the same.

Attached sample of those files, where record in both files is considered a match.

Here are the differences, one of the files has the following
1- first and last name concatenated into one field.
2- has data in parenthesis after the names which should be ignored.
3- format of time is 1459-2304
Visit-Report.xls
V_TestHHASchedule.xls
LVL 6
bfuchsAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PortletPaulEE Topic AdvisorCommented:
Is this data stored in Access or in SQL Server?
i.e. which database syntax do you want for the comparison?

what date/time is "1459-2304"  in yyyy-mm-dd hh:mm format ? (i.e. how is this compared to a normal date/time?)
bfuchsAuthor Commented:
hi,
currently data is coming from 3rd party in Excel format, can import anywhere...
would prefer Access if possible...
1459-2304 is HHMM-HHMM
Thanks,
Ben
PortletPaulEE Topic AdvisorCommented:
Visit-Report.xls
+-------+--------------+------------+--------------------------------------+----------------------------------+-----------------+-----------+--------------+-----------+----------+--------------+---------------+-------------+---------+------------+--------+------------+-------------+----+----------+---------------+---------+
| RowID | AltPatientID | MedicaidNo |             PatientName              |             AideName             | CoordinatorName | VisitDate | ScheduleTime | VisitTime | Duration | ContractName | ServiceCode_1 | PayrateCode | PayRate | Discipline | Billed | InvoiceNum | InvoiceDate | TS | TotalHrs | TotalDuration | Field22 |
+-------+--------------+------------+--------------------------------------+----------------------------------+-----------------+-----------+--------------+-----------+----------+--------------+---------------+-------------+---------+------------+--------+------------+-------------+----+----------+---------------+---------+
|     1 |              |            | test last p  test first p (WW-21345) | test c last test c first (11710) |                 | 19-Nov-18 |              | 1459-2304 | 08:05    |              |               |             |         |            |        |            |             |    |          |               |         |
+-------+--------------+------------+--------------------------------------+----------------------------------+-----------------+-----------+--------------+-----------+----------+--------------+---------------+-------------+---------+------------+--------+------------+-------------+----+----------+---------------+---------+

V_TestHHASchedule.xls
+---------------------+--------------------+-------------------+----------------------+---------------------+-------------------+------------------------------+--------------------+----------------------------+--------------------------+-------------------------+-------------------------+--------------------+----------------------+--------------------+----------------------+-----------------+
| Schedule_ScheduleID | Patients_FirstName | Patients_LastName | Caregivers_FirstName | Caregivers_LastName | Caregivers_Gender | Caregivers_TimeAttendancePIN | Schedule_VisitDate | Schedule_ScheduleStartTime | Schedule_ScheduleEndTime | Schedule_VisitStartTime |  Schedule_VisitEndTime  | Schedule_IsDeleted | Patients_AdmissionID | Schedule_PatientID | Schedule_CaregiverID | Patients_TeamID |
+---------------------+--------------------+-------------------+----------------------+---------------------+-------------------+------------------------------+--------------------+----------------------------+--------------------------+-------------------------+-------------------------+--------------------+----------------------+--------------------+----------------------+-----------------+
|                     | test first p       | test last p       | test c first         | test c last         |                   |                              | 2018-11-19 00:00   |                            |                          | 11/19/2018  14:59:00 PM | 11/19/2018  23:04:00 PM |                    |                      |                    |                      |                 |
+---------------------+--------------------+-------------------+----------------------+---------------------+-------------------+------------------------------+--------------------+----------------------------+--------------------------+-------------------------+-------------------------+--------------------+----------------------+--------------------+----------------------+-----------------+

Open in new window

I believe it will be easier to concatenate the already separated columns rather than attempting the reverse.
I also suggest a similar approach to the time (compare using the hhmm-hhmm format).
Along these lines in SQL Server 2012 or later.
SELECT
    Patients_LastName + ' ' + Patients_FirstName     AS PatientName
  , Caregivers_LastName + ' ' + Caregivers_FirstName AS AideName
  , format( Schedule_VisitDate, 'dd-MMM-yy' )        AS VisitDate
  , format (Schedule_VisitStartTime, 'hhmm-' + FORMAT (Schedule_VisitEndTime, 'hhmm') AS VisitTime
FROM V_TestHHASchedule

EXCEPT

SELECT
    RTRIM( LEFT( PatientName, CHARINDEX( ' (', PatientName + ' (' ) ) )
  , AideName
  , VisitDate
  , VisitTime
FROM Visit-Report

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
IT Pros Agree: AI and Machine Learning Key

We’d all like to think our company’s data is well protected, but when you ask IT professionals they admit the data probably is not as safe as it could be.

bfuchsAuthor Commented:
@PortletPaul,
Wonder why I did not get an email notifying your post, neither did it showed up under notifications, perhaps something is malfunctioning in EE.
Along these lines in SQL Server 2012 or later.
So you mean its not compatible with earlier versions of SQL?
Currently I use SQL 2008, what entails to convert it to that version?
Thanks,
Ben
PortletPaulEE Topic AdvisorCommented:
the format() function was introduced in SQL2012 (which is 6 years ago already)

Instead of FORMAT() you have to use CONVERT() instead which isn't that simple to do I'm afraid, but its because your db version is old.
see: https://www.experts-exchange.com/articles/12315/SQL-Server-Date-Styles-formats-using-CONVERT.html?headerLink=workspace_article
Mark WillsTopic AdvisorCommented:
For the string columns, I would use CHARINDEX() in SQL or INSTR() in Access and check if those columns (from V_TestHHASchedule) EXISTS in the VISITS-REPORT WHERE charindex(Patients_Firstname,Patientname) > 0 and charindex(Patients_Lastname,Patientname) > 0  (etc).

Access has INSTR but (more logically) the search string is first and the item being searched for is second ie INSTR(patientname,patients_firstname) > 0 and INSTR(patientname,patients_lastname) > 0  (etc)

Same again for caregiver (checking if it is contained within Aidename) and any other string.

Now for VisitDate, I would use datediff() e.g. in SQL Server : datediff(d,Schedule_VisitDate,visitdate) = 0
In Access : datediff("d",Schedule_VisitDate,visitdate) = 0

So that leaves us with the time....

And going to have to convert to the VisitTime format

In Access :   Format(Schedule_VisitStartTime,"hhnn")+"-"+Format(Schedule_VisitEndTime,"hhnn")  = VisitTime

and while we can use Format in SQL Server - it only happened from SQL 2012 and is basically the same as Access (use single quotes)

In SQL2008 : replace(convert(varchar(5),Schedule_VisitStartTime,108)+'-'+convert(varchar(5),Schedule_VisitEndTime,108),':','')

So, the basic approach in either SQL Server or Access will be :

Select *
from  V_TestHHASchedule V
WHERE EXISTS (SELECT NULL FROM VISITS-REPORT R WHERE <the above conditions hold true>)

and that will show rows from  V_TestHHASchedule that are also in Visits_Report

to show non-matching, then use : NOT EXISTS

Does that make sense ?
Mark WillsTopic AdvisorCommented:
So, SQL Server code :
SELECT V.*
FROM V_TestHHASchedule AS V
WHERE Exists (SELECT NULL FROM [Visit_Report] R 
                       WHERE  charindex(V.patients_firstname,R.patientname) > 0 
                       and charindex(V.patients_lastname,R.patientname) > 0
                       and charindex(V.CareGivers_lastname,R.AideName) > 0
                       and charindex(V.CareGivers_firstname,R.AideName) > 0
                       and datediff(d,V.Schedule_VisitDate,R.visitdate) = 0
                       and replace(convert(varchar(5),V.Schedule_VisitStartTime,108)+'-'+convert(varchar(5),V.Schedule_VisitEndTime,108),':','') = R.VisitTime

);

Open in new window

And the Access query would be  :
SELECT V.*
FROM V_TestHHASchedule AS V
WHERE Exists (SELECT NULL FROM [VISIT REPORT] R 
                       WHERE  INSTR(R.patientname,V.patients_firstname) > 0 
                       and INSTR(R.patientname,V.patients_lastname) > 0
                       and INSTR(R.AideName,V.CareGivers_lastname) > 0
                       and INSTR(R.AideName,V.CareGivers_firstname) > 0
                       and datediff("d",V.Schedule_VisitDate,R.visitdate) = 0
                       and  Format(V.Schedule_VisitStartTime,"hhnn")+"-"+Format(V.Schedule_VisitEndTime,"hhnn")  = R.VisitTime

);

Open in new window

bfuchsAuthor Commented:
@Mark,
Just testing your SQL version, and no records comes up.
when changing it to not exists then all records come up.
also we must use the not exists as I'm looking for the unmatched ones...
Thanks,
Ben
bfuchsAuthor Commented:
@Mark,

The time conversion is definitely a not functioning properly.

see below

SELECT V.Schedule_VisitStartTime,V.Schedule_VisitEndTime,replace(convert(varchar(5),V.Schedule_VisitStartTime,108)+'-'+convert(varchar(5),V.Schedule_VisitEndTime,108),':','')
FROM V_TestHHASchedule AS V

Open in new window

11/10/2018 7:00:00 AM      11/10/2018 7:00:00 PM      11/10-11/10
11/17/2018 7:15:00 AM      11/17/2018 7:00:00 PM      11/17-11/17
11/2/2018 7:00:00 AM      11/2/2018 9:00:00 AM      11/2/-11/2/
11/2/2018 2:00:00 PM      11/2/2018 7:00:00 PM      11/2/-11/2/
11/9/2018 7:15:00 AM      11/9/2018 9:00:00 AM      11/9/-11/9/
11/9/2018 1:45:00 PM      11/9/2018 7:15:00 PM      11/9/-11/9/
11/9/2018 9:00:00 AM      11/9/2018 1:45:00 PM      11/9/-11/9/
11/3/2018 7:00:00 AM      11/3/2018 7:00:00 PM      11/3/-11/3/
11/8/2018 7:00:00 AM      11/8/2018 2:00:00 PM      11/8/-11/8/
11/8/2018 2:00:00 PM      11/8/2018 7:00:00 PM      11/8/-11/8/
11/2/2018 9:00:00 AM      11/2/2018 2:00:00 PM      11/2/-11/2/
11/16/2018 7:15:00 AM      11/16/2018 9:00:00 AM      11/16-11/16

Thanks,
Ben
bfuchsAuthor Commented:
Thanks experts,
I ended up using a combination of both.

SELECT V.*
FROM V_TestHHASchedule AS V
WHERE  not Exists (SELECT NULL FROM [VisitReport] R
                       --WHERE  charindex(V.patients_firstname,R.patientname) > 0
                       where RTRIM( LEFT( r.PatientName, CHARINDEX( ' (', r.PatientName + ' (' ) ) ) = RTRIM( LEFT( v.patients_lastname, CHARINDEX( ' (', v.patients_lastname + ' (' ) ) ) + ' ' + RTRIM( LEFT( v.patients_firstname, CHARINDEX( ' (', v.patients_firstname + ' (' ) ) )
                       and RTRIM( LEFT( r.aideName, CHARINDEX( ' (', r.aideName + ' (' ) ) ) = RTRIM( LEFT( v.caregivers_lastname, CHARINDEX( ' (', v.caregivers_lastname + ' (' ) ) ) + ' ' + RTRIM( LEFT( v.caregivers_firstname, CHARINDEX( ' (', v.caregivers_firstname + ' (' ) ) )

                       and datediff(d,V.Schedule_VisitDate,R.visitdate) = 0
                     
                       and replace(convert(varchar(5),V.Schedule_VisitStartTime,108)+'-'+convert(varchar(5),V.Schedule_VisitEndTime,108),':','') = R.scheduled

Thank you!
bfuchsAuthor Commented:
Hi Experts,

One fix if you dont mind...
The file V_TestHHASchedule when its midnight it comes with only a date portion without the times.
How can that be converted to 00:00 so it will match the other file?

Thanks,
Ben
Mark WillsTopic AdvisorCommented:
It all depends on datatypes.

I too found that the time portion was mucking up after importing the spreadsheet.

Found that it was coming in as varchar. A simple change to the structure of the table (keeping data in place) :
alter table V_TestHHASchedule alter column schedule_visitdate datetime
alter table V_TestHHASchedule alter column schedule_visitstarttime datetime
alter table V_TestHHASchedule alter column schedule_visitendtime datetime

Open in new window

And everthing worked as expected....

It does depend a bit on how that table is structured and populated.

This is how I imported :
set dateformat MDY

select * into V_TestHHASchedule
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml;Database=c:\mrwtest\ee\V_TestHHASchedule.xls;HDR=yes', 'SELECT * FROM [V_TestHHASchedule$]') as a

go

select * from INFORMATION_SCHEMA.columns where table_name = 'V_TestHHASchedule'

go

alter table V_TestHHASchedule alter column schedule_visitdate datetime
alter table V_TestHHASchedule alter column schedule_visitstarttime datetime
alter table V_TestHHASchedule alter column schedule_visitendtime datetime

go

Open in new window

By way of comparison, look at what happens when convert works with a string vs a datetime :
select replace(convert(varchar(5),'2018-12-12 07:08:09.123',108)+'-'+convert(varchar(5),'2018-12-12 08:09:10.123',108),':','')

select replace(convert(varchar(5),cast('2018-12-12 07:08:09.123' as datetime),108)+'-'+convert(varchar(5),cast('2018-12-12 08:09:10.123' as datetime),108),':','')

Open in new window

In the string date, the STYLE CODE of 108 basically tells Convert how to read the string. When it is a datetime, the style code of 108 tells converrt how to format the datetime.

Hope that helps....
PortletPaulEE Topic AdvisorCommented:
Life is simpler when the format() function becomes available.
(Hint Hint)
Mark WillsTopic AdvisorCommented:
In another thread....

Ben has said they have to comply with the lowest common denominator - SQL 2008

It was strongly suggested to plan upgrades to SQL 2012 or more recent.

There is so much more to offer in SQL2012+

And by the way, if it was a varchar column, then the format(myvarchar,'date/time specifiers') would fail in SQL Server.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.