group by min and max query

pma111
I am trying to write a query based on users' home drives. We have imported a huge CSV inventory into MSSQL, which is basically a list of all the files each user has within their H: drive; there are over 3000 users and almost 1.4 million files stored between them. I am trying to 'group by' the username and then get a min and max date (as we would in Access) of a field called LastWriteTime (type date/time) for each user's home drive, to basically show 'for this user's H: drive, this was the first LastWriteTime recorded and this was the last LastWriteTime recorded'. Having the stats per home drive folder would be perfect. So something like...


user - FirstModified - LastModified
\\server\share\user1 - 01/01/2019 - 22/11/2019
\\server\share\user2 - 01/06/2019 - 10/11/2019

The issue/challenge is the field which represents the folder/path (called FullName, type nvarchar(max)): I essentially need to use a segment of the text, e.g. \\server\share\username\, as the unique identifier on which to produce stats. Entries lower down the inventory will be in the format \\server\share\username\folder1\worddoc.docx etc. I am not sure if this would need to be done in multiple stages, but the consistent thing is that the username part is always the text just before the 5th backslash, i.e. the content between the 4th and 5th backslash, e.g. \\server\share\username\ (as everyone's H: drive is unique and based upon their AD login name). Given the size of the data I cannot easily do any form of text-to-columns manipulation in another tool like Excel to extract the username in order to do some min/max date stats, hence we had to get the data into MSSQL in the first place.
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
sample of raw data would help (just a few rows for just a few users)

what version of mssql?
can you use string_split() ?

Author

Commented:
MSSQL 2016 SP2.

For some sample data, it's essentially a database built from the output of this PowerShell command; just point it at a directory and you get the same kind of data. We have simply imported it into MSSQL to get some stats on it:

Get-ChildItem -Path "\\server\share" -Recurse | Select-Object FullName, Name, Attributes, LastWriteTime, CreationTime, LastAccessTime, Length | Export-Csv c:\users\me\desktop\inventory.csv -NoTypeInformation


Not familiar with string split to be honest.
EE Topic Advisor
Commented:
JSON_VALUE is available in SQL 2016 and lets us pick out the nth substring of a string, as long as the string has first been converted into a JSON array.

so
\\server\share\username\folder1\worddoc.docx
needs to be converted to
["","","server","share","username","folder1","worddoc.docx"]

then we can pull out the element at zero-based index 4 of that array (the two leading backslashes produce two empty strings at indexes 0 and 1), which is
username
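
To make the conversion concrete, here is a minimal sketch that builds the JSON array for a single path and reads a few elements by their zero-based index. The variable names @path and @json are illustrative only and are not part of the query further down.

```sql
-- Illustrative only: @path and @json are made-up variable names.
DECLARE @path NVARCHAR(400) = N'\\server\share\username\folder1\worddoc.docx';

-- Replace every backslash with "," and wrap the result in [" ... "]
DECLARE @json NVARCHAR(4000) = N'["' + REPLACE(@path, N'\', N'","') + N'"]';

SELECT
      @json                      AS json_array   -- ["","","server","share","username","folder1","worddoc.docx"]
    , JSON_VALUE(@json, '$[2]')  AS server_name  -- server
    , JSON_VALUE(@json, '$[3]')  AS share_name   -- share
    , JSON_VALUE(@json, '$[4]')  AS username;    -- username
```

This relies on the path containing no double quotes, which Windows file names do not allow, so the generated string should remain valid JSON.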

CREATE TABLE mytable(
   file_path VARCHAR(45) NOT NULL
  ,file_date DATE  NOT NULL
);
INSERT INTO mytable(file_path,file_date) VALUES ('\\server\share\usernamea\folder1\worddoc.docx','2019-12-01');
INSERT INTO mytable(file_path,file_date) VALUES ('\\server\share\usernameb\folder1\worddoc.docx','2018-10-11');
INSERT INTO mytable(file_path,file_date) VALUES ('\\server\share\usernamea\folder1\worddoc.docx','2017-03-27');
INSERT INTO mytable(file_path,file_date) VALUES ('\\server\share\usernameb\folder1\worddoc.docx','2018-10-01');
INSERT INTO mytable(file_path,file_date) VALUES ('\\server\share\usernamea\folder1\worddoc.docx','2019-12-08');
INSERT INTO mytable(file_path,file_date) VALUES ('\\server\share\usernameb\folder1\worddoc.docx','2019-10-06');


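-- JSON array indexes are zero-based: index 0 and 1 are empty strings,
-- index 2 = server, 3 = share, 4 = username.
-- SQL Server 2016 requires the JSON path to be a string literal;
-- a variable path is only accepted from SQL Server 2017 onward.
select
     ca.username
   , min(t.file_date) min_date
   , max(t.file_date) max_date
from mytable t
cross apply (
    select
       JSON_VALUE('["' + REPLACE(t.file_path,'\','","') + '"]','$[4]')
   ) ca (username)
group by
     ca.username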


Result:
username	min_date	max_date
usernamea	27/03/2017	08/12/2019
usernameb	01/10/2018	06/10/2019
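
Mapped onto the columns named in the question (FullName and LastWriteTime), the same approach might look like the sketch below. The table name dbo.inventory is a hypothetical placeholder, since the thread never names the imported table, and the output aliases simply follow the layout asked for in the question.

```sql
-- dbo.inventory is a hypothetical table name; FullName and LastWriteTime
-- are the columns described in the question.
SELECT
      ca.username
    , MIN(i.LastWriteTime) AS FirstModified
    , MAX(i.LastWriteTime) AS LastModified
FROM dbo.inventory AS i
CROSS APPLY (
    -- Zero-based index 4 is the segment between the 4th and 5th backslash
    SELECT JSON_VALUE('["' + REPLACE(i.FullName, '\', '","') + '"]', '$[4]')
) AS ca (username)
WHERE ca.username IS NOT NULL      -- skip any path that is too short to have a username segment
GROUP BY ca.username
ORDER BY ca.username;
```

If the full \\server\share\username form is wanted in the first column, the server and share segments could be read the same way with '$[2]' and '$[3]' and concatenated back together.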



STRING_SPLIT is another way to break up strings into parts, but it does not allow the "nth" position as a parameter.
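
For comparison, a minimal sketch of what STRING_SPLIT returns for one path on SQL Server 2016: a single column called value with one row per segment and no position column to filter on. The sample path is the one used earlier in this comment.

```sql
-- STRING_SPLIT requires database compatibility level 130 or higher.
-- It returns one row per segment in a column named value; on SQL Server 2016
-- there is no position column, and the row order is not guaranteed.
SELECT s.value
FROM STRING_SPLIT(N'\\server\share\username\folder1\worddoc.docx', N'\') AS s;
```

Because nothing in the result identifies which row came from between the 4th and 5th backslash, the username cannot be picked out reliably this way, which is why the JSON_VALUE approach is used above.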

NOTE the efficiency of the query above may not be stellar.
