• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 84
  • Last Modified:

sql script to purge database of old customers

so my company gives our clients a little time to hopefully come back, or ask for information from when they were a customer, however we have a cutoff, i am looking for a script to aid in this cutoff.
In A database named "subscriber"
and a table named "Subscriber Detail"
AccountID,AccountGroupCode,AcctNum, ExtraArea,MoreStuff,EvenMoreSt,Name, ...............(several more fields)
1,DR1,1234, , , ,stoped 6/25/14 bob,
2,DC,1245, , , ,ended 3/4/12 tom,

I would like to delete all lines of data in all tables within "subscriber" biased on the "AccountID" that match the following criteria
A selected "accountgroupcode".
The "name" field having a date older than 1/1/14 within it.

I would like to be able to change the date that they are older than, as i will be doing this several times throughout the year
0
csePixelated
Asked:
csePixelated
  • 23
  • 19
  • 8
  • +1
1 Solution
 
ManjuIT - Project ManagerCommented:
Delete from yourtablename where accountid in ('1234') and lastdate  <  '2014-01-01'
0
 
csePixelatedAuthor Commented:
manju I have represented the data with commas separating the values
to clarify('1234') would be the "AcctNum" feild, not the "accountgroupcode"

the above does not delete lines in the other tables. I need to delete all lines in all tables that have a  AccountID corresponding to the account id of lines with my specified "accountgroupcode" and a date within the "name" field older than the date I specify.
0
 
ManjuIT - Project ManagerCommented:
use the same query for other tables as well just by changing the table name and use appropriate column names
0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 
SharathData EngineerCommented:
Do you have date embedded in name field like "stoped 6/25/14 bob"?
0
 
csePixelatedAuthor Commented:
Manja, The other tables do not have the "AccountGroupCode" or "name" nor any data that would allow me to find when they were disconnected, they simply have purchase history and phone numbers and in some instances contact names.

Im sorry I did not specify this.
the only field that is in every table is AccountID, hens i need to get a list of effected account id's and apply the delete to all tables by what was a match on the "Subscriber Detail"

other tables include "Subscriber phone" "Subscriber History" "Subscriber contacts"

many other fields have dates within them, will your query take into account i only need to select lines with a date somewhere within the "name" field. from what i understand the above will delete a whole lot more than that,
0
 
csePixelatedAuthor Commented:
Sharath, yes, however with no real uniform formatting.
customers that had not ended service do not have a date within them
more examples of "name" field for old customers

dc 11/9/2014 sue brown
d/c 2/5/16 bob owens
removed 8/30/15 toms store
dc 8-8-2012 penny m

unfortunately the only requirement for data entry of ended accounts is to notate in the "name" that they are no longer a customer, and to put the date they were no longer a customer
0
 
SharathData EngineerCommented:
Do you want to pass AccountID and Date as variables to the script and check if the AccountID is older than the passed Date field?
Then delete that particular AccountID from all your tables?
0
 
csePixelatedAuthor Commented:
I am very bad at sql script, perhaps the below concept will help.

in [Subscriber Detail] get  "AccountID" where "AccountGroupCode" = DR1 and "Name" contains a date before 1/1/2014
do delete lines in all tables with  "AccountID" matching the criteria above

im not sure what i will need to do to accommodate for the date format possibly changing.
xx-xx-xxxx
xx/xx/xxxx
xx/xx/xx
xx-xx-xx
xx.xx.xx
xx.xx.xxxx

I don't really need the full date. just the year will work
0
 
SharathData EngineerCommented:
>> other tables include "Subscriber phone" "Subscriber History" "Subscriber contacts"

So you have 4 tables including "Subscriber Detail" from which you want to delete the data?
Do you have any other table? Are you looking for dynamic query or four delete queries for these tables?
0
 
csePixelatedAuthor Commented:
I am hoping for a dynamic query, as the above is merely an example, the database i need to clear them from has around 30 tables. i would like to find out what AccountID's qualify and delete all rows in all tables containing that AccountID, as that is the only piece of data on every table.

example:
["Subscriber phone"]
AccountID,PhoneNum,PhoneName
1,6475309,home
1,8004455662,work
1,1234567,aunt
3,7643218,home
3,7645893,office
2,#,bad number
2,233454651,do not call
6,33445566,home
6,2029224,alt
7,5671234,home
3,4567123,emergency
4,8899221,home
5,1230987,exit 5 steve
0
 
SharathData EngineerCommented:
Not getting time to look into this. Are you still looking for assistance here?
0
 
csePixelatedAuthor Commented:
yes, Sharath, any help will be appreciated.
0
 
ZberteocCommented:
Something like this:
declare
    @account_ids_list varchar(max)='',
    @SQL varchar(max)=''


-- make a list of AccountIds that you need to remove:
select 
    @account_ids_list=@account_ids_list+cast(AccountId as varchar(15))+',' 
from
    [Subscriber Detail]
where 
    isdate(Name)=1
    and AccountGroupCode='DR1'
    and cast(Name as date)>'2014-01-01'
-- remove last comma in @account_ids_list:
select 
    @account_ids_list=replace(@account_ids_list+'@#$',',@#$','')

-- build the delete statements into @SQL variable
select 
    @SQL=@SQL+'delete from ['+TABLE_NAME+'] where AccountId in ('+@account_ids_list+')

'
from 
    information_schema.tables 
where 
    TABLE_NAME in (select distinct TABLE_NAME from information_schema.columns where column_name='AccountId')

-- check the content of @SQL variable; !!! CAREFULL if you comment the select line below it will activate the EXEC part, which will actually execute the query and delete those AccountIds from tables
select @SQL /*
EXEC (@SQL) --*/

Open in new window

Be careful with this so you don't delete anything until you ware sure about it. If there are relations between these tables becomes more complicated as you would have to follow the child - parent order.

I assumed from your previous post that teh "name"column is of varchar type that contains dates in vary formats. This is unfortunate, if true, because it is difficult to be accurate. My script filters out the date formats that are not recognized, istdate(name)=1,  and then casts the result as a date to be compared to '2014-01-01'.
0
 
csePixelatedAuthor Commented:
so will it catch dates expressed with / ?
is their a way i can have it purge all tables in the Subscriber Detail database
declare
    @account_ids_list varchar(max)='',
    @SQL varchar(max)=''


-- make a list of AccountIds that you need to remove:
select 
    @account_ids_list=@account_ids_list+cast(AccountId as varchar(15))+',' 
from
    [Subscriber Detail]
where 
    isdate(Name)=1
    and AccountGroupCode='DR1'
    and cast(Name as date)>'2014-01-01'
-- remove last comma in @account_ids_list:
select 
    @account_ids_list=replace(@account_ids_list+'@#$',',@#$','')

-- build the delete statements into @SQL variable
select 
    @SQL=@SQL+'delete from ['+TABLE_NAME1+'] where AccountId in ('+@account_ids_list+')
    @SQL=@SQL+'delete from ['+TABLE_NAME2+'] where AccountId in ('+@account_ids_list+')
    @SQL=@SQL+'delete from ['+TABLE_NAME3+'] where AccountId in ('+@account_ids_list+')
    @SQL=@SQL+'delete from ['+TABLE_NAME4+'] where AccountId in ('+@account_ids_list+')
    

'
from 
    information_schema.tables 
where 
    TABLE_NAME in (select distinct TABLE_NAME from information_schema.columns where column_name='AccountId')

-- check the content of @SQL variable; !!! CAREFULL if you comment the select line below it will activate the EXEC part, which will actually execute the query and delete those AccountIds from tables
select @SQL /*
EXEC (@SQL) --*/

Open in new window


I have active 3 identical active servers so i can execute the query to test. so remove the -- from that line to make the script activate? sorry about all the questions i do not mess with Microsoft SQL often.
0
 
ZberteocCommented:
This new code you posted has mistakes in it. Why did you add:

    @SQL=@SQL+'delete from ['+TABLE_NAME2+'] where AccountId in ('+@account_ids_list+')
    @SQL=@SQL+'delete from ['+TABLE_NAME3+'] where AccountId in ('+@account_ids_list+')
    @SQL=@SQL+'delete from ['+TABLE_NAME4+'] where AccountId in ('+@account_ids_list+')

?

that is totally unnecessary and actually is flawed. It will fail. What are TABLE_NAME2, TABLE_NAME3...? They don't even exist as column names!

Please test my code, try to understant it and dn't just change it.
0
 
csePixelatedAuthor Commented:
Ok, sorry.
I tried running it - got the following

Msg 241, Level 16, State 1, Line 7
Conversion failed when converting date and/or time from character string.
0
 
ZberteocCommented:
The problem is you Name column. It is of varchar type and it contains date formats as text that are not recognized by the SQL server. Unfortunately this has to be dealt with manually. This will work but only for valid date formats:
declare
    @account_ids_list varchar(max)='',
    @SQL varchar(max)=''


-- make a list of AccountIds that you need to remove:
select 
    @account_ids_list=@account_ids_list+cast(AccountId as varchar(15))+',' 
from
    (
	   select 
		  AccountId,
		  Name
	   from
		  [Subscriber Detail]
	   where 
		  isdate(Name)=1
		  and AccountGroupCode='DR1'
    )q 
where
    cast(Name as date)>'2014-01-01'
-- remove last comma in @account_ids_list:
select 
    @account_ids_list=replace(@account_ids_list+'@#$',',@#$','')

-- build the delete statements into @SQL variable
select 
    @SQL=@SQL+'delete from ['+TABLE_NAME+'] where AccountId in ('+@account_ids_list+')

'
from 
    information_schema.tables 
where 
    TABLE_NAME in (select distinct TABLE_NAME from information_schema.columns where column_name='AccountId')

-- check the content of @SQL variable; !!! CAREFULL if you comment the select line below it will activate the EXEC part, which will actually execute the query and delete those AccountIds from tables
select @SQL /*
EXEC (@SQL) --*/

Open in new window

But you need to fix the dates that are not recognized:
select 
    *
from
    [Subscriber Detail]
where 
    isdate(Name)=0

Open in new window

0
 
csePixelatedAuthor Commented:
tried 32 line code above, where do i inject the 6 lines?

Msg 207, Level 16, State 1, Line 20
Invalid column name 'Name'.

the name column does exist, you can see in the picture i changed some column names, the table i am actually working with is Subscriber Data, this really has no bearing on the result.... why would it say invalid column name Name when it right there?
Untitlbed.png
0
 
SharathData EngineerCommented:
In your screenshot, you don't have Name column in the subquery. But in Zberteoc's solution, there is Name column in subquery.
0
 
csePixelatedAuthor Commented:
Ty sharath, that fixed that issue.

now i got this

Msg 529, Level 16, State 2, Line 8
Explicit conversion from data type int to date is not allowed.
0
 
csePixelatedAuthor Commented:
I assume the above is because i need to inject the 6 lines listed above, where do they go?
0
 
SharathData EngineerCommented:
I think you have date embedded in Name column. right? You need to extract the date from Name field and then compare. try this.
declare
    @account_ids_list varchar(max)='',
    @SQL varchar(max)=''


-- make a list of AccountIds that you need to remove:

;with cte1 as (
    select *,charindex(' ',name) idx1,charindex(' ',name,charindex(' ',name)+1) idx2 
      from [Subscriber Detail]
     where AccountGroupCode='DR1')
select @account_ids_list=@account_ids_list+cast(AccountId as varchar(15))+',' 
  from cte1  
 where isdate(substring(Name,idx1,idx2-idx1)) = 1
   and convert(date,substring(Name,idx1,idx2-idx1)) > '20140101'

-- remove last comma in @account_ids_list:
select 
    @account_ids_list=replace(@account_ids_list+'@#$',',@#$','')

-- build the delete statements into @SQL variable
select 
    @SQL=@SQL+'delete from ['+TABLE_NAME+'] where AccountId in ('+@account_ids_list+')

'
from 
    information_schema.tables 
where 
    TABLE_NAME in (select distinct TABLE_NAME from information_schema.columns where column_name='AccountId')

-- check the content of @SQL variable; !!! CAREFULL if you comment the select line below it will activate the EXEC part, which will actually execute the query and delete those AccountIds from tables
select @SQL /*
EXEC (@SQL) --*/

Open in new window

0
 
csePixelatedAuthor Commented:
sharath

Msg 241, Level 16, State 1, Line 8
Conversion failed when converting date and/or time from character string.
0
 
ZberteocCommented:
Please run this query and post the results here:
select top 10
    Name
from
    [Subscriber Detail]
where 
    isdate(Name)=1

Open in new window

0
 
csePixelatedAuthor Commented:
looks like nothing
res.png
0
 
ZberteocCommented:
So not even one row has a valid date. Remove the where clause and run it again and post those results. In text if possible.
0
 
ZberteocCommented:
By the way, are there really the dates inside that Name column? Why "Name" for dates?
0
 
csePixelatedAuthor Commented:
script ran all 10 shows up
also included an example with a little information. we need to update the process for removing accounts, however in the interim i need to get this done.
res2.png
exa.png
0
 
ZberteocCommented:
:0)

OK, we need some data that actually is not blank. Try this:

select top 10
    Name
from
    [Subscriber Detail]
where 
    isnull(ltrim(rtrim(Name)),'')<>''

Open in new window

From what I saw that Name column has all kind of info beside the date. It needs to be parsed unless you have the date info only in another column.
0
 
SharathData EngineerCommented:
Can you try this?
declare
    @account_ids_list varchar(max)='',
    @SQL varchar(max)=''


-- make a list of AccountIds that you need to remove:

;with cte1 as (
    select *,charindex(' ',name) idx1,charindex(' ',name,charindex(' ',name)+1) idx2 
      from [Subscriber Detail]
     where AccountGroupCode='DR1'),
     cte2 as (select * from cte1 where isdate(substring(Name,idx1,idx2-idx1)) = 1)
select @account_ids_list=@account_ids_list+cast(AccountId as varchar(15))+',' 
  from cte2 
 where convert(date,substring(Name,idx1,idx2-idx1)) > '20140101'

-- remove last comma in @account_ids_list:
select 
    @account_ids_list=replace(@account_ids_list+'@#$',',@#$','')

-- build the delete statements into @SQL variable
select 
    @SQL=@SQL+'delete from ['+TABLE_NAME+'] where AccountId in ('+@account_ids_list+')

'
from 
    information_schema.tables 
where 
    TABLE_NAME in (select distinct TABLE_NAME from information_schema.columns where column_name='AccountId')

-- check the content of @SQL variable; !!! CAREFULL if you comment the select line below it will activate the EXEC part, which will actually execute the query and delete those AccountIds from tables
select @SQL /*
EXEC (@SQL) --*/

Open in new window

0
 
ZberteocCommented:
@Sharath

We can't assume that all the values in the Name column will follow that "xxx date yyy" structure and just use the spaces to parse but if that is the case then our asker is lucky.
0
 
csePixelatedAuthor Commented:
could it be done by looking for part of a string and matching them, as it is just as the year switches
so look for names containing the
".2013" "/2013" "-2013" ".13" "-13" "/13" ".2012" "/2012" "-2012" ".12" "-12" "/12" ".2011" "/2011" "-2011" ".11" "-11" "/11" ?
can wild cards be used, like something that says only replace "#" with a single digit number?
"##/##/2013"
0
 
ZberteocCommented:
Please post some data sample from the query I gave you and we will find something. You can change with TOP 1000 to get more samples. The whole idea is to find a pattern in order to isolate the date part that will work in all cases, if possible.

A question is what do you do with the rows that are blank or there is no date in the Name column? Do you include or exclude them?
0
 
csePixelatedAuthor Commented:
looks like progress Zberteoc
RESULT.png
0
 
csePixelatedAuthor Commented:
0
 
SharathData EngineerCommented:
So do you have date anywhere in Name field in any format?
looks like you don't have date at all in some Names.
0
 
ZberteocCommented:
Well, is like I thought. Some rows don't have any data in them. Some have dates in brackets some just at the beginning of the row, in short is a mess.

The picture doesn't work because even though you had top 1000 you can only see about 15. That  is a very small sample. We need to filter data down but get more sample so you will have to run this:

select  distinct top 1000
    left(Name,15) as Name
from
    [Subscriber Detail]
where 
    isnull(ltrim(rtrim(Name)),'')<>''
    and 
    (
		  left(Name,15) like '%[0-9].[0-9]%' 
	   or left(Name,15) like '%[0-9]-[0-9]%' 
	   or left(Name,15) like '%[0-9]/[0-9]%' 
    )

Open in new window

When is done in the result panel right click on the column header > Save Result As... make sure is saved as CSV, Give the file a name and save it somewhere locally. After that attach the file to the post here in EE.
0
 
ZberteocCommented:
Some dates have no year but they have time...
0
 
ZberteocCommented:
What is the full structure of [Subscriber Detail] table? Are you sure you don't have another date column?
0
 
csePixelatedAuthor Commented:
yes some do not even put the year they were disconnected, we are not worried about them, Ive already been manually removing them.
structure1.png
structure2.png
0
 
ZberteocCommented:
Is it possible to use the dates columns at the bottom of the structure: LastChange...?
0
 
csePixelatedAuthor Commented:
Although it may not be accurate, it could work, Unfortunately it does not seem to represent the correct years.
LAST.png
LAST2.png
0
 
ZberteocCommented:
You didn't follow my post where I showed you how to save data in files rather than adding pictures. Pictures cannot be copied and paste to work with the data they show.

In regards to the dates columns, you don't have to show us just decide if they are usable or not. You are the one that knows the data. If you want to go ahead with the Name column parse I am still waiting for the text file.
0
 
csePixelatedAuthor Commented:
zberteoc,
was showing the entire table, sorry, i was not comfortable sharing the entire table as a csv.
comparing the data, it would seem that those dates will not work.
So I am back to looking solely at the name field.
0
 
ZberteocCommented:
In order to proceed you will have to parse the string in the Name column to isolate the date. For that you need to know how many different kinds of date strings you have. Without that you will not be able to cover all the situations.
0
 
csePixelatedAuthor Commented:
i have dates using "-""."and"/" all in the format dd-mm-yyyy or mm-dd-yyyy
some only have the year as 2 digits, for those will assume 20YY
some do not have a year, those will be ignored.
so
dd-mm-yyyy
dd-mm-yy
mm-dd-yyyy
mm-dd-yy
dd/mm/yyyy
dd/mm/yy
mm/dd/yyyy
mm/dd/yy
dd.mm.yyyy
dd.mm.yy
mm.dd.yyyy
mm.dd.yy

so 12 types
some may be

mm-dd
dd-mm
mm.dd
dd.mm
mm/dd
dd/mm

They will be ignored.
0
 
ZberteocCommented:
That is not enough. You can't isolate the dates just knowing the format you need to know where they start and where they end. You need to know how you can ignore what is in front and after them. You need to know if there are any kind of delimiters that can be used before and after a date.
0
 
ZberteocCommented:
If this table has not too many rows, like even in the thousands range, you could do it manually, one format at a time. A bit tedious but possible.
0
 
csePixelatedAuthor Commented:
Oh wow, ok, work has me on something else - ill get back with more info soon.
0
 
ZberteocCommented:
Wait a second, actually I came up with something.

First you will have to create a function in the same database that will return the first date it finds in a string, based on the formats you posted:
CREATE FUNCTION [dbo].[fnFindDateInString] ( 
      @string varchar(max)
) 
RETURNS varchar(max)
AS
/*******************************************************************\
    SELECT dbo.fnFindDateInString('addcc 01-01-2016 mkm')
\******************************************************************/
BEGIN

	DECLARE @dte_string varchar(10)

    select
	   @dte_string
	   =case 
		  when parse_numbers<>''	 
			 then substring(string, cast(left(parse_numbers,charindex(',',parse_numbers)-1) as int), cast(reverse(left(reverse(parse_numbers),charindex(',',reverse(parse_numbers))-1)) as int))
		  else
		  ''
	   end
    from
    (
	   select 
		  string,
		  case 
			 when
				string like   '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' 
				then cast(patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', string) as varchar(10))+',10'

			 when  string like '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%' 
				then cast(patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',string) as varchar(10))+',8'

			 else ''
		  end as parse_numbers
	   from
		  (select replace(replace(@string,'.','/'),'-','/') as string ) q
    ) q2

    RETURN @dte_string
END

Open in new window


Now you can use this function to find the rows:
declare
    @account_ids_list varchar(max)='',
    @SQL varchar(max)=''


-- make a list of AccountIds that you need to remove:
select 
    @account_ids_list=@account_ids_list+cast(AccountId as varchar(15))+',' 
from
    (
	   select 
		  AccountId,
		  dbo.fnFindDateInString(Name) as dte
	   from
		  [Subscriber Detail]
	   where 
		  isdate(dbo.fnFindDateInString(Name))=1
		  and AccountGroupCode='DR1'
    )q 
where
    cast(dte as date)>'2014-01-01'
-- remove last comma in @account_ids_list:
select 
    @account_ids_list=replace(@account_ids_list+'@#$',',@#$','')

-- build the delete statements into @SQL variable
select 
    @SQL=@SQL+'delete from ['+TABLE_NAME+'] where AccountId in ('+@account_ids_list+')

'
from 
    information_schema.tables 
where 
    TABLE_NAME in (select distinct TABLE_NAME from information_schema.columns where column_name='AccountId')

-- check the content of @SQL variable; !!! CAREFULL if you comment the select line below it will activate the EXEC part, which will actually execute the query and delete those AccountIds from tables
select @SQL /*
EXEC (@SQL) --*/

Open in new window

0
 
csePixelatedAuthor Commented:
ok so to clarify run the 1st, then run the 2nd?
or do i combine them?
0
 
ZberteocCommented:
You run first just once to create the function. After that you can use it as many times as you need like I used it in the second script. So I modified the script I posted before to use this new function that makes it easy to find dates in strings.
0

Featured Post

Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

  • 23
  • 19
  • 8
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now