Link to home
Start Free TrialLog in
Avatar of Eduardo Fuerte
Eduardo FuerteFlag for Brazil

asked on

Could you point a way to split a very long MSSQLServer query to make it retrieves just parts of the lines for each time?

Hi Experts

Could you point a way to split a very long MSSQLServer query to make it retrieves just parts of the lines for each time?

I'm running a query that must retrieve 3.200.0000 lines, resulted from some group operations:

     SELECT nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep, 
       max(Cem) AS  cem , max(Infosaude) AS  infosaude, max(Procare) as procare
FROM v_UnificacaoPacientes_v5
GROUP BY nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep
ORDER BY nome;

Open in new window


Is that a way to partitionate the results in subsets of lines and then assemble it ?

Thanks in advance.
ASKER CERTIFIED SOLUTION
Avatar of Pavel Celba
Pavel Celba
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Eduardo Fuerte

ASKER

Hi Pcelba

We need to migrate all the medical patients from legate system to the new CRM system.
By going the strategy you pointed, I'm planning to split in 04 parts

     SELECT nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep, 
       max(Cem) AS  cem , max(Infosaude) AS  infosaude, max(Procare) as procare
FROM v_UnificacaoPacientes_v5
GROUP BY nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep
ORDER BY nome
OFFSET 10000000
FETCH NEXT 1000000 ROWS ONLY;



     SELECT nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep, 
       max(Cem) AS  cem , max(Infosaude) AS  infosaude, max(Procare) as procare
FROM v_UnificacaoPacientes_v5
GROUP BY nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep
ORDER BY nome
OFFSET 10000001
FETCH NEXT 1000000 ROWS ONLY;

SELECT nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep, 
       max(Cem) AS  cem , max(Infosaude) AS  infosaude, max(Procare) as procare
FROM v_UnificacaoPacientes_v5
GROUP BY nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep
ORDER BY nome
OFFSET 20000001
FETCH NEXT 1000000 ROWS ONLY;

SELECT nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep, 
       max(Cem) AS  cem , max(Infosaude) AS  infosaude, max(Procare) as procare
FROM v_UnificacaoPacientes_v5
GROUP BY nome, Nome_Valido, cpf, CPF_Valido, data_nasc, telefone, endereco, cidade, uf, cep
ORDER BY nome
OFFSET 30000001
FETCH NEXT 1000000 ROWS ONLY;

Open in new window


Am I in the right way?
Could you check what's wrong when running this:

User generated image
Just considering

v_UnificacaoPacientes_v5

Open in new window


Is a view...
Only missed the word ROWS

OFFSET 100000 ROWS

Open in new window

Yes the ROWS word was missing in my original post and I've fixed it later.
If you are planning just 4 parts then the concept is OK. The OFFSET in the first query should be 0 and all other offsets should be minus one - 1000000, 2000000, ...
Thank you.

Unfortunatelly the performance is very slow in my case, so I have to adopt another strategy.
Slow performance means missing index on the nome column. Did you look at execution plan? Of course, I don't know the view definition and 3 mio rows from a (possibly) complex view does not help to improve the performance...

Just remember the data porting to a new system is one time task and if it takes hours then it is still OK.
I had tried to recreate the view as an indexed view on the column name but the specific database doesn't allow that.
I agree with your considerations about data migration is an unique event that where time to be consumed on strategies must to be only the necessary.

What I did is to split the data using the name initials and save the results on .CSV(s) - and then maybe join all the CSV(s).

Another company of the group I found have at about 21 million register. By going this way is slow but possible and I'm getting it.
OFFSET and FETCH does not work on indexed views... and CSV can contain characters which are not processed correctly...

If you are porting data between two different servers then you could have two more options:
1) Create a backup, restore it on the second server and then all queries should work relatively quickly. You may even prepare "export database" on the original server and then port the backup of this db.
2) Create a Linked server on the new machine and execute necessary queries on the linked server. Of course, millions of data rows will still consume time.