asked on
Improve temporary table performance
Hi
Need to create temporary table, currently the tables contains around 100K records which are inserted into the temporary table using select staement.
create temporary table temp_distinctEmployee CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci' as
select distinct first_name, second_name,
substring(company_gstin,3,10) as company_pan
from employee ewbd
join IDS_TEMPTABLE ids_temp on ids_temp.id=ewbd.id;
It's taking too long (around 10-12 second) to insert and sometimes locks the database.
Is there any other way out to improve the perfromance.
Explain Plan, explain plan, explain plan = basic tool for any tuning. Do you have the explain plan?
Joins can MULTIPLY rows, why do you need to join? is this why you need "select distinct"?
(in general you should ALWAYS prefix EVERY column reference by the table or table alias - this would allow me to know which tables the columns come from without needing to ask you)
select distinct
ewbd.first_name
, ewbd.second_name
, substring(ids_temp.company_gstin,3,10) as company_pan --<< is this correct??
from employee ewbd
join IDS_TEMPTABLE ids_temp on ids_temp.id=ewbd.id;
Is the IDS_TEMPTABLE ID column indexed?
ASKER
Yes, they are indexed at the time of creating temporary table.
Substring for fetching pan from 15 character string.
the real question here would be why do you need a temporary table ?
removing the need is by far the best performance improvement you can achieve.
i see you are actually creating a tmptable based on another tmptable which is kinda weird.
given the size of the table, it will be stored on disk. which is slow.
if you have lots of ram, you probably can raise the threshold so the table is created in ram but that still does not address the tmp table issue.
You are correct that utilizing temporary tables might cause performance problems, especially when working with enormous datasets. The requirement for temporary tables would be completely eliminated, resulting in the greatest performance boost. Depending on the overall logic and workflow, there may be a method to alter the queries to do away with temporary tables.
maybe replace the tmp table with a view or just let us know what you are trying to do so we can try to figure out a different approach.
How big are the tables employee and IDS_TEMPTABLE ?
Are the columns you are selecting in an index ? If not you should create an appropriate index.
Indexes on just the ID column will not work in this situation to speed things up. Use multi-column indexes with the columns in your select statement (select and where clause).
And a functional index with these columns (id, substring(company_gstin,3,
If the database locks while running this query (which is definitely bad) you should consider running mysqltuner regularly and tune the database according to the findings in the mysqltuner report.
https://github.com/major/MySQLTuner-perl
Best regards,
Tomas Helgi
hmm. i would disagree with the statement about clustered (multicolumn) indexes in this specific case.
it is actually seldom meaningful to add selected columns though we all do it on occasions.
with innodb storage chances are the relevant part of the ID index is on the same page as the data.
if the PK of the employee table starts with the ID col (seems likely), this is a guarantee.
the gain in that case would be nil, and the additional size consequent not to mention maintainability issues.
one thing that may be a huge issue is if the IDS_TMPTABLE is neither sorted nor indexed which is likely. that would make the join VERY inefficient. but then, you really should avoid tmptables unless you REALLY have a very good reason.
running explain queries is always helpful to figure out such issues and would probably reveal the above guess to be true.
@skullnobrains That will speed up the select statement from these two tables used in the query by far. The query will be an index only query. Thus eliminating the I/O to the data pages.
The temporary table creation is however another matter as the temporary table is created "in memory" and therefore you need to tune all the necessary buffers for the temporary table to be as optimal as possible. Hence use the mysqltuner to tune. Or tune it by reading these links below.
https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html
https://www.percona.com/blog/temporary-tables-in-mysql-never-ending-story/
Best regards,
Tomas Helgi
that would be debatable with myisam. if ID is the pk of an innodb table, not a chance. again, this is true in some cases but not as a general rule. not to mention it costs space and write speed. benchmark it.
ASKER
www.experts-exchange.com/dashboard/#/questions/29263820?anchorAnswerId=43555946
the real question here would be why do you need a temporary table ?
- Since the data is coming comma seprated, we are separating the data based on comma and insertngi in the temp table which further is used in joins
removing the need is by far the best performance improvement you can achieve.
- 1. can't remove as how comma separated values will be compared with the data in the another table.
i see you are actually creating a tmptable based on another tmptable which is kinda weird.
- MYSQL temp table ca't be used multiple times in joins, therefore need to copy in another temp table.
given the size of the table, it will be stored on disk. which is slow.
- Not able to get it, since temp tables are stored in memmory.
if you have lots of ram, you probably can raise the threshold so the table is created in ram but that still does not address the tmp table issue.
>>the data is coming comma seprated
this is inefficient but i assume you are not in control of that process
>>we are separating the data based on comma and insertngi in the temp table
you also create the secondary temp_distinctEmployee as tmp... i doubt the overall process actually requires multiple successive temp tables.
there are many ways to join on the fly. if the data is sorted, a handler and some coding might do better.
you also may look into missing indexes in ewd and possibly the first tmp table.
>> how comma separated values will be ...
- mysql has a csv backed that allow to use csv tables without importing them. this is quite hackish though.
- the load data infile statement can be used with a SET clause that allows to run a subquerywhich entirely avoids the first tmp table.
example to emulate a join with a single field. use a subquery per field. if ewd.id is indexed, this is fast.
load data infile ... (@id ... ) SET (select first_name from ewd where ewd.id = @id) as first_name ...
>> MYSQL temp table ca't be used multiple times in joins
that is not true. you can reuse tmp tables at will as long as you do so in the same session. what does not work for you ?
note that the memory storage engine allows to do things that cannot be done with tmp tables.
>> temp tables are stored in memory.
no. user created temporary tables are stored by default in the default storage engine (likely innodb) unless you specify you want them to use the memory engine.
internally generated tmp tables are stored in memory until you reach tmp_table_limit (not sure about the setting name) and in myisam or aria past that limit.
nb : i was wrong about the treshold which only applies to internal tmp tables. in doubt, i checked the manual.
is ewd.id indexed ?
does the data come in sorted ?
how many rows ?
maybe tell us what is the overall process so we can provide detailed suggestions. you most likely do not need 2 successive tmp tables to do whatever you do next with that csv data.
ASKER
Thanks @skullnobrains
how comma separated values will be ...
- mysql has a csv backed that allow to use csv tables without importing them. this is quite hackish though.
- the load data infile statement can be used with a SET clause that allows to run a subquerywhich entirely avoids the first tmp table.
example to emulate a join with a single field. use a subquery per field. if ewd.id is indexed, this is fast.
load data infile ... (@id ... ) SET (select first_name from ewd where ewd.id = @id) as first_name ...
>>>> We are using Azure Database for MySQL, can you pls provide an example for using the way suggested.
>> MYSQL temp table ca't be used multiple times in joins
that is not true. you can reuse tmp tables at will as long as you do so in the same session. what does not work for you ?
note that the memory storage engine allows to do things that cannot be done with tmp tables.
>>>> tried the example below, can you share some idea with the same secnaio
create temporary table tmp1 (id int, a varchar(10));
insert into tmp1 values (1,'Mac');
insert into tmp1 values (2,'Mac2');
insert into tmp1 values (3,'Mac');
insert into tmp1 values (4,'Mac4');
select * from tmp1 t1
inner join (select a from tmp1)t2
on t1.a=t2.a;
select * from tmp1 t1
inner join tmp1 t2
on t1.a=t2.a;
Both giving error
Error Code: 1137. Can't reopen table: 't1'
ASKER
thanks
A little hard to tell for sure from what's been given but the DISTINCT keyword would be my guess.
Try using GROUP BY to return unique results instead of DISTINCT. In many cases that can make a pretty big difference.