Solved

Consolidating MySQL rows

Posted on 2014-07-27
398 Views
Last Modified: 2014-07-30
I have a small application which logs ping results; I typically run it 24/7 for a few days to gather some information.

Sometimes I can have a dozen sources all sending their results to MySQL, so the table fills up with a very large number of rows. I want to keep the results for historical purposes, but I would like to consolidate the large number of rows.

For example, looking at MySQL right now, my pings test table shows 'Rows 1 to 50 of 89138 rows'. I certainly don't want to get rid of this data, but I need to find a way of consolidating it into fewer rows, perhaps even into a new table.

The fields include the usual min/max/avg/loss along with a sending device id and time.

I sure would love to get a solution on how this could be accomplished without losing data.

Thanks very much.
Question by:projects

19 Comments
 
LVL 7

Expert Comment

by:Raghu Mutalikdesai
ID: 40223482
Can you please give details of which columns the pings table has? Depending on your objectives, you can use a GROUP BY clause and a WHERE clause to consolidate results. For that, it is important to know what columns your table contains.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40223833
I've often seen tables like this set up with a column defined as TIMESTAMP.  MySQL will automatically record the time of the last change to the row (typically the INSERT, in the case of log tables) in that column.  It's very useful for historical reporting.

Find the most recent events?  SELECT ... ORDER BY my_timestamp_column DESC

Find the events on Tuesday?  SELECT ... WHERE my_timestamp_column BETWEEN...

With a design like this you can automatically (or deliberately) prune the table by copying the older rows into a history table, then deleting the older rows from the log table.  There are many MySQL date calculations that can be used with a TIMESTAMP column.
http://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html
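As a rough sketch of that pruning idea (assuming the column is named `time`, as later comments show, and that a `pings_history` table with the same structure already exists — both names here are illustrative):

```sql
-- Copy rows older than a week into the history table
INSERT INTO pings_history
SELECT * FROM pings
WHERE `time` < NOW() - INTERVAL 7 DAY;

-- Then delete those same rows from the live log table
DELETE FROM pings
WHERE `time` < NOW() - INTERVAL 7 DAY;
```

Running both statements inside a single transaction (on an InnoDB table) avoids losing rows if the second statement fails.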
 

Author Comment

by:projects
ID: 40224887
Currently looks like this;

	id 	time 	client_id 	type 	unit 	loss 	min 	max 	avg
	1 	2014-07-22 16:08:00 	8 	ping 	ms 	0 	28.932 	707.974 	207.484
	2 	2014-07-22 16:08:21 	8 	ping 	ms 	0 	35.858 	391.053 	171.813

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40225558
I think you can use the column "time" to prune the table.
 

Author Comment

by:projects
ID: 40225585
Ok, but, how? Do I run a command line function? Do I run a script? How do I do this?
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40225720
I would run a script, but that's only because I think in PHP script terms.  The general design pattern would go something like this...

1. Create a new table named "newpings" with SELECT from "pings" ORDER BY time DESC LIMIT 1 (so the new table starts with only the most recent row)
2. Rename "pings" to "oldpings"
3. Rename "newpings" to "pings"

Now you have effectively removed all of the older pings into the "oldpings" table and the "pings" table will continue to accumulate the new information.  You can deal with the "oldpings" table in any number of ways - analyze it, roll it off by date into an archive, etc.
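The three steps above might look roughly like this in MySQL (a sketch using the table names from the comment; note that CREATE TABLE ... SELECT does not copy indexes or the AUTO_INCREMENT attribute, so CREATE TABLE newpings LIKE pings followed by an INSERT of the newest row would preserve them):

```sql
-- 1. New table seeded with only the most recent row
CREATE TABLE newpings
SELECT * FROM pings ORDER BY `time` DESC LIMIT 1;

-- 2. and 3. Swap both tables in a single atomic statement
RENAME TABLE pings TO oldpings, newpings TO pings;
```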

If you're new to MySQL, this seems to be well thought of: https://www.udemy.com/mysql-database-for-beginners2/
 

Author Comment

by:projects
ID: 40225730
Just like a teacher Ray, always pointing to the documentation :).

Doesn't look difficult to someone who knows this but I don't.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40225749
Actually it's not just documentation -- it's an online course that will give you enough knowledge of the MySQL fundamentals that we can have a conversation about this.

Is there a MySQL database administrator in your organization?  If so, (s)he can probably help.  Otherwise you might want to hire a professional software developer with MySQL DBA skills to build the application for you.  Show your consultant this question and the answer thread.  Anyone skilled in the state of the art will be able to help you quickly and inexpensively.  The alternative is to try to learn all of this yourself, and that will take a considerable amount of time.  In respect of your time, my recommendation would be to hire/buy.
 

Author Comment

by:projects
ID: 40225801
But as I've said countless times, I am not going to become a programmer, I am only filling in for a while which is why I am on this site. I don't have time to become a programmer, I just need help fixing/maintaining certain things until someone is hired.

I have been hiring here and there, but sometimes I prefer to ask whether it's something I can follow, or get a script to do it, etc.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 40225862
I'd like to mention that 89,000 rows isn't very many, in terms of what most databases can generally handle. As long as the columns you are interested in filtering with are indexed, queries should still run quickly.

Key concerns as the amount of data grows are:
How much disk space does it use?
How is it being backed up? (does it need to be backed up?)
How often are queries run against it, what kind of queries are run, and how quickly do they need to run?

How long did it take to accumulate the existing 89,000 rows? Maybe, once the table is indexed, you won't have a problem that needs solving any time soon.
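A sketch of the indexing mentioned above, for the table shown earlier (the index names are illustrative; index the columns your queries actually filter on):

```sql
ALTER TABLE pings
  ADD INDEX idx_time (`time`),       -- for date-range queries
  ADD INDEX idx_client (client_id);  -- for per-device reporting
```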
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40225884
>I am not going to become a programmer... I don't have time to become a programmer...
This dialog could just as well be about a broken-down car, and you might be saying...
>I am not going to become a mechanic, I am only filling in for a while which is why I am in this garage. I don't have time to become a mechanic...
The best way forward may be to hire a professional programmer (or mechanic).

I'm out.  And I wish you the best.
 

Author Comment

by:projects
ID: 40225905
@TerryAtOpus; It only takes a few days to accumulate around 100K rows and that is with only a handful of scripts sending. Once this goes into operation, there could be hundreds sending at the same time.

@Ray; Plenty of people have simply provided a small script or some commands, basically a solution. I would not be paying for this site if all I wanted was to learn programming, I'd be learning programming. I can't take that job on along with my own as I've said before so am on this site looking for solutions, much appreciated ones.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 40225920
Then let's discuss what your preferred solution would be. Would you like to consolidate the data, for example by taking an average of multiple rows and storing it in a new table? You probably need to give some specific goals before we can help much further.

It's certainly possible to set up scripting to move, consolidate, or archive records on a regular basis. We just need a definitive goal.

With such large volumes, you might like to try to minimise the row size. This can be done by using appropriate data types that take the minimum of space. E.g. instead of having a text column for type containing "ping", you could potentially have a one- or two-digit integer code that uses a lookup table. Do you even need that column?  The same goes for the unit column.
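The lookup-table idea might be sketched like this (table and column names here are hypothetical):

```sql
-- Small lookup table mapping a 1-byte code to the test type
CREATE TABLE test_types (
  type_id TINYINT UNSIGNED PRIMARY KEY,
  name    VARCHAR(16) NOT NULL
);
INSERT INTO test_types VALUES (1, 'ping');

-- The log table then stores type_id TINYINT UNSIGNED
-- (1 byte per row) instead of a VARCHAR type column
```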

Which OS are you wanting to use to do this, and do you have a preferred language or tool to script in? Personally, I'd write something in PHP in this case, because it's easy and fast.
 

Author Comment

by:projects
ID: 40225968
>Then let's discuss what your preferred solution would be. Would you like to
>consolidate the data, for example by taking an average of multiple rows and
>storing it in a new table? You probably need to give some specific goals before
>we can help much further.

This is why I posted, so that I could get some insight on how to handle this.

The goal is to keep the relevant data while consolidating the table on a regular basis in order to keep the database as optimized as possible.
Once the individual records are in, I really just need to keep the averages calculated from the raw data. I have no idea how such things would be done, however. I suspect I would pick a range, say keeping the last week in raw format, then moving the rest to consolidated data in a new table?

>With such large volumes, you might like to try to minimise the row size.

Yes, this could be done. I have no problem changing whatever is needed.

>Which OS are you wanting to use to do this, and do you have a preferred language
>or tool to script in? Personally, I'd write something in PHP in this case, because it's easy
>and fast.

Everything is Linux based. PHP would be fine but I can also use bash scripting of course, perhaps running a cron task or something.
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 40225970
Only you can know which part of the data is relevant to you, so we're relying on you to say what can be removed as part of any consolidation.

Can you please post some details on the process of how the consolidation would be done, assuming you intend for that to be part of the script you're wanting help with. E.g. calculate the mean of the avg column per client id per week, and store it against the client id in a ping_history table.
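The consolidation described here (mean of the avg column per client id per week) might be sketched as follows, assuming a ping_history table is created for the purpose:

```sql
-- One consolidated row per client per week
INSERT INTO ping_history (client_id, week, avg_ms, min_ms, max_ms)
SELECT client_id,
       YEARWEEK(`time`),
       AVG(`avg`), MIN(`min`), MAX(`max`)
FROM pings
GROUP BY client_id, YEARWEEK(`time`);
```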
 

Author Comment

by:projects
ID: 40225973
Ok, I'll work on that tomorrow and post it. I'm not sure it will sound as you expect but I'll certainly think about it and post it and hopefully it will make sense. I will need to ask a few questions along the way.
 

Author Comment

by:projects
ID: 40226935
Step one, table has been updated;
    	Field name    	Type    	Allow nulls?    	Key    	Default value    	Extras   
	id 	int(11) 	No 	Primary 	NULL 	auto_increment
	time 	timestamp 	No 	None 	CURRENT_TIMESTAMP 	on update CURRENT_TIMESTAMP
	client_id 	int(8) 	Yes 	None 	NULL 	
	loss 	varchar(8) 	Yes 	None 	NULL 	
	min 	decimal(10,3) 	Yes 	None 	NULL 	
	max 	decimal(10,3) 	Yes 	None 	NULL 	
	avg 	decimal(10,3) 	Yes 	None 	NULL 	



Step two.
Need to figure out how much raw data is useful to keep, then convert the rest. I'm not sure how others in the industry would decide this; it seems almost arbitrary.
 

Author Comment

by:projects
ID: 40229477
Turns out, there is no need to consolidate this, only to gather stats then get rid of it.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 40230604
Then you just need to determine which stats you need to gather. Obviously this will be based on the underlying need for doing all of this in the first place.
