Solved

Consolidating MySQL rows

Posted on 2014-07-27
394 Views
Last Modified: 2014-07-30
I have a small application which logs ping results; I typically run it 24/7 for a few days to gather some information.

Sometimes I can have a dozen sources all sending their results to MySQL, so the table fills up with a very large number of rows. I want to keep the results for historical purposes, but I would like to consolidate the large number of rows.

For example, looking at MySQL right now, my pings test table shows 'Rows 1 to 50 of 89138 rows' of results. I certainly don't want to get rid of this data, but I need to find a way of consolidating it into fewer rows, perhaps even into a new table.

The fields include the usual min/max/avg/loss along with a sending device id and time.

I sure would love to get a solution on how this could be accomplished without losing data.

Thanks very much.
Question by:projects
19 Comments
 
LVL 7

Expert Comment

by:Raghu Mutalikdesai
Can you please give the details of what columns the pings table has? Depending on your objectives, you can use GROUP BY and WHERE clauses to consolidate results. For that, it is important to know what columns your table contains.
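To give a flavour of it, here is a minimal sketch, assuming hypothetical columns client_id, time and avg (backticks because avg is also a function name):

    -- Average ping per client per day; column names are assumptions until the schema is posted
    SELECT client_id,
           DATE(`time`) AS day,
           AVG(`avg`)   AS mean_avg
    FROM pings
    GROUP BY client_id, DATE(`time`);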
 
LVL 108

Expert Comment

by:Ray Paseur
I've often seen tables like this set up with a column defined as TIMESTAMP.  MySQL can automatically record the time of the last change to the row (typically the INSERT, in the case of log tables) in that column.  It's very useful for historical reporting.

Find the most recent events?  SELECT ... ORDER BY my_timestamp_column DESC

Find the events on Tuesday?  SELECT ... WHERE my_timestamp_column BETWEEN...

With a design like this you can automatically (or deliberately) prune the table by copying the older rows into a history table, then deleting the older rows from the log table.  There are many MySQL date calculations that can be used with a TIMESTAMP column.
http://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html
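For instance, a minimal sketch against a hypothetical log table, keeping the placeholder column name from above:

    -- Most recent events first
    SELECT * FROM my_log_table ORDER BY my_timestamp_column DESC LIMIT 50;

    -- Events on a particular day
    SELECT * FROM my_log_table
    WHERE my_timestamp_column BETWEEN '2014-07-22 00:00:00'
                                  AND '2014-07-22 23:59:59';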
 

Author Comment

by:projects
Currently it looks like this:

    id   time                 client_id   type   unit   loss   min      max       avg
    1    2014-07-22 16:08:00  8           ping   ms     0      28.932   707.974   207.484
    2    2014-07-22 16:08:21  8           ping   ms     0      35.858   391.053   171.813


 
LVL 108

Expert Comment

by:Ray Paseur
I think you can use the column "time" to prune the table.
 

Author Comment

by:projects
OK, but how? Do I run a command-line function? Do I run a script? How do I do this?
 
LVL 108

Expert Comment

by:Ray Paseur
I would run a script, but that's only because I think in PHP script terms.  The general design pattern would go something like this...

1. Create a new table named "newpings" with SELECT from "pings" ORDER BY time DESC LIMIT 1
2. Rename "pings" to "oldpings"
3. Rename "newpings" to "pings"

Now you have effectively removed all of the older pings into the "oldpings" table and the "pings" table will continue to accumulate the new information.  You can deal with the "oldpings" table in any number of ways - analyze it, roll it off by date into an archive, etc.
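A minimal sketch of that pattern in SQL, using the table names above (one assumption: CREATE TABLE ... LIKE is used for step 1, since it preserves indexes and the auto_increment definition, which CREATE TABLE ... AS SELECT does not):

    -- Step 1: new, empty table with the same structure as pings
    CREATE TABLE newpings LIKE pings;

    -- Steps 2 and 3: both renames in one atomic statement, so no inserts slip between them
    RENAME TABLE pings TO oldpings, newpings TO pings;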

If you're new to MySQL, this seems to be well thought of: https://www.udemy.com/mysql-database-for-beginners2/
 

Author Comment

by:projects
Just like a teacher Ray, always pointing to the documentation :).

It doesn't look difficult to someone who knows this, but I don't.
 
LVL 108

Expert Comment

by:Ray Paseur
Actually it's not just documentation -- it's an online course that will give you enough knowledge of the MySQL fundamentals that we can have a conversation about this.

Is there a MySQL database administrator in your organization?  If so, (s)he can probably help.  Otherwise you might want to hire a professional software developer with MySQL DBA skills to build the application for you.  Show your consultant this question and the answer thread.  Anyone skilled in the state of the art will be able to help you quickly and inexpensively.  The alternative is to try to learn all of this stuff yourself, and that will take a considerable amount of time.  Out of respect for your time, my recommendation would be to hire/buy.
 

Author Comment

by:projects
But as I've said countless times, I am not going to become a programmer; I am only filling in for a while, which is why I am on this site. I don't have time to become a programmer; I just need help fixing/maintaining certain things until someone is hired.

I have been hiring here and there, but sometimes I prefer to ask if it's something I can follow, or get a script to do it, etc.

 
LVL 35

Expert Comment

by:Terry Woods
I'd like to mention that 89,000 rows isn't very many, in terms of what most databases can generally handle. As long as the columns you are interested in filtering with are indexed, queries should still run quickly.

Key concerns as the amount of data grows are:
How much disk space does it use?
How is it being backed up? (does it need to be backed up?)
How often are queries run against it, what kind of queries are run, and how quickly do they need to run?

How long did it take to accumulate the existing 89,000 rows? Maybe, once the table is indexed, you won't have a problem that needs solving any time soon.
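For what it's worth, a minimal sketch of the indexing, assuming time and client_id are the columns you filter on:

    -- Index the filter columns; adjust to match the queries you actually run
    ALTER TABLE pings
        ADD INDEX idx_time (`time`),
        ADD INDEX idx_client_time (client_id, `time`);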
 
LVL 108

Expert Comment

by:Ray Paseur
I am not going to become a programmer... I don't have time to become a programmer...
This dialog could just as well be about a broken-down car and you might be saying...
I am not going to become a mechanic, I am only filling in for a while which is why I am in this garage. I don't have time to become a mechanic...
The best way forward may be to hire a professional programmer (or mechanic).

I'm out.  And I wish you the best.
 

Author Comment

by:projects
@TerryAtOpus; It only takes a few days to accumulate around 100K rows, and that is with only a handful of scripts sending. Once this goes into operation, there could be hundreds sending at the same time.

@Ray; Plenty of people have simply provided a small script or some commands, basically a solution. I would not be paying for this site if all I wanted was to learn programming; I'd be learning programming. I can't take that job on along with my own, as I've said before, so I am on this site looking for solutions, much appreciated ones.
 
LVL 35

Expert Comment

by:Terry Woods
Then let's discuss what your preferred solution would be. Would you like to consolidate the data, for example by taking an average of multiple rows and storing it in a new table? You probably need to give some specific goals before we can help much further.

It's certainly possible to set up scripting to move, consolidate, or archive records on a regular basis. We just need a definitive goal.

With such large volumes, you might like to try to minimise the row size. This can be done by using appropriate data types that take the minimum of space. e.g. instead of having a text column for type containing "ping", you could potentially have a one- or two-digit integer code that uses a lookup table. Do you even need that column? The same goes for the unit column.

Which OS are you wanting to use to do this, and do you have a preferred language or tool to script in? Personally, I'd write something in PHP in this case, because it's easy and fast.
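As an illustration of the lookup-table idea (the code values and names here are assumptions, not a finished design):

    -- Small lookup table replacing the free-text type column
    CREATE TABLE ping_types (
        type_id TINYINT UNSIGNED NOT NULL PRIMARY KEY,
        name    VARCHAR(16)      NOT NULL
    );
    INSERT INTO ping_types VALUES (1, 'ping');

    -- Sketch only: migrate existing data before dropping anything
    ALTER TABLE pings
        ADD COLUMN type_id TINYINT UNSIGNED NULL,
        DROP COLUMN type,
        DROP COLUMN unit;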
 

Author Comment

by:projects
>Then let's discuss what your preferred solution would be. Would you like to
>consolidate the data, for example by taking an average of multiple rows and
>storing it in a new table? You probably need to give some specific goals before
>we can help much further.

This is why I posted, so that I could get some insight on how to handle this.

The goal is to keep the relevant data while consolidating the table on a regular basis in order to keep the database as optimized as possible.
Once the individual records are in, I really just need to keep the averages once they are calculated from the raw data. I have no idea how such things would be done, however. I suspect I would pick a range, say keeping the last week in raw format, then moving the rest to consolidated data in a new table?

>With such large volumes, you might like to try to minimise the row size.

Yes, this could be done. I have no problem changing whatever is needed.

>Which OS are you wanting to use to do this, and do you have a preferred language
>or tool to script in? Personally, I'd write something in PHP in this case, because it's easy
>and fast.

Everything is Linux based. PHP would be fine but I can also use bash scripting of course, perhaps running a cron task or something.
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
Only you can know which part of the data is relevant to you, so we're relying on you to say what can be removed as part of any consolidation.

Can you please post some details on the process of how the consolidation would be done, assuming you intend for that to be part of the script you're wanting help with? e.g. calculate the mean of the avg column per client id per week, and store it against the client id in a ping_history table.
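To make that concrete, a minimal sketch of the kind of thing I mean, assuming the columns you posted earlier and a hypothetical ping_history table (week_start here is the Monday of each week):

    -- Hypothetical roll-up table
    CREATE TABLE IF NOT EXISTS ping_history (
        client_id  INT NOT NULL,
        week_start DATE NOT NULL,
        avg_ms     DECIMAL(10,3),
        PRIMARY KEY (client_id, week_start)
    );

    -- Mean of avg per client per week, for rows older than a week
    INSERT INTO ping_history (client_id, week_start, avg_ms)
    SELECT client_id,
           DATE(DATE_SUB(`time`, INTERVAL WEEKDAY(`time`) DAY)) AS week_start,
           AVG(`avg`)
    FROM pings
    WHERE `time` < NOW() - INTERVAL 7 DAY
    GROUP BY client_id, DATE(DATE_SUB(`time`, INTERVAL WEEKDAY(`time`) DAY));

    -- Then remove the raw rows that have just been summarised
    DELETE FROM pings WHERE `time` < NOW() - INTERVAL 7 DAY;

In a real script you would compute the cutoff timestamp once and reuse it, so the DELETE removes exactly the rows that were summarised.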
 

Author Comment

by:projects
Ok, I'll work on that tomorrow and post it. I'm not sure it will sound as you expect but I'll certainly think about it and post it and hopefully it will make sense. I will need to ask a few questions along the way.
 

Author Comment

by:projects
Step one, the table has been updated:
    Field name   Type            Allow nulls?   Key       Default value       Extras
    id           int(11)         No             Primary   NULL                auto_increment
    time         timestamp       No             None      CURRENT_TIMESTAMP   on update CURRENT_TIMESTAMP
    client_id    int(8)          Yes            None      NULL
    loss         varchar(8)      Yes            None      NULL
    min          decimal(10,3)   Yes            None      NULL
    max          decimal(10,3)   Yes            None      NULL
    avg          decimal(10,3)   Yes            None      NULL



Step two.
Need to figure out how much raw data is useful to keep, then convert the rest. I'm not sure how others in the industry would decide this; it's almost arbitrary.
 

Author Comment

by:projects
It turns out there is no need to consolidate this, only to gather stats and then get rid of the raw data.
 
LVL 35

Expert Comment

by:Terry Woods
Then you just need to determine which stats you need to gather. Obviously this will be based on the underlying need for doing all of this in the first place.
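If it would help, the gather-then-discard step can even live inside MySQL itself via the event scheduler rather than cron; a minimal sketch, reusing the hypothetical ping_history table from above (it assumes event_scheduler is ON, and the DELIMITER lines are for the mysql command-line client):

    DELIMITER //
    CREATE EVENT IF NOT EXISTS weekly_ping_rollup
    ON SCHEDULE EVERY 1 WEEK
    DO
    BEGIN
        -- Fix the cutoff once so the INSERT and DELETE cover the same rows
        DECLARE cutoff DATETIME DEFAULT NOW() - INTERVAL 7 DAY;

        INSERT INTO ping_history (client_id, week_start, avg_ms)
        SELECT client_id,
               DATE(DATE_SUB(`time`, INTERVAL WEEKDAY(`time`) DAY)),
               AVG(`avg`)
        FROM pings
        WHERE `time` < cutoff
        GROUP BY client_id, DATE(DATE_SUB(`time`, INTERVAL WEEKDAY(`time`) DAY));

        DELETE FROM pings WHERE `time` < cutoff;
    END //
    DELIMITER ;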