Solved

UDF for Fastest Possible IP translation

Posted on 2003-11-21
8
379 Views
Last Modified: 2008-02-07
hey i am working on a LARGE analysis services/datamart project where i need to group by different networks of ip addresses.

i have more than 64,000 IP addresses (especially for the destination fields)

i need to:

1) store these the most efficiently (smallest).  isnt integer the smallest way to do this?
2) convert these (to the destination format the most efficiently)
3) roll these up into different networks (ie i think that i want a table where i store:
IP_ADDY           PARENT_IP
192.168.5.112   192.168.5.0
192.168.5.113   192.168.5.0

my preferences are to not use any DTS/vba for this (because i am using bcp/bulk insert)

and the UDF that I have previously used seems WWWWWAAAAAAAAYYYYYYY tooo slow.

i have seen people use an extended stored proc to do this (written in C++ i assume)-- is this utterly the fastest way to do this?

i am talking about 300million new records every day.

thanks!!
0
Comment
Question by:__Holly__
  • 3
  • 2
  • 2
  • +1
8 Comments
 
LVL 34

Expert Comment

by:arbert
ID: 9801939
I assume that you would have 4integer fields to store each part of the IP?  If not, you would have to use a bigint field to be able to store the maximum possible IP address (that, and I'm not sure how you would know whether it was 192.168.5.0 or 192.16.85.0 when you go to convert it back)

As far as DTS, you can also use the Bulk Insert task within DTS :)

Have you thought about simply keeping the IPs stored as an integer and putting a view on top of the data for the users that add the "IP formatting" on the fly?
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 9802634
One approach you can take is to store the IP as a varchar column and use the T-SQL PARSENAME function to pull out individual parts of the IP address.  Yes, I realize this use of the PARSENAME function is not very orthodox as it is not intended for that purpose, but does do the job very well.

Anthony
0
 
LVL 9

Accepted Solution

by:
PaulBarbin earned 400 total points
ID: 9804682
I like the parsename approach.  I have used it and it works well.  

My question would be, do you need the individual octets or do you just need the complete IP address?  You have to be concerned about processing time as well as storage space.  For example, you could use four tinyint fields and store this in 4 bytes (I think that's the most storage efficient), but its definitely more processor intensive and costs time to separate it to insert and combine it to select.  

Another question to consider is: Are you having to compare previous data to see if the row already exists?  Or are you just recording new data every day?  Are you storing each day in one table and then doing a nightly merge?

Also, don't see the point of storing the parent address.  Is it always something you can calcuate?  That is, is the subnet the same?  Or could you just store the subnet value (another tinyint) and calcuate the parent when/if you need it.  If you often need this value, then I would suggest storing it.  I don't know whether it makes sense with this much volume to try and go the strict relational route which would indicate a foreign key to the parent IP address table.  Just thinking out loud...

Final note:
300 million records is a LOT of IP addresses!  Seems like you'd run out of unique IP addresses after a few weeks at that pace!  Can I ask what this is for?

Paul
0
 
LVL 9

Expert Comment

by:PaulBarbin
ID: 9804695
Sorry, just read your post again and I see that this is for OLAP (should have figured).

Btw, if you go the varchar route, then no UDF is necessary.  Otherwise, you could do something like this:

Insert INTO IPTable   (
     firstOctet,
     secondOctet,
     thirdOctet,
     fourthOctet)

Select
     ParseName(fullIPAddress, 1),
     ParseName(fullIPAddress, 2),
     ParseName(fullIPAddress, 3),
     ParseName(fullIPAddress, 4)
From MyTable

Paul
0
Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10232037
Do you still need help with this old open question?
0
 
LVL 34

Expert Comment

by:arbert
ID: 10232074
She's "deadly with SQL Server & Access"  but not very good at maintaining her questions....
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10232120
LOL, but we can only hope.
0
 
LVL 1

Author Comment

by:__Holly__
ID: 10239505
sorry guys.

THANKS A TON that is exactly what i needed; and i am sure it is faster than all that other BS i was writing by hand.

i work for the big M and i report on internal traffic... the destination IPs have a high cardinality, but the source IPs are a lil less dynamic.

i think that i have something like 800k client ip addresses; but im still not too familiar with the real shape of it.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
I have a large data set and a SSIS package. How can I load this file in multi threading?
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now