Solved

UDF for Fastest Possible IP translation

Posted on 2003-11-21
8
378 Views
Last Modified: 2008-02-07
hey i am working on a LARGE analysis services/datamart project where i need to group by different networks of ip addresses.

i have more than 64,000 IP addresses (especially for the destination fields)

i need to:

1) store these the most efficiently (smallest).  isnt integer the smallest way to do this?
2) convert these (to the destination format the most efficiently)
3) roll these up into different networks (ie i think that i want a table where i store:
IP_ADDY           PARENT_IP
192.168.5.112   192.168.5.0
192.168.5.113   192.168.5.0

my preferences are to not use any DTS/vba for this (because i am using bcp/bulk insert)

and the UDF that I have previously used seems WWWWWAAAAAAAAYYYYYYY tooo slow.

i have seen people use an extended stored proc to do this (written in C++ i assume)-- is this utterly the fastest way to do this?

i am talking about 300million new records every day.

thanks!!
0
Comment
Question by:__Holly__
  • 3
  • 2
  • 2
  • +1
8 Comments
 
LVL 34

Expert Comment

by:arbert
ID: 9801939
I assume that you would have 4integer fields to store each part of the IP?  If not, you would have to use a bigint field to be able to store the maximum possible IP address (that, and I'm not sure how you would know whether it was 192.168.5.0 or 192.16.85.0 when you go to convert it back)

As far as DTS, you can also use the Bulk Insert task within DTS :)

Have you thought about simply keeping the IPs stored as an integer and putting a view on top of the data for the users that add the "IP formatting" on the fly?
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 9802634
One approach you can take is to store the IP as a varchar column and use the T-SQL PARSENAME function to pull out individual parts of the IP address.  Yes, I realize this use of the PARSENAME function is not very orthodox as it is not intended for that purpose, but does do the job very well.

Anthony
0
 
LVL 9

Accepted Solution

by:
PaulBarbin earned 400 total points
ID: 9804682
I like the parsename approach.  I have used it and it works well.  

My question would be, do you need the individual octets or do you just need the complete IP address?  You have to be concerned about processing time as well as storage space.  For example, you could use four tinyint fields and store this in 4 bytes (I think that's the most storage efficient), but its definitely more processor intensive and costs time to separate it to insert and combine it to select.  

Another question to consider is: Are you having to compare previous data to see if the row already exists?  Or are you just recording new data every day?  Are you storing each day in one table and then doing a nightly merge?

Also, don't see the point of storing the parent address.  Is it always something you can calcuate?  That is, is the subnet the same?  Or could you just store the subnet value (another tinyint) and calcuate the parent when/if you need it.  If you often need this value, then I would suggest storing it.  I don't know whether it makes sense with this much volume to try and go the strict relational route which would indicate a foreign key to the parent IP address table.  Just thinking out loud...

Final note:
300 million records is a LOT of IP addresses!  Seems like you'd run out of unique IP addresses after a few weeks at that pace!  Can I ask what this is for?

Paul
0
 
LVL 9

Expert Comment

by:PaulBarbin
ID: 9804695
Sorry, just read your post again and I see that this is for OLAP (should have figured).

Btw, if you go the varchar route, then no UDF is necessary.  Otherwise, you could do something like this:

Insert INTO IPTable   (
     firstOctet,
     secondOctet,
     thirdOctet,
     fourthOctet)

Select
     ParseName(fullIPAddress, 1),
     ParseName(fullIPAddress, 2),
     ParseName(fullIPAddress, 3),
     ParseName(fullIPAddress, 4)
From MyTable

Paul
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10232037
Do you still need help with this old open question?
0
 
LVL 34

Expert Comment

by:arbert
ID: 10232074
She's "deadly with SQL Server & Access"  but not very good at maintaining her questions....
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10232120
LOL, but we can only hope.
0
 
LVL 1

Author Comment

by:__Holly__
ID: 10239505
sorry guys.

THANKS A TON that is exactly what i needed; and i am sure it is faster than all that other BS i was writing by hand.

i work for the big M and i report on internal traffic... the destination IPs have a high cardinality, but the source IPs are a lil less dynamic.

i think that i have something like 800k client ip addresses; but im still not too familiar with the real shape of it.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now