Solved

Load CSV to DB: how to exclude some columns?

Posted on 2014-04-23
1
464 Views
Last Modified: 2014-04-24
Hello all,

I'm loading CSVs into a SQL Server database table and I found out that the CSV file format doesn't completely match the table (column names are not 100% identical, and finding extra unnecessary columns as well).  Target table looks like this:

create table tmp
(
  id int,
  name varchar(50),
  address varchar(200),
  city varchar(50),
  state varchar(50),
  zip varchar(50)
)

Open in new window


But flat file looks like this:

id,name,address,state,city,zip,phonenumber
1,'Bob Jones','123 Main St.','WA','Seattle','12345','555-5555'

Open in new window


Is it possible to specifically route columns into a different order when constructing the insert statement?  This is what I have cooked up so far:

import csv
import ceODBC

f = open(fname,'r',encoding='utf_8',newline='')
reader = csv.reader(f,delimiter=',')
header = next(reader)

conn = ceODBC.connect(...connection info....,autocommit=false)
cursor = conn.cursor()

#I don't think this is correct...
for row in reader:
     cursor.execute('insert into tmp (id,name,address,city,state,zip) values({?},{?],{?},{?},{?],{?})',row[0],row[1],row[2],row[4],row[3],row[5])

cursor.close
conn.close

Open in new window


Any help is appreciated.

Thanks,
Glen
0
Comment
Question by:jisoo411
1 Comment
 
LVL 28

Accepted Solution

by:
pepr earned 500 total points
ID: 40018792
Firstly, do not forget the f.close().

Now add the following lines just below your line 6:
insert_tpl = 'insert into tmp (' + ','.join(header[:6])
print(insert_tpl)
insert_tpl += ') values (' + ','.join(['{?}'] * 6) + ')'
print(insert_tpl)

Open in new window

It constructs the insert command from your header and uses only the first 6 elements. Then it prints the first phase of the effort. Then it adds the value part with that many {?} joined also by commas. I can see on my screen:
insert into tmp (id,name,address,state,city,zip
insert into tmp (id,name,address,state,city,zip) values ({?},{?},{?},{?},{?},{?})

Open in new window

Now the row contains the info in the prescribed order. You only need to use the first 6 elements from the row:
         cursor.execute(insert_tpl, row[:6])

Open in new window

I did not try but I believe the SQL engine will swap the values for you. I do not know if the command is then slower or not.

If you need to use other names for the columns than in CSV file, set your header list explicitly with the wanted names for the database but in the order used in the CSV file.

The above description is about how to get rid of the tail of the row (the slicing of the row variable) and how to fix the order of the values (the template uses names in the CSV order -- or from the CSV header or set explicitly).

It may be the case, you have set the header list explicitly in the correct order, but you need to skip some of the row values (from the middle). You can modify the row before calling the cursor.execute. You can delete some parts of the row using the del command. Let's simulate the row as a list of numbers:
row = list(range(12))
print(row)

Open in new window

It prints
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Open in new window

This is, twelve elements from zero to eleven. The index of the element is the same as the value. When deleting some element, the indices of the rest will change. Because of that, it is easier to delete them from right to the left:
del row[10]
del row[5:8] # from 5 including to 8 excluding
del row[2]
del row[0]
print(row)                                  

Open in new window

Now the row looks this way:
[1, 3, 4, 8, 9, 11]

Open in new window

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction On September 29, 2012, the Python 3.3.0 was released; nothing extremely unexpected,  yet another, better version of Python. But, if you work in Microsoft Windows, you should notice that the Python Launcher for Windows was introduced wi…
When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now