Solved

Load CSV to DB: how to exclude some columns?

Posted on 2014-04-23
1
465 Views
Last Modified: 2014-04-24
Hello all,

I'm loading CSVs into a SQL Server database table and I found out that the CSV file format doesn't completely match the table (column names are not 100% identical, and finding extra unnecessary columns as well).  Target table looks like this:

create table tmp
(
  id int,
  name varchar(50),
  address varchar(200),
  city varchar(50),
  state varchar(50),
  zip varchar(50)
)

Open in new window


But flat file looks like this:

id,name,address,state,city,zip,phonenumber
1,'Bob Jones','123 Main St.','WA','Seattle','12345','555-5555'

Open in new window


Is it possible to specifically route columns into a different order when constructing the insert statement?  This is what I have cooked up so far:

import csv
import ceODBC

f = open(fname,'r',encoding='utf_8',newline='')
reader = csv.reader(f,delimiter=',')
header = next(reader)

conn = ceODBC.connect(...connection info....,autocommit=false)
cursor = conn.cursor()

#I don't think this is correct...
for row in reader:
     cursor.execute('insert into tmp (id,name,address,city,state,zip) values({?},{?],{?},{?},{?],{?})',row[0],row[1],row[2],row[4],row[3],row[5])

cursor.close
conn.close

Open in new window


Any help is appreciated.

Thanks,
Glen
0
Comment
Question by:jisoo411
1 Comment
 
LVL 28

Accepted Solution

by:
pepr earned 500 total points
ID: 40018792
Firstly, do not forget the f.close().

Now add the following lines just below your line 6:
insert_tpl = 'insert into tmp (' + ','.join(header[:6])
print(insert_tpl)
insert_tpl += ') values (' + ','.join(['{?}'] * 6) + ')'
print(insert_tpl)

Open in new window

It constructs the insert command from your header and uses only the first 6 elements. Then it prints the first phase of the effort. Then it adds the value part with that many {?} joined also by commas. I can see on my screen:
insert into tmp (id,name,address,state,city,zip
insert into tmp (id,name,address,state,city,zip) values ({?},{?},{?},{?},{?},{?})

Open in new window

Now the row contains the info in the prescribed order. You only need to use the first 6 elements from the row:
         cursor.execute(insert_tpl, row[:6])

Open in new window

I did not try but I believe the SQL engine will swap the values for you. I do not know if the command is then slower or not.

If you need to use other names for the columns than in CSV file, set your header list explicitly with the wanted names for the database but in the order used in the CSV file.

The above description is about how to get rid of the tail of the row (the slicing of the row variable) and how to fix the order of the values (the template uses names in the CSV order -- or from the CSV header or set explicitly).

It may be the case, you have set the header list explicitly in the correct order, but you need to skip some of the row values (from the middle). You can modify the row before calling the cursor.execute. You can delete some parts of the row using the del command. Let's simulate the row as a list of numbers:
row = list(range(12))
print(row)

Open in new window

It prints
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Open in new window

This is, twelve elements from zero to eleven. The index of the element is the same as the value. When deleting some element, the indices of the rest will change. Because of that, it is easier to delete them from right to the left:
del row[10]
del row[5:8] # from 5 including to 8 excluding
del row[2]
del row[0]
print(row)                                  

Open in new window

Now the row looks this way:
[1, 3, 4, 8, 9, 11]

Open in new window

0

Featured Post

Gigs: Get Your Project Delivered by an Expert

Select from freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Here I am using Python IDLE(GUI) to write a simple program and save it, so that we can just execute it in future. Because when we write any program and exit from Python then program that we have written will be lost. So for not losing our program we…
When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …

815 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now