Solved

Load CSV to DB: how to exclude some columns?

Posted on 2014-04-23
1
463 Views
Last Modified: 2014-04-24
Hello all,

I'm loading CSVs into a SQL Server database table and I found out that the CSV file format doesn't completely match the table (column names are not 100% identical, and finding extra unnecessary columns as well).  Target table looks like this:

create table tmp
(
  id int,
  name varchar(50),
  address varchar(200),
  city varchar(50),
  state varchar(50),
  zip varchar(50)
)

Open in new window


But flat file looks like this:

id,name,address,state,city,zip,phonenumber
1,'Bob Jones','123 Main St.','WA','Seattle','12345','555-5555'

Open in new window


Is it possible to specifically route columns into a different order when constructing the insert statement?  This is what I have cooked up so far:

import csv
import ceODBC

f = open(fname,'r',encoding='utf_8',newline='')
reader = csv.reader(f,delimiter=',')
header = next(reader)

conn = ceODBC.connect(...connection info....,autocommit=false)
cursor = conn.cursor()

#I don't think this is correct...
for row in reader:
     cursor.execute('insert into tmp (id,name,address,city,state,zip) values({?},{?],{?},{?},{?],{?})',row[0],row[1],row[2],row[4],row[3],row[5])

cursor.close
conn.close

Open in new window


Any help is appreciated.

Thanks,
Glen
0
Comment
Question by:jisoo411
1 Comment
 
LVL 28

Accepted Solution

by:
pepr earned 500 total points
ID: 40018792
Firstly, do not forget the f.close().

Now add the following lines just below your line 6:
insert_tpl = 'insert into tmp (' + ','.join(header[:6])
print(insert_tpl)
insert_tpl += ') values (' + ','.join(['{?}'] * 6) + ')'
print(insert_tpl)

Open in new window

It constructs the insert command from your header and uses only the first 6 elements. Then it prints the first phase of the effort. Then it adds the value part with that many {?} joined also by commas. I can see on my screen:
insert into tmp (id,name,address,state,city,zip
insert into tmp (id,name,address,state,city,zip) values ({?},{?},{?},{?},{?},{?})

Open in new window

Now the row contains the info in the prescribed order. You only need to use the first 6 elements from the row:
         cursor.execute(insert_tpl, row[:6])

Open in new window

I did not try but I believe the SQL engine will swap the values for you. I do not know if the command is then slower or not.

If you need to use other names for the columns than in CSV file, set your header list explicitly with the wanted names for the database but in the order used in the CSV file.

The above description is about how to get rid of the tail of the row (the slicing of the row variable) and how to fix the order of the values (the template uses names in the CSV order -- or from the CSV header or set explicitly).

It may be the case, you have set the header list explicitly in the correct order, but you need to skip some of the row values (from the middle). You can modify the row before calling the cursor.execute. You can delete some parts of the row using the del command. Let's simulate the row as a list of numbers:
row = list(range(12))
print(row)

Open in new window

It prints
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Open in new window

This is, twelve elements from zero to eleven. The index of the element is the same as the value. When deleting some element, the indices of the rest will change. Because of that, it is easier to delete them from right to the left:
del row[10]
del row[5:8] # from 5 including to 8 excluding
del row[2]
del row[0]
print(row)                                  

Open in new window

Now the row looks this way:
[1, 3, 4, 8, 9, 11]

Open in new window

0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

This article will show the steps for installing Python on Ubuntu Operating System. I have created a virtual machine with Ubuntu Operating system 8.10 and this installing process also works with upgraded version of Ubuntu OS. For installing Py…
A set of related code is known to be a Module, it helps us to organize our code logically which is much easier for us to understand and use it. Module is an object with arbitrarily named attributes which can be used in binding and referencing. …
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now