Help with python script to confront csv file to mysql and let user decide about fields to update

Posted on 2014-04-15
Medium Priority
Last Modified: 2014-04-22
Hello there!

I've passed the last days fighting with this topic without great results since I'm a beginner.

I have a .csv files containing various fields: name, surname, city, etc

I want to prepare a python script that reads this file row by row and searches on the db for records with the same name, surname and city.

If I find such record I would like to make a comparison of all the other fields and allow the user to update or not the database with data contained (for the same record) in the .csv

Here's what I did but I can't get it to work.

Any kind of help is greatly appreciated: code correction, ideas, faux code, analysis...

I am ready to start from scratch (I did it more than once) but I can't understand how I should plan the program to let it work:

import os
import csv
import string
import datetime
from warnings import filterwarnings
import MySQLdb
import MySQLdb as Database
filterwarnings('ignore', category = Database.Warning)

os.system('cls' if os.name == 'nt' else 'clear')
now = datetime.datetime.now()
timestamp = "_"+str(now.day)+"-"+str(now.month)+"-"+str(now.year)+"_"+str(now.hour)+"-"+str(now.minute)+"-"+str(now.second)
#import_fields = {"cognome" : "", "nome" : "", "azienda" : "", "riferimento" : "", "carica" : "", "indirizzo" : "", "cellulare" : "", "cellulare2" : "", "cellulare3" : "", "telefono" : "", "telefono2" : "", "fax" : "", "email" : "", "email2" : "", "skyp" : "", "note" : ""}
import_fields = {}

db_fields = {}
csv_fields = {}

bold_start = "\033[1m"
bold_end = "\033[0;0m"

tot_non_doppioni = 0
tot_doppioni = 0
numero_riga = 0

def fields_check(csv_fields, db_fields):
    global tot_doppioni
    for k in csv_fields:
        csv_value = csv_fields[k]
        db_value = db_fields[k]
        if csv_value == db_value:
        elif csv_value != db_value:
            while True:
                print "The key is:", k
                print "1) Field in csv:", csv_value
                print "2) Field in db: ", db_value
                user_answer = raw_input("Which do you want to keep? [1 o 2]: ")
                if user_answer == "1":
                    import_fields[k] = csv_value
                    print "You dedided to keep the field name:", k,
                    print "with value:", csv_value
                elif user_answer == "2":
                    import_fields[k] = db_value
                    print "You dedided to keep the field name:", k, "with value:", db_value
                    print "Error!"
    return dict(db_fields.items() + import_fields.items())

cognome_import = ""
nome_import = ""
riferimento_import = ""
carica_import = ""
note_import = ""
cellulare_import = ""
cellulare2_import = ""
telefono_import = ""
telefono2_import = ""
fax_import = ""
azienda_import = ""
email_import = ""
indirizzo_import = ""
residenza_import = ""

mydb = MySQLdb.connect(host='localhost',
cursor = mydb.cursor()

csv_check_doppioni = csv.reader(file('doppioni.csv'))

for row in csv_check_doppioni:
    cognome_csv = row[0].lower().rstrip().lstrip()
    nome_csv = row[1].lower().rstrip().lstrip()
    riferimento_csv = row[2].lower().rstrip().lstrip()
    carica_csv = row[3].lower().rstrip().lstrip()
    note_csv = row[4].lower().rstrip().lstrip()
    cellulare_csv = row[5].lower().rstrip().lstrip()
    cellulare2_csv = row[6].lower().rstrip().lstrip()
    telefono_csv = row[7].lower().rstrip().lstrip()
    telefono2_csv = row[8].lower().rstrip().lstrip()
    fax_csv = row[9].lower().rstrip().lstrip()
    azienda_csv = row[10].lower().rstrip().lstrip()
    email_csv = row[11].lower().rstrip().lstrip()
    indirizzo_csv = row[12].lower().rstrip().lstrip()
    residenza_csv = row[13].lower().rstrip().lstrip()
    csv_fields = {"cognome" : cognome_csv, "nome" : nome_csv, "riferimento" : riferimento_csv, "carica" : carica_csv, "note" : note_csv, "cellulare" : cellulare_csv, "cellulare2" : cellulare2_csv, "telefono" : telefono_csv, "telefono2" : telefono2_csv, "fax" : fax_csv, "azienda" : azienda_csv, "email" : email_csv, "indirizzo" : indirizzo_csv}
    numero_riga += 1
    cursor.execute("SELECT * FROM contatti WHERE cognome like %s AND nome like %s", ("%"+cognome_csv+"%", "%"+nome_csv+"%"))
    result = cursor.fetchall()

    if result == ():
        tot_non_doppioni += 1
        with open("da_importare"+timestamp+".csv", "a") as text_file:
            text_file.write(', '.join(row))
        id = result[0][0]
        id_comune = result[0][1]
        cognome_db = result[0][2].lower().rstrip().lstrip()
        nome_db = result[0][3].lower().rstrip().lstrip()
        azienda_db = result[0][4].lower().rstrip().lstrip()
        riferimento_db = result[0][5].lower().rstrip().lstrip()
        carica_db = result[0][6].lower().rstrip().lstrip()
        indirizzo_db = result[0][7].lower().rstrip().lstrip()
        cellulare_db = result[0][8].lower().rstrip().lstrip()
        cellulare2_db = result[0][9].lower().rstrip().lstrip()
        cellulare3_db = result[0][10].lower().rstrip().lstrip()
        telefono_db = result[0][11].lower().rstrip().lstrip()
        telefono2_db = result[0][12].lower().rstrip().lstrip()
        fax_db = result[0][13].lower().rstrip().lstrip()
        email_db = result[0][14].lower().rstrip().lstrip()
        email2_db = result[0][15].lower().rstrip().lstrip()
        skype_db = result[0][16].lower().rstrip().lstrip()
        note_db = result[0][17].lower().rstrip().lstrip()

        db_fields = {"id" : id, "id_comune" : id_comune, "cognome" : cognome_db, "nome" : nome_db, "riferimento" : riferimento_db, "carica" : carica_db, "note" : note_db, "cellulare" : cellulare_db, "cellulare2" : cellulare2_db, "telefono" : telefono_db, "telefono2" : telefono2_db, "fax" : fax_db, "azienda" : azienda_db, "email" : email_db, "indirizzo" : indirizzo_db}

        doppione_corretto = fields_check(csv_fields, db_fields)
        print doppione_corretto
        tot_doppioni += 1


Open in new window

Question by:ltpitt
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 29

Accepted Solution

pepr earned 2000 total points
ID: 40006153
There are several things to fix.

For the timestamp. It may be the case that you cannot choose and you need to follow the string format given to you by someone else. However, it is better to follow the experience of others. When sorting the table by the timestamp column, then sorting by correctly constructed strings mean sorting by date. But it works only when you place year first, month as second, day, hour, minute, and second as the last. All timestamps also have to have the same number of characters. It is usually done by padding single digits by zero from the left. If it is acceptable, you can use the .strftime() method of your now. See below:
>>> import datetime
>>> now = datetime.datetime.now()

Open in new window

Your code produces
>>> timestamp = "_"+str(now.day)+"-"+str(now.month)+"-"+str(now.year)+"_"+str(no
>>> timestamp

Open in new window

If you accept zero padding, you can use the .strftime
>>> now.strftime("_%d-%m-%Y_%H-%M-%S")

Open in new window

There even is a special function to get the ISO format of the timestamp (as a string). Notice the T is recommended to separate the date and time:
>>> now.isoformat()

Open in new window

But you may not want the fraction of seconds. You can use the strftime:
>>> now.strftime("%Y-%m-%dT%H:%M:%S")

Open in new window

Sometimes more technical (dense) format is used:
>>> now.strftime("%Y%m%dT%H%M%S")

Open in new window

Now for the following line:
csv_check_doppioni = csv.reader(file('doppioni.csv'))

Open in new window

It is tempting to use file as the file-object constructor (especially if you know say C++). However, the open function was always and is recommended. There is at least one serious reason. Python 3 does not have the file class at all.

Secondly, Python 2 documentation says that you have to open the file in binary mode when used by the csv module. Thirdly, the open file should always be closed. You know the with construct (as you show below) that ensures closing the object created using the with construct. Then the way to open CSV file should be:
with open('doppioni.csv', 'rb') as csv_input_file, \
        open("da_importare"+timestamp+".csv", "ab") as csv_output_file:
    reader = csv.reader(csv_input_file)
    writer = csv.writer(csv_output file)

    for row in reader:
        row = [element.strip() for element in row]   # strip all the elements (instead of .rstrip().lstrip()

Open in new window

Now for construction of the csv_fields dictionary. You know the column names. I will simulate the row containing the row where elements are strings with leading and trailing spaces like this:
>>> row = ['  ' + str(e) + '  ' for e in range(14)]
>>> row
['  0  ', '  1  ', '  2  ', '  3  ', '  4  ', '  5  ', '  6  ', '  7  ', '  8  '
, '  9  ', '  10  ', '  11  ', '  12  ', '  13  ']

Open in new window

You can define outside the loop:
>>> columns = ["cognome", "nome", "riferimento", "carica", "note", "cellulare",
"cellulare2", "telefono", "telefono2", "fax", "azienda", "email", "indirizzo"]

Open in new window

You can put the two togeter using the standard zip function it returns a list of tuples where first element of the tuple is from the first list, the second element is from second list:
>>> zip(columns, row)
[('cognome', '  0  '), ('nome', '  1  '), ('riferimento', '  2  '), ('carica', '
  3  '), ('note', '  4  '), ('cellulare', '  5  '), ('cellulare2', '  6  '), ('t
elefono', '  7  '), ('telefono2', '  8  '), ('fax', '  9  '), ('azienda', '  10
 '), ('email', '  11  '), ('indirizzo', '  12  ')]

Open in new window

If you want to construct the dictionary with stripped values, you can use a dictionary comprehension (called also dictionary display):
>>> csv_fields = {t[0]:t[1].strip() for t in zip(columns, row)}
>>> csv_fields
{'fax': '9', 'indirizzo': '12', 'riferimento': '2', 'nome': '1', 'cognome': '0',
 'telefono2': '8', 'note': '4', 'azienda': '10', 'carica': '3', 'cellulare': '5'
, 'telefono': '7', 'email': '11', 'cellulare2': '6'}

Open in new window

The %s in the SQL after the LIKE should probably be wrapped in quotes.

Author Closing Comment

ID: 40014245
Thank you for your precious time and priceless skill

Featured Post

On Demand Webinar: Networking for the Cloud Era

Ready to improve network connectivity? Watch this webinar to learn how SD-WANs and a one-click instant connect tool can boost provisions, deployment, and management of your cloud connection.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A set of related code is known to be a Module, it helps us to organize our code logically which is much easier for us to understand and use it. Module is an object with arbitrarily named attributes which can be used in binding and referencing. …
Here I am using Python IDLE(GUI) to write a simple program and save it, so that we can just execute it in future. Because when we write any program and exit from Python then program that we have written will be lost. So for not losing our program we…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…
Suggested Courses
Course of the Month12 days, 12 hours left to enroll

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question