ocean O (United States of America) asked:

save dataframe to a csv file in python

Hi, I have a question about Python. I used cursor.fetchmany() to fetch the data; it fetches in multiple batches depending on how many rows there are. I want to generate one CSV file after all the fetching is done. How can I do it? Code is below. Thanks.

    datafile = pd.DataFrame([])

    while True:
        results = cursor.fetchmany(7000)

        if not results:
            break

        col_Names = ['market', 'site_id', 'latitude', 'longitude', 'state', 'zip', 'structure_type']
        data = pd.DataFrame(results)
        data.to_csv(filename, index=False, header=col_Names)  # This is the first time fetching
        # how can I append the next fetching data to the csv file?

    cursor.close()

Ryan Chong (Singapore):

Have you heard about the to_csv method?

pandas.DataFrame.to_csv
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas-dataframe-to-csv

some examples:

Saving a Pandas Dataframe as a CSV
https://www.geeksforgeeks.org/saving-a-pandas-dataframe-as-a-csv/

How to Export Pandas DataFrame to a CSV File
https://datatofish.com/export-dataframe-to-csv/
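
For instance, a minimal call could look like this (the column names are taken from your snippet; the rows and the sites.csv name are made up for illustration):

    import pandas as pd

    # Made-up rows standing in for what cursor.fetchmany() returns.
    col_Names = ['market', 'site_id', 'latitude', 'longitude', 'state', 'zip', 'structure_type']
    rows = [('Dallas', 101, 32.78, -96.80, 'TX', '75201', 'tower'),
            ('Austin', 102, 30.27, -97.74, 'TX', '78701', 'rooftop')]

    df = pd.DataFrame(rows, columns=col_Names)

    # index=False drops the row-index column; the header is written from df's columns.
    df.to_csv('sites.csv', index=False)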
David Favor (United States of America):

Just open the .csv file outside your loop, write to it as desired, and close it when your SQL looping is complete.
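
A sketch of that pattern with the csv module (the export.csv name is a placeholder, and cursor is assumed to already hold the executed query):

    import csv

    col_Names = ['market', 'site_id', 'latitude', 'longitude', 'state', 'zip', 'structure_type']

    # Open the file once, before the fetch loop starts.
    with open('export.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(col_Names)        # header exactly once

        while True:
            results = cursor.fetchmany(5000)
            if not results:
                break
            writer.writerows(results)     # each batch lands after the previous one

    cursor.close()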
ocean O (Asker):

I can't put all the data in one dataframe, since the data is too big: about 7,241,764 rows. If I fetch 5,000 at a time, it will take over 1,400 fetches. So I am thinking that each time I fetch 5,000 rows I write them to a CSV file, then append the CSV from the next fetch to the previous CSV, and so on. But I am not sure how to append CSV files in Python.

    if execution_type == _READ:
        i = 0
        while True:
            results = cursor.fetchmany(5000)

            if not results:
                break

            if i == 0:
                data = pd.DataFrame(results)
                data.to_csv(filename, index=False, header=col_Names)

            if i >= 1:
                data = pd.DataFrame(results)
                data.to_csv(filename1, index=False)
                append = filename.join(filename1)  # How to append csv file filename1 to csv file filename???
            i += 1

        cursor.close()
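
For reference, the literal file-to-file append can be done with plain file I/O; a sketch, reusing filename and filename1 from the snippet above (it assumes filename1 starts with a one-line header to skip):

    # Append filename1's data rows onto filename, skipping filename1's header line.
    with open(filename, 'a') as out, open(filename1) as src:
        next(src)              # skip the header row of the second file
        for line in src:
            out.write(line)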
Ryan Chong (Singapore):

"the data is too big: about 7,241,764 rows. If I fetch 5,000 at a time, it will take over 1,400 fetches."

What database are you using? If we are talking about exporting 7.2 million records, we probably need a better approach here.
ASKER CERTIFIED SOLUTION
Louis LIETAER (France):

[accepted solution text is available to members only]
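
The accepted answer itself is members-only, but for reference, the usual pandas pattern for this task is to append each batch with mode='a' and write the header only on the first batch; a sketch reusing filename, col_Names, and cursor from the question:

    first = True
    while True:
        results = cursor.fetchmany(5000)
        if not results:
            break

        data = pd.DataFrame(results)
        # 'w' with the header on the first batch, then 'a' with no header:
        # every batch ends up in one CSV file with a single header row.
        data.to_csv(filename, index=False,
                    mode='w' if first else 'a',
                    header=col_Names if first else False)
        first = False

    cursor.close()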
ocean O (Asker):

Ryan:
We are using PostgreSQL.
Ryan Chong (Singapore):

@ocean O

Sorry, I haven't used PostgreSQL much, so I can't really give advice on that.

Perhaps, with such a volume of data, some off-the-shelf software plus RPA would be a more reliable solution for you?

dbForge Studio for PostgreSQL
https://www.devart.com/dbforge/postgresql/studio/postgresql-migration.html
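
For what it's worth, PostgreSQL itself can stream a whole query result to CSV with COPY, which skips the fetchmany() loop entirely; a sketch using psycopg2's copy_expert (the connection details and the sites table name are placeholders):

    import psycopg2

    # Placeholder connection details.
    conn = psycopg2.connect(dbname='mydb', user='me')

    query = """
        COPY (SELECT market, site_id, latitude, longitude, state, zip, structure_type
              FROM sites)
        TO STDOUT WITH (FORMAT csv, HEADER)
    """

    # copy_expert streams the rows straight into the file object,
    # so the 7.2 million records never pass through fetchmany().
    with conn.cursor() as cur, open('export.csv', 'w') as f:
        cur.copy_expert(query, f)

    conn.close()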