ocean O
asked on
Python SQL
Hi, I have a question about Python.
I have a line of code in a .py file: results = cursor.fetchall()
But I need to change it to retrieve the rows in chunks (e.g. 5000 rows) instead of using "fetchall".
For example: rows = cursor.fetchmany(5000)
How can I do it?
Thanks
cursor.fetchmany(size) is probably what you want.
fetchmany should do what you want. Isn't it working?
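For example, a minimal sketch of the fetchmany loop, assuming a standard DB-API cursor (e.g. psycopg2) and a placeholder process() function for whatever you do with each row:

CHUNK_SIZE = 5000                        # fetch rows in batches instead of all at once

cursor.execute(sql)
while True:
    rows = cursor.fetchmany(CHUNK_SIZE)
    if not rows:
        break                            # no rows left
    for row in rows:
        process(row)                     # placeholder for per-row handling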
ASKER
Thanks for reply!
Actually I need to do something like this: if there are 11000 records in the DB, I need to fetch three times, the first two times 5000 rows each, the third time 1000 rows, and put all 11000 rows into one CSV file. How can I do it? Thanks
Why not fetch all the records in one go?
Are you processing them before putting the data in the CSV file?
I need to fetch three times, the first two times 5000 rows, the third time 1000 rows, and put all 11000 rows into one CSV file. How can I do it?
What's the time gap between these 3 fetches? You also haven't mentioned how frequently you want to do this.
If there's no time gap, then try fetching all the records in one go, as mentioned.
If there's a time gap, perhaps you can save the output in 3 separate CSVs and then combine them in your final step; a sketch of that combining step is below.
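If you go the separate-CSV route, combining them at the end is straightforward. A minimal sketch, assuming hypothetical file names part1.csv, part2.csv, part3.csv and that each file starts with the same header row:

import csv

parts = ["part1.csv", "part2.csv", "part3.csv"]    # hypothetical chunk files
with open("combined.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for i, name in enumerate(parts):
        with open(name, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            if i == 0:
                writer.writerow(header)            # write the header only once
            writer.writerows(reader)               # append the remaining data rows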
ASKER
Thanks for the reply.
I am calling a PostgreSQL DB from Python. The data in some tables is very big, which can affect performance, so I am thinking of using something like a batched cursor fetch: fetch three times instead of one time, then combine all the data and return one CSV file.
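What I have in mind is roughly the sketch below; this is just a guess on my side, with made-up connection details, table name, and output path, using psycopg2:

import csv
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")     # made-up connection string
cursor = conn.cursor()
cursor.execute("SELECT * FROM big_table")          # made-up query

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])   # header from column names
    while True:
        rows = cursor.fetchmany(5000)              # fetch in batches of 5000
        if not rows:
            break
        writer.writerows(rows)                     # append this batch to the same CSV

cursor.close()
conn.close()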
Thanks
I don't really have hands-on experience with a PostgreSQL + Python implementation, but to fetch a large amount of data from a very large table, perhaps you could try to:
1. Replicate the data that needs to be exported into another temporary table.
2. Build a FIFO process model: while your ETL (or extraction Python code) is running, mark the records that have been exported.
3. When a certain number of records have been exported, combine the CSVs into a single CSV.
4. Delete the marked records from the temporary table when necessary.
In short, you may need more than one Python script to handle these processes; a very rough sketch of the marking idea follows.
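A very rough sketch of that idea, assuming psycopg2 and hypothetical names (a source table big_table with an id column, a working table export_tmp, and an exported flag); it only illustrates the mark-as-exported approach, not a tested implementation:

import csv
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")     # hypothetical connection string
conn.autocommit = True
cur = conn.cursor()

# 1. copy the rows to be exported into a working table with an "exported" flag
cur.execute("CREATE TABLE export_tmp AS SELECT *, FALSE AS exported FROM big_table")

batch_no = 0
while True:
    # 2. grab the next unexported batch and mark it as exported in one statement
    cur.execute("""
        UPDATE export_tmp
           SET exported = TRUE
         WHERE id IN (SELECT id FROM export_tmp WHERE NOT exported LIMIT 5000)
        RETURNING *
    """)
    rows = cur.fetchall()
    if not rows:
        break
    # 3. write this batch to its own CSV (combine them later as a final step)
    with open("part%d.csv" % batch_no, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    batch_no += 1

# 4. drop the working table once everything has been exported
cur.execute("DROP TABLE export_tmp")
cur.close()
conn.close()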
ASKER CERTIFIED SOLUTION
ASKER
Hi, Norie:
Thanks for the suggestion.
Do you have an example that fetches the data 3 times and then puts it into a dataframe?
Thanks
ASKER
I wrote a function like the one below, but I am not sure how to put all the data into one dataframe. For example, if it fetches 3 times, how can I append all the data into one dataframe? Thanks
import pandas as pd

def db_execute(sql, execution_type):
    data = pd.DataFrame()
    conn = get_connection_handle()
    try:
        conn.autocommit = True
        cursor = conn.cursor()
        cursor.execute(sql)
        if execution_type == _READ:
            cols = [c[0] for c in cursor.description]     # column names for the dataframe
            chunks = []                                    # collect each fetched batch here
            while True:
                results = cursor.fetchmany(4000)           # fetch the next batch of rows
                if not results:
                    break                                  # no more rows to fetch
                chunks.append(pd.DataFrame(results, columns=cols))
            if chunks:
                data = pd.concat(chunks, ignore_index=True)   # combine all batches into one dataframe
        cursor.close()
    finally:
        close_connection_handle(conn)                      # always release the connection
    return data
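Assuming the helpers (get_connection_handle, close_connection_handle, _READ) are defined elsewhere in your file, usage would then look something like this, with a placeholder query and output path:

df = db_execute("SELECT * FROM big_table", _READ)   # placeholder query
df.to_csv("output.csv", index=False)                # all fetched batches end up in one CSV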