Wm Allen Smith
asked on
Using Python to Loop through a folder and print specified sheet(s) in excel workbook
Hi,
I am attempting to use python to loop through a specified folder (sales_folder monthlysales*.xlsx) and pull column data (salesTot) from a named tab (currentmonth) in within the spreadsheet. I can get this
to work with one file, but not with looping through the folder. Th ecode for one file:
import openpyxl
import pandas as pd
pd.set_option('display.max _rows', 500)
pd.set_option('display.max _columns', 25)
pd.set_option('display.wid th', 200)
from pandas import ExcelWriter
from pandas import ExcelFile
d1 = pd.read_excel("C:\\sales_f older\ monthlysales", sheet_name= "currentmonth", index=range(5), columns=list('abc')
)
print(d1)
d1.columns
print(d1.columns)
df1=pd.DataFrame(data=d1)
print(d1)
d1_transposed=d1.T
print(d1_transposed)
d1_transposed.to_csv('C:\\ Folder\\Sales.csv')
I have tried to use glob but I cannot get the loop to run through the folder, find the tab and print the output to a csv. Please advise.
I am attempting to use python to loop through a specified folder (sales_folder monthlysales*.xlsx) and pull column data (salesTot) from a named tab (currentmonth) in within the spreadsheet. I can get this
to work with one file, but not with looping through the folder. Th ecode for one file:
import openpyxl
import pandas as pd
pd.set_option('display.max
pd.set_option('display.max
pd.set_option('display.wid
from pandas import ExcelWriter
from pandas import ExcelFile
d1 = pd.read_excel("C:\\sales_f
)
print(d1)
d1.columns
print(d1.columns)
df1=pd.DataFrame(data=d1)
print(d1)
d1_transposed=d1.T
print(d1_transposed)
d1_transposed.to_csv('C:\\
I have tried to use glob but I cannot get the loop to run through the folder, find the tab and print the output to a csv. Please advise.
ASKER
Norie,
Great! This first step worked. Here is the tricky part--The actual data is laid out with each field as a row and the value of the file in in the neighboring column:
Col1 Col 2
Fielid1 Value1
Field2 Value2....
I need to get his data into a dataframe, transpose it so that a standard table format and save it to a csv file.
Please advise.
Great! This first step worked. Here is the tricky part--The actual data is laid out with each field as a row and the value of the file in in the neighboring column:
Col1 Col 2
Fielid1 Value1
Field2 Value2....
I need to get his data into a dataframe, transpose it so that a standard table format and save it to a csv file.
Please advise.
Where is the data actually located on the sheet?
How do you want the it all combined in the output file?
How do you want the it all combined in the output file?
ASKER
Filenames are in Column A, Values in Column B.
Thanks
Thanks
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Norrie, this worked quite well! Thank you for your input.
Another thing I would like to do is in the error handling, if there is a problem file, move it to another folder. I was using the shutil module, but that moved all of the files to another destination folder, not just the ones that triggered an error:
try:
# read data from sales file
df = pd.read_excel(filename, sheet_name= 'Data', header=None)
# transpose data
df = df.T
# use first row as headers
df.columns = df.iloc[0]
# remove first row
df = df[1:]
# append to main dataframe
dfAllSales = dfAllSales.append(df)
except:
print(f'Problem reading from file {filename}')
source = 'C:\\\Documents\\Updatedat aTestFolde r\\'
dest1 ='C:\\Documents\\Updatedat aTestFolde r\\badfold er'
files = os.listdir(source)
for f in files:
print(f'Problem reading from file {filename}')
shutil.move(source+f, dest1)
print (filename)
Please advise.
Another thing I would like to do is in the error handling, if there is a problem file, move it to another folder. I was using the shutil module, but that moved all of the files to another destination folder, not just the ones that triggered an error:
try:
# read data from sales file
df = pd.read_excel(filename, sheet_name= 'Data', header=None)
# transpose data
df = df.T
# use first row as headers
df.columns = df.iloc[0]
# remove first row
df = df[1:]
# append to main dataframe
dfAllSales = dfAllSales.append(df)
except:
print(f'Problem reading from file {filename}')
source = 'C:\\\Documents\\Updatedat
dest1 ='C:\\Documents\\Updatedat
files = os.listdir(source)
for f in files:
print(f'Problem reading from file {filename}')
shutil.move(source+f, dest1)
print (filename)
Please advise.
You can use shutil to copy/move a single file.
import glob
import os
import shutil
import pandas as pd
dfAllSales = pd.DataFrame()
pathBadFiles = 'bad_files'
pathGoodFiles = 'good_files'
pathSales = 'sales_folder'
for filepath in glob.glob(f'C:/{pathSales}/monthlysales*.xls*'):
filename = os.path.split(filepath)[1]
try:
# read data from sales file
df = pd.read_excel(filepath, sheet_name= 'currentmonth', header=None)
# transpose
df = df.T
# use first row as headers
df.columns = df.iloc[0]
# remove first row
df = df[1:]
# append to main dataframe
dfAllSales = dfAllSales.append(df)
# copy file to good_files folder
shutil.copy(filepath, os.path.join(f'c:/{pathSales}/{pathGoodFiles}', filename))
except:
print(f'Problem reading from file {filename}')
# move file to bad_files folder
shutil.move(filepath, os.path.join(f'c:/{pathSales}/{pathBadFiles}', filename))
print(dfAllSales)
# save combined data to CSV
dfAllSales.to_csv('C:/sales_folder/Sales.csv', index=None)
ASKER
I was not able to get this version with shutil to work, only returned empty data sets, even when I hard coded the file paths.
Do you mean it didn't work at all?
The only thing that changed was the copying/moving of the files and that shouldn't affect the original code that was getting the data.
The only thing that changed was the copying/moving of the files and that shouldn't affect the original code that was getting the data.
ASKER
Question: You are using a variable for the Path Sales in the "for filepath in glob.glob(f'C:/{pathSales} /monthlysa les*.xls*' ):"
statement. I am a supplying a path value for the pathSales i.e. pathSales = 'N:/Documents_Main/ExcelFi les'. When referencing the path in the "for" statement, you have "for filepath in glob.glob(f'C:/{pathSales} /monthlysa les*.xls*' ):", I am able to get this to work when I replace the C:/{pathSales}/monthlysale s*.xls*' with the actual path--which I though te pathSales variable was doing. Is this an incorrect assumption?
statement. I am a supplying a path value for the pathSales i.e. pathSales = 'N:/Documents_Main/ExcelFi
In the code I posted the variable pathSales refers to the string 'sales_folder', i.e. only the folder name.
In this code {pathSales} is replaced with the value it refers to.
So, without the variable the above would look like this.
If you have pathSales referring to the full path, including the drive letter you could try this.
In this code {pathSales} is replaced with the value it refers to.
glob.glob(f'C:/{pathSales}/monthlysales*.xls*')
So, without the variable the above would look like this.
glob.glob(f'C:/sales_folder/monthlysales*.xls*')
If you have pathSales referring to the full path, including the drive letter you could try this.
glob.glob(f'{pathSales}/monthlysales*.xls*')
ASKER
I got it to work without using the variable. thank you so much for your assistance!! :)
For each file it finds it'll then try to read data from a sheet named 'currentmonth' and print it out
Open in new window
If you want to collect the data from all the files in the folder and output them in a separate file you'll need to give more details.