Brent Guttmann
asked on
PYTHON create word table performing slow
Hello, I am trying to populate a table in MSWord from Python - I am currently using the below but it is EXTREMELY slow... ive got liek 500 lines to output and its take about 1 line every 3 seconds. Can anyone help with what I am doing wrong?
What I am doing is taking a list I have created from JSON response and populating the a word template with the info. I need to use the table columns because I need the column widths to remain the same --> then I will be printing to pdf
What I am doing is taking a list I have created from JSON response and populating the a word template with the info. I need to use the table columns because I need the column widths to remain the same --> then I will be printing to pdf
from docx import Document
from docx.shared import Inches
document = Document('c:\\pythondev\\Invoice_Template.docx')
table = document.add_table(rows=woncount, cols=6)
r = 0
for row,line in enumerate(W):
try:
c = 0
for col, ct in enumerate(line):
cell = table.cell(r, c)
try:
cell.text = ct
except:
cell.text = str(ct)
c = c + 1
r = r + 1
except:
break
document.save("c:\\pythondev\\testworddoc.docx")
ASKER
Okay, before I go and change all of the data (which is confidential) to try and give you a full working example -- do you think its possible for you to give me an example of how to print a [list] to columns within a word template? (I need to use the template because I have an image as a header, footer, and page numbers)
this may be the quickest way to resolve this -- if not, I can go ahead and figure something out to give you a working example, without the sensitive info
the columns would be coming from the [list] which is in the format below... i have attached an example of 3
row = 'Org','ID','Title','Addres s','Type', 'Date','Amount'
data.txt
this may be the quickest way to resolve this -- if not, I can go ahead and figure something out to give you a working example, without the sensitive info
the columns would be coming from the [list] which is in the format below... i have attached an example of 3
row = 'Org','ID','Title','Addres
data.txt
Is there an existing table in the template for the list/data?
ASKER
Nope, I thought it would be easier to add it each time
ASKER
I am also open to other ways to create the pdf output -- so long as I can add images for the header and have a formatted table
There are, or were, various python libraries for creating PDFs.
This one seems to be mentioned a lot, ReportLab and the user guide has sections on tables and images.
This one seems to be mentioned a lot, ReportLab and the user guide has sections on tables and images.
ASKER
Yes I tried with reportlab but was unable to get it working. Also, I am using python 36
What couldn't you get working?
The library itself or creating the PDF you want.?
The library itself or creating the PDF you want.?
ASKER
the library
ASKER
sorry, i take that back - I couldnt figure out how to generate the table dynamically since the number of rows is always changing
ASKER
here is the code I have to generate the [[lists]
# -*- coding: utf-8 -*-
from money import Money
from docx import Document
from docx.shared import Inches
import json, csv, sys
from urllib.request import urlopen
import xlsxwriter
from datetime import datetime
import re
import decimal
document = Document('c:\\pythondev\\Invoice_Template.docx')
api_key = 'api key'
workbook = xlsxwriter.Workbook("c:\\PythonDev\\Output\\SuppData.xlsx")
worksheet = workbook.add_worksheet('SuppData')
url = 'api site' + api_key
json_obj = urlopen(url)
data = json.load(json_obj)
orgnamescount = 0
suppcount = 0
woncount = 0
S = []
orgz = []
orgnames = []
row = 'Org','DealId','Title','Address','Date Won','Old Rcv','New Rcv','Gain %'
S.append(row)
for item in data['data']:
orgItem = item['org_id']
org = orgItem['name']
if org not in orgnames:
orgadd = org,orgItem['address']
orgz.append(orgadd)
orgnames.append(org)
orgnamescount = orgnamescount + 1
dealid = item['id']
wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
fromdate = datetime.strptime('2018-06-08 00:00:00', "%Y-%m-%d %H:%M:%S")
if wontime < fromdate:
break
try:
title = item['title'].split('(')
title = title[0].rstrip()
except:
title = 'error'
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
except:
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')
except:
print('------------ ERROR: ' + item)
propaddress = 'error'
propaddress = propaddress.replace(title, '').strip()
try:
newrcvval = float(decimal.Decimal(item['ebfcdf208f26bcfcc8f213dbe5f0cfe93d4421b5']))
except:
newrcvval = 'error'
try:
oldrcvval = float(decimal.Decimal(item['40fecd575124af5ff0d70175fc83203b2accbc3d']))
except:
oldrcvval = 'error'
try:
val1 = item['formatted_value']
except:
val1 = 'error'
try:
orgname = item['org_name']
except:
orgname = 'error'
try:
val = '{:.2%}'.format(1 - (oldrcvval / newrcvval))
except:
val = '0.00%'
try:
newrcvval2 = "${0:,.2f}".format(newrcvval)
except:
newrcvval2 = "$0.00"
try:
oldrcvval2 = "${:,.2f}".format(oldrcvval)
except:
oldrcvval2 = "$0.00"
row = org,dealid,title,propaddress,datetime.date(wontime),oldrcvval2,newrcvval2,val
print(row)
S.append(row)
suppcount = suppcount + 1
for row, line in enumerate(S):
for col, cell in enumerate(line):
worksheet.write(row, col, cell)
formatdict = {'num_format':'mm/dd/yy'}
fmt = workbook.add_format(formatdict)
worksheet.set_column('E:E', None, fmt)
workbook.close()
workbook = xlsxwriter.Workbook("c:\\PythonDev\\Output\\WonData.xlsx")
worksheet = workbook.add_worksheet('WonData')
url = 'url' + api_key
json_obj2 = urlopen(url)
data = json.load(json_obj2)
W = []
row = 'Org','ID','Title','Address','Type', 'Date','Amount'
W.append(row)
for item in data['data']:
try:
orgname = item['org_name']
if org not in orgnames:
orgnames.append(org)
if orgname not in [Company 1','Company 2','Company 3','Company 4','Company 5','Company 6'']:
amt = '$50.00'
else:
amt = '$40.00'
except:
orgname = 'error'
amt = '$00.00'
dealid = item['id']
pipeline = float(item['pipeline_id'])
pipeline2 = ''
if pipeline in [1]:
pipeline2 = 'Estimate'
elif pipeline in [2]:
pipeline2 = 'Supplement'
elif pipeline in [3]:
pipeline2 = 'Depreciation Release'
wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
fromdate = datetime.strptime('2018-05-01 00:00:00', "%Y-%m-%d %H:%M:%S")
if wontime < fromdate:
break
try:
title = item['title'].split('(')
title = title[0].rstrip()
except:
title = 'error'
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
except:
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')
except:
print('------------ ERROR: ' + item)
propaddress = 'error'
propaddress = propaddress.replace(title, '').strip()
row = org,dealid,title,propaddress,pipeline2,datetime.date(wontime),amt
print(row)
W.append(row)
woncount = woncount + 1
for row, line in enumerate(W):
for col, cell in enumerate(line):
worksheet.write(row, col, cell)
formatdict = {'num_format':'mm/dd/yy'}
fmt = workbook.add_format(formatdict)
worksheet.set_column('F:F', None, fmt)
workbook.close()
url = 'url' + api_key
json_obj3 = urlopen(url)
data = json.load(json_obj3)
O = []
workbook = xlsxwriter.Workbook("c:\\PythonDev\\Output\\OrgData.xlsx")
worksheet = workbook.add_worksheet('OrgData')
for item in data['data']:
ocheck = item['name']
if any( ocheck in s for s in orgnames):
try:
phone = item['8347789818807f91476bbde9b589444c3bf665e0']
except:
phone = ''
try:
email = item['6eba5596981713615b6425b7c3dbcfa9fe0531a5']
except:
email = ''
try:
addr = item['address']
except:
addr = ''
row = ocheck,phone,email,addr
print(row)
O.append(row)
for row, line in enumerate(O):
for col, cell in enumerate(line):
worksheet.write(row, col, cell)
#print('Total Organizations = {}'.format(orgnamescount))
#print('Total Supplements = {}'.format(suppcount))
#print('Total Won = {}'.format(woncount))
#for org in enumerate(orgnames):
#print('Total Won for {} = '.format(W[1].count(org)))
#print('Total Supp for {} = '.format(S[1].count(org)))
#print('Total for {} = '.format(W[1].count(org) + S[1].count(org)))
# add table ------------------
table = document.add_table(rows=woncount, cols=6)
r = 0
for row,line in enumerate(W):
try:
c = 0
for col, ct in enumerate(line):
cell = table.cell(r, c)
try:
cell.text = ct
except:
cell.text = str(ct)
c = c + 1
r = r + 1
except:
break
document.save("c:\\pythondev\\testworddoc.docx")
I can't find anything indicating that ReportLab isn't available for Python 3.6 or reports of people having trouble installing it on that version of Python.
How did you try installing it?
Here's a link to a page that has a download for Python 3.6 - https://pypi.org/project/reportlab/#files
How did you try installing it?
Here's a link to a page that has a download for Python 3.6 - https://pypi.org/project/reportlab/#files
ASKER
yes sorry i mispoke - I was having difficulty creating the table and reverted to word as I thought it would be simpler
ASKER
Let me simplify - how could I populate a table like the example below, with the data from my list [[W] ?
#!/usr/local/bin/python
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("test_report_lab.pdf", pagesize=A4, rightMargin=30,leftMargin=30, topMargin=30,bottomMargin=18)
doc.pagesize = landscape(A4)
elements = []
data = [
["Letter", "Number", "Stuff", "Long stuff that should be wrapped"],
["A", "01", "ABCD", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],
["B", "02", "CDEF", "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"],
["C", "03", "SDFSDF", "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"],
["D", "04", "SDFSDF", "DDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD"],
["E", "05", "GHJGHJGHJ", "EEEEEEEEEEEEEE EEEEEEEEEEEEEEEEE EEEEEEEEEEEEEEEEEEEE"],
]
#TODO: Get this line right instead of just copying it from the docs
style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
('TEXTCOLOR',(1,1),(-2,-2),colors.red),
('VALIGN',(0,0),(0,-1),'TOP'),
('TEXTCOLOR',(0,0),(0,-1),colors.blue),
('ALIGN',(0,-1),(-1,-1),'CENTER'),
('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
('BOX', (0,0), (-1,-1), 0.25, colors.black),
])
#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'
data2 = [[Paragraph(cell, s) for cell in row] for row in data]
t=Table(data2)
t.setStyle(style)
#Send the data and build the file
elements.append(t)
doc.build(elements)
ASKER
Going to assume no one knows and make a macro i guess
@Brent what I (we) need is not a full working example.
It as an example, that is complete enough to reproduce the described behavior (for solving the speed problem) or an example file with sample values and a description (screen shot) of the expected result.
If you want to follow the docx path that would be a python script and a word template file, that will allow to reproduce the very slow table creation that you talk about.
If you want to follow the report lab path it would be a small code snippet, that populates W with some example data and a screen shot or a good description of the expected result (the resulting table)
It as an example, that is complete enough to reproduce the described behavior (for solving the speed problem) or an example file with sample values and a description (screen shot) of the expected result.
If you want to follow the docx path that would be a python script and a word template file, that will allow to reproduce the very slow table creation that you talk about.
If you want to follow the report lab path it would be a small code snippet, that populates W with some example data and a screen shot or a good description of the expected result (the resulting table)
ASKER
Okay, using the reportlab code I provided, I am just trying to use a list (lets say a list generated from the data I provided in the attachment) to populate a table. The number of columns would be 7 and the column titles would be 'Org','ID','Title','Addres s','Type', 'Date','Amount'. The number of lines would be variable (which is the core of my question)
ASKER
data2.xlsxOkay, so piggy backing on the above -- using reportlab -- attached is example data on sheet "data" -- which would be contained within the [list] and then sheet "output" shows the desired output -- which would be in pdf format
It seems, that basically you want to replace the contents of the variable data with the contents of W.
Below I declare a function, that takes a list, prefixes it with the title row and creates a table.
then I populate W (this has obviously to be replaced with your code generating the data)
if the data is already in an .xlsx file, then we 'just' had to write a reader extracting this data
and call the function
Below I declare a function, that takes a list, prefixes it with the title row and creates a table.
then I populate W (this has obviously to be replaced with your code generating the data)
if the data is already in an .xlsx file, then we 'just' had to write a reader extracting this data
and call the function
#!/usr/local/bin/python
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
def mk_report_from_W(W, report_name):
""" function taking a list and a file name and creating a table with predefined columns
"""
doc = SimpleDocTemplate("test_report_lab.pdf", pagesize=A4, rightMargin=30,
leftMargin=30, topMargin=30,bottomMargin=18)
doc.pagesize = landscape(A4)
elements = []
title = ['Org', 'ID', 'Title', 'Address', 'City', 'Date', 'Amount1', 'Amount2', 'Percentage']
data = [ title ] + W
#TODO: Get this line right instead of just copying it from the docs
style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
('TEXTCOLOR',(1,1),(-2,-2),colors.red),
('VALIGN',(0,0),(0,-1),'TOP'),
('TEXTCOLOR',(0,0),(0,-1),colors.blue),
('ALIGN',(0,-1),(-1,-1),'CENTER'),
('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
('BOX', (0,0), (-1,-1), 0.25, colors.black),
])
#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'
data2 = [[Paragraph(cell, s) for cell in row] for row in data]
t=Table(data2)
t.setStyle(style)
#Send the data and build the file
elements.append(t)
doc.build(elements)
# Next line has to be replaced with whatever code generates your W data
W = [ ['Company Name1', '4306', ' Name1', ' 123 Address1', 'City',
'6/11/2018', '$12,157.47', '$21,890.32', '44.46%'],
['Company Name2', '4553', ' Name2', ' 345 Address2', 'DeBary',
'6/11/2018', '$11,998.00', '$13,332.00', '10.08%'],
['Company Name3', '5470', ' Name3', ' 3125 Address3 Long Name', 'Orlando',
'6/11/2018', '$14,165.27', '$15,772.28', '10.19%'],
]
# Now call our function with the right data
mk_report_from_W(W, "test_report_lab.pdf")
ASKER
So the W part here is what I am confused about. I understand how to write it if I know what the data will be like in the example you provided, what I need help with is writing the script to produce that W variable there regardless of how many rows are in the list, or the info in the list, as it will constantly be changing. How do I get the list [W] into the format for the variable W that you have there?
ASKER
error I get when using - which makes sense...
File "c:\users\bgutt\appdata\local\programs\python\python36-32\lib\site-packages\reportlab\platypus\paragraph.py", line 65, in split
return [uword for uword in (_wsc_re_split(text) if delim is None and u'\xa0' in text else text.split(delim))]
AttributeError: 'int' object has no attribute 'split'
ASKER
Was thinking something like this.... but its giving me this error..
#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'
data2 = [[Paragraph(cell, s) for cell in row] for row in data]
t=Table(data2)
t.setStyle(style)
#Send the data and build the file
elements.append(t)
doc.build(elements)
ii = '['
for ll,i in enumerate(W):
ii = ii,'['
d = i[1]
e = i[2]
f = i[3]
ii = ii,d,e,f,']',
ii = ii,']'
#print(ii)
mk_report_from_W(ii, "test_report_lab.pdf")
File "c:\PythonDev\testnew.py", line 223, in mk_report_from_W
data = [ title ] + ii
TypeError: can only concatenate list (not "tuple") to list
Brent
You could use pandas to read the data from the Excel file into a dataframe, add the appropriate headings and then convert to to a list.
That might look something like this - unable to test right now.:)
You could use pandas to read the data from the Excel file into a dataframe, add the appropriate headings and then convert to to a list.
That might look something like this - unable to test right now.:)
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('data2.xlsx', sheetname='data', header=0)
df.columns=['Org','ID','Title','Address','City','Date','Amount1','Amount2','Percentage']
W = df.values.tolist()
ASKER
I was trying to not have to output the data to excel. You can see in the code that the list is built based on JSON returns. I provided the excel sheet so that the data was shown before and after
for item in data['data']:
try:
orgname = item['org_name']
if org not in orgnames:
orgnames.append(org)
if orgname not in [Company 1','Company 2','Company 3','Company 4','Company 5','Company 6'']:
amt = '$50.00'
else:
amt = '$40.00'
except:
orgname = 'error'
amt = '$00.00'
dealid = item['id']
pipeline = float(item['pipeline_id'])
pipeline2 = ''
if pipeline in [1]:
pipeline2 = 'Estimate'
elif pipeline in [2]:
pipeline2 = 'Supplement'
elif pipeline in [3]:
pipeline2 = 'Depreciation Release'
wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
fromdate = datetime.strptime('2018-05-01 00:00:00', "%Y-%m-%d %H:%M:%S")
if wontime < fromdate:
break
try:
title = item['title'].split('(')
title = title[0].rstrip()
except:
title = 'error'
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
except:
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')
except:
print('------------ ERROR: ' + item)
propaddress = 'error'
propaddress = propaddress.replace(title, '').strip()
row = org,dealid,title,propaddress,pipeline2,datetime.date(wontime),amt
print(row)
W.append(row)
woncount = woncount + 1
Brent
Sorry, think I forgot that part and assumed you had it nicely in Excel.
Does the code you've just posted create the list you want or is this the part you are actually having problems with?
Sorry, think I forgot that part and assumed you had it nicely in Excel.
Does the code you've just posted create the list you want or is this the part you are actually having problems with?
ASKER
The lists are being created without any problems, just trying to output to the pdfs -- I CAN output to excel first but obv would prefer to skip the step for the sake of efficiency - example data is that data from the excel workbook.
Brent
If you have the list(s) then it should be straightforward to output them to the PDF(s).
In fact as far as I can see gelonida posted a function that you can simply pass the list and the filename to create the pdf.
If you have the list(s) then it should be straightforward to output them to the PDF(s).
In fact as far as I can see gelonida posted a function that you can simply pass the list and the filename to create the pdf.
ASKER
see my response to that - i cannot just plug the list in.
these two..
Your Comment
by:Brent Guttmann
Business Analyst
Comment posted
4h ago
So the W part here is what I am confused about. I understand how to write it if I know what the data will be like in the example you provided, what I need help with is writing the script to produce that W variable there regardless of how many rows are in the list, or the info in the list, as it will constantly be changing. How do I get the list [W] into the format for the variable W that you have there?
Where are the Best and Assisted buttons?
Active today
Your Comment
by:Brent Guttmann
Business Analyst
Comment posted
3h ago
Comment Utility
error I get when using - which makes sense...
File "c:\users\bgutt\appdata\lo cal\progra ms\python\ python36-3 2\lib\site -packages\ reportlab\ platypus\p aragraph.p y", line 65, in split
return [uword for uword in (_wsc_re_split(text) if delim is None and u'\xa0' in text else text.split(delim))]
AttributeError: 'int' object has no attribute 'split'
these two..
Your Comment
by:Brent Guttmann
Business Analyst
Comment posted
4h ago
So the W part here is what I am confused about. I understand how to write it if I know what the data will be like in the example you provided, what I need help with is writing the script to produce that W variable there regardless of how many rows are in the list, or the info in the list, as it will constantly be changing. How do I get the list [W] into the format for the variable W that you have there?
Where are the Best and Assisted buttons?
Active today
Your Comment
by:Brent Guttmann
Business Analyst
Comment posted
3h ago
Comment Utility
error I get when using - which makes sense...
File "c:\users\bgutt\appdata\lo
return [uword for uword in (_wsc_re_split(text) if delim is None and u'\xa0' in text else text.split(delim))]
AttributeError: 'int' object has no attribute 'split'
OK so W is a list.
what we need to help you would be an example of W to get it's correct structure.
If W is a list and contains many lines you could even just dump the first two rows:
Now you could open the json file with a text editor and strip off any confidential information and post the result of the modified json.
what we need to help you would be an example of W to get it's correct structure.
import json
with open("dumped_w.json", "w") as fout:
json.dump(W, fout, indent=1)
If W is a list and contains many lines you could even just dump the first two rows:
import json
with open("dumped_w.json", "w") as fout:
json.dump(W[:2], fout, indent=1)
Now you could open the json file with a text editor and strip off any confidential information and post the result of the modified json.
ASKER
there is a text file above with an example of W. attached again here.
I dont understand why we are using this..
you can see below that I am only adding certain items to the list, ( W.append(row)) from the JSON (data)... so, this step, to output to the pdf, is beyond the JSON portion, I will be going through the list and for each "Company" will be selecting those and adding to the pdf...
for line in enumerate(W):
if some name = some name:
add this line to the table
data.txt
I dont understand why we are using this..
with open("dumped_w.json", "w") as fout:
you can see below that I am only adding certain items to the list, ( W.append(row)) from the JSON (data)... so, this step, to output to the pdf, is beyond the JSON portion, I will be going through the list and for each "Company" will be selecting those and adding to the pdf...
for item in data['data']:
try:
orgname = item['org_name']
if org not in orgnames:
orgnames.append(org)
if orgname not in [Company 1','Company 2','Company 3','Company 4','Company 5','Company 6'']:
amt = '$50.00'
else:
amt = '$40.00'
except:
orgname = 'error'
amt = '$00.00'
dealid = item['id']
pipeline = float(item['pipeline_id'])
pipeline2 = ''
if pipeline in [1]:
pipeline2 = 'Estimate'
elif pipeline in [2]:
pipeline2 = 'Supplement'
elif pipeline in [3]:
pipeline2 = 'Depreciation Release'
wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
fromdate = datetime.strptime('2018-05-01 00:00:00', "%Y-%m-%d %H:%M:%S")
if wontime < fromdate:
break
try:
title = item['title'].split('(')
title = title[0].rstrip()
except:
title = 'error'
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
except:
try:
propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')
except:
print('------------ ERROR: ' + item)
propaddress = 'error'
propaddress = propaddress.replace(title, '').strip()
row = org,dealid,title,propaddress,pipeline2,datetime.date(wontime),amt
print(row)
W.append(row)
woncount = woncount + 1
But, for simplification, what I need is something likefor line in enumerate(W):
if some name = some name:
add this line to the table
data.txt
Well normally my code should work if W contains a list with the right number of columns.
So I assume I do not understand what W really contains and dumping W as json is a good way to show us what W contains
Not sharing the exact structure of W will make it complicated to help you.
Therefore I would like to see what W (the first few entries of W) really contain. (I'm interested in the structure, not the actual data)
A json dump shows me the structure and allows you to replace confidential text with a text editor before posting it here.
A json dump allows me further to load the data into python and try to generate a pdf.
you can do of course the filtering before creating the dump:
Following code will filter your data and dump the first two rows into a json file,
which you can manually anonymize before posting it here.
Having this json should speed us up to give you the right solution.
So I assume I do not understand what W really contains and dumping W as json is a good way to show us what W contains
Not sharing the exact structure of W will make it complicated to help you.
Therefore I would like to see what W (the first few entries of W) really contain. (I'm interested in the structure, not the actual data)
A json dump shows me the structure and allows you to replace confidential text with a text editor before posting it here.
A json dump allows me further to load the data into python and try to generate a pdf.
you can do of course the filtering before creating the dump:
Following code will filter your data and dump the first two rows into a json file,
which you can manually anonymize before posting it here.
filtered_W = []
for line_number, line in enumerate(W):
if some name = some name:
filtered_W.append(line)
import json
with open("dumped_w.json", "w") as fout:
json.dump(filtered_W[:2], fout, indent=1)
Having this json should speed us up to give you the right solution.
A line in your text file looks like:
further your row contains a datetime.date object.
You have to convert it into a string to make things work.
so you should convert each datetime object into a string
you can do this with .strftime('%m/%d/%Y')
In fact trying to dump as a json would have failed as datetime is not a base type.
('Company Name', 4306, 'Name1', '123 Address1, FL 34743', datetime.date(2018, 6, 11), '$12,157.47', '$21,890.32', '44.46%')
which contains 8 elements and not 9 as would be expected by['Org','ID','Title','Address','City','Date','Amount1','Amount2','Percentage']
(
'Company Name',
4306,
'Name1',
'123 Address1, FL 34743',
datetime.date(2018, 6, 11),
'$12,157.47',
'$21,890.32',
'44.46%'
)
Address and city should be two fields, but they are just one string.further your row contains a datetime.date object.
You have to convert it into a string to make things work.
so you should convert each datetime object into a string
you can do this with .strftime('%m/%d/%Y')
In fact trying to dump as a json would have failed as datetime is not a base type.
ASKER
Ok, here is the JSON
{"success":true,"data":[{"id":4216,"creator_user_id":{"id":3037790,"name":"Chris LastName","email":"chris@Domain.com","has_pic":true,"pic_hash":"bb2213d092ba012c307a974462f7cc0c","active_flag":true,"value":3037790},"user_id":{"id":2651625,"name":"Mike LastName","email":"mike@Domain.com","has_pic":false,"pic_hash":null,"active_flag":true,"value":2651625},"person_id":{"name":"Mike LastName","email":[{"label":"","value":"miked@email.com","primary":true}],"phone":[{"label":"work","value":"(888.134.4345)","primary":true}],"value":213},"org_id":{"name":"Company of America","people_count":95,"owner_id":2651625,"address":null,"cc_email":"email@email.com","value":74},"stage_id":6,"title":"John Doe","value":8032.5,"currency":"USD","add_time":"2018-02-05 20:15:26","update_time":"2018-06-13 17:36:57","stage_change_time":"2018-06-13 17:35:56","active":false,"deleted":false,"status":"won","probability":null,"next_activity_date":null,"next_activity_time":null,"next_activity_id":null,"last_activity_id":28347,"last_activity_date":"2018-06-13","lost_reason":null,"visible_to":"3","close_time":"2018-06-13 17:36:56","pipeline_id":1,"won_time":"2018-06-13 17:36:56","first_won_time":"2018-06-13 17:36:56","lost_time":null,"products_count":0,"files_count":7,"notes_count":2,"followers_count":3,"email_messages_count":5,"activities_count":9,"done_activities_count":9,"undone_activities_count":0,"reference_activities_count":0,"participants_count":1,"expected_close_date":"2018-02-26","last_incoming_mail_time":"2018-06-13 17:36:30","last_outgoing_mail_time":"2018-03-08 20:09:39","aece2d44042af2bb43ed03c0d9b0f3993f2f6c02":{"name":"Some Company Name","people_count":44,"owner_id":2440666,"address":"","cc_email":"email@email.com","value":39},"394f2ca7030899fd835a8673b223ca7d09ffbb04":"HO0517208384","7c5b33f2ba3ad0f9570f0f41b7bee69fd4cf4507":{"name":"John Doe","email":[{"value":"","primary":true}],"phone":[{"label":"work","value":"954-400-1803","primary":true}],"value":2958},"0c239af05b93765738541bd35d46a2507cd862f5":"1234 Address Drive, Orlando, FL 32836, United States","0c239af05b93765738541bd35d46a2507cd862f5_subpremise":null,"0c239af05b93765738541bd35d46a2507cd862f5_street_number":"1234","0c239af05b93765738541bd35d46a2507cd862f5_route":"Address Drive","0c239af05b93765738541bd35d46a2507cd862f5_sublocality":null,"0c239af05b93765738541bd35d46a2507cd862f5_locality":"Orlando","0c239af05b93765738541bd35d46a2507cd862f5_admin_area_level_1":"Florida","0c239af05b93765738541bd35d46a2507cd862f5_admin_area_level_2":"Orange County","0c239af05b93765738541bd35d46a2507cd862f5_country":"United States","0c239af05b93765738541bd35d46a2507cd862f5_postal_code":"32836","0c239af05b93765738541bd35d46a2507cd862f5_formatted_address":"1234 Address Dr, Orlando, FL 32836, USA","106abe665ea7141956a4976afe4283156a5c8155":{"name":"David Dole","email":[{"label":"work","value":"david@solutionscss.com","primary":true}],"phone":[{"label":"work","value":"(800) 531-8722","primary":true},{"label":"work","value":"29476","primary":false}],"value":321},"cc5efe8284a306809c77972f55569d1fe68f4174":"2","d904c121c625b42c7facd4aabd613499aa81cc6c":501.63,"d904c121c625b42c7facd4aabd613499aa81cc6c_currency":"USD","ebfcdf208f26bcfcc8f213dbe5f0cfe93d4421b5":35064.68,"ebfcdf208f26bcfcc8f213dbe5f0cfe93d4421b5_currency":"USD","40fecd575124af5ff0d70175fc83203b2accbc3d":27032.18,"40fecd575124af5ff0d70175fc83203b2accbc3d_currency":"USD","8c553d66bdd8d0c890bc54005fcbedbfebf15ed0":null,"stage_order_nr":6,"person_name":"Jane Doe","org_name":"Company Name of America","next_activity_subject":null,"next_activity_type":null,"next_activity_duration":null,"next_activity_note":null,"formatted_value":"$8,032.50","weighted_value":8032.5,"formatted_weighted_value":"$8,032.50","weighted_value_currency":"USD","rotten_time":null,"owner_name":"Mike LastName","cc_email":"name+deal1234@domain.com","org_hidden":false,"person_hidden":false}],"additional_data":{"pagination":{"start":0,"limit":1,"more_items_in_collection":true,"next_start":1}},"related_objects":{"user":{"3037790":{"id":3037790,"name":"Chris LastName","email":"chris@domain.com","has_pic":true,"pic_hash":"bb2213d092ba012c307a974462f7cc0c","active_flag":true},"2651625":{"id":2651625,"name":"Mike LastName","email":"mike@domain.com","has_pic":false,"pic_hash":null,"active_flag":true}},"organization":{"74":{"id":74,"name":"Some Company of America","people_count":95,"owner_id":2651625,"address":null,"cc_email":"name@domain.com"},"39":{"id":39,"name":"Some Company Name","people_count":44,"owner_id":2440666,"address":"","cc_email":"email@domain.com"}},"person":{"213":{"id":213,"name":"Mike LastName","email":[{"label":"","value":"miked@domain.com","primary":true}],"phone":[{"label":"work","value":"(999.123.4567)","primary":true}]},"2958":{"id":2958,"name":"Jane Doe","email":[{"label":"work","value":"david@domain.com","primary":true}],"phone":[{"label":"work","value":"(800) 123-4567","primary":true},{"label":"work","value":"29476","primary":false}]}}}}
you did not use the command that I suggested to create the json file.
Otherwise you would have a multi line file, that is easy to read by humans.
Is this really the json dump of your variable W or the filtered version of it ???
This is not even a list
Otherwise you would have a multi line file, that is easy to read by humans.
Is this really the json dump of your variable W or the filtered version of it ???
This is not even a list
ASKER
Yeah, its not a list, I create the list from retrieving the information I need from json responses (there are over 5000, that are in the format I provided. The text file I provided is what would be output if i print(the list).
I have attached a formatted version but just fyi, many sites and editors have this... here is a site
https://jsonformatter.curiousconcept.com/
I dont think we need to go down the JSON path here. Lets say we have a list.
1,ITEM1,ITEM2,ITEM3,ITEM4
2,ITEM5,ITEM6,ITEM7,ITEM8
3,ITEM6,ITEM7,ITEM8,ITEM9
How do I create the 'W' variable to be passed to table if the number of lines (3) in the list varies? That is the only question I need help with
I have attached a formatted version but just fyi, many sites and editors have this... here is a site
https://jsonformatter.curiousconcept.com/
I dont think we need to go down the JSON path here. Lets say we have a list.
1,ITEM1,ITEM2,ITEM3,ITEM4
2,ITEM5,ITEM6,ITEM7,ITEM8
3,ITEM6,ITEM7,ITEM8,ITEM9
How do I create the 'W' variable to be passed to table if the number of lines (3) in the list varies? That is the only question I need help with
Brent
I think I explained myself badly and you misinterpreted my request.
I just wanted to have a JSON dump of two rows of your already assembled data that is if I understand correctly in variable W.
I'm not at all interested in the json, that you used to assemble your data. As you stated: this is of no interested for the problem.
So the first two rows of W (or the filtered version of it should be fine)
I answered this question already:
you just concatenate your W with the header line and pass it to Table().
But if this does not work out for you we have to find the cause and thus I wanted to have a json dump of the W (or it's first two rows)
to analyze the problem.
Depending on the data stored in W, a copy / paste of the output of
Did you read my post, where I indicated¸ that the number of columns is not correct and that your problem is very probably the fact, that you are using a datetime.date abject and that you should convert it into a string?
So here the code that should work if W has the correct contents. (same as in my previous post)
I think I explained myself badly and you misinterpreted my request.
I just wanted to have a JSON dump of two rows of your already assembled data that is if I understand correctly in variable W.
I'm not at all interested in the json, that you used to assemble your data. As you stated: this is of no interested for the problem.
So the first two rows of W (or the filtered version of it should be fine)
How do I create the 'W' variable to be passed to table if the number of lines (3) in the list varies? That is the only question I need help with
I answered this question already:
you just concatenate your W with the header line and pass it to Table().
But if this does not work out for you we have to find the cause and thus I wanted to have a json dump of the W (or it's first two rows)
to analyze the problem.
Depending on the data stored in W, a copy / paste of the output of
print(repr(W[:2])
might do.Did you read my post, where I indicated¸ that the number of columns is not correct and that your problem is very probably the fact, that you are using a datetime.date abject and that you should convert it into a string?
So here the code that should work if W has the correct contents. (same as in my previous post)
#!/usr/local/bin/python
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
def mk_report_from_W(W, report_name):
""" function taking a list and a file name and creating a table with predefined columns
"""
doc = SimpleDocTemplate("test_report_lab.pdf", pagesize=A4, rightMargin=30,
leftMargin=30, topMargin=30,bottomMargin=18)
doc.pagesize = landscape(A4)
elements = []
title = ['Org', 'ID', 'Title', 'Address', 'City', 'Date', 'Amount1', 'Amount2', 'Percentage']
data = [ title ] + W
#TODO: Get this line right instead of just copying it from the docs
style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
('TEXTCOLOR',(1,1),(-2,-2),colors.red),
('VALIGN',(0,0),(0,-1),'TOP'),
('TEXTCOLOR',(0,0),(0,-1),colors.blue),
('ALIGN',(0,-1),(-1,-1),'CENTER'),
('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
('BOX', (0,0), (-1,-1), 0.25, colors.black),
])
#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'
data2 = [[Paragraph(cell, s) for cell in row] for row in data]
t=Table(data2)
t.setStyle(style)
#Send the data and build the file
elements.append(t)
doc.build(elements)
# Next line has to be replaced with whatever code generates your W data
W = [ ['Company Name1', '4306', ' Name1', ' 123 Address1', 'City',
'6/11/2018', '$12,157.47', '$21,890.32', '44.46%'],
['Company Name2', '4553', ' Name2', ' 345 Address2', 'DeBary',
'6/11/2018', '$11,998.00', '$13,332.00', '10.08%'],
['Company Name3', '5470', ' Name3', ' 3125 Address3 Long Name', 'Orlando',
'6/11/2018', '$14,165.27', '$15,772.28', '10.19%'],
]
# Now call our function with the right data
mk_report_from_W(W, "test_report_lab.pdf")
I try to give you another minimalistic code snippet that show's you how you can create a table.
Every element in the list must be a string if not, the code will fail.
So any datetime.date object must be converted explicitely into a string.
I created now a more robust version, that converts every cell into a string and should therefore not fail if you
pass it a date time object, but the result will not look as you want except you add custom code to convert date objects to your desired format.
This code is fully operational, so please run it and look at the created pdfs
Every element in the list must be a string if not, the code will fail.
So any datetime.date object must be converted explicitely into a string.
I created now a more robust version, that converts every cell into a string and should therefore not fail if you
pass it a date time object, but the result will not look as you want except you add custom code to convert date objects to your desired format.
#!/usr/local/bin/python
import datetime
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
def mk_report(fname, title_row, data):
""" function taking a title_row and a data row and displays them in a table
as a robustness measure every cell will be converted to a string
the result will often not be what we want if we pass non string objects.
"""
doc = SimpleDocTemplate(fname, pagesize=A4, rightMargin=30,
leftMargin=30, topMargin=30,bottomMargin=18)
doc.pagesize = landscape(A4)
elements = []
# concatenate title row and data rows
full_data = [ title_row ] + data
#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'
data2 = [[Paragraph(str(cell), s) for cell in row] for row in data]
t=Table(data2)
#Send the data and build the file
elements.append(t)
doc.build(elements)
title = ['Name', 'Age']
W = [
('Mary', 40),
('Peter', 20),
('Tom', 60),
]
mk_report('test1.pdf', title, W)
title = ['Name', 'birthday']
W = [
('Mary', datetime.date(2018-40,1,1)),
('Peter', datetime.date(2018-20,2,10)),
('Tom', datetime.date(2018-60,3,20)),
]
mk_report('test2.pdf', title, W)
title = ['Name', 'birthday']
W = [
('Mary', datetime.date(2018-40,1,1).strftime('%m/%d/%Y')),
('Peter', datetime.date(2018-20,2,10).strftime('%m/%d/%Y')),
('Tom', datetime.date(2018-60,3,20).strftime('%m/%d/%Y')),
]
mk_report('test3.pdf', title, W)
This code is fully operational, so please run it and look at the created pdfs
ASKER
Here is the output for the print
[('Org', 'ID', 'Title', 'Address', 'Type', 'Date', 'Amount'), ('Company Name', 6116, 'John Doe', '1234 Address Cir N, Jacksonville, FL 32246', 'Supplement', '06/14/2018', '$50.00')]
[('Org', 'ID', 'Title', 'Address', 'Type', 'Date', 'Amount'), ('Company Name', 6116, 'John Doe', '1234 Address Cir N, Jacksonville, FL 32246', 'Supplement', '06/14/2018', '$50.00')]
ASKER
I have a header row included, which can be removed.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Yes, that worked. Sorry, I probably could have been a bit more clear there. I appreciate your help!
Enjoy,
The most important thing is, that things are working now.
The most important thing is, that things are working now.
ASKER
Thanks! One last q if you have a sec - off topic
I am trying to sort the W list alphabetically accoriding to the company name
None of these are working though...
What am i missing here? If you dont see the answer right off the bat, no worries - just thought it would be quick
I am trying to sort the W list alphabetically accoriding to the company name
None of these are working though...
sort(W)
sorted(W, key=lambda company: company[2])
sorted(W, key=itemgetter(0))
What am i missing here? If you dont see the answer right off the bat, no worries - just thought it would be quick
either
I used row instead of company as what you pass to the lambda function is a row, but you could use any other name
what's important is, that
sort sorts a list in place, whereas sorted creates a new sorted copy, (which had then to reassign)
so sort uses less memory, whereas sorted allows you to keep the original list unmodified
you cold also write:
newW = sorted(W, key=lambda row: row[2])
orW.sort(key=lambda row: row[2])
should do the jobI used row instead of company as what you pass to the lambda function is a row, but you could use any other name
what's important is, that
sort sorts a list in place, whereas sorted creates a new sorted copy, (which had then to reassign)
so sort uses less memory, whereas sorted allows you to keep the original list unmodified
you cold also write:
W = sorted(W, key=lambda row: row[2])
but this consumes for a short time a little more memory than sort
ASKER
Got it - will add it in tomorrow - appreciate the additional help
woncount and W for example are not declared.
Could you please post the smallest working code snippet (and ideally also a small template file), so that we can try to reproduce?
I personally never used the docx module but would like to try on my machine to see if the slow down is related to the code / the template / the OS or your the machine.
The slowness might also come from the surrounding code.
So it's important to paste a snippet and see, that the issue also shows up in your small snippet.