Start Free Trial

asked on

PYTHON create word table performing slow

Hello, I am trying to populate a table in MSWord from Python - I am currently using the below but it is EXTREMELY slow... ive got liek 500 lines to output and its take about 1 line every 3 seconds. Can anyone help with what I am doing wrong?

What I am doing is taking a list I have created from JSON response and populating the a word template with the info. I need to use the table columns because I need the column widths to remain the same --> then I will be printing to pdf

from docx import Document
from docx.shared import Inches
document = Document('c:\\pythondev\\Invoice_Template.docx')
table = document.add_table(rows=woncount, cols=6)
r = 0

for row,line in enumerate(W):
        try:
                c = 0
                for col, ct in enumerate(line):
                        cell = table.cell(r, c)
                        try:
                                cell.text = ct
                        except:
                                cell.text = str(ct)
                        c = c + 1
                r = r + 1
        except:
                break

document.save("c:\\pythondev\\testworddoc.docx")

Open in new window

your code doesn't seem to be complete.

woncount and W for example are not declared.

Could you please post the smallest working code snippet (and ideally also a small template file), so that we can try to reproduce?

I personally never used the docx module but would like to try on my machine to see if the slow down is related to the code / the template / the OS or your the machine.

The slowness might also come from the surrounding code.
So it's important to paste a snippet and see, that the issue also shows up in your small snippet.

ASKER

Okay, before I go and change all of the data (which is confidential) to try and give you a full working example -- do you think its possible for you to give me an example of how to print a [list] to columns within a word template? (I need to use the template because I have an image as a header, footer, and page numbers)

this may be the quickest way to resolve this -- if not, I can go ahead and figure something out to give you a working example, without the sensitive info

the columns would be coming from the [list] which is in the format below... i have attached an example of 3

row = 'Org','ID','Title','Address','Type', 'Date','Amount'
data.txt

Is there an existing table in the template for the list/data?

ASKER

Nope, I thought it would be easier to add it each time

ASKER

I am also open to other ways to create the pdf output -- so long as I can add images for the header and have a formatted table

There are, or were, various python libraries for creating PDFs.

This one seems to be mentioned a lot, ReportLab and the user guide has sections on tables and images.

ASKER

Yes I tried with reportlab but was unable to get it working. Also, I am using python 36

What couldn't you get working?

The library itself or creating the PDF you want.?

ASKER

the library

ASKER

sorry, i take that back - I couldnt figure out how to generate the table dynamically since the number of rows is always changing

ASKER

here is the code I have to generate the [[lists]

# -*- coding: utf-8 -*-
from money import Money
from docx import Document
from docx.shared import Inches
import json, csv, sys
from urllib.request import urlopen
import xlsxwriter
from datetime import datetime
import re
import decimal

document = Document('c:\\pythondev\\Invoice_Template.docx')

api_key = 'api key'
workbook = xlsxwriter.Workbook("c:\\PythonDev\\Output\\SuppData.xlsx")
worksheet = workbook.add_worksheet('SuppData')
url = 'api site' + api_key
json_obj = urlopen(url)
data = json.load(json_obj)
orgnamescount = 0
suppcount = 0
woncount = 0
S = []
orgz = []
orgnames = []
row = 'Org','DealId','Title','Address','Date Won','Old Rcv','New Rcv','Gain %'
S.append(row)

for item in data['data']:

        orgItem = item['org_id']
        org = orgItem['name']

        if org not in orgnames:
                orgadd = org,orgItem['address']
                orgz.append(orgadd)
                orgnames.append(org)
                orgnamescount = orgnamescount + 1
        dealid = item['id']
        wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
        fromdate = datetime.strptime('2018-06-08 00:00:00', "%Y-%m-%d %H:%M:%S")

        if wontime < fromdate:
                break
        try:        
                title = item['title'].split('(')     
                title = title[0].rstrip()
        except: 
                title = 'error'
        try:
                propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
        except:
                try:
                        propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')

                except:
                        print('------------ ERROR: ' + item)
                        propaddress = 'error'

        propaddress = propaddress.replace(title, '').strip()

        try:
                newrcvval = float(decimal.Decimal(item['ebfcdf208f26bcfcc8f213dbe5f0cfe93d4421b5']))
        except:
                newrcvval = 'error'
        try:        
                oldrcvval =  float(decimal.Decimal(item['40fecd575124af5ff0d70175fc83203b2accbc3d']))
        except: 
                oldrcvval = 'error' 

        try:        
                val1 = item['formatted_value']
        except: 
                val1 = 'error'        
        try:        
                orgname = item['org_name']
        except: 
                orgname = 'error'
        try:
                val = '{:.2%}'.format(1 - (oldrcvval / newrcvval))
        except:
                val = '0.00%'
        try:
                newrcvval2 = "${0:,.2f}".format(newrcvval)
        except:
                newrcvval2 = "$0.00"
        try:
                oldrcvval2 = "${:,.2f}".format(oldrcvval)
        except:
                oldrcvval2 = "$0.00"

        row = org,dealid,title,propaddress,datetime.date(wontime),oldrcvval2,newrcvval2,val

        print(row)
        S.append(row)
        suppcount = suppcount + 1

for row, line in enumerate(S):
    for col, cell in enumerate(line):
        worksheet.write(row, col, cell)

formatdict = {'num_format':'mm/dd/yy'}
fmt = workbook.add_format(formatdict)
worksheet.set_column('E:E', None, fmt)

workbook.close()

workbook = xlsxwriter.Workbook("c:\\PythonDev\\Output\\WonData.xlsx")
worksheet = workbook.add_worksheet('WonData')
url = 'url' + api_key
json_obj2 = urlopen(url)
data = json.load(json_obj2)
W = []

row = 'Org','ID','Title','Address','Type', 'Date','Amount'
W.append(row)
                        
        
for item in data['data']:
        try:        
            orgname = item['org_name']
            if org not in orgnames:
                orgnames.append(org)
            if orgname not in [Company 1','Company 2','Company 3','Company 4','Company 5','Company 6'']:
                amt = '$50.00'
            else:
                amt = '$40.00'
        except:
            orgname = 'error'
            amt = '$00.00'
            
        dealid = item['id']
        pipeline = float(item['pipeline_id'])
        pipeline2 = ''
        if pipeline in [1]:
                pipeline2 = 'Estimate'
        elif pipeline in [2]:
                pipeline2 = 'Supplement'
        elif pipeline in [3]:
                pipeline2 = 'Depreciation Release'
        wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
        fromdate = datetime.strptime('2018-05-01 00:00:00', "%Y-%m-%d %H:%M:%S")

        if wontime < fromdate:
                break
        try:        
                title = item['title'].split('(')     
                title = title[0].rstrip()
        except: 
                title = 'error'
        try:
                propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
        except:
                try:
                        propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')

                except:
                        print('------------ ERROR: ' + item)
                        propaddress = 'error'

        propaddress = propaddress.replace(title, '').strip()

        row = org,dealid,title,propaddress,pipeline2,datetime.date(wontime),amt

        print(row)
        W.append(row)
        woncount = woncount + 1

for row, line in enumerate(W):
        for col, cell in enumerate(line):
                worksheet.write(row, col, cell)

formatdict = {'num_format':'mm/dd/yy'}
fmt = workbook.add_format(formatdict)
worksheet.set_column('F:F', None, fmt)

workbook.close()


url = 'url' + api_key
json_obj3 = urlopen(url)
data = json.load(json_obj3)
O = []
workbook = xlsxwriter.Workbook("c:\\PythonDev\\Output\\OrgData.xlsx")
worksheet = workbook.add_worksheet('OrgData')

for item in data['data']:
    ocheck = item['name']
    if any( ocheck in s for s in orgnames): 
        try:
            phone = item['8347789818807f91476bbde9b589444c3bf665e0']
        except:
            phone = ''
        try:
            email = item['6eba5596981713615b6425b7c3dbcfa9fe0531a5']
        except:
            email = ''
        try:
            addr = item['address']
        except:
            addr = ''
        row = ocheck,phone,email,addr
        print(row)
        O.append(row)

for row, line in enumerate(O):
        for col, cell in enumerate(line):
                worksheet.write(row, col, cell)

#print('Total Organizations = {}'.format(orgnamescount))
#print('Total Supplements = {}'.format(suppcount))
#print('Total Won = {}'.format(woncount))

#for org in enumerate(orgnames):
        #print('Total Won for {} = '.format(W[1].count(org)))
        #print('Total Supp for {} = '.format(S[1].count(org)))
        #print('Total for {} = '.format(W[1].count(org) + S[1].count(org)))

# add table ------------------
table = document.add_table(rows=woncount, cols=6)
r = 0

for row,line in enumerate(W):
        try:
                c = 0
                for col, ct in enumerate(line):
                        cell = table.cell(r, c)
                        try:
                                cell.text = ct
                        except:
                                cell.text = str(ct)
                        c = c + 1
                r = r + 1
        except:
                break

document.save("c:\\pythondev\\testworddoc.docx")

Open in new window

I can't find anything indicating that ReportLab isn't available for Python 3.6 or reports of people having trouble installing it on that version of Python.

How did you try installing it?

Here's a link to a page that has a download for Python 3.6 - https://pypi.org/project/reportlab/#files

ASKER

yes sorry i mispoke - I was having difficulty creating the table and reverted to word as I thought it would be simpler

ASKER

Let me simplify - how could I populate a table like the example below, with the data from my list [[W] ?

#!/usr/local/bin/python
 
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
 
doc = SimpleDocTemplate("test_report_lab.pdf", pagesize=A4, rightMargin=30,leftMargin=30, topMargin=30,bottomMargin=18)
doc.pagesize = landscape(A4)
elements = []
 
data = [
["Letter", "Number", "Stuff", "Long stuff that should be wrapped"],
["A", "01", "ABCD", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],
["B", "02", "CDEF", "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"],
["C", "03", "SDFSDF", "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"],
["D", "04", "SDFSDF", "DDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD"],
["E", "05", "GHJGHJGHJ", "EEEEEEEEEEEEEE EEEEEEEEEEEEEEEEE EEEEEEEEEEEEEEEEEEEE"],
]
 
#TODO: Get this line right instead of just copying it from the docs
style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
                       ('TEXTCOLOR',(1,1),(-2,-2),colors.red),
                       ('VALIGN',(0,0),(0,-1),'TOP'),
                       ('TEXTCOLOR',(0,0),(0,-1),colors.blue),
                       ('ALIGN',(0,-1),(-1,-1),'CENTER'),
                       ('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
                       ('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
                       ('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
                       ('BOX', (0,0), (-1,-1), 0.25, colors.black),
                       ])
 
#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'
data2 = [[Paragraph(cell, s) for cell in row] for row in data]
t=Table(data2)
t.setStyle(style)
 
#Send the data and build the file
elements.append(t)
doc.build(elements)

Open in new window

ASKER

Going to assume no one knows and make a macro i guess

@Brent what I (we) need is not a full working example.

It as an example, that is complete enough to reproduce the described behavior (for solving the speed problem) or an example file with sample values and a description (screen shot) of the expected result.

If you want to follow the docx path that would be a python script and a word template file, that will allow to reproduce the very slow table creation that you talk about.

If you want to follow the report lab path it would be a small code snippet, that populates W with some example data and a screen shot or a good description of the expected result (the resulting table)

ASKER

Okay, using the reportlab code I provided, I am just trying to use a list (lets say a list generated from the data I provided in the attachment) to populate a table. The number of columns would be 7 and the column titles would be 'Org','ID','Title','Address','Type', 'Date','Amount'. The number of lines would be variable (which is the core of my question)

ASKER

data2.xlsxOkay, so piggy backing on the above -- using reportlab -- attached is example data on sheet "data" -- which would be contained within the [list] and then sheet "output" shows the desired output -- which would be in pdf format

It seems, that basically you want to replace the contents of the variable data with the contents of W.

Below I declare a function, that takes a list, prefixes it with the title row and creates a table.

then I populate W (this has obviously to be replaced with your code generating the data)
if the data is already in an .xlsx file, then we 'just' had to write a reader extracting this data

and call the function

#!/usr/local/bin/python
 
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
 
def mk_report_from_W(W, report_name):
    """ function taking a list and a file name and creating a table with predefined columns
    """
    doc = SimpleDocTemplate("test_report_lab.pdf", pagesize=A4, rightMargin=30,
                            leftMargin=30, topMargin=30,bottomMargin=18)
    doc.pagesize = landscape(A4)
    elements = []
     
    title = ['Org', 'ID', 'Title', 'Address', 'City', 'Date', 'Amount1', 'Amount2', 'Percentage']
    
    
    data = [ title ] + W
     
    #TODO: Get this line right instead of just copying it from the docs
    style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
                           ('TEXTCOLOR',(1,1),(-2,-2),colors.red),
                           ('VALIGN',(0,0),(0,-1),'TOP'),
                           ('TEXTCOLOR',(0,0),(0,-1),colors.blue),
                           ('ALIGN',(0,-1),(-1,-1),'CENTER'),
                           ('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
                           ('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
                           ('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
                           ('BOX', (0,0), (-1,-1), 0.25, colors.black),
                           ])
     
    #Configure style and word wrap
    s = getSampleStyleSheet()
    s = s["BodyText"]
    s.wordWrap = 'CJK'
    data2 = [[Paragraph(cell, s) for cell in row] for row in data]
    t=Table(data2)
    t.setStyle(style)
     
    #Send the data and build the file
    elements.append(t)
    doc.build(elements)
# Next line has to be replaced with whatever code generates your W data
W = [ ['Company Name1', '4306', ' Name1', ' 123 Address1', 'City', 
        '6/11/2018', '$12,157.47', '$21,890.32', '44.46%'], 
    ['Company Name2', '4553', ' Name2', ' 345 Address2', 'DeBary',
        '6/11/2018', '$11,998.00', '$13,332.00', '10.08%'],
    ['Company Name3', '5470', ' Name3', ' 3125 Address3 Long Name', 'Orlando',
          '6/11/2018', '$14,165.27', '$15,772.28', '10.19%'],
    ]

# Now call our function with the right data
mk_report_from_W(W, "test_report_lab.pdf")

Open in new window

ASKER

So the W part here is what I am confused about. I understand how to write it if I know what the data will be like in the example you provided, what I need help with is writing the script to produce that W variable there regardless of how many rows are in the list, or the info in the list, as it will constantly be changing. How do I get the list [W] into the format for the variable W that you have there?

ASKER

error I get when using - which makes sense...

  File "c:\users\bgutt\appdata\local\programs\python\python36-32\lib\site-packages\reportlab\platypus\paragraph.py", line 65, in split
    return [uword for uword in (_wsc_re_split(text) if delim is None and u'\xa0' in text else text.split(delim))]
AttributeError: 'int' object has no attribute 'split'

Open in new window

ASKER

Was thinking something like this.... but its giving me this error..

    #Configure style and word wrap
    s = getSampleStyleSheet()
    s = s["BodyText"]
    s.wordWrap = 'CJK'
    data2 = [[Paragraph(cell, s) for cell in row] for row in data]
    t=Table(data2)
    t.setStyle(style)
     
    #Send the data and build the file
    elements.append(t)
    doc.build(elements)
ii = '['
for ll,i in enumerate(W):
        ii = ii,'['
        d = i[1]
        e = i[2]
        f = i[3]
        ii = ii,d,e,f,']',
ii = ii,']'
#print(ii)

mk_report_from_W(ii, "test_report_lab.pdf")

Open in new window

  File "c:\PythonDev\testnew.py", line 223, in mk_report_from_W
    data = [ title ] + ii
TypeError: can only concatenate list (not "tuple") to list

Open in new window

Brent

You could use pandas to read the data from the Excel file into a dataframe, add the appropriate headings and then convert to to a list.

That might look something like this - unable to test right now.:)

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
 
df = pd.read_excel('data2.xlsx', sheetname='data', header=0)

df.columns=['Org','ID','Title','Address','City','Date','Amount1','Amount2','Percentage']

W = df.values.tolist()

Open in new window

ASKER

I was trying to not have to output the data to excel. You can see in the code that the list is built based on JSON returns. I provided the excel sheet so that the data was shown before and after

for item in data['data']:
        try:        
            orgname = item['org_name']
            if org not in orgnames:
                orgnames.append(org)
            if orgname not in [Company 1','Company 2','Company 3','Company 4','Company 5','Company 6'']:
                amt = '$50.00'
            else:
                amt = '$40.00'
        except:
            orgname = 'error'
            amt = '$00.00'
            
        dealid = item['id']
        pipeline = float(item['pipeline_id'])
        pipeline2 = ''
        if pipeline in [1]:
                pipeline2 = 'Estimate'
        elif pipeline in [2]:
                pipeline2 = 'Supplement'
        elif pipeline in [3]:
                pipeline2 = 'Depreciation Release'
        wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
        fromdate = datetime.strptime('2018-05-01 00:00:00', "%Y-%m-%d %H:%M:%S")

        if wontime < fromdate:
                break
        try:        
                title = item['title'].split('(')     
                title = title[0].rstrip()
        except: 
                title = 'error'
        try:
                propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
        except:
                try:
                        propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')

                except:
                        print('------------ ERROR: ' + item)
                        propaddress = 'error'

        propaddress = propaddress.replace(title, '').strip()

        row = org,dealid,title,propaddress,pipeline2,datetime.date(wontime),amt

        print(row)
        W.append(row)
        woncount = woncount + 1

Open in new window

Brent

Sorry, think I forgot that part and assumed you had it nicely in Excel.

Does the code you've just posted create the list you want or is this the part you are actually having problems with?

ASKER

The lists are being created without any problems, just trying to output to the pdfs -- I CAN output to excel first but obv would prefer to skip the step for the sake of efficiency - example data is that data from the excel workbook.

Brent

If you have the list(s) then it should be straightforward to output them to the PDF(s).

In fact as far as I can see gelonida posted a function that you can simply pass the list and the filename to create the pdf.

ASKER

see my response to that - i cannot just plug the list in.

these two..

Your Comment

by:Brent Guttmann
Business Analyst
Comment posted
4h ago
So the W part here is what I am confused about. I understand how to write it if I know what the data will be like in the example you provided, what I need help with is writing the script to produce that W variable there regardless of how many rows are in the list, or the info in the list, as it will constantly be changing. How do I get the list [W] into the format for the variable W that you have there?
Where are the Best and Assisted buttons?

Active today
Your Comment

by:Brent Guttmann
Business Analyst
Comment posted
3h ago
Comment Utility
error I get when using - which makes sense...

File "c:\users\bgutt\appdata\local\programs\python\python36-32\lib\site-packages\reportlab\platypus\paragraph.py", line 65, in split
return [uword for uword in (_wsc_re_split(text) if delim is None and u'\xa0' in text else text.split(delim))]
AttributeError: 'int' object has no attribute 'split'

OK so W is a list.

what we need to help you would be an example of W to get it's correct structure.

import json
with open("dumped_w.json", "w") as fout:
    json.dump(W, fout, indent=1)

Open in new window

If W is a list and contains many lines you could even just dump the first two rows:

import json
with open("dumped_w.json", "w") as fout:
    json.dump(W[:2], fout, indent=1)

Open in new window

Now you could open the json file with a text editor and strip off any confidential information and post the result of the modified json.

ASKER

there is a text file above with an example of W. attached again here.

I dont understand why we are using this..

with open("dumped_w.json", "w") as fout:

Open in new window

you can see below that I am only adding certain items to the list, ( W.append(row)) from the JSON (data)... so, this step, to output to the pdf, is beyond the JSON portion, I will be going through the list and for each "Company" will be selecting those and adding to the pdf...

for item in data['data']:
        try:        
            orgname = item['org_name']
            if org not in orgnames:
                orgnames.append(org)
            if orgname not in [Company 1','Company 2','Company 3','Company 4','Company 5','Company 6'']:
                amt = '$50.00'
            else:
                amt = '$40.00'
        except:
            orgname = 'error'
            amt = '$00.00'
            
        dealid = item['id']
        pipeline = float(item['pipeline_id'])
        pipeline2 = ''
        if pipeline in [1]:
                pipeline2 = 'Estimate'
        elif pipeline in [2]:
                pipeline2 = 'Supplement'
        elif pipeline in [3]:
                pipeline2 = 'Depreciation Release'
        wontime = datetime.strptime(item['won_time'],"%Y-%m-%d %H:%M:%S")
        fromdate = datetime.strptime('2018-05-01 00:00:00', "%Y-%m-%d %H:%M:%S")

        if wontime < fromdate:
                break
        try:        
                title = item['title'].split('(')     
                title = title[0].rstrip()
        except: 
                title = 'error'
        try:
                propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5_formatted_address'].replace(', USA','')
        except:
                try:
                        propaddress = item['0c239af05b93765738541bd35d46a2507cd862f5'].replace(', USA','')

                except:
                        print('------------ ERROR: ' + item)
                        propaddress = 'error'

        propaddress = propaddress.replace(title, '').strip()

        row = org,dealid,title,propaddress,pipeline2,datetime.date(wontime),amt

        print(row)
        W.append(row)
        woncount = woncount + 1

Open in new window

But, for simplification, what I need is something like

for line in enumerate(W):
if some name = some name:
add this line to the table
data.txt

Well normally my code should work if W contains a list with the right number of columns.
So I assume I do not understand what W really contains and dumping W as json is a good way to show us what W contains
Not sharing the exact structure of W will make it complicated to help you.

Therefore I would like to see what W (the first few entries of W) really contain. (I'm interested in the structure, not the actual data)
A json dump shows me the structure and allows you to replace confidential text with a text editor before posting it here.

A json dump allows me further to load the data into python and try to generate a pdf.

you can do of course the filtering before creating the dump:

Following code will filter your data and dump the first two rows into a json file,
which you can manually anonymize before posting it here.

filtered_W = []
for line_number, line in enumerate(W):
     if some name = some name:
            filtered_W.append(line)

import json
with open("dumped_w.json", "w") as fout:
    json.dump(filtered_W[:2], fout, indent=1)

Open in new window

Having this json should speed us up to give you the right solution.

A line in your text file looks like:

('Company Name', 4306, 'Name1', '123 Address1, FL 34743', datetime.date(2018, 6, 11), '$12,157.47', '$21,890.32', '44.46%')

Open in new window

which contains 8 elements and not 9 as would be expected by

['Org','ID','Title','Address','City','Date','Amount1','Amount2','Percentage']

Open in new window

(
    'Company Name', 
    4306, 
    'Name1', 
    '123 Address1, FL 34743', 
    datetime.date(2018, 6, 11), 
    '$12,157.47', 
    '$21,890.32', 
    '44.46%'
)

Open in new window

Address and city should be two fields, but they are just one string.
further your row contains a datetime.date object.
You have to convert it into a string to make things work.
so you should convert each datetime object into a string

you can do this with .strftime('%m/%d/%Y')

In fact trying to dump as a json would have failed as datetime is not a base type.

ASKER

Ok, here is the JSON

{"success":true,"data":[{"id":4216,"creator_user_id":{"id":3037790,"name":"Chris LastName","email":"chris@Domain.com","has_pic":true,"pic_hash":"bb2213d092ba012c307a974462f7cc0c","active_flag":true,"value":3037790},"user_id":{"id":2651625,"name":"Mike LastName","email":"mike@Domain.com","has_pic":false,"pic_hash":null,"active_flag":true,"value":2651625},"person_id":{"name":"Mike LastName","email":[{"label":"","value":"miked@email.com","primary":true}],"phone":[{"label":"work","value":"(888.134.4345)","primary":true}],"value":213},"org_id":{"name":"Company of America","people_count":95,"owner_id":2651625,"address":null,"cc_email":"email@email.com","value":74},"stage_id":6,"title":"John Doe","value":8032.5,"currency":"USD","add_time":"2018-02-05 20:15:26","update_time":"2018-06-13 17:36:57","stage_change_time":"2018-06-13 17:35:56","active":false,"deleted":false,"status":"won","probability":null,"next_activity_date":null,"next_activity_time":null,"next_activity_id":null,"last_activity_id":28347,"last_activity_date":"2018-06-13","lost_reason":null,"visible_to":"3","close_time":"2018-06-13 17:36:56","pipeline_id":1,"won_time":"2018-06-13 17:36:56","first_won_time":"2018-06-13 17:36:56","lost_time":null,"products_count":0,"files_count":7,"notes_count":2,"followers_count":3,"email_messages_count":5,"activities_count":9,"done_activities_count":9,"undone_activities_count":0,"reference_activities_count":0,"participants_count":1,"expected_close_date":"2018-02-26","last_incoming_mail_time":"2018-06-13 17:36:30","last_outgoing_mail_time":"2018-03-08 20:09:39","aece2d44042af2bb43ed03c0d9b0f3993f2f6c02":{"name":"Some Company Name","people_count":44,"owner_id":2440666,"address":"","cc_email":"email@email.com","value":39},"394f2ca7030899fd835a8673b223ca7d09ffbb04":"HO0517208384","7c5b33f2ba3ad0f9570f0f41b7bee69fd4cf4507":{"name":"John Doe","email":[{"value":"","primary":true}],"phone":[{"label":"work","value":"954-400-1803","primary":true}],"value":2958},"0c239af05b93765738541bd35d46a2507cd862f5":"1234 Address Drive, Orlando, FL 32836, United States","0c239af05b93765738541bd35d46a2507cd862f5_subpremise":null,"0c239af05b93765738541bd35d46a2507cd862f5_street_number":"1234","0c239af05b93765738541bd35d46a2507cd862f5_route":"Address Drive","0c239af05b93765738541bd35d46a2507cd862f5_sublocality":null,"0c239af05b93765738541bd35d46a2507cd862f5_locality":"Orlando","0c239af05b93765738541bd35d46a2507cd862f5_admin_area_level_1":"Florida","0c239af05b93765738541bd35d46a2507cd862f5_admin_area_level_2":"Orange County","0c239af05b93765738541bd35d46a2507cd862f5_country":"United States","0c239af05b93765738541bd35d46a2507cd862f5_postal_code":"32836","0c239af05b93765738541bd35d46a2507cd862f5_formatted_address":"1234 Address Dr, Orlando, FL 32836, USA","106abe665ea7141956a4976afe4283156a5c8155":{"name":"David Dole","email":[{"label":"work","value":"david@solutionscss.com","primary":true}],"phone":[{"label":"work","value":"(800) 531-8722","primary":true},{"label":"work","value":"29476","primary":false}],"value":321},"cc5efe8284a306809c77972f55569d1fe68f4174":"2","d904c121c625b42c7facd4aabd613499aa81cc6c":501.63,"d904c121c625b42c7facd4aabd613499aa81cc6c_currency":"USD","ebfcdf208f26bcfcc8f213dbe5f0cfe93d4421b5":35064.68,"ebfcdf208f26bcfcc8f213dbe5f0cfe93d4421b5_currency":"USD","40fecd575124af5ff0d70175fc83203b2accbc3d":27032.18,"40fecd575124af5ff0d70175fc83203b2accbc3d_currency":"USD","8c553d66bdd8d0c890bc54005fcbedbfebf15ed0":null,"stage_order_nr":6,"person_name":"Jane Doe","org_name":"Company Name of America","next_activity_subject":null,"next_activity_type":null,"next_activity_duration":null,"next_activity_note":null,"formatted_value":"$8,032.50","weighted_value":8032.5,"formatted_weighted_value":"$8,032.50","weighted_value_currency":"USD","rotten_time":null,"owner_name":"Mike LastName","cc_email":"name+deal1234@domain.com","org_hidden":false,"person_hidden":false}],"additional_data":{"pagination":{"start":0,"limit":1,"more_items_in_collection":true,"next_start":1}},"related_objects":{"user":{"3037790":{"id":3037790,"name":"Chris LastName","email":"chris@domain.com","has_pic":true,"pic_hash":"bb2213d092ba012c307a974462f7cc0c","active_flag":true},"2651625":{"id":2651625,"name":"Mike LastName","email":"mike@domain.com","has_pic":false,"pic_hash":null,"active_flag":true}},"organization":{"74":{"id":74,"name":"Some Company of America","people_count":95,"owner_id":2651625,"address":null,"cc_email":"name@domain.com"},"39":{"id":39,"name":"Some Company Name","people_count":44,"owner_id":2440666,"address":"","cc_email":"email@domain.com"}},"person":{"213":{"id":213,"name":"Mike LastName","email":[{"label":"","value":"miked@domain.com","primary":true}],"phone":[{"label":"work","value":"(999.123.4567)","primary":true}]},"2958":{"id":2958,"name":"Jane Doe","email":[{"label":"work","value":"david@domain.com","primary":true}],"phone":[{"label":"work","value":"(800) 123-4567","primary":true},{"label":"work","value":"29476","primary":false}]}}}}

Open in new window

you did not use the command that I suggested to create the json file.
Otherwise you would have a multi line file, that is easy to read by humans.

Is this really the json dump of your variable W or the filtered version of it ???
This is not even a list

ASKER

Yeah, its not a list, I create the list from retrieving the information I need from json responses (there are over 5000, that are in the format I provided. The text file I provided is what would be output if i print(the list).

I have attached a formatted version but just fyi, many sites and editors have this... here is a site

https://jsonformatter.curiousconcept.com/

I dont think we need to go down the JSON path here. Lets say we have a list.

1,ITEM1,ITEM2,ITEM3,ITEM4
2,ITEM5,ITEM6,ITEM7,ITEM8
3,ITEM6,ITEM7,ITEM8,ITEM9

How do I create the 'W' variable to be passed to table if the number of lines (3) in the list varies? That is the only question I need help with

Brent

I think I explained myself badly and you misinterpreted my request.

I just wanted to have a JSON dump of two rows of your already assembled data that is if I understand correctly in variable W.

I'm not at all interested in the json, that you used to assemble your data. As you stated: this is of no interested for the problem.

So the first two rows of W (or the filtered version of it should be fine)

How do I create the 'W' variable to be passed to table if the number of lines (3) in the list varies? That is the only question I need help with

I answered this question already:

you just concatenate your W with the header line and pass it to Table().

But if this does not work out for you we have to find the cause and thus I wanted to have a json dump of the W (or it's first two rows)
to analyze the problem.

Depending on the data stored in W, a copy / paste of the output of

print(repr(W[:2])

Open in new window

might do.
Did you read my post, where I indicated¸ that the number of columns is not correct and that your problem is very probably the fact, that you are using a datetime.date abject and that you should convert it into a string?

So here the code that should work if W has the correct contents. (same as in my previous post)

#!/usr/local/bin/python
 
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
 
def mk_report_from_W(W, report_name):
    """ function taking a list and a file name and creating a table with predefined columns
    """
    doc = SimpleDocTemplate("test_report_lab.pdf", pagesize=A4, rightMargin=30,
                            leftMargin=30, topMargin=30,bottomMargin=18)
    doc.pagesize = landscape(A4)
    elements = []
     
    title = ['Org', 'ID', 'Title', 'Address', 'City', 'Date', 'Amount1', 'Amount2', 'Percentage']
    
    
    data = [ title ] + W
     
    #TODO: Get this line right instead of just copying it from the docs
    style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
                           ('TEXTCOLOR',(1,1),(-2,-2),colors.red),
                           ('VALIGN',(0,0),(0,-1),'TOP'),
                           ('TEXTCOLOR',(0,0),(0,-1),colors.blue),
                           ('ALIGN',(0,-1),(-1,-1),'CENTER'),
                           ('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
                           ('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
                           ('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
                           ('BOX', (0,0), (-1,-1), 0.25, colors.black),
                           ])
     
    #Configure style and word wrap
    s = getSampleStyleSheet()
    s = s["BodyText"]
    s.wordWrap = 'CJK'
    data2 = [[Paragraph(cell, s) for cell in row] for row in data]
    t=Table(data2)
    t.setStyle(style)
     
    #Send the data and build the file
    elements.append(t)
    doc.build(elements)
# Next line has to be replaced with whatever code generates your W data
W = [ ['Company Name1', '4306', ' Name1', ' 123 Address1', 'City', 
        '6/11/2018', '$12,157.47', '$21,890.32', '44.46%'], 
    ['Company Name2', '4553', ' Name2', ' 345 Address2', 'DeBary',
        '6/11/2018', '$11,998.00', '$13,332.00', '10.08%'],
    ['Company Name3', '5470', ' Name3', ' 3125 Address3 Long Name', 'Orlando',
          '6/11/2018', '$14,165.27', '$15,772.28', '10.19%'],
    ]

# Now call our function with the right data
mk_report_from_W(W, "test_report_lab.pdf")

Open in new window

I try to give you another minimalistic code snippet that show's you how you can create a table.

Every element in the list must be a string if not, the code will fail.
So any datetime.date object must be converted explicitely into a string.

I created now a more robust version, that converts every cell into a string and should therefore not fail if you
pass it a date time object, but the result will not look as you want except you add custom code to convert date objects to your desired format.

#!/usr/local/bin/python
 
import datetime

from reportlab.lib import colors
from reportlab.lib.pagesizes import A4, inch, landscape
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
 
def mk_report(fname, title_row, data):
    """ function taking a title_row and a data row and displays them in a table
        as a robustness measure every cell will be converted to a string
        the result will often not be what we want if we pass non string objects.

    """
    doc = SimpleDocTemplate(fname, pagesize=A4, rightMargin=30,
                            leftMargin=30, topMargin=30,bottomMargin=18)
    doc.pagesize = landscape(A4)
    elements = []
    
    # concatenate title row and data rows
    full_data = [ title_row ] + data
     
    #Configure style and word wrap
    s = getSampleStyleSheet()
    s = s["BodyText"]
    s.wordWrap = 'CJK'
    data2 = [[Paragraph(str(cell), s) for cell in row] for row in data]
    t=Table(data2)
     
    #Send the data and build the file
    elements.append(t)
    doc.build(elements)


title = ['Name', 'Age']
W = [
    ('Mary', 40),
    ('Peter', 20),
    ('Tom', 60),
    ]
mk_report('test1.pdf', title, W)

title = ['Name', 'birthday']
W = [
    ('Mary', datetime.date(2018-40,1,1)),
    ('Peter', datetime.date(2018-20,2,10)),
    ('Tom', datetime.date(2018-60,3,20)),
    ]
mk_report('test2.pdf', title, W)

title = ['Name', 'birthday']
W = [
    ('Mary', datetime.date(2018-40,1,1).strftime('%m/%d/%Y')),
    ('Peter', datetime.date(2018-20,2,10).strftime('%m/%d/%Y')),
    ('Tom', datetime.date(2018-60,3,20).strftime('%m/%d/%Y')),
    ]
mk_report('test3.pdf', title, W)

Open in new window

This code is fully operational, so please run it and look at the created pdfs

ASKER

Here is the output for the print

[('Org', 'ID', 'Title', 'Address', 'Type', 'Date', 'Amount'), ('Company Name', 6116, 'John Doe', '1234 Address Cir N, Jacksonville, FL 32246', 'Supplement', '06/14/2018', '$50.00')]

ASKER

I have a header row included, which can be removed.

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Yes, that worked. Sorry, I probably could have been a bit more clear there. I appreciate your help!

Enjoy,
The most important thing is, that things are working now.

ASKER

Thanks! One last q if you have a sec - off topic

I am trying to sort the W list alphabetically accoriding to the company name

None of these are working though...

sort(W)

Open in new window

sorted(W, key=lambda company: company[2])

Open in new window

sorted(W, key=itemgetter(0))

Open in new window

What am i missing here? If you dont see the answer right off the bat, no worries - just thought it would be quick

either

newW = sorted(W, key=lambda row: row[2])

Open in new window

or

W.sort(key=lambda row: row[2])

Open in new window

should do the job

I used row instead of company as what you pass to the lambda function is a row, but you could use any other name
what's important is, that

sort sorts a list in place, whereas sorted creates a new sorted copy, (which had then to reassign)

so sort uses less memory, whereas sorted allows you to keep the original list unmodified

you cold also write:

W = sorted(W, key=lambda row: row[2])

Open in new window

but this consumes for a short time a little more memory than sort

ASKER

Got it - will add it in tomorrow - appreciate the additional help