Use keywords to indent nested structures in Excel

phoffric
phoffric used Ask the Experts™
on
Attached is an example Excel spreadsheet with syntax similar to a table in a 60 page word doc. It represents a data structure that has many substructures in it that are controlled by loops and if statements. It is hard to see the structure, so I have learned a couple of techniques to try to indent sets of rows to the next column. For example, I now have a macro that will indent a manually selected set of rows one column to the right.
Nested-labeled_Conditions.xlsm

To get an idea of the desired result, this link gives a similar example of pictures of the a sheet before indenting and after indenting.
https://www.experts-exchange.com/questions/29130882/Excel-Spreadsheet-Error-when-moving-rows-one-column-to-the-right.html#a42766586

If you have a solution, I'll add an extra nested structure to see if it works in general.

<<EDIT 2019-01-16 12:40am EST>>
I changed the title in an important way. No longer should the Headers and Footers be considered as expressions. I now can see that expressions are much too difficult to maintain and vary immensely from table to table as there are potentially many different authors with their descriptive writing style.

The Headers and Footers should be considered as a single known word(s) - and it will be the first word in Column A. Currently, it is safe to say that the Footer will be always:
End - but test should be case insensitive, so end also is a footer.

Some headers I have seen in general are as follows (not all might be used in attached sample spreadsheet):
begin
loop
for
if


Luckily, I have not seen else, so don't worry about that. If I find other keywords, then I should be able to modify the macro's test condition.

The special ellipses char (i.e., 3 dots) before a keyword should be ignored as I can easily remove them from Excel by replacing them. (And feel free to remove them.)

One major problem is that there are vertical merged cells which possibly can result in data loss when indenting, and losing text is undesirable. Also, a potentially minor problem is that the headers and footers are usually merged horizontal cells. For a given field in Column A, that record may have in the actual spreadsheet up to 7 vertical merged cells in different columns in Column A; and in the field's record, other columns might have no merged cells, or another column could have a different number and organization of merged cells. Crazy, right? What is known is that if a field name has no merged cells in Column A, then there will be no merged cells in the other columns in the record.

Here is a schematic drawing of a sheet before indenting:
1st header
data
2nd header
data
3rd header
4th header
data
4th footer
data
3rd footer
data
data
5th header
data
6th header
data
6th footer
5th footer
data
2nd footer
data
data
1st footer
data


After indentation (used EE indent tags):

1st header
data
2nd header
data
3rd header
4th header
data
4th footer
data
3rd footer
data
data
5th header
data
6th header
data
6th footer
5th footer
data
2nd footer
data
data
1st footer
data


Thanks in advance.
Paul
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2014

Commented:
This can be done with a stack structure, where you push a set of items on the stack when some condition is detected and remove them when another condition is detected.  In your case the first set of conditions is a cell that begins with ("loop", "if") and the second set of conditions is a cell that begins with ("end").  If you're working in Excel, you might use the stack 'level' (depth) as a parameter in an .Offset() method.

With as simple a problem as this one, your stack structure can be an integer variable.  You might indent as you iterate the rows.

Author

Commented:
>> With as simple a problem as this one
Glad to hear this is "simple". I agree. If each row identifier in the first column had only the same number of cells (i.e., one cell per column) in the columns to the right, I'd be able to export this to a csv file, and do the indentation myself in C/C++. But they didn't make it that easy.
Top Expert 2014

Commented:
Does this data come to you as a CSV file?  If so, you can use commas at the start of each row to enable indentation when the file is imported into Excel.

You can use the string(lngDepth, ",") function to do this with lngDepth being the current depth.
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Top Expert 2014

Commented:
... in C/C++
You could also process the CSV file with VBA, VBScript, or Powershell.

Author

Commented:
The table comes in a pdf file. When copy/pasting it to excel, the one record in the PDF file becomes multiple rows in the Excel file with merging of the extra cells in col A.
Top Expert 2014

Commented:
Please post a representative sample PDF

Author

Commented:
I will see if I can find one at work tomorrow.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Take a look at this. There is no dependence on color but it does require that headers and footers be marked with Hn for the header where 'n' is a number and the footer for that header should Fn where the 'n' is the same number. That is made easier by a macro you can execute via Ctrl+Shift+H which will guide you through the process. When you open the workbook you'll see a sheet where the data has been marked with headers and footers and also indented. Use Ctrl+Shift+Z to restore it. Other macros available to you are:

Ctrl+Shift+I to Indent
Ctrl+Shift+M to manually Indent or outdent (in other words right or left)
Ctrl+Shift+Z to undo
29131489.xlsm

Author

Commented:
Thank you so much Martin! Maybe with these questions, I'll actually learn a little VBA.

Heh, my Starter Windows 7 Excel doesn't see macros, so I'll have to wait till tomorrow to view the macro and learn from it. I see what you did in your attachment. Even though I have to wade through 60 pages of the table, adding in Hn, Fn is not a bad exercise. Only wondering about this: If I get cross-eyed going through the 60 pages, and accidentally skip a small H33, F33 combo, or maybe I just skip more than one header/footer. So I've labeled incorrectly. How to recover from this if I don't realize this until I get to the end of the doc? Is there a natural easy way to get the number ordering correct, or do I start over with the backup copy?

Seems like you are very close to a fully automated solution (provided that my actual headers and footers are loop/if and end, respectively. If you have a stack in your VBA, I am thinking that it may be possible to avoid the need to add Hn/Fn. On the other hand, having this notation eliminates table naming conventions, and even shows a degree of how complex the table is. Thanks again. I am looking forward in trying this out tomorrow at work.

Regards,
Paul

p.s. - I know I should try to learn VBA, but my job demands that I learn other things at the moment. (And this moment has been going on for a few years now.)
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
so I'll have to wait till tomorrow to view the macro and learn from it.
If you don't understand the current 'Indent' sub, I can either answer specific questions or add more comments.
If I get cross-eyed going through the 60 pages, and accidentally skip a small H33, F33 combo, or maybe I just skip more than one header/footer. So I've labeled incorrectly. How to recover from this if I don't realize this until I get to the end of the doc?
Currently, once you've added the headers and footers it won't allow you to do that, and while that could be changed, it would be easier [for me:) ] if you just manually edited it.
If you have a stack in your VBA
I think what aikimark is talking about is simply that in VBA code if your are, say, referring to the active cell, you can refer to the cell 4 rows down and 2 columns to the right with ActiveCell.Offset(4, 2).

One idea I have for making this process simpler is that we could use column 'A' for the header and footer markers and start the data in column 'B'. That way it would look like this, and column 'A' could if it visually gets in the way, be hidden or given a column width of 0.Column 'A'That would make it easier for you to manually edit, and while it would probably take me several hours to change the code, it would make the code simpler to maintain.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
In addition to the above,  if the document could start at row 2, I could put buttons in row one for 'Indent' and the other functions.

Author

Commented:
I can do anything to the excel doc. It's purpose is to help me understand the structure. Hence the need to indent.

The header keywords are (at the start of the line):
...Begin
Begin
...If
Start of
For

In the PDF file and in Excel, the cursor jumps over the ... as if the ... were one character??


The footer keywords are:
...End
End

Author

Commented:
Scanning the table:
Upon seeing a header keyword, push the row number onto the stack.
Upon seeing a footer keyword:
1. record the row number of the footer.
2. Extract the row number at the top of the stack
3. Pop the stack
4. Indent from (Header row # + 1) to (footer row # -1)

Author

Commented:
@aikimark,
I will have to find a table that represents the table. On my workstation, I have a PDF file, but there is also a Word document with red lines. So, in another question, I could take the OP spreadsheet, and copy it into a word doc; and then see if I get the problems when I copy from Word to a new Excel doc. I'll have to ask around to see if there is a PDF file on the www that exhibits the copy to Excel issue related to merging cells in one but not all columns.
Top Expert 2014

Commented:
Either the Word document or the PDF, as long as it is a representative sample of the actual input.

Is the real problem you are trying to solve the preservation of indentation that is lost when you copy/past content?  Or is the problem something else?

Author

Commented:
In original doc, first column A is a single cell. Next column B on the row has notes. If the notes have multiple line breaks, then when copying from Word/PDF to Excel, column B will be on several rows (e.g., 1-7) depending upon how many new lines there are in the source doc. Col A will show those cells merged into one cell. When indenting manually, sometimes I get a warning that only the top line gets copied, and I lose data on all the other cells associated with the col A attribute. This issue has been solved in previous questions - something about recognizing the format of the source.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Scanning the table:...
That is essentially what my code does now and has done in each version of the workbook, and it obviously does work but only if I can find the associated footer. If it were not for the fact that the headers and footers are nested, the approach as you described it would work. However when the headers and footers are like this

first header
second header
footer associated with second header
footer associated with first header

then that simple approach which indents everything between the first header and the first footer would produce this

first header
second header
footer associated with second header
footer associated with first header

which isn't correct. That's why there needs to be some way to know that "footer associated with first header" is associated with "first header".

Author

Commented:
I'm taking your example and prefixing with the line numbers in a fake example. These numbers are kept internally in the macro - they are not written to the spreadsheet.
data1
20 first header
data2
33 second header
data3
55 footer associated with second header
data4
77 footer associated with first header
data5

Open in new window

Find first header; push 20 onto stack. Stack looks like:
20

Open in new window

Find second header; push 33 onto stack. Stack looks like:
33
20

Open in new window

Find a footer keyword - do these 4 steps:
1. record the row number of the footer.
55 = footer row #

2. Extract the row number at the top of the stack to get the row number of the associated header:
33 = header row #

3. Pop the stack; Stack now looks like:
20

4. Indent from (Header row # + 1) to (footer row # -1); that is
Indent from (33  + 1) to (55 -1)
Indent from (34)      to (54)

Open in new window

Find a footer keyword - do these 4 steps:
1. record the row number of the footer.
77 = footer row #

2. Extract the row number at the top of the stack to get the row number of the associated header:
20 = header row #

3. Pop the stack; Stack now looks like:
<<empty>>

4. Indent from (Header row # + 1) to (footer row # -1); that is
Indent from (20  + 1) to (77 -1)
Indent from (21)      to (76)

Open in new window

This approach should yield:
data1
first header
    data2
    second header
        data3
    footer associated with second header
    data4
footer associated with first header
data5

Open in new window

Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
These numbers are kept internally in the macro...
I hadn't thought of that, but I believe there's still a problem in that I don't think it would work with a structure like the one in the workbook which is this. I think it would fail at around line 10.

1st header
data
2nd header
data
3rd header
4th header
data
4th footer
data
3rd footer
data
data
5th header
data
6th header
data
6th footer
5th footer
data
2nd footer
data
data
1st footer

Open in new window

Author

Commented:
here is what I get when the macro processes line 10:
1  1st header
2  data
3  2nd header
4  data
5  3rd header
6  4th header
stack:        6
              5
              3
              1
7  data
8  4th footer: Indent (6+1) to (8-1):: (7) to (7):: i.e., one line is indented
stack:        5
              3
              1
9  data
10 3rd footer: Indent (5+1) to (10-1):: (6) to (9):: i.e., indent 4 lines
11 data

Open in new window

At this point we have:
1  1st header
2  data
3  2nd header
4  data
5  3rd header
6      4th header
7          data
8      4th footer
9      data
10 3rd footer
11 data
12 data

Open in new window

Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Does it still work if you let it go to the end?

Author

Commented:
I can't see any reason why it shouldn't work. The algorithmic loop just does the same thing. The 5th and 6th H/F pairs get knocked out in a similar way, leaving only the 2nd and 1st H/F pairings.The 2nd H/F pair then get indented; and finally the 1st H/F pair is indented.

Author

Commented:
I suppose I could modify the spreadsheet so that Begin is the start of every Header, and End is the start of every footer.

Author

Commented:
>> here is what I get when the macro processes line 10:
>> Does it still work if you let it go to the end?
Just wanted to clarify somethiing. When I said "what I get", I didn't mean that I ran a macro program in Excel. I meant that I ran the macro in my head. I didn't execute anything except in my head. So, when you ask "Does it still work", my answer was referring to the macro in my head. Hope that clarifies what I meant.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I don't think that your stack solution will work and I don't have the several hours coding time to test it, so what I'd like to do is to implement my Column 'A' solution. with buttons in row 1. Is that good enough? If you aren't clear on what I'm proposing please let me know.

Author

Commented:
Sure. Your suggestion is great. Thanks again.

Maybe later w can talk about why you think the  proposed algorithm will fail. I can complete the scenario on paper sometime.

But getting something  else to work is fine. I can  easily make the changes to my excel sheet.

Thanks again.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Try this.
29131489a.xlsm

Author

Commented:
Hey Martin,
Really nice! Going through the results in detail now.
ID and NUMELEM on rows 5-8 got the cells all correct. So the algorithm works there really well. Not sure what happened for ITEM4 which originally had Note 1, Note 2, Note 3 in original doc. In indented result, it said "some item. I may have some files mixed up. Will check  now and see what I may have done wrong.

Author

Commented:
I'll try from scratch the OP file, Nested-labeled_Conditions.xlsm and see what I get. Thanks again!
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
If it looks reasonable to you then please accept one or more of my answers as the solution and I'll work on the "stack" solution.  However above you said "Find first header; push 20 onto stack" and In order to work on that solution I need to know how to find that first header.

Author

Commented:
Over here https:#a42771446 I provided a list of header expressions.
...Begin
Begin
...If
Start of
For

Open in new window

The peculiar thing is that the ... acts as a single character when I move the cursor.
and footer expressions.
...End
End

Open in new window

And for simplicity, I have no trouble making them all:
Begin
End

Open in new window

Author

Commented:
Could you please tell me the procedure for transferring the buttons and new macro to a new Excel document.

Author

Commented:
I googled and learned how to add buttons and map to macros. When I opened your spreadsheet, the macros appeared in the new spreadsheet. I added Hn and Fn, hit Hide, but that didn't work. Then I hit Indent, and that didn't work.

I guess I'll need your procedure on how to transfer the buttons and new macro to a new Excel document so I can test.

Thanks again, Martin. I am very impressed by what you have accomplished in these questions. Is this a common type of Excel problem that others need?

Author

Commented:
@aikimark,
I think I just found a document that shows the copy problem. I'll try to find time to post another question this weekend.

Author

Commented:
If I can learn how to transfer the macros and buttons on this computer, then I'll assume that I can transfer it on the huge document. Thanks again.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I’m done for the night but you should be able to just copy/paste your document into my workbook instead of the other way around.

Author

Commented:
I mispoke when I listed some header/footer labels in an earlier post. To be fair, I shouldn't be changing the OP.

The OP spreadsheet example is Nested-labeled_Conditions.xlsm. As the OP says,

The attached spreadsheet has a sheet called "Labels". Column A shows typical keywords and expressions that denote the start of a nested data structure, and Column B denotes the end of that data structure.
The "Labels" sheet indicates that headers will begin with one of these words:
begin
if
loop

Open in new window

and footers will begin with this word:
end

Open in new window

Author

Commented:
In a 60 page table, it isn't surprising that there are multiple authors. And the expressions that designate a header/footer vary from author to author.

Not a mistake. And the Labels sheet is a little misleading in that I included full expressions. Just the first whole word is all that is necessary to designate a header or footer. In fact, if I happen to find another word for a header that is not on the list - should be a relatively rare occurrence from a quick scan - I will prefix the word with Begin as a standard header identifier.

Every footer happens to begin with End (not sure about case sensitivity - could be end). So, the words following End, such as if or repeat or anything else should be ignored.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Are you saying that currently at least that there are just two headers, "if" and "loop" (ignoring case) and just two footers "end" and "end of"?

Author

Commented:
Assuming case insensitivity, there is one footer:
"end"
because it doesn't matter what follows "end" - could be "end of", "end if", "end repeat", and "end anything else"
If the first word in the first column is "end", then that row is a footer.

If the first word in the first column is
"if"
"loop"
"begin"
then that row number can be put on the stack as a header indicator.

Author

Commented:
I am really glad you are going to handle the keyword approach - I hope it works. If it is truly not possible, then I will accept that as the answer.

I moved your sheet with buttons to my other computer (I can only do that certain times during the week, and not on weekends.) I picked a small table - only 11 pages long.

The Add Tags asks for the header row - no problem. Then it asks for the corresponding footer row. Hmm, wonder where that might be? I looked and found it at over 1/2 way down. A lot of searching. Then went back up. Split screen would have helped for the first H/F pair, but with nested items, I think I'd need multiple splits. So I started entering by hand. H1, H2, H3, F3. Now I had to skip a few pages, H4, and so on. I'd start forgetting the count, so I started writing them down. I'd make a mistake along the way due to scrolling and re-checking the matching words. But if that is what it takes, then that is what will be done.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
using what I now know about the headers I could probably modify this code to automatically pick the matching footer, but then the stack approach, if I’m successful, and I think I will be, that will be a better way. So please close this question, ask a new one and I’ll work on it.

Author

Commented:
Still here at work and leaving in a few. Just tested on the 11 page sheet. I must have made a cockpit error because a popup appeared with a "debug" button. No problem, I figure. I just copied the original spreadsheet over to my working folder, and then copied the Word table to the new sheet copy.

Everything should be pristine, I thought. I added the tags, and hit Indent. Weird popup said:
The data is already indented
I checked again. The data was not indented. Very confused now.

Author

Commented:
Ok, so now I selected one row between my H1/F1 tags and did a manual indent - it worked.
Then I selected a whole bunch of rows between H2/F2 tags and did a manual indent.
I got a message saying
The operation will cause some merged cells to unmerge. Do you wish to continue?
I hit OK and more of the same. More OK's. I mean I would like to see the indentation.
Then I get a popup:
Unable to set the ColumnWidth property of the Range class.
And this popup has a Debug button.

Maybe this error is related to the copied sheet somehow thinking that "The data is already indented"??

Maybe I'll come in this weekend if you think I can get something to work.
Again, this is with the 11 page sheet.

Not sure what I am doing wrong operationally.

Thanks again.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I'm sorry that you wasted time trying to fix those errors. What's wrong is that the workbook you're using assumes that merged cells always have just two rows. That assumption was based on the previous data samples, but I should have asked. Other assumptions that are made are that each row, merged or not, has three columns, and that if there are merged cells that they will only be in columns 'A' and 'C'. Are those safe assumptions?

Author

Commented:
>> merged cells always have just two rows.
I've seen as many as seven.

Cell A which represents the field name has its cells merged. Other cells in the original table get split up, I have seen, when there are sentences or lists on new lines.

I will get a sample of the source posted in another question as I promised.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
The main problem now is not matching headers and footers, but rather the complexity of the merged cells pattern.

Author

Commented:
I'll ask another question on copying the word table to excel to see if there is a way to have one word row become one excel row.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
No, please don't. I'd rather figure out how to handle the structure.
Top Expert 2014

Commented:
I'm still waiting to see a representative sample of the input

Author

Commented:
This question show the Word copy to Excel problem.
https://www.experts-exchange.com/questions/29132143

Author

Commented:
>> the workbook you're using assumes that merged cells always have just two rows.
Oh, so that explains the data loss error. I understand.

What is strange though is that when I started fresh, and did everything manually with no errors, then everything should be pristine, I thought. I added the tags manually, and hit Manual Indent. Weird popup said:
The data is already indented
I checked again. The data was not indented.

How did that error message show up? Why wouldn't the Manual Indent Button indent?
Top Expert 2014

Commented:
That is a specification document and not input to the indentation process.

Author

Commented:
Maybe not normally. But since I can't understand the structure in its flat form, I need to indent it.

There's gotta be a way?

Maybe examining the column A and if it had a single field word in it then, merge the other columns in an appropriate way so that there is one row per  field.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
The data is already indented
That only appears after you click the 'Indent' button and some cell in column 'A' is blank.
I added the tags manually, and hit Manual Indent. Weird popup...
I believe that's a misunderstanding. After adding the tags totally by hand, or using 'Add Tags', you should click the 'Indent' button as normally, and not 'Manual Indent'.  The purpose of 'Manual Indent' is to allow you to select a few rows and then indent or outdent  them.

Author

Commented:
->> The data is already indented
Which operations can produce this message? Maybe I wrote the wrong thing down after i left my computer.

So now it looks like the indentation is doable assuming that I can get one word row to copy into one Excel row.

Do you think your macro could get a list of header words from a sheet marked "labels" so that if another table has different header words, I'll be able to just add that new word to the table?

If you want to give it a try, let me know and I'll go to work if i can get out of here. Looks like 5 inches of snow.

Author

Commented:
I can ask another question similar to this but without the extra complexity of having some columns merged and other columns not merged.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
->> The data is already indented
Which operations can produce this message?
Just the 'Indent' button or Ctrl+Shift+I, both of which call the same code.
I can ask another question similar to this but without the extra complexity of having some columns merged and other columns not merged.
Sure. Is your question about Word, resolved?

Author

Commented:
Yesterday, I marked the other Copy Word to Excel question as resolved. I can't see how it won't work on the Word tables at work.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I saw that, but you asked the additional question
How do you use the chr(13) to do the global replace?
and I wanted to know if my answer to that question worked.

Author

Commented:
Oh, I'll have to wait till I get to work since my laptop doesn't have a numeric keypad, and it does not work with just the usual numbers. But the special formatting global replace that I figured out worked fine (for paragraph and lines). Thanks.

I am manually indenting (with no macros since I don't have macros on my laptop) the 193 line file, and will create a new question shortly. The only merged horizontal cells are in the header/footers, and I ignore the message that they will become unmerged. Sometimes I notice that footer doesn't indent, so I move it to an open area and then move it to the correct cell.

Also, if I have:
begin something
if something

then when it comes time to indent the if something, I have to move its merged cell to an open space, and then move it back to the correct location (as unmerged).

This is a heads-up in case you have figured out how to handle the stack to do the indentation. Other than the merged cells for headers/footers, no other cells will be merged in the other question.

Then, if, as you said yesterday, you want to do a general approach to this questions more complex problem due to the merged cells, let me know, and I'll keep it open for you to complete. At least with the two other questions on prepping the word document and then having no vertical merged cells, I'll be able to format the spreadsheet. Thanks again.

I'll go to work to try out new macros if it is not too late.

Author

Commented:
>>  Sometimes I notice that footer doesn't indent, so I move it to an open area and then move it to the correct cell.
Double checked when it happened again. Turns out it was a cockpit error. Trying to do this indentation by hand makes me cross-eyed. No problem when I do it right.

>> and I wanted to know if my answer to that question worked.
To avoid confusion on my part, it is better to keep this comment in the question that it pertains to.

Author

Commented:
If I am pressed to understand those tables early in the week, I will use a simple macro from another question to indent semi-manually.
What I did for the other question was to select a block and move it one to the right. The width of the block often keeps increasing, of course, so I have to determine the width. With the simple macro, I think all I have to do is select the Col A set of rows and run the macro.

If I do that, then I won't have further need of the best solutions. So let me know if/when you think this is doable. I'll have the other question posted within 15 minutes.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
To avoid confusion on my part
, and mine I assume you'll be doing that.

Author

Commented:
This has an excel document that has no vertical cell merging.
Hopefully that makes the indentation problem much simpler.
https://www.experts-exchange.com/questions/29132183/

Author

Commented:
@aikimark,
Although it appeared in your last post that you didn't think that a specification table could be indented, I'd like to thank you for your first post that said
With as simple a problem as this one, your stack structure can be an integer variable.  You might indent as you iterate the rows.
This design comment set the stage for an implementation.
Top Expert 2014

Commented:
The specification document did not align with your problem definition.

Author

Commented:
Interesting.. Aside from producing a more exhaustive document, what additional wording should I have written to make the specification perfect?
Top Expert 2014

Commented:
The core problem seems to be the transfer of the Word table into Excel.  You actually state this core problem at the top of the Word document.  The core problem is different than the indentation problem.
Top Expert 2014

Commented:
Also, looking at some of the headers, I see some start with Begin and some start with ...Begin.  I'll have to assay these values to determine the variety of such starting values.  It isn't clear how to treat these (block) items when it comes to indentation.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Some of what look like 3 periods (aka full-stops) are actually horizontal ellipsis.
Here's a version of Jan's code that changes them to three actual periods and removes any spaces directly following them.
Private Sub Elip()
    Dim lIndentLevel As Long
    Dim oCell As Range
    Dim sVal As String
    Dim sTrim As String
    
    Application.ScreenUpdating = False
    
    For Each oCell In Worksheets("Fix").UsedRange.Columns(1).Cells
        sVal = LCase(CStr(oCell.Value))
        ' look for horizontal ellipse
        If InStr(1, sVal, Chr(133)) > 0 Then
            sVal = "..." & Mid$(sVal, 2)
            If Mid(sVal, 4, 1) = " " Then
                ' Delete one or more spaces following the 3 periods
                Do
                Loop Until Mid(sVal, 4, 1) = " "
            End If
        End If

        If sVal Like "?end*" Or sVal Like "???end*" Or sVal Like "end*" Then
            lIndentLevel = lIndentLevel - 1
        End If
        If lIndentLevel < 0 Then lIndentLevel = 0
        If lIndentLevel > 0 Then
            oCell.Resize(, lIndentLevel).Insert xlToRight
        End If
        If (sVal Like "?begin*" Or sVal Like "begin*" Or sVal Like "???begin*" _
            Or sVal Like "?if *" Or sVal Like "if *" Or sVal Like "???if *" _
            Or sVal Like "?for *" Or sVal Like "for *" Or sVal Like "???for *") Then
            lIndentLevel = lIndentLevel + 1
        End If
    Next
    
    Application.ScreenUpdating = True
    
End Sub

Open in new window

Author

Commented:
@aikimark,

>> some start with Begin and some start with ...Begin
I am very flexible. In another question, I just did a global replace of the "..." completely removing them.

>> The core problem seems to be the transfer of the Word table into Excel.  You actually state this core problem at the top of the Word document.
That may be the case in one of my other related questions, but this question doesn't attach a Word doc in the OP. So, the question, as it stands appears self-contained to me. Based upon what Martin says, the core problem is that there are vertical merged cells. Martin has asked at times for other separate questions, which I did. The core problem has been solved via Martin's posts.

I'll keep this question open to see if someone wants to handle the vertical merged cell issue.
Top Expert 2014

Commented:
Here are the 'code' block indicators/delimiters I've identified from your sample document:
Begin repeated for
Begin for each
End repeat for
If
End if
Begin If

Open in new window


and a Possible 'coding' error:
End repeat for each parameter...End repeat for each parameter

Open in new window


Here is a starting point for our conversation about importing and indenting.
1. Are some of these block indicators/delimiters identical?
2. Which of these should cause indentation?
3. Are these to be indented identically?
For instance, do the rows within a Begin repeated block align with the Begin or are they indented.
Top Expert 2014

Commented:
Please note that I'm approaching this differently than Martin.

Author

Commented:
Hi aikimark,
I have tried to address your questions by editing the Title and the body of the OP - See Edit at bottom of OP. If something is not clear, let me know, and I'll try to adjust the OP to be clearer.

Thanks,
Paul
Top Expert 2014

Commented:
What about the "possible coding error" I mentioned?

Author

Commented:
Only the first word in A column should be checked for a header or footer. Some coding errors might be if the first keyword found were a footer.  I wouldn't expect that to work and you don't have to deal with it. Also the number of headers should equal the number of footers but you don't need to check . To keep things simple we are not concerned with matching if with end if and loops with end loops . I rely on the table authors to get that right . Garbage in - garbage out. Assume no garbage. I'll look at the results and if garbage in, I'll fix the garbage to conform to the rules.

This is not being used for production. I just want to see the table structure so that it is easier to understand. So no special error handling is necessary. This would be a different story if this were a production feature or even an in house general purpose utility.
Top Expert 2014

Commented:
Thanks for that clarification.

Author

Commented:
Sure thing. :)

Author

Commented:
If you do get this to work, I will move the macro to my workstation and try it out on the real thing.
Jan's solution worked very nicely.
https://www.experts-exchange.com/questions/29132183/Excel-document-needs-auto-indenting-based-upon-header-footer-keywords.html

But for it to work, I simplified the problem by replacing the paragraph marks in the Word document with "  ||  " as described here by Martin:
https://www.experts-exchange.com/questions/29132143/How-to-Copy-Word-Table-to-Excel-so-that-the-number-of-Rows-are-the-same.html
Doing this step eliminated the vertical merged cells.
(In fact, after that replacement, there were no line breaks that had to be replaced.)

Since the Word modification is very easy to do, and the results are great, I am happy. :)

At this point, this question isn't really needed for me to understand the document. But, it seems like a nice challenge, and maybe someday, I'll want to keep the original formatting.
Top Expert 2014

Commented:
Is there a Word document that isn't a big table?

Author

Commented:
>> Is there a Word document that isn't a big table?
Are you wanting a word doc with less row in the table, or maybe less columns?

Let me know what you need in a little more detail. Do you just want the large excel sheet with less rows, or do you actually need a Word doc? The input to this question is an excel doc, so I'm not clear about the what and the why regarding a word doc.


Are you aware of  Word_to_Excel_Issue doc in the OP of this question?
https://www.experts-exchange.com/questions/29132143/How-to-Copy-Word-Table-to-Excel-so-that-the-number-of-Rows-are-the-same.html
Top Expert 2014
Commented:
I've packaged the solution code into the uploaded workbook.  To invoke the code, I executed the following statement in the immediate window:
Q_29131489 "C:\Users\New User\Downloads\Word_to_Excel_Issue.docx"
Of course, you would supply the path to your Word document.  Make sure the Word document is not open when you run this code.

After the code runs against your document, check the immediate window for any "unexpected condition" lines.

Option Explicit

Function CleanCell(parmCellText As String) As String
    Static oRE As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        oRE.Pattern = "^\W*(.*\w)\W*$"
    End If
    If oRE.test(parmCellText) Then
        CleanCell = oRE.Replace(parmCellText, "$1")
    Else
        CleanCell = parmCellText
    End If
End Function

Function IsBlockStart(parmCellText As String) As Boolean
    Static oRE As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.ignorecase = True
        oRE.Pattern = "^((?:Begin (?:repeated )?for)|(?:(?:Begin )?if))"
    End If
    IsBlockStart = oRE.test(parmCellText)
End Function

Function IsBlockEnd(parmCellText As String) As Boolean
    Static oRE As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.ignorecase = True
        oRE.Pattern = "^((?:End (?:repeat(?:ed)? )?for)|(?:End if))"
    End If
    IsBlockEnd = oRE.test(parmCellText)

End Function

Sub Q_29131489(parmDocFile As String)
    Dim oWd As Object
    Dim oDoc As Object
    Dim oTbl As Object
    Dim oRow As Object
    Dim oCell As Object
    
    Dim wks As Worksheet
    Dim rng As Range
    
    Dim CellValue As String
    Dim lngIndent As Long
    Dim EndOfCell As String
    
    If Len(Dir(parmDocFile)) = 0 Then
        MsgBox "Source document file not found", vbCritical
        Exit Sub
    End If
    
    Set oWd = CreateObject("word.application")
    Set oDoc = oWd.documents.Open(parmDocFile, True)
    
    'assumption is that there is only one table in the source document
    'validate this assumption
    If oDoc.tables.Count <> 1 Then
        MsgBox "Single table not found in document", vbCritical
        oDoc.Close
        oWd.Quit
        Set oWd = Nothing
    End If
    
    Set wks = ActiveSheet
    Set rng = wks.Cells(1, 1)
    EndOfCell = Chr(13) & Chr(7)
    Set oTbl = oDoc.tables(1)
    For Each oRow In oTbl.Rows
        If oRow.Cells.Count = 1 Then
            CellValue = oRow.Cells(1).Range.Text
            CellValue = CleanCell(CellValue)
            Select Case True
                Case IsBlockStart(CellValue)
                    'write block and then increment indent
                    rng.Value = CellValue
                    lngIndent = lngIndent + 1
                Case IsBlockEnd(CellValue)
                    'decrement indent and then write block
                    lngIndent = lngIndent - 1
                    rng.Offset(0, -1).Value = CellValue
                Case Else
                    Debug.Print "Unexpected Condition: " & CellValue
            End Select
            'Debug.Print CellValue
        Else
            For Each oCell In oRow.Cells
                CellValue = oCell.Range.Text
                If Right(CellValue, 2) = EndOfCell Then
                    CellValue = Left(CellValue, Len(CellValue) - 2)
                Else
                    Stop
                End If
                CellValue = Replace(CellValue, Chr(11), Chr(10))
                CellValue = Replace(CellValue, Chr(13), Chr(10))
                rng.Value = CellValue
                Set rng = rng.Offset(0, 1)
                'Stop
            Next
        End If
        Set rng = wks.Cells(rng.Row + 1, 1 + lngIndent)
        
    Next

    oDoc.Close
    oWd.Quit
    Set oWd = Nothing

End Sub

Open in new window


If this works, I can tweak its performance.
Q_29131489.xlsm

Author

Commented:
Thanks aikimark. I don't have macro access on my laptop, so I have to wait till Friday to give this a try.
I took a quick look at the spreadsheet that represents the result of running your macro. Very impressive that you could keep the formatting from the Word document.

BTW, the termination appears to have left off one of the footers near the end, so a tiny amount of info is lost. If that is all the problem is, I'll be very satisfied since I can fill in a few rows by hand.

Now, I'll try to find out what an "immediate window" is. I may have operational questions tomorrow. We'll see.

Author

Commented:
In your spreadsheet, it looks like row 191 is blank. Should be:
…End DIRECT_GROUP_FLAG is 1

Just want to make sure this is only a premature termination issue, and not something deeper.

>> If this works, I can tweak its performance.
Sounds good. I hope to be able to test it this evening or if I get to work, this weekend.

If I ask another question to explain the vba for each line, could you do that? I'd like to at least get the gist of what is going on.
Top Expert 2014

Commented:
The End DIRECT_GROUP_FLAG is 1 does not meet any prior stated criteria.  The code is looking for if and for statements as the start of groups and end if and end for as end of groups.  Did this statement show up in the immediate window?
Top Expert 2014

Commented:
The gist of this code is to iterate the cells in the Word document table, populating the cells of the Excel worksheet.  The 'header' cells need some clean-up (CleanCell function) and the multi-line cells need some tweaking (Replace functions) because Word and Excel use different encoding for line termination inside a cell.
Martin LissOlder than dirt
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
FYI, I'm dropping out of this thread.

Author

Commented:
Hi aikimark,
For headers and footers I clarified in the OP that only the first single word in Column A determines whether the row is a header or footer. So, "end" is the only footer that should be checked.

The Headers and Footers should be considered as a single known word(s) - and it will be the first word in Column A. Currently, it is safe to say that the Footer will be always:
End - but test should be case insensitive, so end also is a footer.

>> oRE.Pattern = "^((?:End (?:repeat(?:ed)? )?for)|(?:End if))"
Not sure how to fix this to only check for "end"
Here is my guess:
oRE.Pattern = "^(?:End)"
What is "ed" for?

I just learned what an immediate window is.

Author

Commented:
How do I open an immediate window?
I looked at View-> macro and didn't see anything. What do you mean by "packaged the solution code". If I want to look at it or tweak it for the "end", how do I do that?

Thanks again.
Paul

Author

Commented:
Alt+F11
Ctrl+G
"Unexpected Condition: End DIRECT_GROUP_FLAG is 1"

Can I copy the excel doc, rename it, and use it for other word docs?
Does this command   Q_29131489 "H:\Docs\ExtraCells\Word_to_Excel_Issue_01.docx" cause the current doc to be cleared an overwritten with the new Word doc as the source?

Why does "View" --> Macro show nothing?
How do I "Package" a new macro and get it to stick around after I close Excel. (I can create a macro, and run it, but it disappears when I close/open the Excel doc.)

Will try to get in this weekend to see how this works.
Top Expert 2014
Commented:
I've tweaked the code to correctly identify the simpler end condition.  It should also perform a bit faster.
Option Explicit

Function CleanCell(parmCellText As String) As String
    Static oRE As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        oRE.Pattern = "^\W*(.*\w)\W*$"
    End If
    If oRE.test(parmCellText) Then
        CleanCell = oRE.Replace(parmCellText, "$1")
    Else
        CleanCell = parmCellText
    End If
End Function

Function IsBlockStart(parmCellText As String) As Boolean
    Static oRE As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.ignorecase = True
        oRE.Pattern = "^(?:(?:Begin (?:repeated )?for)|(?:(?:Begin )?if))"
    End If
    IsBlockStart = oRE.test(parmCellText)
End Function

Function IsBlockEnd(parmCellText As String) As Boolean
    Static oRE As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.ignorecase = True
        oRE.Pattern = "^End "
        'oRE.Pattern = "^((?:End (?:repeat(?:ed)? )?for)|(?:End if))"
    End If
    IsBlockEnd = oRE.test(parmCellText)

End Function

Sub Q_29131489(parmDocFile As String)
    Dim oWd As Object
    Dim oDoc As Object
    Dim oTbl As Object
    Dim oRow As Object
    Dim oCell As Object
    
    Dim wks As Worksheet
    Dim rng As Range
    
    Dim CellValue As String
    Dim lngIndent As Long
    Dim EndOfCell As String
    
    If Len(Dir(parmDocFile)) = 0 Then
        MsgBox "Source document file not found", vbCritical
        Exit Sub
    End If
    
    Set oWd = CreateObject("word.application")
    Set oDoc = oWd.documents.Open(parmDocFile, True)
    
    'assumption is that there is only one table in the source document
    'validate this assumption
    If oDoc.tables.Count <> 1 Then
        MsgBox "Single table not found in document", vbCritical
        oDoc.Close
        oWd.Quit
        Set oWd = Nothing
    End If
    
    Set wks = ActiveSheet
    Set rng = wks.Cells(1, 1)
    EndOfCell = Chr(13) & Chr(7)
    Set oTbl = oDoc.tables(1)
    Application.ScreenUpdating = False
    For Each oRow In oTbl.Rows
        If oRow.Cells.Count = 1 Then
            CellValue = oRow.Cells(1).Range.Text
            CellValue = CleanCell(CellValue)
            Select Case True
                Case IsBlockStart(CellValue)
                    'write block and then increment indent
                    rng.Value = CellValue
                    lngIndent = lngIndent + 1
                Case IsBlockEnd(CellValue)
                    'decrement indent and then write block
                    lngIndent = lngIndent - 1
                    rng.Offset(0, -1).Value = CellValue
                Case Else
                    Debug.Print "Unexpected Condition: " & CellValue
            End Select
            'Debug.Print CellValue
        Else
            For Each oCell In oRow.Cells
                CellValue = oCell.Range.Text
                If Right(CellValue, 2) = EndOfCell Then
                    CellValue = Left(CellValue, Len(CellValue) - 2)
                Else
                    Stop
                End If
                CellValue = Replace(CellValue, Chr(11), Chr(10))
                CellValue = Replace(CellValue, Chr(13), Chr(10))
                rng.Value = CellValue
                Set rng = rng.Offset(0, 1)
                'Stop
            Next
        End If
        Set rng = wks.Cells(rng.Row + 1, 1 + lngIndent)
        
    Next
    Application.ScreenUpdating = True

    oDoc.Close
    oWd.Quit
    Set oWd = Nothing

End Sub

Sub TestSub()
    Q_29131489 "C:\Users\New User\Downloads\Word_to_Excel_Issue.docx"
End Sub

Open in new window

Why does "View" --> Macro show nothing?
Because it has parameters. If you drop the pasted code into your workbook's VB project, you should be able to see the TestSub routine.

Author

Commented:
Came in to work to test because I'm not sure how much time I'll have during the week.
Awesome job! I checked the results and it worked.

Author

Commented:
Thank you all very much for showing me the power of VBA. This seemed like a pretty hard challenge so congrats to you in figuring this out. Now that I have an indented table, I am hoping that my crossed eyes will get better over time. :)

Author

Commented:
@aikimark,

Thanks again. I never expected a Macro that would start with a Word source document and copy the table to Excel. Nice going!

You wrote: "Please post a representative sample PDF"
Does this mean you can do this on PDF files? I have some tables in PDF, and it is difficult to copy a page at a time to Word. Actually, less than a page since whenever there is either a header or footer in a page, the copy does not work well.

Were you just planning on converting the PDF to Word and then proceed from there?

My Adobe reader cannot convert PDF to Word. Someone on my team said they may have Acrobat which can convert. Just wondering whether there is a technique (without bringing in special 3rd party programs, which is not allowed) to copy PDF tables to Word or Excel?
Top Expert 2014

Commented:
For such problems, I usually use a utility program that will export the text layer of the PDF to a text file.  Such utilities can usually process multiple PDFs at a time.  Then the VBA/VBScript code processes the text file(s).

You might upload the PDF documents to Google Docs.  There are some free PDF utilities there.

Office 365 provides some PDF processing.
https://support.office.com/en-us/article/opening-pdfs-in-word-1d1d2acc-afa0-46ef-891d-b76bcd83d9c8

If you can select all the text in the PDF, you can copy it to the clipboard.  VBA code can access the clipboard object through MSOForms.  This might be a solution if there aren't too many PDFs and there aren't any HUGE PDF documents.

With Adobe Reader, you might print the document to the generic/text device, producing a text file.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial