Link to home
Start Free TrialLog in
Avatar of Anthony Mellor
Anthony MellorFlag for United Kingdom of Great Britain and Northern Ireland

asked on

How to !SPILL a CSV string?

So I have a comma separated string like say a,b,c,d,e,f,g,h,1,2,3,4

Is there a way to make this dynamically spill? In to a column?
Next question will be can we do that but every 4 items go across in columns?
Will Filter do that?

Anthony.
Avatar of aikimark
aikimark
Flag of United States of America image

please post an example of the result you are describing
I would use a macro, but you don't lile macros, right?
The other option may be to use a left or mid function to splitt it up, but you would need to do it in every cell and it depends a bit if the sequence is equal  (1 letter, comma, 1 letter etc.).
It can get a quite complex structure.
With a macros, you can easily and fast splitt it up in any format.

If all the values are single character then the simpler formula will do

=MID(","&A4&",",FIND("#",SUBSTITUTE(","&A4&",",",","#",ROW(OFFSET($A$1,1,1,LEN(A4)-LEN(SUBSTITUTE(A4,",",""))+1,1))-1),1)+1,1)

Else a longer formula

=MID(","&A4&",",FIND("#",SUBSTITUTE(","&A4&",",",","#",ROW(OFFSET($A$1,1,1,LEN(A4)-LEN(SUBSTITUTE(A4,",",""))+1,1))-1),1)+1,FIND("#",SUBSTITUTE(","&A4&",",",","#",ROW(OFFSET($A$2,1,1,LEN(A4)-LEN(SUBSTITUTE(A4,",",""))+1,1))-1),1)+1-FIND("#",SUBSTITUTE(","&A4&",",",","#",ROW(OFFSET($A$1,1,1,LEN(A4)-LEN(SUBSTITUTE(A4,",",""))+1,1))-1),1)-2)

In either case change A4 to your cell address.
We already sorta did that for you a few questions ago. Just take out the TEXTJOIN part and change the SEQUENCE functions to generate single row/multiple column arrays:

=LET(Count,ROUNDDOWN((LEN(A1)-LEN(SUBSTITUTE(A1,",","")))/4+1,0),Start,FIND("|",SUBSTITUTE(","&A1,",","|",SEQUENCE(1,Count,1,4))),Length,FIND("|",SUBSTITUTE(A1&",,,,",",","|",SEQUENCE(1,Count,4,4)))-Start,MID(A1,Start,Length))

Kevin
Avatar of Anthony Mellor

ASKER

Thanks - I hate to ask, but can you find or recall which question it was? I'm going round in circles here for the last few days. This new stuff is an order of magnitude more complicated and very hard to "debug" when I return to it a while, say a week, later.

I get it that SEQUENCE seems to be amazingly useful when in the first instance it does not look that way at all.

Wait a sec.. ROUNDDOWN? First time I saw that was yesterday or so.
So why do we have to use the amazingly complicated text manipulation functions when we have delimited data? Not least static arrays - but that's in my other question.

I suppose what I am missing is what VBA can do because it has a function to do this and also Windows Excel with the new XML function which seems to offer what I require?

Need to digest your above, I may be some time.

Anthony
and another thing: this business of functions that can go round and round inside themselves. That's real programmer thought processes. That's a culture shock.

And it's not apparent when such loops are in play and it's not apparent from any documentation I have yet seen - granted it's great, but that doesn't change the sea change that it is.

Anyway, I am persevering and appreciate the patience shown me in here, as it's clearly stuff well worth learning even if there is a pain barrier.

Anthony
is it possible for that formula to drop each item in to a cell, so splitting it out from its csv form?

I added Transpose to make it drop downwards.

Anthony
Hi Anthony,
the other experts here create really interesting constuctions...I never would even try...I even not aware about all of them. Maybe because I started with Excel 4.0 where a lot of formulas just not exists.  
So I'm used to macros for a log time.  
I just want to add a comment, why it is worth to have a look into macros, not for your current issue, just in general, maybe for the future.

Years ago, I held an Excel macro course for a bank, and they proudly presented me an Excel sheet full with formulas, where they built up something like a book keeping sheet. The sheet was such extremly full with formulas, that they had to switch off auto calculation and startet the calculation manually in the evening in the hope, the calculation is ready on the next morning. I never have seen such an Excel sheet in my life.

At the end of the course, we used the existing sheet as an example just to replace a few of the formulas just by a little bit code. And with every replaced formula you could see their eyes starting more an more sparkling.

After the course, they were not so proud anymore, because they recognized that they invested a lot of time into a solution, which costs them a lot of time, not only to construct it but especially to run it.

Cell based formulas just have the lack, that they interact with the sceen. And screen interaction is slow. Even you see in your own constructions, they can become extremly complex and difficult to debug. As I said, I never would even try to create such complex constuctions as even only to think about how to create them takes more time than my macro is ready and working.
The second point is just a point of permformace. If you construct it in the right way, it is much faster than any formula.
In the case of the bank above, we have measured it and could measure a performance difference (their construction in comparison of optimized vb code) of the factor 1.000. OK, we have faster machines now, but they also run vb code faster.

I just want to make you a bit appetite to try a little bit with macros, you have a macros recorder which can give you the objects you need (not perfomance optimal as it records, what you do on the sheet). And with a little bit help of all the experts here, you will learn how to write really short and efficent code, as I would assume they all have such code in their pockets (or better in their sheets). I'm sure, you eyes will start sparkling too after the first difficulties to get aware, how it works.  

I work with formulas too of course, but as far as I feel I loose the overview, I just write a little bit code to solve my problems. Even if I throw the code away after the job is done.

Now, I leave you back to the other experts, which are more common with the formulas :-)

 

Although the TRANSPOSE function will take a row and make it a column and vice versa, you cab achieve the same result by rejiggering the SEQUENCE functions to generate a column of values versus a row of values. Excel uses the lowest level arrays generated to determine what intermediate and final results look like.

This formula will generate a column of values instead of a row and all I did was swap the first two parameters to the SERIES function:

=LET(Count,ROUNDDOWN((LEN(A1)-LEN(SUBSTITUTE(A1,",","")))/4+1,0),Start,FIND("|",SUBSTITUTE(","&A1,",","|",SEQUENCE(Count,,1,4))),Length,FIND("|",SUBSTITUTE(A1&",,,,",",","|",SEQUENCE(Count,,4,4)))-Start,MID(A1,Start,Length))

SEQUENCE is a powerful tool but it wasn't all that revolutionary - we used to use the ROW(A1:A100) or COLUMN(A1:Z1) technique to generate a sequence of integers. SEQUENCE just made it easier.

ROUNDDOWN is used only to pre-determine how many sections we have. It basically counts the number of commas and divides by four.

We need to do this monkey business because, while Microsoft decided that a text join function would be a nice addition to CONCATENATE and CONCAT, they still don't see enough of a need to add a VB-like Split function.

XML will not do what you want. The source has to be valid XML which a CSV series of strings is not.

Recursive logic is as old as the hills but it is rarely used because it presents to the coder a bit of a conundrum, and applications for it are rare. It's used mostly when there is a recursive nature to the task where the same code can be used over and over to do whatever it is we are doing. But it's easy to create a recursive system that doesn't know when to stop and then we get a stack overflow. Recursive logic isn't obvious just from the language constructs used - it's up to the programmer to add comments to document the behavior. Any language that supports function calls can be used to create recursive logic.

"is it possible for that formula to drop each item in to a cell, so splitting it out from its csv form?"

That's exactly what the formula I provided does.

Kevin
deleted - need to read what you say properly.

Anthony 
Here's my file, see allocations H1 and your formula in h2, each row remains csv

GVEE-16DEVDEV.xlsb

I should probably have uploaded this earlier.

Anthony
To be clear, the formula built for the previous question replaced every forth comma with a semicolon enclosed in double quotes. In this question you asked for the string to be split into smaller strings of four tokens each.

Or at least that is what I thought I read. As I look more closely and consider the previous question (why would you want a semicolon every four tokens?) I am now beginning to realize that what you want to do is spill the CSV values into a two-dimension array, four columns wide and enough rows down to accommodate all tokens. If we are to map the resulting array into cells A1:D? then the first token would be in cell A1, the second in B1, third in C1, fourth in D1, fifth in A2, etc.

Is that a correct understanding of the goal?

Kevin
how on earth to document excel formulas, short of using some text editor and that sounds like using an IDE - even if we could add text on the end these complex bits of what seem to warrant being called code, need more granular notes - and I send half my time trying to figure out where brackets close.

Anthony
Is that a correct understanding of the goal?


yes.

You will see in the example file the original we are replicating is on the left, but may be more simply.

and we are skipping an interim step as well.

You may recognise it.
We are creating sales and purchase ledger controls to the invoice level from a csv column in the "cashbook" (Sheet DATA!) , but I am trying to simplify while understanding because I spent the last week struggling to "debug" my work at the input level.

Anthony
ASKER CERTIFIED SOLUTION
Avatar of zorvek (Kevin Jones)
zorvek (Kevin Jones)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks, I vote for your raise.

Using your previous just as an example:

=LET(
Count , ROUNDDOWN((LEN(H1)-LEN(SUBSTITUTE(H1,",","")))/4+1,0),

Start    ,FIND("|",SUBSTITUTE(","&H1,    ","  , "|",SEQUENCE(Count,,1,4))),
Length,FIND("|",SUBSTITUTE(H1&",,,,", "," ,  “|",SEQUENCE(Count,,4,4)))-Start,

MID(H1,Start,Length))

I See the count working out lengths with and without commas
I see Start finding where to start, itself using what looks like a count inside itself?
and Length determining how many characters to pickup with the same count,
count presumably changes, but why doesn't it start again at the beginning instead of where it left off which (now I am guessing) it must do.
so in there I see some sort of recursion (?) but why does it recurse?

I can see that simply has to loop because one MID on its won't cut it.
How does Excel know to loop? I can't see any counter or anything says "go again"?

Anthony




SUBSTITUTE(text, old_text, new_text, [instance_num])
FIND(find_text, within_text, [start_num])
ROUNDDOWN(number, num_digits)


so that has reduced my formula count on that sheet from 7 to 2. All dynamic except 1.
Thank you to the other contributors.
Yes I appreciate macros (my largest had 12,000 lines of code),but they are no longer for me - except occasionally when building something.

Anthony

for me to find at a later date: my final formula is this in Allocations sheet:

=LET(RowCount,ROUNDDOWN((LEN(TEXTJOIN(",",,DATA!AH11:AH9947))-LEN(SUBSTITUTE(TEXTJOIN(",",,DATA!AH11:AH9947),",","")))/4+1,0),TokenIndex,SEQUENCE(RowCount,4,1),Start,FIND("|",SUBSTITUTE(","&TEXTJOIN(",",,DATA!AH11:AH9947),",","|",TokenIndex)),Length,FIND("|",SUBSTITUTE(TEXTJOIN(",",,DATA!AH11:AH9947)&",,,,",",","|",TokenIndex))-Start,IFERROR(MID(TEXTJOIN(",",,DATA!AH11:AH9947),Start,Length),""))

Open in new window




There is no recursion. More like steps in sequence.

When the Excel calculation engine sees an array being passed to a function's scalar parameter, it calls the function once per array element, and puts the result into a same-dimensioned array.

So think of an array formula as really just multiple calls to the function, one call per element of the array. Each call is separate from all other call and hence, must be calculated on it's own. That's why we are starting from scratch on each call and can't rely on some index counter construct.

Let's take my last formula:

=LET(RowCount,ROUNDDOWN((LEN(A1)-LEN(SUBSTITUTE(A1,",","")))/4+1,0),TokenIndex,SEQUENCE(RowCount,4,1),Start,FIND("|",SUBSTITUTE(","&A1,",","|",TokenIndex)),Length,FIND("|",SUBSTITUTE(A1&",,,,",",","|",TokenIndex))-Start,IFERROR(MID(A1,Start,Length),""))

This determines the number of rows or groups of four tokens:

RowCount,ROUNDDOWN((LEN(A1)-LEN(SUBSTITUTE(A1,",","")))/4+1,0)

This creates a 4 x n array where 4 is the number of columns and n is the number of rows or RowCount. Each element is an integer that starts at 1 and goes to the number of tokens in the string:

TokenIndex,SEQUENCE(RowCount,4,1)

This finds all the token start positions using the indices in TokenIndex. Since TokenIndex is a 4 x n array, the resulting positions are also in a 4 x n array:

Start,FIND("|",SUBSTITUTE(","&A1,",","|",TokenIndex))

This calculates all of the token lengths, again using the same 4 x n array:

Length,FIND("|",SUBSTITUTE(A1&",,,,",",","|",TokenIndex))-Start

And this pulls, using the MID function, all the tokens using the Starts and Lengths - working in the same 4 x n array:

IFERROR(MID(A1,Start,Length),"")

Kevin
Anthony! You gotta help us help you and, while asking whatever granular questions you want, include the overall general problem so we can get outside the box and devise the most practical overall solution.

If all you wanted to do was take a single column of values and map it out into a 4 x n array, this is all you need:

=IFERROR(INDEX(DATA!AH11:AH9947,SEQUENCE(ROWS(DATA!AH11:AH9947)/4+1,4)),"")

Kevin
I'm very happy to describe the overall problem I am trying to solve, but I have always found being too general on EE gets no response at all. Apologies if I have gone too far the other way.

"If all you wanted to do was take a single column of values and map it out into a 4 x n array"

It's not quite as simple as that, maybe not at all as simple. In the example file in the Data sheet you can see the "source column" headed Allocator. Each cell contains a formula that creates a csv string of 4 items. However, this can be any number of sets of 4 items in any one cell. That's what I think is the catch.
 
Here the overall question:
It's the difference between one payment paying one invoice and several payments part paying several invoices. So difference between one-to-one and one-to-many, this last being where the multiple sets of 4 items in one cell arise. That was where I was exploring creating a static array with a formula in the hope that could be read like any static array.

You in particular have been solving this very problem for me for several years, though you may not recall my earlier years' questions, I barely do. They all amount to the same thing, which is creating an audit trail of payments against invoices where the match is not one-to-one and without creating a separate sheet or table or using VBA - and doing it in such a way that data entry is not tortuous either keyboard wise or mentally. Indeed if we had had Lambdas back when you first solved this that may have worked in practice. Maybe I should dig out that early solution.

I tried your above but see below next comment.

Anthony





=IFERROR(INDEX(DATA!AH11:AH9947,SEQUENCE(ROWS(DATA!AH11:AH9947)/4+1,4)),"")

Open in new window


Here's what that does here:

on the left the "correct" answer on the right what the above does, if I had to guess it's the impact of the variable length cell contents in the source?

User generated image
and the source looks like this:
User generated image
where those content rows two and three have two sets of 4 csv items - meaning two invoices were settled with one payment.

Note: my example file is missing the latest version of the formula in AH which is this:
IF(EXACT([@INV],"Invoice"),LAMBDA(INVNO,LEDGER,NL,PAYEE,LET(many,ManyPaid(NL,RAs[[#All],[LOOKUP]]),IF(many,INDEX(RAs[[#All],[CrossRefs]],many,0),Allocator(INVNO,LEDGER,NL,PAYEE))))([@INVNO],[@LEDGER],[@NL],[@Payee]),"")

Open in new window

Difference being the IF(Exact, but this does not change the csv content structure, only which cells are blank.

Anthony
the actual problem I am trying to solve is the one in the Allocator column AH. Or even above that, how much is the remaining balance on each invoice as is calculated in the next column INVBAL which uses the data in AH to work that out. Then Ledger Control sheet lists each invoice and each payment made against it, using data from the Allocations sheet which again gets its data from AH.

AH picks up INV number and payment REFs and amounts pertaining to each and records them as related using 4 items of data, or 8 or 12 etc depending how many payments are made against each invoice. We can also have one payment settling multiple invoices.
IF I can get AH right, the rest should be "merely" reporting on it.

In database terms AH would be a table in its own right.

edit: I am battling with whether AH did work and I somehow broke it, or whether it never worked in certain circumstances that I did not notice By "broke it" I (think I) mean if I have altered the data/variables fed in.
 during testing.

Anthony
"I'm very happy to describe the overall problem I am trying to solve, but I have always found being too general on EE gets no response at all. Apologies if I have gone too far the other way."

Which is why I suggested doing both. We work for free (points and accolades) so we shy away from larger, more nebulous, "projects". But most of us are higher level consultants and appreciate seeing the bigger picture while asked a focused question that we perceive can be answered with five minutes or less of effort. Given the larger perspective, we can then guide you if we choose to, or devise an overall better solution if we have the time and inclination. If we don't have the bigger picture, everyone involved can end up spending more time overall going down little rat holes that probably don't need to be explored and can be avoided given the additional information.

At the end of the day, it's your problem and your question, and our choice to help.

And now, on to the bigger problem....

Kevin
Thank you, I will design my Q's accordingly

Anthony
I think what I might give a try to is posting a "this is my project" sort of idea question, with the file and then link to that as context from my "5 minutes or less" granular questions. To avoid adding loads of text to Questions I know many want to be kept focussed.

I do all the time wonder what you would all think of the approach I am using to whatever overall problem it is, but have feared to tread there. I also don't ask any question trivially, only after having struggled with it and reached my wits' end, literally. Often after days of working on it.

No problem at all doing that, thanks for the steer.

Anthony
Does this solve your problem?

Solution.xlsx

Kevin
ooh... let me see
looks like it does, but I need to try using it for real. Which I will do and get back to you. I see you use a combined key, I have often wondered if that might be the way.

And your DA is nice and short.

Anthony
PG02 and 3 are settled by the payment at BankA row 31 46440
so the only remaining balances are PGG04 and ABCD04

If (big if at this point) I understand this correctly and it can work, I can get rid of three columns of my sheet INVPAY Allocator InvBalance and the RAs sheet, all of which are what are causing all my pain.

Just to be clear, this is not about the one-to-one matches, it is about where an amount is part paid, several times, or as in the above, one payment settles several invoices (just two in the above case).

Anthony
I wrote a file to try to illustrate the problem: Allocations illustrator.xlsx

User generated image
If you want to see if a client is current, then do the list by customer only and not customer and invoice number.

Here is my test workbook with a second method for getting the balance by customer and invoice, and a method for getting the balance by customer:

Solution.xlsx

Takeaways:

1) You made this more complicated than it needed to be. Which is why I said what I said about presenting the problem.

2) I think your accounting system could be improved.

3) When presenting your data, clean it up and make it is as boring as possible. Your data table was riddled with different colors and formats making it more difficult to decipher. The first thing I did was to copy values only to a new sheet.

4) The table you presented was not only populated with intermediate solution formulas, many of the formulas were not compatible with most versions of Excel. You are the only one I know who has access to LAMBDA. When presenting the problem, at least the core problem, try to leave as much of that out as possible unless you are absolutely certain it is a necessary step towards the solution.

Kevin
Balance by customer I don't need as that is available in the basic cashbook and is not the problem.
Balance by invoice does not work in this version either and is the problem.

See if my fresh example is more illustrative? Allocations illustrator.xlsx

Anthony
2) I think your accounting system could be improved.

Views welcome.

Anthony

Further to my comments above, sometimes the solution will end up being something you didn't even know you wanted. In other words, askers sometimes go down their own rabbit hole and the solution they think they want is more about going down some path that doesn't really get them what they really need.

In this case, perhaps all you really need to know is if a specific invoice has been paid in full or not, regardless of how many payments were made against that invoice. In other words, you really don't care if it was paid with one, two, or three payments - rather you just want to know if has been paid in full. Or maybe you just want to know if a customer is paid up regardless of how many invoices there are.

Whatever the case, a good consultant will ask what you are really trying to do and work from that discovery to derive a solution that answers the real question and not an intermediate question such as how can I join this text and split that text.

Kevin
Yes, the last solutions I have provided do not handle multiple invoices paid with a single payment.

Why is that important? What is the root problem you are trying to solve? You said that balance by customer is not needed because that is provided by the accounting system - so why are you concerned about multiple invoices paid by a single payment?

Kevin
In other words, you really ..... care if it was paid with one, two, or three payments


This is exactly what I require,
by invoice against payments and
by payments against invoices.
This is an audit trail.

What I don't really care about is the overall customer and supplier balances as those are easy to acquire. One to one connections are no problem, it is the one-to-many ones that are difficult, in two dimensions. 

Anthony


If you had a magic wand and could ask for exactly what you need from the sample table you posed above, what would it look like?

Kevin
incidentally, if you need different or more typed input data to do this, then that is available to whatever design is needed.



Anthony
Here is the crux of your problem:

PGG02/3PD

That's what the accounts receivable clerk entered into the accounting system when they received a single check covering two invoices.

What you need to do is either create two entries in your ledger - one for each invoice - and allocate the payment accordingly, or find a new accounting system that does not force you to make these kinds of entries.

Another solution (but it's a hack) would be to modify the entry to contain the invoice numbers of all invoices paid separated with a comma or pipe:

PGG02|PGG03

and leave off the "PD" part.

Then we can split that out and work with that but it's still going to be a challenge.

Kevin
I fixed an issue with my solution workbook:

Solution.xlsx

Kevin
This is a sliced pivot table version:
User generated image
The idea is to get the data in to a form upon which it can be reported at will, by Pivot Table is my old preferred and by DA  I am learning. The result must be capable of showing the total such that it can be compared with the overall (ledgers) balance ie total of all customer and supplier balances derived elsewhere, this being the control.

So another version of that is:


User generated image
from which the preceding Pivot Table is produced.

Anthony

ah, yes. One of the underlying challenges of all this is to not split anything over multiple rows.
Funny you spotted my PGG02 and 03 entry, I was highlighting the problem with that while thinking about it.

Another solution (but it's a hack) would be to modify the entry to contain the invoice numbers of all invoices paid separated with a comma or pipe:

That is what the AH Allocations column does, if I understand you.
And that itself, in the case of one-to-many items, is prepared in the RAs sheet, where the process is that the creation of the payment to the supplier is in play, so the creation of that combined reference is part of the creation of the payment to the supplier, so we do it in that screen instead of on a post-it.

In deed, that is a not a hack, it is a recognition and implementation of administrative reality. We are identifying the human process and including it in our system so the knowledge involved at that time is not lost (which usually it is).  As regards incoming payments against sales invoices, it's not much different, we still have to allocate the payment across the related sales invoices, so we "document" that with our remittance advice screen. All of which gets us our allocations data all in one cell for that payment in AH.

Looking at your other input now.

Anthony



or find a new accounting system


I have written this file from scratch to learn Structured Tables, DAs, Lambdas and Lets - while still using my preferred pivot tables.

I will be writing further systems per use case, as I have been doing for years, but this one to many problem has always been the most intractable, though avoidable as you say with multiple line entries or simply ignoring it and having balances by customer/supplier without further depth.

There comes a point where a formal system like Xero et al are needed. However, this is for use where that is not necessary and I require the information out of it in real time - as always. So overlaying these new techniques on old methods is proving interesting.

Anthony
Yes, that's the challenge: handling one payment covering multiple invoices.

If your overall goal is to balance out the books, why do you care if the individual invoices are listed? Tick and tie by customer and then, for those customers without a zero balance, look manually at the ledger.

And, without splitting the payment between invoices when it is entered into the ledger, how do you expect to split it later so that you can do the cross check? You can't do this simply by doing some LAMBDA action and creating keys that span payments and invoices - you have to split the amount between the invoices which no formula magic can do.

Which brings us back to doing the totals by customer versus invoices. Or change you accounting system/practices so that the clerk entering the payment splits it as they see fit and you have one entry per invoice being paid.

Do that and this whole headache goes away.

If you are concerned that the incoming payment will get lost somehow, you can add a check ID to the ledger so that you can aggregate them if needed and recover the amount of the original check. Much easier to do that than to split the amount later to accommodate whatever invoices your think it should cover.

Kevin
I fixed an issue with my solution workbook:
You certainly did and highlighted the missing link in the process. Looks to me like the payment reference would complete the picture where PG2 AND 3 are settled by  PGG02/03 PGG02/03
User generated image 

Anthony
 look manually at the ledger.
Please. The purpose here is to not have to do that.

If your overall goal is to balance out the books,


It's not, that is a given. My books always balance all the time in real time, indeed DAs offerto enhance this.

And, without splitting the payment between invoices when it is entered into the ledger, how do you expect to split it later so that you can do the cross check?...- you have to split the amount between the invoices which no formula magic can do.

See above, we do create the cross reference. Remember "cross refs" a while ago?
That was all about this detail of the problem. This is in the RAs sheet. (Remittance Advices which means exactly what we are describing here, what pays what. US English may have a different  word, but there will be one.)

Do that and this whole headache goes away.
We are doing this because it is hard. I've spent years trying to crack this and the RAs sheet is the compromise. This compromise is based on the recognition that most invoices are indeed one-to-one and only the few get split. edit: that said you are tantalising me with the possibility even the RAs sheet can be avoided.

And by the way, the input clerk is me.

Perhaps you need to know that there is (UK) law around what was paid when by which split specified by who. The dodge solution you are suggesting assumes no split is specified, in which case it is "oldest first". However, where there is a dispute or other reason then specifics become important.
Aside from the legalities of allocations, an audit trail is required and is useful when discussing who paid what when and so to speak "where the money went" beyond a vague "to the customer account". 

Anthony








your latest update above is a hair's breadth away from doing it all as far as I can quickly see.
Pivot table version, but it still does not solve the highlighted bit to show two invoices settled by one payment - because the payment isn't split anywhere such as it is in the RAs table and from there picked up by AH Allocator column.User generated image
Anthony
The problem with your approach (which starts with the entry process) is that, once entered, you can't, through formulas, assign the payment amount to multiple invoices. The process is too complex for a formula. FIFO, for example, involves dates and sorting and.... And what if the payment does not cover a whole number of invoices?

However, if you break the payment up at the time of entry (which is a manual process so can easily follow whatever rules and regulations you want to follow) then the downstream logic to do the cross checking becomes simple. And, since we are now working with more granularity versus less, we can easily aggregate the pertinent ledger entries to determine the original total amount of the payment. This is what we call a more normalized design in the database world. Values such as "PGG02/3PD" are not in any way normalized values and greatly increase the complexity of analysis.

To do this, all you need to do is use another field that identifies the payment itself. Use a unique value like you do with the invoice numbers. Once you have implemented this process, now all audit processes to connect payments to invoices becomes as easy as what we have already done the last few solutions in this thread.

Kevin
edit You may like to skip this to my analysis in my next comment.

There is already a column that does that, gives the payment reference. Those Allocx values are payment references. What used to be cheque numbers which I tend to call CQ01 CQ02 etc, but sadly these are also cell references so I have to use something else.

And what if the payment does not cover a whole number of invoices?

That's called "payment in advance" and is a balance OR is part payment and leaves a balance.
However, if you break the payment up at the time of entry (which is a manual process so can easily follow whatever rules and regulations you want to follow)
Yes exactly, we are on the same page here. The allocations column contains this information which in the case of one to one transactions is taken directly from the Data sheet and for  complex transactions is taken from the manually input RAs sheet, the green section.

Then the invoice balance column works out the balance against each invoice.
Meanwhile the Allocations sheet extracts the transaction history to provide an automated source for reporting the audit trail.
Then there is the Control Accounts sheet which has two sliced pivot tables where any customer or supplier combination can be selected and show the detailed history from both invoices to payments or payments to invoices.

All of this flows from solving the many-to-many problem we are discussing.
Of course not doing it is easy, but then so is not climbing Everest.
 
You may wonder why I am exploring this again now. Because these new features seem to offer possible opportunities not previously available.

I might add I am sure you have solved this in the past and I m looking to see if I can find the Question. It was years ago.

Here is how Brian Redmonb approached it in 2012, so nearly ten years ago.
https://www.experts-exchange.com/questions/27595791/How-to-record-payments-against-invoices-in-Excel-No-VBA-Yes-UDF.html

Back then I was allowing for a vba derived solution.

So, at input time we can choose whatever data we wish to have input, such as invoice number and payment number and even how much of each payment goes against each invoice and how many invoices are paid or part paid by a payment.After that it is all automated and "live" meaning no pivot table refreshing to get to the results, except for the slicer which is by definition a manual process anyway and is the equivalent of where you said "manually check the ledger" sort of idea.

The first time I "solved" this problem was in Lotus 123 in 1989, but back then I stored the payment and invoice references against the related invoice and payment, but did not do any reporting on them. That was left to manual intervention.

edit: Values such as "PGG02/3PD" are not in any way normalized values and greatly increase the complexity of analysis.
Yes like I said before, that unique entry is there only to demonstrate the problem, while I was thinking. There is zero (database) normalisation in this design, that's rather the point of it all, no dodging the hard part.

edit:
           
Pay one invoice with  one payment
Pay multiple invoices with one payment
Pay part of an invoice with one payment
Pay part of multiple invoices with one  payment
Pay multiple invoices and multiple part  invoices with one payment

So it is all about the allocation of one  payment
Thus that defines the input data required to generate my required output edit: these are illustrated in my next post.

It's lovely to have you along on my journey, quite a thrill in fact.

Anthony
 
So, here is an illustration of what is in my head:
Allocations_illustrator2.xlsx
and it contains as follows, on the left:

User generated image
and on the right, following on from the above, cross references derived:
User generated image

Where the final cross references shown in there in Column P look like these:
 
 PGG04,CQ05,-0010000.00;PGG05,CQ05,-0005000.00;PGG06,CQ05,-0008000.00 
 PGG04,CQ06,-0003000.00;PGG05,CQ06,-0003000.00;PGG06,CQ06,-0003000.00 


 PGG04,CQ07,-0001200.00;PGG05,CQ07,-0001300.00;PGG06,CQ07,-0001400.00;PGG07,CQ07,-0014400.00;PGG08,CQ07,-0012000.00 

With that data (I think) we can confidently bi-directionally audit trail invoices to payments and payments to invoices.

There are various methods of delimitation built in to those using textjoin.
1) Comma delimits
2) Semi colon delimits after each item, where an item (token?) is in three parts.
Both easily replaced with something(s) else, as also in the live file.
3) Fixed width, so references are always fixed at x characters and values modified to always be ten characters including the minus sign. There may or may not be a minus sign, where not there is an extra zero.
This is how the cross references are constructed in the RA (Remittance Advice) sheet.

The difference between the three parts in each above and the four parts in my EE Questions is the customer/supplier name is not present in this analysis, it being a luxury for this analysis. In real life it maybe claimed as a human necessity.

Anthony

Happy to post that as another question if you prefer.
Let's try a different angle.

At the end of the day, what are you going to do with the information you are attempting to derive? Is the end of all this going to be:

"Mrs. Jones, you have an outstanding balance of .... When can I expect payment?"

Or:

"Mr. Smith, you have overpaid your account by ... How how would you like to handle this overpayment?"

Or something else?

Kevin
Is your data coming to you as an Excel workbook or as a text file (CSV, TSV)?
Kevin: Something else. Audit trail is I think your answer and there is no escape from this.
I find your resistance interesting though; if there is another way that would be interesting.
For the record the first time I was asked for this information was circa 1984.

aikimark: My data comes to me by manual input, I think is what you are looking for as an answer.
We can make up our data entry to suit our purpose, without adding endless extra rows and columns. The idea is to bury the complexity, not to expose it.

How we derive the necessary cross reference data is the current sticking point, so if we could avoid my current manual method that would be great, though it remains to be seen if this is possible. The RAs (Remittance Advices) sheet creates the multi payment references whereas the simplex ones are simply derived in the main Data sheet column AH.

Here is the Table Field (column) that calculates the individual invoice balance, this being me figuring out how it (this Lambda) works. This is the INVBAL column in my Structured Table.:

User generated image
Anthony
The reason I am pushing back is that I have yet to discern a real reason for doing what you are trying to do. An audit trail is something that is self determining and, is probably hard coded. What you are doing is soft and dependent on the whim of the day. That will not stand up in court nor will the tax auditor appreciate it.

If you were to say something like: "the rules are set and the process is clear." Then the next question is "well then why do you not create multiple entries for the single payment?" Doing so locks in the allocation - no fuzzy logic, no LAMBDAs, no monkey business.

You say you want an audit trail, so create one at the time of entry. If a payment exceeds the amount of one invoice, make two entries - one for the oldest invoice with the amount equal to the remaining balance on that invoice, and the rest allocated to the next invoice. Set the Allocx to the same value for both entries.

Now, if you were to say: "Kevin! Now I've lost the entry for that payment!" I would respond that, since we are dealing with a normalized table now, "we can discern that payment by summing the allocated amounts over the Allocx value."

You see, it's easier and possible to go from granular to less granular than it is to go from less granular to more granular. It's virtually impossible to go from less granular to more granular without specific rules and complex algorithms.

And, again, you still haven't answered my question: what question are you trying to answer? What problem are you trying to solve? If this is an exercise in how to use LAMBDA and other cool Excel tricks to solve a complex problem then I'm all in. But I don't think you are really in that space - I think you are trying to answer a question and I have yet to understand what that question is.

You're not interested in what the customer's balance is.

If you say "I am trying to tie all the payments to invoices" then I would ask, again, why? What are you going to do with that information? Is an auditor asking for this information? A judge? How does knowing how a payment is split between two or more invoices going to help you with your job/customer relations/the tax accountant/the judge? Does this have something to do with how income is distributed across tax reporting years? Accruing interest on outstanding customer balances?

And a new question: why can't we make two entries for one payment instead of one entry other than you would have to go back and re-enter some payments?

Kevin

Does this have something to do with how income is distributed across tax reporting years? Accruing interest on outstanding customer balances?


No.

How does knowing how a payment is split between two or more invoices going to help you with your job/customer relations/the tax accountant/the judge?
I already answered that. Remittance Advices law.
If this is an exercise in how to use LAMBDA and other cool Excel tricks to solve a complex problem then I'm all in.
Indeed verily it is.

But I don't think you are really in that space -

I am.

I think you are trying to answer a question


I'm not.
Other than how to do the impossible. I like doing the impossible, it just takes a little longer, a couple of decades so far.

and I have yet to understand what that question is.

I guess because there isn't one, at least in terms of my motivation.
I AM the tax auditor by the way.

 How does knowing how a payment is split between two or more invoices going to help you with your job/customer relations
It is a question of precision and confidence in the figures, in your above context and not looking like an idiot for not knowing. Accountants are expected to know these things, other are not expected to know or care. Like I said I was first asked for this level of detail in 1984, during the audit of a global corporate - I was just the (qualified) operative back then. Never forgot how hard it was to acquire the data. Come to think of it also earlier than that, in around 1980. No computers then.

And a new question: why can't we make two entries for one payment instead of one entry other than you would have to go back and re-enter some payments?
Because we are allowing for multiple, even unlimited payments. As illustrated in my above illustration.
Plus the fact I absolutely do not want to go backwards.

Just because you can't see it doesn't mean I don't want it, you know.
I've been chasing this rabbit for a very long time.

There is I suppose a list of reasons why one might wish to see the transaction audit trail. Sure hard wired in formal systems, usually multi user which is pretty much why audit trails exist, this isn't a formal system and likely only one user. However, it is an audit trail in generic terms, it allows one to audit the history of both the transaction and the personal account.
It also allows one to see how much of what was paid when and often also why, instead of guessing which is what you are doing if you don't have this.
 Throw sliced Pivot Table reporting in to the mix and it's wonderful, all point and click reporting on what is (was!) impossible. Point being we can't point and click dynamic arrays.

Remember I do have a solution that works, but as you have roundly observed it is replete with intermediary formulas and even sheets. I have always had solutions that work, but never anything that I now think may be possible with these new features we have available. Trouble is I have first had to try to learn said features and I am still doing that, all of which is wonderfully distracting from world affairs.

It's fun! :-)

Anthony


One of us is going to wear the other down. I'm beginning to think that I'm the one who is going to give in.

But, before I do, a few more questions.

If this is because of tax accounting law, then, I implore you to do the allocation at data entry time. Not only will this be 100% acceptable to any forensic accountant, this whole decades long headache/rabbit chase goes away-instantly. If you don't want to go back simply because it will take too much time, then how much time have you spent thus far chasing this elusive little bunny? And how much time is still in the queue to be spent looking for the little rascal?

Regarding the solution that works, perhaps if I can see it I might be able to look for a way to transform it into a more dynamic array structure.

The key to this mess seems to be - in your mind anyway - this "column P" LAMBDA business. Can you describe in simple terms what you are creating here? I can't follow your LAMBDA post above.

And why not VBA? I could bang out a VBA solution in very short order given my current understanding of the problem. It's basically a process of, using FIFO, assigning payments to invoices and vice versa. What is it about VBA that has you so spooked? This is clearly a problem that is just a bit too complex for Excel worksheet formulas but a walk in the park with VBA.

Kevin
One of us is going to wear the other down. I'm beginning to think that I'm the one who is going to give in.

But, before I do, a few more questions.

If this is because of tax accounting law, then, I implore you to do the allocation at data entry time. Not only will this be 100% acceptable to any forensic accountant, this whole decades long headache/rabbit chase goes away-instantly. If you don't want to go back simply because it will take too much time, then how much time have you spent thus far chasing this elusive little bunny? And how much time is still in the queue to be spent looking for the little rascal?

Regarding the solution that works, perhaps if I can see it I might be able to look for a way to transform it into a more dynamic array structure.

The key to this mess seems to be - in your mind anyway - this "column P" LAMBDA business. Can you describe in simple terms what you are creating here? I can't follow your LAMBDA post above.

And why not VBA? I could bang out a VBA solution in very short order given my current understanding of the problem. It's basically a process of, using FIFO, assigning payments to invoices and vice versa. What is it about VBA that has you so spooked? This is clearly a problem that is just a bit too complex for Excel worksheet formulas but a walk in the park with VBA.

Kevin
there's an echo in here....
Explain column P in a way I can understand? I can't follow your LAMDA post.

Why not VBA? You haven't answered that question.

A new question: would a single customer solution work for you? That could probably be wrangled without VBA.

Kevin
Explain column P in a way I can understand? I can't follow your LAMDA post.

Why not VBA? You haven't answered that question.

A new question: would a single customer solution work for you? That could probably be wrangled without VBA.

Kevin
ok I'll work through that bit by bit:

One of us is going to wear the other down. I'm beginning to think that I'm the one who is going to give in.

But, before I do, a few more questions.

If this is because of tax accounting law,

No it's not. Merely that it is relevant.

then, I implore you to do the allocation at data entry time.

I do, it's in the RAs (Remittance Advices) sheet.

Not only will this be 100% acceptable to any forensic accountant, this whole decades long headache/rabbit chase goes away-instantly.
I'd be lost without my rabbit. And besides I know him so well. 

If you don't want to go back simply because it will take too much time,

That's the second time you've said something like that. To be clear,I can re-write the entire file in a couple of hours, it's no problem at all.

then how much time have you spent thus far chasing this elusive little bunny? And how much time is still in the queue to be spent looking for the little rascal?
Well, I suppose my life expectancy is something like another 20 or if lucky 30 years.
Unless Microsoft relent and let us read formulaically constructed static arrays.
Then I have this solved in a ten seconds. Which by the way and I still haven't found it, is what I am sure you did for me several years ago. Maybe more than several, in here. 

Regarding the solution that works, perhaps if I can see it I might be able to look for a way to transform it into a more dynamic array structure.
You already have it, except it might be a bit broken cause of all this messing about. I'll send you a live one.

The key to this mess seems to be - in your mind anyway - this "column P" LAMBDA business.

Column P? Erm... Lost. Not to worry, next question makes this irrelevant.

Can you describe in simple terms what you are creating here? I can't follow your LAMBDA post above.
Simple terms? No.
Sure wasn't expecting you to follow my Lambda. Ha! You should try following yours!

I could bang out a VBA solution in very short order given my current understanding of the problem.

Yes I well know you could. BUT this rabbit is ALL ABOUT using spreadsheet formulas and not programming. Remember I wrote the original of this in 1988 using LDL (Lotus Development Language) a.k.a. their macro language.

It's basically a process of, using FIFO, assigning payments to invoices and vice versa.

No it's not. FIFO is in the absence of any other evidence, the last resort if you like.
In fact none of the allocations in my example are FIFO, they are all "as advised by the customer" or "determined by us": what that means is payments matched to invoices explicitly, not by FIFO default.

What is it about VBA that has you so spooked? This is clearly a problem that is just a bit too complex for Excel worksheet formulas but a walk in the park with VBA.
And why not VBA?
Spooked? I'm not spooked, I just know what I want and how I want it.
I will use VBA snippets as occasional tools, but not for spreadsheet operations.
It's not too complex for Excel.
Why? Well. The trick is to realise that while we are automating everything, in real life use, the process of doing the bookkeeping includes various administrative tasks one of which is the process of preparing payments to suppliers, which even using formal accounting systems is a manual process unless defaulting everything to FIFO, which is bad practice if not unusual. Supplier statements need to be reconciled and likewise all this with customers, if slightly differently.
All this is what you unwittingly refer to as:
then, I implore you to do the allocation at data entry time."
So we are on the same page in this. Even though it may not feel like it.
The Remittance Advices (RAs) sheet replicates the manual process and even makes it easier as so little needs to be entered. A future round of evolution might even automate it a bit, but that's not for now as it's trivial to do.

Use case - but not why I am doing this:
I'd be shot if I allocate a customer payment to the wrong invoice, by the way. Credit control being crucial - of course getting the confirmation of what is being settled can be a pain in the usual.

I'll send you that file.
The payment allocations system works in this file.
May need explanation, let me know.

Anthony
you are double posting?
"I do, it's in the RAs (Remittance Advices) sheet."

Wait...what?!? You mean you already are recording multiple entries for payments that span more than one invoice? Where and why have you not provided that information? That would make this task so much easier! But the RAs sheet isn't it. The transactions need to be shown in the main list. Are they hidden somewhere?

Beyond that, I'm pooped. I need a break. I'm trying to figure this rabbit out while flying in a tiny little seat from Chicago to San Francisco.

We need to do another online meeting.

Kevin
"I do, it's in the RAs (Remittance Advices) sheet."

Wait...what?!? You mean you already are recording multiple entries for payments that span more than one invoice? Where and why have you not provided that information? That would make this task so much easier! But the RAs sheet isn't it. The transactions need to be shown in the main list. Are they hidden somewhere?

Beyond that, I'm pooped. I need a break. I'm trying to figure this rabbit out while flying in a tiny little seat from Chicago to San Francisco.

We need to do another online meeting.

Kevin
I have tidied up the presentation a bit so please no not go creating fresh sheets with no formulas in them, assuming you want to see how this works and thus what I am trying to improve.

Make a copy and to simplify things in that file, delete all sheets to the left of DATA. Delete last sheet on right Cashflow.

That leaves only 4 sheets, of which only two are directly relevant:


DATA The input sheet
In here the only relevant columns are:

INVNO (The Invoice Ref)
PAYEE
NL (The Payment Ref)
LEDGER The amount of INV or Payment.

RAs the sheet that generates the magical complex references, automatically feed back into DATA in col AH Allocations.



In RAs and only for a complex settlement, using only cols A B & C
enter
in
A INVREF The invoice number form INVNO in Data sheet.
B SETTLED The amount of the invoice being settled
C CQNO The Payment reference from the NL column in Data sheet.
The rest is automatic… picked up in Data AH



If entering a fresh transaction one would enter similar to existing entries, I usually copy a similar one and edit to taste.

For the allocations to work the basic necessities are INVNO PAYEE and NL require entries, plus if it is a simplex allocation, (one pays one) put the NL payment ref also in the INVNO Column, that makes the one-to-one link.

And of course an invoice goes directly in to the LEDGER COLUMN and a payment goes in the bank column AND the ledger column, sign reversed, like existing examples (debits and credits in action). If Debits and credits are not your strong point just copy a previous entry like I do.

I am unable to see any way around having the RAs sheet and that complex cross ref construction, which seems to be where you are also.

That's it.


That's it, but also remaining are, and which flow from the above magic:
Allocations Contains your last formula solution which extracts all the allocations.
ControlAccs Which reports on Allocations - and is the equivalent of the traditional Sales and Purchase Ledger Control accounts.
Where and why have you not provided that information?

I have over and over, but it IS the RAs. So Yes as you say an online session seems needed.
Clear up a few things.
At least the file I just sent you is a working version.

Sounds like you are flying on a jump seat? Or a small plane? Anyway, enjoy.
My last rabbit was all chocolate, didn't last very long.

Anthony
trying to figure this rabbit out

Only Invoice Number, Payment Number, Amount settled are needed.
Your last contribution massively simplified the Allocations sheet.
 

Easier said than done as we are proving, though I wonder how much is down to our communication success or otherwise in this big picture scenario. As you observed previously, it's not a five minute thing when stepping outside the focussed granular question, so thanks again for your interest.

Anthony
Been doing this all night, updated file sent, now 5.50am so calling it a night, though outside is daylight.
Anthony