Ray Erden asked:

Text File Manipulation and Excel Power Pivot

I use text files as source data to feed Access tables because of the 2 GB size limit on MS Access databases.  I currently work with two text files that are merged daily: a 2016-2017.txt historical file merged with a daily-updated 2018.txt file.  The merged file is 1.24 GB and I can work with it without any problems.

Lately I decided to use a bigger text file for a larger historical analysis, going back to 2012.  I compiled text files into one 2012-2017.txt file using Windows commands at the C:\ prompt and appended my daily-updated 2018.txt file to this history, so I could work with one historical data set going all the way back to 2012.  This is the same method I already use for my daily updates, without any issues.  The expanded file is 3.45 GB, and when opened in Excel Power Pivot it displays 90 fields. My goal, after loading the data model in Excel Power Pivot, is to create various pivot tables for historical data analysis.

The problem is that when I merge the daily-refreshed 2018.txt file into the historical file, connect to the resulting merged file via Excel Power Pivot (Access fails to open this file due to its size, but Power Pivot works, as does Power BI), and insert a pivot table, the total sales dollars for every year from 2012 to 2017 display correctly but 2018 is incomplete by a huge margin.  I tried loading and re-loading several times in different Excel files, and I also tried merging the history (2012-2017.txt) into the current year (2018.txt); either way, the merged text file shows up incomplete in Excel Power Pivot.

I know that text files have no size limitation beyond the size of the HDD, so I just can't find any reason why this happens.  I created my text files with a comma delimiter, and I load them into the Excel Power Pivot data model the same way.

Does anyone have any idea or experience with a problem similar to what I have just described here?  

Thanks in advance
Tags: Windows Batch, Microsoft Excel, Text Editor, Microsoft Office, Power BI

Michal Ziemba

Have you tried opening the files separately before merging them?
Did you succeed in opening 2018.txt and 2012-2017.txt in a pivot table without any problems?
If yes, then I would focus on the merge process and look at the place in the file where the two files are joined. There might be some additional lines or characters there that prevent it from being read properly.

This would be my first thought.
Ray Erden

The 2018.txt file loads just fine and shows good results after being merged with the 2016-2017.txt file, so 2018.txt has no issues by itself, and by the same token neither does 2016-2017.txt.

When it comes to the 2012-2017.txt file, data for those years displays just fine as well, so the question is why merging the 2012-2017.txt file with the 2018.txt file causes a problem.  In case size was the culprit, I tried it with a shorter history, 2014-2017.txt merged with 2018.txt, and got the same problem results. Going the other way and merging the 2012-2015 history into 2016-2018.txt also produced the same problem.

Below are the commands I use to merge files daily without any issues.  The merged files have exactly the same number of fields, field definitions, and data types.  This is just one example of my file-merging process on the command line.

type zz_OCEAN_2015.txt >> newfile_Extended.txt
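For what it's worth, one hazard of appending text files this way is that if the destination file's last row has no trailing newline, it gets glued to the appended file's first row; and any UTF-8 byte order mark at the start of the appended file ends up in the middle of the merged data. A minimal Python sketch of an append that guards against both (the file names in use are hypothetical):

```python
import os

# UTF-8 byte order mark, sometimes written by Windows tools at the
# start of a text file
BOM = b"\xef\xbb\xbf"

def safe_append(dest_path, src_path):
    """Append src to dest like `type src >> dest`, but guarantee a row
    boundary between the files and keep any BOM out of the middle."""
    with open(src_path, "rb") as src:
        data = src.read()
    if data.startswith(BOM):
        data = data[len(BOM):]          # strip the BOM so it can't land mid-file
    needs_newline = False
    if os.path.getsize(dest_path) > 0:
        with open(dest_path, "rb") as check:
            check.seek(-1, os.SEEK_END)
            needs_newline = check.read(1) != b"\n"
    with open(dest_path, "ab") as dest:
        if needs_newline:
            dest.write(b"\r\n")         # force a row boundary between the files
        dest.write(data)
```

Either defect corrupts rows exactly at the file boundaries, which can silently skew a year's totals after a merge even though each source file imports cleanly on its own.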
Michal Ziemba

It would be much easier for me to diagnose it if I could get the files and process them the same way as you do.
Is it possible that you can send them to me using a private message?
Ray Erden

I don't think I can send them, as the files are truly huge (around 5 million rows of data) and, of course, confidential company data.  I will keep at it for a while and might get back to you depending on my progress, or lack thereof.  What puzzles me is that every morning I add the 2018 YTD data to the 2016-2017 historical data in text format using the same batch process without any issues.  Yet when I add 2018 to 2012-2017, all data shows when opened in Excel except 2018.  Same process, same format, etc.
Ray Erden

Hello Michal, after getting back to the problem posted in this question I noticed the situation below.

I merged the 2018 text file with the 2012-2017 history text file again.  After the merge I connected Excel Power Pivot to the newly merged file.  In Excel's data model screen I checked the number of rows, and it exactly matches the number of rows in the Access table for the 2018 data.  From this I conclude that the text file merge worked okay.

So that leaves me one option: a potential issue in the creation of the pivot table based on Power Pivot's data model in Excel.  It either shows very little data for 2018 while all the other years' totals are correct, or a sum for 2018 that is blown out of proportion by a huge margin.

Excel Power Pivot does not have this issue when I merge the same 2018 text file into the 2016-2017 history text file, yet with a larger history it acts up like this.  Are there any known issues with Excel Power Pivot data models?  The total number of rows exceeds 5 million for this operation.

Any thoughts on that?

Thank you again
Ray Erden

Just a quick follow-up here: I merged the 2018 text file with the 2014-2017 history text file, linked to it via Excel Power Pivot, and obtained the same problem results a few minutes ago. I will try adding the 2015-2017 history to see how it behaves.  For the life of me I can't put a finger on what causes this strange situation when I add 2018 to a history longer than two years.  I am totally dumbfounded on this one.

Just because there are the same number of columns doesn't mean that the data aligns.  There are several things you will need to check.  For instance:

* are the columns in the same order?
* are there header rows?
* are the headers the same?
* are the column data types the same?
* are the rows delimited the same?
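The header portion of that checklist is easy to automate. A Python sketch (the file paths passed in are hypothetical) that reads only the first line of each source file and confirms every header row agrees before any merge:

```python
def headers_match(paths, delimiter=","):
    """Return (True, None) if every file's header row is identical,
    else (False, path_of_first_disagreeing_file)."""
    reference = None
    for path in paths:
        # utf-8-sig transparently drops a leading BOM if one is present
        with open(path, "r", encoding="utf-8-sig") as f:
            header = f.readline().rstrip("\r\n").split(delimiter)
        if reference is None:
            reference = header          # first file sets the expectation
        elif header != reference:
            return False, path
    return True, None
```

This checks column names and order in one pass; data types and row delimiters would still need a separate look.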
Ray Erden

Yes for all of the above bullets.

Each text file, regardless of the time frame, has exactly the same setup in terms of number of columns, column sequence, header rows, data type for each field, and the way rows are delimited.  Each text file was generated from Access tables with identical table structure; the only difference among them is the date range.  That is why I commented on the number of records as a quick measure.

After the file merge is done, data from the resulting text file is pulled into the Excel data model for Power Pivot, a pivot table is built, and I get the 2016, 2017, and 2018 values accurately.  This is done by merging the 2018 text file with the 2016-2017 file, a routine used every morning without an issue.

However, I also have 2012-2017 and 2014-2017 historical text files, and when I merge them with the 2018 file in the same fashion using the same batch process, this is where I hit problems.

The implication of this issue is that if I want to pull in longer historical periods, I will not be able to get reliable results in my dashboards.

Please take a look at the sample file to support the illustration.
Ray Erden

aikimark - have you had a chance to review the attached file to have any thoughts?
Ray Erden

Interesting! It's been 18 days since I posted this question on this forum, along with using the alert function twice, yet it obviously has not stimulated much interest around here.  If one reads the question, I think one would see that it is indeed an interesting one.

What OS and Excel versions are you using?  I wonder if there's a 32-bit limit somewhere that we're overlooking.

Have you tried bringing in the 2018 data with a combined size less than 2 GB?

The way you described the combining of text files has me concerned that you have headers in the middle of your data that might be messing up the import engine in Power Query.
Ray Erden

OS - Windows 8.1 64 bit
Excel - 32 bit

I bring in and merge the 2018 data, in text file format, with another 987 MB text file (the 2016-2017 data) on a daily basis without any issues. I have 2014-2017 and 2012-2017 historical files in exactly the same format as the 2016-2017 file.

So as you can see, 2018 merges successfully with the 2016-2017 file, yet it fails when I merge it with the other two bigger historical files.

You mentioned that there might be headers in the middle of the data that could be interfering with the import process into Excel Power Query.
This is not possible, the reason being that, as I stated above, the format of the larger historical files and the 2018 data is exactly the same.  If that were the case, 2018 would not merge with the 2016-2017 data either.  I am having this problem when I try to merge that file with the larger text files, namely the 2012-2017 data at 3.96 GB and the 2014-2017 text file at 1.95 GB.

My predicament is that Excel Power Query is perfectly able to handle such large data sets; for example, I can connect to the 1.95 GB or 3.96 GB text files without any issues and work with them, same as I did with MS Power BI.  But after adding the 2018 file I get the results illustrated in the file attached to this posting.  Clearly the 2018 file itself, which is updated as YTD data, has no formatting issues, as I am perfectly able to merge it with the 987 MB 2016-2017 data.

Also, I might add that the text file merging process uses exactly the same batch script I use daily without any issues.  To summarize the problem again: if I append the 2018 text file to the bigger historical data, the displayed 2018 values are way off; if I append the bigger historical data to the 2018 data, the displayed historical values are way off.

Keep in mind that I built those historical files in text format using the same batch merge process, without any issues and with the correct values.

So my frustration now is that if I decide to expand the historical analysis period to include the current YTD (2018) data, I will get completely wrong values for the current year, which feeds reports such as year-over-year comparisons.
Ray Erden

To simplify and clarify my long-standing question and make it easier to read, understand, and work with, I will try the illustration below.

Text Files in the Process:

File1 – 2018 (321 MB)
File2 – 2016-2017 (987 MB)
File3 – 2012-2017 (3.96 GB)
File4 – 2014-2017 (1.95 GB)

The Program Flow Description:

The resulting merged file (File1+File2=newfile) is linked to an Access table.  Queries in Access use this table and support the Excel Power Pivot data model.  Data is refreshed daily and the pivot tables in Excel are updated accordingly.

Working Process:

On a daily basis, using a batch script, I merge File1 and File2 without any issues.  The process takes File1 and appends it to File2 to obtain the 2016 and 2017 historical values combined with the 2018 YTD values.  The data is used by Excel Power Pivot to feed pivot tables. When I check the values each morning after the same append process runs the same script, the years 2016, 2017, and 2018 are all correct and complete.
The field titles, file format, and delimiter type are all the same across the listed files.

Experimental Process:

Currently the working process feeds BI reports covering a 3-year analysis period, and I have been planning to expand the historical time frame by adding a year or two.  So I tried to add File1 to File3 in the same fashion as I do every day with File1 and File2.  The result was that the years from 2012 to 2017 were all good but 2018 was totally off.
I then tried the same routine with a smaller file, File4, but obtained the same problem results as described above.  Next I added from the other direction: I modified my script and appended the history file to the current year's file, adding File3 to File1 and File4 to File1 in separate experiments.  This time the current year totals displayed correctly but the historical values were way off.
I provided an illustration of the resulting values, with explanations for each result, in the Excel file attached to this thread.

Problem Impact:

The issue is that I will not be able to provide a larger date frame for comparative year-over-year analysis unless I can merge the current year with the larger history in the text file format that feeds the process.  My daily dashboard reporting is based on Excel Power Query working with Access and with text files storing the data, so text file manipulation is an essential and integral part and this must be resolved.


Sorry I haven't seen this sooner.  I've had to do something very similar, but with MySQL and another BI tool; the principle is the same.

What concerned me is that you're using "type".  I've had a lot of success just using "copy" on the command line, which doesn't involve needing to "read" the files each time.

copy 2012.txt+2013.txt+2014.txt+2015.txt+2016.txt+2017.txt+2018.txt sales.txt

The other thing I had issues with was header rows being in each of the files, which I ended up removing from each archive file. I kept a separate header file so that my merge went something like this:

copy headers.txt+2017.txt+2018.txt last_year.txt

Reading through your question: if you haven't already, go for the systematic approach.  Test each year's file on its own, then merge with the subsequent year until you hit the issue.

i.e., you may have already done this, but I just have to ask:
  • 2018.txt+2017.txt - issue?
  • 2018.txt+2017.txt+2016.txt - issue?
  • 2018.txt+2017.txt+2016.txt+2015.txt - issue?
  • 2018.txt+2017.txt+2016.txt+2015.txt+2014.txt - issue?
  • 2018.txt+2017.txt+2016.txt+2015.txt+2014.txt+2013.txt - issue?
  • 2018.txt+2017.txt+2016.txt+2015.txt+2014.txt+2013.txt+2012.txt - issue?
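Alongside each step, a quick row-count reconciliation would tell you whether the merge itself dropped or glued rows: the merged file's non-blank line count should equal the sum of its parts (allowing for any repeated header rows). A sketch in Python, with hypothetical file names:

```python
def count_rows(path):
    """Count non-blank lines without loading the whole file into memory."""
    with open(path, "rb") as f:
        return sum(1 for line in f if line.strip())

# Usage sketch: if count_rows("merged.txt") comes up short of
# count_rows("2017.txt") + count_rows("2018.txt"), at least one pair of
# rows was glued together at a file boundary.
```

A shortfall of exactly one row per appended file is the classic signature of missing trailing newlines.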
Ray Erden

First off, thank you for looking at my question, which has been sitting here since Sept 21st with three broad alerts sent out on its status.

Based on your recommendations I have taken the steps below.

Instead of using the "type" command I used the "copy" command per the script below:
copy history2014_2017.txt newfile1.txt
copy newfile1.txt+OCEANXX.txt (OCEANXX.txt stores the year 2018 data)

After recreating the new text file this way, I connected to the resulting file via Excel Power Pivot and created a new pivot table.
What I noticed was that the years 2014-2017 showed the correct values, yet the year 2018 again displayed entirely wrong values, the same as in my previous trials.

On the other hand, I used the same process to add the 2016-2017 history to the 2018 YTD data and got the correct values, as I do each morning, so no issues with that operation.

Upon obtaining the incorrect results described above, I moved on to your second suggestion about headers: I created a text file containing only the headers (there are 91 field names) and removed the headers from each file in the process.  After I was done with the headers in each text file, I applied the script below:

copy OCEAN_HEADERS.txt+OCEAN_2018_NoHeaders.txt+history_2016_2017_NoHeaders.txt+zz_OCEAN_2015_NoHeaders.txt+zz_OCEAN_2014_NoHeaders.txt+zz_OCEAN_2013_NoHeaders.txt+zz_OCEAN_2012_NoHeaders.txt

For the new text files without headers in this trial I used file names with "NoHeaders" so I could differentiate between the files.  This time I obtained the correct values for the year 2018 but wrong totals for the rest of the years in the analysis period.
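One direct way to rule stray headers in or out would be to scan the merged file for header rows sitting anywhere below line 1. A Python sketch; the header prefix passed in is a hypothetical example (in practice it would be the first field name of the 91-column header):

```python
def find_stray_headers(path, header_prefix):
    """Return the 1-based line numbers of header rows found below line 1."""
    hits = []
    # utf-8-sig drops a leading BOM; a BOM mid-file would still show up
    # as a line that fails to match the prefix
    with open(path, "r", encoding="utf-8-sig") as f:
        for lineno, line in enumerate(f, start=1):
            if lineno > 1 and line.startswith(header_prefix):
                hits.append(lineno)
    return hits
```

An empty result would confirm the header theory doesn't explain the bad totals; any hits would pinpoint exactly which file boundary leaked its header.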

Also, referring to the incremental check illustrated at the bottom of your post, trying each step until hitting the error, this is what I can summarize:
the process is good with your first and second lines, so no issues there. But adding 2018 to the 2014-2017 and/or 2012-2017 history has issues.
Ray Erden

Another trial: this time I added file by file to the blank file storing headers only, using the type command.

Added 2018 (with no headers) to the headers-only file and got correct 2018 results.
Added 2016-2017 (with no headers) to the new file; all three years' values are correct.
Added 2015 (with no headers) to the new file, and the 2015 values came out wrong while the others are still correct.

So frustrating, as I just can't find any reason it goes wrong once I exceed three years of combined records.  What is even more frustrating is that I do have the 2012-2017 (6-year history) file; obviously I was able to create it correctly while keeping headers in each file and using the type command.

Thanks Ray, that's really helpful information, as it tells me it *could* be a limitation in Power Pivot related to the number of lines in the resulting file.  At least it's worth ruling that one out.

For the combined 2015-2018 file that has the issue, how many lines in total are in the file?
The 2012-2017 file you say is correct, yes? How many lines are in that?

If possible, would you be able to provide me with some sample files to work with? Is it easy enough to obfuscate the values?
We can use a Dropbox if the files are too large.
Ray Erden

Number of records by file (after the merge script runs between the historical files and 2018.txt):
2016-2018.txt - Over 1,700,000 records  - 2016-2017 file merges with 2018 daily without any issues
2014-2018.txt - Over 3,500,000 records - problem file merge
2012-2018.txt - Over 5,000,000 records - problem file merge
2018.txt - About 430,000 records

File Size:
2018.txt -  321 MB
2016-2017.txt -  987 MB
2012-2017.txt -  3.96 GB
2014-2017.txt -  1.95 GB

Each data set has the identical 91 columns

The attached file displays the resulting values obtained after connecting Excel Power Pivot to the combined files in the various file-combination trials.
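Given the identical 91-column layout, one more check worth running on each problem merge is a per-row field count: a pair of glued rows shows up as a single row with roughly twice the expected number of fields. A Python sketch using the standard csv module so quoted commas are handled correctly:

```python
import csv

def bad_rows(path):
    """Return the 1-based line numbers of rows whose field count
    differs from the header row's."""
    bad = []
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.reader(f)
        expected = len(next(reader))          # the header defines the width
        for row in reader:
            if row and len(row) != expected:
                bad.append(reader.line_num)   # line of the malformed row
    return bad
```

For 5-million-row files this streams rather than loading everything, so it should stay within memory even where Access and 32-bit Excel struggle.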
Ray Erden

Rob - Message sent to your inbox...
Ray Erden


The file size issue makes full sense to me, with one comment to add: for the particular trials described in this thread I have not been using Access.  Due to the 2 GB size limit on Access files, I store data in text files, and for my daily operation an Access table links to the 2016-2018 text file; my goal was linking Access to a longer history stored in text format.

As far as this question is concerned, Access has been out of the loop; nevertheless, I fully agree with your statement about taking Access out and bringing more powerful options into the equation.

According to Microsoft's papers, Excel Power Pivot should be fully capable of handling insanely large data sets without any issue, with the only limitations being hardware and memory.  Because of that claim from Microsoft, I was getting frustrated with this failure after each attempt to merge text files into one file with a longer history for analysis, where a user could simply refresh pivot tables to support his or her analysis.

At this point I consider your verdict the final determination of the root cause of this problem, and I am pretty much wrapping up my work.

Data size will grow much more with future demands and analysis needs, so to handle that well I already have SQL Server 2017 and SQL Server Management Studio installed on my machine; due to some initial glitches it is not operational at present, so I will try MySQL as an interim solution.

I certainly do appreciate all your help throughout the process.
Ray Erden

I would like to thank all the experts who participated in this post.  In particular, I would like to extend my sincere thanks to Rob, who stayed with me throughout the process and went above and beyond.

I agree it's frustrating! Especially when you're told it should handle it. I suspect the file parsing/import is the issue, more so than the large data set.

And after sleeping on it I realized my mistake: Access had been out of the equation from the beginning, LOL.

I'd still like to try importing into MySQL to confirm there's no corruption in the large merged file, but that said, it's easy enough to load each year rather than merging years into text files :)
Ray Erden

MySQL is in the process of being installed on my machine for an initial experiment, although the long-term goal is SQL Server. Once I am done with the organizational security formalities to get clearance for MySQL, I will start loading data year by year so I can work by connecting to a single table holding all years of data. Excel Power Pivot will connect to MySQL as the front end available to users.  I believe Excel Power Query will handle it just fine this way, however big the data gets; if this proves to be a viable option, which I believe it will, I will stop working through text files.  This change is essential for me to provide a long-term, reliable BI platform to the user community within the organization, and the issue we worked on here has been pivotal.  Thank you again for your insight.