Text file comparison using Spark DataFrames

Hi, I would like to compare any two text files, which may be fairly large, using Spark DataFrames. A sample requirement is shared below.

Ideally, using PySpark, the job should take the first record from file1, search all of file2 for it, and write every matching occurrence to an output file with the appropriate flag (S = same, U = update, I = insert, D = delete). Every other record in file1 should be handled the same way.
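The flag rule described above can be pinned down on a small scale before involving Spark. The following is a minimal plain-Python sketch, not a definitive implementation. It assumes that the first field of each record is a unique key, that "update" means the key exists in both files but some other field differs (file2's version is kept), that "insert" means the key exists only in file2, and that "delete" means it exists only in file1. These meanings are a reading of the sample data, and the helper names `parse` and `flag_records` are hypothetical.

```python
def parse(lines):
    """Map the first field (assumed to be a unique key) to the tuple of all fields."""
    records = {}
    for line in lines:
        fields = tuple(line.split())  # split on any run of whitespace
        if fields:                    # skip blank lines
            records[fields[0]] = fields
    return records


def flag_records(old, new):
    """Return (record, flag) pairs for two {key: record} mappings.

    Flags (assumed meanings): S = same, U = update, I = insert, D = delete.
    """
    out = []
    # Walk file2 first: each key is either new, unchanged, or changed.
    for key, rec in new.items():
        if key not in old:
            out.append((rec, "I"))
        elif old[key] == rec:
            out.append((rec, "S"))
        else:
            out.append((rec, "U"))
    # Anything left only in file1 has been deleted.
    for key, rec in old.items():
        if key not in new:
            out.append((rec, "D"))
    return out
```

If the key can repeat within a file, this rule is no longer well defined and the requirement would need to say how duplicate matches should be flagged.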

DataSet1 - (file1.txt)
1   IT  RAM     1000    
2   IT  SRI     600
3   HR  GOPI    1500    
5   HW  MAHI    700

DataSet2 - (file2.txt)
1   IT   RAM    1000    
2   IT   SRI    900
4   MT   SUMP   1200    
5   HW   MAHI   700

Output Dataset - (outputfile.txt)
1   IT  RAM     1000    S
2   IT  SRI     900     U
4   MT  SUMP    1200    I
5   HW  MAHI    700     S
3   HR  GOPI    1500    D

Asked by: ram kjr dev