I need to process very large text files (90 MB to just under 1 GB) quickly. I have a working solution, but I need a faster one.
1. Remove everything from the beginning of the file up to the code word (; NS) **Taking too long
2. Remove everything after the first space on every line **Taking way too long
3. Find records in yesterday's list that are not in today's **Works well
4. Remove duplicates **Works well
sed -e '1,/; NS/d' -e 's/ .*//' new_file | diff -e - last_file | sed '/\.\|,\|a/d'| sort -u >done_file
sed -e '1,/; NS/d' (Remove everything from the beginning up to "; NS")
-e 's/ .*//' new_file (Remove everything after the first space on every line)
| diff -e - last_file (Compare to yesterday's list)
| sed '/\.\|,\|a/d' (Remove unneeded info from diff output)
| sort -u >done_file (Remove duplicates)
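For what it's worth, the first two sed expressions can be collapsed into one awk pass, which avoids a second regex substitution per line. This is only a sketch: it assumes your record key is the first space-separated field and that data lines don't start with a space (awk's $1 skips leading whitespace, while sed's 's/ .*//' would not).

```shell
# Tiny demo; swap the here-doc for your real new_file.
cat > /tmp/ns_demo_$$ <<'EOF'
header line
stuff before ; NS marker
rec2 extra info
rec1 more info
EOF

# One pass: skip everything up to and including the "; NS" line,
# then print only the first field of each remaining line.
awk 'found { print $1 } /; NS/ { found = 1 }' /tmp/ns_demo_$$
rm -f /tmp/ns_demo_$$
```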
Here is what I would like:
1. A better way to accomplish the first two steps.
2. A way to suppress the line numbers in diff's output. I just need the additions: no line numbers, changes, or deletions.
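On point 2, one common alternative to diff here is comm, which prints only the lines themselves (no line numbers, no ed commands). It requires both inputs to be sorted, but since the pipeline already ends in sort -u, that sorting does the de-dup step at the same time. A sketch with made-up sample data:

```shell
# Tiny demo; swap the printf samples for your real processed lists.
printf 'rec1\nrec2\nrec3\n' | sort -u > yesterday.sorted
printf 'rec2\nrec3\nrec4\n' | sort -u > today.sorted

# -13 suppresses lines unique to the first file and lines common to both,
# leaving only records in yesterday's list that are missing from today's.
comm -13 today.sorted yesterday.sorted
rm -f today.sorted yesterday.sorted
```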
***This is needed ASAP. 250 POINTS***
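Putting those two ideas together, the whole job might look like this (untested sketch; uses the same new_file/last_file names from the question and assumes the first field is the record key):

```shell
# Steps 1+2 in one awk pass; sort -u handles de-dup up front.
awk 'found { print $1 } /; NS/ { found = 1 }' new_file | sort -u > today.sorted
sort -u last_file > yesterday.sorted
# Step 3: records in yesterday's list but not today's, with no diff noise.
comm -13 today.sorted yesterday.sorted > done_file
```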