GeorgeTowers

asked on

Parse a file with Regular Expression with PHP

I'm trying to parse a file, using a regular expression to split the records into chunks. Here is what I have so far:
 $file_url = 'https://raw.githubusercontent.com/unitedstates/congress-legislators/master/committee-membership-current.yaml';

  // get raw file contents
  $raw_data = file_get_contents($file_url);

  // arrays for each record
  $id = $id_fields = $name = $name_fields = array();
  
  // get record chunks
  preg_match_all('/(.*?)\n[A-Z0-9]/sm', $raw_data, $record_chunks);

  var_dump($record_chunks);



The first chunk gets all the info, but from the second value of the first array onward the first letter is lost. For example:
array (size=2)
  0 => 
    array (size=212)
      0 => string 'HLIG:
- name: Devin Nunes
  party: majority
.......
- name: Jeff Miller
  party: majority
.......
- name: K. Michael Conaway
  party: majority
.......
      1 => string 'LIG01:
- name: Frank A. LoBiondo
  party: majority
......
- name: K. Michael Conaway
  party: majority
..........
      2 => string 'LIG02:
- name: Lynn A. Westmoreland
  party: majority
 ....
      3 => string 'LIG03:
- name: Thomas J. Rooney
  party: majority
.......



As you can see, only the first element (HLIG) is correct; the following ones (LIG01, LIG0X) are missing their first letter (H). What am I doing wrong?

Thanks.
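
For reference, the behaviour described above follows from how the pattern is written: the trailing `[A-Z0-9]` is part of the match, so the regex engine consumes that character and the next match starts one character later. One way to avoid this, offered here as a minimal sketch and not necessarily the accepted solution, is to put that character class in a lookahead, which checks the character without consuming it. The inline `$raw_data` sample below is an assumption standing in for the downloaded file, and the keys `HLIG01` and `HLIG02` are inferred from the question's description of the lost `H`.

```php
<?php
// Inline sample standing in for the downloaded YAML file (assumption:
// committee keys start at column 0 with an uppercase letter or digit,
// while member lines start with "-" or a space).
$raw_data = "HLIG:\n- name: Devin Nunes\n  party: majority\n"
          . "HLIG01:\n- name: Frank A. LoBiondo\n  party: majority\n"
          . "HLIG02:\n- name: Lynn A. Westmoreland\n  party: majority\n";

// Split at every newline that is followed by an uppercase letter or digit.
// The lookahead (?=...) only inspects the next character; it is not part
// of the delimiter, so it stays at the start of the following chunk.
$record_chunks = preg_split('/\n(?=[A-Z0-9])/', rtrim($raw_data));

foreach ($record_chunks as $chunk) {
    // Print only the first line (the committee key) of each chunk.
    echo explode("\n", $chunk, 2)[0], PHP_EOL;
}
```

The same lookahead can be dropped into the original `preg_match_all` call as `'/(.*?)\n(?=[A-Z0-9])/s'`, but that form still skips the final record, because nothing follows it to satisfy the lookahead. Splitting with `preg_split` keeps the last record as well.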
ASKER CERTIFIED SOLUTION
Dan Craciun
GeorgeTowers

ASKER

Thanks, works like a charm.
You're welcome.

Glad I could help!