mosemadl

Parsing Twitter Data

I'm using R to collect data from Twitter. I only want to capture certain fields and am having an issue separating the expanded_url field from the rest of the data. Here's the code:

tweets <- searchTwitter(q.col, n=n.col, lang="en", resultType="Recent")
dataTweets = twListToDF(tweets)
Encoding(dataTweets$text) <- "latin1"
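# Build the subset from the data frame, not from the raw list of status objects;
# this also yields the "dataTweets.*" column names that are renamed below.
data <- data.frame(dataTweets$text, dataTweets$favoriteCount, dataTweets$created,
                   dataTweets$screenName, dataTweets$id, dataTweets$retweetCount,
                   dataTweets$longitude, dataTweets$latitude)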
names(data)[names(data)=="dataTweets.text"] <- "text"
names(data)[names(data)=="dataTweets.favoriteCount"] <- "favorite"
names(data)[names(data)=="dataTweets.created"] <- "created"
names(data)[names(data)=="dataTweets.screenName"] <- "screenName"
names(data)[names(data)=="dataTweets.id"] <- "id"  
names(data)[names(data)=="dataTweets.retweetCount"] <- "retweets"
names(data)[names(data)=="dataTweets.longitude"] <- "long"
names(data)[names(data)=="dataTweets.latitude"] <- "lat"  

User generated image
When I convert the list of tweets (attached image) to a data frame, the urls data is left out. The urls data is itself a data frame nested inside each tweet, and I'm unsure how to reference that piece of data. I hope I've explained this well enough.
Kyle Hamilton

What is the output of:

head(tweets)


and
head(dataTweets)


mosemadl

ASKER

User generated image
Clearly dataTweets doesn't include all of the attributes of a tweet.

However, from the documentation:
https://cran.r-project.org/web/packages/twitteR/twitteR.pdf

twListToDF: A function to convert twitteR lists to data.frames

Description
  This function will take a list of objects from a single twitteR class and return a data.frame version of the members

Usage
  twListToDF(twList)

Arguments
  twList  A list of objects of a single twitteR class, restrictions are listed in details

Details
  The classes supported by this function are status, user, and directMessage

URLs are not included in this, so you will not get the urls data frame this way anyway. The documentation is not great; I couldn't find any mention of 'urls' in it.

hold on... I'm still looking into it
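
Since twListToDF drops the nested urls data, one possible workaround is to pull expanded_url out of each tweet yourself and join it back on id. The following is only a minimal sketch: the tweets list here is a stand-in built from plain lists so it runs without Twitter access, and it assumes each real status object exposes a urls data frame with an expanded_url column, as the question describes. The helper name get_expanded_urls is made up for illustration.

```r
# Stand-in for the list returned by searchTwitter(): plain lists that mimic
# the `id`, `text`, and `urls` members of a tweet (assumption, not real data).
tweets <- list(
  list(id = "1", text = "one link",
       urls = data.frame(url = "https://t.co/a",
                         expanded_url = "https://example.com/a",
                         stringsAsFactors = FALSE)),
  list(id = "2", text = "no links",
       urls = data.frame(url = character(0),
                         expanded_url = character(0),
                         stringsAsFactors = FALSE)),
  list(id = "3", text = "two links",
       urls = data.frame(url = c("https://t.co/b", "https://t.co/c"),
                         expanded_url = c("https://example.com/b",
                                          "https://example.com/c"),
                         stringsAsFactors = FALSE))
)

# Hypothetical helper: collapse a tweet's expanded URLs into one string,
# or return NA when the tweet carries no URLs.
get_expanded_urls <- function(tw) {
  u <- tw$urls$expanded_url
  if (length(u) == 0) NA_character_ else paste(u, collapse = ";")
}

# One row per tweet: the id plus its (possibly collapsed) expanded URLs.
urlData <- data.frame(
  id = vapply(tweets, function(tw) tw$id, character(1)),
  expanded_url = vapply(tweets, get_expanded_urls, character(1)),
  stringsAsFactors = FALSE
)
```

If this works against the real list, something like merge(dataTweets, urlData, by = "id") should attach the URLs to the rest of the fields. Collapsing with ";" keeps one row per tweet; if you would rather have one row per URL, rbind the individual urls data frames instead.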
SOLUTION
Kyle Hamilton