Data analysis with R

All' ombra de' cipressi e dentro l' urne
Confortate di pianto è forse il sonno
Della morte men duro? Ove più il Sole
Per me alla terra non fecondi questa
Bella d' erbe famiglia e d' animali,
E quando vaghe di lusinghe innanzi
A me non danzeran l' ore future,
Né da te, dolce amico, udrò più il verso
E la mesta armonia che lo governa,
Né più nel cor mi parlerà lo spirto
Delle vergini Muse e dell' Amore,
Unico spirto a mia vita raminga,
Qual fia ristoro a' dì perduti un sasso
Che distingua le mie dalle infinite
Ossa che in terra e in mar semina morte?

sepolcri.txt.1 "All' ombra de' cipressi e dentro l' urne Confortate di pianto è forse il sonno Della morte men duro?"
sepolcri.txt.2 "Ove più il Sole Per me alla terra non fecondi questa Bella d' erbe famiglia e d' animali, E quando vaghe di lusinghe innanzi A me non danzeran l' ore future, Né da te, dolce amico, udrò più il verso E la mesta armonia che lo governa, Né più nel cor mi parlerà lo spirto Delle vergini Muse e dell' Amore, Unico spirto a mia vita raminga, Qual fia ristoro a' dì perduti un sasso Che distingua le mie dalle infinite Ossa che in terra e in mar semina morte?"
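The split into two fragments shown above can be reproduced, at least for this passage, by cutting the text after each strong punctuation mark. A minimal base-R sketch, assuming that a fragment is any run of text ending in ".", "?" or "!" (the slides may have used a different rule or a dedicated package); the variable names are illustrative.

```r
# Minimal sketch: split a text into fragments at strong punctuation.
# Assumption: a fragment is any run of characters ending in ".", "?" or "!".
text <- paste(
  "All' ombra de' cipressi e dentro l' urne Confortate di pianto è forse il sonno",
  "Della morte men duro? Ove più il Sole Per me alla terra non fecondi questa",
  "Bella d' erbe famiglia e d' animali, E quando vaghe di lusinghe innanzi",
  "A me non danzeran l' ore future, Né da te, dolce amico, udrò più il verso",
  "E la mesta armonia che lo governa, Né più nel cor mi parlerà lo spirto",
  "Delle vergini Muse e dell' Amore, Unico spirto a mia vita raminga,",
  "Qual fia ristoro a' dì perduti un sasso Che distingua le mie dalle infinite",
  "Ossa che in terra e in mar semina morte?"
)

# Every maximal run of non-terminal characters followed by one terminator
segments <- regmatches(text, gregexpr("[^.?!]+[.?!]", text))[[1]]
segments <- trimws(segments)   # drop the space left after the previous "?"
names(segments) <- paste0("sepolcri.txt.", seq_along(segments))
print(segments)
```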

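library(rtweet)
# authorization (log-in); create_token() is the rtweet 0.x interface
t_oauth <- create_token(app = "miapp", consumer_key = "xxxxxxxxxxxxxxxxxxxx", consumer_secret = "xxxxxxxxxxxxxxxxxxxx")
# search the 5 most recent tweets containing #rstats, using the token above
tw.df <- search_tweets("#rstats", n = 5, token = t_oauth)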

names(tw.df)

 [1] "user_id"                 "status_id"               "created_at"             
 [4] "screen_name"             "text"                    "source"                 
 [7] "display_text_width"      "reply_to_status_id"      "reply_to_user_id"       
[10] "reply_to_screen_name"    "is_quote"                "is_retweet"             
[13] "favorite_count"          "retweet_count"           "hashtags"               
[16] "symbols"                 "urls_url"                "urls_t.co"              
[19] "urls_expanded_url"       "media_url"               "media_t.co"             
[22] "media_expanded_url"      "media_type"              "ext_media_url"          
[25] "ext_media_t.co"          "ext_media_expanded_url"  "ext_media_type"         
[28] "mentions_user_id"        "mentions_screen_name"    "lang"                   
[31] "quoted_status_id"        "quoted_text"             "quoted_created_at"      
[34] "quoted_source"           "quoted_favorite_count"   "quoted_retweet_count"   
[37] "quoted_user_id"          "quoted_screen_name"      "quoted_name"            
[40] "quoted_followers_count"  "quoted_friends_count"    "quoted_statuses_count"  
[43] "quoted_location"         "quoted_description"      "quoted_verified"        
[46] "retweet_status_id"       "retweet_text"            "retweet_created_at"     
[49] "retweet_source"          "retweet_favorite_count"  "retweet_retweet_count"  
[52] "retweet_user_id"         "retweet_screen_name"     "retweet_name"           
[55] "retweet_followers_count" "retweet_friends_count"   "retweet_statuses_count" 
[58] "retweet_location"        "retweet_description"     "retweet_verified"       
[61] "place_url"               "place_name"              "place_full_name"        
[64] "place_type"              "country"                 "country_code"           
[67] "geo_coords"              "coords_coords"           "bbox_coords"            
[70] "status_url"              "name"                    "location"               
[73] "description"             "url"                     "protected"              
[76] "followers_count"         "friends_count"           "listed_count"           
[79] "statuses_count"          "favourites_count"        "account_created_at"     
[82] "verified"                "profile_url"             "profile_expanded_url"   
[85] "account_lang"            "profile_banner_url"      "profile_background_url" 
[88] "profile_image_url"

[1] "Getting prepared for #rstats @uRosconf!\nSee you on monday in Bucharest! https://t.co/XfYiCbKv4U"
[2] "this, when you feed iris and mtcars in the bath after midnight . or you know just arrive at work on Monday. #rstats via @rstatsmemes FB https://t.co/PzqXSJo7TK"
[3] "Hi #rstats #brasil, next week I'll be giving a talk and a mini course on highcharter at IV SER :) ser you there https://t.co/lTKQ34B2lp https://t.co/WdVZU2767y"
[4] "\U0001f4d5 #Rbook: R, Databases, and Docker\nBy John David Smith, Sophie Yang, M. Edward (Ed) Borasky, Jim Tyhurst, Scott Came, Mary Anne Thygesen, and Ian Frantz\n#rstats #Rdatabase #dplyr #SQL #havefun\nhttps://t.co/PWWJTovQGK"
[5] "New tutorial paper with @cedricbatailler and @paulbuerkner : An introduction to Bayesian multilevel models using #brms #Rstats \n\n https://t.co/oP8IndqrTN\n\nAs always, preprint on @PsyArXiv, code and supplementary materials are available on OSF: https://t.co/xvlHqetS1b"

## [[1]]
## [1] "getting"   "prepared"  "#rstats"   "@uRosconf" "see"       "monday"
## [7] "bucharest"
##
## [[2]]
##  [1] "<U+0001F4D5>" "#Rbook"       "r"            "databases"    "docker"
##  [6] "john"         "david"        "smith"        "sophie"       "yang"
## [11] "m"            "edward"       "ed"           "borasky"      "jim"
## [16] "tyhurst"      "scott"        "came"         "mary"         "anne"
## [21] "thygesen"     "ian"          "frantz"       "#rstats"      "#Rdatabase"
## [26] "#dplyr"       "#SQL"         "#havefun"
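The token lists above keep hashtags and mentions as they are, lowercase ordinary words, and drop URLs, punctuation and some very common English words. The code that produced them is not shown here; the base-R function below is only a sketch of those steps, with a small hypothetical stop-word list. On the first tweet it returns the same tokens as element [[1]]; on other tweets it may differ (for example it drops emoji instead of keeping them as <U+...> codes).

```r
# Sketch of a tweet tokenizer. Assumptions: URLs removed, hashtags and
# mentions kept verbatim, other words lowercased, hypothetical stop words.
tokenize_tweet <- function(x,
                           stopwords = c("for", "you", "on", "in", "the",
                                         "a", "an", "and", "of", "to", "at")) {
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # remove URLs
  words <- unlist(strsplit(x, "\\s+"))              # split on spaces and newlines
  words <- gsub("[^[:alnum:]#@_]", "", words)       # strip punctuation, emoji, ...
  words <- words[nzchar(words)]                     # drop empty strings
  # lowercase everything except hashtags and mentions
  words <- ifelse(grepl("^[#@]", words), words, tolower(words))
  words[!(words %in% stopwords)]                    # remove stop words
}

tweet <- "Getting prepared for #rstats @uRosconf!\nSee you on monday in Bucharest! https://t.co/XfYiCbKv4U"
print(tokenize_tweet(tweet))
```

Applied to all the downloaded tweets, this would be lapply(tw.df$text, tokenize_tweet).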

Example: “I sepolcri”

The text

Fragments or segments

Tokens and types (forms)
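In the usual terminology, tokens are the occurrences of words in a text, while types (forms) are the distinct words. A short base-R illustration on a made-up Italian sentence (an assumption, chosen so that the counts are easy to check by hand):

```r
# Tokens = every occurrence; types (forms) = distinct words
text <- "il sole e il mare e il cielo"
tokens <- strsplit(text, "\\s+")[[1]]
types  <- unique(tokens)

n_tokens <- length(tokens)   # 8 occurrences
n_types  <- length(types)    # 5 distinct forms: il, sole, e, mare, cielo
ttr <- n_types / n_tokens    # type/token ratio, here 0.625

# Frequency of each type, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```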

Zipf’s law
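Zipf’s law states that, in a large enough text, the frequency f of a form is roughly inversely proportional to its rank r in the frequency list: f(r) ≈ C / r^a with a close to 1, so log f is approximately a straight line in log r with slope -a. A minimal base-R sketch of how to check this: build the rank-frequency table and fit a line on the log-log scale. The token vector is synthetic (an assumption, generated so that the form of rank r appears round(100 / r) times); on a real corpus one would pass the tokens of the text instead.

```r
# Rank-frequency table: forms sorted by decreasing frequency
zipf_table <- function(tokens) {
  freq <- sort(table(tokens), decreasing = TRUE)
  data.frame(form = names(freq), freq = as.integer(freq), rank = seq_along(freq))
}

# Slope of the least-squares line of log(freq) on log(rank);
# Zipf's law predicts a value close to -1
zipf_slope <- function(zt) {
  unname(coef(lm(log(freq) ~ log(rank), data = zt))[2])
}

# Synthetic tokens: form w<r> appears round(100 / r) times, r = 1, ..., 20
tokens <- rep(paste0("w", 1:20), times = round(100 / (1:20)))

zt <- zipf_table(tokens)
head(zt)
zipf_slope(zt)
# plot(log(zt$rank), log(zt$freq)) shows the (almost) straight line
```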

From texts to forms: text normalization
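Normalization maps different spellings of the same form to one string before counting. A minimal base-R sketch of the most common steps (lowercasing, removing punctuation and digits, collapsing spaces); which steps to apply is a choice that depends on the analysis. Note that here the elided article l' becomes the form l, one of the decisions the analyst has to make explicitly.

```r
# Minimal text normalization: case folding, punctuation and digit removal
normalize_text <- function(x) {
  x <- tolower(x)                     # "Della" and "della" become the same form
  x <- gsub("[[:punct:]]+", " ", x)   # punctuation (including apostrophes) -> space
  x <- gsub("[[:digit:]]+", " ", x)   # drop numbers
  x <- gsub("\\s+", " ", x)           # collapse runs of whitespace
  trimws(x)
}

normalize_text("All' ombra de' cipressi e dentro l' urne")
# "all ombra de cipressi e dentro l urne"
```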

Stop words
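Stop words are very frequent forms (articles, prepositions, conjunctions) that carry little content and are usually removed before counting. A base-R sketch with a deliberately tiny, hypothetical Italian list; ready-made lists exist, for example stopwords::stopwords("it") from the stopwords package.

```r
# Hypothetical, deliberately tiny Italian stop-word list
stopwords_it <- c("il", "lo", "la", "e", "di", "de", "a", "all", "che",
                  "l", "d", "per", "me", "non")

remove_stopwords <- function(tokens, stopwords) {
  tokens[!(tokens %in% stopwords)]   # keep only forms not in the list
}

tokens <- strsplit("all ombra de cipressi e dentro l urne", " ")[[1]]
remove_stopwords(tokens, stopwords_it)
# "ombra" "cipressi" "dentro" "urne"
```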

Polyforms (multi-word units)
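Polyforms are fixed sequences of words (such as a causa di or per quanto riguarda) that work as a single unit of meaning and are better counted as one form. A common trick, sketched here in base R, is to join their words with an underscore before tokenizing; the polyform list is hypothetical and assumed to contain no regular-expression metacharacters.

```r
# Hypothetical list of polyforms (multi-word units)
polyforms <- c("a causa di", "per quanto riguarda", "in modo che")

join_polyforms <- function(x, polyforms) {
  # Longest units first, so a long unit is not broken by a shorter one inside it
  polyforms <- polyforms[order(nchar(polyforms), decreasing = TRUE)]
  for (p in polyforms) {
    pattern <- paste0("\\b", p, "\\b")   # whole words only
    x <- gsub(pattern, gsub(" ", "_", p, fixed = TRUE), x, perl = TRUE)
  }
  x
}

join_polyforms("il treno ritarda a causa di un guasto", polyforms)
# "il treno ritarda a_causa_di un guasto"
```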

Normalizing inflected words

lemmatization

stemming
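Lemmatization maps each inflected form to its dictionary entry (lemma), which requires a lexicon; stemming instead cuts words down to a common root with rules, without a dictionary. The sketch below shows the idea of both in base R. The lemma dictionary is hypothetical and the stemmer is a toy that strips one final vowel, not the Snowball algorithm; in practice one would use, for example, SnowballC::wordStem(words, language = "italian") for stemming and a tagger such as UDPipe or TreeTagger for lemmatization.

```r
# Lemmatization by dictionary lookup (hypothetical mini-lexicon: form = lemma)
lemma_dict <- c(danzeran = "danzare", future = "futuro", ossa = "osso",
                confortate = "confortare", governa = "governare")

lemmatize <- function(tokens, dict) {
  lemmas <- unname(dict[tokens])          # NA where the form is not in the lexicon
  ifelse(is.na(lemmas), tokens, lemmas)   # unknown forms are left unchanged
}

# Toy stemmer: strip one final vowel (NOT a real stemming algorithm)
toy_stem <- function(tokens) sub("[aeio]$", "", tokens)

lemmatize(c("ossa", "urne", "danzeran"), lemma_dict)  # "osso" "urne" "danzare"
toy_stem(c("governa", "governo", "governi"))           # "govern" "govern" "govern"
```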

The document-by-form matrix
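The document-by-form matrix has one row per document (here, per fragment) and one column per form, with the number of occurrences in each cell. A base-R sketch using table(); the two toy documents are an assumption, built from forms of the passage after normalization and stop-word removal. Packages such as quanteda (dfm()) or tm (DocumentTermMatrix()) build the same kind of matrix in sparse form.

```r
# Two toy documents, already normalized and without stop words
docs <- list(
  d1 = c("sole", "terra", "erbe", "famiglia", "animali"),
  d2 = c("muse", "amore", "spirto", "vita", "spirto")
)

# One entry per (document, form) occurrence, then cross-tabulate
dtm <- table(doc  = rep(names(docs), lengths(docs)),
             form = unlist(docs, use.names = FALSE))
print(dtm)
```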

What do we mean by text, document, word?

Texts and documents

Forms

Building the dataset

Code: from text to tokens

Wordcloud
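A word cloud only needs the forms and their frequencies. A sketch, assuming the wordcloud package is installed (its wordcloud() function takes the words and their frequencies); the token vector is illustrative.

```r
# Frequencies of the forms (illustrative tokens)
tokens <- c("spirto", "muse", "amore", "spirto", "vita", "sole", "terra")
freq <- sort(table(tokens), decreasing = TRUE)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq), freq = as.integer(freq), min.freq = 1)
}
```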

Co-occurrence analysis
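Two forms co-occur when they appear in the same document (or fragment). With a binary document-by-form matrix X (1 if the form is present), the form-by-form co-occurrence matrix is X'X: cell (i, j) counts the documents containing both forms, and the diagonal gives each form's document frequency. A base-R sketch with three toy documents (an assumption):

```r
docs <- list(
  d1 = c("sole", "terra", "erbe"),
  d2 = c("terra", "erbe", "animali"),
  d3 = c("muse", "amore", "terra")
)

# Document-by-form counts, then presence/absence (0/1)
dtm <- table(doc  = rep(names(docs), lengths(docs)),
             form = unlist(docs, use.names = FALSE))
x <- unclass(dtm)
x <- (x > 0) * 1

# Form-by-form co-occurrence matrix: t(x) %*% x
cooc <- crossprod(x)
print(cooc)
```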

Graphs and networks

Text network analysis

Co-occurrence graph of a term
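A co-occurrence graph has the forms as nodes and an edge between any two forms that co-occur, weighted by the number of co-occurrences; the graph of a single term keeps only the edges that touch it. A base-R sketch that extracts those edges as a data frame from a co-occurrence matrix built as X'X (toy documents, an assumption). If the igraph package is installed, igraph::graph_from_data_frame() turns the edge list into a graph object that can be plotted.

```r
docs <- list(
  d1 = c("sole", "terra", "erbe"),
  d2 = c("terra", "erbe", "animali"),
  d3 = c("muse", "amore", "terra")
)
dtm <- table(doc  = rep(names(docs), lengths(docs)),
             form = unlist(docs, use.names = FALSE))
x <- (unclass(dtm) > 0) * 1
cooc <- crossprod(x)                          # form-by-form co-occurrences

# Edges from term to every other form it co-occurs with at least min_weight times
term_edges <- function(cooc, term, min_weight = 1) {
  w <- cooc[term, ]
  w <- w[names(w) != term & w >= min_weight]  # drop the self-loop and weak links
  w <- sort(w, decreasing = TRUE)
  data.frame(from = rep(term, length(w)), to = names(w), weight = as.numeric(w))
}

edges <- term_edges(cooc, "terra")
print(edges)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  plot(g, edge.width = edges$weight)
}
```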

Multidimensional Scaling (MDS)
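Multidimensional scaling places the forms on a low-dimensional map so that their distances approximate a given dissimilarity. A base-R sketch with classical MDS (cmdscale()); the dissimilarity used here, the Euclidean distance between the forms' presence/absence profiles across three toy documents, is an assumption, and other choices (for example distances derived from co-occurrences) are common.

```r
docs <- list(
  d1 = c("sole", "terra", "erbe"),
  d2 = c("terra", "erbe", "animali"),
  d3 = c("muse", "amore", "terra")
)
dtm <- table(doc  = rep(names(docs), lengths(docs)),
             form = unlist(docs, use.names = FALSE))
x <- (unclass(dtm) > 0) * 1     # documents x forms, 0/1

d <- dist(t(x))                 # distances between form profiles
coords <- cmdscale(d, k = 2)    # classical MDS in two dimensions

plot(coords, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(coords, labels = rownames(coords))
```

Forms that always appear together (here muse and amore) end up at the same point.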

Content analysis

Istat’s Social Mood on Economy Index

Methodological issues

Sampling

The APIs

Twitter messages

The fields

The texts

Problems

“Easy” parsing

Istat’s decisions

Sentiment analysis
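In its simplest, dictionary-based form, sentiment analysis assigns each text a score computed from a lexicon of positive and negative words. A base-R sketch; the mini-lexicon and the tokenized tweets are hypothetical, and this is not the procedure used by Istat.

```r
# Hypothetical mini-lexicon: +1 positive forms, -1 negative forms
lexicon <- c(bene = 1, ottimo = 1, crescita = 1, ripresa = 1,
             male = -1, crisi = -1, disoccupazione = -1)

# Score = sum of the polarities of the forms found in the lexicon
sentiment_score <- function(tokens, lexicon) {
  sum(lexicon[tokens], na.rm = TRUE)   # forms not in the lexicon give NA, ignored
}

tweets <- list(
  c("la", "crisi", "continua"),
  c("segnali", "di", "ripresa", "e", "crescita"),
  c("oggi", "piove")
)
scores <- vapply(tweets, sentiment_score, numeric(1), lexicon = lexicon)
scores  # -1 2 0
```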

Istat’s daily index
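However the score of each message is computed, a daily index requires grouping the scored tweets by the day of their created_at field. A generic base-R sketch that averages hypothetical tweet scores by day; the data and the use of the mean are assumptions and do not reproduce Istat’s formula.

```r
# Hypothetical scored tweets with their timestamps (UTC)
tweets <- data.frame(
  created_at = as.POSIXct(c("2018-10-01 09:15:00", "2018-10-01 18:40:00",
                            "2018-10-02 11:05:00"), tz = "UTC"),
  score = c(1, -1, 2)
)

# Calendar day of each tweet, then the mean score per day
tweets$day <- as.Date(tweets$created_at, tz = "UTC")
daily <- aggregate(score ~ day, data = tweets, FUN = mean)
print(daily)
```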