Twitter lexical analysis reveals the existence of distinct cultural regions in the U.S.

April 20, 2023

An international team of researchers, led by scientists from IFISC (UIB-CSIC), has mapped the different cultural regions in the United States of America through a lexical analysis of the content that citizens themselves post on their social networks. The results show a clear separation between Northern and Southern cultures, the latter influenced by the African-American population, as well as subtler differences along the East-West axis and between urban and rural populations. To determine the extent of these regions, they calculated the frequency of occurrence of words in 3.3 billion geolocated tweets published between 2015 and 2021. This allowed them to find the hotspots where discussions or debates on specific topics were held. These results were recently published in Nature’s Humanities and Social Sciences Communications.

The idea of the existence of cultural areas in the United States of America is used as a case study in various fields of social sciences. However, the selection of common characteristics that make up a cultural region can be arbitrary and influenced by prejudices and biases. Therefore, an approach is needed to identify these cultural regions in an unbiased and more objective manner. Taking advantage of the enormous amount of data generated on the internet, especially through social networks, represents a relatively new opportunity with high potential.

The researchers decided to analyze the case of the United States for several reasons, including the availability of a huge set of geolocated Twitter data. In addition, the vast majority of Americans speak the same language (English), which is crucial for using the analysis tools. Another relevant aspect, the authors explain, is that the history of the USA is relatively recent but rich and varied, so the formation of different cultural regions within the same national territory is possible.

The method presented in this paper is based on the principle that cultural affiliation can be inferred from the topics that people discuss with each other. The more messages on a topic sent from a region, the greater the interest of that area's population in the topic. Specifically, the authors measured regional variations in written discourse in U.S. social networks, using frequency distributions of content words in geolocated tweets to find the regional hotspots where certain topics appeared more frequently than elsewhere. From there, principal components of regional variation were derived, and hierarchical clustering analysis was applied to obtain the distinct cultural areas and the topics of discussion that define them.
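As a rough illustration of the general idea, and not the authors' actual pipeline, the frequency and clustering steps can be sketched in a few lines of Python. Everything here is an assumption made for the example: the toy data, the region labels, the helper names, the use of mean-centered relative frequencies as a "hotspot" score, and the choice of average linkage with Euclidean distance. The principal component step is left out for brevity.

```python
from collections import Counter
from itertools import combinations
import math

# Toy data invented for illustration: region label -> content words seen in its tweets.
TOY_TWEETS = {
    "A": ["church", "church", "family", "football"],
    "B": ["church", "family", "family", "football"],
    "C": ["hiking", "hiking", "coffee", "football"],
    "D": ["hiking", "coffee", "coffee", "football"],
}

def relative_frequencies(tweets_by_region):
    """Return the sorted vocabulary and each word's share of every region's tokens."""
    vocab = sorted({w for words in tweets_by_region.values() for w in words})
    freqs = {}
    for region, words in tweets_by_region.items():
        counts = Counter(words)
        freqs[region] = [counts[w] / len(words) for w in vocab]
    return vocab, freqs

def hotspot_scores(freqs):
    """Subtract the across-region mean; positive values mark over-represented words."""
    regions = list(freqs)
    n_words = len(freqs[regions[0]])
    means = [sum(freqs[r][j] for r in regions) / len(regions) for j in range(n_words)]
    return {r: [freqs[r][j] - means[j] for j in range(n_words)] for r in regions}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative_clusters(vectors, n_clusters):
    """Average-linkage clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[r] for r in vectors]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)

vocab, freqs = relative_frequencies(TOY_TWEETS)
scores = hotspot_scores(freqs)
clusters = agglomerative_clusters(scores, n_clusters=2)
print(clusters)  # [['A', 'B'], ['C', 'D']]
```

In this toy case, regions A and B group together because they over-use "church" and "family", while C and D share "hiking" and "coffee"; "football" is equally frequent everywhere and so does not distinguish any region.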

The study found a clear North-South separation influenced primarily by African-American culture, as well as other divisions that provide a complete picture of modern American cultural areas. While the work has confirmed that factors such as ethnicity and religion are important in defining American cultural regions, it has also found substantial variations in the relevance of these factors across the country. In other words, the study not only mapped cultural regions, but also identified the cultural factors that are important in defining these regions. In addition, the analysis identified other subtler cultural patterns, such as attention to social interaction and interest in outdoor activities, family or leisure. The identification of these patterns is a novelty in the analysis of U.S. society, as they are difficult to capture through analysis of traditional sources.

The authors of the study conclude that, although their method has only analyzed one genre of American English, it could also be applied to any big data resource with linguistic value and provide a basis for a more complete picture of the cultural landscape, both for the U.S. case and for different nations.

Louf, T., Gonçalves, B., Ramasco, J.J. et al. American cultural regions mapped through the lexical analysis of social media. Humanit Soc Sci Commun 10, 133 (2023).


