On the shape of semantic space - what can we infer from large-scale statistical properties of texts?
Czegel, Daniel (Supervisor: Maxi San Miguel)
Master Thesis (2017)
The large amount of digitized linguistic data opens up the unique possibility of using the methodology of complex systems to understand high-level human cognitive processes. Two such processes are i) the way we categorize the continuous space of real-world features into discrete concepts, and ii) the way we use language to copy a line of thought from one brain to another. In this work I address both questions by formulating a simple text generation model which reproduces three major large-scale statistical laws of human language streams, namely Zipf's law, Heaps' law and burstiness. Furthermore, the generation itself can be described as a random walk on a scale-free, highly clustered and low-dimensional complex network, suggesting that this class of networks is appropriate as a minimal model of the semantic space. Disentangling the global characteristics of the semantic space is an inevitable step towards analyzing texts as trajectories in such a space, with promising applications such as author or style identification, personal disorder diagnosis, or the study of the evolution of cultural traits mirrored by text production characteristics.
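
As an illustration of two of the statistical laws named above, the following minimal sketch computes the quantities they describe from a token sequence. It is not the thesis's generation model. Zipf's law concerns how word frequency decays with frequency rank (roughly as a power law), and Heaps' law concerns how the number of distinct words grows sublinearly with text length. The function names and the toy sentence are hypothetical, chosen only for this example.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in decreasing order (rank 1 first).

    Zipf's law is a statement about how this sequence decays with rank.
    """
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token.

    Heaps' law is a statement about how this curve grows with text length.
    """
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


# Toy input (an assumption for illustration; real analyses need large corpora).
tokens = "the cat sat on the mat and the dog sat on the cat".split()
freqs = rank_frequency(tokens)
growth = vocabulary_growth(tokens)
```

On a real corpus, plotting `freqs` against rank and `growth` against text length on log-log axes is the usual way to inspect both laws. A sentence this short only shows the mechanics.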