The Yale University Library Digital Humanities Laboratory is one of the places on campus where students and faculty can learn about textual analysis resources.

Language, Computers, and the Internet

Although the age of the internet is a tiny blip compared to the hundred thousand years or more that human language has been around, the internet is a major source of linguistic material, at least for some languages. This part of the exhibit covers two very different sets of ideas: humans’ use of language on the internet, and the use of computer modeling to study language.

Humans on the Internet

Internet speak is not a single type of language. It includes many different conventions, and its use varies by age, social media platform, and other factors. But like other types of language use, it is systematic, and its conventions are interpreted in particular ways. For example, to type the smiley face emoji 😀, do you type :) or :-)? This is what's known as an "age-based variable": people who use the :-) variant are highly likely to be older.

Most language use on the web is in English. In fact, there is an increasing "digital divide" in which languages with millions of speakers do not have good digital infrastructure (for example, spell check or language-specific search results), and much internet traffic is concentrated in a few large languages. However, there are ways to find more information about specific languages, such as the list of Twitter users at Indigenous Tweets.

Featured Titles

Because internet: Understanding the new rules of language

by Gretchen McCulloch

Absolutely everything you ever wanted to know about the linguistics of the internet. McCulloch covers everything from the grammar of emojis to the many ways in which texting is not just ‘sloppy speech’, to the generational differences in internet language use.

Introduction to Natural Language Processing

by Jacob Eisenstein

This is one of several textbooks which show the wide array of possibilities for analyzing textual data and using computers for tasks like automated machine translation.

Maths meets myths: Quantitative approaches to ancient narratives

edited by Ralph Kenna, Máirín MacCarron, Pádraig MacCarron

The chapters in this book use mathematical and computational techniques to discover commonalities in folktales from around the world, an exciting way to explore how oral traditions preserve human history.

Natural Language Processing

The second part of this section concerns the use of computational tools and models to study language, which covers everything from automated speech recognition systems to machine translation to constructing efficient search engines. This is the field of Natural Language Processing, a huge field with many different techniques, ranging from highly mathematical work to work that has relatively little linguistics behind it. One very common tool is semantic vectorization, which underlies a great deal of natural language processing, from search engines to restaurant recommendations. The technique involves turning text into a numerical matrix based on which words are frequently used together. The Nordic Language Processing Laboratory Web Vectors site lets you explore English vector spaces.
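
The idea of turning text into numbers based on which words are used together can be illustrated with a small sketch. The Python example below is only a toy, not how any particular search engine or the Web Vectors site works: it assumes a made-up four-sentence corpus, counts which words share a sentence, and compares words by the cosine similarity of their count vectors. The function names and the sample sentences are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus; real systems use millions of sentences.
SENTENCES = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the chef cooked the pasta",
    "the chef cooked the rice",
]


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words that share a sentence with it."""
    vectors = {}
    for sentence in sentences:
        # Use each distinct word once per sentence to keep the counts simple.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vecs = cooccurrence_vectors(SENTENCES)
    # Words used in similar contexts should score higher than unrelated ones.
    print("pasta vs rice: ", round(cosine(vecs["pasta"], vecs["rice"]), 3))
    print("pasta vs mouse:", round(cosine(vecs["pasta"], vecs["mouse"]), 3))
```

In this toy corpus, "pasta" and "rice" appear alongside the same words, so their vectors point in the same direction, while "pasta" and "mouse" share only the word "the". Larger systems typically add steps this sketch leaves out, such as down-weighting very common words and compressing the matrix into shorter vectors.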

Digital Humanities

Parts of linguistics also have points in common with digital humanities. The methods used overlap, though the types of analyses tend to differ. One example of where the two meet is captured in the Maths Meets Myths book (above). Another example is work on the Voynich Manuscript, a 15th-century manuscript (Beinecke MS 408) in an unknown language and script. Linguistic methods can be used to assist decipherment (if it’s in a cipher), or at least to work out the features of the language system that underlies the manuscript and whether it’s gibberish, code, or something else.

Yale Courses

Language and Computation I (LING 227). Design and analysis of computational models of language. Topics include finite state tools, computational morphology and phonology, grammar and parsing, lexical semantics, and the use of linguistic models in applied problems. Professor: Robert Frank

Natural Language Processing (CPSC 477). Linguistic, mathematical, and computational fundamentals of natural language processing (NLP). Topics include part of speech tagging, Hidden Markov models, syntax and parsing, lexical semantics, compositional semantics, machine translation, text classification, discourse, and dialogue processing. Additional topics such as sentiment analysis, text generation, and deep learning for NLP. Professor: Dragomir R. Radev
