Midterm: Using Word Analysis to Explore North America

This project uses word analysis to examine the discoveries most prominently recorded in an account of exploration from the United States' formative years. The results are illustrated through a word cloud and a line graph.

Sources

For this project, I used a copy of the ebook Travels through the Interior Parts of North America, in the Years 1766, 1767, and 1768. The book was written by Jonathan Carver and was initially published in 1778. The copy I used for this project was obtained from Project Gutenberg.

Processes

I first opened a copy of the ebook in TextEdit. I then scrolled through the book and deleted text that was not part of the book's content, including the transcriber's note, the advertisement, the publication history, the table of contents, and other extraneous material dispersed throughout. Finally, I uploaded my curated copy of the book into Voyant Tools for text analysis.

In Voyant Tools, I filtered out words that did not contribute to an overall understanding of the observations recorded in the book. I began with Voyant Tools' stop list feature, which automatically filters out words common in English. Because this list was not comprehensive, I also removed terms that were still visible in the word cloud but said little about the exploration, such as "like," "use," and "near." After this filtering, the second most frequent word was "great." On reviewing the original text, I found that "great" was used in many different contexts and was not particularly relevant to the exploration, so I added it to the stop list as well.
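The filtering described above happened inside Voyant Tools, but the underlying idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Voyant itself works: the short base stop list is a hypothetical stand-in for Voyant's much longer English list, and the sample sentences are made up for demonstration rather than taken from Carver's book.

```python
import re
from collections import Counter

# Hypothetical stand-in for a built-in English stop list; Voyant's real
# list is far longer. These words are for illustration only.
BASE_STOP_LIST = {"the", "and", "of", "to", "a", "in", "was", "were", "for", "which", "with"}

# Extra terms added by hand, mirroring the ones named in the write-up.
CUSTOM_STOPS = {"like", "use", "near", "great"}


def word_frequencies(text, stop_words):
    """Lowercase the text, split it into words, drop stop words, and count the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Made-up sample text (not from the book).
sample = ("The Indians near the great lake use canoes. "
          "The lake and the river were like a great road for the Indians.")

counts = word_frequencies(sample, BASE_STOP_LIST | CUSTOM_STOPS)
print(counts.most_common(3))
```

Running it prints [('indians', 2), ('lake', 2), ('canoes', 1)]. With only the base list, "great" would appear twice and tie with the most frequent words, which is the same problem the custom additions to the stop list addressed.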

Presentation

To present the results, I embedded a word cloud and a line graph. The word cloud gives an effective visual representation of the words used most frequently throughout the book, which helps viewers grasp its main ideas. The line graph focuses on four variables, which I chose based on how frequently they appeared in the book and how closely they correlated with one another. These variables are Indians/Indian, which appeared far more frequently than any other word in the book, along with words for bodies of water and for direction. The graph shows how often these words occur over the course of the text, from beginning to end. I chose a line graph because it lets viewers easily compare trends between the variables. Additionally, because the project is interactive, users can change the graph type or the words displayed.
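The line graph plots each variable's frequency segment by segment through the text. As a rough sketch of that idea, the Python below splits a text into consecutive, roughly equal segments and counts matching words in each, grouping spellings such as "Indian" and "Indians" into one variable. The number of segments, the grouped word lists, and the sample sentences are assumptions for illustration; Voyant's Trends tool uses its own segmentation and may plot relative rather than raw frequencies.

```python
import re


def segment_counts(text, terms, n_segments=10):
    """Split the text's words into n_segments consecutive, roughly equal
    segments and count how many words in each segment belong to `terms`."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    counts = []
    for i in range(n_segments):
        # Integer boundaries so every word lands in exactly one segment.
        start = i * total // n_segments
        end = (i + 1) * total // n_segments
        counts.append(sum(1 for w in words[start:end] if w in terms))
    return counts


# Made-up sample text for illustration only (not from Carver's book).
sample = ("The Indian camp stood by the lake. We travelled west along the river. "
          "The Indians fished in the lake. The river turned north.")

# Hypothetical groupings: each plotted variable merges several spellings.
variables = {
    "Indians/Indian": {"indian", "indians"},
    "lake/river": {"lake", "lakes", "river", "rivers"},
}

for label, terms in variables.items():
    print(label, segment_counts(sample, terms, n_segments=4))
```

With four segments, this prints [1, 0, 1, 0] for Indians/Indian and [0, 1, 1, 2] for lake/river. Dividing each count by the number of words in its segment would give relative frequencies, which are easier to compare when segments differ in length.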

Significance

Applying a digital approach to this book gives viewers an interactive way to engage with its information. This method of presenting data can also reveal insights that might not be apparent from reading the book alone. One example is the connection between Indigenous peoples and bodies of water. In the last few segments of the line graph, the Indians/Indian and lake/river variables rise and fall together. This may prompt the viewer to ask why. One possible reason is the proximity of Indigenous peoples to bodies of water, since human civilization relies on water for survival. This connection raises further questions and invites deeper exploration. Digitizing humanities projects gives people a new way to synthesize information and data.
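Whether two lines on the graph correlate can also be expressed as a single number. The Python sketch below computes Pearson's correlation coefficient for two series of per-segment counts. The counts are invented for illustration and are not taken from this project's graph.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized since the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-segment counts for two variables (made up, not project data).
indians_counts = [3, 5, 4, 9, 12]
water_counts = [2, 3, 3, 7, 9]

r = pearson(indians_counts, water_counts)
print(round(r, 3))
```

A value near 1 means the two series rise and fall together, a value near -1 means one rises as the other falls, and a value near 0 means there is no linear relationship. A high value alone does not explain why two words track each other; that remains the interpretive question.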