Corpus Analysis

News vs. Magazines, 1965-1975

My analysis compares two sets of text files from the years 1965 through 1975: one set is drawn from news articles, and the other from magazines. I will be looking at Keywords In Context (KWIC), N-grams, and clusters. To do so, I will use two tools, AntConc and Voyant. Below are my findings.

First Look

Screen capture of the main page on Voyant for the news articles.
Screen capture of the main page on Voyant for the magazine articles.

From these two screen captures one can see the most-used words in each set of files, the word count, and a view of the text itself.

News Articles

Most Common Words

Screenshot of the "Most Common Words" view in Voyant Tools for the news articles from 1965 to 1975.

The screenshot above, from Voyant Tools, shows the most commonly used words in the COHA news articles from 1965-1975. Excluding words like "said" and "mr.", the most common word in this file set is:

"President"

I believe "President" is the most common word because the news text files are either talking about or addressing the President. Presidents Johnson, Nixon, and Ford were the three presidents in office during the period 1965-1975.
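As a rough sketch of what a most-common-words count with excluded words looks like, here is a small Python example. It is not how Voyant works internally. The stop list, the function names, and the sample sentence are all made up for illustration, and Voyant's own stop list is much longer.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "said", "mr"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def most_common_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)

# Invented sample text, not taken from the COHA files.
sample = ("Mr. Johnson said the President would speak. "
          "The President said the talks would continue.")
print(most_common_words(sample, n=3))
```

Removing "said" and "mr" before counting is what lets a content word like "president" rise to the top of the list.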

Key Words In Context (KWIC)

Screenshot of the KWIC page in AntConc. The word searched for in this image is "President."

When scoping the context of these keywords in my set of text files, a span of 5 to 8 words on each side of the keyword worked best. When I went lower than 5, there wasn't really enough context to help determine what these texts were trying to tell me. When I went above 8, there were minimal hits.
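To show what the window size means in practice, here is a minimal KWIC sketch in Python. It only approximates what AntConc displays; the function name `kwic`, the `window` parameter, and the sample sentence are my own assumptions, not anything taken from AntConc.

```python
import re

def kwic(text, keyword, window=5):
    """Return (left, keyword, right) tuples with up to `window` words per side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample sentence, not taken from the COHA files.
sample = ("Aides said that President Johnson would address "
          "the nation on Vietnam tonight")
for left, word, right in kwic(sample, "president", window=3):
    print(f"{left:>30} | {word} | {right}")
```

Raising `window` from 3 to 5 or 8 simply pulls more neighbouring words into each line, which is the setting I was adjusting above.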

From this image, one can tell that these news articles talk about presidents of various organizations, President Johnson, and President Nixon. Vietnam is briefly mentioned. The president of Egypt at the time, Gamal Nasser, is also mentioned, because the time period of these files spans two separate Arab-Israeli wars. The particular text file talking about Nasser is from the time of the Six-Day War in 1967.

N-Gram

Screenshot of the 3 N-Gram view in AntConc
Screenshot of the 4 N-Gram view in AntConc
Screenshot of the 5 N-Gram view in AntConc

For the news articles, I used N-Gram clusters of 3, 4, and 5 words. This range gave me the best results. As one could assume, especially for this time period, the news was full of politics, with clusters covering city- and state-level politics as well as presidential politics.
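The idea behind N-Gram clusters of 3, 4, and 5 words can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function name `ngram_counts` and the sample text are invented), not a description of how AntConc computes its results.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n shifted copies of the token list to form the n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Invented sample text, not taken from the COHA files.
sample = ("the president of the united states met the governor "
          "and the president of the senate")
for n in (3, 4, 5):
    print(n, ngram_counts(sample, n).most_common(1))
```

Longer N-Grams repeat less often, so a phrase that still shows up several times at 4 or 5 words is usually a fixed expression such as a title or an institution name.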

Magazine Articles

Most Common Words

Screenshot of the "Most Common Words" in magazine articles from 1965 to 1975.

This screenshot shows the most common words in a COHA set of magazine articles ranging from 1965 to 1975. The most common word in these text files is "said." In my opinion this is a filler word with no true significance, so the most common word that is significant is "People."

Key Words In Context (KWIC)

Screenshot of the KWIC for the word 'People.'
Screenshot of the KWIC for the word 'President.'

The most common significant word in the magazine files, as stated previously, is 'People.' The first screenshot above shows the KWIC for the word 'People.' While 'President' is the most common word in the news files, it is the second most common word in the magazine files.

N-Gram

Instead of the 3-, 4-, and 5-word N-Grams I used for the news articles, I used N-Gram clusters of 4, 5, and 6 words. This was because the results for 2 N-Grams were not very adequate.

Screenshot of the 4 N-Gram view in AntConc
Screenshot of the 5 N-Gram view in AntConc
Screenshot of the 6 N-Gram view in AntConc

Due to the nature of magazine articles, my results were slightly different on this N-Gram pull. Although the magazine articles still contained some politics, they also contained text that seems to have come from interviews.

Conclusions

Before scoping through these text files, I truthfully expected them to be political because of the time period. I was right about the politics, but I learned much more about these text files after scoping them.

The news talked about events happening around the world and how they were being dealt with, as shown by the mentions of the US presidents and even the President of Egypt at the time.

The magazines seemed to have more interview-based content. Much of what I read during the KWIC scoping was made up of dialogue.

Although these groups of text files are different, when using NLA and the right tools, one can find that there is a lot more to a set of texts than one may originally perceive. These tools allow deep searches of text with the use of context.