Using Textstat

TextSTAT is a concordance program designed to be user-friendly and to provide simple Internet functionality. Texts can be combined to form corpora (which can also be stored as such). The program analyses these corpora and displays word frequency lists and concordances for search terms. To compute sentiment, textstat_sentiment counts the two positive and zero negative matches from the first example and averages these across all matches, for a score of 1.0. In the second document, the positive match generates a score of 1.0, and in the third document, the scores will be sum (1, -1.
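The averaging described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each positive match scores +1, each negative match scores -1, and a document's score is the mean over all of its matches. The lexicon, the polarity_score() function, and the sample documents are hypothetical and are not quanteda's implementation.

```python
# Hypothetical lexicon: +1 for positive words, -1 for negative words.
# It stands in for a real sentiment dictionary and is not taken from quanteda.
LEXICON = {"good": 1, "great": 1, "happy": 1, "bad": -1, "awful": -1}


def polarity_score(tokens):
    """Average the +1/-1 values of every lexicon match in one document.

    Returns None when no token matches, since there is nothing to average.
    """
    matches = [LEXICON[tok] for tok in tokens if tok in LEXICON]
    if not matches:
        return None
    return sum(matches) / len(matches)


docs = {
    "doc1": "a good and great day".split(),   # two positive, zero negative
    "doc2": "a happy ending".split(),          # one positive
    "doc3": "good food bad service".split(),   # one positive, one negative
}

for name, tokens in docs.items():
    print(name, polarity_score(tokens))
```

Under this averaging assumption, the two documents with only positive matches both score 1.0, while a document with one positive and one negative match averages out to 0.0.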

Description

Textstat is an easy-to-use Python library for calculating statistics from text. It helps determine readability, complexity, and grade level.

textstat_keyness compares two partitions of a corpus to determine the words that are 'key', that is, words that occur differentially between the two partitions. To compare a target corpus with a baseline corpus, you therefore need to combine the two into a single dfm and then specify the target documents appropriately.
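To make the keyness idea concrete, here is a rough Python sketch of one common keyness measure: a signed Pearson chi-squared statistic computed per feature from a 2×2 table of target versus reference counts. Tabulating both partitions over one shared vocabulary plays the role of combining them into a single dfm. The function keyness_chi2() and the sample tokens are hypothetical; textstat_keyness itself offers several measures and may apply a continuity correction, so its numbers will not necessarily match this sketch.

```python
from collections import Counter


def keyness_chi2(target_tokens, reference_tokens):
    """Signed Pearson chi-squared per feature, target vs reference.

    No continuity (Yates) correction is applied. A positive score means
    the feature is relatively more frequent in the target partition.
    """
    tgt, ref = Counter(target_tokens), Counter(reference_tokens)
    tgt_total, ref_total = sum(tgt.values()), sum(ref.values())
    scores = {}
    for feat in set(tgt) | set(ref):          # shared vocabulary
        a, b = tgt[feat], ref[feat]           # counts of this feature
        c, d = tgt_total - a, ref_total - b   # counts of all other features
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        scores[feat] = chi2 if a * d >= b * c else -chi2
    # Most "key" features for the target first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


target = "tax cut tax cut growth jobs".split()
reference = "jobs health care health jobs growth".split()
for feat, score in keyness_chi2(target, reference):
    print(f"{feat:8s} {score:+.3f}")
```

In this toy comparison "tax" and "cut" come out positive (key for the target), "health" and "care" negative, and "growth", equally frequent in both partitions, scores zero.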

textstat_frequency produces counts and document-frequency summaries of the features in a dfm, optionally grouped by a docvars variable or another supplied grouping variable.
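The tabulation that textstat_frequency performs can be approximated in plain Python to show how the frequency, rank, and docfreq columns relate. This sketch assumes documents are already tokenised into lists of strings (standing in for the rows of a dfm) and ranks features from most to least frequent, with tied features sharing the lowest rank ('min'). frequency_table() is a hypothetical helper, not part of quanteda.

```python
from collections import Counter


def frequency_table(docs):
    """Tabulate term frequency, rank, and document frequency per feature.

    docs: list of token lists, standing in for the rows of a dfm.
    Rank 1 is the most frequent feature; ties share the lowest rank ("min").
    """
    freq = Counter(tok for doc in docs for tok in doc)
    # set(doc) counts each feature at most once per document
    docfreq = Counter(tok for doc in docs for tok in set(doc))
    rows = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    table = []
    for feature, count in rows:
        # "min" ties: 1 + number of features with a strictly higher count
        rank = 1 + sum(1 for c in freq.values() if c > count)
        table.append({"feature": feature, "frequency": count,
                      "rank": rank, "docfreq": docfreq[feature]})
    return table


docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "a cat and a dog".split(),
]
for row in frequency_table(docs):
    print(row)
```

Note how "a" has a frequency of 2 but a docfreq of 1, because both occurrences fall in the same document.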

Usage

Arguments

x

a dfm object

n

(optional) integer specifying the top n features to be returned, within group if groups is specified

groups

either a character vector containing the names of document variables to be used for grouping, or a factor (or an object that can be coerced into a factor) equal in length or rows to the number of documents. NA values of the grouping variable are dropped. See groups for details.

ties_method

character string specifying how ties are treated. See data.table::frank() for details. Unlike that function, however, the default is 'min', so that frequencies of 10, 10, 9 would be ranked 1, 1, 3.

...

additional arguments passed to dfm_group(). This can be useful for passing force = TRUE, for instance, if you are grouping a dfm that has been weighted.
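The tie rules referred to under ties_method can be illustrated with a small ranking function. This is a hypothetical Python sketch that ranks frequencies from largest to smallest and mimics the 'min', 'max', 'average', 'dense', and 'first' tie-breaking rules by name; it is not data.table::frank(), and it leaves out random tie-breaking.

```python
def rank_desc(values, ties_method="min"):
    """Rank values so that rank 1 is the largest value.

    ties_method chooses how tied values are ranked ("min", "max",
    "average", "dense", or "first"); this is an illustrative sketch only.
    """
    distinct = sorted(set(values), reverse=True)  # used by "dense"
    ranks = []
    for i, v in enumerate(values):
        higher = sum(1 for x in values if x > v)   # strictly larger values
        equal = sum(1 for x in values if x == v)   # size of the tie group
        if ties_method == "min":
            ranks.append(higher + 1)
        elif ties_method == "max":
            ranks.append(higher + equal)
        elif ties_method == "average":
            ranks.append(higher + (equal + 1) / 2)
        elif ties_method == "dense":
            ranks.append(distinct.index(v) + 1)
        elif ties_method == "first":
            # Earlier positions win ties.
            ranks.append(higher + sum(1 for x in values[:i] if x == v) + 1)
        else:
            raise ValueError(f"unsupported ties_method: {ties_method!r}")
    return ranks


freqs = [10, 10, 9]
for method in ("min", "max", "average", "dense", "first"):
    print(method, rank_desc(freqs, method))
```

With the 'min' rule the two tied frequencies of 10 share rank 1 and the next frequency gets rank 3, whereas 'dense' would give it rank 2.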

Value

a data.frame containing the following variables:

feature

(character) the feature

frequency

count of the feature

rank

rank of the feature, where 1 indicates the greatest frequency

docfreq

document frequency of the feature, as a count (the number of documents in which this feature occurred at least once)

group

(only if groups is specified) the label of the group. If the features have been grouped, then all counts, ranks, and document frequencies are within group. If groups is not specified, the group column is omitted from the returned data.frame.

textstat_frequency returns a data.frame of features and their term and document frequencies within groups.

Examples
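As an illustration only (these are not the R examples from the quanteda documentation), the following Python sketch mirrors the grouped behaviour described under Value: documents whose group label is missing are dropped, and frequency, rank, docfreq, and the top-n cut-off are all computed within each group. The grouped_frequency() helper, the group labels, and the use of None to stand for NA are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def grouped_frequency(docs, groups, n=None):
    """Per-group term and document frequencies, ranked within each group.

    docs: list of token lists; groups: one label per document, where None
    plays the role of NA and such documents are dropped.
    n: keep only the top n features per group (None keeps all).
    """
    if len(docs) != len(groups):
        raise ValueError("groups must have one entry per document")
    by_group = defaultdict(list)
    for tokens, label in zip(docs, groups):
        if label is not None:               # NA group labels are dropped
            by_group[label].append(tokens)

    rows = []
    for label in sorted(by_group):
        members = by_group[label]
        freq = Counter(t for doc in members for t in doc)
        docfreq = Counter(t for doc in members for t in set(doc))
        ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
        if n is not None:
            ranked = ranked[:n]             # top n within this group
        for feature, count in ranked:
            rank = 1 + sum(1 for c in freq.values() if c > count)  # "min" ties
            rows.append({"feature": feature, "frequency": count, "rank": rank,
                         "docfreq": docfreq[feature], "group": label})
    return rows


docs = [
    "tax cut tax".split(),
    "tax jobs".split(),
    "health care jobs".split(),
    "health jobs health".split(),
    "unlabelled text".split(),
]
parties = ["A", "A", "B", "B", None]
for row in grouped_frequency(docs, parties, n=2):
    print(row)
```

Because "cut" and "jobs" tie within group A, the n = 2 cut-off here keeps whichever sorts first alphabetically; this sketch makes no attempt to reproduce how quanteda orders such ties.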
