
TextChunk

What is TextChunk?

TextChunk is the base class of NLP, Document, Sentence and Subsentence. It defines the methods listed below, which are available on all of these child classes.

Data analysis:

| Method | Description |
| --- | --- |
| vocabulary() | Returns vocabulary from current data. |
| word_count() | Returns word count from current data. |
| word_frequency() | Returns word frequency of current data. |
| list_entities() | Returns dictionaries of detected entities by type. |
| get_emotion() | Returns emotion results at the specified hierarchical level. |
| get_sentiment() | Returns sentiment results at the specified hierarchical level. |
| word_sentiment() | Returns average sentiment for each word of the whole vocabulary. |
| word_emotion() | Returns average emotion for each word of the whole vocabulary. |
| meaning_sentiment() | Returns average sentiment for each meaning. |
| meaning_emotion() | Returns average emotion for each meaning. |
| filter_polarity() | Filters Sentence or Subsentence of the specified polarity. |
| filter_emotion() | Filters Sentence or Subsentence of the specified emotions. |
| filter_type() | Filters Sentence of the specified types. |
| match_pattern() | Returns matches from given patterns. |

TextChunk methods

vocabulary

vocabulary(filter_pos = None, lemma=False)

Returns the vocabulary of the current data, with each word paired with its POS tag. A word that appears both as a verb and as a noun therefore yields two tuples: (word, 'V') and (word, 'N'). Allows filtering by POS tags.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| filter_pos | list of string | Tags to use for filtering. | True |
| lemma | bool | Whether to use lemma or plain words. | True |

Return:

| Type | Description |
| --- | --- |
| list of tuple | List of unique tuples (token, POStag). |
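
To make the (token, POS tag) pairing concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not the library's implementation: the `tokens` list, its `source`, `lemma` and `pos` keys, the 'D' tag and the helper `build_vocabulary` are all hypothetical stand-ins for analysed data.

```python
# Hypothetical analysed tokens; real TextChunk objects hold this data internally.
tokens = [
    {"source": "plays", "lemma": "play", "pos": "V"},
    {"source": "play", "lemma": "play", "pos": "N"},
    {"source": "plays", "lemma": "play", "pos": "V"},
    {"source": "the", "lemma": "the", "pos": "D"},
]


def build_vocabulary(tokens, filter_pos=None, lemma=False):
    """Return unique (token, POS tag) tuples, in order of first appearance."""
    key = "lemma" if lemma else "source"
    seen = []
    for tok in tokens:
        # Skip tokens whose tag is not in the requested list, if one was given.
        if filter_pos is not None and tok["pos"] not in filter_pos:
            continue
        pair = (tok[key], tok["pos"])
        if pair not in seen:
            seen.append(pair)
    return seen


print(build_vocabulary(tokens))
# [('plays', 'V'), ('play', 'N'), ('the', 'D')]
print(build_vocabulary(tokens, filter_pos=["V", "N"], lemma=True))
# [('play', 'V'), ('play', 'N')]
```

With lemma=True the same lemma still appears once per POS tag, which is the two-tuple case mentioned in the description.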

word_count

word_count(filter_pos = None, lemma=False)

Returns the number of occurrences of each word in the current data, with each word paired with its POS tag. A word that appears both as a verb and as a noun is therefore counted under two tuples: (word, 'V') and (word, 'N'). Allows filtering by POS tags.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| filter_pos | list of string | Tags to use for filtering. | True |
| lemma | bool | Whether to use lemma or plain words. | True |

Return:

| Type | Description |
| --- | --- |
| dictionary | Dictionary of word counts { (token, POStag): occurrences }. |
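
The shape of the returned dictionary can be illustrated with a short standalone sketch. It assumes the data is already available as (token, POS tag) pairs; the `pairs` list and the `count_words` helper are hypothetical and only mimic the documented return format.

```python
from collections import Counter

# Hypothetical (token, POS tag) pairs, one per word occurrence.
pairs = [("play", "V"), ("play", "N"), ("play", "V"), ("ball", "N")]


def count_words(pairs, filter_pos=None):
    """Count occurrences of each (token, POS tag) pair."""
    if filter_pos is not None:
        pairs = [p for p in pairs if p[1] in filter_pos]
    return dict(Counter(pairs))


print(count_words(pairs))
# {('play', 'V'): 2, ('play', 'N'): 1, ('ball', 'N'): 1}
print(count_words(pairs, filter_pos=["N"]))
# {('play', 'N'): 1, ('ball', 'N'): 1}
```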

word_frequency

word_frequency(filter_pos = None, lemma=False)

Returns the frequency of each word or lemma. Allows filtering by POS tag.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| filter_pos | list of string | Tags to use for filtering. | True |
| lemma | bool | Whether to use lemma or plain words. | True |

Return:

| Type | Description |
| --- | --- |
| dictionary | Dictionary of word frequency. |

list_entities

list_entities()

Returns dictionaries of detected entities by type.

Return:

| Type | Description |
| --- | --- |
| list of dictionary | List of dictionaries of different entities at the specified level. |

get_emotion

get_emotion(granularity = 'sentence')

Returns emotion results. The granularity parameter defines whether emotions are taken per sentence or per subsentence.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| granularity | string | Level at which emotions are analyzed. One of 'sentence' or 'subsentence'. | True |

Return:

| Type | Description |
| --- | --- |
| list of dict | List of dictionaries with emotions as keys and dict {'occurences','sum','average'} as values. |

get_sentiment

get_sentiment(granularity = 'sentence')

Returns sentiment results. The granularity parameter defines whether sentiment is taken per sentence or per subsentence.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| granularity | string | Level at which sentiments are analyzed. One of 'sentence' or 'subsentence'. | True |

Return:

| Type | Description |
| --- | --- |
| list of dict | List of dictionaries with polarity as keys and dict {'occurences','sum','average'} as values. |
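
The per-polarity summary can be pictured with the following sketch. It only illustrates the documented {'occurences','sum','average'} structure for a single dictionary; the input list, the scores and the `summarize` helper are hypothetical, and how the library derives its own values is not stated here.

```python
# Hypothetical per-sentence results: (polarity label, sentiment score).
sentence_results = [("positive", 0.75), ("negative", -0.5), ("positive", 0.25)]


def summarize(results):
    """Group scores by polarity and compute count, sum and average."""
    summary = {}
    for polarity, score in results:
        # Key spelling 'occurences' follows the return description above.
        entry = summary.setdefault(polarity, {"occurences": 0, "sum": 0.0})
        entry["occurences"] += 1
        entry["sum"] += score
    for entry in summary.values():
        entry["average"] = entry["sum"] / entry["occurences"]
    return summary


summary = summarize(sentence_results)
print(summary["positive"])
# {'occurences': 2, 'sum': 1.0, 'average': 0.5}
```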

word_sentiment

word_sentiment(granularity = 'sentence', lemma = False, filter_pos = None, average=True)

Returns an average sentiment score for each word or lemma. For each sentence or subsentence (granularity parameter), the sentiment score is added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average.
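
The averaging described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the `sentences` input and the `sketch_word_sentiment` helper are hypothetical, a word is counted once per sentence, and the divisor is taken to be the number of sentences that contain the word.

```python
from collections import defaultdict

# Hypothetical input: each sentence is (list of (word, POS tag), sentiment score).
sentences = [
    ([("cure", "N"), ("works", "V")], 0.5),
    ([("cure", "N"), ("fails", "V")], -0.25),
]


def sketch_word_sentiment(sentences, average=True):
    """Attach each sentence's score to every word it contains, then average."""
    scores = defaultdict(list)
    for words, score in sentences:
        for word in set(words):  # assumption: count a word once per sentence
            scores[word].append(score)
    if not average:
        return dict(scores)  # raw list of scores per word
    return {word: sum(vals) / len(vals) for word, vals in scores.items()}


result = sketch_word_sentiment(sentences)
print(result[("cure", "N")])  # 0.125
```

Passing average=False keeps the individual scores, which is useful when the spread matters more than the mean.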

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |
| lemma | bool | Whether to use lemma or plain words. | True |
| filter_pos | list of string | POStags to use for filtering. | True |
| average | bool | Whether to return average or list of values. | True |

Return:

| Type | Description |
| --- | --- |
| dictionary | Dictionary with (word, POS tag) as keys and sentiment as value. |

Example return

{
    ('patients', 'N'): -0.4917,
    ('male', 'N'): -0.4275,
    ('age', 'N'): -0.5167,
    ('cure', 'N'): 0.6421
}

word_emotion

word_emotion(granularity = 'sentence', lemma = False, filter_pos = None, average=True)

Returns the average score of each emotion for each word or lemma in the vocabulary. For each sentence or subsentence (granularity parameter), the emotion scores are added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average (or returned as a list of values if average is False).

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. | True |
| lemma | bool | Whether to use lemma or plain words. | True |
| filter_pos | list of string | POStags to use for filtering. | True |
| average | bool | Whether to return average or list of values. | True |

Return:

| Type | Description |
| --- | --- |
| dictionary | Dictionary with (word, POS tag) as keys and a dictionary with emotion scores as value. |

Example return

{
    ('patients', 'N'): {'surprise': 0.753, 'neutral': 0.445},
    ('male', 'N'): {'neutral': 0.8},
    ('surgery', 'N'): {'sadness': 0.79}
}

meaning_sentiment

meaning_sentiment(granularity='sentence', filter_meaning=None, average=True)

Returns the average sentiment score for each meaning. For each sentence or subsentence (granularity parameter), the sentiment score is added to each of the meanings present. The scores are divided by the number of sentences or subsentences to get an average. This can be used with custom meanings to get the sentiment associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |
| filter_meaning | list of string | Filters results by list of meanings. | True |
| average | bool | Whether to return average or list of values. | True |

Return:

| Type | Description |
| --- | --- |
| dictionary | Dictionary with meanings as keys and sentiment as value. |

meaning_emotion

meaning_emotion(granularity='sentence', filter_meaning=None, average=True)

Returns the average emotion scores for each meaning. For each sentence or subsentence (granularity parameter), the score of each emotion is added to each of the meanings present. The scores are divided by the number of sentences or subsentences to get an average. This can be used with custom meanings to get the emotions associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. | True |
| filter_meaning | list of string | Filters results by list of meanings. | True |
| average | bool | Whether to return average or list of values. | True |

Return:

| Type | Description |
| --- | --- |
| dictionary | Dictionary with meanings as keys and a dictionary of emotion scores as value. |
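
For the customer-review use case mentioned above, the per-meaning aggregation might look like the sketch below. Everything in it is an assumption made for illustration: the `reviews` data, the scores, the `sketch_meaning_emotion` helper, and the choice to average each emotion over the sentences in which that emotion was detected.

```python
from collections import defaultdict

# Hypothetical input: (set of meanings in the sentence, emotion scores).
reviews = [
    ({"pricing"}, {"anger": 0.5, "sadness": 0.25}),
    ({"pricing", "customer service"}, {"anger": 0.25}),
    ({"customer service"}, {"joy": 0.75}),
]


def sketch_meaning_emotion(sentences, filter_meaning=None):
    """Average each emotion score per meaning, optionally restricted to some meanings."""
    collected = defaultdict(lambda: defaultdict(list))
    for meanings, emotions in sentences:
        for meaning in meanings:
            if filter_meaning is not None and meaning not in filter_meaning:
                continue
            for emotion, score in emotions.items():
                collected[meaning][emotion].append(score)
    return {
        meaning: {emo: sum(vals) / len(vals) for emo, vals in by_emotion.items()}
        for meaning, by_emotion in collected.items()
    }


by_meaning = sketch_meaning_emotion(reviews, filter_meaning=["pricing"])
print(by_meaning)
# {'pricing': {'anger': 0.375, 'sadness': 0.25}}
```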

filter_polarity

filter_polarity(polarity, granularity='sentence')

Filters Sentence or Subsentence of the specified polarity.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| polarity | string or list of string | Polarity to filter, one or more of 'neutral', 'positive', 'negative'. | False |
| granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |

Return:

| Type | Description |
| --- | --- |
| list of instances of Sentence or Subsentence | List of instances of objects with the specified polarity. |
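
Because polarity accepts either a single string or a list of strings, a filter of this kind typically normalises the argument first. The sketch below shows that idea with a hypothetical `ScoredSentence` class standing in for Sentence or Subsentence objects; it does not reproduce the library's internals.

```python
from dataclasses import dataclass


@dataclass
class ScoredSentence:
    """Hypothetical stand-in for a Sentence or Subsentence with a polarity label."""
    text: str
    polarity: str


def sketch_filter_polarity(items, polarity):
    """Keep items whose polarity matches; accepts one label or a list of labels."""
    wanted = [polarity] if isinstance(polarity, str) else list(polarity)
    return [item for item in items if item.polarity in wanted]


items = [
    ScoredSentence("The staff was great.", "positive"),
    ScoredSentence("The room was small.", "negative"),
    ScoredSentence("We arrived on Monday.", "neutral"),
]

print([s.text for s in sketch_filter_polarity(items, "positive")])
# ['The staff was great.']
print(len(sketch_filter_polarity(items, ["positive", "negative"])))
# 2
```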

filter_emotion

filter_emotion(emotions, granularity='sentence')

Filters Sentence or Subsentence of the specified emotions.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| emotions | string or list of string | Emotions to filter, one or more of 'joy', 'love', 'surprise', 'anger', 'sadness', 'fear' or 'neutral'. | False |
| granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. | True |

Return:

| Type | Description |
| --- | --- |
| list of instances of Sentence or Subsentence | List of instances of objects with the specified emotion. |

filter_type

filter_type(sentence_type)

Filters Sentence of the specified types.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| sentence_type | string or list of string | Types to filter, one or more of 'assert', 'command', 'question_open', 'question_closed'. | False |

Return:

| Type | Description |
| --- | --- |
| list of instances of Sentence | List of instances of Sentence with the specified type. |

match_pattern

match_pattern(patterns_json, level = None, print_tree=False, skip_errors=False)

Matches the given patterns (either Token Patterns or Dependency Patterns) against the current TextChunk object.

The 'level' argument specifies the level at which matching is done: document (returns matches per document), sentence or subsentence. By default, matching is done one level below the calling object in the hierarchy: document for the NLP class, sentence for the Document class and subsentence for the Sentence class.

For more information on patterns, see the dedicated section: Patterns.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| patterns_json | dictionary | Token Pattern or Dependency Pattern. | False |
| level | string | Level on which matching is done, one of 'document', 'sentence', 'subsentence'. | True |
| print_tree | bool | Whether to print the dependency tree for dependency patterns. | True |
| skip_errors | bool | Whether to skip errors and continue matching. | True |

Return:

| Type | Description |
| --- | --- |
| list of tuple | List of tuples (TextChunk object, match dictionary). |
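
The default-level rule for match_pattern can be summarised as a small lookup. This sketch only encodes the defaults stated in this section; the `DEFAULT_LEVEL` table and `resolve_level` helper are hypothetical names, and the default for Subsentence is left out because it is not documented here.

```python
# Default matching level per calling class, as described in this section.
DEFAULT_LEVEL = {
    "NLP": "document",
    "Document": "sentence",
    "Sentence": "subsentence",
}

VALID_LEVELS = {"document", "sentence", "subsentence"}


def resolve_level(class_name, level=None):
    """Return the explicit level if given, otherwise the default for the class."""
    if level is None:
        return DEFAULT_LEVEL[class_name]
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    return level


print(resolve_level("Document"))             # sentence
print(resolve_level("NLP", level="sentence"))  # sentence
```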