TextChunk
What is TextChunk?
TextChunk is the base class of NLP, Document, Sentence and Subsentence. It provides the methods listed below, all of which are available through these child classes.
Data analysis:
METHOD | DESCRIPTION |
---|---|
vocabulary() | Returns vocabulary from current data. |
word_count() | Returns word count from current data. |
word_frequency() | Returns word frequency of current data. |
list_entities() | Returns dictionaries of detected entities by type. |
get_emotion() | Returns emotion results at the specified hierarchical level. |
get_sentiment() | Returns sentiment results at the specified hierarchical level. |
word_sentiment() | Returns average sentiment for each word of the whole vocabulary. |
word_emotion() | Returns average emotion for each word of the whole vocabulary. |
meaning_sentiment() | Returns average sentiment for each meaning. |
meaning_emotion() | Returns average emotion for each meaning. |
filter_polarity() | Filters Sentence or Subsentence of the specified polarity. |
filter_emotion() | Filters Sentence or Subsentence of the specified emotions. |
filter_type() | Filters Sentence of the specified types. |
match_pattern() | Returns matches from given patterns. |
TextChunk methods
vocabulary
vocabulary(filter_pos = None, lemma=False)
Returns the vocabulary of the current data, with each word paired with its POS tag. A word that appears both as a verb and as a noun therefore yields two tuples, (word, 'V') and (word, 'N'). Results can be filtered by POS tag.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
filter_pos | list of string | Tags to use for filtering. | True |
lemma | bool | Whether to use lemma or plain words. | True |
Return:
Type | Description |
---|---|
list of tuple | List of unique tuples (token, POStag). |
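The de-duplication by (token, POS tag) described above can be illustrated with a minimal standalone sketch. It does not call the library: `toy_vocabulary` and the pre-tagged `tokens` list are hypothetical stand-ins for already-analyzed data, and the first-seen ordering is an assumption.

```python
def toy_vocabulary(tokens, filter_pos=None):
    """Return unique (token, POS tag) pairs in first-seen order.

    `tokens` is a hypothetical list of (token, POS tag) pairs.
    This is an illustration, not the library's implementation.
    """
    seen = set()
    vocab = []
    for token, pos in tokens:
        # Skip tokens whose tag is not in the requested filter.
        if filter_pos is not None and pos not in filter_pos:
            continue
        if (token, pos) not in seen:
            seen.add((token, pos))
            vocab.append((token, pos))
    return vocab


tokens = [("book", "N"), ("book", "V"), ("flight", "N"), ("book", "N")]
vocab_all = toy_vocabulary(tokens)
vocab_verbs = toy_vocabulary(tokens, filter_pos=["V"])
print(vocab_all)
print(vocab_verbs)
```

Here "book" is kept twice, once per POS tag, while the repeated noun occurrence is dropped.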
word_count
word_count(filter_pos = None, lemma=False)
Returns the number of occurrences of each word in the current data, keyed by word and POS tag. A word that appears both as a verb and as a noun is therefore counted under two keys, (word, 'V') and (word, 'N'). Results can be filtered by POS tag.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
filter_pos | list of string | Tags to use for filtering. | True |
lemma | bool | Whether to use lemma or plain words. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary of word counts { (token, POStag): occurrences }.
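As a rough illustration of the return shape, the following standalone sketch counts (token, POS tag) pairs with `collections.Counter`. The function `toy_word_count` and the `tokens` input are hypothetical and only mimic the documented behavior.

```python
from collections import Counter


def toy_word_count(tokens, filter_pos=None):
    """Count occurrences of each (token, POS tag) pair.

    `tokens` is a hypothetical list of (token, POS tag) pairs.
    This is an illustration, not the library's implementation.
    """
    kept = [
        (token, pos)
        for token, pos in tokens
        if filter_pos is None or pos in filter_pos
    ]
    return dict(Counter(kept))


tokens = [("book", "N"), ("book", "V"), ("flight", "N"), ("book", "N")]
counts = toy_word_count(tokens)
noun_counts = toy_word_count(tokens, filter_pos=["N"])
print(counts)
```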
word_frequency
word_frequency(filter_pos = None, lemma=False)
Returns word or lemma frequencies. Results can be filtered by POS tag.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
filter_pos | list of string | Tags to use for filtering. | True |
lemma | bool | Whether to use lemma or plain words. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary of word frequency |
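The description above does not say how the frequency is normalized. The sketch below assumes a relative frequency, that is, each count divided by the total number of tokens kept after filtering. That normalization, the (token, POS tag) keys, the function `toy_word_frequency` and the `tokens` input are all assumptions made for illustration.

```python
from collections import Counter


def toy_word_frequency(tokens, filter_pos=None):
    """Return relative frequencies of (token, POS tag) pairs.

    Assumption: frequency = count / total kept tokens. The library
    may normalize differently; this is only an illustration.
    """
    kept = [
        (token, pos)
        for token, pos in tokens
        if filter_pos is None or pos in filter_pos
    ]
    total = len(kept)
    if total == 0:
        return {}
    return {key: count / total for key, count in Counter(kept).items()}


tokens = [("book", "N"), ("book", "V"), ("flight", "N"), ("book", "N")]
freq = toy_word_frequency(tokens)
print(freq)
```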
list_entities
list_entities()
Returns dictionaries of detected entities by type.
Return:
Type | Description |
---|---|
list of dictionary | List of dictionaries of the detected entities, grouped by type. |
get_emotion
get_emotion(granularity = 'sentence')
Returns emotion results. The granularity parameter defines whether emotions are taken per sentence or per subsentence.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Level at which emotions are analyzed. One of 'sentence' or 'subsentence'. | True |
Return:
Type | Description |
---|---|
list of dict | List of dictionaries with emotions as keys and dict {'occurences','sum','average'} as values. |
get_sentiment
get_sentiment(granularity = 'sentence')
Returns sentiment results. The granularity parameter defines whether sentiment is taken per sentence or per subsentence.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Level at which sentiments are analyzed. One of 'sentence' or 'subsentence'. | True |
Return:
Type | Description |
---|---|
list of dict | List of dictionaries with polarity as keys and dict {'occurences','sum','average'} as values. |
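To make the {'occurences','sum','average'} structure concrete, here is a minimal sketch that builds one such dictionary from hand-written scores. The input format, a list of (polarity, score) pairs with one pair per sentence or subsentence, and the function `toy_polarity_stats` are assumptions; only the three key names come from the table above.

```python
def toy_polarity_stats(scored):
    """Aggregate (polarity, score) pairs into per-polarity statistics.

    The key spelling 'occurences' follows the documented return value.
    This is an illustration, not the library's implementation.
    """
    stats = {}
    for polarity, value in scored:
        entry = stats.setdefault(polarity, {"occurences": 0, "sum": 0.0})
        entry["occurences"] += 1
        entry["sum"] += value
    # The average is derived once all scores have been accumulated.
    for entry in stats.values():
        entry["average"] = entry["sum"] / entry["occurences"]
    return stats


scored = [("positive", 0.5), ("negative", -0.25), ("positive", 0.25)]
stats = toy_polarity_stats(scored)
print(stats)
```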
word_sentiment
word_sentiment(granularity = 'sentence', lemma = False, filter_pos = None, average=True)
Returns an average sentiment score for each word or lemma. For each sentence or subsentence (set by the granularity parameter), the sentiment score is added to every word it contains. The totals are then divided by the number of sentences or subsentences to get an average (or kept as a list of values if 'average' == False).
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |
lemma | bool | Whether to use lemma or plain words. | True |
filter_pos | list of string | POStags to use for filtering. | True |
average | bool | Whether to return average or list of values. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with words as keys and sentiment as value |
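The averaging described above can be sketched as follows. This is a standalone illustration, not the library's code: `toy_word_sentiment` and the `sentences` input are hypothetical, the (token, POS tag) keys are an assumption, and the divisor is assumed to be the number of sentences in which the word appears.

```python
from collections import defaultdict


def toy_word_sentiment(sentences, average=True):
    """Attach each sentence score to the words it contains.

    `sentences` is a hypothetical list of (tokens, score) pairs, where
    tokens is a list of (token, POS tag) pairs.
    Assumption: a word is counted once per sentence, and the average is
    taken over the sentences that contain it.
    """
    values = defaultdict(list)
    for tokens, score in sentences:
        for key in set(tokens):
            values[key].append(score)
    if not average:
        return dict(values)
    return {key: sum(scores) / len(scores) for key, scores in values.items()}


sentences = [
    ([("cure", "N"), ("works", "V")], 0.5),
    ([("cure", "N"), ("fails", "V")], -0.25),
]
averaged = toy_word_sentiment(sentences)
raw = toy_word_sentiment(sentences, average=False)
print(averaged)
```

With `average=False` the sketch returns the list of contributing scores for each word instead of their mean.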
word_emotion
word_emotion(granularity = 'sentence', lemma = False, filter_pos = None, average=True)
Returns the average score for each emotion for each word or lemma in the vocabulary. For each sentence or subsentence (granularity parameter), the emotion scores are added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average (or list of values if 'average' == False).
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. | True |
lemma | bool | Whether to use lemma or plain words. | True |
filter_pos | list of string | POStags to use for filtering. | True |
average | bool | Whether to return average or list of values. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with (words, POS tag) as keys and a dictionary with emotion scores as value. |
Example return (this sample holds a single float per key, which is the shape returned by word_sentiment):
{ ('patients', 'N'): -0.4917, ('male', 'N'): -0.4275, ('age', 'N'): -0.5167, ('cure', 'N'): 0.6421 }
meaning_sentiment
meaning_sentiment(granularity='sentence', filter_meaning=None, average=True)
Returns an average sentiment score for each meaning. For each sentence or subsentence (set by the granularity parameter), the sentiment score is added to each meaning present. The totals are then divided by the number of sentences or subsentences to get an average. This can be used with custom meanings to get the sentiment associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |
filter_meaning | list of string | Filters results by list of meanings | True |
average | bool | Whether to return average or list of values. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with meanings as keys and sentiment as value |
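Example return (this sample holds per-emotion scores keyed by (word, POS tag), which is the shape returned by word_emotion):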
{ ('patients', 'N'): {'surprise': 0.753, 'neutral': 0.445}, ('male', 'N'): {'neutral': 0.8}, ('surgery', 'N'): {'sadness': 0.79} }
meaning_emotion
meaning_emotion(granularity='sentence', filter_meaning=None, average=True)
Returns average emotion scores for each meaning. For each sentence or subsentence (set by the granularity parameter), the score for each emotion is added to each meaning present. The totals are then divided by the number of sentences or subsentences to get an average. This can be used with custom meanings to get the emotions associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. | True |
filter_meaning | list of string | Filters results by list of meanings | True |
average | bool | Whether to return average or list of values. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with meanings as keys and a dictionary of emotion scores as value |
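A minimal sketch of the per-meaning averaging, using the customer review example mentioned above. Everything here is hypothetical: `toy_meaning_emotion`, the `reviews` input and the scores are made up, and the sketch assumes that an emotion missing from a sentence contributes 0 to the average for that meaning.

```python
from collections import defaultdict


def toy_meaning_emotion(sentences, filter_meaning=None):
    """Average emotion scores per meaning.

    `sentences` is a hypothetical list of (meanings, emotions) pairs:
    a list of meaning labels and a dict {emotion: score}.
    This is an illustration, not the library's implementation.
    """
    sums = defaultdict(lambda: defaultdict(float))
    seen = defaultdict(int)  # number of sentences carrying each meaning
    for meanings, emotions in sentences:
        for meaning in set(meanings):
            if filter_meaning is not None and meaning not in filter_meaning:
                continue
            seen[meaning] += 1
            for emotion, score in emotions.items():
                sums[meaning][emotion] += score
    return {
        meaning: {
            emotion: total / seen[meaning]
            for emotion, total in per_emotion.items()
        }
        for meaning, per_emotion in sums.items()
    }


reviews = [
    (["pricing"], {"anger": 0.5}),
    (["pricing", "customer service"], {"joy": 0.75, "anger": 0.25}),
]
result = toy_meaning_emotion(reviews)
only_service = toy_meaning_emotion(reviews, filter_meaning=["customer service"])
print(result)
```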
filter_polarity
filter_polarity(polarity, granularity='sentence')
Filters Sentence or Subsentence of the specified polarity.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
polarity | string or list of string | Polarity to keep: one or more of 'neutral', 'positive', 'negative'. | False |
granularity | string | Whether to filter on sentiment by 'sentence' or 'subsentence'. | True |
Return:
Type | Description |
---|---|
list of instances of Sentence or Subsentence | List of instances of objects with the specified polarity. |
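Because polarity accepts either a single string or a list of strings, a filter of this kind usually normalizes its argument first. The sketch below shows that pattern on plain dictionaries; `toy_filter_polarity` and the `items` records are hypothetical stand-ins for Sentence or Subsentence instances.

```python
def toy_filter_polarity(items, polarity):
    """Keep the items whose polarity matches the requested value(s).

    `items` is a hypothetical list of dicts with a 'polarity' key.
    This is an illustration, not the library's implementation.
    """
    # Accept both 'positive' and ['positive', 'neutral'].
    wanted = {polarity} if isinstance(polarity, str) else set(polarity)
    return [item for item in items if item["polarity"] in wanted]


items = [
    {"text": "Great service.", "polarity": "positive"},
    {"text": "It arrived on Monday.", "polarity": "neutral"},
    {"text": "Too expensive.", "polarity": "negative"},
]
positives = toy_filter_polarity(items, "positive")
not_negative = toy_filter_polarity(items, ["positive", "neutral"])
print([item["text"] for item in not_negative])
```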
filter_emotion
filter_emotion(emotions, granularity='sentence')
Filters Sentence or Subsentence of the specified emotions.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
emotions | string or list of string | Emotions to filter, one or more of 'joy', 'love', 'surprise', 'anger', 'sadness', 'fear' or 'neutral'. | False |
granularity | string | Whether to filter on emotion by 'sentence' or 'subsentence'. | True |
Return:
Type | Description |
---|---|
list of instances of Sentence or Subsentence | List of instances of objects with the specified emotion. |
filter_type
filter_type(sentence_type)
Filters Sentence of the specified types.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
sentence_type | string or list of string | Types to filter, one or more of 'assert', 'command', 'question_open', 'question_closed'. | False |
Return:
Type | Description |
---|---|
list of instances of Sentence | List of instances of Sentence with the specified type. |
match_pattern
match_pattern(patterns_json, level = None, print_tree=False, skip_errors=False)
Matches the given patterns (either Token Patterns or Dependency Patterns) against the current TextChunk object.
The 'level' argument specifies the level at which matching is done: 'document' (returns matches per document), 'sentence' or 'subsentence'. The default is one level below the current object in the hierarchy: document for the NLP class, sentence for the Document class and subsentence for the Sentence class.
For more information on patterns look at the dedicated section: Patterns.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
patterns_json | dictionary | Token Pattern or Dependency Pattern | False |
level | string | Level on which matching is done, one of 'document', 'sentence', 'subsentence' | True |
print_tree | bool | Whether to print the dependency tree for dependency patterns. | True |
skip_errors | bool | Whether to skip errors and continue matching. | True |
Return:
Type | Description |
---|---|
list of tuple | List of tuple (TextChunk object, match dictionary) |
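The default 'level' rule described above (one level below the calling class) can be written down as a small lookup. This is only a sketch of that rule: `resolve_level`, `DEFAULT_LEVEL` and `VALID_LEVELS` are hypothetical names, and the behavior for Subsentence, for which no default is documented, is an assumption.

```python
# Documented defaults: one level below the calling class.
DEFAULT_LEVEL = {
    "NLP": "document",
    "Document": "sentence",
    "Sentence": "subsentence",
}
VALID_LEVELS = ("document", "sentence", "subsentence")


def resolve_level(class_name, level=None):
    """Return the matching level for a class name and optional override.

    This is an illustration, not the library's implementation.
    """
    if level is None:
        if class_name not in DEFAULT_LEVEL:
            # Assumption: no default is documented for other classes.
            raise ValueError(f"no documented default level for {class_name}")
        return DEFAULT_LEVEL[class_name]
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level: {level}")
    return level


print(resolve_level("NLP"))
print(resolve_level("Document"))
print(resolve_level("Document", level="subsentence"))
```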