
Pipeline

API Comprehension

Get the Lettria API Comprehension result for the given input text. The Natural Language Structuration (NLS) API is built on top of the Natural Language Comprehension (NLC) API.

cf. Lettria API Comprehension

Dep Tree

Each sentence is represented as a tree called a "Dep Tree". This tree holds the prediction of the dependency parser model: parent>child relations between Lettria tokens, each carrying a label called the dependency.

Representation example

Input : "Yesterday, John has eaten orange fruits"

has eaten  (root)
|____yesterday  (obl:mod)
|____|____,  (punct)
|____John  (nsubj)
|____fruits  (obj)
|____|____orange (amod)

In this representation, we can see the parent>child relations along with the dependency labels. Here the token "has eaten" is the root token of the dep tree.
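As an illustration only, the representation above can be reproduced with a small recursive structure. The `Node` class and the `render` function are hypothetical names chosen for this sketch; they are not part of the Lettria API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical dep tree node: token text, dependency label, children."""
    text: str
    dep: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one line per token, prefixed with one '|____' per tree level."""
    lines = ["|____" * depth + f"{node.text}  ({node.dep})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The example sentence: "Yesterday, John has eaten orange fruits"
tree = Node("has eaten", "root", [
    Node("yesterday", "obl:mod", [Node(",", "punct")]),
    Node("John", "nsubj"),
    Node("fruits", "obj", [Node("orange", "amod")]),
])

print("\n".join(render(tree)))
```

Each level of indentation corresponds to one parent>child relation, so "orange" appears two levels deep because it depends on "fruits", which itself depends on the root.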

cf. Universal Dependency

Token

Gather all Lettria tokens for the document.

Information available for a token:

  • index pure : exact position in the input text
  • index : token index
  • sentence : sentence index
  • subsentence : sub-sentence index
  • position : combination of sentence index and token index
  • source pure : exact input from the text
  • source : normalized source pure
  • lemma : token lemma
  • lemmatizer : all lemmatizer information, depending on postagg
  • postagg : postagger prediction
  • dependency : parser dependency prediction
  • categosens : disambiguation prediction
  • parent : parent token (parser dependency prediction)
  • children : children tokens (parser dependency prediction)
  • brother : brother tokens (conj processing)
  • language : input text language
  • transform : token transformation information
  • plural : token is plural
  • gender : token gender (for nouns)
  • person : token's person (for pronouns)
  • intensity : token intensity (for adverbs)
  • temporality : token's temporality (for verbs)
  • transitivity : token's transitivity (for verbs)
  • negation : negation on token
  • class id : platform ontology class id
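To make the relation between a few of these fields concrete, here is a minimal sketch of a token container. It covers only a subset of the fields; the class name, the underscore spelling of the attributes, the tag values and the tuple encoding of `position` are all assumptions made for this example, not the actual Lettria Token object.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison: parent/children form a cycle
class Token:
    """Hypothetical container for a subset of the token fields."""
    index: int          # token index
    sentence: int       # sentence index
    source_pure: str    # exact input from the text
    source: str         # normalized source
    lemma: str
    postagg: str        # placeholder tag values, not Lettria's tag set
    dependency: str
    parent: Optional["Token"] = None
    children: List["Token"] = field(default_factory=list)

    @property
    def position(self) -> Tuple[int, int]:
        # Assumed encoding: a (sentence index, token index) pair.
        return (self.sentence, self.index)


def attach(parent: Token, child: Token) -> None:
    """Link two tokens so that parent and children stay consistent."""
    child.parent = parent
    parent.children.append(child)


# Illustrative values for "orange fruits" in the example sentence.
fruits = Token(5, 0, "fruits", "fruits", "fruit", "NOUN", "obj")
orange = Token(4, 0, "orange", "orange", "orange", "ADJ", "amod")
attach(fruits, orange)

print(orange.parent.lemma, fruits.position)
```

The `parent` and `children` fields mirror the same parser prediction seen from both ends of a relation, which is why the sketch updates them together.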

cf. Lettria Token

API Input

Get the API input metadata.

Speakers

Get speaker information for each sentence of the document.

Document Structure

Get the document's structure and create the document path. The document path is set on each piece of information in the KNWL graph.

Platform connection

Connect to the Lettria platform and retrieve the client's ontologies and patterns.

Patterns

Load the client's platform patterns and dispatch them into 4 types:

  • jump patterns
  • move patterns
  • regex patterns
  • graph patterns

Match text patterns (move and jump patterns) against the document's tokens for further processing.

Ontology

Load the client's ontology classes and properties for further processing. Match ontology class names with token categories to retrieve the class id for each token and card. Apply ontology property inheritance: a child class inherits the properties of its parent class.
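The inheritance rule can be sketched as follows. The ontology content and the `inherited_properties` function are invented for this example; the real ontology comes from the client's platform project.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology: class name -> (parent class or None, own properties).
ONTOLOGY: Dict[str, Tuple[Optional[str], List[str]]] = {
    "Animal": (None, ["name", "age"]),
    "Dog": ("Animal", ["breed"]),
    "Puppy": ("Dog", ["litter"]),
}


def inherited_properties(class_name: str) -> List[str]:
    """Collect properties from the root ancestor down to the class itself."""
    parent, own = ONTOLOGY[class_name]
    inherited = inherited_properties(parent) if parent is not None else []
    # Keep the order and skip properties that an ancestor already declares.
    return inherited + [p for p in own if p not in inherited]


print(inherited_properties("Puppy"))  # ['name', 'age', 'breed', 'litter']
```

A class deep in the hierarchy therefore exposes its own properties plus those of every ancestor.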

Coreference

Process the coreference model prediction while guarding against prediction errors. Create the document's cards and a default GHOST CARD. Apply a special treatment to the SPEAKER and INTERLOCUTOR cards, filtering by the token's person. For conversational documents, rebuild the speakers' clusters using the API Input metadata.

Pattern Text Motor

Process all text patterns :

  • keyword patterns (pattern jump) : pattern through document tokens, regardless of the relation between them.
  • text patterns (pattern move) : pattern through dep trees, analyzing tokens relations.
  • regex pattern : pattern through text characters

Clients define patterns either to capture an attribute (property patterns) or to trigger a classification label (classification patterns).
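The difference between the three kinds of text pattern can be illustrated with toy matchers. This is a simplified sketch, not the pattern syntax of the Lettria platform: the matching semantics, the function names and the `PARENT` table are assumptions.

```python
import re
from typing import Dict, List, Optional, Sequence

# Toy dep tree for "Yesterday, John has eaten orange fruits":
# token lemma -> parent lemma (None for the root). Purely illustrative.
PARENT: Dict[str, Optional[str]] = {
    "eat": None,
    "yesterday": "eat",
    "John": "eat",
    "fruit": "eat",
    "orange": "fruit",
}


def match_jump(keywords: Sequence[str], lemmas: Sequence[str]) -> bool:
    """Jump-style match: every keyword is present, relations are ignored."""
    return all(keyword in lemmas for keyword in keywords)


def match_move(path: Sequence[str], parent: Dict[str, Optional[str]]) -> bool:
    """Move-style match: each item is a direct child of the previous one."""
    return all(parent.get(child) == head for head, child in zip(path, path[1:]))


def match_regex(pattern: str, text: str) -> List[str]:
    """Regex-style match: works on raw characters, not on tokens."""
    return re.findall(pattern, text)


lemmas = list(PARENT)
print(match_jump(["eat", "orange"], lemmas))            # True
print(match_move(["eat", "orange"], PARENT))            # False
print(match_move(["eat", "fruit", "orange"], PARENT))   # True
print(match_regex(r"\d+\s?(?:kg|kilos|kilograms)", "I weigh 90 kilograms"))
```

The pair `["eat", "orange"]` matches as a jump pattern because both tokens occur in the document, but not as a move pattern because "orange" is attached to "fruit" rather than directly to the verb.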

cf. classification

Extraction

This part analyzes the structure of the sentences. It is the generic extraction of information from the NLC dep trees: it analyzes the relations between tokens and makes choices depending on postagg, dependency and categories.

A classic example is the [ subject -> verb -> object ] extraction: for each verb, we search for its subjects and objects through the dep tree and structure a piece of information depending on the categories of the verb and the object.
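A minimal sketch of this subject -> verb -> object search is shown below. It only looks at direct children of each verb and ignores the category-dependent choices described above; the `Node` class, the `pos` values and the function names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical dep tree node with a part-of-speech tag."""
    text: str
    pos: str   # placeholder tag values, not Lettria's postagg labels
    dep: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract_svo(root: Node) -> List[Tuple[str, str, str]]:
    """For each verb, pair its nsubj children with its obj children."""
    triples = []
    for node in walk(root):
        if node.pos != "VERB":
            continue
        subjects = [c.text for c in node.children if c.dep == "nsubj"]
        objects = [c.text for c in node.children if c.dep == "obj"]
        triples.extend((s, node.text, o) for s in subjects for o in objects)
    return triples


tree = Node("has eaten", "VERB", "root", [
    Node("yesterday", "ADV", "obl:mod", [Node(",", "PUNCT", "punct")]),
    Node("John", "PROPN", "nsubj"),
    Node("fruits", "NOUN", "obj", [Node("orange", "ADJ", "amod")]),
])

print(extract_svo(tree))  # [('John', 'has eaten', 'fruits')]
```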

We also apply transformation rules coming from the NLC API to reduce graph diversity. For example, action nouns such as "destruction" are transformed into a verb, "destroy". That way, both sentence structures (with the noun or with the verb) output the same KNWL graph in the end.

Structuration

This part gathers all the extracted information and starts to create the KNWL graph. It applies local understanding patterns. An understanding pattern is a way to transform detected information, regardless of the sentence structure.

For example, take different sentences that say the same thing:

  • "My weight is 90kg."
  • "I have a weight of 90 kilos."
  • "I have a weight that is at 90 kilos."
  • "I weigh 90 kilograms"

At the end of the extraction, here is what we get :

  • "My weight is 90kg." -> SPEAKER > possessing > weight > attribute mass > 90kg
  • "I have a weight of 90 kilos." -> SPEAKER > possessing > weight > attribute mass > 90kg
  • "I have a weight that is at 90 kilos." -> SPEAKER > possessing > weight > link manner > 90kg
  • "I weigh 90 kilograms" -> SPEAKER > event to weigh > link manner > 90kg

At the end of the structuration, here is what we have for all the sentences:

  • SPEAKER > attribute weight > 90kg
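The effect of understanding patterns on this example can be sketched as a set of rewrite rules applied to the extracted paths. The tuple representation and the rule table are hypothetical; they only illustrate how several extraction outputs collapse into one structure.

```python
from typing import List, Tuple

Path = Tuple[str, ...]

# Distinct extraction outputs taken from the example above.
EXTRACTED: List[Path] = [
    ("SPEAKER", "possessing", "weight", "attribute mass", "90kg"),
    ("SPEAKER", "possessing", "weight", "link manner", "90kg"),
    ("SPEAKER", "event to weigh", "link manner", "90kg"),
]

# Hypothetical rules: (segment between subject and value, replacement).
RULES: List[Tuple[Path, Path]] = [
    (("possessing", "weight", "attribute mass"), ("attribute weight",)),
    (("possessing", "weight", "link manner"), ("attribute weight",)),
    (("event to weigh", "link manner"), ("attribute weight",)),
]


def structure(path: Path) -> Path:
    """Rewrite the middle of the path when a rule matches it exactly."""
    subject, middle, value = path[0], path[1:-1], path[-1]
    for pattern, replacement in RULES:
        if middle == pattern:
            return (subject, *replacement, value)
    return path  # no rule applies: keep the extraction as it is


for path in EXTRACTED:
    print(" > ".join(structure(path)))
```

All three inputs print `SPEAKER > attribute weight > 90kg`, which is the single structure expected after this step.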

Understanding

The understanding part lets us reduce graph diversity by matching patterns against the structured information.

By checking Lettria categosens, we transform abstract entities into attributes.

We then apply the last NLC transformations. Even when we know which transformation will happen (for example, the token "destruction", and therefore its associated card, will be transformed into "destroy", an information of type event in that case), we still need to analyze the sentence structure as it is. The sentence was said or written that way, so the other words in it are related in a specific way. The following sentences all have different structures:

  • "The destruction of the car by the police will be at 6PM."
  • "The car will be destroyed at 6PM by the police."
  • "The police will destroy the car at 6PM."

So we need to process the sentence normally, and at the end, use a graph manager to transform parts of the graph and reconnect everything nicely.

Pattern Graph Motor

We take the output KNWL graph and process all graph patterns loaded from the Lettria platform. The result is set in the NLS output, under the "variable" key.

NEL : Natural Linked Entities

NEL is an option that uses SPARQL queries to retrieve Wikidata Q-IDs for each card of the document.
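For readers unfamiliar with this kind of lookup, the sketch below builds a simple label-based SPARQL query for the public Wikidata query service and extracts a Q-ID from an entity URI. It is not the query that NEL actually runs; the function names and the exact-label strategy are assumptions, and no network request is made.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(label: str, language: str = "en", limit: int = 5) -> str:
    """Build a query returning items whose label equals `label` exactly."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE { "
        f'?item rdfs:label "{escaped}"@{language} . '
        f"}} LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


def qid_from_uri(uri: str) -> str:
    """Wikidata entity URIs end with the Q-ID, e.g. '.../entity/Q42'."""
    return uri.rsplit("/", 1)[-1]


query = build_label_query("Douglas Adams")
print(query)
print(build_request_url(query))
print(qid_from_uri("http://www.wikidata.org/entity/Q42"))  # Q42
```

An exact label match can return several items or none, so a real linking step would also need a way to choose among candidates.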

Set Ontology properties

Finally, we set the Lettria platform ontology properties on the associated cards.

Export output

Export the KNWL graph of each processed document in a selected format: JSON, Turtle, ... You can export your KNWL graph into Neo4j or any RDF visualizer. You can also visualize your document's graph on your project's page on the Lettria platform, in the "explore data" section: https://app.lettria.com/nlp/project/view/your project id/explore-structuration
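To show what the two named formats look like side by side, here is a small sketch that serializes one triple to JSON and to Turtle. The namespace, the triple and the JSON layout are invented for the example; the real export schema is not described here.

```python
import json
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, literal value)

# Hypothetical prefix and namespace; the real KNWL vocabulary may differ.
PREFIX = "ex"
NAMESPACE = "http://example.org/knwl/"
TRIPLES: List[Triple] = [("SPEAKER", "weight", "90kg")]


def to_json(triples: List[Triple]) -> str:
    """Serialize triples as a JSON list of objects."""
    rows = [{"subject": s, "predicate": p, "object": o} for s, p, o in triples]
    return json.dumps(rows, indent=2)


def escape_literal(value: str) -> str:
    """Escape the characters that cannot appear raw in a Turtle string."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_turtle(triples: List[Triple]) -> str:
    """Serialize triples as Turtle; names must be valid local names."""
    lines = [f"@prefix {PREFIX}: <{NAMESPACE}> .", ""]
    for s, p, o in triples:
        lines.append(f'{PREFIX}:{s} {PREFIX}:{p} "{escape_literal(o)}" .')
    return "\n".join(lines)


print(to_json(TRIPLES))
print(to_turtle(TRIPLES))
```

The Turtle form is the one an RDF visualizer would load, while the JSON form is easier to consume from application code.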

Next Steps