
Knowledge Extraction module for the Coypu project



This repository is a simple implementation of relation extraction and entity linking for Twitter text. The workflow is as follows: tweet -> extracted triples -> entities linked to Wikidata through the Wikidata API.
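The workflow can be pictured as two pluggable stages: one that turns a tweet into triples and one that links each entity name to Wikidata. The sketch below is not the module's own code. `run_pipeline`, the stub stages and the `subject_link`/`object_link` keys are hypothetical, chosen only to show how extracted triples flow into entity linking.

```python
from typing import Callable, Dict, List

# A triple as produced by an OpenIE-style extractor (assumed shape).
Triple = Dict[str, str]


def run_pipeline(
    tweet: str,
    extract_triples: Callable[[str], List[Triple]],
    link_entity: Callable[[str], Dict[str, str]],
) -> List[Dict[str, object]]:
    """Tweet -> extracted triples -> Wikidata links for subject and object."""
    results = []
    for triple in extract_triples(tweet):
        record: Dict[str, object] = dict(triple)
        # Hypothetical field names: the real module's output keys may differ.
        record["subject_link"] = link_entity(triple["subject"])
        record["object_link"] = link_entity(triple["object"])
        results.append(record)
    return results


if __name__ == "__main__":
    # Stub stages so the sketch runs without OpenIE or network access.
    def fake_extract(text: str) -> List[Triple]:
        return [{"subject": "Fire", "relation": "breaks out in", "object": "Hawaii"}]

    def fake_link(name: str) -> Dict[str, str]:
        return {"entity": name}

    print(run_pipeline("Fire breaks out in Hawaii", fake_extract, fake_link))
```

Passing the stages in as functions keeps the sketch testable offline; in this module the two roles are played by the txt2graph and entLink classes.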

How to run an example?

First install the dependency:

    pip install stanford_openie

Then replace the example text with the tweet you want to work on, i.e. change

    extractor.AnnoText('Fire breaks out in Hawaii', save=True)

to

    extractor.AnnoText('Your own text', save=True)

The code returns the triples extracted from that text, together with their Wikidata links, in JSON form. An excerpt of the output for the example tweet:

    "subject": "Fire",
    "relation": "breaks out in",
    "object": "Hawaii",
            "pid": "P910",
            "property": "topic's main category",
            "eid": "Q4992738",
            "entity": "Category:Fires"
            "pid": "P6",
            "property": "head of government",
            "eid": "Q469689",
            "entity": "Neil Abercrombie"

Because save=True is passed, this result is also saved to a triple.json file in the same directory.
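A minimal sketch of what saving with save=True might involve, using only the standard json module. `save_results` and `load_results` are hypothetical helpers, not functions from this repository, and the real file layout may differ.

```python
import json
from pathlib import Path
from typing import Any, List


def save_results(results: List[Any], path: str = "triple.json") -> Path:
    """Write extraction results to a JSON file (assumed behavior of save=True)."""
    out = Path(path)
    # ensure_ascii=False keeps non-ASCII tweet text readable in the file.
    out.write_text(json.dumps(results, indent=4, ensure_ascii=False), encoding="utf-8")
    return out


def load_results(path: str = "triple.json") -> List[Any]:
    """Read previously saved results back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Loading the file back with `load_results("triple.json")` returns the same list of dictionaries, which is convenient for inspecting results from earlier runs.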


The pipeline of this module consists of the following parts:

  1. A txt2graph class, which extracts the triples from a given text (based on OpenIE).
  2. An entLink class, which links the extracted entities to Wikidata using the Wikidata entity search API.
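The entity-linking step can be sketched against Wikidata's public `wbsearchentities` action, which searches entities by label. This is an illustration under stated assumptions, not the entLink implementation: `build_search_url`, `best_match` and `link_entity` are hypothetical names, and simply taking the top-ranked hit is an assumption (entLink may rank or filter candidates differently). The `eid`/`entity` keys mirror the output shown above.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities query URL for an entity label."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def best_match(payload: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Pick the top-ranked hit from a wbsearchentities response (assumed policy)."""
    hits: List[Dict[str, Any]] = payload.get("search", [])
    if not hits:
        return None
    top = hits[0]
    return {"eid": top["id"], "entity": top.get("label", top["id"])}


def link_entity(term: str) -> Optional[Dict[str, str]]:
    """Query Wikidata over the network and return the top hit, if any."""
    # Wikimedia asks API clients to send a descriptive User-Agent.
    req = Request(build_search_url(term), headers={"User-Agent": "coypu-ke-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return best_match(json.load(resp))
```

Keeping URL building and response parsing separate from the network call means both can be checked offline; `link_entity("Hawaii")` performs the actual request.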