The goal of this project is to construct a knowledge graph that shows how various concepts are related to each other. These relationships help trace a natural path from a basic concept to a more advanced one, and they also provide an outline of the concepts that must be covered to fully understand a topic. The graph also serves as a tool for identifying the topics of a text, such as an article or a lecture transcript.
The program analyzed about 20,000 pages containing the term "astronomy" to build a knowledge graph that identifies not only the important terms but also how these terms are connected to each other. A portion of the graph is shown below. Each node represents a term relevant to astronomy, and the width of the line joining two nodes corresponds to the strength of the connection between them. Nodes of the same order have the same color: nodes connected directly to astronomy are of order 1, nodes connected to it through one other node are of order 2, and so on. The text overlay can be toggled with the keyboard shortcut "t".
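The node order described above amounts to the number of hops between a term and the astronomy node, which a breadth-first search computes directly. The sketch below is illustrative only: it assumes the graph is stored as a map from node pairs to connection strengths, and the terms, strengths, and the helper names `build_adjacency` and `node_orders` are hypothetical rather than taken from the program.

```python
from collections import deque

# Hypothetical undirected graph: each node pair maps to a connection strength in [0, 1].
EDGES = {
    ("astronomy", "star"): 0.9,
    ("astronomy", "planet"): 0.8,
    ("star", "supernova"): 0.6,
    ("planet", "orbit"): 0.7,
    ("supernova", "neutron star"): 0.5,
}


def build_adjacency(edges):
    """Turn the edge-strength map into {node: {neighbor: strength}} in both directions."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def node_orders(adj, root="astronomy"):
    """Breadth-first search from the root; a node's order is its hop count from the root."""
    order = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, {}):
            if neighbor not in order:  # first visit in BFS gives the shortest hop count
                order[neighbor] = order[node] + 1
                queue.append(neighbor)
    return order


orders = node_orders(build_adjacency(EDGES))
print(orders)
```

Nodes sharing a value in `orders` would share a color in the rendered graph.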
The left slider below controls the minimum strength of the connections. Strengths range from 0 to 1; setting the left slider to 0.4, for example, drops all connections with a strength below 0.4. The right slider controls the maximum order of the nodes shown. Setting it to 1, for example, shows only the nodes connected directly to astronomy.
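The two sliders can be thought of as a filter over the edges. This is a minimal sketch under stated assumptions: node orders are taken as precomputed hop counts from astronomy on the full graph, an edge is kept when its strength is at least the left-slider value and both of its endpoints are within the right-slider order, and the data and the function name `visible_edges` are hypothetical.

```python
# Hypothetical data: edge strengths in [0, 1] and precomputed node orders (hops from "astronomy").
EDGES = {
    ("astronomy", "star"): 0.9,
    ("astronomy", "planet"): 0.3,
    ("star", "supernova"): 0.6,
    ("planet", "orbit"): 0.7,
}
ORDERS = {"astronomy": 0, "star": 1, "planet": 1, "supernova": 2, "orbit": 2}


def visible_edges(edges, orders, min_strength, max_order):
    """Keep edges with strength >= min_strength whose endpoints both have order <= max_order."""
    return {
        (a, b): w
        for (a, b), w in edges.items()
        if w >= min_strength and orders[a] <= max_order and orders[b] <= max_order
    }


shown = visible_edges(EDGES, ORDERS, min_strength=0.4, max_order=1)
print(shown)
```

With these settings only the astronomy-star edge survives: astronomy-planet falls below 0.4, and the remaining edges reach order-2 nodes.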
One of the best uses of the knowledge graph above is identifying the topics in a given text. To test the program, paste any text (such as an article, notes, or part of a Wikipedia page) into the text area below. Formatting is not important; however, the text should pertain to a subtopic of astronomy.
The program looks for keyphrases in the text that are part of the knowledge graph and constructs a subgraph from them. It then identifies the node with the most connections as the topic. After you click Learn, the subgraph is displayed along with the identified topic and a short summary.
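A minimal sketch of this topic-identification step, assuming keyphrases are found by case-insensitive whole-word matching against the graph's terms and that "the most connections" means the highest degree within the subgraph. The edge list, the sample sentence, and the names `find_keyphrases` and `identify_topic` are hypothetical; the program's actual keyphrase extraction is not described here.

```python
import re

# Hypothetical knowledge-graph edges; strengths are omitted because only connectivity matters here.
EDGES = [
    ("star", "supernova"),
    ("star", "neutron star"),
    ("supernova", "neutron star"),
    ("star", "galaxy"),
    ("planet", "orbit"),
]


def find_keyphrases(text, terms):
    """Return the graph terms that occur in the text as whole words (case-insensitive)."""
    lowered = text.lower()
    return {t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def identify_topic(text, edges):
    """Build the subgraph induced by the matched terms and return its highest-degree node."""
    terms = {node for edge in edges for node in edge}
    found = find_keyphrases(text, terms)
    degree = {t: 0 for t in found}
    for a, b in edges:
        if a in found and b in found:  # keep only edges with both endpoints in the subgraph
            degree[a] += 1
            degree[b] += 1
    if not degree:
        return None  # no known keyphrases in the text
    return max(degree, key=degree.get)


text = "A supernova in a distant galaxy can leave behind a neutron star, the collapsed core of a massive star."
topic = identify_topic(text, EDGES)
print(topic)
```

Whole-word matching also counts "star" inside "neutron star"; a more careful extractor would probably prefer the longest matching phrase.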
The strongest connections between a target concept and astronomy can be found by entering a term in the search box. A list of allowed topics is suggested after you type a few characters; press Enter to search. The program analyzes all paths with up to 6 intermediate nodes to find the best three. A short description of each intermediate concept can be accessed by clicking on its title. The program works reasonably well for common subtopics of astronomy, but it currently favors longer connections, so more work is needed to optimize the search algorithm.
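The path search can be sketched as a depth-first enumeration of simple paths that stops once a path has more than 6 intermediate nodes, keeping the three best. How the program scores a path is not stated, so this sketch assumes the score is the product of the edge strengths along it. Because every strength is at most 1, that score can only shrink as a path grows, so it leans toward shorter paths; this differs from the bias described above, which suggests the program scores paths differently. The graph data and the names `build_adjacency` and `best_paths` are hypothetical.

```python
import heapq

# Hypothetical undirected graph with connection strengths in [0, 1].
EDGES = {
    ("astronomy", "star"): 0.9,
    ("astronomy", "physics"): 0.8,
    ("star", "supernova"): 0.7,
    ("physics", "nuclear fusion"): 0.6,
    ("nuclear fusion", "supernova"): 0.9,
    ("star", "nuclear fusion"): 0.5,
}


def build_adjacency(edges):
    """Turn the edge-strength map into {node: {neighbor: strength}} in both directions."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def best_paths(adj, target, source="astronomy", max_intermediate=6, k=3):
    """Return the k highest-scoring simple paths from source to target.

    Assumed score: product of edge strengths. Paths may contain at most
    max_intermediate nodes strictly between source and target.
    """
    results = []  # list of (score, path)

    def dfs(node, path, score):
        if node == target:
            results.append((score, list(path)))
            return
        # Every node after the source is intermediate here, since node is not the target.
        if len(path) - 1 > max_intermediate:
            return
        for neighbor, w in adj.get(node, {}).items():
            if neighbor not in path:  # simple paths only: never revisit a node
                path.append(neighbor)
                dfs(neighbor, path, score * w)
                path.pop()

    dfs(source, [source], 1.0)
    return heapq.nlargest(k, results, key=lambda r: r[0])


adj = build_adjacency(EDGES)
for score, path in best_paths(adj, "supernova"):
    print(round(score, 3), " -> ".join(path))
```

The enumeration grows quickly with the hop limit on a dense graph, which is one reason the search algorithm may need optimizing.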