SQL Server 2017 Graph Databases ( Overview, Uses, Example )

Overview
Graph database is introduced in SQL Server 2017 version and it is a collection of Nodes and Edges. Both the relational and graph databases are data management system software. In Graph database building blocks are Nodes and Edges whereas in relational database building blocks are tables.

The term “graph” comes from mathematics. There it’s used to describe a collection of nodes (or vertices), each containing information (properties), and with labeled relationships (or edges) between the nodes.

A graph is composed of two elements: a NODE (vertices) and an EDGE (relationship). Each node represents entities such as a person or location, and the nodes are connected to one another with edges; these provide details on the relationship between two nodes with their own set of attributes and properties. For example, a relationship might exist between a location such as Hyderabad and a person named Mahesh, who lives in Hyderabad. Mahesh and Hyderabad are the entities, and ‘lives in’ is the relationship between the two.
Graph database has nodes and edges, So in SQL server two new table types NODE and EDGE. And a new function called MATCH(). Node or edge tables can be created under any schema in a database, but they all belong to one logical graph.
Here are some features that make a graph database unique:

Edges or relationships are first class entities in a Graph Database and can have attributes or properties associated with them.
A single edge can flexibly connect multiple nodes in a Graph Database.
You can express pattern matching and multi-hop navigation queries easily.
You can express transitive closure and polymorphic queries easily.

Why graph database?

Some data models do not fit the traditional relationships. These scenarios include hierarchical data, complex many-to-many relationships, or complex interconnected data and relationships. Even it is always possible to store and query data for these types of scenarios in SQL Server, but the number of tables and relationships among these objects will be more and querying this tables will be more complex. So, in SQL Server 2017 introduces a new feature that makes it far easier to navigate these types of data relationships, called graph database.
Relational databases do well when there are only one or two levels of relationships; graph databases handle far greater depths.
When to consider graph database?

When you have lots of relationships in your data
When you want to do real-time analytical queries
When your current system is not working out because of slow joins in RDBMS systems.

Node Table

A node table represents an entity in a graph schema. Whenever we create a node table, along with the user defined columns, an implicit $node_id column is created, which uniquely identifies a given node in the database. The values in the $node_id column are automatically generated and are a combination of object_id of that node table and an internally generated bigint value. However, when the $node_id column is selected, a computed value in the form of a JSON string is displayed. Also, $node_id is a pseudo column, that maps to an internal name with hex string in it. When we select $node_id from the table, the column name will appear as $node_id_\hex_string.

It is recommended that users create a unique constraint or index on the $node_id column at the time of creation of node table, but if one is not created, a default unique, non-clustered index is automatically created.

Syntax for creating the Node table:

CREATE TABLE
    { database_name.schema_name.table_name. | schema_name.table_name | table_name }
    ( {  } [ ,...n ] ) AS [NODE]
[ ; ]

Edge Table

An edge table represents a relationship in a graph. Edges are always directed and connect two nodes. An edge table enables users to model many-to-many relationships in the graph. An edge table may or may not have any user-defined attributes in it.

Syntax for creating the Edge table:

CREATE TABLE
    { database_name.schema_name.table_name. | schema_name.table_name | table_name }
    ( {  } [ ,...n ] ) AS [EDGE]
[ ; ]

Every time an edge table is created, along with the user-defined attributes, three implicit columns are created in the edge table:

Column Name	Description
$edge_id	Uniquely identifies a given edge in the database. It is a generated column and the value is a combination of object_id of the edge table and a internally generated bigint value. However, when the $edge_id column is selected, a computed value in the form of a JSON string is displayed. $edge_id is a pseudo-column, that maps to an internal name with hex string in it. When you select $edge_id from the table, the column name will appear as $edge_id_\
$from_id	Stores the $node_id of the node, from where the edge originates.
$to_id	Stores the $node_id of the node, at which the edge terminates.

Similar to the $node_id column, it is recommended that users create a unique index or constraint on the $edge_id column at the time of creation of the edge table, but if one is not created, a default unique, nonclustered index is automatically created on this column.It is also recommended, for OLTP scenarios, that users create an index on ($from_id, $to_id) columns, for faster lookups in the direction of the edge.

Metadata for Graph Databases

sys.tables : If is_node is set to 1, that indicates that the table is a node table and if is_edge is set to 1, that indicates that the table is an edge table.
sys.columns : The sys.columns view contains additional columns graph_type and graph_type_desc, that indicate the type of the column in node and edge tables.

Below are the valid values for graph_type:

Value	Description
1	GRAPH_ID
2	GRAPH_ID_COMPUTED
3	GRAPH_FROM_ID
4	GRAPH_FROM_OBJ_ID
5	GRAPH_FROM_ID_COMPUTED
6	GRAPH_TO_ID
7	GRAPH_TO_OBJ_ID
8	GRAPH_TO_ID_COMPUTED

System Functions

Built-in	Description
OBJECT_ID_FROM_NODE_ID	Extract the object_id from a node_id.We need to pass node_id to this function and it will return the object_id. Eg: select OBJECT_ID_FROM_NODE_ID('{"type":"node","schema":"dbo","table":"Person","id":0}')
GRAPH_ID_FROM_NODE_ID	Extract the graph_id from a node_id. We need to pass node_id to this function and it will return the object_id. Eg: select GRAPH_ID_FROM_NODE_ID('{"type":"node","schema":"dbo","table":"Person","id":1}')
NODE_ID_FROM_PARTS	Construct a node_id from an object_id and a graph_id. Eg: select NODE_ID_FROM_PARTS(1221579390,2)
OBJECT_ID_FROM_EDGE_ID	Extract object_id from edge_id. Eg: select OBJECT_ID_FROM_EDGE_ID('{"type":"edge","schema":"dbo","table":"friendOf","id":1}')
GRAPH_ID_FROM_EDGE_ID	Extract identity from edge_id. Eg: select GRAPH_ID_FROM_EDGE_ID('{"type":"edge","schema":"dbo","table":"friendOf","id":1}')
EDGE_ID_FROM_PARTS	Construct edge_id from object_id and identity. Eg: select EDGE_ID_FROM_PARTS(1525580473,1)

MATCH clause
MATCH clause is introduced to support pattern matching and multi-hop navigation through the graph. The MATCH function uses ASCII-art style syntax for pattern matching.For example to find friends of mahesh

SELECT E2.ENAME FROM employee E1,know,employee E2 WHERE MATCH(E1-(know)->E2) AND E1.ENAME='MAHESH'

Demonstration
Sample Data Model

In our data model, the rectangle boxes are referred to as nodes. The lines connecting them are relationships or Edges. So basically Nodes are the entity which are Employee, Company, Project and Location. These entities are connected to each other with a relationship that is called Edges which are LivesIn, Project_Works, Company_Works, LocatedIn, Knows, Owns. So the basic flows goes something like Node > Relationship > Node. Here Employee has 4 relations and Company has 2 relations.

Know: Employee know employees
Company_Works: Employee works in the company
Project_Works : Employee works in the project
LivesIn: Employee lives in location
Owns: Company owns the project
LocatedIn: Company located in the location

Schema

CREATE TABLE company(ID INTEGER PRIMARY KEY,
CNAME VARCHAR(45)) AS NODE
CREATE TABLE [location](ID INTEGER PRIMARY KEY,
LNAME VARCHAR(45),
STATENAME VARCHAR(45)) AS NODE
CREATE TABLE employee(ID INTEGER PRIMARY KEY,
ENAME VARCHAR(45)) AS NODE
CREATE TABLE project(ID INT PRIMARY KEY,
PNAME VARCHAR(45)) AS NODE

CREATE TABLE know AS EDGE 
CREATE TABLE livesin AS EDGE 
CREATE TABLE project_works AS EDGE 
CREATE TABLE locatedin AS EDGE 
CREATE TABLE company_works AS EDGE 
CREATE TABLE owns AS EDGE 

INSERT INTO company(ID,CNAME) VALUES (1,'MICROSOFT'),(2,'IBM'),(3,'GOOGLE')

INSERT INTO project(ID,PNAME) VALUES (1,'CMS'),(2,'TMS'),(3,'PMS'),(4,'LMS'),(5,'CRMS')

INSERT INTO [location](ID,LNAME,STATENAME) VALUES (1,'HYDERABAD','TG'),(2,'CHENNAI','TN'),(3,'BANGALORE','KA')

INSERT INTO employee(ID,ENAME) VALUES (1,'MAHESH'),(2,'SURESH'),(3,'RAKESH'),(4,'PRAGESH'),(5,'RAMESH'),(6,'VENKATESH'),(7,'SATHEESH'),(8,'SAMPORNESH')

-- know/friendof
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=1),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=2))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=1),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=3))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=3),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=2))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=4),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=5))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=6),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=7))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=7),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=8))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=8),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=6))
INSERT INTO know VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=1),(SELECT $NODE_ID FROM EMPLOYEE WHERE ID=8))


-- project_works
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=1),(SELECT $NODE_ID FROM PROJECT WHERE ID=1))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=2),(SELECT $NODE_ID FROM PROJECT WHERE ID=2))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=3),(SELECT $NODE_ID FROM PROJECT WHERE ID=1))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=4),(SELECT $NODE_ID FROM PROJECT WHERE ID=5))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=5),(SELECT $NODE_ID FROM PROJECT WHERE ID=5))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=6),(SELECT $NODE_ID FROM PROJECT WHERE ID=3))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=7),(SELECT $NODE_ID FROM PROJECT WHERE ID=4))
INSERT INTO project_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=8),(SELECT $NODE_ID FROM PROJECT WHERE ID=4))

-- company_works
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=1),(SELECT $NODE_ID FROM COMPANY WHERE ID=1))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=2),(SELECT $NODE_ID FROM COMPANY WHERE ID=1))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=3),(SELECT $NODE_ID FROM COMPANY WHERE ID=1))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=4),(SELECT $NODE_ID FROM COMPANY WHERE ID=2))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=5),(SELECT $NODE_ID FROM COMPANY WHERE ID=2))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=6),(SELECT $NODE_ID FROM COMPANY WHERE ID=3))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=7),(SELECT $NODE_ID FROM COMPANY WHERE ID=3))
INSERT INTO company_works VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=8),(SELECT $NODE_ID FROM COMPANY WHERE ID=3))

-- locatedin
INSERT INTO locatedin VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=1),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO locatedin VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=1),(SELECT $NODE_ID FROM LOCATION WHERE ID=2))
INSERT INTO locatedin VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=2),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO locatedin VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=3),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))

-- livesin
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=1),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=2),(SELECT $NODE_ID FROM LOCATION WHERE ID=2))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=3),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=4),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=5),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=6),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=7),(SELECT $NODE_ID FROM LOCATION WHERE ID=1))
INSERT INTO livesin VALUES ((SELECT $NODE_ID FROM EMPLOYEE WHERE ID=8),(SELECT $NODE_ID FROM LOCATION WHERE ID=3))

-- owns
INSERT INTO owns VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=1),(SELECT $NODE_ID FROM PROJECT WHERE ID=1))
INSERT INTO owns VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=1),(SELECT $NODE_ID FROM PROJECT WHERE ID=2))
INSERT INTO owns VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=2),(SELECT $NODE_ID FROM PROJECT WHERE ID=5))
INSERT INTO owns VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=3),(SELECT $NODE_ID FROM PROJECT WHERE ID=3))
INSERT INTO owns VALUES ((SELECT $NODE_ID FROM COMPANY WHERE ID=3),(SELECT $NODE_ID FROM PROJECT WHERE ID=4))

Sample Queries

Query to list employee names who work on project 'CMS'

SELECT ENAME FROM employee,project_works,project WHERE MATCH(employee-(project_works)->project) AND PNAME='CMS'

Query to list company name,project name, employee name who work on project 'CMS' in company 'MICROSOFT'

SELECT CNAME,PNAME,ENAME FROM company,owns,project,project_works,employee WHERE MATCH(company-(owns)->project AND employee-(project_works)->project) AND PNAME='CMS' AND CNAME='MICROSOFT'

Query to list companies that MAHESH friends works

SELECT CNAME,E2.ENAME FROM company,company_works,employee E1,know,employee E2 WHERE MATCH(E1-(know)->E2-(company_works)->company) AND E1.ENAME='MAHESH'

List employees who work in the same city they live in

SELECT ENAME,LNAME FROM employee,livesin,locatedin,company_works,company,[location] WHERE MATCH(employee-(company_works)->company-(locatedin)->[location] AND employee-(livesin)->[location])

Let me know if you have any questions, suggestions or comments.

References:

https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-sample

https://medium.com/@dmccreary/a-taxonomy-of-graph-use-cases-2ba34618cf78

https://www.youtube.com/watch?v=_KEmK_4ZBYQ

DevGateways

2019/08/15

SQL Server 2017 Graph Databases ( Overview, Uses, Example )

0 comments:

Post a Comment

Popular Posts

Categories

Blog Archive

Recent Posts

Label

Contact