Glossary of network analysis terms

Revision as of 13:25, 23 August 2017 by OwenWilliams (created page)

Originally compiled in conjunction with the Early Modern Digital Agendas: Network Analysis institute in July 2017, the glossary below aims to help those using network analysis as an approach to understand common terms. Additions and updates are welcome.

For more digital humanities tools for use at the Folger Shakespeare Library, see an extensive list in the article Digital resources at the Folger.

(words in italics are defined elsewhere in the glossary)

actor
See ''node''.
adjacency (also edge) list
A list of all of the edges in a network, formatted in two columns (A and B). Each row signifies that an edge exists between the node in column A and the node in column B. These edges can be undirected or directed. In a directed network, column A holds the source node and column B holds the target node.
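As a minimal sketch of how such a two-column list can be read, the Python below turns each row into neighbour sets. The names in `edges` and the helper `neighbours` are invented for illustration; they are not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical edge list: each pair is one row (column A, column B).
edges = [("Bacon", "Jonson"), ("Jonson", "Donne"), ("Bacon", "Donne")]

def neighbours(edge_rows, directed=False):
    """Map each node to the set of nodes it is connected to."""
    adjacency = defaultdict(set)
    for source, target in edge_rows:
        adjacency[source].add(target)
        if directed:
            adjacency[target]  # touch the key so the target exists as a node
        else:
            adjacency[target].add(source)  # undirected: record both directions
    return dict(adjacency)

undirected = neighbours(edges)
directed = neighbours(edges, directed=True)
```

Read as undirected, Donne is connected to both Bacon and Jonson; read as directed, Donne is only ever a target and so has no outgoing connections.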
API
In computer programming, an Application Programming Interface (API) is a set of subroutine definitions, protocols, and tools for building application software.
betweenness of a node or edge
The number of ''shortest paths'' in the network that pass through a node or edge. Also called betweenness ''centrality''. A node X has a high betweenness centrality if many of the shortest paths between other pairs of nodes, such as Y and Z, pass through X.
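The sketch below counts shortest paths through each node of a small undirected network. It assumes a common convention that the definition above does not spell out: when several shortest paths tie between a pair of nodes, each path counts as a fraction. The function names and the example network are illustrative only.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adjacency, start):
    """Return (distances, number of shortest paths) from start to reachable nodes."""
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                paths[nb] += paths[node]  # every shortest path to node extends to nb
    return dist, paths

def betweenness(adjacency):
    """Unnormalised node betweenness for an undirected network."""
    info = {n: bfs_counts(adjacency, n) for n in adjacency}
    score = {n: 0.0 for n in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        dist_t, paths_t = info[t]
        for v in adjacency:
            if v in (s, t) or v not in dist_s:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:  # v lies on a shortest s-t path
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score

# X sits between Y, Z and W, so every shortest path between them passes through X.
star = {"X": ["Y", "Z", "W"], "Y": ["X"], "Z": ["X"], "W": ["X"]}
scores = betweenness(star)
```

Here X scores 3.0 (one for each of the pairs Y-Z, Y-W and Z-W) and the other nodes score 0.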
bipartite (also bimodal) network
A network of two node types in which connections are only between nodes of different types. One can perform a ''projection'' on a bipartite network.
centrality of a node
A numerical measurement of the importance of a node. ''Degree'' is a simple example. Four common types of centrality are: 1) Degree Centrality: the number of connections; 2) Closeness Centrality: closeness to the entire network; 3) Betweenness Centrality: the degree to which a node provides a bridge between other nodes; 4) Eigenvector Centrality: connection to well-connected nodes.
Closeness Centrality
“Closeness Centrality measures the proximity of a selected node to all other nodes within the graph” (Cherven, ''Mastering Gephi Network Visualization'', 2015). It is calculated by taking the average distance from a node to each of the others, measured along ''shortest paths'', and then taking the reciprocal. The reciprocal is taken only so that the nodes closest to all the others receive a higher number.
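The calculation just described can be sketched in a few lines of Python. This version assumes an unweighted, connected network stored as a dictionary of neighbour lists; `distances_from` and `closeness` are names chosen here for illustration.

```python
from collections import deque

def distances_from(adjacency, start):
    """Breadth-first search: number of steps from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness(adjacency, node):
    """Reciprocal of the average distance from node to the nodes it can reach."""
    others = [d for n, d in distances_from(adjacency, node).items() if n != node]
    if not others:
        return 0.0  # an isolated node is close to nothing
    return len(others) / sum(others)

path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
middle = closeness(path, "B")  # average distance 1, so closeness 1.0
end = closeness(path, "A")     # average distance 1.5, so closeness 2/3
```

The middle node of the three-node path is one step from everyone, so it scores higher than either end.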
chunking
In network inference projects, scholars might “chunk” a text and ''infer'' an edge when two names co-occur within that text chunk. You can chunk at the sentence, paragraph, page or other level (e.g. 500 words). This is sometimes done in plays, novels, or encyclopedia entries to infer social interaction, for instance.
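A rough sketch of fixed-size chunking follows. It makes simplifying assumptions that a real project would need to revisit: names are single words, matching is exact after stripping basic punctuation, and chunks are non-overlapping windows of `chunk_size` words. The function name and sample text are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(text, names, chunk_size=500):
    """Count, for each pair of names, the chunks in which both appear."""
    # Strip common punctuation so that "Horatio." still matches "Horatio".
    words = [w.strip(".,;:!?") for w in text.split()]
    weights = Counter()
    for start in range(0, len(words), chunk_size):
        chunk = set(words[start:start + chunk_size])
        present = sorted(name for name in names if name in chunk)
        for pair in combinations(present, 2):
            weights[pair] += 1  # one inferred edge (or added weight) per chunk
    return weights

sample = ("Hamlet greets Horatio warmly. Ophelia enters and leaves. "
          "Hamlet follows Ophelia out.")
edges = cooccurrence_edges(sample, {"Hamlet", "Horatio", "Ophelia"}, chunk_size=4)
```

With four-word chunks, Hamlet is linked once to Horatio and once to Ophelia, while Horatio and Ophelia never share a chunk and so get no edge.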
component
A connected part of the network. Networks often consist of multiple disconnected components.
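One way to find the components of a small undirected network is to start from each unvisited node and collect everything reachable from it. The sketch below does this with a breadth-first search; the example network and the function name are illustrative assumptions.

```python
from collections import deque

def components(adjacency):
    """Return a list of node sets, one per connected component."""
    seen = set()
    result = []
    for start in adjacency:
        if start in seen:
            continue
        group = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in group:
                    group.add(nb)
                    queue.append(nb)
        seen |= group
        result.append(group)
    return result

network = {"A": ["B"], "B": ["A", "C"], "C": ["B"],
           "D": ["E"], "E": ["D"], "F": []}
parts = components(network)
```

This example network falls into three components: A-B-C, D-E, and the isolated node F.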
CSV files
Comma-separated values files allow data to be saved in a table-structured format. A CSV looks like a garden-variety spreadsheet but has a .csv extension (traditionally, it takes the form of a text file containing information separated by commas, hence the name).
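To illustrate, the sketch below reads a tiny CSV edge list with Python's standard `csv` module. The column headings `source` and `target` and the rows themselves are assumptions made for this example; the text is read from a string here, where a real project would open a file.

```python
import csv
import io

# A CSV file is plain text: one row per line, values separated by commas.
csv_text = "source,target\nBacon,Jonson\nJonson,Donne\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
edges = [(row["source"], row["target"]) for row in rows]
```

Each row of the file becomes one (source, target) pair, ready to be treated as an edge list.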
degree of a node
The number of edges connected to a node. The variants in-degree and out-degree count the incoming and outgoing edges of a node in a directed network. Degree is sometimes indicated by the size of the sphere representing the node. Also called degree centrality.
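As a small sketch, in-degree and out-degree can be counted directly from a directed edge list. The letters below are invented; each pair is (sender, recipient).

```python
from collections import Counter

# Hypothetical directed edges: (sender, recipient) of a letter.
letters = [("Anne", "Ben"), ("Anne", "Cat"), ("Ben", "Anne"), ("Dee", "Anne")]

out_degree = Counter(sender for sender, _ in letters)
in_degree = Counter(recipient for _, recipient in letters)

# Total degree of a node in a directed network: incoming plus outgoing edges.
people = set(out_degree) | set(in_degree)
degree = {p: in_degree[p] + out_degree[p] for p in people}
```

Anne sends two letters and receives two, so her out-degree and in-degree are both 2 and her total degree is 4; Cat only receives, so her out-degree is 0.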
diameter of a network
The length of the longest ''shortest path'' between any two nodes in the network.
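A minimal sketch of this measure, assuming an unweighted network stored as a dictionary of neighbour lists: find the shortest distance from every node to every node it can reach, then take the largest. In a network with several components this version ignores unreachable pairs, which is a simplification chosen here.

```python
from collections import deque

def distances_from(adjacency, start):
    """Breadth-first search: number of steps from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def diameter(adjacency):
    """Largest shortest-path length over all reachable pairs of nodes."""
    return max(max(distances_from(adjacency, n).values()) for n in adjacency)

chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
longest = diameter(chain)
```

In the four-node chain the two ends are three steps apart, so the diameter is 3.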
directed network
A network in which the edges are directional, e.g. when A sends a letter to B.
dyad
Two nodes, usually connected by an ''edge''.
edge
Connection, ‘link’, or ‘tie’ between nodes.
Eigenvector centrality
Eigenvector centrality measures the “influence of a particular node…by the connectedness of its closest neighbors. This can be thought of as the who you know type of centrality, wherein an individual node might not be thought of as important on its own, but its relationship to other highly connected nodes indicates a high level of influence” (Ken Cherven, ''Mastering Gephi Network Visualization'', 2015).
ego network
A network focused around one central node. A classic example is a correspondence network derived from the collected letters of a single individual. It is hard to analyse using quantitative measures, but Wasserman and Faust’s Social Network Analysis textbook has chapters on how to analyse ego networks.
graph
Another term for a network.
homophily
The tendency of nodes to become connected to other nodes that are similar under a certain definition of similarity.
inferred network
Network ''nodes'' and/or ''edges'' that are inferred to exist, for example based on the connectivity of other nodes and/or edges in the network, or from external properties of the nodes, such as their correlated activities over time.
k-partite network
A ''multi-partite'' network with k different node types in which nodes of the same type are not connected. A ''bipartite'' network is a k-partite network with k = 2.
link
See ''edge''.
Linked (Open) Data (LOD)
Data that is linked to external ''unique identifiers'' (URIs) that have been defined by institutions or authorities, e.g. the Library of Congress, ODNB, Wikidata, VIAF, Six Degrees of Francis Bacon.
logarithmic
Data is typically plotted on a linear scale. This means that values are proportionally spaced: ‘0’, ‘1’, and ‘2’ are equally far apart, and ‘10’ is ten times as far from ‘0’ as ‘1’ is. By contrast, on a logarithmic scale we space things equally if they are related by the same factor. So ‘1’, ‘10’, ‘100’, and ‘1000’ are equally far apart, because each is ten times the one before. We can have a linear scale on one axis and a logarithmic scale on the other, or logarithmic scales on both (often called a log-log plot).

See also: https://www.khanacademy.org/math/algebra2/exponential-and-logarithmic-functions/introduction-to-logarithms/a/intro-to-logarithms

log-log plot
A plot with logarithmic scales on both axes. A ''scale-free degree distribution'' appears as a straight line in a log-log plot.
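The straight line can be seen with one step of algebra. Writing the fraction of nodes with degree k as p(k), a power law has the form below; the symbols p, C and γ are notation introduced here for illustration, not taken from the glossary. Taking logarithms of both sides gives a linear relationship between log p(k) and log k, with slope −γ.

```latex
p(k) = C\,k^{-\gamma}
\quad\Longrightarrow\quad
\log p(k) = \log C - \gamma \log k
```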
matrix
A way of representing a network where there is a row and a column for each node, and the values in the cells indicate whether an edge exists between a pair of nodes.
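As a sketch, the matrix for a small undirected network can be built from an edge list as follows. The three-node network is invented for the example, and a 1 in a cell marks an edge while a 0 marks its absence.

```python
edges = [("A", "B"), ("B", "C")]

# Give every node a fixed row/column position.
nodes = sorted({name for edge in edges for name in edge})
index = {name: i for i, name in enumerate(nodes)}

matrix = [[0] * len(nodes) for _ in nodes]
for a, b in edges:
    matrix[index[a]][index[b]] = 1
    matrix[index[b]][index[a]] = 1  # undirected: mirror the entry

# Rows and columns are in the order A, B, C:
# [[0, 1, 0],
#  [1, 0, 1],
#  [0, 1, 0]]
```

For a directed network one would fill only the first of the two entries, so the matrix would no longer be symmetric.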
multi-partite network
A network with more than one node type.
node
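An entity in a network, such as a person, place, or text, that can be connected to other entities by ''edges''. Also called a “vertex” because it marks the intersection of lines; also sometimes called an actor.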
noise
Another word for random variation in the data, for example due to uncertainty or error. Noise can obscure trends or patterns one is looking for in the data.
ontology
A set of concepts and categories in a subject area or domain that shows their properties and the relations between them. An example of a relationship (i.e. ''edge'') ontology is the one developed by Six Degrees of Francis Bacon (note that these are ''directed'' edge ontologies): http://sixdegreesoffrancisbacon.com/relationship_types
projection of a bipartite network
Transformation of a ''bipartite'' network into a ''weighted'' network of just one of the two original node types in which the weight of the connection is the number of shared neighbours in the bipartite network. When you project a bipartite network, in other words, you transform one of the node types into an edge: instead of two people nodes being connected to a place, they are connected to each other, and the place becomes the edge connecting them.
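The people-and-places example can be sketched as follows. The data and the function name `project` are invented for illustration; each input pair is (person, place), and the output weight for two people is the number of places they share.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project(bipartite_edges):
    """Project (person, place) edges onto a weighted person-person network."""
    visitors = defaultdict(set)
    for person, place in bipartite_edges:
        visitors[place].add(person)
    weights = Counter()
    for people in visitors.values():
        # Every pair of people sharing this place gains one unit of weight.
        for pair in combinations(sorted(people), 2):
            weights[pair] += 1
    return weights

bipartite = [("Anne", "London"), ("Ben", "London"),
             ("Anne", "Oxford"), ("Ben", "Oxford"), ("Cat", "Oxford")]
projected = project(bipartite)
```

Anne and Ben share two places, so their edge has weight 2; Cat shares only Oxford with each of them, so those edges have weight 1.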
power law
See ''scale-free degree distribution''.
scale-free degree distribution
Intuitively one might expect the ''degree'' distribution in a network to follow a bell curve, which is more formally described as a normal (or Gaussian) distribution: a large rounded peak tapering away rapidly on each side. A simple probability distribution that resembles a bell curve is the total shown by a roll of two dice: the distribution is centered on the number 7, and the probability decreases as you move away from the center on either side. A power-law distribution, by contrast, has no peak; instead it decreases continuously and rapidly for increasing degrees. In fact, the data points in a power-law distribution are spread so broadly, across several orders of magnitude, that it is normally plotted on ''logarithmic'' axes. On these axes a power-law distribution appears as a straight diagonal line, which means that the shape of the distribution is the same for high and low degrees, resulting in what is known as a scale-free degree distribution. Whether we look at the network as a whole or at a specific region, the scale-free distribution means we will always find a few relatively well-connected nodes, or “hubs”, and a much larger number of nodes with relatively few connections. A wide range of networks have been shown to exhibit this property, including power grids, social networks, and the World Wide Web.
shortest path
The shortest path (fewest number of steps) between two nodes in the network.
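In an unweighted network, a shortest path can be found with a breadth-first search, which explores the network one step at a time outward from the starting node. The sketch below is one such search; the function name and the five-node ring are illustrative assumptions.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    previous = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nb in adjacency[node]:
            if nb not in previous:
                previous[nb] = node
                queue.append(nb)
    return None  # goal is in a different component

ring = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
        "D": ["C", "E"], "E": ["D", "A"]}
route = shortest_path(ring, "A", "D")
```

Going from A to D around the ring takes three steps one way (A, B, C, D) and two the other, so the search returns A, E, D.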
thresholding
In a ''weighted'' network: considering only the edges whose weight is above a certain value.
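A minimal sketch, assuming the weighted edges are held in a dictionary from node pairs to weights (the data and the function name are invented for illustration):

```python
weighted_edges = {("Anne", "Ben"): 5, ("Anne", "Cat"): 1, ("Ben", "Cat"): 2}

def threshold(edges, minimum):
    """Keep only the edges whose weight is strictly above minimum."""
    return {pair: weight for pair, weight in edges.items() if weight > minimum}

strong = threshold(weighted_edges, 1)
```

With a threshold of 1, the weakest edge (Anne and Cat) is dropped and the other two remain. Whether an edge exactly at the threshold is kept is a choice to state explicitly; this sketch drops it.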
tie
See ''edge''.
triad
Three nodes, usually connected by ''edges''.
weighted network
A network in which each edge has a numerical weight attached to it, indicating the strength of the connection.
unipartite network
A network of just one node type, in contrast to a ''bipartite'' network. Networks are typically unipartite.
Unique Identifiers/Uniform Resource Identifiers (URIs)
A string of numbers or characters that identifies a unique entity. ODNB, Wikidata, SDFB, etc. each assign stable strings of numbers (URIs) to biographies of people. These URIs allow users to navigate to the correct John Smith, for example.