# Glossary of network analysis terms

Originally compiled in conjunction with the Early Modern Digital Agendas: Network Analysis institute in July 2017, this glossary aims to help those learning network analysis become familiar with its common terms. Additions and updates are welcome. N.B., words in *italics* are defined elsewhere in the glossary.

For more digital humanities tools for use at the Folger Shakespeare Library, see an extensive list in the article Digital resources at the Folger.

**actor**

- See *node*.

**adjacency (also edge) list**

- A list of all of the edges in your network, formatted in two columns (A and B). Each row signifies that an edge exists between the two *nodes* in columns A and B. These edges can be undirected or directed. In a directed network, column A would be the source node and column B the target node.
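As a minimal sketch of how a two-column edge list can be read, the Python below turns each row into neighbor lookups, once treating the edges as undirected and once as directed. The names in `edge_list` and the two helper functions are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical edge list: each row is (column A, column B).
edge_list = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Carol"),
    ("Carol", "Dave"),
]

def neighbors_undirected(edges):
    """Map each node to the set of nodes it shares an edge with."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def neighbors_directed(edges):
    """Map each source node (column A) to its target nodes (column B)."""
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        adj.setdefault(target, set())  # keep nodes that only receive edges
    return dict(adj)

undirected = neighbors_undirected(edge_list)
directed = neighbors_directed(edge_list)
```

In the undirected reading Carol is connected to Alice, Bob, and Dave; in the directed reading she is the source of only one edge, to Dave.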

**API**

- In computer programming, an Application Programming Interface (API) is a set of subroutine definitions, protocols, and tools for building application software.

**betweenness of a node or edge**

- The number of *shortest paths* in the network that flow through a *node* or *edge*. Also called betweenness *centrality*. A node X has high betweenness centrality if many of the shortest paths between other pairs of nodes (say, Y and Z) pass through X.
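One way to compute node betweenness on a small, unweighted, undirected network is sketched below. It uses Brandes' algorithm, which is an assumption about method rather than something this glossary prescribes, and the network `path_graph` is hypothetical. When several shortest paths tie between a pair of nodes, each intermediate node is credited with its fraction of them.

```python
from collections import deque

def betweenness(adj):
    """Node betweenness for an unweighted, undirected network.

    `adj` maps each node to the set of its neighbors.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical three-node network: Y - X - Z.
path_graph = {"Y": {"X"}, "X": {"Y", "Z"}, "Z": {"X"}}
scores = betweenness(path_graph)
```

Here X lies on the only shortest path between Y and Z, so it receives a score of 1 while Y and Z receive 0.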

**bipartite (also bimodal) network**

- A network of two *node* types in which connections exist only between nodes of different types. One can perform a *projection* on a bipartite network.

**centrality of a node**

- A numerical measurement of the importance of a *node*. *Degree* is a simple example. Four types of centrality: 1) Degree Centrality – number of connections; 2) Closeness Centrality – closeness to the entire network; 3) Betweenness Centrality – the degree to which a node provides a bridge to other nodes; 4) *Eigenvector Centrality* – connection to well-connected nodes, bridging nodes.

**Closeness Centrality**

- “Closeness Centrality measures the proximity of a selected *node* to all other nodes within the graph” (Cherven, *Mastering Gephi Network Visualization*, 2015). It is calculated by taking the average shortest-path distance from a node to each of the others and then taking the reciprocal. The reciprocal is used only so that the nodes closest to all the others receive the highest scores.
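The calculation described above (average distance, then reciprocal) can be sketched in Python as follows. This assumes an unweighted, connected network; the network `chain` and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest-path distance (in steps) from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, node):
    """Reciprocal of the average distance from `node` to the other nodes."""
    others = [d for n, d in bfs_distances(adj, node).items() if n != node]
    if not others:
        return 0.0  # an isolated node has no distances to average
    return len(others) / sum(others)

# Hypothetical chain: A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
closeness_b = closeness(chain, "B")  # distances 1, 1, 2
closeness_a = closeness(chain, "A")  # distances 1, 2, 3
```

B sits nearer the middle of the chain than A, so its average distance is smaller and its closeness is larger.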

**chunking**

- In network inference projects, scholars might “chunk” a text and *infer* an edge when two names co-occur within that text chunk. You can chunk at the sentence, paragraph, page, or other level (e.g. 500 words). This is sometimes done with plays, novels, or encyclopedia entries to infer social interaction, for instance.
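A minimal sketch of sentence-level chunking follows. The sample text, the list of names, and the helper functions are all hypothetical, and names are matched by naive substring search, which a real project would likely replace with proper name recognition.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical text and name list, for illustration only.
text = "Hamlet greets Horatio. Ophelia speaks to Hamlet. Horatio and Hamlet exit."
names = ["Hamlet", "Horatio", "Ophelia"]

def chunk_sentences(text):
    """Split a text into sentence-level chunks on ., ! or ?."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def infer_edges(chunks, names):
    """Count, for each pair of names, the chunks in which both occur."""
    weights = Counter()
    for chunk in chunks:
        present = sorted(n for n in names if n in chunk)  # naive substring match
        for pair in combinations(present, 2):
            weights[pair] += 1
    return weights

chunks = chunk_sentences(text)
edge_weights = infer_edges(chunks, names)
```

Hamlet and Horatio co-occur in two of the three sentences, Hamlet and Ophelia in one, and Horatio and Ophelia in none, so no edge is inferred between the last pair.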

**component**

- A connected part of the network. Networks often consist of multiple disconnected components.
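Components can be found by starting at any unvisited node and collecting everything reachable from it. The sketch below assumes an undirected network stored as a dictionary of neighbor sets; the network `adjacency` is hypothetical.

```python
from collections import deque

# Hypothetical network with three disconnected parts.
adjacency = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
    "F": set(),
}

def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

components = connected_components(adjacency)
```

An isolated node such as F counts as a component of size one.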

**CSV files**

- Comma-separated values (CSV) files allow data to be saved in a table-structured format. CSVs look like a garden-variety spreadsheet but have a .csv extension. (Traditionally, they take the form of a text file containing information separated by commas, hence the name.)
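As an illustration, the sketch below reads a small CSV edge list with Python's standard `csv` module. The column names `source` and `target` and the sample rows are assumptions; the text is held in a string here so the example is self-contained, where a real project would open a `.csv` file instead.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by one edge per row.
csv_text = "source,target\nAlice,Bob\nBob,Carol\n"

# DictReader uses the header row to label each value in the rows that follow.
reader = csv.DictReader(io.StringIO(csv_text))
edges = [(row["source"], row["target"]) for row in reader]
```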

**degree of a node**

- The number of edges connected to a *node*. Variants include in-degree and out-degree, which count the incoming and outgoing edges, respectively, in a directed network. Sometimes indicated in visualizations by the size of the sphere representing the node. Also called degree centrality.
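In-degree and out-degree can be counted directly from a directed edge list, as in this sketch. The `letters` data (sender, recipient) is hypothetical.

```python
from collections import Counter

# Hypothetical directed edges: (sender, recipient) of a letter.
letters = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Alice"),
    ("Carol", "Alice"),
]

out_degree = Counter(sender for sender, _ in letters)
in_degree = Counter(recipient for _, recipient in letters)

# Total degree: every edge touching the node, in either direction.
nodes = set(out_degree) | set(in_degree)
degree = {n: in_degree[n] + out_degree[n] for n in nodes}
```

Alice sends two letters and receives two, giving her an out-degree of 2, an in-degree of 2, and a total degree of 4.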

**diameter of a network**

- The length of the longest *shortest path* between any two nodes in the network.
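A minimal sketch of this definition: find the distance from each node to the node farthest from it, then take the largest of those values. It assumes an unweighted, connected network, and the network `chain` is hypothetical.

```python
from collections import deque

def farthest_distance(adj, start):
    """Length of the longest shortest path that begins at `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter(adj):
    """Largest shortest-path length over all starting nodes."""
    return max(farthest_distance(adj, node) for node in adj)

# Hypothetical chain: A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
chain_diameter = diameter(chain)
```

In this chain the two ends, A and D, are three steps apart, and no other pair is farther.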

**directed network**

- A network in which the edges are directional, e.g. when A sends a letter to B.

**dyad**

- Two *nodes*, usually connected by an *edge*.

**edge**

- Connection, *link*, or *tie* between *nodes*.

**Eigenvector centrality**

- Eigenvector centrality measures the “influence of a particular node . . . by the connectedness of its closest neighbors. This can be thought of as the who you know type of centrality, wherein an individual *node* might not be thought of as important on its own, but its relationship to other highly connected nodes indicates a high level of influence” (Ken Cherven, *Mastering Gephi Network Visualization*, 2015).

**ego network**

- A network focused around one central *node*. A classic example is a correspondence network derived from the collected letters of a single individual. Such networks are hard to analyse using quantitative measures, but Wasserman and Faust’s classic textbook *Social Network Analysis* has chapters on how to analyse ego networks.

**graph**

- Another term for a network.

**homophily**

- The tendency of *nodes* to become connected to other nodes that are similar under a certain definition of similarity.

**inferred network**

- Network *nodes* and/or *edges* that are inferred to exist, for example based on the connectivity of other nodes and/or edges in the network, or from external properties of the nodes, such as their correlated activities over time.

**k-partite network**

- A *multi-partite* network with k different *node* types in which nodes of the same type are not connected. A *bipartite* network is a k-partite network with k = 2.

**link**

- See *edge*.

**Linked (Open) Data (LOD)**

- Data that is linked to external *unique identifiers* (URIs) that have been defined by institutions or authorities, e.g. the Library of Congress's Name Authority File, Oxford Dictionary of National Biography, Wikidata, the Virtual International Authority File.

**logarithmic**

- Data is typically plotted on a **linear** scale. This means that values are proportionally spaced: ‘0’, ‘1’, and ‘2’ are equally far apart, and ‘10’ is ten times as far from ‘0’ as ‘1’ is. By contrast, on a **logarithmic** scale, values are equally spaced if they are related by the same factor. So ‘1’, ‘10’, ‘100’, and ‘1000’ are equally far apart, because each is ten times the one before. We can have a linear scale on one axis and a logarithmic scale on the other, or logarithmic scales on both (often called a log-log plot). See also Khan Academy's *Introduction to Logarithms*.
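The difference between the two scales can be checked numerically. In this small sketch, the position of a value on a logarithmic axis is taken to be its base-10 logarithm; the variable names are hypothetical.

```python
import math

values = [1, 10, 100, 1000]

# On a linear axis, the gaps between neighboring values keep growing.
linear_gaps = [b - a for a, b in zip(values, values[1:])]

# On a logarithmic axis, each value sits at its base-10 logarithm,
# so values related by the same factor are the same distance apart.
log_positions = [math.log10(v) for v in values]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]
```

The linear gaps are 9, 90, and 900, while the logarithmic gaps are all equal.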

**log-log plot**

- A plot with logarithmic scales on both axes. A *scale-free degree distribution* appears as a straight line in a log-log plot.

**matrix**

- A way of representing a network where there is a row and a column for each *node*, and the values in the cells indicate whether an edge exists between a pair of nodes.
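As a sketch, the Python below builds such a matrix from a short, hypothetical edge list, writing 1 where an edge exists and 0 where it does not. The edges are treated as undirected, so the matrix is symmetric.

```python
# Hypothetical undirected edge list.
edges = [("A", "B"), ("B", "C")]

# Fix an order for the nodes so each one gets a row and column number.
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Start with all zeros (no edges), then mark each edge in both directions.
matrix = [[0] * len(nodes) for _ in nodes]
for a, b in edges:
    matrix[index[a]][index[b]] = 1
    matrix[index[b]][index[a]] = 1
```

With rows and columns in the order A, B, C, the row for B reads 1, 0, 1 because B is connected to both A and C.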

**multi-partite network**

- A network with more than one *node* type.

**node**

- Sometimes called a “vertex” because it marks the intersection of lines, and sometimes called an *actor*, nodes are the elements of a network that are being connected.

**noise**

- Another word for random variation in the data, for example due to uncertainty or error. Noise can obscure trends or patterns one is looking for in the data.

**ontology**

- A set of concepts and categories in a subject area or domain that shows their properties and the relations between them. An example of relationship (i.e. *edge*) ontologies is that developed by Six Degrees of Francis Bacon (note, these are *directed* edge ontologies).

**projection of a bipartite network**

- Transformation of a *bipartite* network into a *weighted* network of just one of the two original *node* types, in which the weight of a connection is the number of shared neighbors in the bipartite network. When you project a bipartite network, in other words, you transform one of the node types into an edge: instead of two people nodes being connected to a place, they are connected to each other, and the place becomes the edge connecting them.
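The people-and-places example above can be sketched in Python as follows. The person and place names and the `project_onto_people` helper are hypothetical; the weight of each person-to-person edge is the number of places the two people share.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical bipartite edges: (person, place).
bipartite_edges = [
    ("Anne", "London"),
    ("Ben", "London"),
    ("Anne", "Oxford"),
    ("Ben", "Oxford"),
    ("Cat", "Oxford"),
]

def project_onto_people(edges):
    """Connect every pair of people who share a place; weight = shared places."""
    people_at = defaultdict(set)
    for person, place in edges:
        people_at[place].add(person)
    weights = Counter()
    for people in people_at.values():
        for pair in combinations(sorted(people), 2):
            weights[pair] += 1
    return weights

projection = project_onto_people(bipartite_edges)
```

Anne and Ben share both London and Oxford, so their edge has weight 2; Cat shares only Oxford with each of them, so those edges have weight 1.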

**Power law**

- See *scale-free degree distribution*.

**scale-free degree distribution**

- Intuitively, one might expect the *degree* distribution in a network to follow a bell curve, which is more formally described as a normal (or Gaussian) distribution: a large rounded peak tapering away rapidly on each side. A simple probability distribution that resembles a bell curve is the sum of two dice rolls: the distribution is centered on the number 7, and the probability decreases as you move away from the center on either side. A power-law distribution, by contrast, has no peak; instead, it decreases continuously and rapidly for increasing degrees. In fact, the data points in a power-law distribution are spread so broadly, across several orders of magnitude, that it is normally plotted on *logarithmic* axes. On these axes a power-law distribution appears as a straight diagonal line, which means that the shape of the distribution is the same for high and low degrees, resulting in what is known as a scale-free degree distribution. Whether we look at the network as a whole or at a specific region, the scale-free distribution means we will always find a few relatively well-connected *nodes*, or "hubs", and a much larger number of nodes with relatively few connections compared to the hubs. A wide range of networks have been shown to exhibit this property, including power grids, social networks, and the world-wide web.
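Why a power law appears as a straight line on logarithmic axes can be seen in a short derivation. The symbols are assumptions introduced here for illustration: p(k) is the fraction of nodes with degree k, C is a constant, and γ is the power-law exponent.

```latex
p(k) = C\,k^{-\gamma}
\quad\Longrightarrow\quad
\log p(k) = \log C - \gamma \log k
```

Plotting log p(k) against log k therefore gives a straight line with slope −γ. Rescaling the degree by any factor a only multiplies the distribution by a constant, since p(ak) = a^{−γ} p(k), which is the sense in which the distribution has no characteristic scale.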

**shortest path**

- The shortest path (fewest number of steps) between two *nodes* in the network.
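In an unweighted network, a shortest path can be found with a breadth-first search, as in this sketch. The network `adjacency` and the function name are hypothetical. When several paths tie for shortest, the function returns just one of them.

```python
from collections import deque

def shortest_path(adj, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    previous = {source: None}  # node -> the node we reached it from
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk backwards from the target to rebuild the path.
            path = []
            while v is not None:
                path.append(v)
                v = previous[v]
            return path[::-1]
        for w in adj[v]:
            if w not in previous:
                previous[w] = v
                queue.append(w)
    return None

# Hypothetical network: two equally short routes from A to D, then on to E.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],
}
path = shortest_path(adjacency, "A", "E")
steps = len(path) - 1
```

The path from A to E takes three steps, whichever of the two routes through B or C is returned.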

**thresholding**

- In a *weighted* network, considering only the *edges* in the network above a certain weight.
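Thresholding amounts to a simple filter over the edge weights, sketched below. The edges, weights, and cutoff value are hypothetical, and the sketch keeps only edges strictly above the cutoff.

```python
# Hypothetical weighted edges: (node, node) -> weight.
weighted_edges = {
    ("A", "B"): 5,
    ("A", "C"): 1,
    ("B", "C"): 3,
}

def threshold(edges, cutoff):
    """Keep only the edges whose weight is strictly above `cutoff`."""
    return {pair: weight for pair, weight in edges.items() if weight > cutoff}

strong_edges = threshold(weighted_edges, 2)
```

With a cutoff of 2, the weak edge between A and C is dropped and the other two remain.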

**tie**

- See *edge*.

**triad**

- Three *nodes*, usually connected by *edges*.

**weighted network**

- A network in which each edge has a numerical weight attached to it, indicating the strength of the connection.

**unipartite network**

- A network of just one *node* type, in contrast to a *bipartite* network. Networks are typically unipartite.

**Unique Identifiers/Uniform Resource Identifiers (URIs)**

- A unique string of numbers or characters used to identify a single entity. The Oxford Dictionary of National Biography, Wikidata, Six Degrees of Francis Bacon, and others each assign stable strings of numbers (URIs) to biographies of people. These URIs allow users to navigate to the correct "John Smith," for example.