Glossary of network analysis terms
Latest revision as of 14:00, 23 August 2017
Originally compiled in conjunction with the Early Modern Digital Agendas: Network Analysis institute in July 2017, the glossary below aims to familiarize those learning network analysis with its common terms. Additions and updates are welcome. N.B., words in italics are defined elsewhere in the glossary.
For more digital humanities tools for use at the Folger Shakespeare Library, see an extensive list in the article Digital resources at the Folger.
actor
- See node.
adjacency (also edge) list
- A list of all of the edges in your network formatted in two columns (A and B). Each row signifies that an edge exists between the two nodes in columns A and B. These edges can be undirected or directed. In a directed network, column A would be the source node and column B the target node.
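As a rough illustration of the definition above, the sketch below reads a two-column edge list into a lookup of each node's neighbors. The names in `edge_list` and the helper `build_neighbors` are made up for this example and do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical two-column edge list: (column A, column B).
edge_list = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Carol"),
    ("Carol", "Dave"),
]

def build_neighbors(edges, directed=False):
    """Map each node to the set of nodes it has an edge to."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target]  # register the target even if it has no outgoing edges
        if not directed:
            # In an undirected network the edge counts in both directions.
            neighbors[target].add(source)
    return dict(neighbors)

undirected = build_neighbors(edge_list)
directed = build_neighbors(edge_list, directed=True)
```

Read as directed, column A is the source and column B the target, so `directed["Dave"]` is empty while `undirected["Dave"]` contains Carol.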
API
- In computer programming, an Application Programming Interface (API) is a set of subroutine definitions, protocols, and tools for building application software.
betweenness of a node or edge
- The number of shortest paths in the network that flow through a node or edge. Also called betweenness centrality. A node X has a high betweenness centrality if many of the shortest paths between other pairs of nodes (for example, from Y to Z) pass through X.
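The counting described above can be sketched for a small undirected, unweighted network as follows. This is a brute-force illustration rather than the algorithm used by any specific software. When a pair of nodes has several equally short paths, the sketch credits each intermediate node with the fraction of those paths it lies on, which is a common convention but an assumption beyond the definition above. The graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network: a simple chain A - B - C - D.
neighbors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def bfs_counts(neighbors, start):
    """Return distances from start and the number of shortest paths to each node."""
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                paths[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                paths[nxt] += paths[node]
    return dist, paths

def betweenness(neighbors):
    """Unnormalized betweenness of every node, over unordered pairs of other nodes."""
    info = {n: bfs_counts(neighbors, n) for n in neighbors}
    score = {n: 0.0 for n in neighbors}
    for s, t in combinations(neighbors, 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for v in neighbors:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v is on a shortest s-t path exactly when the two legs add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score

scores = betweenness(neighbors)
```

In the chain, the two middle nodes B and C each sit on two shortest paths between other nodes, while the end nodes A and D sit on none.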
bipartite (also bimodal) network
- A network of two node types in which connections are only between nodes of different types. One can perform a projection on a bipartite network.
centrality of a node
- A numerical measurement of importance of a node. Degree is a simple example. Four types of centrality: 1) Degree Centrality – number of connections; 2) Closeness Centrality – closeness to the entire network; 3) Betweenness Centrality – to what degree a node provides a bridge to other nodes; 4) Eigenvector Centrality – connection to well-connected nodes, bridging nodes.
Closeness Centrality
- “Closeness Centrality measures the proximity of a selected node to all other nodes within the graph” (Cherven, Mastering Gephi Network Visualization, 2015). Calculated by taking the average distance from a node to each of the others, and then taking the reciprocal (the reciprocal is taken only so that the nodes closest to all the others have a higher number).
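The calculation described above (average distance, then reciprocal) can be sketched as follows for a small undirected, unweighted network. The sketch assumes the network is connected; the graph and the function names are invented for illustration, and packages such as Gephi may normalize the value differently.

```python
from collections import deque

# Hypothetical undirected network: a simple chain A - B - C - D.
neighbors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def distances_from(neighbors, start):
    """Breadth-first search: number of steps from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def closeness(neighbors, node):
    """Reciprocal of the average distance from node to every other reachable node."""
    dist = distances_from(neighbors, node)
    others = [d for n, d in dist.items() if n != node]
    if not others:
        return 0.0  # an isolated node is close to nothing
    return 1 / (sum(others) / len(others))
```

Node A is 1, 2, and 3 steps from the others (average 2), so its closeness is 0.5. Node B averages 4/3 steps, giving 0.75, which reflects its more central position.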
chunking
- In network inference projects scholars might “chunk” a text and infer an edge when two names co-occur within that text chunk. You can chunk at the sentence, paragraph, page or other level (e.g. 500 words). This is sometimes done in plays, novels, or encyclopedia entries to infer social interaction, for instance.
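A minimal sketch of sentence-level chunking might look like the following. The sample text, the list of names, and the function `cooccurrence_edges` are all hypothetical. Real projects would need more careful name matching (variant spellings, pronouns, titles) than the plain substring test used here.

```python
import re
from collections import Counter
from itertools import combinations

# Made-up sample text and name list, for illustration only.
text = (
    "Hamlet speaks with Horatio. Ophelia enters alone. "
    "Horatio warns Hamlet about Claudius."
)
names = ["Hamlet", "Horatio", "Ophelia", "Claudius"]

def cooccurrence_edges(text, names):
    """Infer a weighted edge each time two names appear in the same sentence."""
    edges = Counter()
    # Chunk at the sentence level: split after ., ! or ? followed by whitespace.
    chunks = re.split(r"(?<=[.!?])\s+", text.strip())
    for chunk in chunks:
        present = sorted(name for name in names if name in chunk)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

edges = cooccurrence_edges(text, names)
```

Here Hamlet and Horatio co-occur in two sentences, so their inferred edge has weight 2, while Ophelia appears alone and gains no edges. Changing the split rule (paragraphs, pages, 500-word windows) changes which edges are inferred.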
component
- A connected part of the network. Networks often consist of multiple disconnected components.
CSV files
- Comma-separated values (CSV) files allow data to be saved in a table-structured format. CSVs look like a garden-variety spreadsheet but with a .csv extension (traditionally they take the form of a text file containing information separated by commas, hence the name).
degree of a node
- The number of edges connected to this node. Variants include in-degree and out-degree, which count the number of incoming and outgoing edges in a directed network. Sometimes indicated by the size of the sphere representing the node. Also called degree centrality.
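For a directed edge list, the three counts mentioned above can be sketched in a few lines. The edge list is hypothetical, with each pair read as (source, target).

```python
from collections import Counter

# Hypothetical directed edges: (source, target).
directed_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]

# Out-degree counts edges leaving a node; in-degree counts edges arriving at it.
out_degree = Counter(source for source, _ in directed_edges)
in_degree = Counter(target for _, target in directed_edges)

# Total degree is the sum of the two (a Counter returns 0 for missing nodes).
nodes = {node for edge in directed_edges for node in edge}
degree = {node: in_degree[node] + out_degree[node] for node in nodes}
```

Node A sends two edges and receives one, so its out-degree is 2, its in-degree is 1, and its total degree is 3.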
diameter of a network
- The length of the longest shortest path between any pair of nodes in the network.
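One way to compute the diameter of a small, connected, unweighted network is to run a breadth-first search from every node and keep the largest distance found. The sketch below assumes the network is connected; the graph and function names are illustrative only.

```python
from collections import deque

# Hypothetical undirected network: chain A - B - C with two leaves D and E on C.
neighbors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C"},
    "E": {"C"},
}

def farthest_distance(neighbors, start):
    """Length of the longest shortest path that begins at start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

def diameter(neighbors):
    """Largest shortest-path length over all starting nodes."""
    return max(farthest_distance(neighbors, node) for node in neighbors)
```

In this example the farthest pairs are A and D (or A and E), which are three steps apart, so the diameter is 3.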
directed network
- A network in which the edges are directional, e.g. when A sends a letter to B.
dyad
- Two nodes, usually connected by an edge.
edge
- Connection, link, or tie between nodes.
Eigenvector centrality
- Eigenvector centrality measures the “influence of a particular node . . . by the connectedness of its closest neighbors. This can be thought of as the who you know type of centrality, wherein an individual node might not be thought of as important on its own, but its relationship to other highly connected nodes indicates a high level of influence” (Ken Cherven, Mastering Gephi Network Visualization, 2015).
ego network
- A network focused around one central node. A classic example is a correspondence network derived from the collected letters of a single individual. It is hard to analyse using quantitative measures, but Wasserman and Faust’s classic textbook Social Network Analysis has chapters on how to analyse ego networks.
graph
- Another term for a network.
homophily
- The tendency of nodes to become connected to other nodes that are similar under a certain definition of similarity.
inferred network
- Network nodes and/or edges that are inferred to exist, for example based on the connectivity of other nodes and/or edges in the network, or from external properties of the nodes, such as their correlated activities over time.
k-partite network
- A multi-partite network with k different node types in which nodes of the same type are not connected. A bipartite network is a k-partite network with k = 2.
link
- See edge.
Linked (Open) Data (LOD)
- Data that is linked to external unique identifiers (URIs) that have been defined by institutions or authorities, e.g. the Library of Congress's Name Authority File, Oxford Dictionary of National Biography, Wikidata, the Virtual International Authority File.
logarithmic
- Most data is typically plotted on a linear scale. This means that values are proportionally spaced: ‘0’, ‘1’, and ‘2’ are equally far apart, and ‘10’ is ten times as far from ‘0’ as ‘1’ is. By contrast, on a logarithmic scale we space things equally if they are related by the same factor. So ‘1’, ‘10’, ‘100’, and ‘1000’ are equally far apart, because each is 10 times the one before it. We can have a linear scale on one axis and a logarithmic scale on the other, or logarithmic scales on both (often called a log-log plot). See also Khan Academy's Introduction to Logarithms.
log-log plot
- A plot with logarithmic scales on both axes. A scale-free degree distribution appears as a straight line in a log-log plot.
matrix
- A way of representing a network where there is a row and a column for each node, and the values in the cells indicate whether an edge exists between a pair of nodes.
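As a small illustration, the sketch below turns a hypothetical undirected edge list into such a matrix, using nested Python lists and 1 or 0 to mark whether an edge exists.

```python
# Hypothetical nodes and undirected edges.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]

# Give each node a row/column position.
index = {node: position for position, node in enumerate(nodes)}

# Start with all zeros: one row and one column per node.
matrix = [[0] * len(nodes) for _ in nodes]

for a, b in edges:
    matrix[index[a]][index[b]] = 1
    matrix[index[b]][index[a]] = 1  # undirected: mirror across the diagonal
```

The result is `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`. For a directed network the mirroring line would be dropped, and for a weighted network the cells would hold the edge weights instead of 1.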
multi-partite network
- A network with more than one node type.
node
- Sometimes called a “vertex” because it marks the intersection of lines, and sometimes called an actor, nodes are the elements of a network that are being connected.
noise
- Another word for random variation in the data, for example due to uncertainty or error. Noise can obscure trends or patterns one is looking for in the data.
ontology
- A set of concepts and categories in a subject area or domain that shows their properties and the relations between them. An example of relationship (i.e. edge) ontologies is that developed by Six Degrees of Francis Bacon (note, these are directed edge ontologies).
projection of a bipartite network
- Transformation of a bipartite network into a weighted network of just one of the two original node types in which the weight of the connection is the number of shared neighbors in the bipartite network. When you project a bipartite network, in other words, you transform one of the node types into an edge: instead of two people nodes being connected to a place, they are connected to each other, and the place becomes the edge connecting them.
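The people-and-places example above can be sketched as follows. The names, the places, and the function `project_onto_people` are hypothetical. The weight of each person-to-person edge is the number of places the two people share.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical bipartite edges: (person, place).
bipartite_edges = [
    ("Anne", "London"),
    ("Ben", "London"),
    ("Anne", "Oxford"),
    ("Ben", "Oxford"),
    ("Cath", "Oxford"),
]

def project_onto_people(edges):
    """Connect two people once for every place they are both linked to."""
    people_at = defaultdict(set)
    for person, place in edges:
        people_at[place].add(person)
    weights = Counter()
    for people in people_at.values():
        for pair in combinations(sorted(people), 2):
            weights[pair] += 1
    return weights

projected = project_onto_people(bipartite_edges)
```

Anne and Ben share both London and Oxford, so their edge has weight 2, while Cath shares only Oxford with each of them, giving weight 1. The places no longer appear as nodes; they survive only as the edge weights.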
Power law
- See scale-free degree distribution.
scale-free degree distribution
- Intuitively one might expect the degree distribution in a network to follow a bell curve, which is more formally described as a normal (or Gaussian) distribution: a large rounded peak tapering away rapidly on each side. A simple probability distribution that resembles a bell curve or normal distribution is the sum of a roll of two dice. The distribution is centered around the number 7 and the probability decreases as you move away from the center on either side. A power-law distribution, by contrast, has no peak; instead it decreases continuously and rapidly for increasing degrees. In fact the distribution of the data points within a power-law distribution is so broad, spanning several orders of magnitude, that it is normally plotted on logarithmic axes. On these axes a power-law distribution appears as a straight diagonal line, which means that the shape of the distribution is the same for high and low degrees, resulting in what is known as a scale-free degree distribution. Whether we look at the network as a whole, or at a specific region, due to the scale-free distribution we will always find a few relatively well-connected nodes, or "hubs", and a much larger number of nodes with a relatively small number of connections compared to the hubs. A wide range of networks have been shown to exhibit this property, including power grids, social networks, and the world-wide web.
shortest path
- The shortest path (fewest number of steps) between two nodes in the network.
thresholding
- In a weighted network, considering only the edges in the network above a certain weight.
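A minimal sketch of thresholding, assuming the weighted edges are stored as a dictionary from node pairs to weights. The data and the function name `threshold` are invented for illustration. "Above" is read here as strictly greater than the cutoff; some analyses use "at least" instead.

```python
# Hypothetical weighted edges: (node, node) -> weight.
weighted_edges = {
    ("Anne", "Ben"): 2,
    ("Anne", "Cath"): 1,
    ("Ben", "Cath"): 1,
}

def threshold(edges, cutoff):
    """Keep only the edges whose weight is strictly above the cutoff."""
    return {pair: weight for pair, weight in edges.items() if weight > cutoff}

strong_edges = threshold(weighted_edges, 1)
```

With a cutoff of 1, only the Anne and Ben edge (weight 2) remains. Raising the cutoff removes weaker ties and can split the network into more components.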
tie
- See edge.
triad
- Three nodes, usually connected by edges.
weighted network
- A network in which each edge has a numerical weight attached to it, indicating the strength of the connection.
unipartite network
- A network of just one node type, in contrast to a bipartite network. Networks are typically unipartite.
Unique Identifiers/Uniform Resource Identifiers (URIs)
- A unique string of numbers or characters to identify a unique entity. The Oxford Dictionary of National Biography, Wikidata, Six Degrees of Francis Bacon, and others each assign stable strings of numbers (URIs) to biographies of people. These URIs allow users to navigate to the correct "John Smith," for example.