Glossary of digital humanities terms

Latest revision as of 12:59, 27 August 2019

Originally compiled by Daniel Powell in conjunction with the Early Modern Digital Agendas institute in July 2013, the glossary below aims to help both novices and more advanced users of digital tools and approaches understand common terms employed in the digital humanities. Additions and updates are welcome.

For more digital humanities tools for use at the Folger Shakespeare Library, see an extensive list in the article Digital resources at the Folger.

Academia.edu

A social-networking platform for academics to share, track, and communicate research. Founded in 2008, Academia.edu has over 1.4 million users and contains nearly 1.4 million papers. The site allows for the real-time access of research relevant to users' interests in an open-source format.

Adobe Flash

A proprietary, browser-independent, vector-graphics animation platform. Using the Player plug-in, Flash content will appear identically across various browsers and devices.

AntConc

A free concordance program available for Windows, Mac OS X, and Linux operating systems. AntConc has evolved from a simple concordance program into a powerful tool for textual analysis. It can perform the following types of linguistic analysis: concordance, concordance plot, clusters, n-grams, collocates, word frequency, and keyword list.

Apache

A free, open-source Web-server software package. This software allows a user's computer and a Web server to communicate with each other. The Apache HTTP Server has long been among the most widely used Web servers in the world.

API

Application programming interface: A specification that allows software applications to communicate with one another. An API allows client programs to access facilities within an application.

BASIC (beginner's all-purpose symbolic instruction code)

A simple, easy-to-learn programming language developed in the mid-1960s for nonscience students that incorporated a simple program editor.

bioinformatics

The application of computing and information technologies to the study and preservation of biological data. An inherently interdisciplinary field, bioinformatics research originates in computer science and more traditional scientific fields. Major bioinformatics research fields include sequence analysis of DNA, databases and data mining of scientific literature, 3-D visualization, and genome annotation.

bit

A unit of information derived from a choice between two equally probable alternatives or ‘events’; such a unit stored electronically in a computer, e.g. 1 and 0.

bitmap

A representation, e.g. of a computer memory, in which each item is represented by one bit; specifically, a graphic display in which characters are formed by assigning a bit value to each individual pixel.

browser

A software application allowing users to locate and retrieve information from networked information services. Now most frequently used to refer to a Web browser, the term refers to a specialized computer program for viewing, interacting with, and navigating Web pages. These programs use HTTP to retrieve pages and then render the HTML those pages contain (see HTML and HTTP below).

cluster analysis

A way of analyzing data that classifies a set of information into two or more mutually exclusive groups based on combinations of internal variables. Cluster analysis is useful for discovering structures and patterns within data based solely on a selected category of similarity and difference. In practice, cluster analysis of a corpus of texts usually groups them together according to the similarities and differences of the frequencies of the most frequent words. Cluster analysis has been shown to be highly reliable in authorship attribution and genre identification. The statistical software program MINITAB facilitates cluster analysis.
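
As a rough illustration of grouping texts by the frequencies of their most frequent words, the following Python sketch applies a simple single-linkage agglomerative procedure to four invented one-line "texts". The function names, the Manhattan distance measure, and the sample data are assumptions made for this example only; they are not the method used by MINITAB or by any particular study.

```python
from collections import Counter
from itertools import combinations

def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in one text."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocabulary]

def distance(p, q):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster(texts, n_words=5, n_clusters=2):
    """Group texts by the frequencies of the corpus's most frequent words."""
    all_words = Counter(w for t in texts.values() for w in t.lower().split())
    vocabulary = [w for w, _ in all_words.most_common(n_words)]
    profiles = {name: profile(t, vocabulary) for name, t in texts.items()}
    clusters = [{name} for name in texts]
    while len(clusters) > n_clusters:
        # Merge the two clusters whose closest members are nearest (single linkage).
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                distance(profiles[x], profiles[y])
                for x in pair[0] for y in pair[1]
            ),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters

# Invented sample "texts" (hypothetical data, far smaller than a real corpus).
texts = {
    "A1": "the cat and the dog and the bird",
    "A2": "the dog and the cat and the fish",
    "B1": "of love of war of time of men",
    "B2": "of war of love of men of kings",
}
groups = cluster(texts)
print(groups)
```

With real texts one would normally use many more words and a dedicated statistics package; the sketch only shows the shape of the computation.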

CommentPress

An open-source theme and plug-in for the WordPress content management system that allows readers to comment paragraph by paragraph in the margin of a text. It can be applied to a fixed document (e.g., essay, book) or to a constantly updated blog. Recently, CommentPress has evolved into Digress.it, a more robust version of the application. Users of CommentPress must have a WordPress Web site, and users of Digress.it must register on Digress.it for a hosted account.

Concordance

A proprietary concordance program. Concordance is a comprehensive application with a number of powerful features, including multiple language support, user-definable alphabets, user-definable contexts, multiple-pane viewing, the ability to statistically analyze selected texts, and the ability to export concordance results as text, HTML, or Web Concordance files.

content management system

A software program or suite of applications designed to enable the creation, editing, review, organization, and publication of content to the Web from a central interface. Popular content management systems include WordPress, Drupal, and Joomla!.

corpus

(Pl. corpora.) A collection of written texts, particularly the entire body of work on a subject or by a specific creator; a collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.

CSS (cascading style sheets)

A way of specifying the appearance of HTML or XML in a browser. CSS allows the separation of structural content from presentation. For a further introduction to CSS, see Eric A. Meyer, Cascading Style Sheets: The Definitive Guide (2nd ed.; Sebastopol: O'Reilly, 2004; print), 1-22.

CSV

Comma-separated values: a file type that allows data to be saved in a table-structured format. The extension for this file type is .csv. Such files traditionally take the form of a text file containing values separated by commas, and they can be manipulated by programs such as Excel and OpenRefine.
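
A minimal sketch of reading comma-separated data with Python's standard csv module follows. The column names and rows are invented for illustration, and the data are held in a string rather than in a .csv file so that the example is self-contained.

```python
import csv
import io

# Hypothetical sample data; a real project would open a .csv file instead.
raw = (
    "title,author,year\n"
    "Hamlet,William Shakespeare,1603\n"
    'Volpone,"Jonson, Ben",1607\n'  # quotes protect the comma inside a value
)

# DictReader uses the first row as column names for every later row.
rows = list(csv.DictReader(io.StringIO(raw)))
titles = [row["title"] for row in rows]

for row in rows:
    print(row["title"], "|", row["author"], "|", row["year"])
```

Every value is read as text, so numeric columns such as the year need explicit conversion before any arithmetic.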

database

A collection of information organized in such a way that a computer program can quickly select desired data. The structure of a database is dependent on the type of relationship being described. A database differs from a file of that same information in that it describes how the data relate to one another instead of presenting an unordered collection of the same content.
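
To illustrate how a database records relationships rather than an unordered list, the sketch below builds a small relational database in memory with Python's standard sqlite3 module. The table names, column names, and rows are assumptions chosen for the example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE works (title TEXT, author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Ben Jonson'), (2, 'Aphra Behn');
    INSERT INTO works VALUES ('Volpone', 1), ('The Alchemist', 1), ('The Rover', 2);
""")

# The join follows the recorded relationship between a work and its author.
query = """
    SELECT works.title FROM works
    JOIN authors ON works.author_id = authors.id
    WHERE authors.name = ?
    ORDER BY works.title
"""
titles = [row[0] for row in conn.execute(query, ("Ben Jonson",))]
print(titles)
conn.close()
```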

Drupal

A free and open-source content management system distributed under the GNU General Public License. Drupal, WordPress, and Joomla! are the most common content management systems used to manage Web content.

Dublin Core

A standard set of vocabulary terms used to describe a wide range of resources. This set of elements comprises a basic, standardized, shared system of metadata widely used by libraries, governments, international organizations, and businesses. See metadata.

EEBO-TCP (Early English Books Online-Text Creation Partnership)

A partnership between EEBO, a ProQuest subscription database, and the non-profit Text Creation Partnership to create standardized, accurate XML-encoded electronic editions of early print books. The EEBO corpus consists of the works represented in the English Short Title Catalogue, the Wing Catalogue, the Thomason Tracts, and the Early English Books Tract Supplement. EEBO-TCP seeks to provide accurate, publicly accessible full-text transcriptions of these early printed texts. EEBO also exists as a stand-alone ProQuest/Chadwyck-Healey product that provides access to the same texts through PDF images of microfilmed pages. Thus all EEBO texts are available as PDF images; those that have been transcribed through EEBO-TCP are available as both PDF images and full-text documents.

See History of Early English Books Online and Using Early English Books Online for more information.

ECCO (Eighteenth Century Collections Online)

Much like EEBO, ECCO is a digital collection consisting of all significant English and foreign language titles published in the United Kingdom during the long eighteenth century (1660-1815). Its more than 200,000 volumes are available as PDF images, while ECCO-TCP has made over 2,000 texts freely available as full-text transcriptions.

electronic literature

Born-digital, first-generation digital objects created on a computer and usually meant to be read on one; alternatively, literature that takes advantage of the capabilities and contexts provided by stand-alone or networked computing devices. This broad collection of work often leverages the capabilities of hypertext linking, interactivity, game play, and multimedia presented by executable code. See N. Katherine Hayles, Electronic Literature: What Is It?

Facebook

A social-networking service launched in 2004. Users, once registered, may create a personal profile, add other users as friends, exchange messages, join groups, and post and share images. Facebook is the most popular social-networking site in the English-speaking world.

FedoraCommons (Fedora Extensible Digital Object Repository Architecture)

A modular, digital-assets-management architecture for storing, managing, and accessing digital objects. Not to be confused with the Linux operating system named Fedora, FedoraCommons provides an extremely flexible underlying architecture for the formation of digital repositories containing any type of digital content.

Flickr

A Web site, created in 2004, for sharing photos and videos. Widely used to host images embedded in blogs and online forums, Flickr holds more than 6 billion images. Flickr is a Web 2.0 application that uses folksonomic tagging to organize content for collection and discovery.

GIS (Geographic Information Systems)

A computer system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data.

Google Books

A service in which the full text of books and other print materials is scanned by Google, converted to text using optical character recognition, stored in its database, and made available for searching. Materials in the public domain are available in full and for download; for materials in copyright, various access levels are available. Google Books currently contains over 20 million items.

Google Books Ngram Viewer

A graphing tool developed by Google to chart the yearly count of selected n-grams (letter combinations), words, or phrases as found in over 5.2 million books digitized by Google through 2008. Results are displayed as a normalized line chart; only matches found in over 40 books are indexed in the database.

Google Docs

Google's free, Web-based office suite and data storage service. Now revamped as Google Drive, the service allows users to create, edit, and share documents, spreadsheets, and slide-based presentations. While Google Docs still exists (as of September 2014) as a legacy system, Google Drive incorporates Google Docs and also allows users to store, share, and sync any file on Google servers.

Google Earth

A virtual globe and map application that allows users to view satellite imagery, maps, terrain, 3-D buildings, and so on. Images and data are updated regularly. Since its release in 2005, Google Earth has been downloaded more than 1 billion times.

HTML (hypertext markup language)

An authoring language used to create documents on the World Wide Web. HTML defines the structure and layout of a document using a variety of tags and attributes. Web browsers read HTML documents and transform them into the Web pages users encounter online; HTML is not displayed directly but is used by a browser to interpret the content of a page. XHTML is HTML written as XML. See Chuck Musciano and Bill Kennedy, HTML: The Definitive Guide (3rd ed.; Sebastopol: O'Reilly, 1998), 1-15.
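
The following sketch uses Python's standard html.parser module to list the tags and attributes in a small, invented HTML fragment, showing the kind of structure a browser reads before it displays a page. The class name TagLister and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Collect every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))

# A small invented page: tags give structure, attributes add detail.
page = (
    "<html><body><h1>Glossary</h1>"
    '<p class="entry">A <a href="#bit">bit</a>.</p>'
    "</body></html>"
)

lister = TagLister()
lister.feed(page)
for tag, attributes in lister.tags:
    print(tag, attributes)
```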

HTTP (hypertext transfer protocol)

An application protocol (rules) for the exchange or transfer of hypertext. HTTP is the underlying protocol used by the World Wide Web to define how messages are formatted and transmitted. The abbreviation, in lower-case letters followed by a colon, constitutes the beginning of the web address of a file to be transmitted using this protocol.
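
As a hedged sketch of what "how messages are formatted" means in practice, the Python function below assembles the text of a very simple HTTP/1.1 GET request for a given Web address. The function name build_get_request and the address example.org are assumptions for illustration; the sketch ignores query strings and sends nothing over the network.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Format a minimal HTTP/1.1 GET request for the given address."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return (
        f"GET {path} HTTP/1.1\r\n"    # request line: method, path, version
        f"Host: {parts.netloc}\r\n"   # header naming the server
        "Connection: close\r\n"
        "\r\n"                        # a blank line ends the headers
    )

url = "http://example.org/glossary"
scheme = urlsplit(url).scheme         # the part of the address before the colon
request = build_get_request(url)
print(scheme)
print(request)
```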

interactive visualization

A graphic representation designed to be manipulated by human users. In an interactive visualization, computer-generated graphic illustrations of information can change with user input. A very basic example is the movement of a mouse cursor on a computer screen that occurs when a user moves the physical mouse device.

Internet Archive

A not-for-profit, open-access digital library. It contains over 3 million books that are in the public domain, as well as music, moving images, audio files, software, and archived Web pages. Digital material can be downloaded and uploaded by users. Internet Archive oversees one of the largest book-digitization projects in the world.

JavaScript

A programming language whose syntax resembles that of C++ and Java. Typically, JavaScript programs are inserted into the HTML of a Web page and executed by a browser. JavaScript provides dynamic content, changing the look of a page or responding to user-initiated events such as a mouse click.

Joomla!

A free, open-source content management system for website development distributed under the GNU General Public License. Other common content management systems include Drupal and WordPress.

KWIC (keyword in context)

A type of concordance output that sorts and aligns words within a textual sample alphabetically and in conjunction with surrounding text. Instead of isolating search terms in a list of individual words, KWIC allows users to see the results of a search within a limited context, providing a fuller meaning. KWIC is also the name of a concordance program (KWIC Concordance for Windows) designed to analyze texts and provide word frequency lists, concordance, and collocation tables.
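
A minimal keyword-in-context sketch in Python follows. The function name kwic, the width parameter, and the sample sentence are assumptions made for this example; dedicated concordance programs offer far more control over sorting and alignment.

```python
def kwic(text, keyword, width=2):
    """Return (left context, match, right context) for each occurrence."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare without case or trailing punctuation.
        if word.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines

sample = "To be, or not to be, that is the question"
hits = kwic(sample, "be")
for left, word, right in hits:
    # Right-align the left context so that the keywords line up in a column.
    print(f"{left:>12}  {word}  {right}")
```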

LAMP (Linux, Apache HTTP Server, MySQL, and PHP)

A set of free, open-source software programs used to build a general-purpose Web server. Linux is the operating system, Apache is the Web server, MySQL is the relational database management system, and PHP (or Perl or Python) is the programming language.

LinkedIn

A social-networking site for professional networking. LinkedIn profiles summarize work history, education, and professional achievements. LinkedIn also allows users to develop "connections" with colleagues, clients, and partners.

LION (Literature Online)

A virtual library containing over 350,000 literary texts, full-text journals, author biographies, and other reference and critical sources related to the study of English language literature. Launched in 1996 and owned by ProQuest/Chadwyck-Healey, LION is available by subscription.

LiveJournal

A combination blog and social-networking site founded in 1999. Users are able to write entries for their personal journal, restrict visibility, upload multimedia, customize the appearance of their journal via HTML and CSS, "friend" other users, join communities based on common interests, and comment on the entries of other users. LiveJournal has over 1.8 million active users and was an early example of a Web 2.0 site.

LM (language model)

A statistical language model is a probability distribution over sequences of words, assigning a mathematical probability that estimates the relative likelihood of different phrases. The basic mathematical formula for this is P(w₁, ..., w_m), where m is the length of the sequence. Language models are used in speech recognition, optical character recognition (OCR), part-of-speech (POS) tagging, and other processes. An n-gram LM assumes that the probability of a word depends only on the previous n - 1 words.
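
The sketch below estimates a bigram model (the case n = 2, in which each word is conditioned on the single word before it) from a tiny invented sentence. The function name train_bigram_model is an assumption for this example, and the estimates are plain relative frequencies with no smoothing, so unseen word pairs receive a probability of zero.

```python
from collections import Counter

def train_bigram_model(tokens):
    """Return a function giving P(word | previous) by relative frequency."""
    # Count each word as a context (the last token has no successor).
    contexts = Counter(tokens[:-1])
    # Count each adjacent pair of words.
    pairs = Counter(zip(tokens, tokens[1:]))

    def probability(word, previous):
        if contexts[previous] == 0:
            return 0.0
        return pairs[(previous, word)] / contexts[previous]

    return probability

# A tiny invented training text (hypothetical data).
tokens = "the cat sat on the mat".split()
p = train_bigram_model(tokens)

print(p("cat", "the"))
print(p("sat", "cat"))
print(p("dog", "the"))
```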

Lucene

A free and open-source information retrieval software library supported by the Apache Software Foundation. Lucene facilitates full-text indexing and search of any Web content but is primarily used for searching local, single-site Web applications such as Twitter. Lucene is file-format agnostic and works with PDFs, HTML, and word processors as long as their textual information is able to be extracted.

machine learning

A way of programming computers that allows for the evolution of computational behavior based on empirical data or past experience. Machine learning focuses particularly on the ability of computers to learn to recognize complex patterns and make intelligent decisions based on those patterns, an ability that is especially valuable in computational textual analysis. See Ethem Alpaydin, Introduction to Machine Learning (Cambridge: MIT P, 2004; print).

Many Eyes

An IBM-developed Web site where users may upload data, create interactive or static visualizations, and carry on discussions. The site is designed not only to facilitate individual discovery through data visualization but also to spur discussion and collaboration between individuals engaged in similar types of knowledge production. Many Eyes provides numerous types of visualizations, divided into categories, including scatter plots, network diagrams, bar charts, bubble charts, line graphs, word trees, tag clouds, and tree maps. Along with Voyant Tools, Many Eyes is one of the most useful Web-based visualization and analysis platforms publicly available.

MARC (machine-readable cataloging)

An international set of standards for the representation and communication of bibliographic information in machine-readable form. Developed by the Library of Congress in the 1960s, MARC standards constitute the foundation of most library cataloging systems in use today. For another set of standards, see Dublin Core.

Memex

A term coined by Vannevar Bush to refer to a mechanized device to store, access, and organize massive amounts of information. Bush formulated his idea of the memex in a 1945 article published in The Atlantic Monthly ("As We May Think"). The idea of the memex influenced the development of hypertext, personal computing, the Internet, the World Wide Web, and online knowledge collections such as Wikipedia.

metadata

Data describing other data. Metadata provide information about one or more aspects of data, such as type, date, creator, location, and so on. Most often encountered in library and archival contexts, metadata facilitate the organization, discovery, and use of a wide range of resources. For further information, consult the National Information Standards Organization's publication Understanding Metadata [pdf].

methodological commons

In Willard McCarty's formulation, a set of computational techniques shared among the disciplines of the humanities and related social sciences, including database design, text analysis, numerical analysis, imaging, music information retrieval, and communication. For an illustration of this methodological commons, as well as further analysis of the role humanities computing has to play in such a system, see Willard McCarty, "Humanities Computing," Encyclopedia of Library and Information Science (New York: Dekker, 2003).

MINITAB

A proprietary and well-established statistical analysis program developed in the 1970s. MINITAB allows for basic statistical calculations, as well as regression analyses, table and graph production, multivariate analyses, forecasting tools, and variation analysis.

MMOG (massively multiplayer online game)

A term that describes online role-playing games that usually feature a persistent and evolving virtual world and allow online cooperation and competition on a large scale. World of Warcraft is one of the largest and most popular MMOGs in the world.

MonoConc Pro

An easy-to-use concordance program. In addition to providing full-text search capability for uploaded texts, MonoConc Pro supports textual analysis through regular expression searches, tag searches, and the comparison of corpora based on chosen variables.

multidimensional scaling

A set of analytic techniques used to visualize similarities or dissimilarities in data. Multidimensional scaling is increasingly used to represent nonspatial information in spatial terms, often within GIS applications.

Myspace

A social-networking service founded in 2003. During the period 2005-08, Myspace was the most popular social-networking site in the world. It was surpassed in popularity by Facebook in 2008 and has seen a steady decline in users since then.

MySQL

An open-source relational database management system (RDBMS). MySQL is the most widely used RDBMS in the world. Many of the World Wide Web's most heavily used Web sites and applications use MySQL.

new media

A broad term used to refer to the digital creation, distribution, and execution of content, as well as interactive user feedback and communities that form around such content. New media creation and criticism have often been identified with artistic production and the social democratization and justice movements. See The New Media Reader, Eds. Noah Wardrip-Fruin and Nick Montfort (Cambridge: MIT P, 2003), 3-25.

n-gram

In linguistics, a sequence of n items from a given sequence of text or speech. N-grams can be any combination of letters, phonemes, syllables, or words. A bigram sequence of the phrase "to be or not to be," for instance, would break down as follows: to be, be or, or not, not to, to be. N-grams are regularly used in natural language processing and speech recognition.
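
The bigram breakdown described in this entry can be reproduced with a few lines of Python. The function name ngrams is an assumption for this sketch; because it works on any sequence, the same function yields word n-grams from a list of words and letter n-grams from a string.

```python
def ngrams(items, n):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "to be or not to be".split()
word_bigrams = ngrams(words, 2)       # pairs of adjacent words
letter_bigrams = ngrams("text", 2)    # pairs of adjacent letters

print(word_bigrams)
print(letter_bigrams)
```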

OCR (optical character recognition)

The use of computer technologies to convert scanned images of typewritten, printed, or handwritten text into machine-readable text. This conversion allows for the computerization of material texts into formats for digital storage, search, and display. Adobe Acrobat Professional supports OCR processes, as does Microsoft Office for Windows. OCR accuracy depends on the font and style of the original document. Unusual letterforms and strong serifs can cause transcription errors, the most common of which is the long-s to f misread for early modern texts.

OHCO (ordered hierarchy of content objects)

A phrase coined to answer the question, "What is text?" Texts are, in this view, composed of objects (e.g., chapters, paragraphs, sentences) organized hierarchically so that they "nest" within one another. These objects do not overlap, and they organize text into units based on meaning and communication. This concept is integral to TEI encoding with XML. See Allen Renear, Elli Mylonas, and David Durand, "Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies," and Steven J. DeRose, David G. Durand, and Allen H. Renear, "What Is Text, Really?" Journal of Computing in Higher Education 1.2 (1990): 3-26.
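
A short Python sketch of such a nested hierarchy follows, using the standard xml.etree.ElementTree module. The element names (text, chapter, p, s) are simplified placeholders chosen for this example and are not a valid TEI document.

```python
import xml.etree.ElementTree as ET

# Each object sits wholly inside its parent; none of them overlap.
doc = ET.fromstring(
    "<text><chapter n='1'>"
    "<p><s>First sentence.</s><s>Second sentence.</s></p>"
    "<p><s>Third sentence.</s></p>"
    "</chapter></text>"
)

def outline(element, depth=0):
    """List element names, indented to show how deeply each one is nested."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(doc)
print("\n".join(tree_lines))
```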

Omeka

A free, open-source Web-publishing platform for the display of library, museum, archives, and scholarly collections and exhibitions. It is available as either a hosted application, or as a content management system (CMS) downloaded and installed on an outside server. Developed at the Center for History and New Media at George Mason University, Omeka is designed to help nonspecialists digitally present collections-based research. Omeka uses Dublin Core metadata standards to organize content.

Orlando: Women's Writing in the British Isles from the Beginnings to the Present

A highly developed electronic textbase for research on and discovery of women's writing in the British Isles. Orlando contains more than 8 million words of text and seeks to produce a full scholarly history of women's writing in the British Isles by integrating biographical entries, bibliographic listings, and contextual historical material, among other materials. Collaboratively authored and in a state of constant growth, Orlando is a powerful tool for navigating an impressive amount of information related to women's writing in the British Isles.

pattern-recognition analytics

In machine learning, a type of algorithm allowing machines to detect patterns in given input. In the digital humanities, pattern-recognition analytics often take the form of algorithms that facilitate the classification, clustering, regression, and sequence labeling of textual input. The application of pattern-recognition techniques to text has proven to be useful in author studies and stylometrics. See cluster analysis and machine learning, as well as Christopher Bishop, Pattern Recognition and Machine Learning (New York: Springer, 2006).

PBWorks

A hosted wiki space founded in 2005. PBWorks allows the setup of a collaborative wiki site that may be public or private and is available at a basic level of functionality at no cost to the user.

Plain Vanilla ASCII

A phrase used by Project Gutenberg to describe its philosophy of preserving texts in the simplest, easiest-to-use form available. In practice, this means that Project Gutenberg uses a basic form of the American Standard Code for Information Interchange (ASCII) to preserve and disseminate texts. Nearly all software programs and applications are able to interpret and display ASCII characters, ensuring the longevity and usability of Project Gutenberg texts. See Michael Hart, "The History and Philosophy of Project Gutenberg," Project Gutenberg (Project Gutenberg, 1992).

point pattern analysis

A set of analytic techniques used to study the spatial arrangement of points in space within a defined area. Point pattern analysis can indicate whether a set of data is clustered, regular, or random in a given space. Point pattern analysis is often used in GIS (Geographic Information Systems) to detect geographic patterns.

principal components analysis

An analytic technique designed to identify patterns in data and express that data in a way that highlights similarity and difference within the data. It is based on the principle of reducing the differences inherent in a set of interrelated variables while retaining as much variation as possible. See I. T. Jolliffe, Principal Component Analysis (New York: Springer, 2002).

Project Gutenberg

A volunteer-based project founded in 1971 to digitize and archive literary texts. Digitized texts are freely available for download in a variety of formats. Many of these items are full-text transcriptions of books in the public domain. The project is the oldest single collection of free electronic texts.

Python

A general-purpose programming language emphasizing readability and easy debugging. Python has found wide use in a variety of Web applications. Further information, as well as tutorials designed for different levels of knowledge, is available from the official Python website.

raster graphics

A graphic stored as a bitmap. A bitmap is a representation in which each item corresponds to one or more bits of information. When referring to graphics, these representations take the form of rows and columns of dots (pixels); the value of each dot in the matrix is stored as a bit of data. The most basic value of each dot is 1 or 0, creating a black-and-white image in which each pixel corresponds to either value. Most notably, raster graphics are images that may be stored in various image formats and that can only be processed simultaneously, as opposed to vector graphics, which consist of objects that can be managed individually within a display. Raster graphics are also difficult to scale and become pixelated when shrunken or enlarged. See vector graphics.
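
The sketch below stores a small black-and-white image as rows of 1s and 0s and then enlarges it by repeating each pixel, which is one simple way to see why enlarged raster graphics look blocky. The letter shape, the function names, and the pixel-repetition method of scaling are assumptions chosen for this example.

```python
# A 5 x 5 one-bit bitmap of a rough letter "A": 1 = ink, 0 = background.
bitmap = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

def render(rows):
    """Draw the bitmap as text, one character per pixel."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in rows)

def scale(rows, factor):
    """Enlarge by repeating every pixel `factor` times across and down."""
    return [
        [bit for bit in row for _ in range(factor)]
        for row in rows
        for _ in range(factor)
    ]

enlarged = scale(bitmap, 2)
print(render(bitmap))
print()
print(render(enlarged))
```

No new detail appears in the enlarged version: every original pixel simply becomes a larger square.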

Rossetti Archive

A digital archive of the entire artistic output of the pre-Raphaelite poet and painter Dante Gabriel Rossetti. Texts have been transcribed and encoded for search and analysis, and most of them are accompanied by high-quality digital images. The archive also contains a substantial body of critical commentary, notes, and glosses. The Rossetti Archive is one of the oldest and most established digital humanities archives currently available on the Web and has influenced the development of numerous other projects.

Second Life

An online virtual world launched in 2003. Second Life users are able to explore their virtual environment freely and interact with other users through avatars. Second Life has developed an internal economy and currency, and there are numerous examples of organizations creating virtual spaces affiliated with or mirroring their real-world instantiations.

server

A combination of hardware and software that carries out a specialized service for other programs connected to it through a network. There is a wide variety of servers, including Web servers, which receive requests from browsers for Web pages; database servers, which respond to requests for data corresponding to a search query; and FTP (file transfer protocol) servers, which enable users to employ FTP software to upload and retrieve files. Server can refer to hardware, software, or the combination of the two.

SGML (standard generalized markup language)

A markup language designed to format, store, and access large corpora of documents. The language is declarative, meaning that it describes source documents instead of specifying the particulars of their future display. These descriptive tags can then be processed in a variety of ways. SGML is the parent language of HTML, XHTML (the XML version of HTML), and XML.

SOM (self-organizing map)

A technique of data visualization relying on the training of an artificial neural network to reduce the dimensions of a data set. SOMs are first trained using input examples and then use those examples to reformulate the visualization. See Teuvo Kohonen, Self Organizing Maps (3rd ed.; New York: Springer, 2001).

source code

The instructions for a program in their original form. These instructions are written in a particular programming language, usually in the form of text. This source code is compiled into machine code that can then be executed by a computer. Most applications are distributed as executable files, not as source code. Source code is also the form of computer code that human beings can most readily read.

spatial autocorrelation

A measure of the degree to which a set of spatial features and their associated data values tend to be clustered together in space or dispersed. This is a measure of the dependency among observations in a given geographic space. Values clustered together in space exhibit positive spatial autocorrelation, while those that are dispersed exhibit negative spatial autocorrelation. See Daniel A. Griffith, Spatial Autocorrelation: A Primer (Washington: Assn. of American Geographers, 1987).
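
One widely used statistic of this kind is Moran's I, which the entry does not name; it is used here only as an illustrative assumption. The Python sketch below computes it for six locations arranged in a line, treating immediate neighbors as connected. The function names and sample values are invented for the example.

```python
def morans_i(values, weights):
    """Moran's I for values at n locations with an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Products of deviations for every pair of connected locations.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / squared)

def line_weights(n):
    """Weight 1 for immediate neighbors along a line, 0 otherwise."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

weights = line_weights(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], weights)    # similar values adjoin
alternating = morans_i([1, 5, 1, 5, 1, 5], weights)  # unlike values adjoin

print(clustered, alternating)
```

The clustered arrangement yields a positive value and the alternating arrangement a negative one, matching the distinction drawn in the entry.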

spatial statistics

A branch of statistics and geography dealing with the analysis of spatial distributions, patterns, processes, and relationships. Most techniques used in spatial statistics were developed particularly for use with geographic data; as such, they incorporate space directly into their mathematics.

SpecLab (Speculative Computing Laboratory)

A digital humanities laboratory founded at the University of Virginia in 2000. Concerned with "speculative computing" rather than the digitization and classification of existing texts, SpecLab pursued exploratory research that used humanities tools in a digital context rather than digital tools in humanities contexts. SpecLab incubated several digital projects which have outlasted its three-year existence, including NINES, the Rossetti Archive, Ivanhoe, and Temporal Modeling. See Johanna Drucker, "Background to SpecLab" [PDF], SpecLab: Digital Aesthetics and Projects in Speculative Computing (Chicago: University of Chicago Press, 2009).

static visualization

A visualization of information that contains no interactive elements. Static visualizations such as print graphics are often contrasted with digital, interactive visualizations that change according to user input. Conventional pie charts, bar graphs, and scatter plots are examples of this type of information visualization.

TAPoR (Text Analysis Portal for Research)

A project designed to develop a network of human and computing infrastructure by establishing regional centers to develop electronic textual storage and analysis. Since its inception, TAPoR has evolved into a centralized portal for Web-based textual analysis tools such as Wordle, the Voyant suite of tools, and the TAPoRWare suite of tools.

TEI (Text Encoding Initiative)

A consortium that collectively develops and maintains standards for the representation of texts in digital form. In practice, the organization is chiefly concerned with producing and maintaining the TEI Guidelines for encoding texts in the humanities, social sciences, and linguistics. The TEI Guidelines, unlike other formats for preserving text, are a primarily semantic system; textual units are encoded according to what they are rather than how they appear.

TextArc

A textual visualization application designed to show the distribution of words in texts. TextArc represents the entire text as two concentric spirals. Each line of the text is displayed in a very small font around the outside; each word is displayed inside that spiral in a more readable size. Every word appearing more than once also appears within these two spirals, with its position governed by its frequency.

text encoding

Broadly considered, the process of putting text in a special format for preservation or dissemination. In the digital humanities, textual encoding nearly always refers to the practice of transforming plain text content into XML. The TEI Guidelines are often followed when encoding textual materials in the arts, humanities, and social sciences. See TEI.
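
As a minimal sketch of what semantic encoding looks like in practice, the Python example below wraps a plain-text sentence in TEI-style XML using only the standard library. The element names p (paragraph) and persName (personal name) are taken from the TEI Guidelines; the sample sentence and the encode_sentence helper are hypothetical, and a complete TEI document would also need a header and a namespace declaration.

```python
import xml.etree.ElementTree as ET

def encode_sentence(before, name, after):
    """Encode one sentence as a paragraph, marking `name` as a personal name.

    Hypothetical helper for illustration only.
    """
    paragraph = ET.Element("p")                      # TEI <p>: a paragraph
    paragraph.text = before
    person = ET.SubElement(paragraph, "persName")    # TEI <persName>: a personal name
    person.text = name
    person.tail = after                              # text that follows the name
    return ET.tostring(paragraph, encoding="unicode")

encoded = encode_sentence("I met ", "Jeremy Bentham", " in London.")
print(encoded)
```

The markup records that "Jeremy Bentham" is a person's name, not how the name should look on the page; presentation is left to a separate stylesheet.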

text mining

The process of automatically deriving previously unknown information from written texts using computational techniques. Text mining tools facilitate researchers' discovery of patterns within structured data.

topic: textual analysis

topic: teaching critical digital literacy

Transcribe Bentham

A participatory manuscript-transcription project based at University College London. Through the Transcribe Bentham interface, volunteers can transcribe the original and unstudied papers of the philosopher and reformer Jeremy Bentham. The project makes available high-quality digital images of manuscripts, which are then used to produce the transcriptions. These transcriptions are in turn encoded with basic TEI markup by volunteers. Transcribe Bentham is a well-regarded experiment in crowd-sourced academic production.

Twitter

An online social networking and microblogging service launched in 2006. Users are able to send and read text-based posts ("tweets") of up to 140 characters. The Twitter platform is one of the most popular websites and apps in the world, with hundreds of millions of tweets generated daily.

University of Oxford Text Archive (OTA)

A digital archive that develops, collects, catalogs, and preserves electronic literary and linguistic resources. Founded in 1976 by Oxford University Computing Services, it is thought to be the oldest archive of digital academic textual resources. Access to the OTA is free, as is the downloading of all resources, although some require permission to be downloaded, requested either from OTA or the original depositors.

vector graphics

A graphic stored as a series of mathematical instructions that are then used to form an image. Since vector graphics are stored as mathematical formulas, their file sizes are smaller than bitmap image files. Because they are mathematically created objects, users can resize and stretch vector graphics without reducing their clarity. See raster graphics.
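
The resizing property can be illustrated with a short Python sketch. It assumes a shape stored as a list of coordinate pairs and written out as an SVG polygon element; the scale and to_svg_polygon helpers and the triangle data are hypothetical names chosen for this example.

```python
def scale(points, factor):
    """Return a copy of the shape with every coordinate multiplied by `factor`."""
    return [(x * factor, y * factor) for x, y in points]

def to_svg_polygon(points):
    """Write the shape as an SVG <polygon> element string."""
    coords = " ".join(f"{x},{y}" for x, y in points)
    return f'<polygon points="{coords}" />'

# A triangle described by three corner points rather than by pixels.
triangle = [(0, 0), (4, 0), (2, 3)]
big_triangle = scale(triangle, 10)

print(to_svg_polygon(triangle))
print(to_svg_polygon(big_triangle))
```

The enlarged triangle is still described by three points, so it takes no more storage than the small one and is drawn with the same sharp edges at any size.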

Visual Basic

Designed by Microsoft, a programming language and environment based on BASIC. Visual Basic was one of the first products to provide a graphic environment for developing user interfaces simply by dragging and dropping controls (e.g., buttons or dialog boxes) and then defining their behavior.

visualization

Broadly conceived, any graphic expression meant to represent a certain set of information. In the digital humanities, visualization usually refers to data visualization, or the graphic expression of large-scale collections of nonnumerical information such as textual elements, network relationships, or frequency analyses. See Martyn Jessop, "Digital Visualization as a Scholarly Activity," Literary and Linguistic Computing 23, no. 3 (2008): 281-93.

Voice of the Shuttle (VoS)

A Web resource started in 1994 as a suite of static Web sites that has grown into a large digital database of humanities and humanities-related content. VoS organizes content into several areas, including religious studies, media studies, dance, literature, and architecture. VoS still serves as a well-regarded directory of Web content tailored for humanities scholars.

Voyant Tools

A Web-based suite of textual-analysis tools, intended to be user-friendly, flexible, and powerful. It contains numerous modules able to analyze and visualize text in a variety of ways, including a document reader, a term-frequencies generator, a collocation visualizer, a word cloud visualization, and a scatterplot generator. Users can upload plain text into Voyant or cut and paste text into Voyant's on-screen input field. Results are exportable, as are some visualizations.

Web 2.0

A loosely defined term used to describe second-generation Web sites that facilitate participatory collaboration, interoperability, and information sharing. Web 2.0 highlights user-generated content and dynamic applications built on the Web rather than static content presented to users. This transition is largely cultural rather than technical, as reflected in the centrality of virtual communities, social media, and remix culture to the phenomenon. See Tim O'Reilly, What is Web 2.0?

wiki

A Web site whose content can be added to, modified, and deleted by users employing a simplified markup language or text editor within a Web browser. Wikis have become increasingly prevalent on many levels, ranging from small private wikis to large collaborative wikis such as Folgerpedia. Wikis often feature a discussion page where changes can be debated, and pages can usually be reverted to a previous version.

The William Blake Archive

An online open-access archive of the literary work of William Blake. Founded in 1996, the archive contains digitized images of Blake's work, as well as full-text electronic editions of many of his illuminated works, commercial books, drawings and paintings, and manuscripts. Encoded in XML, the site is a hybrid catalog, database, and series of editions.

Women Writers Project (WWP)

An established and long-term digital research and archiving project devoted to early modern women's writing and its electronic preservation and encoding. Founded in 1988, the WWP has had a great influence on the development of both the TEI Guidelines and the planning of long-term digital projects. WWP also publishes Women Writers Online (WWO), a full-text collection of early women's writing in English.

word cloud

A visualization of word frequencies. Usually, the more frequently a word appears in a given text, the larger its size in the resulting visualization. Programs designed to create word clouds are easily accessible; two of the most used are Wordle and the Many Eyes tag cloud.
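
The frequency-to-size relationship can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Wordle or Many Eyes work internally: the word_sizes function, the point-size range, and the simple tokenizing pattern are all hypothetical choices.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows linearly with its frequency.

    The most frequent word receives `max_size`; rarer words receive
    proportionally smaller sizes above `min_size`.
    """
    words = re.findall(r"[a-z']+", text.lower())   # crude tokenizer: letters and apostrophes
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }

sizes = word_sizes("to be or not to be")
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(word, size)
```

Here "to" and "be" each occur twice and are drawn at the largest size, while "or" and "not" occur once and are drawn smaller. Real word-cloud tools typically also remove very common words before counting.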

WordHoard

A text-analysis environment containing several categories of preloaded texts, including those of Chaucer, Spenser, the early Greeks, and Shakespeare. For this chosen group of canonical texts, users can perform a variety of analyses, including full-text searching, concordance building, and finding collocates.

WordPress

A free and open-source blogging tool and CMS (content management system) based on PHP (Hypertext Preprocessor) and MySQL. WordPress refers both to the content management system software used to manage materials on Web servers and to the popular blogging service.

XML

(extensible markup language) A markup language designed to encode documents in a format that is both human- and machine-readable. XML separates content from structure and is highly customizable. For further information and to learn how to use XML, see Benoit Marchal, XML by Example (Indianapolis: Que, 2000).
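
To show what "human- and machine-readable" means in practice, the Python sketch below parses a small XML document with the standard library. The element and attribute names (play, title, act, scene, n) are invented for this example, which is itself a demonstration of how XML vocabularies can be customized.

```python
import xml.etree.ElementTree as ET

# A small document a person can read directly; the tag names are hypothetical.
document = """<play>
  <title>Hamlet</title>
  <author>William Shakespeare</author>
  <act n="1">
    <scene n="1">Elsinore. A platform before the castle.</scene>
  </act>
</play>"""

root = ET.fromstring(document)          # the same text, parsed into a tree
title = root.findtext("title")          # text content of the <title> element
scene = root.find("./act/scene")        # first <scene> inside an <act>

print(title)
print(scene.get("n"), scene.text)
```

A program can pull out the title or a scene's number without guessing at layout, because the markup labels each piece of content explicitly.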

XSL (extensible stylesheet language)

A family of languages used to transform and render XML documents. Extensible stylesheet language transformations (XSLT) is an XML language that transforms an XML document into another format; extensible stylesheet language formatting objects (XSL-FO) specifies the visual formatting of an XML document.

XSLT (extensible stylesheet language transformation)

An XML-based language used to transform XML documents into another format or structure, usually other XML documents, HTML documents, PDF documents, or word processor files.

YouTube

The world's largest video-sharing Web site, created in 2005. YouTube uses Adobe Flash and HTML5 to display a wide variety of user-generated content.

Zotero

A free and open-source application designed to manage bibliographic references and materials. Developed by the Center for History and New Media at George Mason University, Zotero has numerous features designed to facilitate integration with online research environments, including integration with major Web browsers to automatically detect bibliographic information and import it on command; online syncing; exporting formatted reference lists into major word-processing programs; and sharing collections and items with other registered users. It is available as a browser plug-in (Zotero for Firefox) and as a stand-alone product that is able to interface with several browsers (Zotero Standalone).

@@ Line 1: / Line 1: @@
+__NOTOC__
 Originally compiled by Daniel Powell in conjunction with the [[Early Modern Digital Agendas]] institute in July 2013, the glossary below aims to help both novices and more advanced users of digital tools and approaches understand common terms employed in the digital humanities. Additions and updates are welcome.
@@ Line 15: / Line 17: @@
 : A public domain, open-source, Web-server software package. This software allows a user's computer and a Web server to communicate with each other. The [http://httpd.apache.org/ Apache HTTP Server] is the most widely used Web server in the world.
-===== '''API (application programming interface) ''' =====
+===== '''API''' =====
-: A specification that allows software applications to communicate with one another. An API allows client programs to access facilities within an application.
+: Application programming interface: A specification that allows software applications to communicate with one another. An API allows client programs to access facilities within an application.
 ===== '''BASIC (beginner's all-purpose symbolic instruction code)''' =====
@@ Line 44: / Line 46: @@
 ===== '''content management system ''' =====
 : A software program or suite of applications designed to enable the creation, editing, review, organization, and publication of content to the Web from a central interface. Popular content management systems include WordPress, Drupal, and Joomla!.
+===== c'''orpus''' =====
+: Pl. Corpora, a collection of written texts, particularly the entire body of work on a subject or by a specific creator; a collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.
 ===== '''CSS (cascading style sheets)''' =====
 : A way of specifying appearance of HTML or XML in a browser. CSS allows the separation of structural content from presentation. For a further introduction to CSS, see Eric A. Meyer, ''Cascading Style Sheets: The Definitive Guide'' (2nd ed.; Sebastopol: O'Reilly, 2004; print), 1-22.
+====='''CSV'''=====
+: or (Comma Separated Values) a file type which allows data to be saved in a table structured format. The extension for this file is .csv; they traditionally take the form of a text file containing information separated by commas, but may be manipulated by programs like excel and open refine.
 ===== '''database''' =====
 : A collection of information organized in such a way that a computer program can quickly select desired data. The structure of a database is dependent on the type of relationship being described. A database differs from a file of that same information in that it describes how the data relate to one another instead of presenting an unordered collection of the same content.
@@ Line 113: / Line 120: @@
 ===== '''LAMP (Linux, Apache HTTP Server, MySQL, and PHP)''' =====
-: A set of free, open-source software programs used to build a general-purpose Web server. Linux is the operating system, Apache is the Web server, MySQL is the relational database management system, and PHP (or Perl or Python) is the programming language.
+: A set of free, open-source software programs used to build a general-purpose Web server. Linux is the operating system, Apache is the Web server, MySQL is the relational database management system, and PHP (or Perl or Python) is the programming language
 ===== [https://www.linkedin.com/ '''LinkedIn'''] =====
@@ Line 124: / Line 131: @@
 : A combination blog and social-networking site founded in 1999. Users are able to write entries for their personal journal, restrict visibility, upload multimedia, customize the appearance of their journal via HTML and CSS, "friend" other users, join communities based on common interests, and comment on the entries of other users. LiveJournal has over 1.8 million active users and was an early example of a Web 2.0 site.
-===== '''Language Model (LM)''' =====
+====='''LM (language model)'''  =====
-A statistical language model is a probability distribution over sequences of words, assigning a mathematical probability that estimates the relative likelihood of different phrases. The basic mathematical formula for this is ''P(w<sub>1</sub>,..., w<sub>m</sub>)'', where ''m'' is the length of the sequence.  Language models are used in speech recognition, Optical Character Recognition (OCR), Part-of-speech tagging (POS), and other processes. An n-gram LM assumes that the probability of a word depends on the previous ''n'' words.
+:A statistical language model is a probability distribution over sequences of words, assigning a mathematical probability that estimates the relative likelihood of different phrases. The basic mathematical formula for this is ''P(w<sub>1</sub>,..., w<sub>m</sub>)'', where ''m'' is the length of the sequence.  Language models are used in speech recognition, Optical Character Recognition (OCR), Part-of-speech tagging (POS), and other processes. An [[Glossary of digital humanities terms#n-gram|n-gram]] LM assumes that the probability of a word depends on the previous ''n'' words.
 ===== [http://lucene.apache.org/ '''Lucene'''] =====
@@ Line 277: / Line 284: @@
 : A Web resource started in 1994 as a suite of static Web sites that has grown into a large digital database of humanities and humanities-related content. VoS organizes content into several areas, including religious studies, media studies, dance, literature, and architecture. VoS still serves as a well-regarded directory of Web content tailored for humanities scholars.
-===== [http://voyeurtools.org/ '''Voyant Tools'''] =====
+===== '''[https://voyant-tools.org/ Voyant Tools]''' =====
 : A Web-based suite of textual-analysis tools, intended to be user-friendly, flexible, and powerful. It contains numerous modules able to analyze and visualize text in a variety of ways, including a document reader, a term-frequencies generator, a collocation visualizer, a word cloud visualization, and a scatterplot generator. Users can upload plain text into Voyant or cut and paste text into Voyant's on-screen input field. Results are exportable, as are some visualizations.
@@ Line 293: / Line 300: @@
 ===== '''word cloud''' =====
-: A visualization of word frequencies. Usually, the more frequently a word appears in a given text, the larger its size in the resulting visualization. Programs designed to create word clouds are easily accessible; two of the most used are [http://www.wordle.net/ Wordle ] and the [http://www-958.ibm.com/software/data/cognos/manyeyes/page/Tag_Cloud.html Many Eyes tag cloud].
+: A visualization of word frequencies. Usually, the more frequently a word appears in a given text, the larger its size in the resulting visualization. Programs designed to create word clouds are easily accessible; two of the most used are [http://www.wordle.net/ Wordle] and the [http://www-958.ibm.com/software/data/cognos/manyeyes/page/Tag_Cloud.html Many Eyes tag cloud].
 ===== [http://wordhoard.northwestern.edu/ '''WordHoard'''] =====
@@ Line 301: / Line 308: @@
 : A free and open-source blogging tool and CMS (content management system) based on [http://php.net/ PHP] (Hypertext Preprocessor) and '''MySQL'''. WordPress refers to both the content management system software used to manage materials on Web servers and to the popular blogging service.
-===== '''XML (extensible markup language)''' =====
+===== '''XML ''' =====
-: A markup language designed to encode documents in a format that is both human and machine-readable. XML separates content from structure and is highly customizable. For further information and to learn how to use XML, see Benoit Marchal, ''XML by Example'' (Indianapolis: Que, 2000).
+: '''(extensible markup language) '''A markup language designed to encode documents in a format that is both human and machine-readable. XML separates content from structure and is highly customizable. For further information and to learn how to use XML, see Benoit Marchal, ''XML by Example'' (Indianapolis: Que, 2000).
 ===== '''XSL (extensible stylesheet language)''' =====
@@ Line 314: / Line 321: @@
 ===== [http://www.zotero.org/ '''Zotero'''] =====
-: A free and open-source application designed to manage bibliographic references and materials. Developed by the Center for History and New Media at George Mason University, Zotero has numerous features designed to facilitate integration with online research environments, including integration with major Web browsers to automatically detect bibliographic information and import it on command; online syncing; exporting formatted reference lists into major word-processing programs; and sharing collections and items with other registered users. It is available as a browser plug-in (Zotero for Firefox) and as a stand-alone product that is able to interface with several browsers (Zotero Standalone).
+: A free and open-source application designed to manage bibliographic references and materials. Developed by the Center for History and New Media at George Mason University, Zotero has numerous features designed to facilitate integration with online research environments, including integration with major Web browsers to automatically detect bibliographic information and import it on command; online syncing; exporting formatted reference lists into major word-processing programs; and sharing collections and items with other registered users. It is available as a browser plug-in (Zotero for Firefox) and as a stand-alone product that is able to interface with several browsers (Zotero Standalone).   [[Category:Folger Institute]]    [[Category:Digital Folger]]   [[Category:Digital humanities]]   [[Category:Scholarly programs]]  [[Category: Research guides]]
-[[Category:Folger Institute]]
-[[Category:Digital Folger]]
-[[Category:Digital humanities]]
-[[Category:Scholarly programs]]
-[[Category:Bibliography]]