An Introduction to Coding for Digital Humanists

How Computers Store Information

Input devices like keyboards and mice manipulate electrical states in the computer’s processor, switching each state between “with charge” (represented by 1) and “without charge” (represented by 0). A single such binary value is known as a bit of information, and eight bits equal one byte. Since it is possible to create elaborate patterns by combining bits and bytes, it is also possible to translate language (symbols) and many thought processes into binary code.

For example, the American Standard Code for Information Interchange (ASCII) maps 128 common characters (unaccented Latin letters, digits, punctuation marks, and some control codes) to seven-bit binary codes, allowing alphabetic information to be transmitted electronically. Extended versions of ASCII use a full eight-bit byte to represent 256 characters. As letters are typed, they are stored as sequences of ones and zeroes. For example, “Hi!” in binary ASCII code is “1001000 1101001 0100001.” Bits and bytes are thus used as basic units of storage for computers.
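A minimal Python sketch can reproduce the conversion of “Hi!” described above. It relies only on Python’s built-in str.encode and format functions; the helper name to_ascii_binary is illustrative, not a standard function.

```python
def to_ascii_binary(text):
    """Return each character's 7-bit ASCII code as a string of 1s and 0s."""
    # encode("ascii") raises UnicodeEncodeError for characters outside ASCII.
    codes = text.encode("ascii")
    # Iterating over a bytes object yields integers (H -> 72, i -> 105, ! -> 33).
    return " ".join(format(code, "07b") for code in codes)

print(to_ascii_binary("Hi!"))  # 1001000 1101001 0100001

# On disk, each character occupies one full 8-bit byte, padded with a leading 0.
print(" ".join(format(b, "08b") for b in "Hi!".encode("ascii")))  # 01001000 01101001 00100001
```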

For further information, see “How the Computer Works” in A Companion to the Digital Humanities.

Programming Languages

Since binary codes are unwieldy for humans, information can be relayed to a computer using artificially designed languages made up of more elaborate sets of symbols that correspond to binary patterns. These symbols may also represent algorithms–lists of sequential instructions–which can likewise be translated into binary code. These artificial languages are known as programming languages, and an individual set of procedures written in a programming language is known as a program. Programming languages vary in their resemblance to natural human languages: the further removed they are from the details of the computer hardware, the more likely they are to resemble natural language. Such programming languages are often referred to as high-level programming languages. High-level programming languages characteristically have strong “abstraction,” in which the way the program is written reflects its meaning (in our terms) rather than the method of its implementation (what the computer understands). In many cases, programs are “compiled,” that is, translated from a high-level programming language into a low-level one before execution. In effect, the program’s semantics (its meaning) serves as a code for a procedure that the machine actually carries out.
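Python is itself a high-level language, and its standard dis module offers a glimpse of the translation described above. The sketch below defines a one-line function and then prints the lower-level instructions it was compiled into. One caveat: CPython compiles source code to bytecode for its own virtual machine, which is one step down from the source but not the processor’s native machine code, so this is an analogy for the full journey down to the hardware. The function name greet is hypothetical.

```python
import dis

def greet(name):
    # High-level: this line reads almost like an English instruction.
    return "Hello, " + name

print(greet("world"))  # Hello, world

# Low-level: list the bytecode instructions CPython compiled greet() into.
# The exact instruction names differ between Python versions.
dis.dis(greet)
```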

Markup Languages

In the early days of the printing press, printers developed a system of annotating manuscripts to convey how they should appear in the final printed version. This “marking up” of the manuscript involved codes or symbols whose semantics were aimed not at implementation but at presentation. Markup languages are also used to present information on computer screens; the best known is HTML, which was designed for displaying web pages. Embedding markup codes in texts is known as text encoding. Computer markup languages are not always used for presentational purposes, however. Sometimes they encode information that describes what the content is, rather than how it appears. In this case, the markup is considered semantic. Markup that describes how the content appears is considered stylistic.
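The difference between stylistic and semantic markup matters as soon as a program reads the text. In the Python sketch below, the same sentence is marked up two ways; the element names (i, title, emph) are illustrative choices rather than a full standard, though the Text Encoding Initiative (TEI) uses similar names. With stylistic markup, a program looking for italics cannot tell a book title from an emphasized word. With semantic markup, it can pick out the title directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the same sentence encoded two ways.
stylistic = "<p>I reread <i>Moby-Dick</i> and it was <i>wonderful</i>.</p>"
semantic = "<p>I reread <title>Moby-Dick</title> and it was <emph>wonderful</emph>.</p>"

def italicized(xml_text):
    """Return the text of every <i> element (how things look)."""
    return [el.text for el in ET.fromstring(xml_text).iter("i")]

def titles(xml_text):
    """Return the text of every <title> element (what things are)."""
    return [el.text for el in ET.fromstring(xml_text).iter("title")]

print(italicized(stylistic))  # ['Moby-Dick', 'wonderful']: title and emphasis look alike
print(titles(semantic))       # ['Moby-Dick']: only the title is found
```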

Programming, Scripting, and Coding

Sometimes rather arbitrary terms are used for the acts of writing computer programs and encoding texts. Scripts are short, relatively simple programs, and the term scripting is often preferred to programming for the writing of such programs, especially when they are written in more abstract, high-level languages (often called scripting languages). Coding (the act of writing the code for a program or script) is often used as a synonym for programming or scripting. However, it is also used as a synonym for text encoding using a markup language.

What Should a Digital Humanist Know?

There is currently a debate about whether or not a Digital Humanist should know how to code in any of the above senses. This discussion does not aim to settle the matter but instead provides the following observations by way of contribution to the debate:

  1. Some knowledge of how computers operate and are operated by humans would seem to be part of basic technological and information literacy in the twenty-first century.
  2. Knowledge of how texts are manipulated by codes, especially through text encoding, can help users to perform many general tasks and can help scholars translate specific Humanities questions into the digital sphere more effectively.
  3. A basic understanding of the principles of coding can help Digital Humanists collaborate with those who are assigned the tasks of implementing the technical side of their projects.
  4. Students in the Digital Humanities have fewer options for acquiring coding knowledge and skills if they are not given some basic, formal training on which to build.

Can you call yourself a Digital Humanist without being actively involved in coding? No doubt. Can you call yourself a good Digital Humanist without any knowledge of coding? Probably not.

Coding for the Web I: Server-Side and Client-Side Scripting

In order to work, web pages require a combination of programming and markup languages. Information is exchanged between the server, the computer where the information is stored, and the client, the computer used by the person viewing the web page. Programs are required for the client computer to request the web page’s file from the server and for the server to send it, using a mutually understood set of rules called a protocol. Once the client has received the file, it must interpret the file’s markup in order to display the web page. On the client side, these functions are typically performed by a web browser. There are a number of different protocols for transferring files; web browsers use one called Hypertext Transfer Protocol (HTTP) to request web pages. Other protocols, such as File Transfer Protocol (FTP), may be used to transfer files without displaying them.
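To make the idea of a protocol concrete, here is a Python sketch of the text a client sends when it asks a server for a page over HTTP. It is deliberately simplified: real browsers add many more header lines. The URL http://www.example.org/clock is hypothetical (example.org is a domain reserved for examples), and build_get_request is an illustrative helper, not part of any library.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

print(build_get_request("http://www.example.org/clock"))
```

The first line names the action (GET), the file being requested, and the protocol version; the Host line tells the server which website the request is for, since one server may host many.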

When a server receives a request for a file, it can perform any number of actions defined by a script in the requested file. This is known as server-side scripting. For instance, a user might request a file called “clock.” This file does not contain the time but instead has an instruction for the server to check the current time when the script is executed and write it into the page before the page is sent to the client. When the user views the page, they see a clock showing the time at which they made the request (with perhaps a delay of a few milliseconds).
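The “clock” example can be sketched with Python’s standard http.server module. This is an illustration under stated assumptions, not how production sites are built: the page layout, the /clock path, and the names render_clock_page and run are hypothetical, and the Python documentation itself recommends against using http.server in production.

```python
from datetime import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_clock_page(now):
    """Build the HTML for the hypothetical "clock" page at a given moment."""
    return (
        "<!DOCTYPE html>\n"
        "<html><body>"
        f"<p>The time on the server is {now:%H:%M:%S}.</p>"
        "</body></html>"
    )

class ClockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/clock":
            self.send_error(404, "Not Found")
            return
        # The server-side "script": the time is looked up only when the
        # request arrives, then written into the page before it is sent.
        body = render_clock_page(datetime.now()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port=8000):
    """Serve http://localhost:<port>/clock until interrupted with Ctrl+C."""
    HTTPServer(("localhost", port), ClockHandler).serve_forever()
```

After calling run(), visiting http://localhost:8000/clock in a browser shows the server’s time at the moment of the request; reloading the page produces a fresh time, because the HTML is rebuilt for every request.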

Since the user is viewing the file in a web browser, which is a complex program, the browser can also be used to insert the time. That is, the user’s client requests a file called “clock,” the server sends the file as it is, and a script in the file has the browser fetch the local time from the user’s computer system and insert it into the web page. This is called client-side scripting.

Typical server-side scripting languages are PHP (the most popular for the web), Perl, Java, and Python. The main client-side scripting language used for the web is JavaScript. Each type of scripting has its advantages and limitations. For instance, a client-side script can respond immediately to user actions, but it cannot, on its own, store user-submitted data on the server after the web page is closed. Server-side scripts can store or manipulate user-submitted information, but the page must be reloaded in order to provide a response to user actions. Recently, a technique known as Ajax (Asynchronous JavaScript and XML) has been developed to combine the two approaches: a client-side script exchanges data with the server in the background, so that the page can respond without being reloaded. Ajax is becoming increasingly popular but is not appropriate for all circumstances.

Coding for the Web II: Text Markup using HTML and CSS

Regardless of any scripting that takes place, the page is delivered to the web browser in a markup language called Hypertext Markup Language (HTML). With server-side scripting, the HTML markup is written by the script before it is sent to the client. With client-side scripting, the HTML markup sent to the client is then rewritten by the client’s browser.

HTML markup codes contain a variety of types of information, including the text’s structure and formatting. For historical reasons, HTML has fairly primitive formatting capabilities. As a result, it is often used in tandem with a specialized style sheet language for formatting called Cascading Style Sheets (CSS). CSS consists of sets of rules that tell the browser how to display elements based on their HTML codes. Different browsers do not always display HTML elements the same way, and CSS can sometimes be used to ensure consistency. However, browsers do not always implement CSS in the same way either, so this is not always successful. Web browser developers are placing increasing emphasis on adherence to common standards.

A third form of markup language has been developed for the web that places greater emphasis on text semantics than on appearance. This markup language is called Extensible Markup Language (XML). The purpose of XML is to render the meaning of the text unambiguous, regardless of how it is represented in an individual client or application.
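One way to see what “regardless of how it is represented” means in practice is that a single XML record can be turned into different presentations by different programs. The sketch below uses Python’s standard xml.etree.ElementTree module and a hypothetical poem record whose element names are invented for this example. It renders the same record once as an HTML fragment and once as a plain-text citation: the meaning stays in the XML, while each rendering decides the appearance.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical XML record; the element names are invented for this example.
record = ET.fromstring(
    "<poem>"
    "<title>Ozymandias</title>"
    "<author>Percy Bysshe Shelley</author>"
    "</poem>"
)

def as_web_page(poem):
    """Render the record as an HTML fragment for a browser."""
    title = escape(poem.findtext("title"))
    author = escape(poem.findtext("author"))
    return f"<h1>{title}</h1><p>by {author}</p>"

def as_citation(poem):
    """Render the same record as a plain-text citation."""
    return f'{poem.findtext("author")}, "{poem.findtext("title")}"'

print(as_web_page(record))  # <h1>Ozymandias</h1><p>by Percy Bysshe Shelley</p>
print(as_citation(record))  # Percy Bysshe Shelley, "Ozymandias"
```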

Creating Web Pages

Many of the tasks necessary to render a simple web page are not required of the individual user. Servers require setting up and maintenance, but that is normally taken care of by a “host” entity (such as campus IT or an internet service provider). The infrastructure transferring data between the host server and your local computer may be installed and maintained by your cable company. A company, such as Microsoft, or a foundation, such as Mozilla, may have programmed the web browser used to interpret and display the HTML files and JavaScript programs you load.

All this works fairly seamlessly for users with little to no training. But if you want to write your own web pages, you need to compose them in a form that can be read by a browser. That means writing the HTML markup (and, for more complex pages, CSS and/or JavaScript). If you are creating a web application, you may also have to write some server-side code. Believe it or not, writing markup is what you are doing whenever you format a text in a word processor. The word processor merely hides the markup you are creating when you, say, make text bold, and instead displays the visual effect you are trying to achieve. This is known as What You See Is What You Get (WYSIWYG). Many word processors today actually store documents in XML-based formats, such as Microsoft Word’s .docx format and the OpenDocument format. There are WYSIWYG editors for HTML, such as Dreamweaver or WordPress’s visual editor, but they have limitations. Even if you use one, you may have to get down and dirty with the code in order to achieve the effect you want.

HTML can be edited in any plain-text editor. Two of the best are Notepad++ for Windows and TextWrangler for Mac. You may wish to download one of these programs to do your coding. You can use Microsoft Word, but you will have to save your files as plain text rather than in the default format. Also, beware: Word’s autocorrect functions and curly quotes can mess up your code.
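If you suspect that autocorrect has slipped curly quotes into a file, a short script can find and straighten them. The sketch below is a hypothetical helper, not a feature of any editor. Note that it straightens every curly quote, including apostrophes in the prose, so review the result before saving over your file.

```python
# Typographic characters that autocorrect commonly substitutes, mapped to
# the straight ASCII characters that HTML attribute values expect.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
}

def find_smart_quotes(text):
    """Return (line, column, character) for every curly quote in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((line_no, col, ch))
    return hits

def straighten_quotes(text):
    """Replace every curly quote with its straight equivalent."""
    return text.translate(str.maketrans(SMART_QUOTES))

broken = "<a href=\u201cpage.html\u201d>Next</a>"
for line_no, col, ch in find_smart_quotes(broken):
    print(f"line {line_no}, column {col}: curly quote")
print(straighten_quotes(broken))  # <a href="page.html">Next</a>
```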

Once you have chosen your editor, you are a few steps away from creating a simple web page. Here’s what you have left to do:

  1. Learn HTML (and, eventually, CSS).
  2. Write the code for your web page and save it to your computer’s hard drive.
  3. Upload the file to a server connected to the internet using FTP (or a similar file-transfer protocol).

These steps will naturally take some time to master. However, you can get yourself started by following the instructions in Uploading Files to the CSUN Server. Go ahead and try it.
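For readers who would like to script these steps, steps 2 and 3 can be sketched in Python. The page content, the file name index.html, and the helper names save_page and upload_page are assumptions for illustration. The upload uses plain FTP through the standard ftplib module; many servers accept only secure alternatives such as SFTP, which ftplib does not support, so your host’s instructions take precedence.

```python
from ftplib import FTP
from pathlib import Path

# Step 2: a minimal HTML page with one embedded CSS rule (hypothetical content).
PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>My First Page</title>
  <style>
    h1 { color: navy; }
  </style>
</head>
<body>
  <h1>Hello, world!</h1>
  <p>This page was written by hand.</p>
</body>
</html>
"""

def save_page(path="index.html"):
    """Write the page to the local hard drive and return its path."""
    target = Path(path)
    target.write_text(PAGE, encoding="utf-8")
    return target

def upload_page(path, host, user, password):
    """Step 3 (sketch): send the file to a server over plain FTP.

    host, user and password are placeholders; ask your host which
    protocol and credentials to use.
    """
    with FTP(host) as ftp, open(path, "rb") as fh:
        ftp.login(user=user, passwd=password)
        ftp.storbinary(f"STOR {Path(path).name}", fh)
```

Calling save_page() writes index.html to the current folder; opening that file in a browser lets you check the page before uploading it.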

What Next?

Before beginning to learn HTML, it is useful to look ahead and think about how the intricacies of text encoding affect the types of questions Digital Humanists are interested in. This is discussed in the article Textual Markup and the Study of Literature.
