@reiver

ASCII

ASCII (short for American Standard Code for Information Interchange) is a popular 7-bit character encoding made primarily for the U.S. version of the English language alphabet.

ASCII is also sometimes used by people writing other variants of the English language, as well as other (non-English) languages that can be transliterated into the English language alphabet.

In some contexts, when people talk about “text” they are referring to ASCII.

There are 128 ASCII characters. This is because there are 128 (2 to the power of 7) different binary strings that can be expressed using 7 bits.
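As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are illustrative only and do not come from any ASCII standard.

```python
# Number of distinct bit strings of length 7.
BITS = 7
count = 2 ** BITS

# The smallest and largest 7-bit values, written as 7-digit binary strings.
lowest = format(0, "07b")
highest = format(count - 1, "07b")

print(count, lowest, highest)  # 128 0000000 1111111
```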

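Unicode, especially the UTF-8 encoding of Unicode, can be considered an extension of ASCII: the first 128 Unicode code points are the ASCII characters, and UTF-8 encodes each of them as the same single byte.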
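One way to see this relationship is to encode the same text both ways and compare the bytes. The sketch below uses only Python's built-in `str.encode`; the sample strings are arbitrary choices for illustration.

```python
text = "Hello, ASCII!"

# Pure ASCII text produces identical bytes under both encodings.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same = ascii_bytes == utf8_bytes

# A character outside ASCII needs more than one byte in UTF-8,
# and every one of those bytes has its high bit set (value >= 0x80),
# so it cannot be mistaken for an ASCII character.
e_acute = "é".encode("utf-8")
all_high = all(b >= 0x80 for b in e_acute)

print(same, len(e_acute), all_high)  # True 2 True
```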

English Alphabet

ASCII includes symbols for all the letters in the U.S. version of the English language alphabet, in both their upper case and lower case forms.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

a b c d e f g h i j k l m n o p q r s t u v w x y z

Numbers

ASCII also includes symbols for the Arabic numerals that are commonly used with the English language.

0 1 2 3 4 5 6 7 8 9

Punctuation, Symbols

ASCII also includes punctuation and other symbols relevant to the U.S. version of the English language.

! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

Space

ASCII also includes a space character.

Control Characters

In addition to these, ASCII includes a number of what are called control characters.

NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US DEL
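In the ASCII table, the control characters occupy codes 0 through 31, plus 127 (DEL). A minimal Python sketch of a test for them follows; `is_ascii_control` is a hypothetical helper name, not a standard library function.

```python
def is_ascii_control(ch: str) -> bool:
    """Return True if ch is one of the ASCII control characters."""
    code = ord(ch)
    return code < 32 or code == 127

# Collect the codes of every control character in the 7-bit range.
controls = [code for code in range(128) if is_ascii_control(chr(code))]

print(len(controls))  # 33, matching the 33 abbreviations listed above
```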

ASCII Table

The following table lists all 128 ASCII characters, along with relevant extra information about each one.

Decimal Hexadecimal 7-Bit Binary HTML Caret Escape Character Name Abbreviation
0 0x00 0000000 &#0; ^@ Null NUL
1 0x01 0000001 &#1; ^A Start of Heading SOH
2 0x02 0000010 &#2; ^B Start of Text STX
3 0x03 0000011 &#3; ^C End of Text ETX
4 0x04 0000100 &#4; ^D End of Transmission EOT
5 0x05 0000101 &#5; ^E Enquiry ENQ
6 0x06 0000110 &#6; ^F Acknowledge ACK
7 0x07 0000111 &#7; ^G \a Bell BEL
8 0x08 0001000 &#8; ^H \b Backspace BS
9 0x09 0001001 &#9; ^I \t Horizontal Tab HT
10 0x0A 0001010 &#10; ^J \n Line Feed LF
11 0x0B 0001011 &#11; ^K \v Vertical Tab VT
12 0x0C 0001100 &#12; ^L \f Form Feed FF
13 0x0D 0001101 &#13; ^M \r Carriage Return CR
14 0x0E 0001110 &#14; ^N Shift Out SO
15 0x0F 0001111 &#15; ^O Shift In SI
16 0x10 0010000 &#16; ^P Data Link Escape DLE
17 0x11 0010001 &#17; ^Q Device Control 1 DC1
18 0x12 0010010 &#18; ^R Device Control 2 DC2
19 0x13 0010011 &#19; ^S Device Control 3 DC3
20 0x14 0010100 &#20; ^T Device Control 4 DC4
21 0x15 0010101 &#21; ^U Negative Acknowledge NAK
22 0x16 0010110 &#22; ^V Synchronize SYN
23 0x17 0010111 &#23; ^W End of Transmission Block ETB
24 0x18 0011000 &#24; ^X Cancel CAN
25 0x19 0011001 &#25; ^Y End of Medium EM
26 0x1A 0011010 &#26; ^Z Substitute SUB
27 0x1B 0011011 &#27; ^[ Escape ESC
28 0x1C 0011100 &#28; ^\ File Separator FS
29 0x1D 0011101 &#29; ^] Group Separator GS
30 0x1E 0011110 &#30; ^^ Record Separator RS
31 0x1F 0011111 &#31; ^_ Unit Separator US
32 0x20 0100000 &#32; Space SP
119 0x77 1110111 &#119; w Latin Small Letter W
120 0x78 1111000 &#120; x Latin Small Letter X
121 0x79 1111001 &#121; y Latin Small Letter Y
122 0x7A 1111010 &#122; z Latin Small Letter Z
123 0x7B 1111011 &#123; { Left Curly Bracket
124 0x7C 1111100 &#124; | Vertical Line
125 0x7D 1111101 &#125; } Right Curly Bracket
126 0x7E 1111110 &#126; ~ Tilde
127 0x7F 1111111 &#127; ^? Delete DEL
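The numeric columns of the table can be derived from the decimal code alone. The Python sketch below shows one way to do so; `ascii_row` and `caret` are hypothetical helper names. The caret column follows the usual convention of flipping the bit with value 64, which reproduces the entries shown in the table (for example `^@` for 0 and `^?` for 127).

```python
def ascii_row(code: int) -> tuple:
    """Return (decimal, hexadecimal, 7-bit binary, HTML entity) for a code."""
    return (str(code), f"0x{code:02X}", f"{code:07b}", f"&#{code};")

def caret(code: int) -> str:
    """Caret notation for a control character: flip the bit with value 64."""
    return "^" + chr(code ^ 0x40)

print(ascii_row(10))   # ('10', '0x0A', '0001010', '&#10;')
print(caret(27))       # ^[
```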

ASCII Table Structure

One can see some structure in the ASCII table if it is divided into 4 equal pieces of length 32.

Decimal 7-Bit Binary Character   Decimal 7-Bit Binary Character   Decimal 7-Bit Binary Character   Decimal 7-Bit Binary Character
0 0000000 NUL   32 0100000 SPACE   64 1000000 @   96 1100000 `
1 0000001 SOH   33 0100001 !   65 1000001 A   97 1100001 a
2 0000010 STX   34 0100010 "   66 1000010 B   98 1100010 b
3 0000011 ETX   35 0100011 #   67 1000011 C   99 1100011 c

The first thing to notice is that the upper case letters in the 3rd column, and the lower case letters in the 4th column, are next to each other.

The second thing to notice is that most of the control characters are in the first column.

The third thing to notice is that, within each row, the least-significant 5 bits of the binary string are exactly the same in all four columns.
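These observations can be checked with a few bit operations. In the Python sketch below, `low5` and `toggle_case` are hypothetical helper names; `toggle_case` relies on the fact that an upper case letter and its lower case form differ only in the bit with value 32, and it is only meaningful for the letters A-Z and a-z.

```python
LOW5_MASK = 0b0011111  # keeps only the 5 least-significant bits

def low5(code: int) -> str:
    """Return the 5 least-significant bits of a code as a binary string."""
    return format(code & LOW5_MASK, "05b")

# One row of the four-column layout: SOH, "!", "A", "a".
row = [1, 33, 65, 97]
shared = {low5(code) for code in row}  # a set with one element: "00001"

def toggle_case(letter: str) -> str:
    """Flip the bit with value 32. Only meaningful for A-Z and a-z."""
    return chr(ord(letter) ^ 0x20)

print(shared, toggle_case("A"), toggle_case("z"))
```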

-- Mirza Charles Iliya Krempeaux