Text files, such as source code files for computer programs, are constructed from characters.

In C++, the characters used in source code are members of character sets. The character sets in C++ include the following:

source character set: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
execution character set: the members of the source character set, escape sequences and universal character names

In addition, there are five whitespace characters used to delimit programming constructs: space, horizontal tab, vertical tab, form feed and new line.

Characters can be categorised in the following ways:

An alphanumeric character is one of the following:

  • 0 1 2 3 4 5 6 7 8 9
  • a b c d e f g h i j k l m n o p q r s t u v w x y z
  • A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

An alphabetic character is one of the following:

  • a b c d e f g h i j k l m n o p q r s t u v w x y z
  • A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A numeric character, digit or decimal digit is one of the following:

  • 0 1 2 3 4 5 6 7 8 9

An octal digit is one of the following:

  • 0 1 2 3 4 5 6 7

A hexadecimal digit is one of the following:

  • 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

A non-zero digit is one of the following:

  • 1 2 3 4 5 6 7 8 9

A non-digit character is one of the following:

  • universal character name
  • _ (underscore)
  • a b c d e f g h i j k l m n o p q r s t u v w x y z
  • A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A sign character is one of the following:

  • + -
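
The categories above can be sketched as small predicate functions. This is a minimal sketch rather than a definitive implementation: the helper names are hypothetical, only the <cctype> functions are standard, and the default "C" locale is assumed so that std::isalpha accepts only a to z and A to Z. Universal character names, which also count as non-digit characters, are not handled here.

```cpp
#include <cassert>
#include <cctype>

// Helper names below are illustrative; only the <cctype> calls are standard.
// The char is cast to unsigned char first because passing a negative value
// to the <cctype> functions is undefined behaviour.
bool is_alphanumeric(char c)  { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
bool is_alphabetic(char c)    { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool is_decimal_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool is_hex_digit(char c)     { return std::isxdigit(static_cast<unsigned char>(c)) != 0; }

// There is no standard function for these categories, so test the range.
// C++ guarantees that '0' to '9' have consecutive values.
bool is_octal_digit(char c)   { return c >= '0' && c <= '7'; }
bool is_nonzero_digit(char c) { return c >= '1' && c <= '9'; }

// Non-digit: underscore or a letter (universal character names omitted).
bool is_nondigit(char c)      { return c == '_' || is_alphabetic(c); }

bool is_sign(char c)          { return c == '+' || c == '-'; }

// A few spot checks of the categories listed above.
void check_categories() {
    assert(is_alphanumeric('a') && is_alphanumeric('7'));
    assert(!is_alphanumeric('_'));
    assert(is_octal_digit('7') && !is_octal_digit('8'));
    assert(is_hex_digit('F') && !is_hex_digit('g'));
    assert(is_nondigit('_') && !is_nondigit('3'));
    assert(is_sign('+') && is_sign('-'));
}
```

Calling check_categories() from a test or from main exercises each predicate once.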

Escape Sequences

Escape sequences are used in programming languages to represent characters that have a special meaning or are not visible. The escape sequences available, and their meanings, vary between languages. The escape sequences used in C++ are as follows:

simple escape sequence: one of \' \" \? \\ \0 \a \b \f \n \r \t \v
octal escape sequence: \ooo consists of a backslash followed by one, two or three octal digits specifying the character value. The sequence ends after the third octal digit or at the first character that is not an octal digit, whichever comes first.
hexadecimal escape sequence: \xhhh consists of a backslash followed by x followed by one or more hexadecimal digits specifying the character value. There is no limit to the number of digits in a hexadecimal sequence. The sequence is terminated by the first character that is not a hexadecimal digit.

The meaning of each simple escape sequence is as follows:

\'  single quote
\"  double quote
\?  question mark
\\  backslash
\0  null (binary 0)
\a  alert
\b  backspace
\f  form feed
\n  new line
\r  carriage return
\t  horizontal tab
\v  vertical tab
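
The following sketch shows how the three kinds of escape sequence behave inside literals. The function names are hypothetical, and the numeric values assume an ASCII-compatible execution character set, which is common but not guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simple escape sequence: '\n' is a single character.
// Its value is 10 on ASCII-compatible platforms (an assumption here).
int newline_value() { return '\n'; }

// Octal 101 is decimal 65 ('A'); hexadecimal 42 is decimal 66 ('B').
std::string octal_and_hex() { return "\101\x42"; }

// An octal escape takes at most three digits, so the '9' that follows
// \101 is an ordinary character.
std::string octal_then_digit() { return "\1019"; }

// A hexadecimal escape has no digit limit, so "\x41b" would be read as the
// single value 0x41B. Splitting the literal in two ends the escape early;
// adjacent string literals are concatenated by the compiler.
std::string hex_then_letter() { return "\x41" "b"; }

// "a\0b" holds 'a', a null character, 'b' and the terminating null.
std::size_t literal_size() { return sizeof("a\0b"); }

// Spot checks of the behaviour described in the comments above.
void check_escapes() {
    assert(newline_value() == 10);
    assert(octal_and_hex() == "AB");
    assert(octal_then_digit() == "A9");
    assert(hex_then_letter() == "Ab");
    assert(literal_size() == 4);
}
```

Splitting a literal after a hexadecimal escape is one way to stop the escape from consuming a following character that happens to be a hexadecimal digit.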

Universal Character Names

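A universal character name provides a way to name a character that is not in the source character set and has no escape sequence. A universal character name is formed in one of two ways:

  • \unnnn, a backslash followed by u and exactly four hexadecimal digits
  • \Unnnnnnnn, a backslash followed by U and exactly eight hexadecimal digits

For example, \U2468ACE0 has the eight-digit form. This value only illustrates the syntax: it lies above 10FFFF, the highest code point currently defined, so current compilers are likely to reject it.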

The character represented by universal character name \Unnnnnnnn is the character whose character short name in ISO/IEC 10646 is nnnnnnnn. The character represented by universal character name \unnnn is the character whose character short name in ISO/IEC 10646 is 0000nnnn.
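
As an illustration of the two forms, the sketch below stores universal character names in char32_t values. It assumes a C++11 or later compiler (for char32_t and the U prefix), and the function names are hypothetical. The code points were chosen for illustration only: U+00E9 is a lower-case e with an acute accent, and U+1F600 is a code point too large for the four-digit form.

```cpp
#include <cassert>

// Four-digit form: \u00E9 names the character with short name 000000E9.
constexpr char32_t four_digit_form() { return U'\u00E9'; }

// The same character written with the eight-digit form.
constexpr char32_t eight_digit_form() { return U'\U000000E9'; }

// Code points above FFFF can only be written with the eight-digit form.
constexpr char32_t beyond_four_digits() { return U'\U0001F600'; }

// Both spellings name the same character, checked at compile time.
static_assert(U'\u00E9' == U'\U000000E9', "the two forms must agree");

// Run-time spot checks: a char32_t literal holds the code point value.
void check_universal_names() {
    assert(four_digit_form() == 0xE9);
    assert(eight_digit_form() == four_digit_form());
    assert(beyond_four_digits() == 0x1F600);
}
```

Because the names are written entirely with basic source characters, the source file itself stays plain ASCII.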

ASCII

American Standard Code for Information Interchange (ASCII) is a character encoding used to represent text based on the English alphabet.

The ASCII codes are as follows:

Character Description Decimal Hexadecimal Binary
NUL null 0 00 00000000
SOH start of heading 1 01 00000001
STX start of text 2 02 00000010
ETX end of text 3 03 00000011
EOT end of transmission 4 04 00000100
ENQ enquiry 5 05 00000101
ACK acknowledge 6 06 00000110
BEL bell 7 07 00000111
BS backspace 8 08 00001000
HT horizontal tabulation 9 09 00001001
LF line feed 10 0A 00001010
VT vertical tabulation 11 0B 00001011
FF form feed 12 0C 00001100
CR carriage return 13 0D 00001101
SO shift out 14 0E 00001110
SI shift in 15 0F 00001111
DLE data link escape 16 10 00010000
DC1 device control 1 17 11 00010001
DC2 device control 2 18 12 00010010
DC3 device control 3 19 13 00010011
DC4 device control 4 20 14 00010100
NAK negative acknowledge 21 15 00010101
SYN synchronous idle 22 16 00010110
ETB end of transmission block 23 17 00010111
CAN cancel 24 18 00011000
EM end of medium 25 19 00011001
SUB substitute 26 1A 00011010
ESC escape 27 1B 00011011
FS file separator 28 1C 00011100
GS group separator 29 1D 00011101
RS record separator 30 1E 00011110
US unit separator 31 1F 00011111
SPACE 32 20 00100000
! 33 21 00100001
" 34 22 00100010
# 35 23 00100011
$ 36 24 00100100
% 37 25 00100101
& 38 26 00100110
' 39 27 00100111
( 40 28 00101000
) 41 29 00101001
* 42 2A 00101010
+ 43 2B 00101011
, 44 2C 00101100
- 45 2D 00101101
. 46 2E 00101110
/ 47 2F 00101111
0 48 30 00110000
1 49 31 00110001
2 50 32 00110010
3 51 33 00110011
4 52 34 00110100
5 53 35 00110101
6 54 36 00110110
7 55 37 00110111
8 56 38 00111000
9 57 39 00111001
: 58 3A 00111010
; 59 3B 00111011
< 60 3C 00111100
= 61 3D 00111101
> 62 3E 00111110
? 63 3F 00111111
@ 64 40 01000000
A 65 41 01000001
B 66 42 01000010
C 67 43 01000011
D 68 44 01000100
E 69 45 01000101
F 70 46 01000110
G 71 47 01000111
H 72 48 01001000
I 73 49 01001001
J 74 4A 01001010
K 75 4B 01001011
L 76 4C 01001100
M 77 4D 01001101
N 78 4E 01001110
O 79 4F 01001111
P 80 50 01010000
Q 81 51 01010001
R 82 52 01010010
S 83 53 01010011
T 84 54 01010100
U 85 55 01010101
V 86 56 01010110
W 87 57 01010111
X 88 58 01011000
Y 89 59 01011001
Z 90 5A 01011010
[ 91 5B 01011011
\ 92 5C 01011100
] 93 5D 01011101
^ 94 5E 01011110
_ 95 5F 01011111
` 96 60 01100000
a 97 61 01100001
b 98 62 01100010
c 99 63 01100011
d 100 64 01100100
e 101 65 01100101
f 102 66 01100110
g 103 67 01100111
h 104 68 01101000
i 105 69 01101001
j 106 6A 01101010
k 107 6B 01101011
l 108 6C 01101100
m 109 6D 01101101
n 110 6E 01101110
o 111 6F 01101111
p 112 70 01110000
q 113 71 01110001
r 114 72 01110010
s 115 73 01110011
t 116 74 01110100
u 117 75 01110101
v 118 76 01110110
w 119 77 01110111
x 120 78 01111000
y 121 79 01111001
z 122 7A 01111010
{ 123 7B 01111011
| 124 7C 01111100
} 125 7D 01111101
~ 126 7E 01111110
DEL delete 127 7F 01111111
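
The numeric columns of the table can be reproduced for any character with a short helper. This is a sketch under stated assumptions: ascii_row is a hypothetical name, and the results match the table only where the platform's character encoding is ASCII-compatible.

```cpp
#include <bitset>
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats the decimal, hexadecimal and binary
// columns of one table row, separated by single spaces.
std::string ascii_row(char c) {
    const unsigned value = static_cast<unsigned char>(c);
    assert(value < 128 && "ASCII covers the values 0 to 127 only");

    std::ostringstream out;
    out << std::dec << value << ' '
        // Two upper-case hexadecimal digits, zero-padded.
        << std::uppercase << std::hex << std::setw(2) << std::setfill('0')
        << value << ' '
        // Eight binary digits, most significant bit first.
        << std::bitset<8>(value);
    return out.str();
}
```

For example, ascii_row('A') yields "65 41 01000001", which agrees with the row for A above.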
