Text files, such as source code files for computer programs, are constructed from characters. In C++, those characters are members of character sets, which include the following:
source character set | a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " ' |
execution character set | source character set members, plus the characters denoted by escape sequences and universal character names |
In addition, there are five whitespace characters used to delimit programming constructs: space, horizontal tab, vertical tab, form feed and new line.
Characters can be categorised in the following ways:
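An alphanumeric character is one of the following: an alphabetic character or a numeric character.
An alphabetic character is one of the following: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A numeric character, digit or decimal digit is one of the following: 0 1 2 3 4 5 6 7 8 9
An octal digit is one of the following: 0 1 2 3 4 5 6 7
A hexadecimal digit is one of the following: 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
A non-zero digit is one of the following: 1 2 3 4 5 6 7 8 9
A non-digit character is one of the following: an alphabetic character or the underscore _
A sign character is one of the following: + -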
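As a rough illustration, the character categories can be written as small predicate functions. The sketch below assumes the usual C++ grammar definitions (letters a to z and A to Z, digits 0 to 9, underscore as the extra non-digit) and the default "C" locale; all function names are hypothetical and exist only for this example.

```cpp
#include <cctype>

// Hypothetical helpers, not part of any library.
// <cctype> functions require a value representable as unsigned char,
// hence the casts before calling them.
inline bool is_alphabetic(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
inline bool is_digit(char c)         { return c >= '0' && c <= '9'; }
inline bool is_octal_digit(char c)   { return c >= '0' && c <= '7'; }
inline bool is_nonzero_digit(char c) { return c >= '1' && c <= '9'; }
inline bool is_hex_digit(char c) {
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}
// A non-digit is a letter or the underscore.
inline bool is_nondigit(char c)      { return is_alphabetic(c) || c == '_'; }
inline bool is_sign(char c)          { return c == '+' || c == '-'; }
inline bool is_alphanumeric(char c)  { return is_alphabetic(c) || is_digit(c); }
```

The range comparisons for digits are safe because C++ guarantees that the characters 0 to 9 have consecutive values; no such guarantee exists for letters, which is why the letter tests go through `<cctype>`.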
Escape sequences are used in programming languages to represent characters that have special meaning or are not visible. The escape sequences available and their meanings vary between languages. Escape sequences used in C++ are as follows:
simple escape sequence | one of: \' \" \? \\ \0 \a \b \f \n \r \t \v |
octal escape sequence | An octal escape sequence \ooo consists of a backslash followed by one, two or three octal digits specifying the character value. The sequence ends after the third octal digit or at the first character that is not an octal digit, whichever comes first. |
hexadecimal escape sequence | A hexadecimal escape sequence \xhhh consists of a backslash followed by x followed by one or more hexadecimal digits specifying the character value. There is no limit to the number of digits in a hexadecimal sequence. The sequence is terminated by the first character that is not a hexadecimal digit. |
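The termination rules for numeric escape sequences are easy to get wrong, so a small compile-time sketch may help. It assumes an ASCII-compatible execution character set (so that value 65 is A and 83 is S); the variable names are hypothetical.

```cpp
// Assumption: ASCII-compatible execution character set.

// Octal 101 and hexadecimal 41 both equal decimal 65, which is 'A' in ASCII.
constexpr char from_octal = '\101';
constexpr char from_hex   = '\x41';
static_assert(from_octal == 'A', "octal 101 is 65");
static_assert(from_hex == 'A', "hex 41 is 65");

// An octal escape takes at most three digits, so "\1234" is the
// character '\123' (decimal 83, 'S') followed by the ordinary character '4'.
constexpr char octal_then_digit[] = "\1234";
static_assert(sizeof(octal_then_digit) == 3, "two characters plus the terminator");

// A hexadecimal escape consumes every hex digit that follows it, so the
// literal is split in two to stop the escape after "41".
constexpr char hex_then_text[] = "\x41" "BC";
static_assert(sizeof(hex_then_text) == 4, "A, B, C plus the terminator");
```

Splitting the literal works because escape sequences are translated before adjacent string literals are concatenated.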
The meaning of each simple escape sequence is as follows:
\' | single quote |
\" | double quote |
\? | question mark |
\\ | backslash |
\0 | null (binary 0); formally an octal escape sequence with a single digit |
\a | alert |
\b | backspace |
\f | form feed |
\n | new line |
\r | carriage return |
\t | horizontal tab |
\v | vertical tab |
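One way to see the table in action is to map each special character back to the way it is spelled in source code. The function below is a minimal sketch with a hypothetical name, `to_escape`; it is not a standard library facility.

```cpp
#include <string>

// Hypothetical helper: returns the escape sequence spelling of c,
// or c itself when it has no simple escape sequence.
std::string to_escape(char c) {
    switch (c) {
        case '\'': return "\\'";   // single quote
        case '\"': return "\\\"";  // double quote
        case '\?': return "\\?";   // question mark
        case '\\': return "\\\\";  // backslash
        case '\0': return "\\0";   // null
        case '\a': return "\\a";   // alert
        case '\b': return "\\b";   // backspace
        case '\f': return "\\f";   // form feed
        case '\n': return "\\n";   // new line
        case '\r': return "\\r";   // carriage return
        case '\t': return "\\t";   // horizontal tab
        case '\v': return "\\v";   // vertical tab
        default:   return std::string(1, c);
    }
}
```

Calling `to_escape('\n')` yields the two-character string made of a backslash and the letter n, while `to_escape('x')` returns `"x"` unchanged.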
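A universal character name provides a way to name a character that is not in the source character set and has no escape sequence. A universal character name takes one of the following forms, where each n is a hexadecimal digit:
\unnnn | a backslash followed by u followed by exactly four hexadecimal digits |
\Unnnnnnnn | a backslash followed by U followed by exactly eight hexadecimal digits |
For example, \u20AC and \U000020AC both name the euro sign (€).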
The character represented by universal character name \Unnnnnnnn is the character whose character short name in ISO/IEC 10646 is nnnnnnnn. The character represented by universal character name \unnnn is the character whose character short name in ISO/IEC 10646 is 0000nnnn.
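The relationship between the short and long forms can be checked at compile time. This sketch assumes a C++11 or later compiler and uses the euro sign (code point 20AC) purely as an illustrative character; the variable name is hypothetical.

```cpp
// \unnnn is shorthand for \U0000nnnn, so both literals name the same character.
static_assert(U'\u20AC' == U'\U000020AC', "short and long forms agree");

// In a char32_t literal the value is the ISO/IEC 10646 code point itself.
constexpr char32_t euro = U'\u20AC';
static_assert(euro == 0x20AC, "code point of the euro sign");

// In a UTF-8 string literal the same character occupies three code units,
// plus one more for the terminating null.
static_assert(sizeof(u8"\u20AC") == 4, "three UTF-8 code units and a terminator");
```

The value stored for a universal character name in an ordinary narrow literal depends on the execution character set, which is why the sketch uses `U` and `u8` literals whose encodings are fixed.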
The American Standard Code for Information Interchange (ASCII) is a 7-bit character encoding used to represent text based on the English alphabet.
The ASCII codes are as follows:
Character | Description | Decimal | Hexadecimal | Binary |
---|---|---|---|---|
NUL | null | 0 | 00 | 00000000 |
SOH | start of heading | 1 | 01 | 00000001 |
STX | start of text | 2 | 02 | 00000010 |
ETX | end of text | 3 | 03 | 00000011 |
EOT | end of transmission | 4 | 04 | 00000100 |
ENQ | enquiry | 5 | 05 | 00000101 |
ACK | acknowledge | 6 | 06 | 00000110 |
BEL | bell | 7 | 07 | 00000111 |
BS | backspace | 8 | 08 | 00001000 |
HT | horizontal tabulation | 9 | 09 | 00001001 |
LF | line feed | 10 | 0A | 00001010 |
VT | vertical tabulation | 11 | 0B | 00001011 |
FF | form feed | 12 | 0C | 00001100 |
CR | carriage return | 13 | 0D | 00001101 |
SO | shift out | 14 | 0E | 00001110 |
SI | shift in | 15 | 0F | 00001111 |
DLE | data link escape | 16 | 10 | 00010000 |
DC1 | device control 1 | 17 | 11 | 00010001 |
DC2 | device control 2 | 18 | 12 | 00010010 |
DC3 | device control 3 | 19 | 13 | 00010011 |
DC4 | device control 4 | 20 | 14 | 00010100 |
NAK | negative acknowledge | 21 | 15 | 00010101 |
SYN | synchronous idle | 22 | 16 | 00010110 |
ETB | end of transmission block | 23 | 17 | 00010111 |
CAN | cancel | 24 | 18 | 00011000 |
EM | end of medium | 25 | 19 | 00011001 |
SUB | substitute | 26 | 1A | 00011010 |
ESC | escape | 27 | 1B | 00011011 |
FS | file separator | 28 | 1C | 00011100 |
GS | group separator | 29 | 1D | 00011101 |
RS | record separator | 30 | 1E | 00011110 |
US | unit separator | 31 | 1F | 00011111 |
SPACE | space | 32 | 20 | 00100000 |
! | | 33 | 21 | 00100001 |
" | | 34 | 22 | 00100010 |
# | | 35 | 23 | 00100011 |
$ | | 36 | 24 | 00100100 |
% | | 37 | 25 | 00100101 |
& | | 38 | 26 | 00100110 |
' | | 39 | 27 | 00100111 |
( | | 40 | 28 | 00101000 |
) | | 41 | 29 | 00101001 |
* | | 42 | 2A | 00101010 |
+ | | 43 | 2B | 00101011 |
, | | 44 | 2C | 00101100 |
- | | 45 | 2D | 00101101 |
. | | 46 | 2E | 00101110 |
/ | | 47 | 2F | 00101111 |
0 | | 48 | 30 | 00110000 |
1 | | 49 | 31 | 00110001 |
2 | | 50 | 32 | 00110010 |
3 | | 51 | 33 | 00110011 |
4 | | 52 | 34 | 00110100 |
5 | | 53 | 35 | 00110101 |
6 | | 54 | 36 | 00110110 |
7 | | 55 | 37 | 00110111 |
8 | | 56 | 38 | 00111000 |
9 | | 57 | 39 | 00111001 |
: | | 58 | 3A | 00111010 |
; | | 59 | 3B | 00111011 |
< | | 60 | 3C | 00111100 |
= | | 61 | 3D | 00111101 |
> | | 62 | 3E | 00111110 |
? | | 63 | 3F | 00111111 |
@ | | 64 | 40 | 01000000 |
A | | 65 | 41 | 01000001 |
B | | 66 | 42 | 01000010 |
C | | 67 | 43 | 01000011 |
D | | 68 | 44 | 01000100 |
E | | 69 | 45 | 01000101 |
F | | 70 | 46 | 01000110 |
G | | 71 | 47 | 01000111 |
H | | 72 | 48 | 01001000 |
I | | 73 | 49 | 01001001 |
J | | 74 | 4A | 01001010 |
K | | 75 | 4B | 01001011 |
L | | 76 | 4C | 01001100 |
M | | 77 | 4D | 01001101 |
N | | 78 | 4E | 01001110 |
O | | 79 | 4F | 01001111 |
P | | 80 | 50 | 01010000 |
Q | | 81 | 51 | 01010001 |
R | | 82 | 52 | 01010010 |
S | | 83 | 53 | 01010011 |
T | | 84 | 54 | 01010100 |
U | | 85 | 55 | 01010101 |
V | | 86 | 56 | 01010110 |
W | | 87 | 57 | 01010111 |
X | | 88 | 58 | 01011000 |
Y | | 89 | 59 | 01011001 |
Z | | 90 | 5A | 01011010 |
[ | | 91 | 5B | 01011011 |
\ | | 92 | 5C | 01011100 |
] | | 93 | 5D | 01011101 |
^ | | 94 | 5E | 01011110 |
_ | | 95 | 5F | 01011111 |
` | | 96 | 60 | 01100000 |
a | | 97 | 61 | 01100001 |
b | | 98 | 62 | 01100010 |
c | | 99 | 63 | 01100011 |
d | | 100 | 64 | 01100100 |
e | | 101 | 65 | 01100101 |
f | | 102 | 66 | 01100110 |
g | | 103 | 67 | 01100111 |
h | | 104 | 68 | 01101000 |
i | | 105 | 69 | 01101001 |
j | | 106 | 6A | 01101010 |
k | | 107 | 6B | 01101011 |
l | | 108 | 6C | 01101100 |
m | | 109 | 6D | 01101101 |
n | | 110 | 6E | 01101110 |
o | | 111 | 6F | 01101111 |
p | | 112 | 70 | 01110000 |
q | | 113 | 71 | 01110001 |
r | | 114 | 72 | 01110010 |
s | | 115 | 73 | 01110011 |
t | | 116 | 74 | 01110100 |
u | | 117 | 75 | 01110101 |
v | | 118 | 76 | 01110110 |
w | | 119 | 77 | 01110111 |
x | | 120 | 78 | 01111000 |
y | | 121 | 79 | 01111001 |
z | | 122 | 7A | 01111010 |
{ | | 123 | 7B | 01111011 |
\| | | 124 | 7C | 01111100 |
} | | 125 | 7D | 01111101 |
~ | | 126 | 7E | 01111110 |
DEL | delete | 127 | 7F | 01111111 |
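The decimal, hexadecimal and binary columns are three spellings of the same number, so any row can be regenerated from the code alone. The sketch below uses a hypothetical helper named `ascii_row` and mirrors the column layout of the table.

```cpp
#include <bitset>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a code as "decimal | hex | binary",
// with two uppercase hex digits and eight binary digits.
std::string ascii_row(unsigned char code) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%u | %02X | ",
                  static_cast<unsigned>(code), static_cast<unsigned>(code));
    // std::bitset<8> renders the value as exactly eight binary digits.
    return buf + std::bitset<8>(code).to_string();
}
```

For instance, `ascii_row(65)` produces `65 | 41 | 01000001`, matching the row for the letter A.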