A computer file is a resource for storing information that is available to a computer program and is usually based on some form of durable data storage.


A file is durable in the sense that it remains available for programs to use after the current program has stopped running. Computer files can be regarded as the modern counterpart of paper documents, which were traditionally kept in office files; this is the source of the term.

File Contents

At the lowest level, many operating systems consider a file simply as a sequence of bytes. At a higher level, where the content of the file is being considered, these bytes may represent various entities, including integer values, text characters, image pixels and audio. It is up to the program using the file to understand the meaning and internal layout of information in it.
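As an illustration of this point, the following Python sketch reads the same four bytes in three ways. The byte values are chosen purely for the example, and the choice of a little-endian 32-bit integer is an assumption made here, not something a file would dictate.

```python
import struct

# Four bytes as they might sit in a file (values chosen for illustration).
raw = bytes([0x48, 0x69, 0x21, 0x00])

# Interpretation 1: one little-endian unsigned 32-bit integer.
as_integer = struct.unpack("<I", raw)[0]

# Interpretation 2: ASCII text, ignoring the trailing NUL byte.
as_text = raw.rstrip(b"\x00").decode("ascii")

# Interpretation 3: four greyscale pixel intensities in the range 0-255.
as_pixels = list(raw)

print(as_integer, as_text, as_pixels)
```

Nothing in the bytes themselves says which reading is intended; that knowledge lives in the program.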

Naming and Organising Files

A file is typically accessed using a name.

A file can be located in a directory. A directory is itself a kind of file, so in this context the term file includes directories. This permits the existence of directory hierarchies, where each directory can contain an arbitrary number of files and other directories. These other directories are known as subdirectories. Subdirectories can contain further files and directories and so on, forming a tree-like structure in which one master directory, or root directory, can contain any number of levels of other directories and files.

The root directory is the top-most directory in a hierarchy, and can be likened to the root of a tree: the starting point from which all branches originate. Unix abstracts this hierarchy: in Unix-like systems, the root directory is denoted by /, and the directory entry itself has no name, its name being the empty part before the initial directory delimiter (/). All filesystem entries, including mounted volumes, are "branches" of this root. Under DOS and Windows, each volume has a drive letter assignment (e.g. C:\) and there is no common root directory above that.

When directories are used, each file and directory has not only a name, but also a path, which identifies the directories in which it resides. In the path, a special character, such as a slash (/) on Unix-like systems or a backslash (\) on DOS and Windows, is used to separate the file and directory names.
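A short Python sketch of how a path breaks down into its components under the two conventions. It uses the standard pathlib module, which parses paths without touching the filesystem; the Windows path (user alice, file notes.txt) is a made-up example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A Unix-style path: components are separated by "/".
unix_path = PurePosixPath("/usr/share/misc/termcap")

# A Windows-style path (hypothetical): components are separated by "\".
dos_path = PureWindowsPath(r"C:\Users\alice\notes.txt")

print(unix_path.parts)   # root first, then each directory, then the file name
print(unix_path.parent)  # the directory in which the file resides
print(dos_path.drive)    # the drive letter assignment of the volume
print(dos_path.parts)
```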

Storing Files

A file is essentially an abstract concept presented to a user or an operating system. However, for a file to be useful, it must have a corresponding physical manifestation. In physical terms, a computer file is normally stored on a recording medium, for example, a hard disk for non-volatile storage or RAM if it contains only temporary information.

In Unix-like operating systems, many files have no direct association with a physical storage device, for example /dev/null and most files in the /dev, /proc and /sys directories. These can be accessed as files in user space although they are really virtual files that exist as objects within the operating system kernel.

Unix File Types

Unix does not impose any internal file structure on normal files. From the point of view of the operating system there is therefore only one type of normal file, and its structure and meaning depend entirely on the software that reads it.

Unix does, however, have some special files. These can be identified with the ls -l command, which displays the type of the file as the first character of the permissions field.

A normal file is indicated by a dash (-).

Directory

The most common special file is the directory.

A directory is marked with d as the first letter of the permissions field, for example:

drwxr-xr-x /
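The same type character can be obtained programmatically. The sketch below assumes a Unix-like system and uses Python's stat.filemode, which formats a mode in the style of ls -l; the helper name type_letter and the temporary file name are inventions for this example.

```python
import os
import stat
import tempfile


def type_letter(path):
    """Return the first character of the ls -l style permissions field."""
    # lstat is used so that a symbolic link reports itself, not its target.
    return stat.filemode(os.lstat(path).st_mode)[0]


with tempfile.TemporaryDirectory() as workdir:
    plain = os.path.join(workdir, "plain.txt")
    with open(plain, "w") as handle:
        handle.write("hello\n")
    regular_letter = type_letter(plain)      # a normal file
    directory_letter = type_letter(workdir)  # the directory that holds it

print(regular_letter, directory_letter)
```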

Symbolic Link

A symbolic link is a reference to another file. This special file is stored as a textual representation of the referenced file's path.

A symbolic link is marked with l as the first letter of the permissions field, for example:

lrwxrwxrwx termcap -> /usr/share/misc/termcap
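A minimal Python sketch of this behaviour, assuming a Unix-like system where unprivileged users may create symbolic links. The file names target.txt and link are placeholders created in a temporary directory.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "target.txt")
    link = os.path.join(workdir, "link")
    with open(target, "w") as handle:
        handle.write("data\n")

    os.symlink(target, link)

    # The link stores the path it refers to as text.
    stored_path = os.readlink(link)

    # lstat examines the link itself; stat follows it to the target.
    link_mode = stat.filemode(os.lstat(link).st_mode)
    followed_mode = stat.filemode(os.stat(link).st_mode)

    # Opening the link transparently opens the referenced file.
    with open(link) as handle:
        content = handle.read()

print(link_mode, "->", stored_path)
```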

Named Pipe

One of the facilities provided by Unix for inter-process communication is the pipe. A pipe connects the output of one Unix process to the input of another. A named pipe makes such a connection available through the filesystem, which is particularly important if the processes have to be executed under different user names and permissions.

Named pipes are special files that can exist anywhere in the filesystem. They are created with the command mkfifo.

A named pipe is marked with p as the first letter of the permissions field, for example:

prw-rw---- mypipe
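The following Python sketch creates a named pipe with os.mkfifo (the library counterpart of the mkfifo command) and passes a message through it. For brevity the writer runs in a thread of the same process rather than in a separate process; the function name fifo_round_trip is an invention for this example, and a Unix-like system is assumed.

```python
import os
import stat
import tempfile
import threading


def fifo_round_trip(message):
    """Send bytes through a freshly created named pipe and read them back."""
    with tempfile.TemporaryDirectory() as workdir:
        fifo_path = os.path.join(workdir, "mypipe")
        os.mkfifo(fifo_path, 0o660)
        type_char = stat.filemode(os.lstat(fifo_path).st_mode)[0]

        def writer():
            # Opening for writing blocks until a reader opens the pipe.
            with open(fifo_path, "wb") as pipe_in:
                pipe_in.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        # Reading continues until the writer closes its end of the pipe.
        with open(fifo_path, "rb") as pipe_out:
            received = pipe_out.read()
        thread.join()
        return type_char, received


print(fifo_round_trip(b"hello"))
```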

Socket

A socket is a special file used for inter-process communication. It facilitates communication between two processes. In addition to sending data, processes can send file descriptors across a Unix domain socket connection.

A socket is marked with s as the first letter of the permissions field, for example:

srwxrwxrwx X0
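A socket file appears in the filesystem when a Unix domain socket is bound to a path. The Python sketch below, which assumes a Unix-like system, binds a socket to a temporary path named demo.sock (a placeholder) and reports its type character.

```python
import os
import socket
import stat
import tempfile


def socket_type_letter():
    """Bind a Unix domain socket to a path and return its type character."""
    with tempfile.TemporaryDirectory() as workdir:
        sock_path = os.path.join(workdir, "demo.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # bind() creates the special file at sock_path.
            server.bind(sock_path)
            return stat.filemode(os.lstat(sock_path).st_mode)[0]
        finally:
            server.close()


print(socket_type_letter())
```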

Device Node

Device nodes are used to apply access rights and to direct operations to appropriate device drivers.

Nodes are created with the mknod command.

Unix makes a distinction between character devices and block devices. A character device provides only a serial stream of input or output, whereas a block device is randomly accessible.

A character device node is marked with c as the first letter of the permissions field and a block device node is marked with b, for example:

crw------- /dev/kbd
brw-rw---- /dev/hda

Implementation

Device nodes correspond to resources that an operating system kernel has already allocated. Unix identifies these resources by a major number and a minor number, both stored as part of the structure of a node. Generally the major number identifies the device driver and the minor number identifies a particular device that the driver controls.
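The major and minor numbers of a node can be inspected from a program. This Python sketch assumes a Unix-like system with a /dev/null node; the helper name describe_device is an invention for the example, and the numbers it reports vary between operating systems.

```python
import os
import stat


def describe_device(path):
    """Return (kind, major, minor) for the device node at path."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "character"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block"
    else:
        raise ValueError(f"{path} is not a device node")
    # st_rdev packs both numbers; os.major and os.minor unpack them.
    return kind, os.major(info.st_rdev), os.minor(info.st_rdev)


null_kind, null_major, null_minor = describe_device("/dev/null")
print(null_kind, null_major, null_minor)
```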

Devices

Character Devices

Character device nodes relate to devices through which the system transmits data one character at a time. These device nodes often serve for stream communication with devices such as teletype machines, virtual terminals, and serial modems, and usually do not support random access to data.

Block Devices

Block device nodes correspond to devices through which the system moves data in the form of blocks. These device nodes often represent addressable devices such as hard drives, CD-ROM drives, or memory regions.

Pseudo-Devices

Device nodes on Unix-like systems do not necessarily have to correspond to physical devices. Nodes that lack this correspondence are referred to as pseudo-devices.

Naming Conventions

The following prefixes are commonly used in Linux-based systems to identify device nodes in the /dev directory:

fb        frame buffer
fd        floppy drive
hd        IDE hard drive
  hda       master device on first ATA channel
  hdb       slave device on first ATA channel
  hdc       master device on second ATA channel
  hdd       slave device on second ATA channel
lp        printer
par       parallel port
pt        pseudo-terminal
s         SCSI device in general: mainly hard disks, but also SATA and USB disks
  scd       SCSI audio-oriented optical disk drive
    scd0      first CD-ROM
    scd1      second CD-ROM
    scd2      third CD-ROM
  sd        SCSI, SATA or USB hard drive
    sda       first drive
    sdb       second drive
    sdc       third drive
  sg        SCSI generic device
  sr        SCSI data-oriented optical disk drive
  st        SCSI magnetic tape
tty       terminal
ttyS      serial port

Device Nodes

console 

purpose 

The console device node provides access to the system console.

description 

Device node console provides access to the device or file designated as the system console.

The system console is typically a terminal or display located near the system unit. It has two functions in the operating system:

  • The system console provides access to the system when it is operating in a non-multiuser mode.

  • The system console displays messages for system errors and other problems requiring intervention. These messages are generated by the operating system and its various subsystems when starting or operating.

file 

/dev/console

initrd 

purpose 

The initrd device node is a RAM disk initialised before the kernel is started that can be used as the basis for a two-phased system startup.

description 

Device node initrd is a read-only block device. It is a RAM disk that is initialised by the bootloader before the kernel is started. The kernel can then use the contents of initrd as an initial root device for a two-phased system startup.

In the first startup phase, the kernel starts up and mounts an initial root filesystem from the contents of initrd. In the second phase, additional drivers or other modules are loaded from the initial root device's contents. After loading the additional modules, a new root filesystem - the normal root filesystem - is mounted from a different device.

files 

/dev/initrd
/dev/ram0
/linuxrc
/initrd

null, zero 

purpose 

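The null device node provides access to the null device, which discards everything written to it. The zero device node discards writes in the same way and also acts as a source of zero bytes.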

description 

The null device node provides character access to the null device driver. This device driver is normally accessed to write data to the bit bucket.

Data written to a null or zero device node is discarded. Reads from the null device node return end of file, whereas reads from zero return \0 characters.
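The read and write behaviour described above can be checked with a few lines of Python, assuming a Unix-like system that provides both nodes:

```python
# Reading from /dev/null returns end of file immediately.
with open("/dev/null", "rb") as null_device:
    null_read = null_device.read(16)

# Reading from /dev/zero returns as many zero bytes as are requested.
with open("/dev/zero", "rb") as zero_device:
    zero_read = zero_device.read(4)

# Writing to /dev/null succeeds, but the data is discarded.
with open("/dev/null", "wb") as sink:
    written = sink.write(b"discarded")

print(null_read, zero_read, written)
```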

null and zero are typically created by:

mknod -m 666 /dev/null c 1 3
mknod -m 666 /dev/zero c 1 5
chown root:root /dev/null /dev/zero

files 

/dev/null
/dev/zero

ram

purpose

The ram device node makes the RAM disk device available.

description 

The ram device node references a block device to access the RAM disk in raw mode.

It is typically created by:

mknod -m 660 /dev/ram b 1 1
chown root:disk /dev/ram

file 

/dev/ram

scd, sr

purpose

The scd and sr device nodes provide access to CD-ROM drives.

description 

CD-ROM drives, DVD drives and WORM devices are accessible via the scd and sr device nodes.

files 

/dev/scd<n>
/dev/sr<n>

sd?

purpose

The sd? device nodes provide access to drivers for SCSI disk drives.

description 

The block device name has the following form: sdlp, where l is a letter denoting the physical drive, and p is a number denoting the partition on that physical drive. Often, the partition number will be left off when the device corresponds to the whole drive.

SCSI disks have a major device number of 8, and a minor device number of the form (16 * drive_number) + partition_number, where drive_number is the number of the physical drive in order of detection, and partition_number is as follows:

Partition 0       is the whole drive.
Partitions 1-4    are the DOS primary partitions.
Partitions 5-15   are the DOS extended (logical) partitions.

For example, /dev/sda will have major 8, minor 0, and will refer to all of the first SCSI drive in the system; and /dev/sdb3 will have major 8, minor 19, and will refer to the third DOS primary partition on the second SCSI drive in the system.
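The numbering rule can be written out as a small Python function. This is a sketch of the formula given above, not kernel code: the function name sd_device_numbers is an invention, and the restriction to drive letters a to p is an assumption made so that the minor number stays within the 256 values available under major 8.

```python
import re

SD_MAJOR = 8               # major number for SCSI disks, as stated above
PARTITIONS_PER_DRIVE = 16  # partition 0 (whole drive) plus partitions 1-15


def sd_device_numbers(name):
    """Map a name such as 'sda' or 'sdb3' to its (major, minor) pair."""
    match = re.fullmatch(r"sd([a-p])(\d{0,2})", name)
    if match is None:
        raise ValueError(f"not a recognised SCSI disk name: {name!r}")
    drive_number = ord(match.group(1)) - ord("a")
    # A missing partition suffix means partition 0, the whole drive.
    partition_number = int(match.group(2) or 0)
    if partition_number >= PARTITIONS_PER_DRIVE:
        raise ValueError(f"partition number out of range in {name!r}")
    minor = PARTITIONS_PER_DRIVE * drive_number + partition_number
    return SD_MAJOR, minor


print(sd_device_numbers("sda"))   # whole first drive
print(sd_device_numbers("sdb3"))  # third partition on the second drive
```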

files 

/dev/sd[a-h]     whole device
/dev/sd[a-h]<n>  individual partition, where n is from 1 to 15

systty 

purpose 

In many Linux distributions device node systty is a symbolic link to the device that is used as the attached keyboard and monitor.

description 

The keyboard and monitor attached to the system unit are collectively known as the physical console. The console where system messages appear is known as the logical console. As an illustration of the difference, the X Window System should start on the physical console, but system messages issued by failures when starting it should be written to the logical console.

These distinctions are also made in the naming of devices. Device console is used to send messages to the logical console. Symbolic link systty points to the device that is used by the attached keyboard and monitor, often /dev/tty0.

file 

/dev/systty

tty 

purpose 

Device node tty supports the controlling terminal interface.

description 

For each process, the tty device node is a synonym for the controlling terminal associated with that process. By directing messages to the tty file, programs and shell scripts can ensure that messages are written to the terminal even if output is redirected. Programs can also direct their display output to this file so that it is not necessary to identify the active terminal.

File tty is a character file with major number 5 and minor number 0, usually of mode 0666 and with owner and group root:tty. It is a synonym for the controlling terminal of a process, if any.
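A minimal Python sketch of writing to the controlling terminal regardless of redirection. The function name write_to_terminal is an invention for this example; it assumes only that opening /dev/tty fails with an error when the process has no controlling terminal, in which case it reports failure rather than raising.

```python
def write_to_terminal(message):
    """Write message to the controlling terminal; return True on success."""
    try:
        # /dev/tty refers to the controlling terminal of this process,
        # even when standard output has been redirected elsewhere.
        with open("/dev/tty", "w") as terminal:
            terminal.write(message)
    except OSError:
        # No controlling terminal (for example, a background service).
        return False
    return True


if not write_to_terminal("Press Ctrl-C to cancel.\n"):
    print("no controlling terminal available")
```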

file 

/dev/tty

Door

A door is a special file for inter-process communication between a client and server, currently implemented in the Sun Solaris operating system only.

A door is marked with D as the first letter of the permissions field, for example:

Dr--r--r-- name_service_door

