A filesystem is a defined structure for storing and organising
data to facilitate location and
access.
Data within a filesystem is
stored in files. The filesystem
provides access to data in a
volume by maintaining
the physical location of the
files.
A filesystem acts as a digital index enabling a computer to find a
specific file, regardless of the
size or configuration of the
volume or where the
data
bytes associated with the
file are located in the
recording medium. It
may be viewed as a special-purpose database for storage, organisation,
manipulation and retrieval of
data.
Most filesystems provide access to the
sectors in an
underlying
data storage device.
The filesystem is responsible for organising these
sectors into
files and
directories, and
keeping track of which
sectors are allocated
to each file and which are not
being used. Most filesystems address
data in fixed-sized units called
clusters,
each of which contains a fixed number of
sectors.
A cluster is the
smallest amount of space in the
recording medium that
can be allocated to a file. A
filesystem might not make use of a
storage device at all.
It can be used to organise and represent access to any
data, whether it is stored or
dynamically generated.
A filesystem normally contains
directories which
associate names with
files, usually by connecting the
name to an index in a
file allocation table of some sort. Directory structures may be flat, or
may facilitate hierarchies where
directories can
contain subdirectories.
Traditional filesystems provide facilities to create, move and delete
files and
directories.
Types of Filesystems
Filesystems can be classified under numerous categories, including the
following:
- device
  - magnetic disk
  - optical disk
  - flash
- record-oriented
- transactional
- distributed
- special-purpose
magnetic disk filesystems |
A magnetic disk filesystem is a filesystem designed for
storage of files in a
magnetic disk, such as a floppy disk or platters in a hard disk
drive.
Some filesystems support journaling and/or versioning.
Magnetic disk filesystems include:
ext |
extended filesystem: designed for Linux
The volume
in an ext filesystem is split into
blocks and
organised into block groups, analogous to cylinder groups in
the Unix File System (UFS). This is done
to reduce external fragmentation and minimise the number of
disk seeks when reading a large quantity of consecutive
data. The ext
architecture uses data structures called
index nodes (inodes) to refer to
and locate files and
associated data. The
filesystem organises the
volume into
groups of
blocks,
which contain both inode information and associated
data clusters.
ext |
extended filesystem: released in April 1992 as
the first filesystem developed exclusively for the
Linux operating system
kernel.
ext replaced the Minix file system, which Linux had
originally inherited from its Minix predecessor.
The ext filesystem solved the two major problems in the
Minix file system: a maximum partition size of 64MB and a
file name length limit of 14 characters. It allowed 2GB of
data and
file names of
up to 255 characters. However, it still had problems:
there was no support for separate access, inode
modification or
data
modification timestamps.
|
ext2 |
second extended filesystem: released in
January 1993 exclusively for the Linux
operating system kernel
ext2 can be described as the standard filesystem for
the Linux operating system.
It was initially designed as a replacement for ext and
is fast enough to be used as the benchmarking standard.
The ext2 filesystem was developed as a solution for the
problems in the ext filesystem (no support for separate
access, inode modification or data modification
timestamps). It was an overhaul of ext and was designed
with extensibility in mind, with space left in many of
its on-disk
data structures
for use by future versions. However, ext2 is not a
journaling filesystem, and the filesystem needs to be
checked after an unclean shutdown.
maximum file name length | 255 characters |
block size | 1KB | 2KB | 4KB | 8KB |
maximum file size | 16GB | 256GB | 2TB | 64TB |
maximum volume size | 2TB | 8TB | 16TB | 32TB |
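The maximum file size figures above can be partly reproduced from the way an inode addresses its data. The following is a rough sketch, assuming the classic ext2 inode layout of 12 direct block pointers plus one single-, one double- and one triple-indirect pointer, each 4 bytes wide, and a 32-bit per-file count of 512-byte sectors. These layout details are background assumptions rather than something stated above, and the function names are illustrative.

```python
# Sketch: upper bounds on ext2 file size for a given block size.
# Assumed layout (not stated in the text above): 12 direct pointers,
# one single-, one double- and one triple-indirect pointer, 4 bytes each.

POINTER_SIZE = 4                   # bytes per block pointer (assumption)
DIRECT_POINTERS = 12               # direct pointers per inode (assumption)
SECTOR_COUNT_LIMIT = 2**32 * 512   # 32-bit count of 512-byte sectors


def pointer_tree_limit(block_size: int) -> int:
    """Bytes addressable through the direct and indirect pointers."""
    per_block = block_size // POINTER_SIZE
    blocks = DIRECT_POINTERS + per_block + per_block**2 + per_block**3
    return blocks * block_size


def capped_limit(block_size: int) -> int:
    """Pointer-tree limit, capped by the per-file sector counter."""
    return min(pointer_tree_limit(block_size), SECTOR_COUNT_LIMIT)


if __name__ == "__main__":
    for kb in (1, 2, 4, 8):
        size = kb * 1024
        tree_gb = pointer_tree_limit(size) / 2**30
        cap_gb = capped_limit(size) / 2**30
        print(f"{kb}KB blocks: tree {tree_gb:,.0f}GB, capped {cap_gb:,.0f}GB")
```

With 1KB and 2KB blocks the pointer tree is the tighter bound (a little over 16GB and 256GB respectively). With 4KB and 8KB blocks the tree could address more than 2TB, so the sector counter, where it applies, becomes the limiting factor.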
|
ext3 |
third extended filesystem: released in
November 2001 exclusively for the Linux
operating system kernel
ext3 is a journaled filesystem that is commonly used by
the Linux
kernel and
is almost completely compatible with ext2. Its main
advantage over ext2 is journaling which improves
reliability and eliminates the need to check the
filesystem after an unclean shutdown.
maximum file name length | 255 characters |
block size | 1KB | 2KB | 4KB | 8KB |
maximum file size | 16GB | 256GB | 2TB | 2TB |
maximum volume size | 2TB | 4TB | 8TB | 16TB |
|
|
FAT |
File Allocation Table: used in DOS and Windows
FAT filesystems represent logical areas of a
volume in
allocation units called clusters, and map the
locations of file
data to those areas
using a file allocation table (FAT).
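To illustrate how a file allocation table maps a file to its clusters, here is a toy model. It assumes a simplified in-memory table in which each entry holds the number of the next cluster of the file; the end-of-chain marker, the directory contents and the sample table are invented for illustration and do not follow the on-disk FAT encoding.

```python
# Toy model of a file allocation table: fat[n] holds the number of the
# cluster that follows cluster n in a file, or END_OF_CHAIN for the last
# cluster. The marker value and the sample data are invented.

END_OF_CHAIN = -1


def cluster_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Return the clusters of a file in order, starting at first_cluster."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("loop in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


# A directory maps each file name to the first cluster of the file.
directory = {"README.TXT": 2, "DATA.BIN": 3}

# Clusters 0 and 1 are unused here; README.TXT occupies 2 -> 5 -> 6,
# DATA.BIN occupies 3 -> 4.
fat = [END_OF_CHAIN, END_OF_CHAIN, 5, 4, END_OF_CHAIN, 6, END_OF_CHAIN]

if __name__ == "__main__":
    for name, first in directory.items():
        print(name, cluster_chain(fat, first))
```

The sketch shows why a file's clusters need not be contiguous: only the first cluster is recorded in the directory, and the table supplies the rest of the chain.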
FAT12 |
original FAT filesystem
The FAT12 filesystem uses a
12-bit
cluster
number. A
volume
using FAT12 can contain a maximum of 4,086
clusters,
which is 2^12 minus a few reserved values.
FAT12 is most suitable for very small
volumes,
and is used on floppy disks and hard drive
partitions
smaller than about 16MB.
maximum file name length | 8.3 (8 character filename and 3 character filename extension) |
maximum file size | 32MB |
maximum volume size | 32MB |
maximum number of files | 4,086 |
|
FAT16 |
FAT filesystem used for older systems and small
partitions
on modern systems
The FAT16 filesystem uses a
16-bit
cluster
number. A volume using FAT16 can contain a maximum of
65,526
clusters,
which is 2^16 minus a few
reserved values. FAT16 is suitable for
volumes
ranging in size from 16MB to 2GB.
FAT16
clusters
vary with the size of the
volume.
In a 65MB
volume,
clusters
are 1KB in size, whereas they are much larger in
volumes
that contain gigabytes of data. Since only a single
file can be
written to a
cluster,
this creates inefficiencies that can typically result
in wastage of as much as 50% of available space on a
2GB
volume.
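The space lost to partly filled clusters can be estimated with a little arithmetic. The sketch below assumes 1KB clusters for the small volume and 32KB clusters for the 2GB volume (roughly what dividing 2GB into 65,526 clusters requires); the file sizes are made up for illustration.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Whole clusters required to hold file_size bytes."""
    return (file_size + cluster_size - 1) // cluster_size


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Allocated but unused bytes in the last cluster of a file."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


def total_slack(file_sizes: list[int], cluster_size: int) -> int:
    """Slack summed over a collection of files."""
    return sum(slack_bytes(size, cluster_size) for size in file_sizes)


if __name__ == "__main__":
    files = [500, 1500, 40000]          # hypothetical file sizes in bytes
    for cluster_size in (1024, 32768):  # assumed cluster sizes
        wasted = total_slack(files, cluster_size)
        print(f"{cluster_size}-byte clusters waste {wasted} bytes")
```

For these three files the waste grows from about 2KB with 1KB clusters to about 87KB with 32KB clusters, which is why volumes holding many small files suffer most.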
maximum file name length | 8.3 (8 character filename and 3 character filename extension) |
maximum file size | 2GB |
maximum volume size | 2GB |
maximum number of files | 65,526 |
|
FAT32 |
newest FAT filesystem
The FAT32 filesystem uses a
28-bit
cluster
number (not 32, because 4 of the 32
bits are
reserved). Theoretically FAT32 can manage volumes with
over 268 million
clusters,
and support
volumes
up to 2TB in size. However, to do so the FAT itself must grow
very large.
By increasing the size of the file allocation table,
FAT32 could support more
clusters
that were smaller in size on large hard drives,
reducing the potential for wasted space. In addition,
FAT32 could handle
file names with
up to 255 characters.
FAT32 adds to filesystem overhead and is inefficient to
run on volumes smaller than 260MB.
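The figures for FAT32 follow from simple arithmetic. This sketch assumes one 32-bit (4-byte) table entry per cluster, of which 28 bits carry the cluster number, and ignores the few reserved values; the function names are illustrative.

```python
CLUSTER_NUMBER_BITS = 28  # usable bits of each 32-bit entry
ENTRY_BYTES = 4           # one 32-bit table entry per cluster


def max_clusters(bits: int = CLUSTER_NUMBER_BITS) -> int:
    """Number of distinct cluster numbers for the given bit width."""
    return 2**bits


def fat_size_bytes(cluster_count: int) -> int:
    """Size of a table holding one entry for every cluster."""
    return cluster_count * ENTRY_BYTES


if __name__ == "__main__":
    clusters = max_clusters()
    print(f"{clusters:,} clusters")
    print(f"{fat_size_bytes(clusters) / 2**20:,.0f}MB table")
```

At the theoretical maximum of 268,435,456 clusters a single copy of the table occupies 1GB, which illustrates why the FAT becomes unwieldy on very large volumes.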
|
VFAT |
Virtual FAT filesystem
The VFAT filesystem is an enhancement to the FAT16
filesystem providing long (up to 255 characters)
file names.
VFAT has several key features and improvements:
- long file name support
- improved performance
- better management capabilities
Despite the new name and improvements, VFAT is
basically a variant of FAT16. Most of the new
capabilities relate to how the filesystem is used, and
not the actual structures on the disk. The only
significant change in terms of actual structures is the
addition of long
file names.
|
|
NTFS |
New Technology File System: used in Windows NT-based
operating systems
NTFS uses a 64-bit address space and has the ability to vary
cluster
size independently of the
volume
size.
Other benefits over the FAT filesystems include:
- file and directory security attributes
- file encryption
- support for storage volumes of up to 16TB
- high level of fault tolerance
NTFS replaced the file allocation table with the
master file table (MFT), which holds more
information about
files than did FAT.
The MFT references all
files and
directories
in the
volume,
including associated metadata such as security settings.
NTFS's overhead makes it unsuitable for
volumes
smaller than 400MB, and it cannot be used on floppy disks.
|
|
optical disk filesystems |
An optical disk filesystem is a filesystem designed for
storage of files on an
optical disk, such as a CD or a DVD.
Optical disk filesystems include:
ISO 9660 |
used on CD-ROM and DVD-ROM disks
Rock Ridge and Joliet are extensions to
ISO 9660.
|
|
flash filesystems |
A flash filesystem is a filesystem designed for storing
files in flash ROM devices.
Solid state media like flash ROM are similar to disks in their
interfaces, but have different problems. While eliminating seek
times, they require special handling such as wear levelling and
different error detection and correction algorithms. While a disk
filesystem can be used on a flash drive, this is not ideal for
several reasons:
erasing |
Flash ROM has to be explicitly erased before it can be
rewritten. The time taken to erase can be significant. It
is therefore beneficial to erase unused memory while the
device is idle.
|
random access |
Disk filesystems are optimised to avoid disk seeks whenever
possible, due to the high cost of seeking. Flash ROM
devices impose no seek latency.
|
wear levelling |
Flash ROM devices tend to wear out when a single
cluster
is repeatedly overwritten; flash filesystems are designed
to spread writes evenly across the
recording medium.
|
Log-structured filesystems have many of the desirable properties
for a flash filesystem.
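The idea behind wear levelling can be sketched in a few lines. The allocator below is a simplified illustration, not the algorithm of any particular flash filesystem: it assumes that every block keeps an erase counter and that each new write goes to the free block erased least often. The class and function names are hypothetical.

```python
# Sketch of a wear-levelling allocator: each write goes to the free block
# with the fewest erase cycles so far. All names are illustrative.


class WearLevellingAllocator:
    def __init__(self, block_count: int) -> None:
        self.erase_counts = [0] * block_count
        self.in_use = [False] * block_count

    def allocate(self) -> int:
        """Pick the least-erased free block and mark it as in use."""
        free = [i for i, used in enumerate(self.in_use) if not used]
        if not free:
            raise RuntimeError("no free blocks")
        block = min(free, key=lambda i: self.erase_counts[i])
        self.in_use[block] = True
        return block

    def release(self, block: int) -> None:
        """Erase a block and return it to the free pool."""
        self.erase_counts[block] += 1
        self.in_use[block] = False


def rewrite_cycles(allocator: WearLevellingAllocator, cycles: int) -> list[int]:
    """Rewrite one logical item repeatedly; return the blocks used."""
    used = []
    for _ in range(cycles):
        block = allocator.allocate()   # write the new copy
        used.append(block)
        allocator.release(block)       # the copy is later erased for reuse
    return used


if __name__ == "__main__":
    allocator = WearLevellingAllocator(block_count=4)
    print(rewrite_cycles(allocator, 8))
    print(allocator.erase_counts)
```

Rewriting the same item eight times on a four-block device rotates through all four blocks, so each one is erased twice instead of a single block being erased eight times.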
Flash filesystems include:
exFAT |
Microsoft proprietary system intended for flash drives and
often called FAT64
It has a limit of 2^64
bytes (16 exabytes).
|
JFFS |
original log-structured Linux file system for NOR flash media
|
JFFS2 |
successor to JFFS, for NAND and NOR flash
|
YAFFS |
a log-structured file system designed for NAND flash, but
also used with NOR flash.
|
|
record-oriented filesystems |
In record-oriented filesystems
files are stored as a
collection of records. They are typically associated with
mainframe and minicomputer
operating systems.
Programs read and write
whole records, rather than
bytes or arbitrary
byte ranges, and can seek
to a record boundary but not within records. The more
sophisticated record-oriented filesystems have more in common with
simple databases than with other filesystems.
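Record-level access can be imitated on top of an ordinary byte stream. The sketch below assumes fixed-length records of an arbitrary 16 bytes and an in-memory stream; the RecordFile class is hypothetical and does not model any of the access methods listed in this section.

```python
import io

RECORD_SIZE = 16  # bytes per record; an arbitrary size for this sketch


class RecordFile:
    """Fixed-length record access on top of a byte stream."""

    def __init__(self, stream: io.BytesIO) -> None:
        self.stream = stream

    def write_record(self, index: int, payload: bytes) -> None:
        if len(payload) > RECORD_SIZE:
            raise ValueError("payload longer than one record")
        # Seek to a record boundary, never into the middle of a record.
        self.stream.seek(index * RECORD_SIZE)
        # Pad so that every record occupies exactly RECORD_SIZE bytes.
        self.stream.write(payload.ljust(RECORD_SIZE, b" "))

    def read_record(self, index: int) -> bytes:
        self.stream.seek(index * RECORD_SIZE)
        record = self.stream.read(RECORD_SIZE)
        if len(record) < RECORD_SIZE:
            raise IndexError("no such record")
        return record


if __name__ == "__main__":
    records = RecordFile(io.BytesIO())
    records.write_record(0, b"alpha")
    records.write_record(1, b"beta")
    print(records.read_record(1))
```

The caller only ever names a record number; byte offsets stay hidden inside the class, which mirrors the restriction that programs can seek to a record boundary but not within a record.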
In a record-oriented filesystem (or
database-based filesystem), instead of or in addition to
hierarchical structured management,
files are identified by
metadata such as type of
file, topic or author.
Record-oriented filesystems include:
VSAM |
Virtual Storage Access Method: for IBM's z/OS and
z/VSE operating systems
|
QSAM |
Queued Sequential Access Method: for IBM's z/OS and
z/VSE operating systems
|
SFS |
Structured File Server: a record-oriented filesystem
from IBM
It was originally part of the Encina system and is
now integrated into CICS Transaction Server.
|
RSD |
Record Sequential Delimited: a record-oriented
filesystem from IBM
|
|
transactional filesystems |
Each device
operation may involve changes to a number of
files and structures. In
many cases these changes are related, making it important that they
all be applied as a single unit. If the computer crashes before
all records are updated, then some
data will be missing and
the records may not match.
Transaction processing introduces the guarantee that at
any point while it is running, a transaction can either be finished
completely or reverted completely. This means that if there is a
crash or power failure, after recovery the stored state will be
consistent.
This type of filesystem is designed to be fault tolerant, but may
incur additional overhead.
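The all-or-nothing guarantee can be illustrated with a small in-memory model. This is only a sketch: the store is a plain dictionary, the Transaction class is hypothetical, and a real filesystem would need a journal or a similar on-disk mechanism to make the commit step itself survive a crash.

```python
class Transaction:
    """Stage changes to a dict and apply them all, or none of them."""

    def __init__(self, store: dict) -> None:
        self.store = store
        self.pending = {}

    def __enter__(self) -> "Transaction":
        return self

    def write(self, key: str, value: bytes) -> None:
        self.pending[key] = value  # staged only; the store is untouched

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.store.update(self.pending)  # commit every staged change
        self.pending.clear()  # staged changes are dropped either way
        return False          # never swallow the exception


def demo() -> dict:
    """Simulate a failure part-way through a two-file update."""
    store = {"a.txt": b"old"}
    try:
        with Transaction(store) as tx:
            tx.write("a.txt", b"new")
            tx.write("b.txt", b"new")
            raise OSError("simulated crash")
    except OSError:
        pass
    return store


if __name__ == "__main__":
    print(demo())
```

Because the simulated crash happens before the transaction completes, neither file is changed and the store keeps its earlier, consistent contents.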
|
distributed filesystems |
A distributed filesystem, also called a
network filesystem, is a filesystem that acts as a client
for a remote file access protocol, providing access to files on a
server.
Distributed filesystems include:
NCP |
NetWare Core Protocol: from Novell, used in networks
based on NetWare
|
NFS |
Network File System: originally from Sun
Microsystems
NFS is the standard in UNIX-based networks. It may use
Kerberos authentication and a client cache.
|
SMB |
Server Message Block: originally from IBM, although
the most common version is heavily modified by Microsoft
SMB is the standard in Windows-based networks. It is also
known as Common Internet File System
(CIFS). SMB may use Kerberos authentication.
|
FTPFS |
filesystem for FTP access
|
|
pseudo/virtual filesystems |
A filesystem may not make use of a
storage device at
all. It can be used to organise and represent access to any
data, whether it is stored
or dynamically generated. If the
data is dynamically
generated, the filesystem is known as a
virtual filesystem.
Pseudo and virtual filesystems include:
procfs |
pseudo-filesystem, used to access
kernel
information about processes
|
specfs |
special filesystem for
device
files
|
|
encrypted filesystems |
Encrypted filesystems include:
eCryptfs |
a stacked cryptographic filesystem in the Linux
kernel since
version 2.6.19
|
EFS |
encrypted file system for Microsoft Windows systems and AIX
On Windows, it is an extension of NTFS.
|
|
Filesystems and Operating Systems
To make use of a filesystem, an interface between user and filesystem is
required. This interface can be textual, such as the Unix shell or DOS
command prompt, or graphical, such as a file browser.
Filesystems under Unix-like Operating Systems
Unix-like operating systems create
a virtual filesystem, which makes all
files on all
devices appear to exist
in a single hierarchy. There is one root
directory and every
file is located within the
hierarchy under it. A Unix-like system can use a RAM disk or network
shared resource as its root
directory.
Unix-like systems assign a
device
name to each
device, but this is not
how the files on that
device are accessed.
To gain access to files on a
device, the
operating system must be informed
where in the directory
hierarchy those files should
appear. This process is called mounting a filesystem. The
directory assigned to
the device is called
the mount point.
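Resolving a path against a set of mount points can be sketched as a longest-prefix match. The mount table, device names and resolve function below are hypothetical, and the sketch assumes that a root filesystem is always mounted at "/".

```python
import posixpath

# Hypothetical mount table: mount point -> device name.
MOUNTS = {
    "/": "/dev/sda1",
    "/home": "/dev/sda2",
    "/mnt/usb": "/dev/sdb1",
}


def resolve(path: str, mounts: dict[str, str] = MOUNTS) -> tuple[str, str]:
    """Return (device, path relative to its mount point) for path."""
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Keep the longest mount point that contains the path.
        if inside and len(mount_point) > len(best):
            best = mount_point
    return mounts[best], posixpath.relpath(path, best)


if __name__ == "__main__":
    for sample in ("/home/alice/notes.txt", "/homework/x", "/mnt/usb/f"):
        print(sample, "->", resolve(sample))
```

Matching on whole path components matters: /homework/x belongs to the root filesystem even though its name begins with the characters of the /home mount point.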
Filesystems other than the root may need to be available as soon as the
operating system has
booted. All Unix-like
systems provide a facility for mounting filesystems at
boot time. System
administrators define these filesystems in the configuration
file fstab.
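As an illustration of what such a configuration holds, the following sketch parses fstab-style text. It assumes the six whitespace-separated fields used by Linux (device, mount point, filesystem type, options, dump flag, check order), with the last two optional; the sample entries and all names in the code are invented.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list[str]
    dump: int
    fsck_pass: int


def parse_fstab(text: str) -> list[FstabEntry]:
    """Parse fstab-style text into a list of entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"too few fields: {line!r}")
        # The last two fields are optional and default to 0.
        dump = int(fields[4]) if len(fields) > 4 else 0
        fsck_pass = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(fields[0], fields[1], fields[2],
                                  fields[3].split(","), dump, fsck_pass))
    return entries


# Invented sample; real device names and options vary between systems.
SAMPLE = """\
# device     mount point  type  options           dump  pass
/dev/sda1    /            ext3  defaults          0     1
/dev/sda2    /home        ext3  defaults,noatime  0     2
"""

if __name__ == "__main__":
    for entry in parse_fstab(SAMPLE):
        print(entry.mount_point, entry.fs_type, entry.options)
```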
Removable media allow programs
and data to be transferred
between computers without a physical connection and are in common use.
Examples include USB flash drives, CD-ROMs and DVDs. Utilities have been
developed to detect the presence and availability of a
recording medium and
mount it without user intervention.
Filesystems under Linux
To ease the addition of new filesystems and provide a generic file API,
the Linux kernel employs a
Virtual File System (VFS) layer that allows different
underlying architectures to be used. The VFS layer interacts with the
filesystem to perform disk I/O, enabling Linux to support multiple
filesystems. Commonly used filesystems include:
- ext (ext2, ext3, ext4)
- FAT (FAT12, FAT16, FAT32, exFAT)
- ISO 9660
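The role of such a layer can be sketched as a common interface with interchangeable drivers behind it. The classes below are hypothetical and far simpler than the Linux kernel's VFS; they only show how one generic read call can be dispatched to different underlying implementations.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Operations every driver must provide (a toy subset)."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class InMemoryFS(FileSystem):
    """Serves stored content from a dictionary."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = dict(files)

    def read(self, path: str) -> bytes:
        return self.files[path]


class GeneratedFS(FileSystem):
    """Produces content on demand instead of storing it."""

    def read(self, path: str) -> bytes:
        return f"generated for {path}\n".encode()


class VirtualFileSystem:
    """Dispatches each request to the driver mounted for the path."""

    def __init__(self) -> None:
        self.drivers: dict[str, FileSystem] = {}

    def mount(self, mount_point: str, driver: FileSystem) -> None:
        self.drivers[mount_point.rstrip("/")] = driver

    def read(self, path: str) -> bytes:
        # Try the longest mount points first so nested mounts win.
        for mount_point in sorted(self.drivers, key=len, reverse=True):
            if path.startswith(mount_point + "/"):
                return self.drivers[mount_point].read(path[len(mount_point):])
        raise FileNotFoundError(path)


if __name__ == "__main__":
    vfs = VirtualFileSystem()
    vfs.mount("/", InMemoryFS({"/etc/motd": b"hello\n"}))
    vfs.mount("/gen", GeneratedFS())
    print(vfs.read("/etc/motd"))
    print(vfs.read("/gen/info"))
```

Callers use the same read call for both paths; adding support for another filesystem means adding another driver class, not changing the callers.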
Filesystems under Windows
Windows makes use of the FAT and NTFS filesystems.
Windows uses a drive letter to distinguish one
volume from another.
For example, the path C:\WINDOWS represents
directory
WINDOWS in the
volume represented by
the letter C. The C
volume is commonly used
for the primary hard disk
partition, on which
Windows is usually installed and from which it
boots. The tradition of
using C for the
volume letter can be
traced to MS-DOS, where letters A and B are reserved
for up to two floppy drives. Network
volumes may also be
mapped to volume
letters.
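The split between volume letter and directory path can be inspected with Python's standard pathlib module, which models Windows path syntax on any platform. The helper name split_volume is an illustrative choice, not part of pathlib.

```python
from pathlib import PureWindowsPath


def split_volume(path: str) -> tuple[str, tuple[str, ...]]:
    """Return the volume (drive) and the remaining path components."""
    parsed = PureWindowsPath(path)
    # For anchored paths the first element of parts is the drive and
    # root together, so it is dropped from the component list.
    components = parsed.parts[1:] if parsed.anchor else parsed.parts
    return parsed.drive, components


if __name__ == "__main__":
    print(split_volume(r"C:\WINDOWS"))
    print(split_volume(r"A:\setup.exe"))
```

PureWindowsPath only parses the string and never touches the disk, so the example behaves the same on systems without a C volume.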