|
| |
Data Compression
In computer science and information theory, data compression or source coding
is the process of encoding information using fewer bits (or other
information-bearing units) than an unencoded representation would use through
use of specific encoding schemes. For example, this article could be encoded
with fewer bits if we accept the convention that the word "compression"
be
encoded as "comp". One popular instance of compression that many
computer users
are familiar with is the ZIP file format, which, as well as providing
compression, acts as an archiver, storing many files in a single output file.<br>
<br>
As is the case with any form of communication, compressed data communication
only works when both the sender and receiver of the information understand the
encoding scheme. For example, this text makes sense only if the receiver
understands that it is intended to be interpreted as characters representing the
English language. Similarly, compressed data can only be understood if the
decoding method is known by the receiver. Some compression algorithms exploit
this property in order to encrypt data during the compression process so that
decompression can only be achieved by an authorized party (eg. through the use
of a password).<br>
<br>
Compression is possible because most real-world data has statistical redundancy.
For example, the letter 'e' is much more common in English text than the letter
'z', and the probability that the letter 'q' will be followed by the letter 'z'
is rather small. Lossless compression algorithms usually exploit statistical
redundancy in such a way as to represent the sender's data more concisely, but
nevertheless perfectly.<br>
<br>
Further compression is possible if some loss of fidelity is allowable. For
example, a person viewing a picture or television video scene might not notice
if some of its finest details are removed or not represented perfectly.
Similarly, two strings of samples representing an audio recording may sound the
same but actually not be exactly the same. Lossy compression algorithms
introduce relatively minor differences and represent the picture, video, or
audio using fewer bits.<br>
<br>
Compression is important because it helps reduce the consumption of expensive
resources, such as disk space or connection bandwidth. However, compression
requires information processing power, which can also be expensive. The design
of data compression schemes therefore involves trade-offs between various
factors including compression capability, any amount of introduced distortion,
computational resource requirements, and often other considerations as well.<br>
<br>
Some schemes are reversible so that the original data can be reconstructed
(lossless data compression), while others accept some loss of data in order to
achieve higher compression (lossy data compression).<br>
<br>
However, lossless data compression algorithms will always fail to compress some
files; indeed, any compression algorithm will necessarily fail to compress any
data containing no discernible patterns. Attempts to compress data that has been
compressed already will therefore usually result in an expansion, as will
attempts to compress encrypted data.<br>
<br>
In practice lossy data compression will also come to a point where compressing
again does not work, although an extremely lossy algorithm, which for example
always removes the last byte of a file, will always compress a file up to the
point where it is empty.<br>
<br>
</p></body>
</html>
|