Essay/Term paper: Data compression


Word Count: 1546



"Data Compression" just sounds complicated. Don"t be
afraid, compression is our good friend for many reasons. It saves hard drive
space. It makes data files to handle. It also cuts those immense file download
times from the Internet. Wouldn"t it be nice if we could compress all files
down to just a few bytes?

There is a limit to how much you can compress
a file. How random the file is determines how far it can be compressed. If
the file is completely random and no pattern can be found, then the shortest
representation of the file is the file itself. A proof of this is at the end
of my paper. The key to compressing a file is to find some sort of exploitable
pattern. Most of this paper explains the patterns that are commonly used.

Null suppression is
the most primitive form of data compression that I could find. Basically,
if the data is divided into fields (as in a spreadsheet, for example) and any
of them contain only zeros, then the program simply eliminates that data and
skips straight from the empty field to the next.
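The idea can be sketched in a few lines of Python (a rough illustration only; the field layout and the "skip" marker are made up for this example, not taken from any real program):

```python
def null_suppress(fields):
    """Collapse runs of all-zero fields into a ('skip', count) marker."""
    out = []
    zeros = 0
    for field in fields:
        if all(v == 0 for v in field):
            zeros += 1            # don't store the empty field at all
        else:
            if zeros:
                out.append(("skip", zeros))
                zeros = 0
            out.append(field)
    if zeros:                     # trailing run of empty fields
        out.append(("skip", zeros))
    return out

print(null_suppress([[1, 2], [0, 0], [0, 0], [3, 4]]))
# [[1, 2], ('skip', 2), [3, 4]]
```

The decompressor just expands each marker back into that many zero fields.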

Only one
step up from null suppression is run-length encoding. Run-length encoding
simply records how many of each value you have in a row. It would change a set
of binary data like {0011100001} into what the computer reads as (2) zeros,
(3) ones, (4) zeros, (1) one. As you can see, it works on the same basic idea
as null suppression: finding a series of 0's (and, in this case, 1's too) and
abbreviating them.
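The paper's example can be run through a few lines of Python (a minimal sketch of the idea, not any particular program):

```python
def rle_encode(bits):
    """Turn a string of bits into (count, bit) run pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][1] == b:
            runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
        else:
            runs.append((1, b))               # start a new run
    return runs

print(rle_encode("0011100001"))
# [(2, '0'), (3, '1'), (4, '0'), (1, '1')]
```

Decoding just repeats each bit its counted number of times, so no information is lost.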

Once the whole idea of data compression caught on, more people started
working on programs for it. From these people we got some new premises to
work with. Substitutional encoding is a big one. It was invented jointly
by two people: Abraham Lempel and Jacob Ziv. Most compression algorithms (a big
word meaning roughly "program") using substitutional encoding start with "LZ"
for Lempel-Ziv.

LZ-77 is a really neat compression method in which the program
starts off just copying the source file over to the new target file, but when
it recognizes a phrase of data that it has previously written, it replaces
the second occurrence in the target file with directions on how to find the
first one and copy it in the directions' place. This is more
commonly called sliding-window compression because the program's focus is
always sliding around the file.
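Here is a toy Python sketch of the sliding-window idea. The "directions" are (offset, length, next_char) triples; real LZ-77 implementations pack these into bits far more cleverly, so this is only an illustration:

```python
def lz77_encode(data, window=255):
    """Encode data as (offset, length, next_char) triples, LZ77-style."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for off in range(max(0, i - window), i):
            length = 0
            # extend the match, but always leave one literal next_char
            while (i + length < len(data) - 1
                   and data[off + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - off, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Invert lz77_encode: copy `length` chars from `offset` back, then the literal."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])
        out.append(ch)
    return "".join(out)

print(lz77_encode("abab"))   # [(0, 0, 'a'), (0, 0, 'b'), (2, 1, 'b')]
```

The third triple is the "directions": go back 2 characters, copy 1, then write a literal 'b'.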

LZ-78 is the other major branch of the family; its best-known descendant,
LZW, is used in Unix compress and in GIF images (household archivers such as
ZIP, LHA, ARJ, ZOO, and GZIP actually descend from LZ-77). The main idea behind
LZ-78 is a "dictionary", yet it works quite a bit like LZ-77. For every phrase
it comes across, it indexes the string by a number and writes it in a
"dictionary". When the program comes across the same string again, it uses the
associated number from the "dictionary" instead of the string. The "dictionary"
is then written alongside the compressed file to be used in decoding.
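A toy Python sketch of the dictionary idea (an illustration only; note that in real LZ-78 the decoder can rebuild the dictionary on the fly, so nothing extra needs to be shipped with the file):

```python
def lz78_encode(data):
    """Encode data as (dictionary_index, char) pairs; index 0 = empty prefix."""
    dictionary = {}          # phrase -> index (1-based)
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch     # keep growing the longest known phrase
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:               # leftover phrase at end of input
        out.append((dictionary[phrase], ""))
    return out

def lz78_decode(pairs):
    """Rebuild the dictionary on the fly while decoding."""
    phrases = [""]           # index 0 is the empty prefix
    out = []
    for idx, ch in pairs:
        s = phrases[idx] + ch
        phrases.append(s)
        out.append(s)
    return "".join(out)

print(lz78_encode("ababab"))
# [(0, 'a'), (0, 'b'), (1, 'b'), (3, '')]
```

The pair (1, 'b') means "dictionary entry 1 ('a') followed by 'b'", which itself becomes entry 3.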

There is a combined version of LZ-77 and
LZ-78 called LZFG. It only writes to the dictionary when it finds
a repeated phrase, not on every phrase. Then, instead of replacing the
second set of data with directions on how to get to the first occurrence,
the program puts in the number that references the dictionary's translation.
Not only is it faster, but it compresses better because it
doesn't have as big a dictionary attached.

Statistical encoding is another
one of the new compression concepts. It is an offshoot of the LZ family of
compressors; it uses basically the same style as LZFG, but instead of assigning
the numbers in the order that the strings come out of the source file,
statistical compressors do some research. They calculate the number of times
each string is used and then rank the string with the most uses at the top of
the hash table; the string with the fewest is ranked at the bottom. (A hash
table is where the ranking is figured.) The higher up a string is on this list,
the smaller the reference number it gets, to minimize the total bit usage.
This gives this compression just a slight edge on the others, but every little
bit helps. (ha ha, "bit")
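The ranking step can be sketched like this in Python (a toy illustration; the sample strings and the idea of using the rank directly as the code number are made up for this example, and real statistical coders then pack the small numbers into short bit patterns):

```python
from collections import Counter

def rank_table(strings):
    """Give the most frequent string the smallest code number."""
    counts = Counter(strings)
    return {s: rank for rank, (s, _count) in
            enumerate(counts.most_common(), start=1)}

codes = rank_table(["the", "cat", "the", "the", "sat", "cat"])
print(codes)   # {'the': 1, 'cat': 2, 'sat': 3}
```

Because "the" occurs most often, it gets code 1, the cheapest number to store; rare strings get the larger, more expensive numbers.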

Beware! There are a few compression programs
out there that claim wonderful compression ratios; ratios that beat the
compression limit for that file's randomness. Two of them, OWS and WIC, aren't
really compression programs at all. Never compress anything with these. What
they do is split up the file you wanted to compress and hide most of
it on another part of your hard drive. OWS puts it in a specific spot on the
physical hard disk; WIC puts the extra information in a hidden file
called winfile.dll. The real problem with these programs is that if you
don't have the winfile.dll or the information in that certain spot on your
drive, the program can't put your file back together.

My original intent
with this project was to invent a new compression algorithm. I started with
the idea that if you took the file in its pure binary form and laid it out
in a matrix, there were certain rows and columns you could add up to get
an output that would be able to recreate the original matrix. I was close,
too. I had four different outputs, which would make up the compressed file,
that combined to create one output for each bit. From this single output I
could determine whether the bit was 1 or 0. It worked perfectly for matrices
of 1x1, 2x2, and 3x3. Except that with a matrix this small, I wasn't
compressing at all; it was more of a coding system that took up more space
than the original file. I even found a way to shrink the size of the four
outputs, but it was not enough to even break even on bit count. When I got to
the 4x4's I found an overlap. An overlap is a term I made up for this
algorithm: it means that I got the same single output for a 1 as I did for a
0. When that happens, I can't figure out which it is, a 1 or a 0. When you
can't recreate the original file, data compression has failed; it becomes
lossy. I needed a fifth original output. If you want more information on how
I thought the algorithm would have worked, please refer to the Inventor's Log
that I included. It's way too much to re-type here, and it would serve no
real purpose in this paper.

If you were paying attention earlier, you
would be saying, "Why don't you find a pattern? Otherwise you can't compress
it. You are treating it like a random file." I didn't find out that it was
impossible to compress random data until about the time my algorithm was
failing.

Because
of my setbacks, I started looking for an entirely new way to compress data
using a pattern of some sort. I got to thinking about all of the existing
algorithms. I wanted to combine a hash table, a statistical coder, and a
run-length coder. The only hard part I could see in that would be getting
the patent holders of each of those algorithms to allow me to combine them
and actually modify them slightly.

In its current algorithm, the statistical
coder only accepts alphanumeric phrases. I would like to modify it to read
not the characters that the binary code spells out, but the binary code
itself. I don't know what form the output takes aside from compressed, but
for my purposes the form of the output wouldn't matter. I would program into
the program all 32 combinations of 5 bits (2^5). Each of the combinations
would be labeled in the program 1 through 32. I would then make a hash table
of all of the 5-bit combinations. This would give me an output which I would
run through a run-length coder. Since the entire algorithm relies on binary
code and not on characters, it can be recursive; that is, it can further
compress an already compressed file. The LZ's can't do that, because once
they convert a string into its dictionary/sliding-window form, it ceases to
be one of the characters that they compress.

Now that you are acquainted with our friend,
Data Compression, I hope he will be able to serve you better. Now you can
download programs faster, save space, and, who knows? Maybe you will invent
the next new compression algorithm. Until then, keep your mice happy and your
monitors warm.


Proof that random data is not compressible:
Let's suppose the file to be compressed is 8 bits long (any length x works,
but this is easier) and is random.
There are exactly 2^8 different possible 8-bit data strings. (In general, 2^x.)
To compress the file, the program must shrink it by at least one bit, to at
most 7 bits. (x-1)
So there are at most 2^7 different compressed files. (2^(x-1))
Thus at least two source files compress down to the same file.
Therefore the compression cannot be lossless.
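The counting behind the pigeonhole argument can even be checked by brute force in Python (just the arithmetic for x = 8):

```python
# 2^8 distinct 8-bit source files, but only 2^0 + 2^1 + ... + 2^7
# distinct files that are strictly shorter than 8 bits.
n_sources = 2 ** 8
n_shorter = sum(2 ** k for k in range(8))
print(n_sources, n_shorter)   # 256 255

# Fewer possible outputs than inputs, so by the pigeonhole principle
# some two source files would have to share a compressed form,
# and the compression could not be lossless.
assert n_shorter < n_sources
```

Even counting every shorter file down to the empty one, there are only 255 possible outputs for 256 possible inputs.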


