Short version to remember, maybe:
A byte is one character of plain text.
A megabyte (MB) is a little more than a million bytes.
A gigabyte (GB) is a little more than a thousand megabytes.
A terabyte (TB) is a little more than a thousand gigabytes, and so forth.
Long version:
| Unit | Abbreviation | Original definition | Comments | Power of two vs. power of ten | IEC definition (see below) |
|---|---|---|---|---|---|
| bit | b | One binary digit | Smallest piece of information a computer can store | | (same) |
| byte | B | Eight bits | One character of plain text: 01000001 = "A" | | (same) |
| --- | K | 1,024 bytes | Now used mostly for file sizes in folder windows. | 1,024 = 2^10 > 10^3 | 1,000 bytes (10^3) |
| megabyte | MB | 1,024 K = 1,048,576 bytes > one million bytes | Used to describe small amounts of RAM memory, and older removable formats such as diskettes. | 1,048,576 = 2^20 > 10^6 | 1,000,000 bytes (10^6) |
| gigabyte | GB | 1,024 MB = 1,073,741,824 bytes > one billion bytes | Used for capacities of RAM memory, hard disks, and larger removable formats. | 1,073,741,824 = 2^30 > 10^9 | 1,000,000,000 bytes (10^9) |
| terabyte | TB | 1,024 GB = 1.0995 × 10^12 bytes > one trillion bytes | Used for some RAID arrays, and things like the total size of the Web, which in 1998 was estimated at ten terabytes. | 1.0995 × 10^12 = 2^40 > 10^12 | 10^12 bytes |
| petabyte | PB | 1,024 TB = 1.1259 × 10^15 bytes > one quadrillion bytes | The Large Synoptic Survey Telescope to be built in Chile is expected to generate up to 30TB of data per night, or up to 10 petabytes per year. | 1.1259 × 10^15 = 2^50 > 10^15 | 10^15 bytes |
| exabyte | EB | 1,024 PB = 1.1529 × 10^18 bytes > one quintillion bytes | Rarely used. | 1.1529 × 10^18 = 2^60 > 10^18 | 10^18 bytes |
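As a rough check on the table, the two definition columns can be regenerated from the exponents alone. This is only a minimal Python sketch: the `PREFIXES` list and the `unit_sizes` helper are names made up for this example, not part of any standard.

```python
# Illustrative sketch: regenerate the two definition columns of the table.
# PREFIXES and unit_sizes are hypothetical names chosen for this example.
PREFIXES = ["K", "MB", "GB", "TB", "PB", "EB"]

def unit_sizes(step):
    """Return (original, iec) byte counts for the prefix at position `step` (1 = K)."""
    original = 2 ** (10 * step)   # binary definition: 1,024 to the power `step`
    iec = 10 ** (3 * step)        # decimal definition: 1,000 to the power `step`
    return original, iec

for step, abbrev in enumerate(PREFIXES, start=1):
    original, iec = unit_sizes(step)
    # Scientific notation keeps the larger values readable.
    print(f"{abbrev:>2}  original = {original:.4e} bytes   IEC = {iec:.0e} bytes")
```

The integers are exact; only the printed form is rounded to four decimal places, the same precision the table uses.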
Why powers of two? What's wrong with those nice round numbers the IEC likes? Computers do everything in base two, because at the hardware level their operations depend on literally millions of tiny transistor circuits that only have two possible states, on and off. Humans do arithmetic in base ten because most of us have ten fingers.
To a computer, the number 1,024—two to the tenth power—is a round number, and 1,000—ten to the third—is not:
| Base 10, decimal | Base 2, binary | Base 16, hexadecimal |
|---|---|---|
| 1,024 | 10000000000 | 400 |
| 1,000 | 1111101000 | 3E8 |
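If you want to check these conversions yourself, a few lines of Python will do it. The helper name `in_bases` is just an illustration; it relies only on Python's built-in `format` function.

```python
def in_bases(n):
    """Return n written in binary and in uppercase hexadecimal."""
    return format(n, "b"), format(n, "X")

for n in (1024, 1000):
    binary, hexadecimal = in_bases(n)
    print(f"{n:>5,} | {binary:>11} | {hexadecimal}")
```

In binary, 1,024 comes out as a one followed by ten zeros, which is what makes it "round" from the machine's point of view.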
There are now two conflicting sets of definitions being used for computer mass storage units such as megabyte, gigabyte, and so forth. When you see these terms used now, if you need to know in which sense they are used you will have to find out from documentation or based on context, a confusing and unfortunate situation.
The International Electrotechnical Commission, www.iec.ch, asserts a set of decimal-based "definitions" for these commonly used terms, given in the far-right column of the large table above. The original binary-based definitions of these terms are better established in practical usage, and more meaningful relative to real memory architecture on the chip. The original binary units are seen in documentation for conventional RAM memory and CDs, and in software including file sizes. You'll see the IEC definitions used in manufacturer documentation of storage capacities of hard disks and DVDs, sometimes flash media including USB, and in networking contexts. When documentation uses the IEC units, it typically just baldly asserts that one megabyte equals one million bytes, without any explanation or mention of the IEC or the dispute.
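To see how the same byte count reads under each convention, here is a small Python sketch. The function name `describe` and the 5,000,000-byte file are hypothetical, chosen only for illustration.

```python
def describe(n_bytes):
    """Express a byte count in megabytes under both definitions."""
    original_mb = n_bytes / 2**20   # original binary megabyte: 1,048,576 bytes
    iec_mb = n_bytes / 10**6        # IEC decimal megabyte: 1,000,000 bytes
    return original_mb, iec_mb

# A hypothetical file of exactly five million bytes.
original_mb, iec_mb = describe(5_000_000)
print(f"original: {original_mb:.2f} MB   IEC: {iec_mb:.2f} MB")
```

The same file is "5.00 MB" to one reader and about "4.77 MB" to another, which is why you have to know which sense a given document is using.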
The IEC also asserts the following bizarre terms and abbreviations for the original binary-based storage units, instead of the terms commonly used and understood:
kibibyte, KiB = 2^10 bytes
mebibyte, MiB = 2^20 bytes
gibibyte, GiB = 2^30 bytes
tebibyte, TiB = 2^40 bytes
pebibyte, PiB = 2^50 bytes
exbibyte, EiB = 2^60 bytes
As near as I can determine, nobody really likes the IEC system but the IEC and storage media manufacturers. Adopting the IEC definitions, without explanation, lets people claim a capacity in "gigabytes" which will appear to a casual reader to be 7.4% larger than the actual capacity of the device. Perhaps this is the digital-world equivalent of the false bottoms supermarket berry baskets used to have. People mostly don't use the IEC's strange "mebibyte/gibibyte" terms for anything other than explaining this issue.
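The 7.4% figure for gigabytes is just the ratio of the two definitions, and the gap widens with each larger prefix. A minimal Python sketch, in which `overstatement_percent` and the "500 GB" drive are made-up examples rather than anything from a manufacturer's documentation:

```python
def overstatement_percent(step):
    """Percent by which the binary unit exceeds the decimal unit of the same name."""
    return (2 ** (10 * step) / 10 ** (3 * step) - 1) * 100

for step, name in enumerate(["K", "MB", "GB", "TB", "PB", "EB"], start=1):
    print(f"{name:>2}: {overstatement_percent(step):.1f}%")

# A hypothetical drive labeled "500 GB" in the decimal sense:
labeled_gb = 500
in_original_gb = labeled_gb * 10**9 / 2**30
print(f"{labeled_gb} GB (decimal) = {in_original_gb:.2f} GB (original binary)")
```

Under these assumptions, the hypothetical drive holds roughly 465.66 gigabytes in the original sense.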
I will admit, if pressed, that the IEC definitions are superficially more consistent with prior and ongoing usage of prefixes like "mega" and "giga" in ISV and (non-computer) science. They also fly in the face of the usages established and understood by the people who actually created the computer revolution.
Since 2007 standards organizations, including IEEE and NIST, have actually accepted the IEC definitions, so that the definition of megabyte is now supposedly changed from its original meaning to one million bytes even. If you want to specify the original 2^20 bytes (1024×1024) you're now supposed to use the ridiculous mebibyte. One can reasonably expect different people to use megabyte in the original and post-2007 senses as it suits them, with resulting unnecessary confusion. I consider this a bizarre and shameful outcome, but apparently we're stuck with it. See the Wikipedia articles megabyte and gigabyte for further details and discussion.
Denominations such as billion, trillion, and so forth, for decimal numbers above millions, are used here according to the nomenclature now used in English-speaking countries, called short scale, in which the prefix of the denomination indicates the number of groups of three zeros after 1,000. A different system, long scale, is used in most Continental European countries, and in most other countries whose languages derive from Continental Europe, in which the same prefixes signify the number of powers of one million. The two systems are the same until you go beyond the number 999,999,999.
Canada uses one system in English and the other in French, and South Africa one in English and the other in Afrikaans, with maximum opportunities for confusion, obviously. If you hear somebody say something like thousand million (which would just be billion in short scale) they're probably talking in long scale. For more, see the Wikipedia article Long and short scales, or any recently-published doorstop-size dictionary under number.
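The two naming rules can be written as formulas, which makes the divergence easy to see. In this Python sketch, `n` is the number the Latin prefix stands for (1 for million, 2 for billion, and so on); the function names `short_scale` and `long_scale` are my own labels for illustration.

```python
NAMES = ["million", "billion", "trillion", "quadrillion", "quintillion"]

def short_scale(n):
    """Short scale: 1,000 followed by n more groups of three zeros."""
    return 1000 * 1000 ** n

def long_scale(n):
    """Long scale: the n-th power of one million."""
    return (10 ** 6) ** n

for n, name in enumerate(NAMES, start=1):
    print(f"{name:<12} short = {short_scale(n):.0e}   long = {long_scale(n):.0e}")
```

The two agree at million and part ways at billion: 10^9 in short scale against 10^12 in long scale.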
The straightforward way to eliminate ambiguity is to include scientific notation, as seen in the definitions of gigabyte, terabyte, petabyte, and exabyte in the large table above, whenever you're talking about 10^9 (ten to the ninth power) or greater.