
You can represent 0-255 in hex using 2 characters, so it kind of compresses the data, and it's used for all sorts of things, including colours, IP addresses, and MAC addresses.

My question is why did they stop at 16 bit (or why is that most commonly used)? There are enough letters in the alphabet for 32 bit, which would give a range of 0-65536 contained within the same amount of space, potentially allowing for 280 trillion colours as opposed to just 16 million. If you make the letters case sensitive and add two symbols, you could go to 64 bit, allowing up to 4.3 billion values to be represented by the two characters.


Some examples of situations I think this would work:

IPv4 is running out. I know v6 is being rolled out, but it's very long and will be hard to remember. Take the 192.168.0.1 address: it can also be written as C0.A8.0.1. Using 64 bit hex but still keeping it to a maximum of 8 characters, you could have 280 trillion combinations instead of 4 billion, and we wouldn't have this problem.
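Here's a quick Python sketch (not from my script, just to illustrate) of how the decimal octets map onto hex ones:

```python
# Rewrite a dotted-decimal IPv4 address with hex octets instead.
addr = "192.168.0.1"
hex_form = ".".join(format(int(octet), "X") for octet in addr.split("."))
print(hex_form)  # C0.A8.0.1
```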

As mentioned above, it would also provide a much larger range of colours. The RAW photo format records at 32 bits per colour channel instead of 8, with the downside of a huge increase in file size. If the RGB values were stored as hex, there should be no change in the size of the file as you increase the range of colours, as it would still be stored within 6 characters per pixel, but with a higher base number. Instead, it's recorded as numerical values at 96 bits per pixel, which is a very unnecessary increase of 1600%, leaving photos at over 20MB (and according to an online calculator, 4K RAW video at 32 bits of colour could go up to 2.5GB per second).


This part isn't really to do with the question, but I wrote a script a while back which can convert numbers to different base values, ranging from binary to base 88 (I ran out of symbols after that), which shows it's easily possible to implement similar things. As an example, here's the output for 66000.
Base 2: 10000000111010000
Base 16: 101D0
Base 32: 20EG
Base 64: G7G
The code is here if anyone is interested; it still has a few bugs though, and I've only tried it from within Maya. A bit off topic, but I've also just noticed that normal hex uses around 20% fewer digits than the original decimal number, and base 88 is almost a 50% reduction.
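I haven't pasted the full script here, but a minimal Python sketch of the same idea (this version only has symbols up to base 36; the original used a larger set) would be:

```python
# Minimal sketch of a base converter like the one described above.
# Symbol set stops at base 36 here; the original script went up to base 88.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    """Return the non-negative integer n as a string in the given base."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, base)
        out.append(DIGITS[rem])
    return "".join(reversed(out))

print(to_base(66000, 2))   # 10000000111010000
print(to_base(66000, 16))  # 101D0
print(to_base(66000, 32))  # 20EG
```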


One final question: Has anyone attempted my idea of storing photos as hex? Would it potentially work if you used 64 bit hex, and stored the photos with data like [64;1920;Bgh54D;NgDFF4;...]? If not, I might try to create something which can do that.

Peter
  • Because hex means 16! I know you want to question the default system for representing numbers in computer science. The last paragraph confuses me, though. A representation of numbers is only helpful for us humans. For computers they are still zeroes and ones, and you can't do anything about that. You effectively write the zeroes and ones into a string format (which is 8 bits per character again) and thus lose in the end. The number 255 is 11111111 for the computer; in hex it's FF for you, but in character data that's TWO bytes, not ONE. – sinni800 Feb 20 '14 at 15:36
  • But the amount of bits used for the storage of higher base numbers is reduced, so while they are still literally stored as 1s and 0s, far fewer 1s and 0s are used when storing 1000 in base 64 as opposed to binary. Sorry if that sounds a bit convoluted, but I think it makes sense. Edit: Oh actually, I get what you mean. I meant literally storing it as the characters, so 255 would be FF, which would be the ASCII codes (which I know would increase the bits used to store lower numbers, but would work better with higher numbers) – Peter Feb 20 '14 at 15:39
  • @Peter - It doesn't. You also have to understand that base 16, hexadecimal, wasn't always used in a 32-bit world; it used to exist in 16- and 8-bit worlds. – Ramhound Feb 20 '14 at 15:42
  • I get your point though, it didn't cross my mind that things like IP addresses were actually stored as binary, and I totally forgot about 32 bit not always being around. I still think it'd help with high detail photos though. – Peter Feb 20 '14 at 15:43
  • @Peter - Which is the reason RAW exists. There is a difference between bytes being stored by the computer and bytes on a storage device. One has an infinite amount of space; the other has limited space and memory registers. You do understand that if you store data in "hex" instead of "binary" you will have to convert it back to binary, right? Everything a computer does is in bits; actually, everything it does is in voltage, but that's an entirely different Computer 101 lesson. – Ramhound Feb 20 '14 at 15:47
  • Yeah, I did guess the conversion would be the processing-heavy part, although to be honest, until I asked this I didn't realise that hex was literally only used by us and not by computers. I'd still love to try making some code that could convert the images, just to see how fast it'd run, and how much it'd actually compress the images if it is 32bpc. – Peter Feb 20 '14 at 16:01
  • @sinni800 *"Because hex means 16"* -- Actually "hex" means six, not sixteen. How many sides does a hexagon have? "Hexadecimal" means sixteen. – sawdust Feb 21 '14 at 08:14
  • Most systems are stuck using binary coding of symbols and values, so you are stuck with the same storage requirements regardless of how you display the data in human-readable form. Now, there are a few communication systems that encode 8 bits into one symbol, such as QAM256. But such schemes, which employ both amplitude and phase modulation, cannot be easily used in storage media or do not have a storage density advantage. – sawdust Feb 21 '14 at 08:40
  • @sawdust I'm sorry, I missed here. Yeah, DEC (DEKA) means ten and hex means 6 so it should be hexadecimal... Man :D – sinni800 Feb 24 '14 at 09:12

3 Answers

12

If I am reading the question correctly, you are saying that the data 'shrinks' when you use larger bases, when in fact it doesn't.

Take your own example: Base 2: 10000000111010000 Base 16: 101D0 Base 32: 20EG Base 64: G7G

We would use 101D0 for that, because hex is standard. What would happen if we used base 64 notation?

The answer is: essentially nothing, since you are still storing and processing the data in bits in your device. Even if you say you have G7G instead of 101D0, you are still storing and working with 10000000111010000 in your device. Imagine you have the number 5. If you put that in binary it would be 101. 101 has 3 digits and 5 has one, but this does not mean 5 is more compressed than 101, since you would still be storing the number as 0101 on your computer.
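You can see this in Python, for example: the notations are just strings, while the value is one and the same integer.

```python
# Parse the value from its hex notation; the stored integer is identical
# no matter which notation it came from.
n = int("101D0", 16)
print(n)               # 66000
print(bin(n))          # 0b10000000111010000
print(n.bit_length())  # 17 bits either way
```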

To stick with your examples, take the IPv6 thing, or MAC addresses (for this example they are just the same thing: strings of two-digit groups separated by delimiters).

We have, in hex, 00:00:FF:01:01. That is how you would regularly express it. This translates to binary as 0000 0000 0000 0000 1111 1111 0000 0001 0000 0001 (you are probably starting to see why we use hex now). This is easy because, since 16 = 2^4, you can convert each hex digit into 4 binary digits and just put the results together to get the actual binary string. In your base 64 system, if we had something like GG:HH:01:02:03, each letter would translate to 6 bits.
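That digit-by-digit conversion is easy to sketch in Python (assuming the usual colon-separated notation):

```python
# Expand each hex pair of a MAC-style address into its 8-bit binary form.
mac = "00:00:FF:01:01"
bits = " ".join(format(int(pair, 16), "08b") for pair in mac.split(":"))
print(bits)  # 00000000 00000000 11111111 00000001 00000001
```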

What is the problem with this then? The fact that computers work internally with powers of two. They don't really care about the notation you are using. In CPU registers, memory and other devices, you will never see data divided in groups of 6 bits.

TL;DR: Hexadecimal is just a notation to help us humans read binary more easily, since a byte can be expressed as two characters (00-FF). What is stored and processed in the computer is the same no matter which notation you use to read it.

  • Thanks. At the time of writing, I wasn't actually aware that computers didn't use the hex themselves; I thought IP addresses were 1-255 purely because it was 2 characters per number. I was thinking that converting numbers to higher base numbers would possibly be good for compression at least, but I can see now why it's not used normally. – Peter Feb 20 '14 at 16:04
4

Hexadecimal literally means 16. ;)

But aside from the snarky answer, hexadecimal (or any other power-of-2 base numbering system) is simply a more compact format for representing binary data. At the lowest level, the values are still represented by bits, and those bits are broken down into chunks that the hardware architecture can handle easily.

Keep in mind that hexadecimal numbers are not represented as the characters 0-9 and a-f; they are literally stored as bits. Each "digit" is not, as you suggest, encoded as an 8-bit character (0-255) where only the first 16 values in the system are used.

Let's compare the base 2 and base 64 representations in your example.

base2: 10000000111010000 --> 17 "digits" with 1 bit per digit = 17 bits
base64: G7G --> need 3 "digits" with 6 bits per digit = 18 bits

Now consider a base64 encoding where each "digit" is actually represented by an 8-bit character. You still have G7G, but now each "digit" requires 8 bits.

G7G --> 3 "digits" with 8 bits per digit --> 24 bits

Even in this oversimplified example, if you use base64 to represent everything, you could have a lot more slack (wasted) space than a numbering system that allocates space in smaller chunks.

As I said, the previous example is an oversimplification and assumes you are only dealing with unsigned numbers (i.e., no negative numbers). In reality, the data will be stored in "words" whose size can vary depending on the hardware architecture. If you have an 8-bit word, you must assign values in chunks of 8 bits, so the 17-bit value now requires 24 bits to store.
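That rounding-up to whole words can be sketched in Python:

```python
import math

# A 17-bit value stored in 8-bit words must be padded out to a whole
# number of words, as described above.
value_bits = 17
word_size = 8
words_needed = math.ceil(value_bits / word_size)
print(words_needed * word_size)  # 24 bits of storage
```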

So although it is trivial to use any power-of-two base numbering system as you suggest, it just isn't common. This may be because popular modern computer architectures arose out of 16-bit architectures where hexadecimal was literally the hardware's native language.

rob
  • Yeah thanks, I wasn't aware that IP addresses and things were stored as the 8 1s and 0s instead of 2 characters, so I understand why the first idea wouldn't work. I did realise that the characters are 8 bits each, though, and less efficient for smaller numbers. As for the image idea I suggested, it seems that up to 16 bits per channel it'd actually be less effective, but it possibly would work better past that mark. – Peter Feb 20 '14 at 16:11
  • I think the part where you're getting hung up is the boundary between the machine's internal representation of a value and the various representations that you, as a human, can use to visualize the same value. Whether you represent a 32-bit value in hex or binary, it still requires 32 bits to store. – rob Feb 20 '14 at 16:34
  • Small remark: Hexadecimal does literally mean 16 as much as decimal means 10. Nice answer. – Jimmy Kane Feb 20 '14 at 17:00
3

Hex seems to be a pretty good compromise between binary and decimal.

  • It's easy to convert to binary just by looking at it.

  • Easy to read, write, and communicate verbally if needed. Imagine trying to tell someone a base64-encoded string over the phone.

  • Single-board computers in the '70s and '80s used to have 7-segment LEDs and no other display mechanism out of the box. Fortunately, A, B, C, D, E, and F can all be rendered on one of those.

Of course, when we talk about 64-bit, 128-bit, and larger quantities, or things like hashes, it's not easy to communicate in hex, decimal, or anything, really. To me, the "heyday" of hexadecimal was when 8- and 16-bit CPUs were commonplace, and when low-level programming was more common because it was more necessary. I could be wrong.

I'm not sure hex is in common use except to express the address of pointers in C/C++. I guess hex is used out of habit or tradition, here, and has also come to be a signal that something is a "raw binary" value and not really any "type."

Has anyone attempted my idea of storing photos as hex?

Any file, no matter what its type or contents, is a big chunk of bytes. It's already in binary. Hex is just a (very minimally) human-friendly view of that.

If you want to look at the bytes of a file in hex format, there's a plethora of hex editors and viewers that will do that.

If you are proposing to store a photo as a text file containing a list of hex numbers, I guess you could do that if you want, but it's going to be larger and slower to process than the original file.
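A quick Python sketch of that expansion, using the built-in bytes.hex() method:

```python
# Each byte of binary data becomes two characters of hex text,
# so the textual form is twice the size before any processing even starts.
raw = bytes(range(256))       # 256 bytes of arbitrary binary data
as_hex = raw.hex()            # the same data as hex text
print(len(raw), len(as_hex))  # 256 512
```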

LawrenceC
  • Thanks, it did dawn on me that my photo idea would require a lot of processing, I'd still like to try it sometime though, just to see if it can compress large photos effectively. When I was writing the question I wasn't aware that IP addresses and that lot actually used 8 bytes, I literally thought it was stored as two characters. – Peter Feb 20 '14 at 16:07
  • You're taking original data (the original .jpg, for example), then EXPANDING it when you convert to textual hex format. If you compress it down using an alternate representation, you aren't ever going to approach the size of the original binary. The original binary has 256 "symbols" for each of the 256 values a byte can hold. If you have fewer symbols, you're always going to have a bigger representation in the end. – LawrenceC Feb 20 '14 at 16:13
  • I know it would be less efficient for jpg files and basically anything less than 16 bits per channel, but for raw uncompressed photos, I believe it could potentially work quite well. Although that's why I'd be interested in trying it, as I have no idea if it would work well. – Peter Feb 20 '14 at 16:26
  • Give it a shot, it'll be a good learning experience. There's lots of algorithms out there already like RLE, DCT, gzip, lzma, Huffman, etc. – LawrenceC Feb 20 '14 at 17:10