this post was submitted on 06 Sep 2025
455 points (97.1% liked)

Programmer Humor


Welcome to Programmer Humor!

This is a place where you can post jokes, memes, humor, etc. related to programming!

For sharing awful code, there's also Programming Horror.

founded 2 years ago

Did you ever see a char and think: "Damn, 1 byte for a single char is pretty darn inefficient"? No? Well, I did. So what I decided to do instead is pack 5 chars: convert each char to a 2-digit integer, then concatenate those five 2-digit ints into one big unsigned int, and boom, I saved 5 chars using only 4 bytes instead of 5. The reason this works is that one unsigned int is a ten-digit number, so I can store one char per 2 digits. In theory you could encode 32 different chars with this technique (the first two digits of the biggest unsigned int are 42, and if you don't want to account for a possible 0 at the beginning you end up with 32 chars). If you decided to use all 10 digits you could store exactly 3 chars. Why would anyone do that? Idk. Is it way too much work to be useful? Yes. Was it funny? Yes.

Anyone who's interested in the code: here's how I did it in C: https://pastebin.com/hDeHijX6

Yes I know, the code is probably bad, but I do not care. It was just a funny useless idea I had.

[–] ChaoticNeutralCzech@feddit.org 2 points 19 minutes ago
unsigned int turn_char_to_int(char pChar)
{
    switch(pChar)
    {
    case 'a':
        return 10;
    case 'b':
        return 11;
    case 'c':
        return 12;
    case 'd':
        return 13;
    case 'e':
        return 14;
    case 'f':
        return 15;
    case 'g':
        return 16;
    case 'h':
        return 17;
    case 'i':
        return 18;
    case 'j':
        return 19;
    case 'k':
        return 20;
    case 'l':
        return 21;
    case 'm':
        return 22;
    case 'n':
        return 23;
    case 'o':
        return 24;
    case 'p':
        return 25;
    case 'q':
        return 26;
    case 'r':
        return 27;
    case 's':
        return 28;
    case 't':
        return 29;
    case 'u':
        return 30;
    case 'v':
        return 31;
    case 'w':
        return 32;
    case 'x':
        return 33;
    case 'y':
        return 34;
    case 'z':
        return 35;
    case ' ':
        return 36;
    case '.':
        return 37;

    }
}

Are you a monster or just stupid?

[–] Zacryon@feddit.org 1 points 38 minutes ago

At first I thought, "How are they going to compress 256 values, i.e. 1-byte-sized data, by 'rearranging into integers'?"

Then I saw your code and realized you are discarding 228 of them, effectively reducing the available symbol set by about 89%.

Speaking of efficiency: since chars are essentially unsigned integers of size 1 byte and 'a' to 'z' are values 97 to 122 (decimal, both inclusive), you can greatly simplify your turn_char_to_int function by just subtracting 87 from each symbol to get it into your desired value range, instead of using this cumbersome switch-case structure. Space (32) and dot (46) would still need special handling to fit your desired range, though.

Bit-encoding your chosen 28 values directly would require 5 bits.
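In code, that suggestion is a drop-in replacement for the big switch above (the fallback value for unsupported characters is my own assumption; the quoted code doesn't define one):

```c
/* 'a' is 97 in ASCII, so subtracting 87 maps 'a'..'z' onto 10..35 in one
   step; space and dot get the two remaining codes. */
unsigned int turn_char_to_int(char pChar)
{
    if (pChar >= 'a' && pChar <= 'z')
        return (unsigned int)pChar - 87; /* 97-87 = 10 ... 122-87 = 35 */
    if (pChar == ' ')
        return 36;
    if (pChar == '.')
        return 37;
    return 0; /* assumed fallback for characters outside the set */
}
```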

[–] python@lemmy.world 2 points 1 hour ago

Hey, this is awesome for saving space when writing things to NFC tags! Every bit still matters with those suckers

[–] dullbananas@lemmy.ca 4 points 3 hours ago

That's bootleg gzip.

[–] Garbagio@lemmy.zip 4 points 4 hours ago (1 children)

Real kings store 32 booleans as a single int

[–] aarch64@programming.dev 4 points 4 hours ago

That's where std::vector<bool> or bitfields come in handy!

[–] ILikeBoobies@lemmy.ca 3 points 6 hours ago

And here I was wasting time with bit fields to make my bools smaller.

[–] NigelFrobisher@aussie.zone 4 points 9 hours ago

My colleague said he didn’t see the point in storing enums as shorts or bytes instead of a full word, so I retaliated by storing them in their string form instead, arguing that it’ll be compressed by the db engine.

[–] Valmond@lemmy.world 24 points 14 hours ago (3 children)

CPU still pulls a 32kb block from RAM...

[–] enumerator4829@sh.itjust.works 13 points 10 hours ago

Lol, using RAM like last century. We have enough L3 cache to fit a full Linux desktop. Git gud and don’t miss it (/s).

(As an aside, now I want to see a version of puppylinux running entirely in L3 cache)

[–] BartyDeCanter@lemmy.sdf.org 8 points 13 hours ago

Look at this guy with their fancy RAM caches.

[–] DacoTaco@lemmy.world 4 points 13 hours ago* (last edited 12 hours ago) (2 children)

Cache, man, it's a fun thing. ~~32k~~ 32 (derp, 32 not 32k) is a common cache line size. Some compilers realize that your data might be hit often and align it to the start of a cache line to make access fast and easy. So yes, it might allocate more memory than it strictly needs, but that's to align the data to something like a cache line.
There are also hardware reasons this might be the case. I know the Wii's main processor communicates with the co-processor over memory locations that should be 32k-aligned because of access speed, not only because of cache. Sometimes, more is less :')

Hell, because of instruction speed, loading and handling 32k of data might even be faster than a single byte :').

Then there is also the minimum heap allocation size that might factor in. Though a 32k minimum memory block seems... excessive xD

[–] victorz@lemmy.world 4 points 13 hours ago

Cache Man, I would watch that movie.

[–] gens@programming.dev 1 points 13 hours ago (1 children)

Cache lines are 64 bytes though? Pages are 4k.

[–] DacoTaco@lemmy.world 2 points 12 hours ago* (last edited 12 hours ago)

Ye derp, im used to 32, not 32k lol.

[–] drath@lemmy.world 20 points 13 hours ago* (last edited 13 hours ago)

Oh god, please don't. Just use utf8mb4 like a normal human being, and let the encoding issues finally die out (when Microsoft kills code pages). If space is a consideration, just use compression, like gz or something.

[–] sunbeam60@lemmy.ml 34 points 16 hours ago* (last edited 16 hours ago) (3 children)

After all.. Why not?

Why shouldn’t I ignore the 100+ cultures whose character set couldn’t fit into this encoding?

[–] SubArcticTundra@lemmy.ml 8 points 10 hours ago (1 children)
[–] SpaceCowboy@lemmy.ca 4 points 7 hours ago

They left one bit for the other cultures to use.

[–] MonkeMischief@lemmy.today 25 points 13 hours ago* (last edited 13 hours ago) (1 children)

Not 100% relevant but it was in my collection and I thought it was close enough to be funny. :D

[–] JohnEdwa@sopuli.xyz 5 points 9 hours ago

ŚĆŻRŹĘĄMŚ

[–] Valmond@lemmy.world 5 points 14 hours ago

Åååååå!

[–] HeyThisIsntTheYMCA@lemmy.world 9 points 14 hours ago (1 children)

dammit yesterday was too long i thought this was a dnd joke at first

[–] MonkeMischief@lemmy.today 4 points 13 hours ago (1 children)

Me too! Haven't had my coffee yet. I was like

"... character...? Charisma...? (blink blink)"

[–] HeyThisIsntTheYMCA@lemmy.world 3 points 13 hours ago

coffee? Fuck that's what's going on i knew it. hold on

[–] joseandres42@lemmy.world 19 points 17 hours ago (1 children)

I do this kind of thing every day as a firmware engineer :)

[–] SubArcticTundra@lemmy.ml 1 points 10 hours ago

What do u write the firmware for?

[–] null@lemmy.nullspace.lol 20 points 18 hours ago

Not useless -- you have a future in tiny, embedded systems.

[–] traceur301@lemmy.blahaj.zone 24 points 20 hours ago (3 children)

I'm not sure if this is the right setting for technical discussion, but as a relative elder of computing I'd like to answer the question in the image earnestly. There are a few factors squeezing the practicality out of this for almost all applications: processor architectures (like all of them these days) make operating on packed characters take more operations than 8-bit characters, so there's a speed tradeoff (especially considering cache and pipelining). Computers these days are built to handle extremely memory-demanding video and 3D workloads, and the memory usage of text data is basically a blip in comparison. When it comes to actual storage rather than in-memory representation, compression algorithms typically perform better than just packing each character into fewer bits. You'd need to be in a pretty specific niche for this technique to come in handy again, for better or for worse.
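To make the speed tradeoff concrete, here's a sketch (mine, not from the thread) of six 5-bit codes packed into one 32-bit word; every read costs a shift and a mask that a plain byte-sized char doesn't:

```c
#include <stdint.h>

/* Six 5-bit codes use 30 of the 32 bits in a word. */
uint32_t pack6x5(const uint8_t codes[6])
{
    uint32_t w = 0;
    for (int i = 0; i < 6; i++)
        w |= (uint32_t)(codes[i] & 0x1F) << (5 * i);
    return w;
}

/* Extracting one code needs a shift and a mask -- the extra operations
   that byte-addressed chars avoid. */
uint8_t unpack5bit(uint32_t w, int i)
{
    return (uint8_t)((w >> (5 * i)) & 0x1F);
}
```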

[–] gusgalarnyk@lemmy.world 6 points 18 hours ago

I liked the technical discussion so thank you. Keep it up, I got into this career because there was always so much to learn.

[–] cows_are_underrated@feddit.org 6 points 19 hours ago (1 children)

This is 100% true. I never plan on actually using this. It might be useful when working on microcontrollers like an ESP32, but apart from that the trade-off of more computational power is not worth the memory savings.

[–] rustyricotta@lemmy.dbzer0.com 1 points 8 hours ago

Having seen many of Kaze's videos on N64 development, I've learned that the N64 has something like 4x the processing power it needs relative to its memory. In hardware cases like that, the trade-off of computational power for memory savings gets you some nice performance gains.

load more comments (1 replies)
[–] daniskarma@lemmy.dbzer0.com 9 points 21 hours ago (1 children)

I was hoping it was somehow badly implemented in Python and each char ended up occupying 2GB

[–] cows_are_underrated@feddit.org 6 points 21 hours ago (3 children)

Hmmmmmmm, that sounds like another fun idea. Trying to make storing a single char as inefficient as possible.

load more comments (3 replies)
[–] RiQuY@lemmy.zip 27 points 1 day ago (3 children)

Interesting idea, but type conversion and parsing is much slower than wasting 1 byte. Nowadays memory is "free" and the main issue is execution speed.

[–] rtxn@lemmy.world 9 points 23 hours ago (1 children)

Fuck it. *uses ulong to store a boolean*

[–] Tja@programming.dev 1 points 2 hours ago

So, python?

I know. This whole thing was never meant to be very useful; it's more like a proof of concept

load more comments (1 replies)
[–] UnPassive@lemmy.world 2 points 16 hours ago (1 children)

I have a coworker who does stuff like this and it's always low-benefit optimizations that cost the team time to interface with - but I do still kind of love it

[–] Saleh@feddit.org 7 points 15 hours ago (3 children)

I feel like many programmers (or their management) have grown ignorant of resource limitations over the past decade or so.

Obviously there are good examples, like many Linux distros running well on 4GB of RAM, but when it comes to Windows, websites, and proprietary programs, they gobble up insane amounts of RAM to provide almost the same functionality as in 2010.

[–] Hoimo@ani.social 6 points 14 hours ago

It's just not on their radar at all these days. You want to develop and iterate quickly, so you're not going to program everything from scratch. No, you grab an off-the-shelf framework and implement only your business-specific things in that framework. There's so many layers of abstraction that optimization becomes impossible (beyond what the framework does for you), but it saves you a ton of expensive developer hours and gets you to market really fast. And when someone complains that your website doesn't perform for shit, you just blame their hardware, right? Externalize those costs.

[–] BartyDeCanter@lemmy.sdf.org 3 points 13 hours ago

4GB to run well... I remember happily running Linux on 4MB of RAM, complete with X and a web browser. I also remember running BeOS on a machine with 64MB of RAM and having one of the best desktop experiences I've ever used.

[–] MonkeMischief@lemmy.today 3 points 13 hours ago

they gobble up insane amounts of RAM to provide almost the same functionality as in 2010.

Critical to using our service? Maybe even an operating system?

ELECTRON APP!

[–] bacon_pdp@lemmy.world 92 points 1 day ago (9 children)
[–] deltapi@lemmy.world 1 points 14 hours ago

The AD&D "Gold Box" games from SSI Inc. stored game text in a 6-bit encoding. The first one of these I played was "Champions of Krynn", and the PC release came on four 360k 5.25" DSDD floppy disks. They actually needed the packing in those days, and couldn't afford to spend CPU cycles or RAM on built-in compression.
I remember opening up the game data files in a file viewer (maybe PC-Tools?) and being confounded by the lack of text in the files.

load more comments (8 replies)
[–] bandwidthcrisis@lemmy.world 56 points 1 day ago (2 children)

You would have done well with this kind of thinking in the mid-80s when you needed to fit code and data into maybe 16k!

As long as you were happy to rewrite it in Z80 or 6502.

Another alternative is arithmetic encoding. For instance, if you only needed to store A-Z and space, you'd code those as 0-26, then multiply each char by 1, 27, 27^2, 27^3 etc. and add them up.

To unpack them, divide by 27 repeatedly; the remainder each time is one character. It's simply converting numbers to base-27.

It wouldn't make much difference from using 5 bits per char for a short run, though, but it could be efficient for longer strings, or if encoding a smaller set of characters.
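A sketch of that base-27 scheme (my naming; fixed-length 6-char strings assumed, since 27^6 = 387,420,489 fits in 32 bits — one char more than the decimal-pair trick manages):

```c
#include <stdint.h>

static uint32_t val27(char c) { return c == ' ' ? 26u : (uint32_t)(c - 'A'); }
static char sym27(uint32_t v) { return v == 26u ? ' ' : (char)('A' + v); }

/* Horner's method: n = ((((d5*27 + d4)*27 + d3)*27 + ...)), i.e. base-27. */
uint32_t encode6(const char *s)
{
    uint32_t n = 0;
    for (int i = 5; i >= 0; i--)
        n = n * 27u + val27(s[i]);
    return n;
}

/* Repeated division by 27; each remainder is one character. */
void decode6(uint32_t n, char out[7])
{
    for (int i = 0; i < 6; i++) {
        out[i] = sym27(n % 27u);
        n /= 27u;
    }
    out[6] = '\0';
}
```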

load more comments (2 replies)
load more comments
view more: next ›