Character encodings: UTF-8 vs iso-8859-1(5)

Hurduser
Posts: 285
Joined: Tue Dec 04, 2007 6:14 pm UTC
Location: Esperantujo

Character encodings: UTF-8 vs iso-8859-1(5)

Postby Hurduser » Fri Dec 14, 2007 7:36 pm UTC

Well, I prefer UTF-8, even though I admit it makes programming a pain in the lower back. Even simple things like calculating the length of a string get ugly with it. However, the ISO encodings fail as soon as you try something unusual, which tends to happen. If even the names of close friends cannot be written in the same encoding as a mail to them, the limitations are just too much.
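
To make both complaints concrete, here is a minimal Python sketch. The sample name is made up for illustration; any text with characters outside ISO-8859-1 would behave the same way. It shows that the character count and the UTF-8 byte count diverge, and that ISO-8859-1 cannot encode the name at all.

```python
# Hypothetical sample name (an assumption for illustration), written with
# escapes so the source stays ASCII: Z o e-diaeresis, S-caron i m u-ring n e k.
name = "Zo\u00eb \u0160im\u016fnek"

utf8_bytes = name.encode("utf-8")

# Code points and UTF-8 bytes disagree once non-ASCII characters appear.
char_count = len(name)
byte_count = len(utf8_bytes)

# ISO-8859-1 has no slot for S-caron or u-ring, so encoding fails outright.
try:
    name.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(char_count, byte_count, latin1_ok)
```

Code that treats the byte count as the string length (a C-style `strlen`, for example) would be off by three here.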

So, what do you think about this issue?
Just ask around on IRC:
"makes you puke" is what XP means there.
Win2k, just to amaze you,
has more bugs than a mere two thousand.

headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby headprogrammingczar » Fri Dec 14, 2007 8:48 pm UTC

Both are evil. We should use barcode notation, with a reverse UTF-8 alphabet. There is nothing quite like a 5 MiB paragraph.
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you

zenten
Posts: 3799
Joined: Fri Jun 22, 2007 7:42 am UTC
Location: Ottawa, Canada

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby zenten » Sun Dec 16, 2007 4:17 am UTC

UTF-16.

Tei
Posts: 63
Joined: Fri Nov 30, 2007 2:58 pm UTC

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby Tei » Mon Dec 17, 2007 10:21 am UTC

Hurduser wrote: Well, I prefer UTF-8, even though I admit it makes programming a pain in the lower back. Even simple things like calculating the length of a string get ugly with it. However, the ISO encodings fail as soon as you try something unusual, which tends to happen. If even the names of close friends cannot be written in the same encoding as a mail to them, the limitations are just too much.

So, what do you think about this issue?


Length of a string?
That's what the language's library is for...

I think today's computers can handle UTF-8, so it's nonsense to use anything else.
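
As a rough sketch of what such a library call does under the hood, assuming the input is valid UTF-8: every code point starts with exactly one byte that is not a continuation byte (`10xxxxxx`), so counting those bytes gives the length. The helper name `utf8_length` and the sample text are made up for this example.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new code point. No validation is done here.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Sample text (an assumption): "naive" with i-diaeresis, a space, three CJK characters.
data = "na\u00efve \u65e5\u672c\u8a9e".encode("utf-8")

manual = utf8_length(data)           # hand-rolled count
library = len(data.decode("utf-8"))  # let the library do it

print(len(data), manual, library)
```

Both counts are code points, not user-perceived characters; combining sequences would still need more work than this.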

phlip
Restorer of Worlds
Posts: 7557
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby phlip » Mon Dec 17, 2007 10:58 am UTC

The real question: ISO-8859-1 vs Windows-1252 vs CP437...
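
For anyone who has not been bitten by this yet, a small Python sketch of why the choice matters: the same two bytes decode to different text under each of the three code pages. The byte values are picked for illustration only.

```python
# Two sample bytes (chosen for illustration): 0x80 and 0xE9.
raw = bytes([0x80, 0xE9])

decoded = {}
for codec in ("iso-8859-1", "cp1252", "cp437"):
    decoded[codec] = raw.decode(codec)

# ISO-8859-1: a C1 control character followed by e-acute.
# Windows-1252: the Euro sign followed by e-acute.
# CP437: C-cedilla followed by Greek capital theta.
for codec, text in decoded.items():
    print(codec, ascii(text))
```

None of the three decodes raises an error, which is exactly the problem: a wrong guess produces plausible-looking garbage.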

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

Hurduser
Posts: 285
Joined: Tue Dec 04, 2007 6:14 pm UTC
Location: Esperantujo

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby Hurduser » Mon Dec 17, 2007 7:13 pm UTC

phlip wrote:The real question: ISO-8859-1 vs Windows-1252 vs CP437...

This is no question at all, at least where I live. DOS boxes use CP850, and instead of ISO-8859-1 we normally use ISO-8859-15 because that one has the Euro sign.
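
A short Python sketch of the difference, using only the standard codecs: ISO-8859-15 puts the Euro sign at byte 0xA4, ISO-8859-1 cannot encode it at all, and a reader who assumes ISO-8859-1 sees the generic currency sign instead. The last two lines show that CP850 and ISO-8859-1 also disagree on where an ordinary umlaut lives.

```python
euro = "\u20ac"  # Euro sign

# ISO-8859-15 replaced the currency sign at 0xA4 with the Euro sign.
in_latin9 = euro.encode("iso-8859-15")

# ISO-8859-1 has no Euro sign at all.
try:
    euro.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# Reading the ISO-8859-15 byte as ISO-8859-1 yields the currency sign U+00A4.
misread = in_latin9.decode("iso-8859-1")

# The same a-umlaut sits at different byte values in CP850 and ISO-8859-1.
umlaut_cp850 = "\u00e4".encode("cp850")
umlaut_latin1 = "\u00e4".encode("iso-8859-1")

print(in_latin9, latin1_has_euro, ascii(misread), umlaut_cp850, umlaut_latin1)
```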

3_of_8
Posts: 55
Joined: Wed Jan 02, 2008 6:01 pm UTC
Location: Dingolfing, Germany

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby 3_of_8 » Wed Jan 09, 2008 9:56 pm UTC

Unicode's a right pain in the butt. But it will probably prevail sooner or later.
Geek code:
GCS/M/S d-(--) s+: a--- C++(+++)>$ ULC++(+++) P+ L++(+++)>++++ !E W+++ N++ o K+ w>--- !O !M-- !V PS+(++) PE- Y+(++) PGP+++ t+ 5+++ !X R+ tv- b++ DI++ D+ G++ e->++++ h>+ r-- y--

Workaphobia
Posts: 121
Joined: Thu Jan 25, 2007 12:21 am UTC

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby Workaphobia » Thu Jan 10, 2008 1:27 am UTC

I had to learn something about Unicode for some work I did with XML and CJK languages. It's a bit of a pain, but it made a lot more sense once I actually had to research the subject rather than leaving phrases like "unicode", "utf8", and "encoding" vague and poorly defined in the back of my head.

I say, if you know enough to be able to tell the difference, then you know enough to pick whichever's best for the situation and to know what impact that decision will have on your project.
Evidently, the key to understanding recursion is to begin by understanding recursion.

The rest is easy.

Hangar
Posts: 171
Joined: Fri Nov 23, 2007 3:41 am UTC

Re: Character encodings: UTF-8 vs iso-8859-1(5)

Postby Hangar » Thu Jan 10, 2008 8:40 pm UTC

Unicode is definitely the best basis for multibyte text encoding. I tend to prefer UTF-8 because libraries that claim to support UTF-16 often provide no support for surrogate pairs.
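
For reference, a minimal Python sketch of what a surrogate pair is: a code point above U+FFFF is split into two 16-bit code units. The helper `to_surrogates` is a name made up for this example; the arithmetic follows the UTF-16 definition, and the result is checked against Python's own `utf-16-be` codec.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

high, low = to_surrogates(ord(clef))
encoded = clef.encode("utf-16-be")
code_units = len(encoded) // 2

print(hex(high), hex(low), encoded.hex(), code_units)
```

A library that counts 16-bit units as characters would report this single character as two, and slicing between the units would produce invalid text.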

