Megalextoria: Computer Folklore » New Year's Computer Stories...


Re: New Year's Computer Stories... [message #418791 is a reply to message #418786]

Wed, 18 January 2023 12:01

Charlie Gibbs

On 2023-01-18, Bob Eager <news0009@eager.cx> wrote:

> On Tue, 17 Jan 2023 21:01:47 -0500, Andreas Kohlbach wrote:
>
>> As for sorts of people in general, I only know there are 10 sorts of
>> people. These who understand binary, and those who don't.
>
> No, there are 11. Those who understand binary, those who don't, and those
> who don't get the joke.

There are two kinds of people: those who believe that there are
two kinds of people, and those that don't.

--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <cgibbs@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.


Re: New Year's Computer Stories... [message #418798 is a reply to message #418597]

Wed, 18 January 2023 21:29

Rich Alderson

No Message Body


Re: New Year's Computer Stories... [message #418799 is a reply to message #418798]

Wed, 18 January 2023 22:25

Charlie Gibbs

On 2023-01-19, Rich Alderson <news@alderson.users.panix.com> wrote:

> Andreas Kohlbach <ank@spamfence.net> writes:
>
>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>
>>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>
>> Actually Ötzi.
>
> I assume that that's umlaut-o, but I've never gotten GNUS to display Unicode
> as anything other than question marks or \ooo octals. I was told in German
> classes 50 years ago that "ae", "oe", and "ue" were acceptable alternatives,
> if somewhat outmoded.

Ditto for "ss". Meanwhile, I'm running Debian Bullseye with xfce -
everything displays just fine, even in a terminal window.

--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <cgibbs@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.


Re: New Year's Computer Stories... [message #418804 is a reply to message #418597]

Thu, 19 January 2023 03:26

Ahem A Rivet's Shot

On Thu, 19 Jan 2023 00:20:06 -0500
Andreas Kohlbach <ank@spamfence.net> wrote:

> (setq mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))
> (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))

Why not just use utf-8 instead of the obsolescent iso-8859
encodings ?

--
Steve O'Hara-Smith
Odds and Ends at http://www.sohara.org/


Re: New Year's Computer Stories... [message #418806 is a reply to message #418770]

Thu, 19 January 2023 07:59


Originally posted by: Johnny Billquist

On 2023-01-17 14:06, Charles Richmond wrote:
> On 1/17/2023 6:27 AM, greymaus wrote:
>>
> [snip...] [snip...] [snip...]
>
>> Why was the circle defined as 360 degrees? The answer is probably buried
>> under the sands of Sumer. Some of the US soldiers that served in Iraq
>> said there are millions of cuneiform tablets still lying around there.
>>
>
> ISTM that ancient Sumeria used a base-20 number system, like the Mayans
> did. And since the earth is *roughly* a round ball, and the measure of
> round things totals 360 degrees...

Was this a circular proof?

"Since a circle is 360 degrees, a round thing obviously has 360
degrees." Which does not explain at all why 360.

We obviously also have the 400-gradian circle, as well as the radian,
in which a full circle is 2π, or roughly 6.28, radians.
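
For reference, the three ways of dividing a circle mentioned here (360 degrees, the 400-unit division usually called the gradian or gon, and 2π radians) convert into each other by simple proportion. A small Python sketch; the function and constant names are made up for illustration:

```python
import math

# One full turn expressed in each unit.
FULL_TURN_DEGREES = 360.0
FULL_TURN_GRADIANS = 400.0
FULL_TURN_RADIANS = 2 * math.pi  # roughly 6.28


def degrees_to_gradians(deg: float) -> float:
    """Scale by the ratio of the two full-turn values."""
    return deg * FULL_TURN_GRADIANS / FULL_TURN_DEGREES


def degrees_to_radians(deg: float) -> float:
    """Same proportion, with 2*pi as the full turn."""
    return deg * FULL_TURN_RADIANS / FULL_TURN_DEGREES


print(degrees_to_gradians(90))            # 100.0
print(round(degrees_to_radians(360), 2))  # 6.28
```

None of this says why 360 was chosen, of course; it only shows that the choice of unit is a convention.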

Johnny


Re: New Year's Computer Stories... [message #418807 is a reply to message #418597]

Thu, 19 January 2023 08:19


Originally posted by: Johnny Billquist

On 2023-01-19 06:20, Andreas Kohlbach wrote:
> On 18 Jan 2023 21:29:59 -0500, Rich Alderson wrote:
>>
>> Andreas Kohlbach <ank@spamfence.net> writes:
>>
>>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>
>>>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>
>>> Actually Ötzi.
>>
>> I assume that that's umlaut-o, but I've never gotten GNUS to display Unicode as
>> anything other than question marks or \ooo octals.
>
> I always thought Gnus would handle Umlaut declaration automatically, but
> seems (from your article) I was wrong.

Nothing is ever easy with encodings...

Your post was in ISO-8859-1. Rich's response didn't contain any
information at all about what encoding it should be in; however, it also
didn't change the content in any way, and reasonable heuristics would
figure out it's ISO-8859-1.

Your reply to Rich, finally, again was in ISO-8859-1.

Since Rich's post didn't have any content-type information, it seems
likely that Gnus also either isn't trying to look at any such
information and decode appropriately, or else it thinks it cannot display
such characters for some reason.

> You could try to add
>
> (setq mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))
> (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
>
> to your ~/.gnus file and restart Gnus.

Worth trying, but I doubt it would solve it.

> As for usenet or mail, it's pretty much out of your control. If the one
> you quote uses Umlauts, your reader must declare it to produce proper
> results.

Your reader doesn't need to "declare" anything. It needs to understand
what the posted content is declared as, and process/display it
correspondingly.

> Btw. Charlie Gibbs' slrn handles Umlauts correctly (declares them),
> because of the Ö in "Ötzi" in the quote.

Yes, looked fine for me too (in Thunderbird). Since Rich's post didn't
say what encoding was used, it seems likely heuristics were used to
figure out it was in fact ISO-8859-1.

>> I was told in German classes 50 years ago that "ae", "oe", and "ue"
>> were acceptable alternatives, if somewhat outmoded.
>
> Probably. Being native German that was never an option. Although early
> computers had no support for other than usascii.

How early are you talking about? 50s and 60s? Because in the 70s,
ISO-646-DE existed (https://en.wikipedia.org/wiki/ISO/IEC_646).

But I also seem to recall that in German, if not able to use the umlaut
letters, the "ae", "oe" and "ue" substitutes are perfectly acceptable.
And from a collating point of view, this is how the letters are collated
as well (in German).

Of course, it's all different if you go to Swedish for example. There,
these substitutes are not acceptable, and are not how the letters are
collated. Even though they are pretty much the same characters as in
German in other ways.
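
To make the collation difference concrete, here is a rough Python sketch. It assumes the German "phone book" convention, where umlauts sort as their two-letter substitutes (German dictionaries commonly sort ä with a instead), and the Swedish alphabet order with å, ä, ö after z. The helper names are invented for this example, and real software would normally use a proper locale-aware collation library:

```python
# German phone-book style folding: compare umlauts as "ae", "oe", "ue".
GERMAN_FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def german_key(word: str) -> str:
    return word.lower().translate(GERMAN_FOLD)


# Swedish: å, ä, ö are letters in their own right and follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word: str) -> list[int]:
    # Anything outside the alphabet sorts last (an arbitrary choice here).
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


words = ["Ofen", "Öl", "Oden"]
print(sorted(words, key=german_key))   # ['Oden', 'Öl', 'Ofen']
print(sorted(words, key=swedish_key))  # ['Oden', 'Ofen', 'Öl']
```

The same three words come out in a different order depending on which language's rules apply, even though the Ö is the same character.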

Johnny


Re: New Year's Computer Stories... [message #418808 is a reply to message #418789]

Thu, 19 January 2023 08:22


Originally posted by: Johnny Billquist

On 2023-01-18 14:41, Nuno Silva wrote:
> On 2023-01-18, Kerr-Mudd, John wrote:
>
>> On 17 Jan 2023 19:57:05 -0500
>> Rich Alderson <news@alderson.users.panix.com> wrote:
>>
>>> "Kerr-Mudd, John" <admin@127.0.0.1> writes:
>>>
>>>> On Tue, 17 Jan 2023 06:53:47 -0600
>>>> Charles Richmond <codescott@aquaporin4.com> wrote:
>>>
>>>> > ISTM a few decades ago, a frozen body was found in the Alps, and the
>>>> > Swiss and the French or maybe the Italians were arguing over who owned
>>>> > the territory where that "ice man" was found, and thus had rights to the
>>>> > "ice man" body. I do *not* remember how all that turned out...
>>>
>>>> They named him Otzi; he got moved from Innsbruck to Bolzano.
>>>> https://en.wikipedia.org/wiki/%C3%96tzi#Border_dispute
>>>
>>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>>
>>
>> Ta, but I'm still very 7-bit. the Wikilink seems to have pasted in
>> something.
>
>
> \tzi, then? :-)

Yay! Someone who still remembers ISO-646-DE. :-)
(or -SE, -FI, or some other ones...)
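
For anyone who missed the joke: in the German variant of ISO 646 (DIN 66003), eight code positions that US-ASCII uses for punctuation carry German letters instead, and the backslash position is Ö. Python has no built-in codec for this as far as I know, so the sketch below uses a hand-written table; the function name is made up:

```python
# National replacement positions of ISO 646-DE (DIN 66003).
# Only these eight positions differ from US-ASCII.
ISO646_DE = str.maketrans({
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
})


def decode_iso646_de(data: bytes) -> str:
    """Interpret 7-bit bytes as ISO 646-DE rather than US-ASCII."""
    return data.decode("ascii").translate(ISO646_DE)


print(decode_iso646_de(b"\\tzi"))  # Ötzi
```

The bytes on the wire stay 7-bit; only the interpretation of those eight positions changes.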

Johnny


Re: New Year's Computer Stories... [message #418809 is a reply to message #418807]

Thu, 19 January 2023 09:22


Originally posted by: onion

Johnny Billquist <bqt@softjar.se> wrote:

> On 2023-01-19 06:20, Andreas Kohlbach wrote:
>> On 18 Jan 2023 21:29:59 -0500, Rich Alderson wrote:
>>> Andreas Kohlbach <ank@spamfence.net> writes:
>>>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>>
>>>> > You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>>
>>>> Actually Ötzi.
>>>
>>> I assume that that's umlaut-o, but I've never gotten GNUS to display
>>> Unicode as anything other than question marks or \ooo octals.
>>
>> I always thought Gnus would handle Umlaut declaration automatically, but
>> seems (from your article) I was wrong.
>
> Nothing is ever easy with encodings...
>
> Your post was in ISO-8859-1, Rich response didn't contain any
> information at all about what encoding it should be in, however, it also
> didn't change the content in any way, and reasonable heuristics would
> figure out it's ISO-8859-1.
>
> Your reply to Rich, finally, again was in ISO-8859-1.
>
> Since Rich's post didn't have any content-type information, it seems
> likely that Gnus also either isn't trying to look at any such
> information and decode appropriately, or else it think it cannot display
> such characters for some reason.
>
>> You could try to add
>>
>> (setq mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))
>> (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
>>
>> to your ~/.gnus file and restart Gnus.
>
> Worth trying, but I doubt it would solve it.
>
>> As for usenet or mail, it's pretty much out of your control. If the one
>> you quote uses Umlauts, your reader must declare it to produce proper
>> results.
>
> Your reader don't need to "declare" anything. It needs to understand
> what the posted content is declared as, and process/display it
> correspondingly.
>
>> Btw. Charlie Gibbs' slrn handles Umlauts correctly (declares them),
>> because of the Ö in "Ötzi" in the quote.
>
> Yes, looked fine for me too (in Thunderbird). Since Rich post didn't
> tell what encoding was used, it seems likely heuristics were used to
> figure out it was in fact ISO-8859-1.
>
>>> I was told in German classes 50 years ago that "ae", "oe", and "ue"
>>> were acceptable alternatives, if somewhat outmoded.
>>
>> Probably. Being native German that was never an option. Although early
>> computers had no support for other than usascii.
>
> How early are you talking about? 50s and 60s? Because in the 70s,
> ISO-646-DE existed (https://en.wikipedia.org/wiki/ISO/IEC_646).
>
> But I also seem to recall that in German, if not able to use the umlaut
> letters, the "ae", "oe" and "ue" substitutes are perfectly acceptable.
> And from a collating point of view, this is how the letters are collated
> as well (in German).
>
> Of course, it's all different if you go to Swedish for example. There,
> these substitutes are not acceptable, and are not how the letters are
> collated. Even though they are pretty much the same characters as in
> German in other ways.
>
> Johnny
>

More generally: Apple's character viewer describes Ö
as "Latin Capital Letter O With Diaeresis". That character
is, of course, used in languages other than German.

--
\|/
(((Ï))) – Mr Ön!on

When we shake the ketchup bottle
First none comes and then a lot'll.


Re: New Year's Computer Stories... [message #418810 is a reply to message #418798]

Thu, 19 January 2023 09:23

Charles Richmond

On 1/18/2023 8:29 PM, Rich Alderson wrote:
> Andreas Kohlbach <ank@spamfence.net> writes:
>
>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>
>>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>
>> Actually Ötzi.
>
> I assume that that's umlaut-o, but I've never gotten GNUS to display Unicode as
> anything other than question marks or \ooo octals. I was told in German
> classes 50 years ago that "ae", "oe", and "ue" were acceptable alternatives, if
> somewhat outmoded.
>

ISTM UTF-8 character set supports umlauted and accented characters with
the codes between 128 to 255

--

Charles Richmond



Re: New Year's Computer Stories... [message #418813 is a reply to message #418810]

Thu, 19 January 2023 11:11

Harry Vaderchi

On Thu, 19 Jan 2023 08:23:12 -0600
Charles Richmond <codescott@aquaporin4.com> wrote:

> On 1/18/2023 8:29 PM, Rich Alderson wrote:
>> Andreas Kohlbach <ank@spamfence.net> writes:
>>
>>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>
>>>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>
>>> Actually Ötzi.
>>
>> I assume that that's umlaut-o, but I've never gotten GNUS to display Unicode as
>> anything other than question marks or \ooo octals. I was told in German
>> classes 50 years ago that "ae", "oe", and "ue" were acceptable alternatives, if
>> somewhat outmoded.
>>
>
> ISTM UTF-8 character set supports umlauted and accented characters with
> the codes between 128 to 255

to the detriment of IBM Extended ASCII pretty box characters.

--
Bah, and indeed Humbug.


Re: New Year's Computer Stories... [message #418815 is a reply to message #418807]

Thu, 19 January 2023 12:47

Niklas Karlsson

On 2023-01-19, Johnny Billquist <bqt@softjar.se> wrote:
>
> But I also seem to recall that in German, if not able to use the umlaut
> letters, the "ae", "oe" and "ue" substitutes are perfectly acceptable.
> And from a collating point of view, this is how the letters are collated
> as well (in German).
>
> Of course, it's all different if you go to Swedish for example. There,
> these substitutes are not acceptable, and are not how the letters are
> collated. Even though they are pretty much the same characters as in
> German in other ways.

I suppose those substitutes are not formally acceptable, but I've seen
them used plenty in Swedish in a 7-bit environment.

Niklas
--
Keeping UUCP running is starting to seem a lot like keeping a 130-year-old
man who smokes 4 packs a day on life support because he's the last person
on Earth who knows how to do the cha-cha, but he won't tell anyone.
-- Ryan Tucker


Re: New Year's Computer Stories... [message #418816 is a reply to message #418809]

Thu, 19 January 2023 12:55


Originally posted by: Johnny Billquist

On 2023-01-19 15:22, Mr Ön!on wrote:
> Johnny Billquist <bqt@softjar.se> wrote:

>> Of course, it's all different if you go to Swedish for example. There,
>> these substitutes are not acceptable, and are not how the letters are
>> collated. Even though they are pretty much the same characters as in
>> German in other ways.
>>
>> Johnny
>>
>
> More generally: Apple's character viewer describes Ö
> as "Latin Capital Letter O With Diaresis". That character
> is, of course, used in languages other than German.

Which is the Unicode name for it. Which in a way is unfortunate, since
we're then basically trying to describe the visual representation, and
not the letter.

This is one of my main gripes with Unicode. It conflates and mixes
semantics with visual representation. Sometimes the same character is
used with different semantics depending on language and context, and
sometimes you have different characters just because of the different
semantics. And sometimes Unicode goes to silly extremes in
differentiating on visual differences that really don't make sense, such
as Green Book (U+1F4D7) as opposed to Blue Book (U+1F4D8). Why would
the color suddenly be a part of the Unicode character, and what is then
expected to happen if you have this in a text that is in red?

m might mean the Latin lowercase letter 'm', but might also mean
"meter", which is a unit. But it's the same Unicode code point that is
supposed to be used.

On the other hand, Å is the letter "capital A with ring" (U+00C5), which
is a specific letter in Swedish. Then we have the unit "Ångström" (which
may be written Angstrom by non-Swedes), which is a unit that is 10^-10
meters. But this one has its own code point (U+212B). The unit was
named after the Swedish scientist Ångström
(https://en.wikipedia.org/wiki/Angstrom).

Why cannot Unicode even be consistent with itself on anything? It's just
the largest mess ever created. But by now we're stuck with it, and we're
going to suffer for eternity.

I don't even know how many different "a" there are in Unicode, but it's
excellent for anyone who wants to fool people into clicking on links
that lead to scams, for example... It's almost impossible to properly
handle all the different strings that might appear the same, and which a
human will read as the same, but a computer might not.
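
A short Python sketch of both cases, using only the standard unicodedata module. As far as I can tell, the Å / Ångström-sign pair is one that Unicode normalization does fold together, while look-alike letters from different scripts are left distinct:

```python
import unicodedata

A_RING = "\u00c5"         # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM_SIGN = "\u212b"  # ANGSTROM SIGN

# Different code points, so a plain comparison fails...
print(A_RING == ANGSTROM_SIGN)  # False
# ...but they are canonically equivalent, so NFC maps both to U+00C5.
print(unicodedata.normalize("NFC", ANGSTROM_SIGN) == A_RING)  # True

# Look-alikes from different scripts are not unified by normalization.
latin_a, cyrillic_a = "a", "\u0430"
print(unicodedata.name(cyrillic_a))  # CYRILLIC SMALL LETTER A
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)  # False
```

So normalizing before comparing takes care of the Ångström case, but the scam-link problem needs a separate look-alike check on top of that.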

Johnny


Re: New Year's Computer Stories... [message #418817 is a reply to message #418810]

Thu, 19 January 2023 13:01


Originally posted by: Johnny Billquist

On 2023-01-19 15:23, Charles Richmond wrote:
> On 1/18/2023 8:29 PM, Rich Alderson wrote:
>> Andreas Kohlbach <ank@spamfence.net> writes:
>>
>>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>
>>>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>
>>> Actually Ötzi.
>>
>> I assume that that's umlaut-o, but I've never gotten GNUS to display
>> Unicode as
>> anything other than question marks or \ooo octals. I was told in German
>> classes 50 years ago that "ae", "oe", and "ue" were acceptable
>> alternatives, if
>> somewhat outmoded.
>>
>
> ISTM UTF-8 character set supports umlauted and accented characters with
> the codes between 128 to 255

Technically, UTF-8 is not a character set, it's an encoding for the
Unicode character set, using a variable number of 8-bit bytes.

Unicode codepoints 0 to 255 are equal to Latin-1 (however, when using
UTF-8 only 0 to 127 are "compatible", which means just 7-bit ASCII). And
yes, there you have a bunch of characters with accents, umlauts, as well
as some other fairly common characters from western Europe at 160 to
255. (128 to 159 are non-printable characters.)
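
The difference between the code point and its encoded bytes is easy to see in Python (a minimal illustration, nothing more):

```python
ch = "Ö"

print(ord(ch))               # 214, i.e. 0xD6, the same number as in Latin-1
print(ch.encode("latin-1"))  # b'\xd6'      (one byte)
print(ch.encode("utf-8"))    # b'\xc3\x96'  (two bytes)

# Below 128 the two encodings produce identical bytes.
print("A".encode("latin-1") == "A".encode("utf-8"))  # True
```

Same code point, but the bytes only agree for the 7-bit ASCII range.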

Johnny


Re: New Year's Computer Stories... [message #418819 is a reply to message #418597]

Thu, 19 January 2023 16:50


Originally posted by: snipeco.2

Andreas Kohlbach <ank@spamfence.net> wrote:

> On Thu, 19 Jan 2023 08:26:26 +0000, Ahem A Rivet's Shot wrote:
>>
>> On Thu, 19 Jan 2023 00:20:06 -0500
>> Andreas Kohlbach <ank@spamfence.net> wrote:
>>
>>> (setq mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))
>>> (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
>
> Somebody canceling my article? It's gone here for some reason.
>

E-S has a server issue; it will be down tomorrow from 06:00 to 09:00 UTC
to resync databases. See Ray's recent article in e-s.support.

[...]

--
^Ï^. Sn!pe – My pet rock Gordon just is.

No plan survives contact with the enemy.
~ Slava Ukraini ~


Re: New Year's Computer Stories... [message #418820 is a reply to message #418597]

Thu, 19 January 2023 19:40

Ahem A Rivet's Shot

On Thu, 19 Jan 2023 16:01:12 -0500
Andreas Kohlbach <ank@spamfence.net> wrote:

> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
> checks if the text (no matter who wrote it, or if it's just a quote)
> contains characters out of usascii, then applies the correct declaration.

But if there is a byte with the top bit set the encoding could be
almost any ISO-8859 variant (most have gaps - all 256 values are valid in
ISO-8859-1 so *any* sequence of bytes is valid ISO-8859-1) or any Windows
8-bit encoding and possibly any of several other things. At a PPOE I got to
the bottom of mojibake appearing on the page by finding that the feed
declared as ISO-8859-1 was in fact Windows-1252 - the encodings are almost
but not quite identical. That discovery culminated in some fancy heuristics
that identified WIN-* encodings mislabelled as ISO-8859 encodings.
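
The simplest form such a heuristic could take is sketched below in Python. This is an assumption about the general idea, not the code actually used: bytes 0x80 to 0x9F are only control codes in ISO-8859-1 but mostly printable characters in Windows-1252, so their presence in ordinary text is a strong hint. The function name is invented for the example:

```python
def guess_single_byte_charset(data: bytes, declared: str = "iso-8859-1") -> str:
    """Return 'cp1252' when data declared as Latin-1 uses the 0x80..0x9F range."""
    declared_latin1 = declared.lower() in ("iso-8859-1", "latin-1")
    if declared_latin1 and any(0x80 <= b <= 0x9F for b in data):
        return "cp1252"
    return declared


feed = b"\x93quoted\x94 price: 5\x80"
charset = guess_single_byte_charset(feed)
print(charset)               # cp1252
print(feed.decode(charset))  # “quoted” price: 5€
```

A real implementation would have to cope with the few byte values that Windows-1252 leaves undefined, and with the other WIN-*/ISO-8859 pairs.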

--
Steve O'Hara-Smith
Odds and Ends at http://www.sohara.org/


Re: New Year's Computer Stories... [message #418821 is a reply to message #418820]

Thu, 19 January 2023 22:40


Originally posted by: drb

> all 256 values are valid in ISO8859-1

Er, no. Nothing defined in the range 0x80 - 0xA0 inclusive, in -1.

De


Re: New Year's Computer Stories... [message #418824 is a reply to message #418815]

Fri, 20 January 2023 03:58


Originally posted by: Johnny Billquist

On 2023-01-19 18:47, Niklas Karlsson wrote:
> On 2023-01-19, Johnny Billquist <bqt@softjar.se> wrote:
>>
>> But I also seem to recall that in German, if not able to use the umlaut
>> letters, the "ae", "oe" and "ue" substitutes are perfectly acceptable.
>> And from a collating point of view, this is how the letters are collated
>> as well (in German).
>>
>> Of course, it's all different if you go to Swedish for example. There,
>> these substitutes are not acceptable, and are not how the letters are
>> collated. Even though they are pretty much the same characters as in
>> German in other ways.
>
> I suppose those substitutes are not formally acceptable, but I've seen
> them used plenty in Swedish in a 7-bit environment.

Oh, sure. For convenience, and as a last way out, they are most
certainly done. But formally, they are not considered equivalent in Swedish.

Johnny


Re: New Year's Computer Stories... [message #418825 is a reply to message #418821]

Fri, 20 January 2023 04:02


Originally posted by: Johnny Billquist

On 2023-01-20 04:40, Dennis Boone wrote:
>> all 256 values are valid in ISO8859-1
>
> Er, no. Nothing defined in the range 0x80 - 0xA0 inclusive, in -1.

Yes, there is. Almost all the characters in that range are defined in
ISO-8859; however, they are non-printable characters.
But they all have a meaning. And that range has the same meanings in
all the variants, so you don't even have to specify which one you're
talking about.

Same in Unicode. CSI, for example, is 0x9B.
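
This is at least how Python's latin-1 codec and the Unicode character database see it (a quick check, not a statement about the text of the ISO standard itself):

```python
import unicodedata

# Python's latin-1 codec maps every one of the 256 byte values.
text = bytes(range(256)).decode("latin-1")
print(len(text))  # 256

# 0x80..0x9F come out as control characters (general category "Cc").
c1 = text[0x80:0xA0]
print(all(unicodedata.category(c) == "Cc" for c in c1))  # True
print(hex(ord(text[0x9B])))  # 0x9b, the CSI position

# 0xA0 is no longer a control: it is the no-break space.
print(unicodedata.name(text[0xA0]))  # NO-BREAK SPACE
```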

Johnny


Re: New Year's Computer Stories... [message #418827 is a reply to message #418597]

Fri, 20 January 2023 04:52


Originally posted by: Johnny Billquist

On 2023-01-19 22:01, Andreas Kohlbach wrote:
> On Thu, 19 Jan 2023 14:19:12 +0100, Johnny Billquist wrote:
>>
>> On 2023-01-19 06:20, Andreas Kohlbach wrote:
>>> On 18 Jan 2023 21:29:59 -0500, Rich Alderson wrote:
>>>>
>>>> Andreas Kohlbach <ank@spamfence.net> writes:
>>>>
>>>> > On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>>>
>>>> >> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>>>
>>>> > Actually Ötzi.
>>>>
>>>> I assume that that's umlaut-o, but I've never gotten GNUS to display Unicode as
>>>> anything other than question marks or \ooo octals.
>>> I always thought Gnus would handle Umlaut declaration automatically,
>>> but
>>> seems (from your article) I was wrong.
>>
>> Nothing is ever easy with encodings...
>>
>> Your post was in ISO-8859-1, Rich response didn't contain any
>> information at all about what encoding it should be in, however, it
>> also didn't change the content in any way, and reasonable heuristics
>> would figure out it's ISO-8859-1.
>
> It doesn't matter what his writing has. It only matters if only a single
> character appears. In this case Rich quoted the "Ö" I wrote. Thus it has
> to be at least ISO-8859-1.

If the headers in Rich's post had said it was Unicode encoded as UTF-8,
that "Ö" would have suddenly been shown as something completely
different, even though the post would contain the exact same bits.

There is nothing that absolutely says it has to be 8859-1. However, in
the case where no encoding information was provided, what is the reader
supposed to do? All it can do here is guess.

Not to mention that even within 8859, you have a lot of other variants
as well. Why even pick -1? It's all heuristics.

>> Your reply to Rich, finally, again was in ISO-8859-1.
>
> Which is correct.

Yes. Your headers said so. So even if you had been using Unicode and
UTF-8, a reader would have parsed it as 8859-1, but if that had been the
case, we would have seen a bunch of garbage instead. :-)

>> Since Rich's post didn't have any content-type information, it seems
>> likely that Gnus also either isn't trying to look at any such
>> information and decode appropriately, or else it think it cannot
>> display such characters for some reason.
>
> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
> checks if the text (no matter who wrote it, or if it's just a quote)
> contains characters out of usascii, then applies the correct declaration.

Uh? No. The posts usually should contain a header declaring what
character set and encoding is used. When none exists, and the reader
spots characters with the high bit set, the reader will have to guess
what encoding was used for the post. In Rich's case, I guess it just
didn't try anything at all, but that depends on the reader.
But if you do have this information in the headers, the reader *knows*
how to interpret it all.

Charlie Gibbs, for example, replied to Rich, quoting Rich and all. But
Charlie's post was in Unicode using UTF-8. The "Ö" still rendered
correctly, at least for me, even though his quoting of Rich actually
modified what Rich had written, if you look at the actual bits.

So his reader correctly guessed that it was 8859-1, but converted it to
Unicode and UTF-8 for Charlie's own reply.

By the way, your post might actually not be in 8859-1, but in 8859-15.
8859-1 got abused/appropriated by Windows-1252, making life even more
miserable.

>>> As for usenet or mail, it's pretty much out of your control. If the one
>>> you quote uses Umlauts, your reader must declare it to produce proper
>>> results.
>>
>> Your reader don't need to "declare" anything. It needs to understand
>> what the posted content is declared as, and process/display it
>> correspondingly.
>
> That is not how mail and news was designed. All was designed around
> usascii at first. Once the rest of the world caught up to computing and
> used characters not fitting into the 7-bit scheme, other character sets
> came to life, telling clients what to deal with.

Yes. And the problem is how to interpret when something outside of
usascii appears. It's not just about getting the bits across, but also
how to interpret them.
Thus the "content-type" header, which in your case said:

Content-Type: text/plain; charset=iso-8859-1

And for Charlie Gibbs:

Content-Type: text/plain; charset=UTF-8
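
Reading that declaration is the easy part. A minimal Python sketch with the standard email package (the function name and the sample articles are made up for the example):

```python
from email import message_from_string


def declared_charset(raw_article: str):
    """Return the charset named in Content-Type, or None if none is declared."""
    return message_from_string(raw_article).get_content_charset()


print(declared_charset("Content-Type: text/plain; charset=iso-8859-1\n\nbody"))  # iso-8859-1
print(declared_charset("Content-Type: text/plain; charset=UTF-8\n\nbody"))       # utf-8
print(declared_charset("Subject: no declaration\n\nbody"))                        # None
```

The last case, where nothing is declared, is the one where a reader is left to guess.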

> A reader cannot just guess (well some try), because in different charsets
> a certain position contains different characters.

If you don't tell the reader how to interpret the bits, the reader
*has* to guess. Or how do you think it should render the sequence 042 326
042 (all in octal)?

If we use 8859-1 it looks like this:

Gromit:/Users/bqt> printf "\042\326\042\n"
"Ö"

If we have UTF-8 and Unicode:

Gromit:/Users/bqt> printf "\042\326\042\n"
"?"

Which is because this sequence is not valid as UTF-8.

Now, let's take some valid UTF-8, then...

Gromit:/Users/bqt> printf "\042\303\266\042\n"
"ö"

if we do that in 8859-1:

Gromit:/Users/bqt> printf "\042\303\266\042\n"
"Ã¶"

Which is a perfectly legal 8859-1 string. So how would you know which
way to interpret this four byte sequence if you don't have any header
telling you? Basically, you can't.

Fortunately, the 8859-1 "Ö" string is not valid as UTF-8, so heuristics
can guess that it's most likely 8859-something, and a fair guess is that
it's -1 or -15 at least.
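
The same experiment, and the kind of guess described here, can be sketched in Python. This is one plausible fallback order (try UTF-8 first, then fall back to 8859-1), not a description of what any particular newsreader does; the function name is invented:

```python
def decode_undeclared(data: bytes) -> tuple[str, str]:
    """Guess a charset for bytes that arrived without any declaration."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8. The latin-1 codec accepts any byte sequence.
        return data.decode("latin-1"), "latin-1"


print(decode_undeclared(b"\042\326\042"))      # ('"Ö"', 'latin-1')
print(decode_undeclared(b"\042\303\266\042"))  # ('"ö"', 'utf-8')

# The second sequence is also legal 8859-1, it just reads differently:
print(b"\042\303\266\042".decode("latin-1"))   # "Ã¶"
```

Trying UTF-8 first works reasonably well because text in an 8859 encoding rarely happens to be valid UTF-8, but it remains a guess.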

>>>> I was told in German classes 50 years ago that "ae", "oe", and "ue"
>>>> were acceptable alternatives, if somewhat outmoded.
>>> Probably. Being native German that was never an option. Although
>>> early
>>> computers had no support for other than usascii.
>>
>> How early are you talking about? 50s and 60s? Because in the 70s,
>> ISO-646-DE existed (https://en.wikipedia.org/wiki/ISO/IEC_646).
>
> I refer to the first time I came in contact with printers for home
> computers in the early 80s. They only printed characters in the 7-bit
> range. So in Germany we had to use "oe" instead of "ö" for example.

I bet you that printer had an option for doing ISO-646-DE. But I can
believe that not everyone figured this out.

In 646-DE, all the "special" characters are in the 7-bit range.

>> But I also seem to recall that in German, if not able to use the
>> umlaut letters, the "ae", "oe" and "ue" substitutes are perfectly
>> acceptable. And from a collating point of view, this is how the
>> letters are collated as well (in German).
>
> OK for me.
>
> But still, if a text is not 7-bit clean (even if a "Ä" shows up in a
> quote for example), they need to be declared in the header.

Yes. And that needs to be done by the sender. The reader, if no such
header exists, will need to make a guess.

Johnny


Re: New Year's Computer Stories... [message #418828 is a reply to message #418820]

Fri, 20 January 2023 09:30

Peter Flass

Ahem A Rivet's Shot <steveo@eircom.net> wrote:
> On Thu, 19 Jan 2023 16:01:12 -0500
> Andreas Kohlbach <ank@spamfence.net> wrote:
>
>> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
>> checks if the text (no matter who wrote it, or if it's just a quote)
>> contains characters out of usascii, then applies the correct declaration.
>
> But if there is a byte with the top bit set the encoding could be
> almost any ISO-8859 variant (most have gaps - all 256 values are valid in
> ISO8859-1 so *any* sequence of bytes is valid ISO8859-1) or any Windows 8
> bit encoding and possibly any of several other things. At a PPOE I got to
> the bottom of mojibake appearing on the page by finding that the feed
> declared as ISO-8869-1 was in fact WIN-1252 - the encodings are almost but
> not quite identical. That discovery culminated in some fancy heuristics that
> identified WIN-* encodings mislabelled as ISO-8859 encodings.
>

That’s why the posting client is supposed to provide “charset=” in the
header.

--
Pete


Re: New Year's Computer Stories... [message #418829 is a reply to message #418827]

Fri, 20 January 2023 09:30

Peter Flass

Johnny Billquist <bqt@softjar.se> wrote:
> On 2023-01-19 22:01, Andreas Kohlbach wrote:
>> On Thu, 19 Jan 2023 14:19:12 +0100, Johnny Billquist wrote:
>>>
>>> On 2023-01-19 06:20, Andreas Kohlbach wrote:
>>>> On 18 Jan 2023 21:29:59 -0500, Rich Alderson wrote:
>>>> >
>>>> > Andreas Kohlbach <ank@spamfence.net> writes:
>>>> >
>>>> >> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>>>> >
>>>> >>> You're missing an umlaut, or an "e": His new name is Oetzi, not Otzi.
>>>> >
>>>> >> Actually Ötzi.
>>>> >
>>>> > I assume that that's umlaut-o, but I've never gotten GNUS to display Unicode as
>>>> > anything other than question marks or \ooo octals.
>>>> I always thought Gnus would handle Umlaut declaration automatically,
>>>> but
>>>> seems (from your article) I was wrong.
>>>
>>> Nothing is ever easy with encodings...
>>>
>>> Your post was in ISO-8859-1, Rich response didn't contain any
>>> information at all about what encoding it should be in, however, it
>>> also didn't change the content in any way, and reasonable heuristics
>>> would figure out it's ISO-8859-1.
>>
>> It doesn't matter what his writing has. It only matters if only a single
>> character appears. In this case Rich quoted the "Ö" I wrote. Thus it has
>> to be at least ISO-8859-1.
>
> If the headers in Rich's post had said it was Unicode encoded as UTF-8,
> that "Ö" would have suddenly been shown as something completely
> different, even though the post would contain the exact same bits.
>
> There is nothing that absolutely says it has to be 8859-1. However, in
> the case where no encoding information was provided, what is the reader
> supposed to do? All it can do here is guess.
>
> Not to mention that even within 8859, you have a lot of other variants
> as well. Why even pick -1? It's all heuristics.
>
>>> Your reply to Rich, finally, again was in ISO-8859-1.
>>
>> Which is correct.
>
> Yes. Your headers said so. So even if you had been using Unicode and
> UTF-8, a reader would have parsed it as 8859-1, but if that had been the
> case, we would have seen a bunch of garbage instead. :-)
>
>>> Since Rich's post didn't have any content-type information, it seems
>>> likely that Gnus also either isn't trying to look at any such
>>> information and decode appropriately, or else it think it cannot
>>> display such characters for some reason.
>>
>> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
>> checks if the text (no matter who wrote it, or if it's just a quote)
>> contains characters out of usascii, then applies the correct declaration.
>
> Uh? No. The posts usually should contain a header declaring what
> character set and encoding is used. When none exists, and the reader
> spots characters with the high bit set, the reader will have to guess
> what encoding was used for the post. In Rich's case, I guess it just
> didn't try anything at all, but that depends on the reader.
> But if you do have this information in the headers, the reader *knows*
> how to interpret it all.
>
> Charlie Gibbs, for example, replied to Rich, quoting Rich and all. But
> Charlie's post was in Unicode using UTF-8. The "Ö" still rendered
> correctly, at least for me, even though his quoting of Rich actually
> modified what Rich had written, if you look at the actual bits.
>
> So his reader correctly guessed that it was 8859-1, but converted it to
> Unicode and UTF-8 for Charlie's own reply.
>
> By the way, your post might actually not be in 8859-1, but in 8859-15.
> 8859-1 got abused/appropriated by Windows-1252, making life even more
> miserable.
>
>>>> As for usenet or mail, it's pretty much out of your control. If the one
>>>> you quote uses Umlauts, your reader must declare it to produce proper
>>>> results.
>>>
>>> Your reader don't need to "declare" anything. It needs to understand
>>> what the posted content is declared as, and process/display it
>>> correspondingly.
>>
>> That is not how mail and news was designed. All was designed around
>> usascii at first. Once the rest of the world caught up to computing and
>> used characters not fitting into the 7-bit scheme, other character sets
>> came to life, telling clients what to deal with.
>
> Yes. And the problem is how to interpret when something outside of
> usascii appears. It's not just about getting the bits across, but also
> how to interpret them.
> Thus the "content-type" header, which in your case said:
>
> Content-Type: text/plain; charset=iso-8859-1
>
> And for Charlie Gibbs:
>
> Content-Type: text/plain; charset=UTF-8
>
>> A reader cannot just guess (well some try), because in different charsets
>> a certain position contains different characters.
>
> If you don't tell the reader how to interpret the bits, the reader
> *have* to guess. Or how do you think it should render sequence 042 326
> 042 (all in octal)?
>
> If we use 8859-1 it looks like this:
>
> Gromit:/Users/bqt> printf "\042\326\042\n"
> "Ö"
>
> If we have UTF-8 and Unicode:
>
> Gromit:/Users/bqt> printf "\042\326\042\n"
> "?"
>
> Which is because this sequence is not valid as UTF-8.
>
> Now, let's take some valid UTF-8, then...
>
> Gromit:/Users/bqt> printf "\042\303\266\042\n"
> "ö"
>
> if we do that in 8859-1:
>
> Gromit:/Users/bqt> printf "\042\303\266\042\n"
> "Ã¶"
>
> Which is a perfectly legal 8859-1 string. So how would you know which
> way to interpret this four byte sequence if you don't have any header
> telling you? Basically, you can't.
>
> Fortunately, the 8859-1 "Ö" string is not valid as UTF-8, so heuristics
> can guess that it's most likely 8859-something, and a fair guess is that
> it's -1 or -15 at least.
>
>>>> > I was told in German classes 50 years ago that "ae", "oe", and "ue"
>>>> > were acceptable alternatives, if somewhat outmoded.
>>>> Probably. Being native German that was never an option. Although
>>>> early
>>>> computers had no support for other than usascii.
>>>
>>> How early are you talking about? 50s and 60s? Because in the 70s,
>>> ISO-646-DE existed (https://en.wikipedia.org/wiki/ISO/IEC_646).
>>
>> I refer to the first time I came in contact with printers for home
>> computers in the early 80s. They only printed characters in the 7-bit
>> range. So in Germany we had to use "oe" instead of "ö" for example.
>
> I bet you that printer had an option for doing ISO-646-DE. But I can
> believe that not everyone figured this out.
>
> In 646-DE, all the "special" characters are in the 7-bit range.
>
>>> But I also seem to recall that in German, if not able to use the
>>> umlaut letters, the "ae", "oe" and "ue" substitutes are perfectly
>>> acceptable. And from a collating point of view, this is how the
>>> letters are collated as well (in German).
>>
>> OK for me.
>>
>> But still, if a text is not 7-bit clean (even if a "Ä" shows up in a
>> quote for example), they need to be declared in the header.
>
> Yes. And that needs to be done by the sender. The reader, if no such
> header exists, will need to make a guess.
>
> Johnny
>

People just need to stop using cruddy software.

--
Pete

Re: New Year's Computer Stories... [message #418830 is a reply to message #418829]

Fri, 20 January 2023 10:32

scott

Peter Flass <peter_flass@yahoo.com> writes:
> Johnny Billquist <bqt@softjar.se> wrote:

>>
>> Yes. And that needs to be done by the sender. The reader, if no such
>> header exists, will need to make a guess.
>>
>> Johnny
>>
>
> People just need to stop using cruddy software.

s/cruddy/obsolete/

My newsreader dates back to the late 1980s (xrn), assumes a
single byte per character and indexes that byte directly into
the specified font (which in my case are ISO-8859-1 X11 fonts).

Updating would require switching it from Athena widgets to
GTK3 and the programming rat's nest called cairo and the
GTK font subsystem.

I really like the minimal and very efficient (albeit plain)
graphical UI that xrn provides, and generally don't miss
UTF-8 support. Except for smart quotes, which are the most
stupid invention of all time. The ASCII double-quote character
should be sufficient - no need to have "matching" quote marks
that require two bytes each.

Re: New Year's Computer Stories... [message #418831 is a reply to message #418830]

Fri, 20 January 2023 11:45

Originally posted by: Thomas Koenig

Scott Lurndal <scott@slp53.sl.home> schrieb:

> I really like the minimal and very efficient (albeit plain)
> graphical UI that xrn provides, and generally don't miss
> UTF-8 support. Except for smart quotes, which is the most
> stupid invention of all time. The ASCII double-quote character
> should be sufficient - no need to have "matching" quote marks
> that require two bytes each.

I use slrn (see header), which is a text-based news reader
with MIME support. Works fine for me, and has few frills.

Re: New Year's Computer Stories... [message #418832 is a reply to message #418830]

Fri, 20 January 2023 12:11

Originally posted by: Johnny Billquist

On 2023-01-20 16:32, Scott Lurndal wrote:
> Peter Flass <peter_flass@yahoo.com> writes:
>> Johnny Billquist <bqt@softjar.se> wrote:
>
>>>
>>> Yes. And that needs to be done by the sender. The reader, if no such
>>> header exists, will need to make a guess.
>>>
>>> Johnny
>>>
>>
>> People just need to stop using cruddy software.
>
> s/cruddy/obsolete/
>
> My newsreader dates back to the late 1980s (xrn), assumes a
> single byte per character and indexes that byte directly into
> the specified font (which in my case are ISO-8859-1 X11 fonts).

Obsolete might be the proper term here, but it boils down to it not
handling the content-type header.
If it did, things would work fine in a sense.
Either xrn will (should) translate the content in the post to whatever
is used for display, or else substitute whatever can be shown.

Assuming 8859-1 is not right if the post actually says that the
content is UTF-8 (obviously).

But that doesn't mean it can't be handled, even if your environment can't
show Unicode.
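
As a rough sketch of what that could look like (not how xrn works): read the declared charset from the Content-Type header, decode the body with it, and re-encode for an ISO-8859-1 display, substituting "?" for anything that has no Latin-1 equivalent. It uses Python's standard `email` package; the sample article and the function name `body_for_latin1_display` are made up for illustration.

```python
from email import message_from_bytes

# Hypothetical article: declared UTF-8, containing smart quotes and "Ö".
raw = (
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"\xe2\x80\x9cHello\xe2\x80\x9d \xc3\x96tzi\r\n"
)


def body_for_latin1_display(raw_article: bytes) -> bytes:
    """Decode per the declared charset, then down-convert to ISO-8859-1."""
    msg = message_from_bytes(raw_article)
    # Fall back to US-ASCII when the sender declared nothing.
    charset = msg.get_content_charset() or "us-ascii"
    payload = msg.get_payload(decode=True)  # raw body bytes
    text = payload.decode(charset, errors="replace")
    # Characters without a Latin-1 equivalent become "?".
    return text.encode("iso-8859-1", errors="replace")


print(body_for_latin1_display(raw))
```

The "Ö" survives as the single byte 0xD6, while the smart quotes degrade to "?". A friendlier reader might map them to the ASCII double quote instead.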

> Updating would require switching it from athena widgets to
> GTK3 and the programming rats nest called cairo and the
> GTK font subsystem.

I don't believe that anything like that is required. But I haven't
looked at this in a very long time...

> I really like the minimal and very efficient (albeit plain)
> graphical UI that xrn provides, and generally don't miss
> UTF-8 support. Except for smart quotes, which is the most
> stupid invention of all time. The ASCII double-quote character
> should be sufficient - no need to have "matching" quote marks
> that require two bytes each.

Just as I like using xterm. And even though xterm is ridiculously
minimal, it does support UTF-8.

Johnny

Re: New Year's Computer Stories... [message #418833 is a reply to message #418832]

Fri, 20 January 2023 12:41

scott

Johnny Billquist <bqt@softjar.se> writes:
> On 2023-01-20 16:32, Scott Lurndal wrote:
>> Peter Flass <peter_flass@yahoo.com> writes:
>>> Johnny Billquist <bqt@softjar.se> wrote:
>>
>>>>
>>>> Yes. And that needs to be done by the sender. The reader, if no such
>>>> header exists, will need to make a guess.
>>>>
>>>> Johnny
>>>>
>>>
>>> People just need to stop using cruddy software.
>>
>> s/cruddy/obsolete/
>>
>> My newsreader dates back to the late 1980s (xrn), assumes a
>> single byte per character and indexes that byte directly into
>> the specified font (which in my case are ISO-8859-1 X11 fonts).
>
> Obsolete might be the proper term here, but it boils down to it not
> handling the content-type header.
> If it did, things would work fine in a sense.
> Either xrn will (should) translate the content in the post to whatever
> is used for display, or else substitute whatever can be shown.

xrn simply uses the original X11 font subsystem and the legacy fonts.

>
> Assuming 8859-1 is not right if the post actually were saying that the
> content is of the type UTF-8 (obviously).
>
> But it don't mean it can't be handled, even if your environment can't
> show Unicode.

I've no ambition to rewrite it to handle modern fonts.

>
>> Updating would require switching it from athena widgets to
>> GTK3 and the programming rats nest called cairo and the
>> GTK font subsystem.
>
> I don't believe that anything like that is required. But I haven't
> looked at this in a very long time...
>
>> I really like the minimal and very efficient (albeit plain)
>> graphical UI that xrn provides, and generally don't miss
>> UTF-8 support. Except for smart quotes, which is the most
>> stupid invention of all time. The ASCII double-quote character
>> should be sufficient - no need to have "matching" quote marks
>> that require two bytes each.
>
> Just as I like using xterm. And even though xterm is ridiculously
> minimal, it does support UTF-8.

Some versions of xterm do support UTF-8, that's true. They've
been updated more often than JIK's xrn.

Re: New Year's Computer Stories... [message #418837 is a reply to message #418827]

Fri, 20 January 2023 19:41

Andreas Kohlbach

On Fri, 20 Jan 2023 10:52:24 +0100, Johnny Billquist wrote:
>
> On 2023-01-19 22:01, Andreas Kohlbach wrote:
>> On Thu, 19 Jan 2023 14:19:12 +0100, Johnny Billquist wrote:
>>>
>>> Nothing is ever easy with encodings...
>>>
>>> Your post was in ISO-8859-1, Rich response didn't contain any
>>> information at all about what encoding it should be in, however, it
>>> also didn't change the content in any way, and reasonable heuristics
>>> would figure out it's ISO-8859-1.
>> It doesn't matter what his writing has. It only matters if only a
>> single
>> character appears. In this case Rich quoted the "Ö" I wrote. Thus it has
>> to be at least ISO-8859-1.
>
> If the headers in Rich's post had said it was Unicode encoded as
> UTF-8, that "Ö" would have suddenly been shown as something completely
> different, even though the post would contain the exact same bits.

Then the declaration was wrong/broken.

> There is nothing that absolutely says it has to be 8859-1. However, in
> the case where no encoding information was provided, what is the
> reader supposed to do? All it can do here is guess.

Or just ignore them. Gnus here usually asks me what to do with the
non-printable characters. I usually choose to replace all with ".".

> Not to mention that even within 8859, you have a lot of other variants
> as well. Why even pick -1? It's all heuristics.

Are "looking at text characters" and "heuristics" the same?

Yes, 8859 has more "variants". Like -15, because I mention a € here.

I notice your article was (correctly) UTF-8. So I wonder how mine will be
declared. UTF-8 or ISO-8859-15...

>>> Your reply to Rich, finally, again was in ISO-8859-1.
>> Which is correct.
>
> Yes. Your headers said so. So even if you had been using Unicode and
> UTF-8, a reader would have parsed it as 8859-1, but if that had been
> the case, we would have seen a bunch of garbage instead. :-)

Of course a reader has to "mangle" the text itself according to the
declaration. There'll be garbage if a reader declares UTF-8, but the text
in the body actually is Latin1 (not "re-coding" it).

>>> Since Rich's post didn't have any content-type information, it seems
>>> likely that Gnus also either isn't trying to look at any such
>>> information and decode appropriately, or else it think it cannot
>>> display such characters for some reason.
>> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
>> checks if the text (no matter who wrote it, or if it's just a quote)
>> contains characters out of usascii, then applies the correct declaration.
>
> Uh? No. The posts usually should contain a header declaring what
> character set and encoding is used. When none exists, and the reader
> spots characters with the high bit set, the reader will have to guess
> what encoding was used for the post. In Rich's case, I guess it just
> didn't try anything at all, but that depends on the reader.
> But if you do have this information in the headers, the reader *knows*
> how to interpret it all.

We'll see with this posting (I'll send a follow-up once I've posted it). If
you're correct, it should declare UTF-8, because the article of yours that
I am following up to did so. Otherwise it should be ISO-8859-15. Damn, this
is exciting! :-)

[...]

> By the way, your post might actually not be in 8859-1, but in
> 8859-15. 8859-1 got abused/appropriated by Windows-1252, making life
> even more miserable.

Yeah, the post produced garbage. One character was a €, by chance.

[...]

>>> How early are you talking about? 50s and 60s? Because in the 70s,
>>> ISO-646-DE existed (https://en.wikipedia.org/wiki/ISO/IEC_646).
>> I refer to the first time I came in contact with printers for home
>> computers in the early 80s. They only printed characters in the 7-bit
>> range. So in Germany we had to use "oe" instead of "ö" for example.
>
> I bet you that printer had an option for doing ISO-646-DE. But I can
> believe that not everyone figured this out.
>
> In 646-DE, all the "special" characters are in the 7-bit range.

It was a Commodore MPS 801 (or 803), a rebadged Seikosha GP 500. I cannot
find any sign that it could print characters outside the ASCII range, except
when printing in graphics mode (but then it can print anything you throw at it
anyway).

Maybe you could also alter the fonts in the font set; no idea.

>>> But I also seem to recall that in German, if not able to use the
>>> umlaut letters, the "ae", "oe" and "ue" substitutes are perfectly
>>> acceptable. And from a collating point of view, this is how the
>>> letters are collated as well (in German).
>> OK for me.
>> But still, if a text is not 7-bit clean (even if a "Ä" shows up in a
>> quote for example), they need to be declared in the header.
>
> Yes. And that needs to be done by the sender. The reader, if no such
> header exists, will need to make a guess.

It can guess, and many readers do so. But there is AFAIK no RFC forcing a
reader to guess.
--
Andreas

Re: New Year's Computer Stories... [message #418841 is a reply to message #418837]

Sat, 21 January 2023 03:48

Originally posted by: greymaus

On 2023-01-21, Andreas Kohlbach <ank@spamfence.net> wrote:
> On Fri, 20 Jan 2023 10:52:24 +0100, Johnny Billquist wrote:
>>
>>>> letters are collated as well (in German).
>>> OK for me.
>>> But still, if a text is not 7-bit clean (even if a "Ä" shows up in a
>>> quote for example), they need to be declared in the header.
>>
>> Yes. And that needs to be done by the sender. The reader, if no such
>> header exists, will need to make a guess.
>
> It can guess, and many readers do so. But there is AFAIK no RFC forcing a
> reader to guess.

I use slrn, and have never had any problems with it. For instance, in
the above, the Euro sign is correctly shown. Some newsreaders go in for
`pretty' (Windoish) pictures. From what I know from doing a bit of
programming, it is far easier to write stuff that works in terminals
than to deal with stuff that needs graphics.

--
greymausg@mail.com

Fe, Fi, Fo, Fum, I smell the stench of an Influencer.
Where is our money gone, Dude?

Re: New Year's Computer Stories... [message #418842 is a reply to message #418597]

Sat, 21 January 2023 04:32

Harry Vaderchi

On Fri, 20 Jan 2023 19:43:22 -0500
Andreas Kohlbach <ank@spamfence.net> wrote:

> On Fri, 20 Jan 2023 19:41:48 -0500, Andreas Kohlbach wrote:
>>
>> On Fri, 20 Jan 2023 10:52:24 +0100, Johnny Billquist wrote:
>>>
>>>> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
>>>> checks if the text (no matter who wrote it, or if it's just a quote)
>>>> contains characters out of usascii, then applies the correct declaration.
>>>
>>> Uh? No. The posts usually should contain a header declaring what
>>> character set and encoding is used. When none exists, and the reader
>>> spots characters with the high bit set, the reader will have to guess
>>> what encoding was used for the post. In Rich's case, I guess it just
>>> didn't try anything at all, but that depends on the reader.
>>> But if you do have this information in the headers, the reader *knows*
>>> how to interpret it all.
>>
>> Will see in this posting (I'll send a follow up, once I posted it). If
>> you're correct it should declare UTF-8, because you did so in your
>> article a followup here. Otherwise it should be ISO-8859-15. Damn, this
>> is exciting! :-)
>
> Gnus did ISO-8859-15, not UTF-8. I win! ;-)

Yay!

And I hope it eschews 'smartquotes' too.
(Luckily my NR doesn't generate them)

But here's some from wikipedia:

“ ‘Hello,’ he said, ‘to you’ ”

Bah, looks to me as if it's been 'translated' by pasting!

sending as UTF-8

--
Bah, and indeed Humbug.

Re: New Year's Computer Stories... [message #418845 is a reply to message #418842]

Sat, 21 January 2023 13:03

Originally posted by: drb

> “ ‘Hello,’ he said, ‘to you’ ”

It came through with fancy quotes, and it set the charset in the
header to utf8.

We'll see how smart mine is here.

De

Re: New Year's Computer Stories... [message #418846 is a reply to message #418845]

Sat, 21 January 2023 16:58

Rich Alderson

drb@ihatespam.msu.edu (Dennis Boone) writes:

>> “ ‘Hello,’ he said, ‘to you’ ”

> It came through with fancy quotes, and it set the charset in the
> header to utf8.

> We'll see how smart mine is here.

> De

Here's what came through for me:

| From: drb@ihatespam.msu.edu (Dennis Boone)
| Subject: Re: New Year's Computer Stories...
| Newsgroups: alt.folklore.computers
| Date: Sat, 21 Jan 2023 18:03:38 +0000
| Path: reader2.panix.com!panix!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.27.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
| NNTP-Posting-Date: Sat, 21 Jan 2023 18:03:38 +0000
| Sender: Dennis Boone <drb@yagi.h-net.org>
| References: <tor2i4$1a338$1@dont-email.me> <tq65op$380l2$2@dont-email.me>
| <20230117150954.ca5f144ffb59c240285dea08@127.0.0.1>
| <mddfsc8vevy.fsf@panix5.panix.com> <87ilh47ftq.fsf@usenet.ankman.de>
| <mddbkmvcl3s.fsf@panix5.panix.com> <8735875ce1.fsf@usenet.ankman.de>
| <tqbg0h$vim$1@news.misty.com> <87fsc644tj.fsf@usenet.ankman.de>
| <tqdo8p$nok$1@news.misty.com> <87k01g3eib.fsf@usenet.ankman.de>
| <87h6wk3efp.fsf@usenet.ankman.de>
| <20230121093258.b99f908929d4382aedd1a77a@127.0.0.1>
| User-Agent: tin/2.6.1-20211226 ("Convalmore") (FreeBSD/13.1-RELEASE-p2 (amd64))
| MIME-Version: 1.0
| Content-Type: text/plain; charset=UTF-8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Content-Transfer-Encoding: 8bit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Message-ID: <drmdnX8lkY9ntVH-nZ2dnZfqnPSdnZ2d@giganews.com>
| Lines: 8
| X-Usenet-Provider: http://www.giganews.com
| X-Trace: sv3-2GrQF5Rhg2kUejpMPiA53/Dgz6bKILwi7R4+9D+TE110m2pb+spYi8kq 1ifn1Xb+0AmeF7A6QP97uvN!EIqOj4+ujUMSMB3XbFUQ3tYDIpg1lVs6iJj+ cWCa+yaIAGyjbrz4CWCELaQtx/JZCSSG688=
| X-Complaints-To: abuse@giganews.com
| X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
| X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
| X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
| X-Postfilter: 1.3.40
| X-Received-Bytes: 1842
| Xref: panix alt.folklore.computers:723350
|
| > ?\200\234?\200\211?\200\230Hello,?\200\231 he said, ?\200\230to
| > you?\200\231?\200\211?\200\235
|
| It came through with fancy quotes, and it set the charset in the
| header to utf8.
|
| We'll see how smart mine is here.
|
| De

I used cut-and-paste to get the character representation of the UTF-8
smartquotes, which as you can see did not show up as single characters for me,
even though the message headers declare the "correct" things.
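
For what it's worth, the dump above looks consistent with each smart quote arriving as its three-byte UTF-8 sequence, with the lead byte shown as "?" and the two continuation bytes shown as octal escapes. A small sketch, assuming that reading is right, that prints the UTF-8 bytes of a string in the same \ooo notation (the helper name `octal_escapes` is made up):

```python
def octal_escapes(text: str) -> str:
    """Show the UTF-8 bytes of text, writing non-ASCII bytes as \\ooo octal."""
    return "".join(
        f"\\{b:03o}" if b >= 0x80 else chr(b)
        for b in text.encode("utf-8")
    )


# U+201C LEFT DOUBLE QUOTATION MARK, then U+2019 RIGHT SINGLE QUOTATION MARK
print(octal_escapes("\u201cHello,\u2019"))
```

The left double quote comes out as \342\200\234 and the right single quote as \342\200\231, which matches the \200\234 and \200\231 pairs in the dump once the leading \342 is replaced by "?".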

Rich

--
Rich Alderson news@alderson.users.panix.com
Audendum est, et veritas investiganda; quam etiamsi non assequamur,
omnino tamen proprius, quam nunc sumus, ad eam perveniemus.
--Galen

Re: New Year's Computer Stories... [message #418850 is a reply to message #418837]

Mon, 23 January 2023 07:17

Originally posted by: Johnny Billquist

On 2023-01-21 01:41, Andreas Kohlbach wrote:
> On Fri, 20 Jan 2023 10:52:24 +0100, Johnny Billquist wrote:
>>
>> On 2023-01-19 22:01, Andreas Kohlbach wrote:
>>> On Thu, 19 Jan 2023 14:19:12 +0100, Johnny Billquist wrote:
>>>>
>>>> Nothing is ever easy with encodings...
>>>>
>>>> Your post was in ISO-8859-1, Rich response didn't contain any
>>>> information at all about what encoding it should be in, however, it
>>>> also didn't change the content in any way, and reasonable heuristics
>>>> would figure out it's ISO-8859-1.
>>> It doesn't matter what his writing has. It only matters if only a
>>> single
>>> character appears. In this case Rich quoted the "Ö" I wrote. Thus it has
>>> to be at least ISO-8859-1.
>>
>> If the headers in Rich's post had said it was Unicode encoded as
>> UTF-8, that "Ö" would have suddenly been shown as something completely
>> different, even though the post would contain the exact same bits.
>
> Then the declaration was wrong/broken.

Agreed. Assuming his intent was to have a "Ö".

>> There is nothing that absolutely says it has to be 8859-1. However, in
>> the case where no encoding information was provided, what is the
>> reader supposed to do? All it can do here is guess.
>
> Or just ignore. Gnus here usually asks me what to do with the then
> non printable characters. I usually choose to replace all with ".".

Well, you also have to define what you mean by non-printable
characters. Technically, only 32-126 should be considered as printable,
I guess...

>> Not to mention that even within 8859, you have a lot of other variants
>> as well. Why even pick -1? It's all heuristics.
>
> Is "looking at text characters" and "heuristics" the same?

Heuristics basically just means that you look at the text content and
try to guess what kind of coding/character set the text was written using.

> Yes, 8859 has more "variants". Like -15, because I mention a € here.
>
> I notice your article was (correctly) UTF-8. So I wonder how mine will be
> declared. UTF-8 or ISO-8859-15...

Your posts all are 8859-15. Not sure why it would say UTF-8 if you have
your news reader set to use 8859-15.

>>>> Your reply to Rich, finally, again was in ISO-8859-1.
>>> Which is correct.
>>
>> Yes. Your headers said so. So even if you had been using Unicode and
>> UTF-8, a reader would have parsed it as 8859-1, but if that had been
>> the case, we would have seen a bunch of garbage instead. :-)
>
> Of course a reader has to "mangle" the text itself according to the
> declaration. There'll be garbage if a reader declares UTF-8, but the text
> in the body actually is Latin1 (not "re-coding" it).

Right.

>>>> Since Rich's post didn't have any content-type information, it seems
>>>> likely that Gnus also either isn't trying to look at any such
>>>> information and decode appropriately, or else it think it cannot
>>>> display such characters for some reason.
>>> No heuristics, AFAIK. Gnus (or any other working newsreader) simply
>>> checks if the text (no matter who wrote it, or if it's just a quote)
>>> contains characters out of usascii, then applies the correct declaration.
>>
>> Uh? No. The posts usually should contain a header declaring what
>> character set and encoding is used. When none exists, and the reader
>> spots characters with the high bit set, the reader will have to guess
>> what encoding was used for the post. In Rich's case, I guess it just
>> didn't try anything at all, but that depends on the reader.
>> But if you do have this information in the headers, the reader *knows*
>> how to interpret it all.
>
> Will see in this posting (I'll send a follow up, once I posted it). If
> you're correct it should declare UTF-8, because you did so in your
> article a followup here. Otherwise it should be ISO-8859-15. Damn, this
> is exciting! :-)

Why do you think I said your post will be UTF-8? I never suggested that.
When you receive/read a post, you have to process and interpret it
according to what character set the *poster* used. When you reply to, or
write your own post, it will be in whatever character set *you* want to
use. When replying, the content of the post you reply to might need to
be converted into the character set you use. If a 1:1 mapping exists
between them, this is easy. Otherwise some loss is expected, which has
to be dealt with in some way.

The character "Ö" exists in both Unicode and 8859-15, so in this case,
there are no problems translating it.
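
A small sketch of that conversion step, using Python's built-in codecs. The sample bytes are made up: they stand for quoted text that arrived declared as ISO-8859-15, which the replying reader wants to send on in another character set.

```python
# Quoted text as received, declared as ISO-8859-15 (0xD6 is "Ö", 0xA4 is "€").
quoted = b"\xd6tzi costs 5 \xa4"
text = quoted.decode("iso-8859-15")

# Unicode contains every ISO-8859-15 character, so UTF-8 is a lossless target.
as_utf8 = text.encode("utf-8")
print(as_utf8.decode("utf-8") == text)

# ISO-8859-1 has "Ö" but no euro sign, so here something has to give.
as_latin1 = text.encode("iso-8859-1", errors="replace")
print(as_latin1)
```

With `errors="replace"` the euro sign degrades to "?", which is one way of dealing with the loss; raising an error or transliterating to "EUR" would be others.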

>> By the way, your post might actually not be in 8859-1, but in
>> 8859-15. 8859-1 got abused/appropriated by Windows-1252, making life
>> even more miserable.
>
> Yeah, the post produced garbage. One character was a €, by chance.

:-)

>>>> How early are you talking about? 50s and 60s? Because in the 70s,
>>>> ISO-646-DE existed (https://en.wikipedia.org/wiki/ISO/IEC_646).
>>> I refer to the first time I came in contact with printers for home
>>> computers in the early 80s. They only printed characters in the 7-bit
>>> range. So in Germany we had to use "oe" instead of "ö" for example.
>>
>> I bet you that printer had an option for doing ISO-646-DE. But I can
>> believe that not everyone figured this out.
>>
>> In 646-DE, all the "special" characters are in the 7-bit range.
>
> Was an Commodore MPS 801 (or 803), a rebadged Seikosha GP 500). I cannot
> find it could print characters outside the ascii range, unless when
> printing in graphics mode (but then it can print anything you throw at it
> anyway).

You need to understand that ISO-646-DE did not use any characters
outside of the ASCII range. It merely substituted some characters inside
the ASCII range with national characters. Typically "[\]{|}" (and some
others) were replaced by various national characters instead.
And most printers I ever used had (have) a bunch of DIP switches inside
the printer, where you could (can) select which national character set is
used instead of standard ASCII.

So you would most likely have needed to flip a couple of DIP switches to
get "ö". But it was most likely very possible to get it.
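
To illustrate the substitution, here is a sketch of how a device set to the German variant would render 7-bit data. The mapping follows the ISO 646-DE (DIN 66003) table as I remember it; the function name `render_646_de` is made up.

```python
# ASCII positions that ISO 646-DE reassigns to German characters.
DE_646 = str.maketrans({
    "@": "§",
    "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü",
    "~": "ß",
})


def render_646_de(data: bytes) -> str:
    """Interpret 7-bit data the way a device set to the German variant would."""
    return data.decode("ascii").translate(DE_646)


print(render_646_de(b"\x5ctzi"))  # 0x5C is the backslash in US-ASCII
print(render_646_de(b"sch|n"))
```

Nothing here leaves the 7-bit range: the same byte 0x5C is a backslash on a US-ASCII device and an "Ö" on one switched to 646-DE.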

> May be you could also alter the fonts in the fontset, no idea.

That I don't know, but suspect might have been harder.

>>>> But I also seem to recall that in German, if not able to use the
>>>> umlaut letters, the "ae", "oe" and "ue" substitutes are perfectly
>>>> acceptable. And from a collating point of view, this is how the
>>>> letters are collated as well (in German).
>>> OK for me.
>>> But still, if a text is not 7-bit clean (even if a "Ä" shows up in a
>>> quote for example), they need to be declared in the header.
>>
>> Yes. And that needs to be done by the sender. The reader, if no such
>> header exists, will need to make a guess.
>
> It can guess, and many readers do so. But there is AFAIK no RFC forcing a
> reader to guess.

I don't have the energy to read through the RFCs for this now. But it
might be that if no character set is given, you are down to the old
"only 7-bit ASCII is allowed" level, in which case any character >126 is
basically non-printable, which seems to be what Rich experienced. But
his reply did retain the character, and at least for me, when reading
his post, I did see the "Ö", even though no character-set was given. So
my browser correctly came to the conclusion that the post was in
8859-{1,15}. As it also understands UTF-8, but didn't use that, it did
make a guess, and one that was reasonable. But that's about as much as I
care to try and work out right now.
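
For what it's worth, the kind of guess I mean can be sketched in a few lines of Python. This is only an assumption about a reasonable strategy (try ASCII, then UTF-8, then fall back to ISO-8859-15), not a description of what any real newsreader or browser does; guess_decode is a hypothetical helper name.

```python
def guess_decode(raw: bytes):
    """Decode a message body that arrived without a declared charset.

    Strict encodings are tried first. ISO-8859-15 assigns a character to
    every byte value, so it is used as the last resort.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("iso-8859-15", errors="replace"), "iso-8859-15"


print(guess_decode(b"Otzi"))
print(guess_decode("Ötzi".encode("utf-8")))
print(guess_decode(b"\xd6tzi"))   # a lone 0xD6 byte is not valid UTF-8
```

Trying UTF-8 before the 8-bit set works because text in an 8859 encoding containing non-ASCII characters is rarely valid UTF-8 by accident.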

Johnny

Re: New Year's Computer Stories... [message #418851 is a reply to message #418597]

Mon, 23 January 2023 07:21

Originally posted by: Johnny Billquist

On 2023-01-22 00:28, Andreas Kohlbach wrote:
> On Sat, 21 Jan 2023 09:32:58 +0000, Kerr-Mudd, John wrote:
>>
>> On Fri, 20 Jan 2023 19:43:22 -0500
>> Andreas Kohlbach <ank@spamfence.net> wrote:
>>
>>>> Will see in this posting (I'll send a follow up, once I posted it). If
>>>> you're correct it should declare UTF-8, because you did so in your
>>>> article a followup here. Otherwise it should be ISO-8859-15. Damn, this
>>>> is exciting! :-)
>>>
>>> Gnus did ISO-8859-15, not UTF-8. I win! ;-)
>>
>> Yay!
>>
>> And I hope it eschews 'smartquotes' too.
>> (Luckily my NR doesn't generate them)
>>
>> But here's some from wikipedia:
>>
>>
>> “ ‘Hello,’ he said, ‘to you’ ”
>>
>>
>> Bah, looks to me as if it's been 'translated' by pasting!
>>
>> sending as UTF-8
>
> Are they Grave Accents? Should be inside 7-bit, thus no declaration
> needed if no character outside show up.
>
> Let's see...

I think you failed to notice the backward looking double quotes, which
are in neither ASCII nor 8859-1.

In addition, I suspect the backward single quotes are not actually grave
accents. There is so much misery in Unicode...
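
A quick way to check, sketched in Python with the standard unicodedata module. The sample string is the quoted line from above written with escapes, and non_latin1 is just a throwaway helper name of mine.

```python
import unicodedata

# The quoted sample, with the curly quotes spelled out as escapes.
sample = "\u201c \u2018Hello,\u2019 he said, \u2018to you\u2019 \u201d"


def non_latin1(text):
    """List the characters in text that ISO-8859-1 cannot represent."""
    found = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found


for code_point, name in non_latin1(sample):
    print(code_point, name)

# For comparison, the character that is in ASCII:
print(f"U+{ord('`'):04X}", unicodedata.name("`"))
```

All six quote marks in the sample are reported as outside 8859-1, and none of them is U+0060.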

Johnny

Re: New Year's Computer Stories... [message #418852 is a reply to message #418851]

Mon, 23 January 2023 07:53

Ahem A Rivet's Shot
Messages: 4843
Registered: January 2012

Karma: 0

Senior Member

On Mon, 23 Jan 2023 13:21:12 +0100
Johnny Billquist <bqt@softjar.se> wrote:

> I think you failed to notice the backward looking double quotes, which
> are not in neither ASCII, nor 8859-1.
>
> In addition, I suspect the backward single quotes are not actually grave
> accents. There is so much misery in Unicode...

Only if you expect to go from appearance to code point - viewed
from the other direction there is great precision in Unicode and of course
it covers pretty much every script in use and is the only thing to do so.
The objection to adding Klingon script wasn't that it was fictional but
that no author writing in Klingon (yes there are some) uses it.

--
Steve O'Hara-Smith
Odds and Ends at http://www.sohara.org/

Re: New Year's Computer Stories... [message #418863 is a reply to message #418852]

Tue, 24 January 2023 10:25

Originally posted by: Johnny Billquist

On 2023-01-23 13:53, Ahem A Rivet's Shot wrote:
> On Mon, 23 Jan 2023 13:21:12 +0100
> Johnny Billquist <bqt@softjar.se> wrote:
>
>> I think you failed to notice the backward looking double quotes, which
>> are not in neither ASCII, nor 8859-1.
>>
>> In addition, I suspect the backward single quotes are not actually grave
>> accents. There is so much misery in Unicode...
>
> Only if you expect to go from appearance to code point - viewed
> from the other direction there is great precision in Unicode and of course
> it covers pretty much every script in use and is the only thing to do so.
> The objection to adding Klingon script wasn't that it was fictional but
> that no author writing in Klingon (yes there are some) uses it.

Sadly, it's not even that. It's just a mess. Or how do you explain green
book and blue book? When did color become an integral part of glyphs?
And why is there then no red book? How do I get a red book? Should I use
a blue book, and then do it with a red color? Or what would happen then?

And then we have glyphs for units, but not for all units. A simple thing
like 'm', is that the letter "m", or the unit meter? No way to tell from
the glyph, but of course for things like "mm", there is a special glyph
(U+339C). What kind of "great precision" is that? Some are ambiguous
as hell. Not to mention there are multiple code points for the Latin
"m". Like U+FF4D, which is just a plain "m", but with more space around
it. In which way is it different from U+006D, apart from obviously being
a different code point? And what does it represent? "Fullwidth Latin
Small Letter M". The difference being "fullwidth", which is a
typographical difference. Not to mention, are we talking about the
letter or the unit? Still unclear.
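
Anyone who wants to look at what the character database itself says about these three can do so with a short Python sketch like the one below. It uses only the standard unicodedata module; describe is a helper name I made up.

```python
import unicodedata


def describe(ch):
    """Return code point, name, East Asian width class and decomposition."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.decomposition(ch),
    )


for ch in ("\u006d", "\uff4d", "\u339c"):
    print(describe(ch))
```

The decomposition field is the interesting part: the fullwidth letter and the "mm" square are both tagged as compatibility variants of plain U+006D, and nothing in the data says letter or unit.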

And what is the difference (is there any?) between (U+004F U+0308) and
(U+00D6)?
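
In code the two sequences compare unequal until you normalize them. A minimal Python sketch, with variable names that are mine:

```python
import unicodedata

decomposed = "O\u0308"    # LATIN CAPITAL LETTER O + COMBINING DIAERESIS
precomposed = "\u00d6"    # LATIN CAPITAL LETTER O WITH DIAERESIS

# Raw comparison looks at code points, so the strings differ.
print(decomposed == precomposed)

# After normalization to a common form they compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

So every program that compares or searches text has to remember to do this, and to agree on which form.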

Unicode is such a mixture of everything that nothing is clear.

Johnny

Re: New Year's Computer Stories... [message #418865 is a reply to message #418863]

Tue, 24 January 2023 12:30

Ahem A Rivet's Shot

On Tue, 24 Jan 2023 16:25:00 +0100
Johnny Billquist <bqt@softjar.se> wrote:

> Unicode is such a mixture of everything that nothing is clear.

It is a lot less unclear if you ignore the glyphs completely and
simply ask which code point should I use to represent this entity - if
there is a specialised code point (eg units) then use that otherwise
fall back to the nearest available option.

There is a lot of stuff in Unicode that seems to me to have no
reason for being there but I consider that a small price to pay for having
pretty much every character I might ever need available and usually even
being able to be semantically correct in my choice of code point. We've
come a long way from backspacing over an o and typing a /.

	Oh and yes I agree typographical differences have no place in
Unicode, I have no idea why the fullwidth m is there.

--
Steve O'Hara-Smith
Odds and Ends at http://www.sohara.org/

Re: New Year's Computer Stories... [message #418866 is a reply to message #418798]

Tue, 24 January 2023 18:00

Originally posted by: Richmond

Rich Alderson <news@alderson.users.panix.com> writes:

> Andreas Kohlbach <ank@spamfence.net> writes:
>
>> On 17 Jan 2023 19:57:05 -0500, Rich Alderson wrote:
>
>>> You're missing an umlaut, or an "e": His new name is Oetzi, not
>>> Otzi.
>
>> Actually _tzi.
>
> I assume that that's umlaut-o, but I've never gotten GNUS to display
> Unicode as anything other than question marks or \ooo octals. I was
> told in German classes 50 years ago that "ae", "oe", and "ue" were
> acceptable alternatives, if somewhat outmoded.

Ötzi. The trick is to know that it is called "LATIN CAPITAL LETTER O
WITH DIAERESIS"; then you can use ctrl-x 8 <return> LATIN<tab> etc.
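
The same lookup by name works outside Emacs too. As a rough Python equivalent (not what Emacs does internally), using the standard unicodedata module:

```python
import unicodedata

# Look the character up by its official Unicode name.
o_umlaut = unicodedata.lookup("LATIN CAPITAL LETTER O WITH DIAERESIS")

# The same name can be written directly in a string literal.
same = "\N{LATIN CAPITAL LETTER O WITH DIAERESIS}"

print(o_umlaut + "tzi", f"U+{ord(o_umlaut):04X}", o_umlaut == same)
```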

Re: New Year's Computer Stories... [message #418867 is a reply to message #418865]

Wed, 25 January 2023 09:05

Originally posted by: Johnny Billquist

On 2023-01-24 18:30, Ahem A Rivet's Shot wrote:
> On Tue, 24 Jan 2023 16:25:00 +0100
> Johnny Billquist <bqt@softjar.se> wrote:
>
>> Unicode is such a mixture of everything that nothing is clear.
>
> It is a lot less unclear if you ignore the glyphs completely and
> simply ask which code point should I use to represent this entity - if
> there is a specialised code point (eg units) then use that otherwise
> fallback to the nearest available option.

Which is sort of unclear. :-)

> There is a lot of stuff in Unicode that seems to me to have no
> reason for being there but I consider that a small price to pay for having
> pretty much every character I might ever need available and usually even
> being able to be semantically correct in my choice of code point. We've
> come a long way from backspacing over an o and typing a /.

I'm not sure we actually progressed that much.

> Oh and yes I agree typographical differences have no place in
> unicode, I have no idea why the fullwidth m is there.

I sort of know, which is why I'm also aware of it, and annoyed by it. In
Japanese, for example, when they write Latin characters, they
traditionally wanted them to use the same kind of spacing as hiragana,
katakana and so on, so they already had these fullwidth Latin letters.
So when Unicode was created, it was decided to preserve them as well.

And then you also have superscript and subscript forms of many Latin
letters and digits, and a few more duplicates, not to mention similar
looking letters in different alphabets, making it extremely hard to know
which Unicode code point you actually should use. Not to mention
working out whether two strings are actually equal, if we talk about the
word they represent, as the actual code points used can be very
different, with no semantic difference.
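
To illustrate the comparison problem, here is a hedged Python sketch. The helper same_word is hypothetical, and folding with NFKC plus casefold is only one possible choice, which handles the compatibility duplicates but does nothing about lookalike letters from other alphabets.

```python
import unicodedata


def same_word(a: str, b: str) -> bool:
    """Compare two strings after compatibility normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


# Fullwidth letters fold to the plain Latin ones.
print(same_word("\uff4d\uff4d", "mm"))

# Decomposed and precomposed umlauts fold together as well.
print(same_word("O\u0308tzi", "\u00d6tzi"))

# A Cyrillic "a" (U+0430) inside "paper" still compares as different.
print(same_word("p\u0430per", "paper"))
```

The last case is the one that needs a separate lookalike table, since normalization deliberately leaves different scripts alone.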

I could go on... But I get the feeling you get it. :-)

Johnny

Re: New Year's Computer Stories... [message #418868 is a reply to message #418867]

Wed, 25 January 2023 12:13

Harry Vaderchi
Messages: 719
Registered: July 2012

Karma: 0

Senior Member

On Wed, 25 Jan 2023 15:05:38 +0100
Johnny Billquist <bqt@softjar.se> wrote:

> On 2023-01-24 18:30, Ahem A Rivet's Shot wrote:
>> On Tue, 24 Jan 2023 16:25:00 +0100
>> Johnny Billquist <bqt@softjar.se> wrote:
>>
>>> Unicode is such a mixture of everything that nothing is clear.
>>
>> It is a lot less unclear if you ignore the glyphs completely and
>> simply ask which code point should I use to represent this entity - if
>> there is a specialised code point (eg units) then use that otherwise
>> fallback to the nearest available option.
>
> Which is sortof unclear. :-)
>
>> There is a lot of stuff in Unicode that seems to me to have no
>> reason for being there but I consider that a small price to pay for having
>> pretty much every character I might ever need available and usually even
>> being able to be semantically correct in my choice of code point. We've
>> come a long way from backspacing over an o and typing a /.
>
> I'm not sure we actually progressed that much.
>
>> Oh and yes I agree typographical differences have no place in
>> unicode, I have no idea why the fullwidth m is there.
>
> I sortof know, which is why I'm also aware of it, and annoyed by it. In
> for example, Japanese, when they write latin characters, they
> traditionally wanted them to use the same kind of spacing as hiragana,
> katakana and so on, so they already had these full width latin. So when
> Unicode was created, it was decided to preserve this as well.
>
> And then you also have superscript and subscript of all the latin
> letters and numbers, and a few more duplicates, not to mention similar
> looking letters in different alphabets, making it extremely hard to know
> which Unicode code point you actually should use. Not to mention
> understanding if two strings are actually equal, if we talk about the
> word they represent, as the actual codepoints used can be very
> different, with no semantical difference.
>
> I could go on... But I get the feeling you get it. :-)
>
> Johnny

TL;DR - it's a mess.

--
Bah, and indeed Humbug.

Re: New Year's Computer Stories... [message #418869 is a reply to message #418868]

Wed, 25 January 2023 13:19

Ahem A Rivet's Shot

On Wed, 25 Jan 2023 17:13:26 +0000
"Kerr-Mudd, John" <admin@127.0.0.1> wrote:

> TL;DR - it's a mess.

But it's the least awful available solution to international text
encoding.

--
Steve O'Hara-Smith
Odds and Ends at http://www.sohara.org/

Re: New Year's Computer Stories... [message #418873 is a reply to message #418869]

Thu, 26 January 2023 16:05

Originally posted by: Johnny Billquist

On 2023-01-25 19:19, Ahem A Rivet's Shot wrote:
> On Wed, 25 Jan 2023 17:13:26 +0000
> "Kerr-Mudd, John" <admin@127.0.0.1> wrote:
>
>> TL;DR - it's a mess.
>
> But it's the least awful available solution to international text
> encoding.

Well, it's the only one, and it's unlikely we'll see any other. It's
established, and we'll have to live with it, just like the x86. (I know
ARM exists... :-P )

Johnny
