quibono 7 hours ago
CLRF vs LF strikes again. Partly, at least. I wonder why there's even a max line length limit in the first place. I.e., is this for a technical reason or just display-related?
brk 4 hours ago
Wait, now we have to deal with Carriage Line Return Feeds too? I wonder if the person who had the idea of virtualizing the typewriter carriage knew how much trouble they would cause over time.
keybored 4 hours ago
Yeah, and using two bytes for a single line termination (or separation or whatever)? Why make things more complicated and take more space at the same time?
floren 3 hours ago
Remember that back in the mists of time, computers used typewriter-esque machines for user interaction and text output. You had to send a CR followed by an LF to go to the next line on the physical device. Storing both characters in the file meant the OS didn't need to insert any additional characters when printing. Having two separate characters let you do tricks like overstriking (just send CR, no LF).
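The following sketch makes the CR/LF split concrete with a hypothetical print-head model. The name `render_teletype` and the cell representation are assumptions for illustration and do not describe how any real terminal behaved. In this model, CR moves the head back to column 0 and LF advances the paper without moving the head. Striking the same cell twice records both characters, which is the overstrike trick.

```python
def render_teletype(data: str) -> list[list[str]]:
    """Simulate a hypothetical print head.

    CR returns the head to column 0; LF advances one line without changing
    the column. Each cell accumulates every character struck there, so an
    overstrike (CR without LF, then printing again) yields multi-char cells.
    A space moves the head without leaving ink.
    """
    lines: list[list[str]] = [[]]
    row = col = 0
    for ch in data:
        if ch == "\r":
            col = 0  # carriage return: back to the left margin
        elif ch == "\n":
            row += 1  # line feed: next line, same column
            if row == len(lines):
                lines.append([])
        else:
            line = lines[row]
            while len(line) <= col:
                line.append("")  # cells never struck stay empty
            if ch != " ":
                line[col] += ch
            col += 1
    return lines


print(render_teletype("abc\r___\r\n"))  # [['a_', 'b_', 'c_'], []]
print(render_teletype("ab\ncd"))        # [['a', 'b'], ['', '', 'c', 'd']]
```

The second call shows why the CR mattered on such a device: an LF on its own leaves the head in place, so the output "staircases" across the page.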
kstrauser 42 minutes ago
True, but I don’t think there was a common reason to ever send a linefeed without going back to the beginning. Were people printing lots of vertical pipe characters at column 70 or something? It would’ve been far less messy to make printers process linefeed the way \n acts today and omit the redundant CR. Then you could still use CR for those overstrike purposes but have a 1-byte universal newline character, which we almost finally have today now that Windows has mostly stopped resisting the inevitable.
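In practice, the usual way to live in a one-byte-newline world is to normalize line endings on input. The minimal sketch below maps CRLF and lone CR to LF. It also includes the reverse conversion for wire formats that still mandate CRLF, such as SMTP. Both helper names are hypothetical.

```python
import re


def normalize_newlines(text: str) -> str:
    """Map CRLF and lone CR line endings to LF."""
    # "\r\n?" matches CRLF as one unit, so it becomes a single LF (not two).
    return re.sub(r"\r\n?", "\n", text)


def to_crlf(text: str) -> str:
    """Map any line ending to CRLF, e.g. before writing to an SMTP connection."""
    return normalize_newlines(text).replace("\n", "\r\n")


print(repr(normalize_newlines("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
print(repr(to_crlf("a\nb\r\nc")))                # 'a\r\nb\r\nc'
```

For plain splitting, Python's `str.splitlines()` already treats `\n`, `\r`, and `\r\n` (among a few other separators) as line boundaries.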
|
|
|
OJFord 7 hours ago
I haven't seen them other than in the submission, but if the length matches up, it may be that they were processed from raw email: the RFC defines a length to wrap at. Edit: yes, I think that's most likely what it is (and it's SHOULD 78ch; MUST 998ch). I was forgetting that it also specifies the CRLF usage, so it's not (necessarily) related to Windows at all here as described in TFA. Here it is in my 'notmuch-more' email lib: https://github.com/OJFord/amail/blob/8904c91de6dfb5cba2b279f...
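The 78/998 figures match RFC 5322 (section 2.1.1). Under that section, each line SHOULD be at most 78 characters and MUST be at most 998, both counts excluding the CRLF. Below is a minimal checker sketch. It assumes the raw message uses CRLF line endings and counts bytes as characters. The function name and its return format are illustrative.

```python
def check_line_lengths(raw: bytes) -> list[tuple[int, int, str]]:
    """Flag lines exceeding RFC 5322 limits (lengths exclude the CRLF).

    Returns (1-based line number, length, "MUST" or "SHOULD") tuples.
    """
    problems = []
    # Assumes CRLF line endings, as required on the wire.
    for number, line in enumerate(raw.split(b"\r\n"), start=1):
        if len(line) > 998:
            problems.append((number, len(line), "MUST"))
        elif len(line) > 78:
            problems.append((number, len(line), "SHOULD"))
    return problems


msg = b"Subject: hi\r\n\r\n" + b"x" * 80 + b"\r\n" + b"y" * 1000 + b"\r\n"
print(check_line_lengths(msg))  # [(3, 80, 'SHOULD'), (4, 1000, 'MUST')]
```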
FabHK 6 hours ago
> it's not (necessarily) related to Windows at all here as described in TFA.
The article doesn't claim that it's Windows-related. The article is very clear in explaining that the spec requires =CRLF (3 characters), then mentions (in passing) that CRLF is the typical line ending on Windows, then speculates that someone replaced the two characters CRLF with a one-character newline, as on Unix or other OSs.
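Here is how that mechanism plays out. In quoted-printable (RFC 2045), `=` followed by CRLF at the end of a line is a soft line break, which the decoder removes. `=XX` encodes one byte in hex. The sketch below is a deliberately strict, minimal decoder, not a full RFC 2045 implementation: it skips the trailing-whitespace rules, and the function name is hypothetical. It shows what happens when the CRLF after `=` is replaced with a bare LF. The soft break is no longer recognized, and the `=` leaks into the output. Some real-world decoders are more lenient and accept a bare LF there too.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def qp_decode_strict(encoded: bytes) -> bytes:
    """Decode quoted-printable, recognizing only '=' + CRLF as a soft line break."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i:i + 1] == b"=":
            nxt = encoded[i + 1:i + 3]
            if nxt == b"\r\n":
                i += 3  # soft line break: drop '=' and the CRLF
                continue
            if len(nxt) == 2 and all(c in HEX_DIGITS for c in nxt):
                out.append(int(nxt, 16))  # '=XX' -> one byte
                i += 3
                continue
            # Malformed escape: keep the '=' as a literal character.
        out.append(encoded[i])
        i += 1
    return bytes(out)


wire = b"Hello, wor=\r\nld! caf=C3=A9\r\n"
print(qp_decode_strict(wire))                          # soft break removed
print(qp_decode_strict(wire.replace(b"\r\n", b"\n")))  # stray '=' survives
```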
OJFord 6 hours ago
OK, yeah, I may have misinterpreted that bit in the article. It would be a totally reasonable assumption if you didn't happen to know that about email, though; it wasn't a judgement regardless.
|
|
|
dgan 7 hours ago
I am just wondering how it is a good idea for a server to insert some characters into a user's input. If a colleague proposed this, I'd laugh in his face. It's just so hacky, I can't believe it's a real-life solution.
jagged-chisel 6 hours ago
“Insert characters”? Consider converting the original text (maintaining the author’s original line wrapping and indentation) to base64. Has anything been “inserted” into the text? I would suggest not. It has been encoded. Now consider an encoding that leaves most of the text readable, translates some things based on a line length limit, and some other things based on transport limitations (e.g. passing through 7-bit systems). As long as one follows the correct decoding rules, the original will remain intact: nothing “inserted.” The problem is that someone just knowledgeable enough to be aware that email is human-readable, but not aware of the proper decoding, has attempted to “clean up” the email for sharing.
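The "encoded, not inserted" point can be checked directly: decode with the matching rules and the original bytes come back unchanged. The sketch below uses Python's standard `base64` and `quopri` modules. The sample text is made up. It includes a long line and some non-ASCII characters, which forces the quoted-printable encoder to add soft line breaks and `=XX` escapes.

```python
import base64
import quopri

# Made-up sample: one long line (forces soft line breaks) plus non-ASCII text.
original = ("A" * 100 + "\nCafé au lait, naïve résumé\n").encode("utf-8")

b64 = base64.b64encode(original)
qp = quopri.encodestring(original)

# Both encodings change the bytes that travel over the wire...
assert b64 != original and qp != original
# ...but decoding with the matching rules restores the original exactly.
assert base64.b64decode(b64) == original
assert quopri.decodestring(qp) == original

print(qp.decode("ascii"))  # mostly readable, with '=' soft breaks and =XX escapes
```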
dgan 6 hours ago
Okay, it does sound better from this POV. Still weird, as it's a client/UI concern, not something a server is supposed to do; what's next, adding "bold" tags to the title? Lol
brookst 4 hours ago
SMTP is a line-oriented protocol. The server processes one line at a time and needs to understand headers. Infinite line length = infinite buffer. Even worse, QP is 7-bit (because SMTP started out ASCII-only), so bytes >127 get encoded as three characters (an equals sign, then two hex digits). A UTF-8 line of 500 non-ASCII bytes therefore becomes 1500 bytes, and since each non-ASCII character takes 2 to 4 bytes in UTF-8, a 500-character line grows even more. It all made sense at the time. Not so much these days, when 7-bit pipes only exist because they always have.
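The blow-up is easy to estimate. In quoted-printable, each byte above 127 (and the `=` sign itself) becomes three characters, and a non-ASCII character takes two to four bytes in UTF-8. The rough sketch below counts encoded length under those two rules only. It ignores soft line breaks, whitespace handling, and other escaped control characters, so real encoder output runs slightly longer. The function name is hypothetical.

```python
def qp_body_length(text: str) -> int:
    """Rough quoted-printable length: 3 chars per escaped byte, 1 otherwise.

    Escapes bytes > 127 and '=' (0x3D) only; ignores soft line breaks and
    other encoder rules, so it slightly underestimates real output.
    """
    data = text.encode("utf-8")
    return sum(3 if b > 127 or b == 0x3D else 1 for b in data)


print(qp_body_length("a" * 500))  # 500: plain ASCII passes through
print(qp_body_length("é" * 500))  # 3000: 2 UTF-8 bytes each, 3 chars per byte
print(qp_body_length("漢" * 500))  # 4500: 3 UTF-8 bytes each
```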
|
flexagoon 6 hours ago
When you post a comment on HN, the server inserts HTML tags into your input. Isn't that essentially the same thing?
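Below is a hypothetical sketch of what "the server inserts HTML tags" can look like. First, escape the user's text so that characters like `<` and `&` cannot be read as markup. Then wrap each blank-line-separated paragraph in `<p>`. This illustrates the idea under those assumptions; it is not HN's actual formatting code. It uses Python's standard `html.escape`.

```python
import html


def comment_to_html(text: str) -> str:
    """Hypothetical formatter: escape user text, then wrap paragraphs in <p>."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape turns &, <, > (and quotes, by default) into entities.
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


print(comment_to_html("Use <b> & friends\n\nSecond para"))
# <p>Use &lt;b&gt; &amp; friends</p><p>Second para</p>
```

The stored text is untouched. The tags and entities exist only in the rendered form, and `html.unescape` reverses the escaping.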
dgan 6 hours ago
No, because there is a clear separation between the content and the envelope. You wouldn't expect the post office to open your physical letters and write routing instructions to the postmen for delivery. But I agree with the sibling comment: it makes more sense when it's called "encoding" instead of "inserting chars into the original stream".
1718627440 3 hours ago
> You wouldn't expect the post office to open your physical letters and write routing instructions to the postmen for delivery
Digital communication is based on the postmen reading, transcribing and copying your letters. There is a reason why digital communication is treated differently than letters by the law, and why the legally mandated secrecy for letters doesn't apply to emails.
|
direwolf20 5 hours ago
It's called escaping, and almost every protocol has it. HN must convert the & symbol to &amp; for display in HTML. Many wire protocols like SATA or Ethernet must insert a 1 after a certain number of consecutive 0s to maintain electrical balance. Don't remember which ones; don't quote me that it's SATA and Ethernet.
zoho_seni an hour ago
Protocols that literally insert a bit are HDLC / PPP / CAN, and they insert a 0 after five consecutive 1s (CAN, more precisely, inserts a bit of the opposite value after any five identical bits).
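Below is a sketch of HDLC-style bit stuffing, with bits represented as a list of ints. The sender inserts a 0 after every five consecutive 1s. This ensures the flag pattern 01111110 can never appear inside the payload. The receiver then drops the 0 that follows any five 1s. This is a minimal illustration, not a framing implementation: it has no flags, CRC, or abort handling, and the function names are hypothetical.

```python
def stuff(bits: list[int]) -> list[int]:
    """Insert a 0 after every run of five consecutive 1s (HDLC-style)."""
    out: list[int] = []
    run = 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            out.append(0)  # stuffed bit
            run = 0
    return out


def unstuff(bits: list[int]) -> list[int]:
    """Remove the 0 that follows every run of five consecutive 1s."""
    out: list[int] = []
    run, skip = 0, False
    for bit in bits:
        if skip:
            skip = False  # this is the stuffed 0; drop it
            run = 0
            continue
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            skip = True
    return out


data = [0, 1, 1, 1, 1, 1, 1, 1, 0]  # seven 1s in a row
print(stuff(data))                  # [0, 1, 1, 1, 1, 1, 0, 1, 1, 0]
print(unstuff(stuff(data)) == data)  # True
```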
layer8 5 hours ago
Just wait until you learn what mess UTF-8 will turn your characters into. ;)
|