| ▲ | jstanley 4 hours ago | |||||||
How are you ending up with a byte-order mark in your shell scripts though? This has literally never happened to me. I don't know a single piece of software that writes byte-order marks, they are super niche. | ||||||||
| ▲ | dspillett 2 hours ago | parent | next [-] | |||||||
BOM is officially recommended against for UTF-8, but I've seen some tools include it when converting from UCS or UTF-16 in Windows. A number of text editors support it, and may stick in that mode for subsequent files, which might be how a BOM could accidentally get into a new file. Irritatingly, you'll find BOMs are not uncommon in CSV files because of Excel, which interprets files as CP1252 (a superset of the printable characters of ISO 8859-1, sometimes known as Win1252 or Windows-1252) if the BOM is not present, causing anything beyond ASCII to be misinterpreted (accented characters are usually the first thing people in Europe notice getting garbled, currency symbols other than $ too). | ||||||||
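To make the Excel case concrete, here is a minimal Python sketch. The helper name `csv_bytes_for_excel` and the sample rows are made up for illustration. It relies on Python's `utf-8-sig` codec, which prepends the UTF-8 BOM (EF BB BF) when encoding, and it also shows the kind of garbling you get when BOM-less UTF-8 is read as CP1252.

```python
import csv
import io


def csv_bytes_for_excel(rows):
    """Hypothetical helper: encode rows as CSV bytes with a leading UTF-8 BOM."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" writes the BOM bytes EF BB BF before the encoded text.
    return buf.getvalue().encode("utf-8-sig")


data = csv_bytes_for_excel([["name", "price"], ["café", "€5"]])
print(data[:3] == b"\xef\xbb\xbf")  # True: the file starts with the BOM

# What a CP1252 reader makes of BOM-less UTF-8 text:
garbled = "café €5".encode("utf-8").decode("cp1252")
print(garbled)  # cafÃ© â‚¬5
```

Whether a given Excel version needs the BOM is not something this sketch checks; it only shows what the bytes look like on each side.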
| ▲ | apple1417 2 hours ago | parent | prev | next [-] | |||||||
My most common source of unintentional BOMs is PowerShell. By default, 'echo 1 > test' writes the raw bytes 'FF FE 31 00 0D 00 0A 00'. Not too likely for that to end up in a shell script though. | ||||||||
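For anyone wondering what those eight bytes are, a small Python sketch (variable names are arbitrary) decodes them: FF FE is the UTF-16 little-endian BOM, followed by "1", CR and LF at two bytes each.

```python
import codecs

# The bytes quoted above.
raw = bytes.fromhex("FF FE 31 00 0D 00 0A 00")

# The first two bytes are the UTF-16 little-endian byte-order mark.
print(raw.startswith(codecs.BOM_UTF16_LE))  # True

# The generic "utf-16" codec reads the BOM to pick the byte order, then drops it.
text = raw.decode("utf-16")
print(repr(text))  # '1\r\n'
```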
| ▲ | rmunn 3 hours ago | parent | prev | next [-] | |||||||
The coworker who created the script runs Windows. When I informed him that he'd gotten a BOM into the shell script, he checked his IDE settings (JetBrains Rider) and his encoding default was set to UTF-8 without BOM, so neither of us has any clue how that script ended up with a BOM in it. Perhaps he edited the script with a different tool at one point. But it was definitely because the script was created or edited on Windows. (I forgot to mention earlier that you'll only ever run into this when you work on projects where devs are using different OSes to check files into Git. Many people will therefore never see this issue.) | ||||||||
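If you want to check a script for this before committing, a rough Python sketch follows. The function name `strip_utf8_bom` and the sample script are invented for illustration. The point is that with a BOM in front, the file no longer begins with the two bytes `#!`.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A made-up script as it might look after being saved with a BOM.
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"

print(script.startswith(b"#!"))                  # False: the BOM comes first
print(strip_utf8_bom(script).startswith(b"#!"))  # True once the BOM is removed
```

The same check can be run over every tracked file in a pre-commit hook, though how you wire that up depends on your setup.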
| ▲ | Elfener 39 minutes ago | parent | prev | next [-] | |||||||
The answer is (also confirmed by other replies) Windows. It seems in the Unix world, everyone uses UTF-8 (without BOM of course) and text encoding mistakes don't exist. When you involve Windows, which likes a random mix of UTF-16, UCS-2, CP1252, and I guess also UTF-8 with BOM, you're screwed. | ||||||||
| ▲ | defrost 4 hours ago | parent | prev [-] | |||||||
Notepad++ (popular with some on Windows) does optional Byte Order Marks on text files (subtitles, bash scripts, anything UTF-8 etc). Not my editor of choice but some swear by it and are prone to work cross platform across NAS's and SSH terminals with either Windows or some *nix as 'primary' work space. I'm sure other editors have this as an option; the time I ran into BOM issues I traced it back to the use of Notepad++ by a third party. | ||||||||