throwaway89201 6 days ago
The training sets of most LLMs contain copious amounts of content from Libgen (or, these days, Anna's Archive), much of it literary writing, where em dashes are used frequently.
nullc 6 days ago
Who the hell knows how the initial biases of LLMs came about. My IRC name (gmaxwell) is a token in the GPT-3 tokenizer.