Let me start with an example: some time ago I worked on a VAE that encoded and decoded SMILES strings. The idea is that you should be able to encode a SMILES string into an embedding space, do all the normal things you would do in that space, and then convert the resulting embedding vector back to a valid molecule.
The VAE is trained with a very large number of valid SMILES strings, typically tokenized at the character level (so "C" is a token, and "Br" is "B" then "r"). I and others have observed that VAEs trained like this produce a large number of embedding vectors that do not decode to valid SMILES strings: they have syntax errors, or they perform chemical alchemy. Personally, I saw that the training set had Br (bromine) and Ca (calcium), and the output molecules sometimes contained Ba (barium), even though barium wasn't in the original dataset at all.
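To make that failure mode concrete, here's a minimal sketch of what character-level tokenization does to multi-character element symbols (the function name and examples are just for illustration):

```python
# Character-level tokenization splits multi-character element symbols apart,
# so the decoder is free to recombine characters into elements that were
# never in the training data (e.g. "B" + "a" -> barium).

def char_tokenize(smiles: str) -> list[str]:
    """Naive character-level tokenization of a SMILES string."""
    return list(smiles)

print(char_tokenize("BrCC(=O)O"))  # ['B', 'r', 'C', 'C', '(', '=', 'O', ')', 'O']
print(char_tokenize("[Ca+2]"))     # ['[', 'C', 'a', '+', '2', ']']
```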
Character-level tokenization isn't the only reason decoding goes wrong, but the net effect is that only about 1-10% of sampled vectors decode to valid molecules. Invalid SMILES are mostly useless: they don't correspond to actual structures.
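If you want to measure that for yourself, here's a rough sketch using RDKit; `decoded_smiles` is a stand-in for whatever list of strings your decoder produces:

```python
# RDKit's MolFromSmiles returns None when a string doesn't parse into a real
# structure, which makes it easy to estimate the validity rate of decoded output.

from rdkit import Chem

def validity_rate(decoded_smiles: list[str]) -> float:
    """Fraction of decoded strings RDKit accepts as valid molecules."""
    valid = sum(1 for s in decoded_smiles if Chem.MolFromSmiles(s) is not None)
    return valid / len(decoded_smiles)

# e.g. decoded_smiles = [decode(z) for z in sample_latents(1000)],
# where decode() and sample_latents() are whatever your model provides.
print(f"{validity_rate(['CCO', 'C1=CC=CC=C1', 'C((C']):.0%} valid")  # 67% valid
```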
In response to this, the SELFIES format changes the representation so that it is effectively impossible to produce an invalid SELFIES string when decoding from a VAE. Among other things, tokens correspond to actual element symbols, so the model can only ever output valid elements.
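A small sketch with the selfies package (`pip install selfies`), just to show the round trip; the exact encoded token string below is illustrative and may differ by package version:

```python
import selfies as sf

# Round trip: SMILES -> SELFIES -> SMILES
smiles = "BrCC(=O)O"              # bromoacetic acid
s = sf.encoder(smiles)            # something like '[Br][C][C][=Branch1][C][=O][O]'
print(s)
print(sf.decoder(s))              # decodes back to an equivalent SMILES string

# The robustness property: a sequence of SELFIES tokens decodes to some valid
# molecule instead of raising a syntax error, even a sequence you'd never
# write by hand.
print(sf.decoder("[C][O][=C][Ring1][Branch1]"))
```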
I believe this is the SMILES VAE paper that my own experiments were based on: https://arxiv.org/pdf/1610.02415 (see https://github.com/maxhodak/keras-molecules for an open-source attempt at an implementation).
And this is the paper introducing SELFIES: https://arxiv.org/abs/1905.13741 (the open-source package for working with SELFIES, along with some example training scripts, is at https://github.com/aspuru-guzik-group/selfies; see "Validity of Latent Space in VAE SMILES vs. SELFIES" for more detail on the robustness).
As a side note: even though we put a bunch of effort into reproducing the original SMILES VAE, it was extremely slow to train and not very useful. Now you can just ask Gemini to write a full SELFIES VAE and train it in less than a day on a conventional GPU (thanks, PyTorch transformers!) to get a decent basic set of embeddings for exploring chemical space.
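For what it's worth, the kind of model I mean is nothing exotic. Here's a very rough architecture sketch, not a faithful reproduction of any particular paper, leaving out vocabulary handling, padding masks, KL annealing, and the training loop:

```python
import torch
import torch.nn as nn

class SelfiesVAE(nn.Module):
    """Rough sketch: transformer encoder -> latent z -> transformer decoder."""

    def __init__(self, vocab_size: int, d_model: int = 256, latent_dim: int = 64,
                 nhead: int = 8, num_layers: int = 4, max_len: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.from_z = nn.Linear(latent_dim, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, bos_id: int = 0):
        # tokens: (batch, seq_len) integer SELFIES token ids
        x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        h = self.encoder(x).mean(dim=1)                       # mean-pool to one vector
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        memory = self.from_z(z).unsqueeze(1)                  # latent vector as decoder memory

        # Teacher forcing: decoder input is the sequence shifted right by one
        # (bos_id is an assumed start-of-sequence token id), with a causal mask
        # so each position only attends to earlier positions.
        bos = torch.full_like(tokens[:, :1], bos_id)
        shifted = torch.cat([bos, tokens[:, :-1]], dim=1)
        y = self.embed(shifted) + self.pos[:, : shifted.size(1)]
        L = shifted.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"), device=tokens.device), diagonal=1)
        dec = self.decoder(y, memory, tgt_mask=causal)
        logits = self.out(dec)                                # (batch, seq_len, vocab_size)
        return logits, mu, logvar
```

Training is then the usual VAE objective: token-level cross-entropy on the reconstruction plus a (typically annealed) KL term computed from mu and logvar.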