There are no “junk” Unicode characters. There are just nonsensical combinations of characters. Stripping out characters blindly is not a solution, because you have no way of knowing what was intended.
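
To make this concrete, here is a minimal Python sketch. The helper `strip_non_ascii` and the sample strings are hypothetical, chosen only for illustration. It compares blind stripping with a targeted repair, and the repair works only under the assumption that we already know how the text was damaged (here, UTF-8 bytes mis-decoded as Latin-1).

```python
def strip_non_ascii(text: str) -> str:
    """Blindly drop every character outside ASCII (the approach argued against)."""
    return "".join(ch for ch in text if ord(ch) < 128)


# Legitimate text: every character here is intended.
legit = "naïve café"

# A "nonsensical combination": UTF-8 bytes wrongly decoded as Latin-1.
# Each resulting character is itself a valid Unicode character.
mojibake = "café".encode("utf-8").decode("latin-1")

# Blind stripping damages the legitimate text...
stripped_legit = strip_non_ascii(legit)

# ...and hides the damage in the broken text without restoring the lost letter.
stripped_mojibake = strip_non_ascii(mojibake)

# Assumption: we know the text was UTF-8 mis-decoded as Latin-1.
# Only with that knowledge can the original be recovered by reversing the mistake.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(repr(stripped_legit))
print(repr(stripped_mojibake))
print(repr(repaired))
```

In both stripping cases information is lost for good. The repair succeeds only because the cause of the damage is known, and in general that knowledge is not available from the text alone.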