sixhobbits an hour ago
I do some similar charting etc. with Telegram data dumps, which you can still get from the "telegram lite" desktop app even though they removed the export functionality from the main app. For removing noise you might want to look into TF-IDF instead of the manual method described in the post, which I didn't understand. It basically treats words that are common across the whole corpus as noise, and words that appear much more often within a specific chat than in the whole dataset as interesting. You can also do some fun stuff by finding phrases used asymmetrically, e.g. more by one person in the convo than the other, or over time. Wordclouds per person are also fun!
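To make the TF-IDF idea concrete, here is a rough sketch using only the Python standard library. The function name `tf_idf`, the toy chats, and the whitespace tokenisation are all made up for illustration, and this is the plain unsmoothed formula (term frequency times log of chats over chats-containing-the-word); libraries typically use smoothed variants, so treat it as a starting point rather than the way to do it.

```python
import math
from collections import Counter


def tf_idf(chats):
    """Score words per chat. `chats` maps a chat name to a list of tokens.

    Returns {chat_name: {word: score}}. Words found in every chat score 0.
    """
    n_chats = len(chats)

    # Document frequency: in how many chats does each word appear at all?
    doc_freq = Counter()
    for tokens in chats.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in chats.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # tf = share of this chat's tokens; idf = log(N / doc frequency)
            word: (count / total) * math.log(n_chats / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical toy data; real dumps would need proper tokenisation first.
chats = {
    "family": "ok see you at dinner ok dinner sunday".split(),
    "work": "ok the deploy failed ok rollback the deploy".split(),
    "friends": "ok see you at the pub".split(),
}

scores = tf_idf(chats)
for name, word_scores in scores.items():
    top = sorted(word_scores, key=word_scores.get, reverse=True)[:3]
    print(name, top)
```

Here "ok" shows up in every chat, so its score is zero and it drops out as noise, while a word like "deploy" that only appears in one chat floats to the top of that chat's list.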