| ▲ | doctor_blood 14 hours ago | |
Unfortunately there isn't much information on what texts they're actually training this on. How Anglocentric is the dataset? Does it include the Encyclopedia Britannica 9th Edition? What about the 11th? Are Greek and Latin classics in the data? What about German, French, Italian (etc.) periodicals, correspondence, and books? Given this is coming out of Zurich I hope they're using everything, but for now I can only assume. Still, I'm extremely excited to see this project come to fruition! | ||
| ▲ | DGoettlich 6 hours ago | |
thanks. we'll be more precise in the future. ultimately, we took whatever we could get our hands on, including newspapers, periodicals, and books. it's multilingual (including italian, french, spanish etc.), though the majority is english. | ||