The model behind this demo was trained with Word2Vec, using Gensim, on text from Portuguese news websites.
The model was built from a total of 248,537,667 tokens in 20,683,339 sentences, with a vocabulary of 345,107 unique words that each occur at least 5 times.
Parameters used for training:
- Training model: Skip-Gram
- Negative sampling: 15
- Number of training threads: 4
- Number of training iterations: 10
- Min word frequency: 5
- Vector size: 300
- Max skip length: 5
- Threshold for occurrence of words: 1e-05
- Starting learning rate: 0.025
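
For anyone wanting to reproduce a comparable setup, the settings above can be sketched as Gensim keyword arguments. This is a minimal sketch, not the original training script: it assumes Gensim 4.x argument names (3.x used `size` and `iter` instead of `vector_size` and `epochs`), and the names `W2V_PARAMS` and `train` are hypothetical helpers introduced here for illustration.

```python
# Assumed mapping of the listed settings onto Gensim 4.x Word2Vec arguments.
W2V_PARAMS = {
    "sg": 1,             # Training model: Skip-Gram (0 would select CBOW)
    "negative": 15,      # Negative sampling: 15 noise words per positive example
    "workers": 4,        # Number of training threads
    "epochs": 10,        # Number of training iterations over the corpus
    "min_count": 5,      # Min word frequency
    "vector_size": 300,  # Vector size
    "window": 5,         # Max skip length (context window)
    "sample": 1e-05,     # Threshold for downsampling frequent words
    "alpha": 0.025,      # Starting learning rate
}


def train(sentences):
    """Train a model on an iterable of tokenized sentences (lists of str).

    Hypothetical helper: the corpus loading and tokenization used for the
    original model are not described in this document.
    """
    # Imported lazily so the parameter mapping can be inspected without Gensim.
    from gensim.models import Word2Vec  # requires gensim >= 4.0

    return Word2Vec(sentences=sentences, **W2V_PARAMS)
```

With a tokenized corpus in hand, `model = train(corpus)` would return a trained model, and `model.wv.most_similar(word)` could then be used to query nearest neighbours for a word in the vocabulary.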