0. Foreword
classtransformers.BertTokenize
offical document: https://huggingface.co/transformers/v3.0.2/model_doc/bert.html
tokenize-demo:
1 2 3 4 5 6 7 8 9 10 11 12
| from transformers import BertTokenizer
def test_tokenize(raw_str: str): tokens = tokenizer.tokenize(raw_str) print('raw_str: {}, tokenized: {}'.format(raw_str, tokens))
if __name__ == '__main__': tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') test_tokenize('United States') test_tokenize('21,439,453')
|
1 2
| raw_str: United States, tokenized: ['united', 'states'] raw_str: 21,439,453, tokenized: ['21', ',', '43', '##9', ',', '45', '##3']
|
1.
1 2 3 4
| def tokenize(self, tokenizer: BertTokenizer):
|