Machine Learning
Model
monsoon-nlp/bert-base-thai is a Natural Language Processing (NLP) model available through the Transformers library in Python; we use it as the pre-trained base for our sentiment analysis model.
BERT-th provides a Thai-only pre-trained model based on the BERT-Base architecture.
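As a quick illustration, the checkpoint can be loaded with the Transformers library as below. This is a minimal sketch: the Thai example sentence is ours, and it assumes the tokenizer files shipped with the checkpoint handle Thai segmentation.

```python
# Minimal sketch: load the pre-trained checkpoint named above.
# The example sentence is made up for illustration.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/bert-base-thai")
model = AutoModel.from_pretrained("monsoon-nlp/bert-base-thai")

# Encode one sentence and run it through the encoder.
inputs = tokenizer("ร้านนี้อร่อยมาก", return_tensors="pt")  # "this restaurant is very delicious"
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768) for BERT-Base
```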
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT (Bidirectional Encoder Representations from Transformers) is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
Reference: https://arxiv.org/pdf/1810.04805.pdf
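In Transformers, the "one additional output layer" mentioned above corresponds to a classification head placed on top of the encoder. Below is a hedged sketch for a sentiment task; num_labels=2 (positive/negative) is our assumption, not something fixed by the paper.

```python
# Sketch: the pre-trained encoder plus one new output layer for
# classification. num_labels=2 (positive/negative) is an assumption.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "monsoon-nlp/bert-base-thai",
    num_labels=2,
)
# Only the classification head starts from random weights; the encoder
# keeps its pre-trained weights, and both are updated during fine-tuning.
```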
Libraries we use in the model (a combined usage sketch follows the list)
Transformers - provides APIs and tools to easily download and train state-of-the-art pre-trained models
Datasets - a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks
NumPy - a Python library for working with arrays; it also provides functions for linear algebra, Fourier transforms, and matrices
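A small sketch of how the three libraries fit together; the two example rows are invented for illustration.

```python
# Sketch: Datasets stores the examples, Transformers tokenizes them,
# NumPy inspects the resulting arrays. Example rows are made up.
import numpy as np
from datasets import Dataset
from transformers import AutoTokenizer

data = Dataset.from_dict({
    "text": ["อร่อยมาก", "แย่ที่สุด"],  # "very delicious", "the worst"
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/bert-base-thai")
encoded = data.map(lambda row: tokenizer(row["text"], truncation=True))

lengths = np.array([len(ids) for ids in encoded["input_ids"]])
print(lengths.mean())  # average token count per example
```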
Steps to preprocess the data (a combined sketch follows the list)
Cleaning data
Removing HTML tags, punctuation, and emoji
Tokenization
Mapping words to integer IDs
Splitting into train and test sets
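The sketch below chains the three steps. The cleaning rules, the example rows, and the 80/20 split ratio are our choices for illustration, not values fixed by this page.

```python
# Combined sketch of the steps above: clean -> tokenize -> split.
import re
import string
from datasets import Dataset
from transformers import AutoTokenizer

def clean(text):
    text = re.sub(r"<[^>]+>", "", text)  # remove HTML tags
    text = text.translate(str.maketrans("", "", string.punctuation))  # ASCII punctuation
    text = re.sub(r"[\U0001F300-\U0001FAFF]", "", text)  # one common emoji block
    return text.strip()

dataset = Dataset.from_dict({
    "text": ["<b>อร่อยมาก!</b> 😋", "บริการแย่มาก..."],  # made-up reviews
    "label": [1, 0],
})

# Cleaning data: remove HTML, punctuation, emoji.
dataset = dataset.map(lambda row: {"text": clean(row["text"])})

# Tokenization: map words to integer IDs.
tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/bert-base-thai")
dataset = dataset.map(lambda row: tokenizer(row["text"], truncation=True))

# Train and test: hold out 20% of the rows for evaluation.
splits = dataset.train_test_split(test_size=0.2)
print(splits["train"].num_rows, splits["test"].num_rows)
```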
Source code (Colab notebook): https://colab.research.google.com/drive/1p7GkPV8z_X71NpnnLV8kvp8hh-zi4z9z?usp=sharing