The Difference between BiLSTM and Transformer - The Discussion about Suitable Scenarios for BiLSTM and Transformer
DOI: https://doi.org/10.61173/xpnbnx35
Keywords:
Skeleton-based motion recognition, Human pose estimation, BiLSTM, Transformer encoder, Real-time inference
Abstract
Action recognition is becoming increasingly important with the development of self-driving cars and smart homes. Many people hope that self-driving cars will make transportation safer, and action recognition can make homes smarter and daily life more convenient. Both technologies share a common requirement: reliable action recognition. This paper studies the differences between Bidirectional Long Short-Term Memory (BiLSTM) networks and Transformers in order to identify the scenarios each is best suited to, with motion recognition as the main task. The results show that one training epoch takes 178.41 seconds for BiLSTM and 1657.75 seconds for the Transformer, and that BiLSTM reaches a comparable degree of fit after 19 training epochs, whereas the Transformer needs 50. Therefore, BiLSTM is suitable for low-power devices and for scenarios that demand real-time recognition, while the Transformer is suitable for devices without strict power constraints and for scenarios that demand high recognition accuracy. Future work will add more motions and the transitions between them, and will improve the code to make the results more accurate and training more efficient.
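As a rough sanity check on the reported figures, the following sketch multiplies each model's measured epoch duration by the number of epochs it needed to reach a comparable fit. It assumes that every epoch costs the same as the single measured epoch and that the "19" and "50" counts refer to epochs; the `TrainingRun` class and `compare` function are hypothetical helpers for illustration, not part of the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical record of one model's training cost (illustrative only)."""
    name: str
    seconds_per_epoch: float  # measured duration of one training epoch
    epochs_to_fit: int        # epochs needed to reach the shared degree of fit

    def total_seconds(self) -> float:
        # ASSUMPTION: per-epoch cost stays constant for the whole run.
        return self.seconds_per_epoch * self.epochs_to_fit


def compare(fast: TrainingRun, slow: TrainingRun) -> dict:
    """Return per-epoch and total-time ratios of `slow` relative to `fast`."""
    return {
        "per_epoch_ratio": slow.seconds_per_epoch / fast.seconds_per_epoch,
        "total_ratio": slow.total_seconds() / fast.total_seconds(),
    }


# Figures taken from the abstract; the epoch interpretation is an assumption.
bilstm = TrainingRun("BiLSTM", 178.41, 19)
transformer = TrainingRun("Transformer", 1657.75, 50)

for run in (bilstm, transformer):
    total = run.total_seconds()
    print(f"{run.name}: {total:.2f} s (~{total / 3600:.2f} h)")

ratios = compare(bilstm, transformer)
print(f"per-epoch ratio: {ratios['per_epoch_ratio']:.2f}")
print(f"total-time ratio: {ratios['total_ratio']:.2f}")
```

Under these assumptions, BiLSTM needs about 3389.79 s (roughly 0.94 h) and the Transformer about 82887.50 s (roughly 23.02 h). The Transformer is therefore about 9.29 times slower per epoch and about 24.45 times slower in total to reach a similar fit, which is the kind of gap that motivates recommending BiLSTM for low-power, real-time settings.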