
Language-Independent Text Tokenization Using Unsupervised Deep Learning

Abstract

Language-independent text tokenization can aid in the classification of low-resource languages. There is a global research effort to produce text classification for any language. Human text classification is a slow procedure; consequently, generating text summaries for different languages using machine text classification has received attention in recent years. There is no research on machine text classification for many languages, such as Czech, Rome, and Urdu. This research proposes a cross-language text tokenization model using a Transformer technique. The proposed Transformer employs an encoder of ten layers, each with a self-attention sublayer and a feedforward sublayer. This model improves the efficiency of text classification by providing a draft classification for a number of documents. We also propose a novel sub-word tokenization model that exploits frequent vocabulary usage in the documents. The Sub-Word Byte-Pair Tokenization technique (SBPT) utilizes the sharing of one sentence's vocabulary with other sentences. The sub-word tokenization model improves on other sub-word tokenization models, such as the byte-pair encoding model, by +10% in precision.
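The record gives only the abstract, so the SBPT implementation itself is not shown here. As a rough, self-contained illustration of the general byte-pair sub-word idea the abstract builds on, the following Python sketch learns merge rules from a toy corpus and then segments a word with them; all names here (learn_bpe_merges, segment_word, the toy corpus, num_merges) are hypothetical and not taken from the paper.

```python
# Minimal byte-pair-encoding (BPE) sub-word sketch -- an illustration of the
# general technique only, not the paper's SBPT implementation.
from collections import Counter

def learn_bpe_merges(corpus, num_merges=10):
    """Learn merge rules from a whitespace-tokenized corpus."""
    # Start with each word represented as a tuple of characters.
    vocab = Counter(tuple(word) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

def segment_word(word, merges):
    """Segment a word into sub-word units by replaying the learned merges."""
    symbols = list(word)
    for a, b in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

corpus = ["low lower lowest", "new newer newest"]
merges = learn_bpe_merges(corpus, num_merges=6)
print(segment_word("lowest", merges))  # sub-word segmentation of "lowest"
```

Counting pairs over the whole vocabulary, rather than within a single sentence, lets sub-words learned from one sentence be reused when segmenting others, which loosely mirrors the vocabulary-sharing idea the abstract attributes to SBPT.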
