Domain-specific stopword removal from unstructured computer text using a neural network
Abstract
Methods and apparatuses are described for analyzing unstructured computer text for domain-specific stopword identification and removal. A computer data store stores unstructured text. A server computing device splits the unstructured text into phrases and generates tokens from the phrases. The server computing device generates a set of bootstrap keywords using the tokens. An artificial intelligence neural network executing on the server computing device generates a stopword training model. The server computing device generates a first set of candidate stopwords using the bootstrap keywords and the stopword training model. The server computing device generates regular expressions using the bootstrap keywords, and generates a second set of candidate stopwords using the regular expressions. The server computing device stores the candidate stopwords in the data store, and removes stopwords from the unstructured text using the data store.
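The non-neural steps in the abstract (phrase splitting, tokenization, bootstrap keywords, regular-expression candidates, and removal) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the document-frequency heuristic used to pick bootstrap keywords, the `min_doc_ratio` parameter, and the suffix pattern used to build regular expressions are all hypothetical. The neural-network step that produces the first candidate set is deliberately left out, because the abstract does not describe the model.

```python
import re
from collections import Counter


def split_phrases(text):
    """Split raw text into phrases on sentence punctuation, semicolons, and newlines."""
    return [p.strip() for p in re.split(r"[.!?;\n]+", text) if p.strip()]


def tokenize(phrase):
    """Lowercase a phrase and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", phrase.lower())


def bootstrap_keywords(documents, min_doc_ratio=0.8):
    """Assumed heuristic: a token found in at least min_doc_ratio of the
    documents carries little signal within the domain."""
    doc_freq = Counter()
    for doc in documents:
        seen = set()
        for phrase in split_phrases(doc):
            seen.update(tokenize(phrase))
        doc_freq.update(seen)  # count each token once per document
    threshold = min_doc_ratio * len(documents)
    return {tok for tok, n in doc_freq.items() if n >= threshold}


def regex_candidates(documents, keywords):
    """Build one regular expression per keyword (assumed pattern: the keyword
    plus an optional common suffix) and collect every match in the corpus."""
    found = set()
    for kw in keywords:
        pattern = re.compile(
            r"\b" + re.escape(kw) + r"(?:s|es|ed|ing)?\b", re.IGNORECASE
        )
        for doc in documents:
            found.update(m.group(0).lower() for m in pattern.finditer(doc))
    return found


def remove_stopwords(text, stopwords):
    """Drop every token that appears in the stored stopword set."""
    kept = []
    for phrase in split_phrases(text):
        kept.extend(t for t in tokenize(phrase) if t not in stopwords)
    return " ".join(kept)


if __name__ == "__main__":
    docs = [
        "Customer called about account balance. Customer wants a transfer.",
        "Customer reset password; accounts locked on this account.",
        "The customers asked about a fee. Customer account closed.",
    ]
    keywords = bootstrap_keywords(docs)
    stopwords = keywords | regex_candidates(docs, keywords)
    print(sorted(stopwords))
    print(remove_stopwords("Customers asked about the account fee.", stopwords))
```

In this toy corpus, "customer" and "account" appear in every document, so they become bootstrap keywords, and the generated patterns also pick up the variants "customers" and "accounts". In a fuller system, the set produced here would be merged with the candidates proposed by the trained model before being written to the data store.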