Paper Information

Journal: LIBRARY AND INFORMATION SCIENCE, WINTER 2010, Volume 12, Number 4 (48); Page(s) 9 to 36.
 
Paper: 

STOPWORDS LIST CONSTRUCTION FOR AUTOMATIC INDEXING OF PERSIAN TEXTS

 
 
Author(s):  SANJI M., DAVARPANAH M.R.
 
Abstract: 

The aim of this study was to identify non-conceptual words (stop words) in the Persian language and to develop a list of these words for automatic indexing of Persian texts in the fields of psychology, educational sciences, and library and information science. The research used the content analysis method. The research population consisted of the articles in the latest issues of the scientific journals of psychology, education, and library and information science published in 1385 (Solar Hijri calendar, 2006-2007).
The findings showed that: 1- Copula and auxiliary verbs, adverbs, pronouns, particles (prepositions, conjunctions and interjections), sounds, numbers and punctuation marks are among the non-conceptual or stop words in the Persian language. 2- Excluding punctuation marks, 39.96 percent of educational sciences texts, 38.57 percent of psychology texts and 38.12 percent of library and information science texts consist of non-conceptual words. 3- The high-frequency stop words in these fields are approximately the same. 4- 38.94 percent of all words in the analyzed texts are stop words. 5- Comparing the Persian list with Fox's English stop word list showed a 28.5% overlap between the two lists. The results of this study show that about 40% of the words in Persian texts can be ignored in text analysis and automatic indexing.
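As a rough illustration of the two quantities the abstract reports (the share of stop words in a text and the terms left for indexing once they are removed), a minimal Python sketch follows. It is not the authors' procedure: the function names, the whitespace tokenization, the tiny stop-word set and the sample sentence are all hypothetical and chosen only for demonstration.

```python
from collections import Counter


def stopword_share(tokens, stopwords):
    """Return the fraction of tokens that appear in the stop-word set."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in stopwords)
    return hits / len(tokens)


def index_terms(tokens, stopwords):
    """Count the tokens that remain after stop-word removal."""
    return Counter(t for t in tokens if t not in stopwords)


# Hypothetical toy data, not taken from the study.
stopwords = {"و", "در", "به", "از", "که", "این", "است"}
sentence = "این پژوهش به شناسایی واژگان غیرمفهومی در متون فارسی پرداخته است"
tokens = sentence.split()  # naive whitespace tokenization (an assumption)

share = stopword_share(tokens, stopwords)
terms = index_terms(tokens, stopwords)

print(f"stop-word share: {share:.2%}")
print("candidate index terms:", list(terms))
```

On a real corpus the stop-word set would come from a compiled list such as the one the study describes, and Persian tokenization would need more care than a whitespace split.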

 
Keyword(s): PERSIAN STOP WORDS, NON-CONCEPTUAL WORDS, AUTOMATIC INDEXING, PERSIAN LANGUAGE PROCESSING
 