This paper reports a computer-based statistical study of the characters of printed Hindi text, undertaken as a step toward a comprehensive database for the Hindi language. Such a database is needed by the research community for speech recognition and synthesis, system development, and performance evaluation. Frequency-of-occurrence information is needed to choose isolated words and sentences. The corpus for this pilot study contains about 51,000 printed characters. Hindi characters were represented by a code of English letters typed on an English computer keyboard, and a computer algorithm was developed to find the relative occurrence of individual characters, their neighbours, and the most frequent words.
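The abstract does not give the details of the counting algorithm or of the English-letter code, so the following is only a minimal sketch of the three statistics it names. It assumes that each Hindi character maps to exactly one symbol in the encoded text, that words are separated by whitespace, and that "neighbours" means adjacent character pairs within a word. The function name `character_statistics` and its parameters are hypothetical.

```python
from collections import Counter


def character_statistics(text, top_n=5):
    """Return relative character frequencies, adjacent-pair counts,
    and the top_n most frequent words of an encoded text.

    Assumption: one symbol per Hindi character, whitespace between words.
    """
    words = text.split()

    # Relative occurrence of each character, ignoring whitespace.
    chars = [ch for ch in text if not ch.isspace()]
    total = len(chars)
    char_counts = Counter(chars)
    relative = {ch: n / total for ch, n in char_counts.items()} if total else {}

    # Neighbours: count each (character, following character) pair
    # inside a word; pairs are not counted across word boundaries.
    pair_counts = Counter()
    for word in words:
        pair_counts.update(zip(word, word[1:]))

    # Most frequent words.
    top_words = Counter(words).most_common(top_n)

    return relative, pair_counts, top_words
```

Calling `character_statistics(encoded_text)` on the encoded corpus would return the three tables together. A real encoding may use more than one English letter per Hindi character, in which case the text would first have to be split into character codes before counting.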