The proliferation of digital libraries and the wide availability of electronic documents on the Internet have created new challenges for computer science researchers and professionals. This paper discusses the problems of using parallel and cluster computing systems to detect plagiarism in large collections of semi-structured electronic texts, ranging from software written in formal languages at one end of the spectrum to natural language texts at the other. The core of the system is built on string matching algorithms and suffix trees. Implementation and performance issues are also discussed.