In the document retrieval method, the key is built on the database search. However, when the a-mount of data to be retrieved becomes very large, using this search method, a large number of retrieval operations will be concentrated on a single host, which can result in reduced efficiency of retrieval. Under this background, a distributed system can be used to solve the problem. Retrieving resources in a distributed system can be based on MapReduce architecture to achieve retrieval. Thus, the pressure of retrieval operation will be allocated to each node in a distributed system, which can effectively reduce the pressure of the machine and greatly improve the retrieval efficiency. Using the traditional way, retrieving 1 million data consumes 500 seconds, while using the method based on MapReduce architecture for distributed systems to retrieve one million data only needs 40 seconds. Com-pared with traditional search method, method of distributed systems based on MapReduce architecture can promote efficiency to 12. 5 times.%在文件检索的方法中,目前主要是基于数据库进行检索。但是,当待检索的数据量变得非常大的时候,再使用这种检索方式,大量的检索操作就会集中在一台主机上进行,这会导致检索效率降低。基于这种情况,拟采用分布式系统来解决这个问题。在分布式系统中进行资源检索时,可以基于MapReduce架构来实现检索,这样,检索操作的压力将分散到分布式系统的各个节点中,这样可以有效降低机器的压力,大大提高检索的效率。采用传统方式检索100万条数据,需要耗时500 s,而采用基于MapReduce架构的分布式系统的方法来检索100万的数据,只需要花费40 s,相对于传统检索方法采用基于MapReduce架构的分布式系统检索可使检索效率提升接近12.5倍。
展开▼