The page is the basic unit of data exchange between disk and memory, and it plays an important role in operating systems, database management systems, and the data organization of inverted files. To reduce the disk I/O overhead of reading and writing an inverted index, this paper proposes a method for storing the inverted file in pages, so that the file is read and written one page at a time. The method has three parts: the disk I/O layer, the page manager, and the heap file manager. Together they implement block-based data file management with a variable page size, support assembling fixed-length and variable-length records within a page, and support storing over-long records across pages. Experimental results show that the method is effective and can be applied in a practical vertical search engine.
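The abstract does not give the on-disk layout, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a conventional slotted-page layout (a small header, a slot directory growing forward, record bytes growing backward from the end of the page) and a page size of at most 65535 bytes. The names `PagedFile`, `SlottedPage`, `PAGE_SIZE`, and the header and slot formats are all hypothetical. Cross-page storage of over-long records is not shown; here an insert that does not fit simply returns `None`.

```python
import io
import struct

PAGE_SIZE = 4096                 # assumed default; the method allows other page sizes
HEADER = struct.Struct("<HH")    # assumed header: slot count, offset where record data starts
SLOT = struct.Struct("<HH")      # assumed slot entry: record offset, record length


class PagedFile:
    """Disk I/O layer sketch: reads and writes a seekable stream one whole page at a time."""

    def __init__(self, stream, page_size=PAGE_SIZE):
        self.stream = stream
        self.page_size = page_size

    def write_page(self, page_no, data):
        if len(data) != self.page_size:
            raise ValueError("a page must be exactly page_size bytes")
        self.stream.seek(page_no * self.page_size)
        self.stream.write(data)

    def read_page(self, page_no):
        self.stream.seek(page_no * self.page_size)
        data = self.stream.read(self.page_size)
        if len(data) != self.page_size:
            raise EOFError("page %d is not fully present" % page_no)
        return data


class SlottedPage:
    """Page manager sketch: packs variable-length records into one fixed-size page."""

    def __init__(self, page_size=PAGE_SIZE, data=None):
        self.page_size = page_size
        if data is None:
            self.buf = bytearray(page_size)
            # Empty page: zero slots, record area starts at the end of the page.
            HEADER.pack_into(self.buf, 0, 0, page_size)
        else:
            self.buf = bytearray(data)

    def free_space(self):
        count, data_start = HEADER.unpack_from(self.buf, 0)
        return data_start - (HEADER.size + count * SLOT.size)

    def insert(self, record):
        """Store a record and return its slot number, or None if it does not fit."""
        count, data_start = HEADER.unpack_from(self.buf, 0)
        if len(record) + SLOT.size > self.free_space():
            return None
        offset = data_start - len(record)
        self.buf[offset:data_start] = record
        SLOT.pack_into(self.buf, HEADER.size + count * SLOT.size, offset, len(record))
        HEADER.pack_into(self.buf, 0, count + 1, offset)
        return count

    def get(self, slot_no):
        count, _ = HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot_no < count:
            raise IndexError("no such slot")
        offset, length = SLOT.unpack_from(self.buf, HEADER.size + slot_no * SLOT.size)
        return bytes(self.buf[offset:offset + length])


# Usage: fill a page in memory, write it as one unit, then read it back as one unit.
page = SlottedPage()
term_slot = page.insert(b"term:apple")
postings_slot = page.insert(b"postings:1,5,9,42")
paged = PagedFile(io.BytesIO())
paged.write_page(0, bytes(page.buf))
restored = SlottedPage(data=paged.read_page(0))
```

Fixed-length records fit the same layout as a special case in which every slot has the same length. A heap file manager would sit on top of these two classes and decide which page receives each new record, for example by tracking `free_space()` per page.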