First, when managing large amounts of information, the documents of the corpora must be transformed into a common format. This makes it possible to query all documents with a single query, and it avoids problems with performance and result management. Furthermore, the technologies currently used to build IRSs are not prepared to satisfy the requirements of corpus users, so new add-ons that take those requirements into account need to be developed in the near future. There have been some timid attempts to include basic linguistic operations based on localization (sensitivity to accents and umlauts, theme searches, etc.), but it is time to incorporate syntactic techniques into commercial systems to enable the building of more versatile corpus-based IRSs.
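As an illustration only, the idea of a common format queried by a single, accent-insensitive query might be sketched as follows. The `Record` type, the two converters, and the `search` function are hypothetical names introduced for this sketch; they do not describe any particular IRS, and the tag stripping is deliberately crude.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical common format: every source document becomes one Record."""
    doc_id: str
    language: str
    text: str


def fold(text: str) -> str:
    """Strip combining marks and case so that 'Müller' matches 'muller'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def from_plain(doc_id: str, language: str, raw: str) -> Record:
    """Convert a plain-text document: only whitespace is normalised."""
    return Record(doc_id, language, " ".join(raw.split()))


def from_tagged(doc_id: str, language: str, raw: str) -> Record:
    """Convert a tagged document: tags are removed (assumed simple markup)."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)
    return Record(doc_id, language, " ".join(without_tags.split()))


def search(records: list[Record], query: str) -> list[str]:
    """Run one query over all records, ignoring accents, umlauts and case."""
    needle = fold(query)
    return [r.doc_id for r in records if needle in fold(r.text)]


if __name__ == "__main__":
    corpus = [
        from_plain("a", "es", "La  canción es nueva"),
        from_tagged("b", "de", "<p>Über <b>Müller</b></p>"),
    ]
    print(search(corpus, "cancion"))
    print(search(corpus, "muller"))
```

Because both source formats are reduced to the same `Record` shape before indexing, the search code needs no per-format branches; this is the benefit the paragraph attributes to a common format. A real system would replace the substring scan with an index.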