
Mathematics and Scientific Markup

Abstract

The development of e-Science (cyberScience, Grid, etc.) is becoming a reality, with formalised data resources, services on demand, domain-specific search engines, digital repositories, etc. Increasingly, STM information will be contained in compound XML documents representing scientific communication (articles, theses, repository entries, etc.). In physical sciences such as chemistry, materials science, engineering, physics and earth sciences, these "datuments" [1] normally contain hypertext, graphics, tables, graphs, numerical data, and mathematical objects and relationships. They may also contain domain-specific content such as chemical formulae and reactions; thermodynamic and mechanical properties; and electric, magnetic and optical properties. Among the domain-specific languages, CML (Chemical Markup Language) is the oldest and broadest, and is now being actively used for publishing by the Royal Society of Chemistry (Project Prospect [2]), which gives an idea of what chemistry in datuments can look like. CML has had to develop the domain-specific objects (molecules, atoms, bonds, spectra, crystallography, etc.) and the relationships between them. However, due to the text-based nature of early XML, it has also had to design and implement domain-independent infrastructure which can support much of physical science. Originally called STMML [3], this infrastructure supports data types (float, integer, complex, etc.), data structures (arrays, lists, matrices, etc.), geometrical concepts (points, planes, lines, etc.) and scientific units of measurement. In addition, CML bases much of its flexibility on user-created dictionaries (ontologies) which are hyperlinked from objects in the datuments. It is now clear that the domain-independent parts of CML (and by extension some other markup languages in physical science) are loosely isomorphic with approaches in MathML and OMDOC.
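As a rough illustration of the domain-independent layer described above (typed values, units of measurement, and hyperlinks into user-created dictionaries), the following minimal Python sketch checks a scalar against a dictionary entry. It is not CML or STMML syntax: the names `Scalar`, `DICTIONARY`, `UNITS`, `validate` and the `ex:` identifiers are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical user-created dictionary: each entry states the expected
# data type and the kind of unit a referencing value should carry.
DICTIONARY = {
    "ex:meltingPoint": {"title": "melting point", "data_type": float, "unit_type": "temperature"},
    "ex:atomCount": {"title": "number of atoms", "data_type": int, "unit_type": "dimensionless"},
}

# Hypothetical units table mapping a unit symbol to its unit type.
UNITS = {"K": "temperature", "degC": "temperature", "none": "dimensionless"}


@dataclass(frozen=True)
class Scalar:
    """A single value that points at a dictionary entry and names its units."""
    dict_ref: str
    value: object
    units: str


def validate(scalar):
    """Return a list of problems; an empty list means the scalar matches its entry."""
    entry = DICTIONARY.get(scalar.dict_ref)
    if entry is None:
        return [f"unknown dictionary entry: {scalar.dict_ref}"]
    problems = []
    if not isinstance(scalar.value, entry["data_type"]):
        problems.append(f"expected {entry['data_type'].__name__} for {entry['title']}")
    if UNITS.get(scalar.units) != entry["unit_type"]:
        problems.append(f"units {scalar.units!r} are not of type {entry['unit_type']}")
    return problems


print(validate(Scalar("ex:meltingPoint", 273.15, "K")))     # consistent value
print(validate(Scalar("ex:meltingPoint", 273.15, "none")))  # wrong unit type
```

The point of the sketch is only that the meaning of a value lives in the dictionary entry it links to, so the same check works for any domain without hard-coding chemistry.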
If a synthesis can be found, then CML can happily forget about the "non-chemistry", knowing that the mathematical and physical science community has a general way forward. In easiest-first order, the following are suggested: (1) Mathematical variables and equations in chemical documents. An obvious challenge is that the variables represent types, often physical quantities (but also chemical objects such as atomTypes). This would be one of the first areas to explore with publishers. (2) Graphs and tables. A high proportion of graphs plot one or more dependent variables against one or more independent variables, currently supported by >. (3) Dictionaries. The CML dictionaries and OMDOC content dictionaries seem fairly similar in approach. (4) Mathematical relationships. A large area of physical science is based on theoretically and experimentally validated relationships which have been proved over many years (e.g. Maxwell's equations in thermodynamics). Often a quantity can be most easily determined by measuring different quantities and transforming them. However, most transformations are currently hidden in procedural, non-portable code, and it would be an exciting challenge to create a self-consistent declarative model of parts of physical science. It would be very exciting to have a discovery engine which could, on demand, decide which quantities were deducible from which (with similarity to theorem proving). A major challenge for distributed mathematics and science is discovery through search engines. These currently work on "free text" and are optimised to recognise strings. In a few cases domain-specific canonicalisations can be used (e.g. our Google InChI [4] transforms a molecular graph into a string which is recognised by search engines). However, most cases require mathematical operations (arithmetic, transformations, subgraph-matching, etc.). How - and where - can these be performed?
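The "discovery engine" idea in item (4) can be sketched, under strong simplifying assumptions, as forward chaining over declaratively stated relationships: each relationship lists the quantities it needs and the quantity it yields, and the engine repeatedly applies relationships until nothing new can be derived. The `Relation` class, the `deducible` function and the example relationships below are hypothetical illustrations, not part of CML, MathML or OMDOC, and the sketch only decides which quantities are reachable; it does not compute their values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A declarative relationship: `output` is computable once all `inputs` are known."""
    name: str
    inputs: frozenset
    output: str


def deducible(known, relations):
    """Return every quantity reachable from `known` by repeatedly applying relations."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            # Apply a relation only if it yields something new and all inputs are available.
            if rel.output not in derived and rel.inputs <= derived:
                derived.add(rel.output)
                changed = True
    return derived


# Hypothetical relationships, stated as data rather than hidden in procedural code.
RELATIONS = [
    Relation("density = mass / volume", frozenset({"mass", "volume"}), "density"),
    Relation("amount = mass / molar_mass", frozenset({"mass", "molar_mass"}), "amount"),
    Relation("concentration = amount / volume", frozenset({"amount", "volume"}), "concentration"),
    Relation("ideal gas law", frozenset({"amount", "volume", "temperature"}), "pressure"),
]

measured = {"mass", "volume", "molar_mass"}
print(sorted(deducible(measured, RELATIONS)))
# → ['amount', 'concentration', 'density', 'mass', 'molar_mass', 'volume']
```

Here `pressure` is not deducible because `temperature` was never measured; adding it to `measured` makes the ideal gas relationship applicable. A realistic engine would also need units, invertible relationships and validity ranges, which this sketch leaves out.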
A new generation of domain-independent and domain-specific indexing and searching tools needs to be developed. Recently CML has had to evolve a grammar to support fuzzy c
