Graph attention networks stack self-attention layers to compute neighbor-specific weights. Due to inherent noise and artificially correlated dimensions, the attention scores fail to produce optimal linear combinations for aggregating features from the neighborhood. Multiple attention heads mitigate the problem to an extent, but at the cost of additional memory overhead and larger variance in results. In this work, we introduce a novel approach that computes attention scores using a low-rank approximation of the intended neighborhood. The sub-space feature representation of the neighborhood discards the adverse effects of noise and artificially correlated dimensions. Extensive experiments on graph datasets show that the proposed framework outperforms state-of-the-art methods. The reduced variance in our results, assessed with the Kruskal-Wallis test, also indicates that the proposed model gives stable results compared to other state-of-the-art methods.
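As a rough illustration of the idea, and not the paper's exact formulation, the following minimal NumPy sketch projects the neighborhood feature matrix onto its top singular directions, computes GAT-style additive attention logits on those sub-space features, and then aggregates the original neighbor features with the resulting weights. The function name, the additive attention form, the LeakyReLU slope, and the choice of rank are all assumptions made for illustration.

```python
import numpy as np

def low_rank_attention_aggregate(h_i, H_nbrs, a, rank=4):
    """Aggregate neighbor features for a node using attention scores
    computed in a low-rank subspace of the neighborhood feature matrix.

    h_i    : (d,)       features of the center node
    H_nbrs : (k, d)     features of its k neighbors
    a      : (2*rank,)  attention parameter vector (hypothetical form)
    rank   : target rank of the neighborhood approximation
    """
    # Low-rank approximation of the neighborhood via truncated SVD:
    # keep only the top-`rank` right singular directions, discarding
    # noisy / artificially correlated dimensions.
    U, S, Vt = np.linalg.svd(H_nbrs, full_matrices=False)
    r = min(rank, len(S))
    basis = Vt[:r]                       # (r, d) subspace basis

    z_i = basis @ h_i                    # (r,)   center node in the subspace
    Z_nbrs = H_nbrs @ basis.T            # (k, r) neighbors in the subspace

    # GAT-style additive attention logits, computed on subspace features.
    logits = np.concatenate(
        [np.tile(z_i, (Z_nbrs.shape[0], 1)), Z_nbrs], axis=1) @ a
    logits = np.where(logits > 0, logits, 0.2 * logits)   # LeakyReLU
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                 # softmax over the neighborhood

    # Aggregate the full-dimensional neighbor features with these weights.
    return alpha @ H_nbrs                # (d,)
```

In this sketch the attention weights depend only on the denoised sub-space coordinates, while aggregation still uses the original neighbor features; whether the paper also aggregates in the sub-space is not specified here.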