GT-4S: Graph Transformer for Scene Sketch Semantic Segmentation

doi:10.13328/j.cnki.jos.007155

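CLC number: TP391
Funding: National Natural Science Foundation of China (62272447); Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (L222008)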

Abstract:

A scene sketch consists of multiple foreground and background objects and can express complex semantic information intuitively and concisely. Scene sketches have a wide range of practical applications and have gradually become a research hotspot in computer vision and human-computer interaction. Although semantic segmentation is a fundamental task in scene sketch understanding, it has received relatively little study. Most existing methods adapt semantic segmentation methods designed for natural images and therefore cannot cope with the sparsity and abstraction inherent in sketches. To address these problems, this study works directly on sketch strokes and proposes a graph Transformer model that combines the temporal and spatial information of strokes to perform semantic segmentation of free-hand scene sketches. First, the vector scene sketch is converted into a graph in which strokes are the nodes and the temporal and spatial correlations between strokes are the edges. Next, an edge-enhanced Transformer module captures the global temporal-spatial context of the strokes. Finally, the encoded temporal-spatial features are optimized through multi-class classification learning. Experimental results on the SFSD scene sketch dataset show that the proposed method effectively segments scene sketches using stroke-level temporal-spatial information and achieves excellent performance.
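The graph-construction step described in the abstract (strokes as nodes, temporal and spatial correlations as edges) can be illustrated with a minimal sketch. The abstract does not say how the two kinds of correlation are defined, so the rules below are assumptions made only for illustration: temporal edges link consecutively drawn strokes, and spatial edges link strokes whose closest points lie within a hypothetical `spatial_threshold`. The names `build_stroke_graph` and `min_point_distance` are likewise illustrative and are not taken from the paper.

```python
from itertools import combinations
from math import hypot


def min_point_distance(stroke_a, stroke_b):
    """Smallest Euclidean distance between any point of stroke_a and any point of stroke_b."""
    return min(
        hypot(ax - bx, ay - by)
        for ax, ay in stroke_a
        for bx, by in stroke_b
    )


def build_stroke_graph(strokes, spatial_threshold=10.0):
    """Build a stroke graph from a vector sketch.

    strokes: list of strokes in drawing order; each stroke is a list of (x, y) points.
    Returns (temporal_edges, spatial_edges), each a set of (i, j) pairs with i < j.

    Assumed edge rules (not from the paper):
      - temporal edge: strokes i and i + 1 were drawn one after the other;
      - spatial edge: the closest points of two strokes are within spatial_threshold.
    """
    n = len(strokes)
    temporal_edges = {(i, i + 1) for i in range(n - 1)}
    spatial_edges = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if min_point_distance(strokes[i], strokes[j]) <= spatial_threshold
    }
    return temporal_edges, spatial_edges


# Toy sketch with four strokes, listed in drawing order.
sketch = [
    [(0.0, 0.0), (10.0, 0.0)],          # stroke 0
    [(10.0, 2.0), (10.0, 12.0)],        # stroke 1: starts close to the end of stroke 0
    [(100.0, 100.0), (110.0, 100.0)],   # stroke 2: far from everything else
    [(1.0, 1.0), (2.0, 5.0)],           # stroke 3: drawn last, but close to stroke 0
]

temporal_edges, spatial_edges = build_stroke_graph(sketch, spatial_threshold=5.0)
print(sorted(temporal_edges))
print(sorted(spatial_edges))
```

In this toy input, stroke 3 is connected to stroke 0 only through a spatial edge, which shows why the two edge types carry complementary information: strokes drawn far apart in time may still belong to the same object. How the actual model encodes these edges inside the edge-enhanced Transformer is not specified in the abstract.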

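Cite this article

Zhang Zhengming, Guo Yan, Ma Cuixia, Deng Xiaoming, Wang Hongan. GT-4S: Graph Transformer for Scene Sketch Semantic Segmentation. Journal of Software (软件学报), ,(): 1-15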

History

Received: 2023-08-11
Revised: 2023-10-21
Published online: 2024-05-08
