Unsupervised Structuralization Method of Merchandise Attributes in Chinese

doi:10.13328/j.cnki.jos.005018

Volume 28, Issue 2, 2017, pp. 262-277


Fund Project: National Program on Key Basic Research Project of China (973) (2012CB316203); National Natural Science Foundation of China (61332006, 61472321); Northwestern Polytechnical University Foundation for Fundamental Research (3102014JSJ0013, 3102014JSJ0005)


Abstract:

Extracting attribute names and values from textual product descriptions is important for many e-business applications, such as user demand forecasting, product comparison, and recommendation. Existing approaches first use supervised or semi-supervised classification techniques to extract attribute names and values, and then match them by analyzing their grammatical dependencies. However, these methods have the following limitations: (1) they require human intervention to label some attributes, values, and the matching relationships between them; (2) their matching accuracy may be greatly affected by language habits, semantic logic, and the quality of the corpus and candidate sets. To address these issues, this paper proposes an unsupervised approach for extracting and matching attribute names and values in Chinese textual merchandise descriptions. Using a search engine, it extracts the candidate set of attribute names for a given value by analyzing grammatical relations based on the principle of small probability events. A new algorithm for computing the matching probability between attribute names and values is also designed, based on relative conditional deselect probability and PageRank. The proposed approach can effectively extract attribute names and values from Chinese textual merchandise descriptions and match them without any human intervention, regardless of whether the attribute name appears in the textual description. Finally, the performance of the proposed approach is evaluated on the textual descriptions of four types of merchandise using the Baidu search engine. The experimental results show that the new approach for attribute name extraction improves recall by 20% compared with directly extracting attribute names from textual descriptions. Moreover, for non-quantitative attributes, the new approach achieves considerably higher matching accuracy (above 30% when measured by the percentage of rank-1 matches, and above 0.3 when measured by MRR) than existing techniques based on grammatical dependency analysis.
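
The two matching metrics named in the abstract, the percentage of rank-1 matches and the mean reciprocal rank (MRR), can be illustrated with a minimal Python sketch. It uses the standard definitions of these metrics and is not taken from the paper's evaluation code. The function names `reciprocal_rank` and `evaluate`, and the English attribute names in the sample data, are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], gold: Hashable) -> float:
    """Return 1/position of the gold attribute name in the ranked
    candidate list (1-based), or 0.0 if it does not appear at all."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def evaluate(rankings: List[Sequence[Hashable]],
             gold_names: List[Hashable]) -> Tuple[float, float]:
    """Return (rank-1 fraction, MRR) over all attribute values.

    rankings[i] is the list of candidate attribute names for value i,
    ordered from most to least probable; gold_names[i] is the correct name.
    """
    if len(rankings) != len(gold_names):
        raise ValueError("rankings and gold_names must have equal length")
    if not rankings:
        raise ValueError("at least one ranking is required")
    scores = [reciprocal_rank(r, g) for r, g in zip(rankings, gold_names)]
    rank1 = sum(1 for s in scores if s == 1.0) / len(scores)
    mrr = sum(scores) / len(scores)
    return rank1, mrr


# Hypothetical sample: three attribute values with ranked candidate names.
sample_rankings = [
    ["color", "material"],         # gold at rank 1 -> reciprocal rank 1.0
    ["size", "weight", "color"],   # gold at rank 2 -> reciprocal rank 0.5
    ["brand"],                     # gold missing   -> reciprocal rank 0.0
]
sample_gold = ["color", "weight", "style"]
rank1_fraction, mrr_score = evaluate(sample_rankings, sample_gold)
```

Under these definitions, a value whose correct attribute name is missing from the candidate list contributes 0 to both metrics, so both metrics also penalize failures in candidate extraction and not only ranking errors.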

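Citation:

Hou Boyi, Chen Qun, Yang Jingying, Li Zhanhuai. Unsupervised structuralization method of merchandise attributes in Chinese. Journal of Software (软件学报), 2017, 28(2): 262-277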


History

Received: August 15, 2015
Revised: December 2, 2015
Online: January 24, 2017
