Although the Takagi-Sugeno-Kang (TSK) fuzzy classifier has been widely applied in many important settings, how to improve its classification performance and enhance its interpretability remains a research hotspot. This study proposes a deep TSK fuzzy classifier, RCC-DTSK-C, which randomly selects and combines features and whose fuzzy rules are highly interpretable. Unlike other classifier constructions: (1) RCC-DTSK-C consists of many base-training units, each of which can be trained independently; according to the principle of stacked generalization, the input of the next base-training unit is built from the training set together with a random projection of the prediction results of the current base-training unit. (2) The hidden layer of each base-training unit is represented by concise, interpretable fuzzy rules, whose features are selected through random partitioning into a non-fixed number of fuzzy partitions and random rule combination. (3) The source data set is mapped into every independent base-training unit as the same input space, which effectively ensures that all features of the source data are preserved in every independent training unit. Extensive experimental results show that RCC-DTSK-C achieves both enhanced classification performance and concise, interpretable fuzzy rules.
As a highly important research branch of intelligent computing, fuzzy systems have been widely applied in many fields owing to their strong interpretability and learning ability [
The proposed fuzzy classifier RCC-DTSK-C resembles a hierarchical TSK fuzzy classifier, but differs from it in essence: it effectively avoids the major challenges faced by TSK fuzzy classifiers, as reported in detail later in this paper.
Deep learning has become a hot research topic and has achieved success in many fields [
This study exploits the stacked structure [
The highly interpretable TSK fuzzy classifier in this paper is constructed based on the following considerations.
(1) The number of fuzzy partitions is not fixed but generated completely at random. For example, three fuzzy partitions may be generated, with Gaussian membership function centers [0, 0.5, 1] and linguistic labels {poor, medium, good}; or five fuzzy partitions, with centers [0, 0.25, 0.5, 0.75, 1] and linguistic labels {very poor, poor, medium, good, very good};
(2) Part or most of the features of the source data set are selected at random;
(3) The fuzzy classifier in every base-training unit has the same input space;
(4) Since the output of a zero-order TSK fuzzy classifier is a constant, which makes the system easy to analyze and express, this study takes the zero-order TSK fuzzy system as the base training model and investigates a deep zero-order TSK fuzzy system modeling method with random fuzzy partitions and random rule combinations.
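Consideration (1) can be illustrated with a minimal sketch. The code below is not the paper's implementation; it only shows the idea of randomly choosing a partition count, spacing Gaussian centers evenly on [0, 1], and attaching linguistic labels. The label sets for 2 and 4 partitions and the fixed width `sigma` are illustrative assumptions; only the 3- and 5-partition label sets appear in the text.

```python
import numpy as np

# Label sets per partition count; the 2- and 4-partition sets are hypothetical.
LABELS = {
    2: ["low", "high"],
    3: ["poor", "medium", "good"],
    4: ["low", "medium", "high", "very high"],
    5: ["very poor", "poor", "medium", "good", "very good"],
}

def random_partition(rng, max_parts=5):
    """Randomly pick the number of fuzzy partitions and return the
    evenly spaced Gaussian centers on [0, 1] with linguistic labels."""
    k = int(rng.integers(2, max_parts + 1))   # e.g. 3 -> centers [0, 0.5, 1]
    centers = np.linspace(0.0, 1.0, k)
    return centers, LABELS[k]

def gaussian_membership(x, center, sigma=0.25):
    """Gaussian membership degree of x in the fuzzy set at `center`."""
    return np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
centers, labels = random_partition(rng)
degrees = gaussian_membership(0.4, centers)
print(list(zip(labels, np.round(degrees, 3))))
```

Evenly spaced centers keep the linguistic labels ordered, which is what makes a randomly generated partition still readable as {poor, medium, good}.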
According to [
Here,
Before defuzzification, if the center-of-gravity defuzzification operation is adopted, the final output
Classical fuzzy system models can be divided into TSK fuzzy systems, ML fuzzy systems, and GFM fuzzy systems. For an ML fuzzy system, the fuzzy rules are expressed as
For a GFM fuzzy system, the fuzzy rules are expressed as
where
According to [
Then, after the corresponding operations and defuzzification, the output of the zero-order TSK fuzzy system is obtained [
where
If a Gaussian function is adopted as the membership function, then in Eq. (8) the
Here, the parameter
In Eq. (11) and Eq. (12),
According to [
That is, the output of the zero-order TSK
The above description can be found in [
Usually, the classification performance of a zero-order TSK fuzzy classifier is inferior to that of a first-order TSK classifier. However, a first-order TSK fuzzy classifier can hardly handle, under each fuzzy rule, the (
Here, we propose a stacked deep fuzzy learning model, RCC-DTSK-C, which uses the zero-order TSK fuzzy classifier as a base training module.
To explain the implementation mechanism of the base-training unit more conveniently, a single-output zero-order TSK fuzzy classifier is taken as an example (see
Base training unit corresponding to a single-output zero-order TSK fuzzy classifier
Our procedure consists of the following steps.
(1) Directly use
(2) Randomly generate the feature selection matrix
(3) Randomly generate the rule combination matrix
Through the above analysis, all rules in this zero-order TSK fuzzy classifier can be rewritten as follows (taking, for a five-dimensional input space, the
where "Can be ignored" indicates that the current feature dimension is discarded (not selected).
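The roles of the two random matrices can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `random_feature_selection` stands in for the feature selection matrix (a zero entry means "Can be ignored"), and `random_rule_combination` stands in for the rule combination matrix, picking one fuzzy partition per selected feature. The function names, shapes, and the 5-partition label set are assumptions.

```python
import numpy as np

def random_feature_selection(rng, n_rules, n_features):
    """0/1 matrix S (n_rules x n_features): S[k, j] = 1 means feature j
    participates in rule k; 0 means 'Can be ignored' for that rule."""
    S = rng.integers(0, 2, size=(n_rules, n_features))
    for k in range(n_rules):          # guarantee every rule keeps a feature
        if S[k].sum() == 0:
            S[k, rng.integers(n_features)] = 1
    return S

def random_rule_combination(rng, n_rules, n_features, n_parts=5):
    """For every (rule, feature) pair, randomly pick one of the n_parts
    fuzzy partitions (i.e. one Gaussian center index) to use in the rule."""
    return rng.integers(0, n_parts, size=(n_rules, n_features))

rng = np.random.default_rng(42)
S = random_feature_selection(rng, n_rules=4, n_features=5)
C = random_rule_combination(rng, n_rules=4, n_features=5)
labels = ["very low", "low", "medium", "high", "very high"]
for k in range(4):
    parts = [labels[C[k, j]] if S[k, j] else "Can be ignored" for j in range(5)]
    print(f"Rule {k + 1}:", parts)
```

Each printed row has exactly the shape of a Table 5 rule: a linguistic term per selected feature and "Can be ignored" elsewhere.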
A detailed explanation of why the stacked structure is used is given later in this paper; here we only describe its composition.
Structure of RCC-DTSK-C with a single output
where
Many works on TSK fuzzy models and their applications have been reported in recent years [
Usually, we consider
The deep zero-order TSK fuzzy classifier RCC-DTSK-C proposed here is another attempt at deep learning. Its construction mechanism is similar to that of the multilayer extreme learning machine (ELM), but differs in essence: in RCC-DTSK-C, fuzzy rules are mapped into every hidden layer of the multilayer TSK.
Similar to the strategy of randomly constructing TSK, RCC-DTSK-C randomly assigns the standard deviations of the Gaussian functions and the random rule combination matrix
If
This rule can then be rewritten as:
If
For RCC-DTSK-C, the number of rules (hidden-layer nodes) is chosen at random, and the output
For a model containing
Based on the description in Section 2.3, this section presents the training algorithm of RCC-DTSK-C; the training steps are as follows.
Input and output:
Input: training set
Output: the prediction function and the fuzzy rules of every base-training unit.
Initialization:
Randomly select the number of fuzzy rules
Randomly generate
Training process:
1: Initialize the
2: Fuzzify into
3: Generate
4: For all training data (
5: For all input attributes
6: Within a rule, compute the values of the input attributes
➢ When
➢ When
where
7: According to the results of Step 6,
8: Construct the rule-layer output matrix
9: Compute the output weights
10: Compute the whole output matrix
11: Generate the random projection matrix
12: According to the description of the stacked structure in Section 2.2, compute
13:
Prediction output:
Output the prediction function
According to
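The training steps above can be sketched end-to-end for one base-training unit. This is a hedged reconstruction under stated assumptions, not the authors' implementation: firing strengths are products of Gaussian memberships over the randomly selected features (Steps 2–7), the constant zero-order consequents are solved in closed form by ridge-regularized least squares instead of an unspecified solver (Steps 8–10), and the next unit's input appends a random projection of the current predictions, following the stacked-generalization description (Steps 11–12). All names (`fire_strengths`, `train_base_unit`, `stack_input`), the toy data, and the hyperparameters are illustrative.

```python
import numpy as np

def fire_strengths(X, centers, sigma, S):
    """Normalized firing strengths of K rules for N samples.
    centers: (K, d) Gaussian centers per rule/feature;
    S: (K, d) 0/1 feature-selection matrix (0 = feature ignored)."""
    N, d = X.shape
    K = centers.shape[0]
    F = np.ones((N, K))
    for k in range(K):
        for j in range(d):
            if S[k, j]:
                F[:, k] *= np.exp(-((X[:, j] - centers[k, j]) ** 2)
                                  / (2.0 * sigma ** 2))
    return F / (F.sum(axis=1, keepdims=True) + 1e-12)

def train_base_unit(X, y, n_rules=6, sigma=0.3, reg=1e-3, rng=None):
    """One zero-order TSK base-training unit: random antecedents,
    constant consequents solved in closed form (ridge least squares)."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    centers = rng.choice(np.linspace(0, 1, 5), size=(n_rules, d))
    S = rng.integers(0, 2, size=(n_rules, d))
    S[S.sum(axis=1) == 0, 0] = 1          # keep >= 1 feature per rule
    F = fire_strengths(X, centers, sigma, S)
    w = np.linalg.solve(F.T @ F + reg * np.eye(n_rules), F.T @ y)
    return centers, sigma, S, w

def stack_input(X, y_pred, rng, proj_dim=2):
    """Stacked generalization: append a random projection of the current
    unit's predictions to the original input space for the next unit."""
    P = rng.standard_normal((y_pred.shape[1], proj_dim))
    return np.hstack([X, y_pred @ P])

rng = np.random.default_rng(1)
X = rng.random((50, 4))                   # toy data scaled to [0, 1]
y = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)
centers, sigma, S, w = train_base_unit(X, y, rng=rng)
y_hat = fire_strengths(X, centers, sigma, S) @ w
X_next = stack_input(X, y_hat, rng)       # input of the next unit
print(X_next.shape)                       # -> (50, 6)
```

Because the consequents are constants, no iterative optimization is needed; each unit trains in one linear solve, which is the point made in the efficiency comparison later.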
● Algorithm analysis 1
The rule combination matrix proposed in Algorithm 1
● Algorithm analysis 2
For the expression in the algorithm
● Algorithm analysis 3
We now analyze the time complexity of the algorithm. We first analyze the
where
The six data sets listed in Table 1 [
Datasets
Dataset | #Training samples | #Test samples | #Features | #Classes
Balloons (BAL) | 57 | 19 | 5 | 2
Climate-Model-Simulation-Crashes (CLI) | 405 | 135 | 21 | 2
Airline (AIR) | 300 000 | 100 000 | 29 | 2
Balance-Scale (BAS) | 469 | 156 | 5 | 3
Abalone (ABA) | 3 133 | 1 044 | 9 | 3
Yeast (YEA) | 1 113 | 371 | 9 | 10
Although many different classifiers have been developed, such as BP neural networks and support vector machines, here we adopt the common zero-order and first-order TSK fuzzy classifiers [
a) Low training cost: most non-fuzzy deep classifiers require many iterations during training, which inevitably increases the computational cost; RCC-DTSK-C needs no iteration during training, which greatly improves training efficiency.
b) No need for massive training samples: most non-fuzzy deep classifiers largely require a large number of training samples, whereas RCC-DTSK-C only needs a randomly selected portion of the sample data during training.
c) Highly interpretable training results: the outputs of most non-fuzzy deep classifiers are difficult to interpret, whereas the output of RCC-DTSK-C is highly interpretable.
The parameter settings of these classifiers are listed below. Since both the zero-order and first-order TSK fuzzy classifiers use fuzzy c-means (FCM) clustering and SVM, the parameter settings of FCM and SVM are introduced first. The regularization parameter of SVM is set by grid search from 0.01 to 100 with step 0.1; the number of clusters in FCM equals the number of fuzzy rules; the scale parameter
Since both the input features and the fuzzy membership functions are selected at random, the structure of RCC-DTSK-C has many possible combinations for a given data set. For each data set, we slightly vary the number of rules per layer, run the experiment 10 times, and take the average, obtaining the average number of fuzzy rules, the average training/test accuracy, and the average training/test time. Finally, we also list the average number of fuzzy rules and the average training/test accuracy over all data sets. The results are shown in
Average number of fuzzy rules and average classification accuracies
Classifier | Metric | BAL | CLI | AIR | BAS | ABA | YEA | Mean
Zero-order TSK | #Rules | 2.50 | 5.00 | - | 15.25 | 20.50 | 16.00 | 11.85
Zero-order TSK | Training | 43.86(1.24) | 91.60(0.69) | - | 56.08(6.63) | 55.54(0.75) | 52.85(2.50) | 59.99(1.89)
Zero-order TSK | Test | 52.63(3.72) | 91.11(2.09) | - | 50.64(4.08) | 52.96(0.25) | 48.64(2.50) | 59.20(2.53)
First-order TSK | #Rules | 2.50 | 4.50 | - | 12.00 | 15.50 | 12.25 | 9.35
First-order TSK | Training | 49.12(7.44) | 98.27(0) | - | 91.47(0.90) | 56.43(0.23) | 64.43(1.21) | 71.94(1.96)
First-order TSK | Test | 68.42(0) | 92.59(1.57) | - | 88.46(3.17) | 54.11(2.72) | 56.52(3.66) | 72.02(2.22)
FURIA | #Rules | 3.00 | 7.50 | - | 16.50 | 17.50 | 16.50 | 12.20
FURIA | Training | 74.36(0.04) | 99.07(0) | - | 88.95(0) | 56.21(0) | 64.01(0.03) | 76.52(0.01)
FURIA | Test | 68.42(0.05) | 91.85(0.02) | - | 83.36(0.01) | 54.59(0.02) | 58.89(0.04) | 71.42(0.03)
C4.5 | #Rules | - | - | - | - | - | - | -
C4.5 | Training | 77.63(0.01) | 99.07(0) | - | 89.91(0.01) | 56.77(0.01) | 79.31(0.01) | 80.54(0.01)
C4.5 | Test | 61.84(0.01) | 91.66(0.02) | - | 78.40(0.01) | 56.01(0.01) | 55.86(0.02) | 68.75(0.01)
RCC-DTSK-C | #Rules | 2.25 | 3.75 | 300.00 | 8.75 | 21.50 | 11.25 | 9.50
RCC-DTSK-C | Training | 80.63(2.15) | 99.20(0.41) | 65.61(1.13) | 91.98(1.09) | 57.81(0.69) | 74.58(2.61) | 80.84(1.39)
RCC-DTSK-C | Test | 77.50(1.26) | 92.83(1.28) | 61.78(2.11) | 90.73(2.00) | 56.32(2.36) | 61.52(1.91) | 75.78(1.76)
Average training time and test time
Classifier | Phase | BAL | CLI | AIR | BAS | ABA | YEA | Mean
Zero-order TSK | Training | 0.05(0.01) | 1.09(0.09) | - | 0.16(0) | 5.27(0.58) | 2.41(0.50) | 1.80(0.24)
Zero-order TSK | Test | 0 | 0.02 | - | 0.03 | 0.57 | 0.07 | 0.14
First-order TSK | Training | 0.06(0.01) | 6.56(0.57) | - | 0.14(0) | 36.13(0.60) | 8.70(0.35) | 10.32(0.31)
First-order TSK | Test | 0 | 0.08 | - | 0 | 4.56 | 0.45 | 1.02
FURIA | Training | - | - | - | - | - | - | -
FURIA | Test | - | - | - | - | - | - | -
C4.5 | Training | - | - | - | - | - | - | -
C4.5 | Test | - | - | - | - | - | - | -
RCC-DTSK-C | Training | 0.05(0.01) | 2.52(0.17) | 2.09e+04(66.59) | 0.43(0.01) | 17.81(1.29) | 6.95(0.52) | 5.55(0.40)
RCC-DTSK-C | Test | 0 | 0.04 | 4.78e+03 | 0.01 | 2.92 | 0.44 | 0.77
According to
Next, we study how the performance of RCC-DTSK-C changes as the number of layers varies.
Training accuracies and test accuracies of RCC-DTSK-C for different layers
Dataset | Layer 1 training | Layer 1 test | Layer 2 training | Layer 2 test | Layer 3 training | Layer 3 test | Layer 4 training | Layer 4 test
BAL | 74.23(1.23) | 69.30(1.17) | 75.27(2.03) | 72.15(1.91) | 80.92(1.82) | 77.95(1.28) | - | -
CLI | 99.17(0.02) | 91.63(1.02) | 99.24(0.26) | 92.98(1.25) | - | - | - | -
AIR | 63.55(1.27) | 60.85(0.42) | 65.91(0.92) | 62.42(1.02) | 65.98(1.01) | 62.35(2.52) | - | -
BAS | 87.95(0.98) | 79.62(0.62) | 89.85(0.47) | 77.95(1.52) | - | - | - | -
ABA | 56.61(0.98) | 50.93(0.53) | 58.91(0.29) | 58.18(1.08) | - | - | - | -
YEA | 76.44(0.33) | 75.27(0.59) | 76.85(0.47) | 71.95(1.52) | 76.99(1.08) | 74.86(2.52) | - | -
To better describe the interpretability of RCC-DTSK-C, we record the structure corresponding to the best accuracy RCC-DTSK-C obtains on each data set. The rule structure of RCC-DTSK-C is denoted as "number of fuzzy rules in layer 1 - number of fuzzy rules in layer 2 - … - number of fuzzy rules in layer
Due to space limitations, we take the data set BAL as an example to further demonstrate the interpretability of RCC-DTSK-C, which is related to the corresponding structure and fuzzy rules of RCC-DTSK-C. In the previous experiments, the best accuracy of RCC-DTSK-C on BAL was 80.92%, with corresponding structure 4-3-2. To make the interpretability of the fuzzy rules easy to observe, we set the number of fuzzy partitions to 5, extract the first four rules among all the fuzzy rules obtained by RCC-DTSK-C, and then in
Rule presentation
Rule | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | THEN output is
Rule 1 | low | medium | very high | high | Can be ignored | 0.098(+1)
Rule 2 | very low | low | medium | Can be ignored | Can be ignored | -0.442(-1)
Rule 3 | medium | very high | high | very low | Can be ignored | 0.706(+1)
Rule 4 | Can be ignored | very high | very low | high | medium | -0.529(-1)
IF feature_1 is low
AND feature_2 is medium
AND feature_3 is very high
AND feature_4 is high
AND feature_5 is Can be ignored
THEN
where
Obviously, such fuzzy rules are highly interpretable.
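Rendering a learned rule in the IF-THEN form shown above is mechanical, as the following sketch illustrates. This is not the paper's code: `rule_to_text`, the index-based rule encoding, and the sign convention for the class label in parentheses are illustrative assumptions matching Rule 1 of the table.

```python
LABELS = ["very low", "low", "medium", "high", "very high"]

def rule_to_text(rule_no, part_idx, selected, consequent):
    """Render one zero-order TSK rule as readable IF-THEN text.
    part_idx[j]: index of the fuzzy partition chosen for feature j;
    selected[j]: whether feature j is used (else 'Can be ignored')."""
    lines = []
    for j, (p, s) in enumerate(zip(part_idx, selected), start=1):
        term = LABELS[p] if s else "Can be ignored"
        head = "IF" if j == 1 else "AND"
        lines.append(f"{head} feature_{j} is {term}")
    sign = "+1" if consequent >= 0 else "-1"
    lines.append(f"THEN output is {consequent:.3f} ({sign})")
    return "\n".join(lines)

# Rule 1 of the table: low, medium, very high, high, (ignored) -> 0.098 (+1)
print(rule_to_text(1, [1, 2, 4, 3, 0], [1, 1, 1, 1, 0], 0.098))
```

Because every antecedent is a named linguistic term over the original input features, the printed rule needs no further decoding; this is the interpretability property the section emphasizes.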
To study the interpretability of RCC-DTSK-C more deeply,
Four rules presentation for BAL dataset
Fuzzy rule (if-part and then-part parameters)
Rule no. | Set 1 | Set 2 | Set 3
1 | [-0.1029] | [-0.7812, 0.2638] | [0.3539, -0.2019, 4852]
2 | [-0.3351] | [0.6668, -0.2497] | [-0.7953, 0.0058, 0.7013]
3 | [-0.7068] | [0.6533, -0.0201] | [-0.6652, 0.0478, 0.0680]
4 | [-0.9523] | [-0.4841, -0.6692] | [0.8792, 0.6530, 0.9008]
Milton Friedman[
Nonparametric statistical analysis
Based on the principle of the stacked structure, this study proposes a deep TSK fuzzy classifier, RCC-DTSK-C, with the goals of improving classification performance and achieving strong interpretability. RCC-DTSK-C is constructed in a stacked manner; random feature selection, non-fixed fuzzy partitions, and random rule combinations are proposed to generate the fuzzy rules in every base-training unit. The same data space is maintained in the first layer and all other hidden layers of RCC-DTSK-C, so every feature in every hidden layer retains the same physical meaning as in the input layer. Our empirical results on all data sets show that RCC-DTSK-C clearly outperforms several other classifiers in classification performance. More importantly, a further study on the data set BAL shows that RCC-DTSK-C also has strong interpretability.
Deng ZH, Choi KS, Chung FL,
Jiang YZ, Deng ZH, Wang ST. 0 order
Jiang YZ, Chung FL, Ishibuchi H,
Fadali S, Jafarzadeh S. TSK observers for discrete type-1 and type-2 fuzzy systems. IEEE Trans. on Fuzzy Systems, 2014, 22(2): 451-458.
Jiang YZ, Chung FL, Ishibuchi H, Deng ZH, Wang ST. Multitask TSK fuzzy system modeling by mining intertask common hidden structure. IEEE Trans. on Cybernetics, 2015, 45(3):534-547.
Mitrakis NE, Theocharis JB. A diversity-driven structure learning algorithm for building hierarchical neuro-fuzzy classifiers. Information Sciences, 2012, 186(1):40-58.
Chen Y, Wang DZ, Tong SC. Forecasting studies by designing Mamdani interval type-2 fuzzy logic systems: With the combination of BP algorithms and KM algorithms. Neurocomputing, 2016, 174(22):1133-1146.
Zhou SM, Gan JQ. Constructing
Chung FL, Duan JC. On multistage fuzzy neural network modeling. IEEE Trans. on Fuzzy Systems, 2000, 8(2):25-142.
Mantas CJ, Puche JM. Artificial neural networks are zero-order TSK fuzzy systems. IEEE Trans. on Fuzzy Systems, 2008, 16(3): 630-643.
Alcala-Fdez J, Alcala R, Gonzalez S, Nojima Y, Garcia S. Evolutionary fuzzy rule-based methods for monotonic classification. IEEE Trans. on Fuzzy Systems, 2017, 25(6):1376-1390.
Li T, Wang ST. Zero-order TSK fuzzy classifier based on LLM for large-scale data sets. Control and Decision, 2017, 32(1):21-30 (in Chinese with English abstract).
Xu ML, Wang ST. Extracting fuzzy rules from the maximum ball containing the homogeneous data. Journal of Electronics & Information Technology, 2017, 39(5):1130-1135 (in Chinese with English abstract).
Zhai JH, Wang XZ, Zhang SF. Ensemble incomplete wavelet packet subspaces for face recognition based on fuzzy integral. Pattern Recognition and Artificial Intelligence, 2014, 27(9):794-801 (in Chinese with English abstract).
Wang ST, Jiang YZ, Chung FL, Qian PJ. Feedforward kernel neural networks, generalized least learning machine, and its deep learning with application to image classification. Applied Soft Computing, 2015, 37:125-141.
Bell IE, Baranoski GVG. Reducing the dimensionality of plant spectral databases. IEEE Trans. on Geoscience and Remote Sensing, 2004, 42(3):570-576.
Papa JP, Rosa GH, Pereira DR, Yang XS. Quaternion-based deep belief networks fine-tuning. Applied Soft Computing, 2017, 60: 328-335.
Li F, Zhang J, Shang C, Huang DX, Oko E, Wang MH. Modelling of a post-combustion CO2 capture process using deep belief network. Applied Thermal Engineering, 2018, 130:997-1003.
Chen CLP, Zhang CY, Chen L, Gan M. Fuzzy restricted Boltzmann machine for the enhancement of deep learning. IEEE Trans. on Fuzzy Systems, 2015, 23(6):2163-2173.
Long MS, Wang JM, Cao Y, Sun JG, Yu PS. Deep learning of transferable representation for scalable domain adaptation. IEEE Trans. on Knowledge and Data Engineering, 2016, 28(8):2027-2040.
Wu ZD, Wang YN, Zhang JW. Fouling and damaged fingerprint recognition based on deep learning. Journal of Electronics & Information Technology, 2017, 39(7):1585-1591 (in Chinese with English abstract).[doi: 10.11999/JEIT161121]
Liu Q, Zhai JW, Zhong S, Zhang ZC, Zhou Q, Zhang P. A deep recurrent q-network based on visual attention mechanism. Journal of Computers, 2017, 40(6):1353-1366 (in Chinese with English abstract).
Gao JY, Yang XS, Zhang TZ, Xu CS. Robust visual tracking method via deep learning. Chinese Journal of Computers, 2016, 39(7): 1419-1434 (in Chinese with English abstract).
Li HY, Bi DY, Yang Y, Zha YF. Research on visual tracking algorithm based on deep feature expression and learning. Journal of Electronics & Information Technology, 2015, 37(9):2033-2039 (in Chinese with English abstract).
He YY, Li BQ. A combination form learning rate scheduling for deep learning model. Acta Electronica Sinica, 2016, 42(6):953-958 (in Chinese with English abstract).
Sun R, Zhang GH, Gao J. Pedestrian recognition method based on depth hierarchical feature representation. Journal of Electronics & Information Technology, 2016, 38(6):1528-1535 (in Chinese with English abstract).
Anifowose F, Labadin J, Abdulraheem A. Improving the prediction of petroleum reservoir characterization with a stacked generalization ensemble model of support vector machines. Applied Soft Computing, 2015, 26:483-496.
Juang CF, Chen TC, Cheng WY. Speedup of implementing fuzzy neural networks with high-dimensional inputs through parallel processing on graphic processing units. IEEE Trans. on Fuzzy Systems, 2011, 19(4):717-728.
Yuan YF, Zhuang HJ. A genetic algorithm for generating fuzzy classification rules. Fuzzy Sets and Systems, 1996, 84:1-19.
Leski J. TSK-Fuzzy modeling based on
Lin YY, Chang JY, Lin CT. A TSK-type-based self-evolving compensatory interval type-2 fuzzy neural network and its applications. IEEE Trans. on Industrial Electronics, 2014, 6(1):447-459.
Deng ZH, Cao LB, Jiang YZ, Wang ST. Minimax probability TSK fuzzy system classifier: A more transparent and highly interpretable classification model. IEEE Trans. on Fuzzy Systems, 2015, 23(4):813-826.
Zheng YJ, Ling HF, Chen SY, Xue JY. A hybrid neuro-fuzzy network based on differential biogeography-based optimization for online population classification in earthquakes. IEEE Trans. on Fuzzy Systems, 2014, 23(4):1070-1083.
Guenounou O, Dahhou B, Chabour F. TSK fuzzy model with minimal parameters. Applied Soft Computing, 2015, 30(2):748-757.
Bortolan G, Brohet C, Fusaro S. Possibilities of using neural networks for ECG classification. Journal of Electrocardiology, 1995, 29: 10-16.
Valdes JJ. Extreme learning machines with heterogeneous data types. Neurocomputing, 2018, 280:38-52.
https://archive.ics.uci.edu/ml/datasets.html
http://stat-computing.org/dataexpo/2009/
Li T, Li J, Liu ZL, Li P, Jia CF. Differentially private Naïve Bayes learning over multiple data sources. Information Sciences, 2018, 444:89-104.