Journal of Software, 2017, Vol. 28, Issue (6): 1547-1564


Survey on Matrix Completion Models and Algorithms
CHEN Lei1,2,3, CHEN Song-Can1
1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks(Nanjing University of Posts and Telecommunications), Nanjing 210003, China;
3. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Foundation item: National Natural Science Foundation of China (61472186, 61572263, 61403208); Natural Science Foundation of Jiangsu Province (BK20161516, BK20151511); Postdoctoral Science Foundation of China (2015M581794); Natural Science Research Project of Jiangsu Universities (15KJB520027); Postdoctoral Science Foundation of Jiangsu Province (1501023C); NUPTSF (NY214127, NY215097)
Abstract: In recent years, with the great success of compressed sensing (CS) in the field of signal processing, matrix completion (MC), derived from CS, has increasingly become a hot research topic in machine learning. Researchers have carried out many fruitful studies on the modeling of matrix completion problems and their optimization, and have constructed a relatively complete theory of matrix completion. In order to better grasp the development of matrix completion and to facilitate combining matrix completion theory with engineering applications, this article reviews existing matrix completion models and their algorithms. First, it introduces the natural evolution from CS to MC, showing that the development of CS theory laid the foundation for the formation of MC theory. Second, it summarizes existing matrix completion models into four classes from the perspective of how they relax the non-convex, non-smooth rank function, aiming to provide reasonable choices for specific matrix completion applications. Third, in order to expose the underlying optimization techniques and to facilitate solving new problem-dependent matrix completion models, it studies representative optimization algorithms suitable for the various models. Finally, the article analyzes the problems in current matrix completion technology, proposes possible solutions, and discusses future work.
Key words: sparse learning; matrix completion; compressed sensing; matrix decomposition; stochastic optimization

1 The Origins of Matrix Completion

 $\mathop {\min }\limits_{x \in {\mathbb{R}^n}} ||x|{|_0}{\text{ s}}{\text{.t}}{\text{. }}Ax = y$ (1)

 $\mathop {\min }\limits_{x \in {\mathbb{R}^n}} ||x|{|_1}{\text{ s}}{\text{.t}}{\text{. }}Ax = y$ (2)

 $\mathop {\min }\limits_{x \in {\mathbb{R}^n}} ||x|{|_0}{\text{ s}}{\text{.t}}{\text{. }}||Ax - y|{|_2} \leqslant \varepsilon$ (3)

 $\mathop {\min }\limits_{x \in {\mathbb{R}^n}} ||Ax - y||_2^2{\text{ s}}{\text{.t}}{\text{. }}||x|{|_1} \leqslant \varepsilon$ (4)

 $\mathop {\min }\limits_{x \in {\mathbb{R}^n}} ||x|{|_1} + \frac{\lambda }{2}||Ax - y||_2^2$ (5)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} rank(X){\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(M) = {P_\Omega }(X){\text{ }}$ (6)

 ${[{P_\Omega }(M)]_{ij}} = \left\{ {\begin{array}{*{20}{l}} {{M_{ij}},{\text{ if }}(i,j) \in \Omega } \\ {0,{\text{ otherwise}}} \end{array}} \right.$ (7)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} rank(X){\text{ s}}{\text{.t}}{\text{. }}||{P_\Omega }(M - X)|{|_F} \leqslant \delta$ (8)
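The sampling operator P_Ω defined in Eq. (7) simply keeps the observed entries and zeroes out the rest. A minimal numpy sketch (the name `proj_omega` is ours, for illustration only):

```python
import numpy as np

def proj_omega(M, mask):
    """P_Omega of Eq. (7): keep entries with (i, j) in Omega, zero out the rest."""
    return np.where(mask, M, 0.0)

M = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[True, False], [False, True]])   # the observed index set Omega
P = proj_omega(M, mask)                           # keeps M[0,0] and M[1,1], zeros elsewhere
```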

2 Matrix Completion Models

2.1 Matrix Completion Models Based on Nuclear Norm Relaxation

Fazel proved in [46] that the matrix nuclear norm is the tightest convex approximation of the rank function over the unit ball of the matrix spectral norm. Therefore, analogous to the technique commonly used in compressed sensing of relaxing the vector L0 norm to the vector L1 norm, a natural idea for making the standard matrix completion problem tractable is to replace the non-convex rank function with the convex nuclear norm, relaxing the standard matrix completion problem into a convex constrained optimization model of the following form:

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} ||X|{|_*}{\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(M) = {P_\Omega }(X)$ (9)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} \lambda ||X|{|_ * } + \frac{1}{2}||{P_\Omega }(M) - {P_\Omega }(X)||_F^2$ (10)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} \lambda ||X|{|_*} + \frac{1}{2}||X||_F^2{\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(M) = {P_\Omega }(X)$ (11)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} ||X|{|_ * }{\text{ s}}{\text{.t}}{\text{. }}X + E = M,{P_\Omega }(E) = 0$ (12)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} ||X|{|_r}{\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(M) = {P_\Omega }(X)$ (13)
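The key computational primitive behind solvers for models (9)–(12) is the proximal operator of the nuclear norm, which is singular value soft-thresholding. A small numpy sketch (the function name `svt` is ours, not from the cited works):

```python
import numpy as np

def svt(Y, tau):
    """Proximal operator of tau*||.||_*: soft-threshold the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

Y = np.diag([3.0, 1.0, 0.2])
X = svt(Y, 0.5)        # singular values 3, 1, 0.2 shrink to 2.5, 0.5, 0
```

Note that the smallest singular value is thresholded to exactly zero, which is why this operator promotes low rank.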

2.2 Matrix Completion Models Based on Matrix Factorization

 $\mathop {\min }\limits_{U \in {\mathbb{R}^{{n_1} \times k}},V \in {\mathbb{R}^{k \times {n_2}}},Z \in {\mathbb{R}^{{n_1} \times {n_2}}}} ||UV - Z||_F^2{\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(Z) = {P_\Omega }(M)$ (14)

 $\mathop {\min }\limits_{L \in {\mathbb{R}^{{n_1} \times k}},Q \in {\mathbb{R}^{k \times {n_2}}}} \left\{ {\lambda ||Q|{|_ * } + \frac{1}{2}||{P_\Omega }(LQ - W)||_F^2,{\text{ s}}{\text{.t}}{\text{. }}{L^T}L = I} \right\}$ (15)

 $||X|{|_*} = \mathop {\min }\limits_{L \in {\mathbb{R}^{{n_1} \times k}},Q \in {\mathbb{R}^{{n_2} \times k}}} \frac{1}{2}\{ ||L||_F^2 + ||Q||_F^2{\text{ s}}{\text{.t}}{\text{. }}X = L{Q^T}\}$ (16)

 $\mathop {\min }\limits_{L \in {\mathbb{R}^{{n_1} \times k}},Q \in {\mathbb{R}^{k \times {n_2}}}} \frac{1}{2}||{P_\Omega }(M - LQ)||_F^2 + \frac{\lambda }{2}(||L||_F^2 + ||Q||_F^2)$ (17)

 $\left\| {{P_\Omega }(M - LQ)} \right\| \leqslant \lambda$ (18)
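The variational characterization (16) can be checked numerically: taking L = UΣ^{1/2} and Q = VΣ^{1/2} from the SVD X = UΣV^T is feasible and attains the nuclear norm. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U * np.sqrt(s)        # L = U * Sigma^{1/2}
Q = Vt.T * np.sqrt(s)     # Q = V * Sigma^{1/2}, so X = L Q^T

nuclear = s.sum()                                                  # ||X||_*
bound = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(Q, 'fro')**2)
assert np.allclose(X, L @ Q.T)      # feasibility: X = L Q^T
assert np.isclose(nuclear, bound)   # the objective in (16) is attained
```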

2.3 Matrix Completion Models Based on Non-Convex Relaxation

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} \lambda ||X||_{{S_p}}^p + ||{P_\Omega }(M - X)||_p^p$ (19)

 ${f_\delta }({s_i}) = \exp ( - s_i^2/2{\delta ^2})$ (20)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} \sum\nolimits_{i = 1}^{\min \{ {n_1},{n_2}\} } {{f_\delta }({\sigma _i}(X)} ){\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(M) = {P_\Omega }(X)$ (21)

Ghasemi et al. exploit the differentiability of the Gaussian function and solve the problem with a gradient projection algorithm. Inspired by their work, Geng et al. [54] proposed using the smooth hyperbolic tangent function:

 ${f_\delta }({s_i}) = \frac{{\exp (s_i^2/2{\delta ^2}) - \exp ( - s_i^2/2{\delta ^2})}}{{\exp (s_i^2/2{\delta ^2}) + \exp ( - s_i^2/2{\delta ^2})}}$ (22)
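Both surrogates act on the singular values of X; for small δ, the tanh function (22) maps every nonzero singular value to ≈1 and zero to 0, so its sum over the singular values approximates rank(X). A sketch (function name ours):

```python
import numpy as np

def tanh_rank_surrogate(X, delta):
    """Sum of f_delta from Eq. (22) over singular values; tends to rank(X) as delta -> 0."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.tanh(s**2 / (2.0 * delta**2)).sum()   # (22) is exactly tanh(s^2 / (2 delta^2))

X = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])           # a rank-1 matrix
approx = tanh_rank_surrogate(X, 0.1)                # close to 1.0 = rank(X)
```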

2.4 Other Types of Matrix Completion Models

 $X = R(\theta ) = \sum {{\theta _i}{R_i}}$ (23)

 $\mathop {\min }\limits_{\theta ,R} ||\theta |{|_{\text{0}}}{\text{ s}}{\text{.t}}{\text{. }}{P_\Omega }(M) = {P_\Omega }(R(\theta ))$ (24)

Wang et al. naturally converted this problem further into a noisy model:

 $\mathop {\min }\limits_{\theta ,R} ||{P_\Omega }(R(\theta )) - {P_\Omega }(M)||_F^2{\text{ s}}{\text{.t}}{\text{. }}||\theta |{|_0} \leqslant r$ (25)

2.5 Advantages and Disadvantages of the Four Classes of Matrix Completion Models

Table 1 Application scenarios, advantages, and disadvantages of the four classes of matrix completion models

2.6 Application of Regularization Techniques in Matrix Completion Models

3 Optimization Algorithms for Matrix Completion Models

 $\left( {{\text{F}}1} \right):\mathop {\min }\limits_{{X_1},...,{X_m} \in {\mathbb{R}^{{n_1} \times {n_2}}}} J({X_1},...,{X_m}){\text{ s}}{\text{.t}}{\text{. }}C({X_1},...,{X_m}) = 0$ (26)

 $\left( {{\text{F2}}} \right):\mathop {\min }\limits_{{X_1},...,{X_m} \in {\mathbb{R}^{{n_1} \times {n_2}}}} F({X_1},...,{X_m}) = J({X_1},...,{X_m}) + H({X_1},...,{X_m})$ (27)

 $\left( {{\text{F3}}} \right):\mathop {\min }\limits_{X = ({X_1},...,{X_m}) \in {\mathbb{R}^{m \times n}}} F(X) = J(X) + H(X)$ (28)

3.1 Proximal Gradient Descent

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} J(X) + H(X)$ (29)

 ${\left\| {\nabla H\left( X \right) - \nabla H\left( Y \right)} \right\|_F} \leqslant {L_H}{\left\| {X - Y} \right\|_F},\forall X,Y \in {\mathbb{R}^{m \times n}}$ (30)

 ${X^{k + 1}} = pro{x_{\delta J}}({X^k} - \delta \nabla H({X^k})) = \arg \mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} \delta J(X) + \frac{1}{2}||X - ({X^k} - \delta \nabla H({X^k}))||_F^2$ (31)

Input: maximum number of iterations N;

Output: Xopt.

Initialize X0=0;

for k=0, …, N do

${X^{k + 1}} = pro{x_{\delta J}}({X^k} - \delta \nabla H({X^k})) = \arg \mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} \left\{ {\delta J(X) + \frac{1}{2}||X - ({X^k} - \delta \nabla H({X^k}))||_F^2} \right\}$;

if stopping criterion is satisfied then

return Xk+1

end if

end for

Input: maximum number of iterations N;

Output: Xopt.

Initialize X0=Y0=0, t1=1;

for k=0, …, N do

$\begin{gathered} {X^{k + 1}} = pro{x_{\delta J}}({Y^k} -\delta \nabla H({Y^k})) = \arg \mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} \left\{ {\delta J(X) + \frac{1}{2}||X -({Y^k} -\delta \nabla H({Y^k}))||_F^2} \right\}; \hfill \\ {t_{k + 1}} = (1 + \sqrt {1 + 4t_k^2})/2; \hfill \\ {Y^{k + 1}} = {X^{k + 1}} + \frac{{{t_k} -1}}{{{t_{k + 1}}}}({X^{k + 1}} -{X^k}); \hfill \\ \end{gathered}$

if stopping criterion is satisfied then

return Xk+1

end if

end for

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} J(X){\text{ s}}{\text{.t}}{\text{. }}AX = B$ (32)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} J(X) + \frac{\lambda }{2}||AX - B||_F^2$ (33)
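For the unconstrained model (10), taking J = λ||·||_* and H = ½||P_Ω(X − M)||_F² gives ∇H(X) = P_Ω(X − M) with Lipschitz constant 1, and the proximal step (31) reduces to singular value soft-thresholding. A minimal sketch of the resulting iteration (function names and parameter choices are ours, for illustration):

```python
import numpy as np

def prox_nuclear(Y, tau):
    """prox of tau*||.||_* via singular value soft-thresholding."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def mc_prox_grad(M, mask, lam, step=1.0, iters=500):
    """Proximal gradient descent for model (10): min lam*||X||_* + 0.5*||P_Omega(X - M)||_F^2."""
    X = np.zeros_like(M)
    for _ in range(iters):
        grad = np.where(mask, X - M, 0.0)             # grad H(X) = P_Omega(X - M), L_H = 1
        X = prox_nuclear(X - step * grad, step * lam)  # proximal step (31)
    return X

# Synthetic demo: recover a random rank-1 matrix from 70% of its entries.
rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(20), rng.standard_normal(20))
mask = rng.random(M.shape) < 0.7
X = mc_prox_grad(M, mask, lam=0.01)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

Since the step size 1.0 is below 2/L_H = 2, the iteration converges to a minimizer of (10).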

3.2 Block Coordinate Descent

Input: maximum number of iterations N;

Output: (X1, …, Xm)opt.

Initialization: choose $(X_1^0,...,X_m^0)$;

for k=1, 2, …, N do

for i=1, 2, …, m do

$X_i^k = \mathop {\arg \min }\limits_{{X_i}} F(X_{ < i}^k,{X_i},X_{ > i}^{k - 1})$;

end for

if stopping criterion is satisfied then

return $(X_1^k,...,X_m^k)$

end if

end for
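Applied to the factorization model (17) with the two blocks L and Q, each BCD subproblem is a small ridge-regression solve: fixing Q, every row of L has a closed-form update, and symmetrically for the columns of Q. A sketch under those assumptions (function name ours):

```python
import numpy as np

def bcd_sweep(M, mask, L, Q, lam):
    """One BCD sweep for model (17): ridge solves for the rows of L, then the columns of Q."""
    k = L.shape[1]
    for i in range(M.shape[0]):                        # block 1: row i of L
        idx = mask[i]
        A = Q[:, idx] @ Q[:, idx].T + lam * np.eye(k)
        L[i] = np.linalg.solve(A, Q[:, idx] @ M[i, idx])
    for j in range(M.shape[1]):                        # block 2: column j of Q
        idx = mask[:, j]
        A = L[idx].T @ L[idx] + lam * np.eye(k)
        Q[:, j] = np.linalg.solve(A, L[idx].T @ M[idx, j])
    return L, Q

rng = np.random.default_rng(2)
M = np.outer(rng.standard_normal(15), rng.standard_normal(15))   # rank-1 target
mask = rng.random(M.shape) < 0.8
L, Q = rng.standard_normal((15, 2)), rng.standard_normal((2, 15))
for _ in range(30):
    L, Q = bcd_sweep(M, mask, L, Q, lam=1e-3)
rel_err = np.linalg.norm(L @ Q - M) / np.linalg.norm(M)
```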

3.3 Bregman Iteration in Matrix Space

Bregman iteration originated as an optimization method in convex analysis for finding extrema of an objective function under equality constraints [64]. Osher et al. [65] first applied the method to total-variation image restoration in 2005; it was subsequently extended to compressed sensing [66], image denoising [67], and other fields, and has become one of the effective methods for solving low-rank matrix recovery problems.

 $D_J^P(X,\hat X) = J(X) - J(\hat X) - \langle P,X - \hat X\rangle$ (34)

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} J(X) + H(X)$ (35)

 ${X^{k + 1}} = \mathop {\arg \min }\limits_{X \in {\mathbb{R}^{m \times n}}} D_J^{{P^k}}(X,{X^k}) + H(X) = \mathop {\arg \min }\limits_{X \in {\mathbb{R}^{m \times n}}} J(X) - \langle {P^k},X\rangle + H(X)$ (36)

 $0 \in \partial J\left( {{X^{k + 1}}} \right) - {P^k} + \nabla H\left( {{X^{k + 1}}} \right)$ (37)

 ${P^{k + 1}} = {P^k} - \nabla H\left( {{X^{k + 1}}} \right)$ (38)

 $\left\{ {\begin{array}{*{20}{l}} {{X^{k + 1}} = \mathop {\arg \min }\limits_{X \in {\mathbb{R}^{m \times n}}} J(X) - \langle {P^k},X\rangle + H(X)} \\ {{P^{k + 1}} = {P^k} - \nabla H({X^{k + 1}})} \end{array}} \right.$ (39)

Input: maximum number of iterations N;

Output: Xopt.

Initialization: choose X0=P0=0;

for k=0, 1, …, N do

${X^{k + 1}} = \mathop {\arg \min }\limits_{X \in {\mathbb{R}^{{n_1} \times {n_2}}}} J(X) - \langle {P^k},X\rangle + H(X)$

${P^{k + 1}} = {P^k} - \nabla H\left( {{X^{k + 1}}} \right)$

if stopping criterion is satisfied then

return Xk+1

end if

end for
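A well-known instance of iteration (39) in matrix completion is the SVT algorithm of Cai et al., which alternates singular value shrinkage with a Bregman-style dual update on the residual of the observed entries. A hedged sketch (function name ours; the parameter choices below follow common heuristics, not necessarily the original paper's):

```python
import numpy as np

def svt_complete(M, mask, tau, delta, iters=200):
    """SVT read as iteration (39): shrink singular values, then a dual step on the residual."""
    Y = np.zeros_like(M)                                  # plays the role of P^k
    X = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt           # X^{k+1}: shrink singular values
        Y = Y + delta * np.where(mask, M - X, 0.0)        # dual update on P_Omega residual
    return X

rng = np.random.default_rng(3)
M = np.outer(rng.standard_normal(20), rng.standard_normal(20))
mask = rng.random(M.shape) < 0.7
X = svt_complete(M, mask, tau=5 * np.sqrt(M.size), delta=1.2 / 0.7)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```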

3.4 Alternating Direction Method of Multipliers

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} {F_1}(X) + {F_2}(Z){\text{ s}}{\text{.t}}{\text{. }}AX + BZ = C$ (40)

 ${L_\rho }(X,Z,Y) = {F_1}(X) + {F_2}(Z){\text{ }} + trace({Y^T}(AX + BZ - C)) + \left( {\rho /2} \right)||AX + BZ - C||_F^2$ (41)

Input: maximum number of iterations N, penalty parameter ρ;

Output: Xopt, Zopt.

Initialize Y0=Z0=0;

for k=0, …, N do

$\begin{array}{*{20}{l}} {{X^{k + 1}} = \mathop {\arg \min }\limits_X {L_\rho }(X, {Z^k}, {Y^k})} \\ {{Z^{k + 1}} = \mathop {\arg \min }\limits_Z {L_\rho }({X^{k + 1}}, Z, {Y^k})} \\ {{Y^{k + 1}} = {Y^k} + \rho (A{X^{k + 1}} + B{Z^{k + 1}} -C)} \end{array};$

if stopping criterion is satisfied then

return Xk+1, Zk+1;

end if

end for
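Instantiating (40) for matrix completion with F₁ = λ||·||_*, F₂ = ½||P_Ω(· − M)||_F², A = I, B = −I, C = 0 (i.e. the splitting X = Z), both subproblems in the loop above have closed forms. A sketch (the derivation of the Z-step and all names are ours):

```python
import numpy as np

def prox_nuclear(Y, tau):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def mc_admm(M, mask, lam=0.01, rho=1.0, iters=200):
    """ADMM for min lam*||X||_* + 0.5*||P_Omega(Z - M)||_F^2  s.t.  X = Z."""
    X = np.zeros_like(M)
    Z = np.zeros_like(M)
    Y = np.zeros_like(M)                                   # Lagrange multiplier
    for _ in range(iters):
        X = prox_nuclear(Z - Y / rho, lam / rho)           # X-step: SVT
        Z = np.where(mask, (M + Y + rho * X) / (1 + rho),  # Z-step, observed entries
                     X + Y / rho)                          # Z-step, unobserved entries
        Y = Y + rho * (X - Z)                              # multiplier update
    return X

rng = np.random.default_rng(4)
M = np.outer(rng.standard_normal(20), rng.standard_normal(20))
mask = rng.random(M.shape) < 0.7
X = mc_admm(M, mask)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```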

3.5 Stochastic Optimization Algorithms

 $\mathop {\min }\limits_{X \in {\mathbb{R}^{m \times n}}} F(X) = J(X) + \lambda ||X|{|_*},{\text{ where }}J(X) = \frac{1}{2}||X - W||_F^2$ (42)

Input: maximum number of iterations N, regularization parameter λ;

Output: Xopt.

Initialize X1=0;

for t=1, 2, …, N do

${Y_t} = Z/\sqrt k$, where each column of Z is drawn uniformly at random, independently of the other columns, from $\{ \sqrt n {e_1},...,\sqrt n {e_n}\}$;

${\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{G} _t} = {G_t}{Y_t}Y_t^T$, where Gt is a subgradient of J(·) at Xt;

${X_{t + 1}} = \mathop {\arg \min }\limits_{X \in {\mathbb{R}^{m \times n}}} \frac{1}{2}||X - ({X_t} - {\eta _t}{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{G} _t})||_F^2 + {\eta _t}\lambda ||X|{|_*}$;

if stopping criterion is satisfied then

return Xt+1;

end if

end for
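The sketch below mimics the column-sampling idea above for problem (42): since E[Y_t Y_tᵀ] = I, the product G_t Y_t Y_tᵀ is an unbiased estimate of the gradient. The decaying step size and all parameter values are illustrative assumptions of ours, not the original authors' choices:

```python
import numpy as np

def stochastic_prox_nuclear(W, lam=0.5, k=3, eta=0.5, iters=500, seed=0):
    """Stochastic proximal gradient for (42): min 0.5*||X - W||_F^2 + lam*||X||_*.

    The gradient of J, G = X - W, is sketched as G @ Y @ Y.T with Y = Z/sqrt(k) and
    the columns of Z drawn from {sqrt(n) e_1, ..., sqrt(n) e_n}, so E[Y @ Y.T] = I.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    X = np.zeros_like(W)
    for t in range(iters):
        cols = rng.integers(0, n, size=k)
        Y = np.zeros((n, k))
        Y[cols, np.arange(k)] = np.sqrt(n / k)            # Y = Z / sqrt(k)
        G_hat = (X - W) @ Y @ Y.T                         # unbiased gradient estimate
        eta_t = eta / np.sqrt(t + 1)                      # decaying step size
        U, s, Vt = np.linalg.svd(X - eta_t * G_hat, full_matrices=False)
        X = (U * np.maximum(s - eta_t * lam, 0.0)) @ Vt   # proximal (shrinkage) step
    return X

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 6))
X = stochastic_prox_nuclear(W)
obj = lambda A: 0.5 * np.linalg.norm(A - W)**2 + 0.5 * np.linalg.svd(A, compute_uv=False).sum()
```

Despite the noisy gradients, the iterates should drive the objective of (42) well below its value at X = 0.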

3.6 Advantages and Disadvantages of the Representative Optimization Algorithms

Table 2 Convergence rates, advantages, and disadvantages of the representative optimization algorithms

4 Open Problems and Future Research Directions

4.1 Open Problems

(1) Poor noise tolerance

(2) Insufficient inductive generalization

(3) Weak integration of prior information

(4) Poor algorithm scalability and low computational efficiency

4.2 Future Research Directions

(1) Models

(2) Algorithms

(3) Applications

5 Conclusion