Dense (Fully Connected) Layer
Forward
$x : (N, n)$
$y : (N, m)$
$w : (n, m)$
$b : (m,)$
$x$ : input data
$y$ : output data
$w$ : weight matrix
$b$ : bias vector
$N$ : number of samples (batch size)
$n$ : input dim
$m$ : output dim
$$y = \begin{bmatrix}
y_1 & y_2 & ... & y_m \\
\end{bmatrix}$$
$$x = \begin{bmatrix}
x_1 & x_2 & ... & x_n \\
\end{bmatrix}$$
$$ w = \begin{bmatrix}
w_{1,1} & w_{1,2} & ... & w_{1,m} \\
w_{2,1} & w_{2,2} & ... & w_{2,m} \\
... & ... & ... & ...\\
w_{n,1} & w_{n,2} & ... & w_{n, m}
\end{bmatrix} $$
$$ b = \begin{bmatrix}
b_1 & b_2 & ... & b_m \\
\end{bmatrix} $$
$$y_j = \sum_{i=1}^{n} x_i w_{i, j} + b_j$$
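A minimal NumPy sketch of this forward pass for a batch (the dimensions and values here are arbitrary):

```python
import numpy as np

# Shapes from the definitions above.
N, n, m = 4, 3, 2              # batch size, input dim, output dim
x = np.random.randn(N, n)      # input batch
w = np.random.randn(n, m)      # weight matrix
b = np.random.randn(m)         # bias vector

y = x @ w + b                  # y_j = sum_i x_i * w_{i,j} + b_j, per row
print(y.shape)                 # (4, 2)
```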
Partial Derivatives
Weight
$$ \begin{matrix}
\frac{\partial y}{\partial w_{i, j}} = \sum_{k=1}^{m} \frac{\partial y_k}{\partial w_{i,j}} \\ \\
\left\{\begin{matrix}
\frac{\partial y_k}{\partial w_{i, j}} = 0 \text{ if } k \neq j
\\
\frac{\partial y_k}{\partial w_{i, j}} = x_i \text{ if } k = j
\end{matrix}\right.
\end{matrix} $$
$$ \frac{\partial y}{\partial w} =
\begin{bmatrix}
\frac{\partial y}{\partial w_{1, 1}} & ... & \frac{\partial y}{\partial w_{1, m}} \\
... & ... & ... \\
\frac{\partial y}{\partial w_{n, 1}} & ... & \frac{\partial y}{\partial w_{n, m}}
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial y_1}{\partial w_{1, 1}} & ... & \frac{\partial y_m}{\partial w_{1, m}} \\
... & ... & ... \\
\frac{\partial y_1}{\partial w_{n, 1}} & ... & \frac{\partial y_m}{\partial w_{n, m}}
\end{bmatrix}
=
\begin{bmatrix}
x_1 & ... & x_1 \\
... & ... & ... \\
x_n & ... & x_n
\end{bmatrix} $$
$$ \frac{\partial L}{\partial w} = \begin{bmatrix}
\frac{\partial y}{\partial w_{1,1}} \frac{\partial L}{\partial y} & ... & \frac{\partial y}{\partial w_{1,m}} \frac{\partial L}{\partial y} \\
... & ... & ... \\
\frac{\partial y}{\partial w_{n,1}} \frac{\partial L}{\partial y} & ... & \frac{\partial y}{\partial w_{n,m}} \frac{\partial L}{\partial y} \\
\end{bmatrix} = \begin{bmatrix}
\frac{\partial y_1}{\partial w_{1,1}} \frac{\partial L}{\partial y_1} & ... & \frac{\partial y_m}{\partial w_{1,m}} \frac{\partial L}{\partial y_m} \\
... & ... & ... \\
\frac{\partial y_1}{\partial w_{n,1}} \frac{\partial L}{\partial y_1} & ... & \frac{\partial y_m}{\partial w_{n,m}} \frac{\partial L}{\partial y_m} \\
\end{bmatrix} = \begin{bmatrix}
x_1 \frac{\partial L}{\partial y_1} & ... & x_1 \frac{\partial L}{\partial y_m} \\
... & ... & ... \\
x_n \frac{\partial L}{\partial y_1} & ... & x_n \frac{\partial L}{\partial y_m}\\
\end{bmatrix} $$
$$ \therefore \frac{\partial L}{\partial w} = \frac{\partial y}{\partial w} \frac{\partial L}{\partial y} = \begin{bmatrix}
x_1 \\
... \\
x_n
\end{bmatrix} \cdot \begin{bmatrix}
\frac{\partial L}{\partial y_1} & ... & \frac{\partial L}{\partial y_m} \\
\end{bmatrix} $$
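A minimal NumPy sketch of this outer product (names and sizes are illustrative):

```python
import numpy as np

n, m = 3, 2
x = np.random.randn(n)        # single input sample
dLdy = np.random.randn(m)     # upstream gradient dL/dy

# dL/dw = x (column) times dL/dy (row), i.e. an outer product:
dLdw = np.outer(x, dLdy)      # shape (n, m)

# For a batch x_b : (N, n) and dLdy_b : (N, m), the per-sample
# outer products accumulate as x_b.T @ dLdy_b.
```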
Bias
$$ \therefore \frac{\partial L}{\partial b} = \frac{\partial y}{\partial b} \frac{\partial L}{\partial y}
= \begin{bmatrix}
\frac{\partial y}{\partial b_1} \frac{\partial L}{\partial y} & ... & \frac{\partial y}{\partial b_m} \frac{\partial L}{\partial y} \\
\end{bmatrix} = \begin{bmatrix}
\frac{\partial y_1}{\partial b_1} \frac{\partial L}{\partial y_1} & ... & \frac{\partial y_m}{\partial b_m} \frac{\partial L}{\partial y_m} \\
\end{bmatrix} = \frac{\partial L}{\partial y} \\
(\because \frac{\partial y}{\partial b_j} = \frac{\partial y_j}{\partial b_j} = 1 ) $$
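In code, for a batch of upstream gradients of shape $(N, m)$, this per-sample identity becomes a sum over the batch axis; a minimal sketch:

```python
import numpy as np

N, m = 4, 2
dLdy = np.random.randn(N, m)  # upstream gradient for a batch

# Per sample dL/db = dL/dy; accumulated over the batch:
dLdb = dLdy.sum(axis=0)       # shape (m,)
```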
Input ($x$)
$$ \frac{\partial y}{\partial x}
= \begin{bmatrix}\frac{\partial y_1}{\partial x_1} & ... & \frac{\partial y_m}{\partial x_1} \\ ... & ... & ... \\
\frac{\partial y_1}{\partial x_n} & ... & \frac{\partial y_m}{\partial x_n} \\
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial }{\partial x_1}(x_1 w_{1,1}+...) & ... & \frac{\partial }{\partial x_1}(x_1 w_{1,m}+...) \\
... & ... & ... \\
\frac{\partial }{\partial x_n}(...+x_nw_{n,1}) & ... & \frac{\partial }{\partial x_n}(...+x_nw_{n,m}) \\
\end{bmatrix}
= \begin{bmatrix}
w_{1,1} & ... & w_{1,m} \\
... & ... & ... \\
w_{n, 1} & ... & w_{n,m} \\
\end{bmatrix} = W \\
\frac{\partial L}{\partial x}
= \frac{\partial y}{\partial x}\frac{\partial L}{\partial y}
=
\begin{bmatrix}
\frac{\partial L}{\partial y_1} & ... & \frac{\partial L}{\partial y_m} \\
\end{bmatrix} \cdot W^T $$
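A corresponding NumPy sketch for the input gradient (sizes are illustrative):

```python
import numpy as np

N, n, m = 4, 3, 2
w = np.random.randn(n, m)
dLdy = np.random.randn(N, m)  # upstream gradient for a batch

# dL/dx = dL/dy · W^T, applied row-wise to the batch:
dLdx = dLdy @ w.T             # shape (N, n)
```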
Code Implementation
```python
# Methods of the Dense layer class; assumes `import numpy as np` and
# `import warnings` at module level.
def forward(self, inputs):
    if inputs.dtype != self.dtype:
        warnings.warn("Input dtype does not match the kernel dtype. "
                      "input dtype : {}, kernel dtype : {}".format(inputs.dtype, self.weight.dtype))
        inputs = inputs.astype(self.dtype)
    if self.training:
        self.x = inputs  # cache the inputs (not their mean) for backprop
    outputs = np.matmul(inputs, self.weight) + self.bias if self.use_bias else np.matmul(inputs, self.weight)
    return outputs

def backprop(self, dLdy, optimizer):
    # dL/dx must use the weights from the forward pass, so compute it
    # before the parameter update.
    dLdx = np.matmul(dLdy, self.weight.T)  # dL/dy · W^T, shape (N, n)
    kernel_regularize_term = self.kernel_regularizer(self.weight) if self.kernel_regularizer is not None else 0
    dLdw = np.matmul(self.x.T, dLdy)  # x^T · dL/dy, shape (n, m)
    kernel_delta = dLdw + kernel_regularize_term
    self.weight = self.weight - optimizer.learning_rate * kernel_delta
    if self.use_bias:
        bias_regularize_term = self.bias_regularizer(self.bias) if self.bias_regularizer is not None else 0
        dLdb = np.sum(dLdy, axis=0)  # sum over the batch axis, shape (m,)
        bias_delta = dLdb + bias_regularize_term
        self.bias = self.bias - optimizer.learning_rate * bias_delta
    return dLdx
```
The regularizer terms are omitted from the derivation above.
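As a sanity check, the analytic weight gradient can be compared against a numerical estimate. A small self-contained sketch using $L = \sum y$ (so $\partial L / \partial y = 1$ everywhere):

```python
import numpy as np

np.random.seed(0)
N, n, m = 4, 3, 2
x = np.random.randn(N, n)
w = np.random.randn(n, m)
b = np.random.randn(m)

# Analytic gradient from the derivation: dL/dw = x^T · dL/dy with dL/dy = 1.
analytic = x.T @ np.ones((N, m))

# Central-difference numerical gradient of L = (x @ w + b).sum().
numeric = np.zeros_like(w)
eps = 1e-6
for i in range(n):
    for j in range(m):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        numeric[i, j] = ((x @ w_plus + b).sum() - (x @ w_minus + b).sum()) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```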
Parameter
- units : int
- use_bias : boolean, default True
- kernel_initializer : string or callable function, default 'glorot_uniform'
- bias_initializer : string or callable function, default 'zeros'
- kernel_regularizer : string or callable function, default None
- bias_regularizer : string or callable function, default None
- dtype : type class, default np.float32
- training : boolean, default True
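A hypothetical construction call for the Dense layer described by this list (the exact signature is inferred from the parameter names above, with the defaults written out explicitly):

```python
import numpy as np

# Signature inferred from the parameter list; not a definitive API.
layer = Dense(units=64,
              use_bias=True,
              kernel_initializer='glorot_uniform',
              bias_initializer='zeros',
              kernel_regularizer=None,
              bias_regularizer=None,
              dtype=np.float32,
              training=True)
```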