Linear Discriminant Analysis (Part 2-1)

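This is the second part of the `LDA` series. `LDA` is used for both classification and dimensionality reduction, and the previous part covered the mathematical background for each. In this part we implement both from scratch. For readability, the implementation is split into two posts, one on `classification` and one on `dimensionality reduction`. We start with `classification`.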

Classification

Implementation

In the classification setting, we obtain a linear decision boundary from Bayes' rule. The boundary depends on population parameters, so we first estimate them from the sample.

Sample Parameters

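$\hat{\pi}_k = N_k/N$

$\hat{\mu}_k = \sum_{g_i = k} x_i / N_k$

$\hat{\Sigma} = \sum_{k=1}^K \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T / (N-K)$

Here $g_i$ is the class (group) of $x_i$, $N_k$ is the number of samples in class $k$, and $N$ is the total number of samples.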

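import numpy as np

def LDA_sample_parameter(x1, x2):
    n1, n2 = len(x1), len(x2)

    # 1. Prior probabilities: pi_k = N_k / N
    pi1 = n1 / (n1 + n2)
    pi2 = n2 / (n1 + n2)

    # 2. Class mean vectors
    mu1 = np.mean(x1, axis=0)
    mu2 = np.mean(x2, axis=0)

    # 3. Pooled within-class covariance: center each class on its own mean,
    #    then divide the summed scatter by N - K (K = 2 classes here).
    #    np.cov on the stacked data would also include the between-class spread.
    centered = np.vstack([x1 - mu1, x2 - mu2])
    Sigma = centered.T @ centered / (n1 + n2 - 2)
    Sigma_inv = np.linalg.inv(Sigma)

    return pi1, pi2, mu1, mu2, Sigma, Sigma_inv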
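
The estimators above are stated for $K$ classes, while `LDA_sample_parameter` handles two. As a minimal sketch, the same formulas could be written for any number of classes as follows. The function name `lda_parameters_multiclass` and the label-array interface are assumptions made for illustration, not part of the original implementation.

```python
import numpy as np

def lda_parameters_multiclass(X, g):
    """Estimate priors, class means and the pooled covariance for K classes.

    X: (N, p) data matrix, g: length-N array of class labels (hypothetical interface).
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(g)
    classes = np.unique(g)
    N, p = X.shape
    K = len(classes)

    priors = np.empty(K)
    means = np.empty((K, p))
    scatter = np.zeros((p, p))
    for idx, k in enumerate(classes):
        Xk = X[g == k]
        priors[idx] = len(Xk) / N          # pi_k = N_k / N
        means[idx] = Xk.mean(axis=0)       # mu_k
        centered = Xk - means[idx]
        scatter += centered.T @ centered   # within-class scatter of class k

    Sigma = scatter / (N - K)              # pooled covariance, divisor N - K
    return classes, priors, means, Sigma

# Small usage example with three made-up classes
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=m, size=(50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
g_demo = np.repeat([0, 1, 2], 50)
classes, priors, means, Sigma = lda_parameters_multiclass(X_demo, g_demo)
```

Note that this pooled estimate generally differs from `np.cov` applied to the stacked data, because the latter also includes the spread between the class means.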

Next, we plug the estimated parameters into the decision boundary equation. Since the boundary is a linear equation, we compute the coefficient of $x$ and the intercept separately.

Decision Boundary

$$ \log\frac{\pi_1}{\pi_2} - \frac{1}{2}(\mu_1+\mu_2)^T\Sigma^{-1} (\mu_1-\mu_2) + x^T \Sigma^{-1} (\mu_1-\mu_2) = 0$$
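
When implementing the intercept, it can be convenient to expand the quadratic term so that each class mean is handled separately. Because $\Sigma^{-1}$ is symmetric, $\mu_1^T\Sigma^{-1}\mu_2 = \mu_2^T\Sigma^{-1}\mu_1$ and the cross terms cancel:

$$ -\frac{1}{2}(\mu_1+\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2) = -\frac{1}{2}\left(\mu_1^T\Sigma^{-1}\mu_1 - \mu_1^T\Sigma^{-1}\mu_2 + \mu_2^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2\right) = -\frac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \frac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 $$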

def LDA_classification(x1, x2):
    pi1, pi2, mu1, mu2, Sigma, Sigma_inv = LDA_sample_parameter(x1, x2)

    # Hyperplane: x^T A + B = 0

    ## Coefficient (A): Σ^{-1} (μ1 − μ2)
    coeff = np.dot(Sigma_inv, (mu1 - mu2))

    ## Intercept (B): log(π1/π2) − 1/2 (μ1+μ2)^T Σ^{-1} (μ1−μ2),
    ## written in expanded form: −1/2 μ1^T Σ^{-1} μ1 + 1/2 μ2^T Σ^{-1} μ2 + log(π1/π2)
    intercept = (
        -0.5 * np.dot(mu1.T, np.dot(Sigma_inv, mu1))
        + 0.5 * np.dot(mu2.T, np.dot(Sigma_inv, mu2))
        + np.log(pi1 / pi2)
    )

    return coeff, intercept
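
Once `coeff` and `intercept` are available, classifying a new point only needs the sign of $x^T A + B$: the left-hand side of the boundary equation is the log posterior ratio, so a positive value favours class 1. A minimal sketch, where `lda_predict` and the hand-picked parameter values are assumptions for illustration:

```python
import numpy as np

def lda_predict(X, coeff, intercept):
    """Return 1 where x^T A + B > 0 (class 1), otherwise 2 (class 2)."""
    scores = np.asarray(X, dtype=float) @ coeff + intercept
    return np.where(scores > 0, 1, 2)

# Hand-picked parameters: identity covariance, equal priors,
# mu1 = (1, 0), mu2 = (-1, 0)  ->  A = mu1 - mu2 = (2, 0), B = 0
coeff_demo = np.array([2.0, 0.0])
intercept_demo = 0.0

points = np.array([[3.0, 1.0], [-3.0, 1.0], [0.5, -2.0]])
labels = lda_predict(points, coeff_demo, intercept_demo)
print(labels)  # [1 2 1]
```

Points exactly on the boundary (score 0) are assigned to class 2 here, which is an arbitrary tie-breaking choice.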

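To plot the boundary, we need a suitable range of $x$-axis values and the corresponding $y$ values on the decision boundary, which we get by rearranging the equation as follows.

The decision boundary is given by:

$$x^T\Sigma^{-1}(\mu_1-\mu_2) + \text{intercept} = 0$$

Writing this out for two-dimensional data $x=[xx,yy]^T$ (since $x$ is already taken, the Cartesian axes are denoted $xx$ and $yy$):

$$\text{coeff}[0] \cdot xx + \text{coeff}[1] \cdot yy + \text{intercept} = 0$$

Solving for $yy$ so that the line can be drawn (this requires $\text{coeff}[1] \neq 0$):

$$yy = -\frac{\text{coeff}[0]}{\text{coeff}[1]} \cdot xx - \frac{\text{intercept}}{\text{coeff}[1]}$$

This is implemented as follows.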

import matplotlib.pyplot as plt

def LDA_visualization(x1, x2, coeff, intercept):
    # x-axis range spanning both classes
    x_min = min(x1[:, 0].min(), x2[:, 0].min())
    x_max = max(x1[:, 0].max(), x2[:, 0].max())
    x_range = np.linspace(x_min, x_max, 500)
    # Solve coeff[0] * x + coeff[1] * y + intercept = 0 for y
    y_range = -(coeff[0] * x_range + intercept) / coeff[1]

    plt.scatter(x1[:, 0], x1[:, 1], label="Class 1", alpha=0.7)
    plt.scatter(x2[:, 0], x2[:, 1], label="Class 2", alpha=0.7)
    plt.plot(x_range, y_range, color="red", label="Decision Boundary")
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.legend()
    plt.title("LDA Decision Boundary")
    plt.show()

Example

Running the simulation below produces a reasonable decision boundary.

np.random.seed(42)

# Data for class 1 & 2
cov = [[1, 0.5], [0.5, 1]]
x1 = np.random.multivariate_normal(mean=[-2, 4], cov=cov, size=130)
x2 = np.random.multivariate_normal(mean=[0, 0], cov=cov, size=240)

coeff, intercept = LDA_classification(x1, x2)

LDA_visualization(x1, x2, coeff, intercept)
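
One way to check the boundary numerically rather than visually is to count how many simulated points fall on the correct side. The sketch below recomputes the parameters inline so that it runs on its own. It uses the pooled covariance from the formulas above and a separate random generator (an assumption, so the draws differ from the ones plotted).

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: independent of the np.random.seed call above
cov = [[1, 0.5], [0.5, 1]]
a = rng.multivariate_normal(mean=[-2, 4], cov=cov, size=130)  # class 1
b = rng.multivariate_normal(mean=[0, 0], cov=cov, size=240)   # class 2

# Sample parameters: class means and pooled covariance (divisor N - K)
n_a, n_b = len(a), len(b)
mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
centered = np.vstack([a - mu_a, b - mu_b])
Sigma_inv = np.linalg.inv(centered.T @ centered / (n_a + n_b - 2))

# Boundary x^T A + B = 0
A = Sigma_inv @ (mu_a - mu_b)
B = np.log(n_a / n_b) - 0.5 * (mu_a + mu_b) @ Sigma_inv @ (mu_a - mu_b)

# Class 1 should score positive, class 2 negative
correct = np.sum(a @ A + B > 0) + np.sum(b @ A + B < 0)
accuracy = correct / (n_a + n_b)
print(f"training accuracy: {accuracy:.3f}")
```

With class means this far apart relative to the shared covariance, nearly all points should land on the correct side of the line.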

So what kind of data does `LDA` work well on? We will look at that in the next post!
