[Theory/Imple] Normalizing Flow Models

_leezoee_ 2024. 6. 4. 20:49

This post is based on hands-on work with the content and source code of the book <만들면서 배우는 생성 AI 2판> (the Korean edition of Generative Deep Learning, 2nd Edition).

https://github.com/rickiepark/Generative_Deep_Learning_2nd_Edition/

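Normalizing flow models

Normalizing flow models have something in common with both the autoregressive models and the variational autoencoders (VAEs) studied earlier:

like autoregressive models, they can model the data-generating distribution p(x) explicitly,

and like VAEs, they map the data to a simple distribution such as a Gaussian.

The main difference is that a normalizing flow places a constraint on the form of the mapping function.

(It must be invertible, and the inverse is used to generate new data points.)

Normalizing flow models are used in the following application areas.

1. Generative modeling: generating new data

2. Variational inference: estimating the distribution of latent variables

3. Density estimation: estimating the probability density of high-dimensional data

Neural networks are generally not invertible functions (for example, an activation such as ReLU is not one-to-one, and layers that change the dimensionality discard information),

but normalizing flow models propose a way to build invertible transformations while still exploiting the flexibility and performance of neural networks.

(Change of variables, the Jacobian determinant, and the change of variables equation)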

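Change of variables

: A function f that transforms data x into a latent variable z.

This function is invertible. That is, there exists a function g that maps every z back to its corresponding x.

(The two spaces can be mapped onto each other consistently.)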
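
To make the idea of a pair (f, g) concrete, here is a minimal sketch with a toy 1D affine map. The names MU and SIGMA and the map itself are assumptions chosen for illustration; they do not come from the book's code.

```python
import numpy as np

# Hypothetical parameters of a toy affine map (illustration only).
MU, SIGMA = 3.0, 2.0


def f(x):
    """Forward map: data x -> latent z."""
    return (x - MU) / SIGMA


def g(z):
    """Inverse map: latent z -> data x."""
    return SIGMA * z + MU


x = np.array([-1.0, 0.5, 3.0, 7.0])
z = f(x)
x_back = g(z)

# g undoes f exactly, so the round trip recovers the original points.
print(np.allclose(x, x_back))
```

Any f used in a flow has to admit such a g. The rest of the post is about building an f out of neural networks that still has this property.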

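Jacobian determinant

: When transforming a complex probability distribution into a simple one, or the reverse, the relationship between the probability density functions of the two variables has to be preserved. The Jacobian determinant measures how much the transformation stretches or shrinks volume locally, which is exactly the correction this requires.

A nonzero Jacobian determinant also indicates that the transformation is (locally) invertible, which is what lets the model convert accurately in both directions.

It is essential for the log-probability computation needed during training.

Change of variables equation

: If the transformed variable follows a simple Gaussian distribution that is easy to sample from, then in theory all we need is a suitable invertible function f(x) that maps data X to Z, and its inverse g(z) that maps a sampled Z back to a point X in the original domain. The equation with the Jacobian determinant applied then gives a formula for the data distribution p(x).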
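
Written out, the change of variables equation referred to here takes the standard form below. The symbols p_X and p_Z (densities in data space and latent space) are notation introduced for this note.

```latex
p_X(x) = p_Z\bigl(f(x)\bigr) \left| \det \frac{\partial f(x)}{\partial x} \right|

\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

The second line is the form that is used in practice, since training maximizes the log-likelihood.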

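* Two problems arise here

1) Computing the determinant of a high-dimensional matrix is very expensive: the time complexity is O(n^3).

2) It is not obvious how to compute the inverse of f(x).

=> We need a special neural network architecture for which the change of variables function f is invertible and its determinant is easy to compute.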

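RealNVP

The RealNVP (Real-valued Non-Volume Preserving) model uses invertible transformations to turn a simple distribution (e.g. a standard normal distribution) into a complex one, or the other way round. It is useful for image generation and density estimation problems.

Keras provides a tutorial on RealNVP.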

https://keras.io/examples/generative/real_nvp/

Looking at the source code from the book, it first uses sklearn's make_moons function to create a 2D dataset shaped like two crescents.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

import tensorflow as tf
from tensorflow.keras import (
    layers,
    models,
    regularizers,
    metrics,
    optimizers,
    callbacks,
)
import tensorflow_probability as tfp

# Define parameters
COUPLING_DIM = 256
COUPLING_LAYERS = 6
INPUT_DIM = 2
REGULARIZATION = 0.01
BATCH_SIZE = 256
EPOCHS = 300

# Load data
# A noisy, unnormalized two-moons dataset of 30,000 points
data = datasets.make_moons(30000, noise=0.05)[0].astype("float32")
norm = layers.Normalization()
norm.adapt(data)
# Normalize to mean 0 and standard deviation 1
normalized_data = norm(data)
plt.scatter(
    normalized_data.numpy()[:, 0], normalized_data.numpy()[:, 1], c="green"
)

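Once the data is ready, we build the coupling layer. Before the hands-on part, let's go over the theory behind coupling layers.

A coupling layer produces a scale factor (s) and a translation factor (t) for each element of its input.

(It produces two tensors of exactly the same size as the input.)

* Scale factor: stretches or shrinks a particular part of the data

* Translation factor: shifts (translates) a particular part of the data

In this exercise, the coupling layer stacks Dense layers to produce the scale output and stacks another set of Dense layers to produce the translation factor.

(For images, the coupling layer block uses Conv2D instead of Dense layers.)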
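
Before the Keras version, here is a minimal NumPy sketch of how one coupling layer uses s and t, and why it can be inverted without inverting the networks themselves. The fixed random weights W_s and W_t are hypothetical stand-ins for the Dense stacks, and the function names to_latent and to_data are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy weights standing in for the Dense stacks (illustration only).
W_s = rng.normal(size=(2, 2))
W_t = rng.normal(size=(2, 2))


def scale_and_translate(x_masked):
    """Toy s and t 'networks': both see only the masked (unchanged) part."""
    s = np.tanh(x_masked @ W_s)  # bounded scale, like the tanh output layer
    t = x_masked @ W_t           # unbounded translation, like the linear layer
    return s, t


def to_latent(x, mask):
    """Data -> latent: z = (x - t) * exp(-s) on the unmasked part."""
    x_masked = x * mask
    s, t = scale_and_translate(x_masked)
    s, t = s * (1 - mask), t * (1 - mask)
    z = (1 - mask) * (x - t) * np.exp(-s) + x_masked
    log_det = -np.sum(s, axis=1)  # log |det Jacobian| of this direction
    return z, log_det


def to_data(z, mask):
    """Latent -> data: x = z * exp(s) + t on the unmasked part."""
    z_masked = z * mask
    s, t = scale_and_translate(z_masked)
    s, t = s * (1 - mask), t * (1 - mask)
    return (1 - mask) * (z * np.exp(s) + t) + z_masked


mask = np.array([1.0, 0.0])  # keep the first element, update the second
x = rng.normal(size=(4, 2))
z, log_det = to_latent(x, mask)
x_back = to_data(z, mask)
print(np.allclose(x, x_back))
```

The masked part passes through unchanged, so both directions compute identical s and t. That is the reason the inverse only needs a subtraction and a division, however complicated the networks are.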

# Build the RealNVP network
def Coupling(input_dim, coupling_dim, reg):
    # The coupling block takes a 2-dimensional input (input_dim = 2)
    input_layer = layers.Input(shape=(input_dim,))
    # Stack Dense layers of size 256 for the scale factor (COUPLING_DIM = 256)
    s_layer_1 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(input_layer)
    s_layer_2 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(s_layer_1)
    s_layer_3 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(s_layer_2)
    s_layer_4 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(s_layer_3)
    # The final layer has size 2 and uses a tanh activation
    s_layer_5 = layers.Dense(
        input_dim, activation="tanh", kernel_regularizer=regularizers.l2(reg)
    )(s_layer_4)

    # Stack Dense layers of size 256 for the translation factor
    t_layer_1 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(input_layer)
    t_layer_2 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(t_layer_1)
    t_layer_3 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(t_layer_2)
    t_layer_4 = layers.Dense(
        coupling_dim, activation="relu", kernel_regularizer=regularizers.l2(reg)
    )(t_layer_3)
    # The final layer has size 2 and uses a linear activation
    t_layer_5 = layers.Dense(
        input_dim, activation="linear", kernel_regularizer=regularizers.l2(reg)
    )(t_layer_4)

    # The coupling layer is a Keras Model with two outputs (scale factor, translation factor)
    return models.Model(inputs=input_layer, outputs=[s_layer_5, t_layer_5])

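There is a problem: the first d elements of the input pass through without being updated by the model.

A trick that solves this is to stack coupling layers.

If we stack coupling layers while alternating the masking pattern, the part left unchanged by one layer is updated by the next.

Because this architecture forms a deep neural network, it has the added benefit of being able to learn more complex representations of the data.

By stacking coupling layers and flipping the mask each time, we get a neural network that transforms the entire input tensor while keeping the essential properties of a simple Jacobian determinant and invertibility.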
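
The alternating pattern can be inspected on its own. The sketch below builds the same mask array that the RealNVP class in this post uses (with COUPLING_LAYERS = 6) and prints which element each layer keeps and which it updates. The variable updated_any is a name added for this check.

```python
import numpy as np

COUPLING_LAYERS = 6

# Same construction as in the RealNVP class: [0, 1], [1, 0], [0, 1], ...
masks = np.array([[0, 1], [1, 0]] * (COUPLING_LAYERS // 2), dtype="float32")

for i, mask in enumerate(masks):
    kept = np.flatnonzero(mask)         # elements passed through unchanged
    updated = np.flatnonzero(1 - mask)  # elements scaled and translated
    print(f"layer {i}: keeps {kept}, updates {updated}")

# Every element is updated by at least one layer in the stack.
updated_any = (1 - masks).sum(axis=0) > 0
print(updated_any.all())
```

Note that this construction assumes an even number of coupling layers, because the mask list is built from pairs.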

Now we build the RealNVP model and train it.
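
For reference, here is my reading of what the class below computes in the training direction, written as equations. The symbols b (the binary mask), K (the number of coupling layers) and J_k (the Jacobian of the k-th coupling layer, evaluated at that layer's input) are notation added for this note, not taken from the book.

```latex
% One coupling layer with binary mask b, applied in the data -> latent direction
z = b \odot x + (1 - b) \odot \bigl(x - t(b \odot x)\bigr) \odot \exp\bigl(-s(b \odot x)\bigr)

\log \left| \det \frac{\partial z}{\partial x} \right| = -\sum_{j} (1 - b_j)\, s_j(b \odot x)

% Loss over a batch of N points, with f the composition of the K coupling layers
\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \left[ \log p_Z\bigl(f(x_n)\bigr) + \sum_{k=1}^{K} \log \left| \det J_k(x_n) \right| \right]
```

In the code, the first two lines correspond to the update of x and log_det_inv inside call with direction = -1, and the last line corresponds to log_loss.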

class RealNVP(models.Model):
    def __init__(
        self, input_dim, coupling_layers, coupling_dim, regularization
    ):
        super(RealNVP, self).__init__()
        self.coupling_layers = coupling_layers
        # The target distribution is a standard 2D Gaussian
        self.distribution = tfp.distributions.MultivariateNormalDiag(
            loc=[0.0, 0.0], scale_diag=[1.0, 1.0]
        )
        # Create the alternating mask pattern
        self.masks = np.array(
            [[0, 1], [1, 0]] * (coupling_layers // 2), dtype="float32"
        )
        self.loss_tracker = metrics.Mean(name="loss")
        # Define the RealNVP network as a list of Coupling layers
        self.layers_list = [
            Coupling(input_dim, coupling_dim, regularization)
            for i in range(coupling_layers)
        ]

    @property
    def metrics(self):
        return [self.loss_tracker]

    def call(self, x, training=True):
        log_det_inv = 0
        direction = 1
        if training:
            direction = -1
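        # Calling the model iterates over the Coupling layers
        # training=True: pass through the layers in the forward direction (data -> latent space)
        # training=False: pass through the layers in the inverse direction (latent space -> data)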
        for i in range(self.coupling_layers)[::direction]:
            x_masked = x * self.masks[i]
            reversed_mask = 1 - self.masks[i]
            s, t = self.layers_list[i](x_masked)
            s *= reversed_mask
            t *= reversed_mask
            gate = (direction - 1) / 2
            # Implement the forward or inverse equation depending on direction
            x = (
                reversed_mask
                * (x * tf.exp(direction * s) + direction * t * tf.exp(gate * s))
                + x_masked
            )
            # The log of the Jacobian determinant needed for the loss is simply the negative sum of the scale factors (gate = -1 when training)
            log_det_inv += gate * tf.reduce_sum(s, axis=1)
        return x, log_det_inv

    def log_loss(self, x):
        y, logdet = self(x)
        # The loss is the mean negative log-likelihood of the transformed data, determined by the target Gaussian distribution and the log of the Jacobian determinant
        log_likelihood = self.distribution.log_prob(y) + logdet
        return -tf.reduce_mean(log_likelihood)

    def train_step(self, data):
        with tf.GradientTape() as tape:
            loss = self.log_loss(data)
        g = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(g, self.trainable_variables))
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    def test_step(self, data):
        loss = self.log_loss(data)
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}


model = RealNVP(
    input_dim=INPUT_DIM,
    coupling_layers=COUPLING_LAYERS,
    coupling_dim=COUPLING_DIM,
    regularization=REGULARIZATION,
)

# Compile and train the model
model.compile(optimizer=optimizers.Adam(learning_rate=0.0001))

tensorboard_callback = callbacks.TensorBoard(log_dir="./logs")


class ImageGenerator(callbacks.Callback):
    def __init__(self, num_samples):
        self.num_samples = num_samples

    def generate(self):
        # From data to latent space.
        z, _ = model(normalized_data)

        # From latent space to data.
        samples = model.distribution.sample(self.num_samples)
        x, _ = model.predict(samples, verbose=0)

        return x, z, samples

    def display(self, x, z, samples, save_to=None):
        f, axes = plt.subplots(2, 2)
        f.set_size_inches(8, 5)

        axes[0, 0].scatter(
            normalized_data[:, 0], normalized_data[:, 1], color="r", s=1
        )
        axes[0, 0].set(title="Data space X", xlabel="x_1", ylabel="x_2")
        axes[0, 0].set_xlim([-2, 2])
        axes[0, 0].set_ylim([-2, 2])
        axes[0, 1].scatter(z[:, 0], z[:, 1], color="r", s=1)
        axes[0, 1].set(title="f(X)", xlabel="z_1", ylabel="z_2")
        axes[0, 1].set_xlim([-2, 2])
        axes[0, 1].set_ylim([-2, 2])
        axes[1, 0].scatter(samples[:, 0], samples[:, 1], color="g", s=1)
        axes[1, 0].set(title="Latent space Z", xlabel="z_1", ylabel="z_2")
        axes[1, 0].set_xlim([-2, 2])
        axes[1, 0].set_ylim([-2, 2])
        axes[1, 1].scatter(x[:, 0], x[:, 1], color="g", s=1)
        axes[1, 1].set(title="g(Z)", xlabel="x_1", ylabel="x_2")
        axes[1, 1].set_xlim([-2, 2])
        axes[1, 1].set_ylim([-2, 2])

        plt.subplots_adjust(wspace=0.3, hspace=0.6)
        if save_to:
            plt.savefig(save_to)
            print(f"\nSaved to {save_to}")

        plt.show()

    def on_epoch_end(self, epoch, logs=None):
        if epoch % 10 == 0:
            x, z, samples = self.generate()
            self.display(
                x,
                z,
                samples,
                save_to="./output/generated_img_%03d.png" % (epoch),
            )


img_generator_callback = ImageGenerator(num_samples=3000)

model.fit(
    normalized_data,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[tensorboard_callback, img_generator_callback],
)

Top: forward direction, bottom: inverse direction / left: model input before training, right: output

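Once the model is trained, we can use it to transform the training set into the latent space (using the forward function f).

We can also transform points sampled from the latent space into points that look as if they were sampled from the original data distribution (using the inverse function g).

After training, the forward pass transforms the points in the training set into a distribution that resembles a Gaussian,

and the inverse pass maps points sampled from the Gaussian back into a distribution that resembles the original data.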

Top: forward direction, bottom: inverse direction / left: model input before training, right: output

Finally, we generate the image.

x, z, samples = img_generator_callback.generate()
img_generator_callback.display(x, z, samples)

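Applied models

Here are two probabilistic modeling techniques that use the normalizing flow approach.

Glow model

: An extension of RealNVP, used to learn and sample from complex data distributions.

It works in a similar way, with one key difference:

it replaces the reverse-masking setup with invertible 1x1 convolutions => the model can learn whatever permutation (mixing) of the channels it prefers.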
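
To see why a 1x1 convolution can be inverted and why its log-determinant is cheap, here is a small NumPy sketch. A 1x1 convolution applies the same C x C matrix to the channel vector at every pixel. The shapes, the scaled-orthogonal weight matrix, and all variable names below are assumptions for illustration; this is not code from the book or from the Glow implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, channels = 4, 4, 3

# Hypothetical invertible channel-mixing matrix: an orthogonal matrix scaled by 1.5.
q, _ = np.linalg.qr(rng.normal(size=(channels, channels)))
w_mat = 1.5 * q

x = rng.normal(size=(height, width, channels))

# 1x1 convolution: multiply the channel vector of every pixel by the same matrix.
y = x @ w_mat

# Inverse: multiply by the inverse matrix.
x_back = y @ np.linalg.inv(w_mat)

# The same matrix acts on height * width pixels, so the log-determinant
# of the whole layer is height * width * log|det W|.
log_det = height * width * np.linalg.slogdet(w_mat)[1]

print(np.allclose(x, x_back))
print(log_det)
```

Only a C x C determinant is needed, regardless of the image size, which keeps the Jacobian term cheap even for large inputs.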

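FFJORD model

Learns the probability distribution of the data using continuous normalizing flows.

It carries out the data transformation with an ordinary differential equation (ODE) rather than a discrete stack of layers.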
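
As a rough reference, a continuous normalizing flow is usually written with the pair of equations below: the first moves the point z(t) through time, the second tracks how its log-density changes along the way. The symbol f_theta (the network defining the dynamics) is notation added here, and this is the general continuous-flow formulation as I understand it, not something taken from the book's text.

```latex
\frac{d z(t)}{d t} = f_\theta\bigl(z(t), t\bigr), \qquad
\frac{d \log p\bigl(z(t)\bigr)}{d t} = -\operatorname{Tr}\left( \frac{\partial f_\theta}{\partial z(t)} \right)
```

The trace replaces the log-determinant of the discrete case, so no special triangular structure is required of the network.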

Both models are very useful for efficiently learning the probability distribution of high-dimensional data and generating new data.

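A normalizing flow model is an invertible function defined by a neural network,

and it can model the data density directly by using the change of variables.

The RealNVP model restricts the form of the neural network so that it satisfies both invertibility and ease of computation.

To address the problems described above, it stacks coupling layers that produce scale and translation factors at each step.

The coupling layers must mask the data as it passes through so that the Jacobian matrix becomes lower triangular.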
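
The reason a lower triangular Jacobian helps is that its determinant is just the product of its diagonal entries. The NumPy check below compares the general log-determinant routine with the diagonal shortcut on a random lower triangular matrix. The matrix is synthetic and only serves as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Synthetic lower triangular "Jacobian" with a positive diagonal (illustration only).
jac = np.tril(rng.normal(size=(n, n)))
np.fill_diagonal(jac, np.exp(rng.normal(size=n)))

# General-purpose log|det|, which costs O(n^3) for an arbitrary matrix.
log_det_full = np.linalg.slogdet(jac)[1]

# Triangular shortcut: sum of the logs of the diagonal entries, O(n).
log_det_diag = np.sum(np.log(np.diag(jac)))

print(np.isclose(log_det_full, log_det_diag))
```

In a coupling layer the diagonal entries are 1 for the masked elements and exp(s) (or exp(-s) in the other direction) for the rest, which is why the log-determinant reduces to a sum of scale factors.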

The target of the forward transformation is a standard Gaussian distribution, which is easy to sample from.

Next up: energy-based models!
