<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
    <channel>
      <title>ChanJoon — #RL</title>
      <link>https://chanjoon.github.io/tags/RL</link>
      <description>Posts tagged "RL" on ChanJoon</description>
      <generator>Quartz -- quartz.jzhao.xyz</generator>
      <item>
    <title>[CS285] 6. Actor-Critic Algorithms</title>
    <link>https://chanjoon.github.io/posts/2024-03-18-CS285_Week6</link>
    <guid>https://chanjoon.github.io/posts/2024-03-18-CS285_Week6</guid>
    <description><![CDATA[ Lecture 6: Actor-Critic Algorithms Part 1 State & state-action value functions Value function fitting Which of the three value functions above should we fit? \begin{align} Q^\pi(s_t,a_t)&=\sum^T_{t'=t}E_{\pi_\theta}[r(s_{t'},a_{t'})|s_t,a_t] \\ &=r(s_t,a_t)+\sum^T_{t'=t+1}E_{\pi_\theta}... ]]></description>
    <pubDate>Mon, 18 Mar 2024 00:00:00 GMT</pubDate>
  </item><item>
    <title>[CS285] 5. Policy Gradients</title>
    <link>https://chanjoon.github.io/posts/2024-02-01-CS285_Week5</link>
    <guid>https://chanjoon.github.io/posts/2024-02-01-CS285_Week5</guid>
    <description><![CDATA[ Lecture 5: Policy Gradients Of the several algorithms covered in the previous lecture, we look at policy gradients first. Part 1. ]]></description>
    <pubDate>Thu, 01 Feb 2024 00:00:00 GMT</pubDate>
  </item><item>
    <title>[CS285] HW #1</title>
    <link>https://chanjoon.github.io/posts/2024-01-28-CS285_HW1</link>
    <guid>https://chanjoon.github.io/posts/2024-01-28-CS285_HW1</guid>
    <description><![CDATA[ HW#1 Imitation Learning HW #1 Behavioral Cloning Before applying DAgger, you only need to write the code for sampling from the replay buffer and the MLP policy network. ]]></description>
    <pubDate>Sun, 28 Jan 2024 00:00:00 GMT</pubDate>
  </item><item>
    <title>[CS285] 4. Introduction to Reinforcement Learning</title>
    <link>https://chanjoon.github.io/posts/2024-01-24-CS285_Week4</link>
    <guid>https://chanjoon.github.io/posts/2024-01-24-CS285_Week4</guid>
    <description><![CDATA[ Lecture 4: Introduction to Reinforcement Learning rail.eecs.berkeley.edu/deeprlcourse/deeprlcourse/static/slides/lec-4.pdf Part 1 Reward functions The reward function, written r(s,a), is the criterion for judging which states and actions are good. ]]></description>
    <pubDate>Wed, 24 Jan 2024 00:00:00 GMT</pubDate>
  </item><item>
    <title>[CS285] 2. Imitation Learning Part 3 ~ 3. PyTorch</title>
    <link>https://chanjoon.github.io/posts/2024-01-17-CS285_Week3</link>
    <guid>https://chanjoon.github.io/posts/2024-01-17-CS285_Week3</guid>
    <description><![CDATA[ Lecture 2. Imitation Learning Part 3 ~ Lecture 3. PyTorch Lecture 2. ]]></description>
    <pubDate>Wed, 17 Jan 2024 00:00:00 GMT</pubDate>
  </item><item>
    <title>[CS285] 1. Introduction - 2. Imitation Learning Part 2</title>
    <link>https://chanjoon.github.io/posts/2024-01-13-CS285_Week2_2</link>
    <guid>https://chanjoon.github.io/posts/2024-01-13-CS285_Week2_2</guid>
    <description><![CDATA[ Lecture 1. Introduction - Lecture 2. Imitation Learning Part 2 Lecture 1. ]]></description>
    <pubDate>Sat, 13 Jan 2024 00:00:00 GMT</pubDate>
  </item><item>
    <title>9. Continuous control with deep reinforcement learning(DDPG)</title>
    <link>https://chanjoon.github.io/posts/2023-11-29-DDPG</link>
    <guid>https://chanjoon.github.io/posts/2023-11-29-DDPG</guid>
    <description><![CDATA[ 9. Continuous control with deep reinforcement learning(DDPG) Lillicrap, Timothy P., et al. ]]></description>
    <pubDate>Wed, 29 Nov 2023 00:00:00 GMT</pubDate>
  </item><item>
    <title>7. Proximal Policy Optimization Algorithms</title>
    <link>https://chanjoon.github.io/posts/2023-11-22-PPO</link>
    <guid>https://chanjoon.github.io/posts/2023-11-22-PPO</guid>
    <description><![CDATA[ 7. ]]></description>
    <pubDate>Wed, 22 Nov 2023 00:00:00 GMT</pubDate>
  </item><item>
    <title>4. Policy Gradient (2)</title>
    <link>https://chanjoon.github.io/posts/2023-11-21-PolicyGradient2</link>
    <guid>https://chanjoon.github.io/posts/2023-11-21-PolicyGradient2</guid>
    <description><![CDATA[ 4. Policy Gradient (2) I decided to finish studying the parts of Policy Gradient I had not yet covered, using the lecture notes and the textbook “수학으로 풀어보는 강화학습”. ]]></description>
    <pubDate>Tue, 21 Nov 2023 00:00:00 GMT</pubDate>
  </item><item>
    <title>3. Policy Gradient</title>
    <link>https://chanjoon.github.io/posts/2023-11-20-PolicyGradient</link>
    <guid>https://chanjoon.github.io/posts/2023-11-20-PolicyGradient</guid>
    <description><![CDATA[ 3. Policy Gradient Q. Why do we use the action-value function over the state-value function in model-free reinforcement learning? Q. ]]></description>
    <pubDate>Mon, 20 Nov 2023 00:00:00 GMT</pubDate>
  </item><item>
    <title>2. Introduction to RL</title>
    <link>https://chanjoon.github.io/posts/2023-10-15-IntroToRL</link>
    <guid>https://chanjoon.github.io/posts/2023-10-15-IntroToRL</guid>
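    <description><![CDATA[ 2. Introduction to RL Skim the content in the Roadmap, then study from separate lecture notes. 02 Reinforcement Learning Concepts Overview of Reinforcement Learning Reinforcement learning is a type of machine learning in which an agent learns by interacting with an environment. ]]></description>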
    <pubDate>Sun, 15 Oct 2023 00:00:00 GMT</pubDate>
  </item><item>
    <title>1. Basic Mathematics of RL</title>
    <link>https://chanjoon.github.io/posts/2023-10-02-RLMath</link>
    <guid>https://chanjoon.github.io/posts/2023-10-02-RLMath</guid>
    <description><![CDATA[ 1. ]]></description>
    <pubDate>Mon, 02 Oct 2023 00:00:00 GMT</pubDate>
  </item>
    </channel>
  </rss>