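Quadruped Robot Research Project Work Summary: Month 7, Week 1

Weekly Summary

The main work completed this week:

Literature survey
Read the PyBullet QuickStart Guide (a reference manual; look things up when needed)
Read the official Stable-Baselines3 documentation
Read the Unitree A1 Software Manual
Worked out the code framework and logic of MICRO_QUADRUPED_ARCHIVE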

2023.07.02

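Continued writing Notes on Reinforcement Learning 2

Finished the sections on Markov chains, Markov reward processes, and Markov decision processes

Literature survey (50-100 papers)

Years

2018 to present

Journals

Tier 1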

Nature Machine Intelligence (NMI)
Science Robotics

Tier 2

IEEE Transactions on Robotics (TRO)

Tier 3

IEEE Robotics and Automation Letters (RAL)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

Conferences

Tier 1

RSS (Robotics: Science and Systems)

Tier 2

ICRA (IEEE International Conference on Robotics and Automation)
IROS (IEEE/RSJ International Conference on Intelligent Robots and Systems)

Search databases

Google Scholar

IEEE Xplore

ResearchGate

Keywords

$\star$ Learning $\rightarrow$ Reinforcement Learning

$\star$ Quadruped

$\star$ $\star$ locomotion control

$\star$ $\star$ Imitation

$\star$ $\star$ Transfer

Task

Organize the PDFs, naming each file as [journal, year, research group (optional)] paper title

Use Zotero for reference management; see the official Zotero Documentation for details

Robotics

Design (mechanical)

Perception

Path planning

Control via RL

gait generator / planner

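Research groups

ETH Zurich

MIT

University of California, Berkeley (UCB)

Robot dogs

ANYmal (ETH)

Cheetah 3 (MIT)

Unitree A1 $\rightarrow$ Go1 / Laikago

Environments

PyBullet (simulation scene with blue-and-white floor tiles) + Stable-Baselines
Omniverse (simulation scene with gray-and-white floor tiles) + PPO, Isaac Gym

Journal impact factor lookup

LetPub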

2023.07.03

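Zotero + Nutstore (坚果云, cloud sync)

For details, see the guide zotero+坚果云，免费跨平台文献管理最佳实践指南 (Zotero + Nutstore: best practices for free, cross-platform reference management)

[RAL 2022] CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion

Paper reading session led by a senior labmate

State space: 18 DOF (degrees of freedom)

$x, y, z$ (position)
$\varphi$ (pitch), $\theta$ (yaw), $\psi$ (roll)
12 joints

Gait: references on dog gaits, quadruped robot gaits, and multiple gait types (bilibili)

Phase
Duty factor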
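
Phase and duty factor together decide which legs touch the ground at a given time. The sketch below is only an illustration and is not taken from the CPG-RL paper: the names TROT_OFFSETS, leg_phase, in_stance, and contact_pattern are hypothetical, and the trot offsets, the 0.5 s period, and the 0.5 duty factor are assumed values.

```python
# Hypothetical phase offsets (fraction of one gait cycle) for a trot:
# diagonal leg pairs move together. These values are illustrative assumptions.
TROT_OFFSETS = {"FL": 0.0, "RR": 0.0, "FR": 0.5, "RL": 0.5}


def leg_phase(t, period, offset):
    """Return the leg's phase in [0, 1) at time t for a gait cycle of length period."""
    return (t / period + offset) % 1.0


def in_stance(t, period, offset, duty_factor):
    """A leg is in stance while its phase lies in the first duty_factor of the cycle."""
    return leg_phase(t, period, offset) < duty_factor


def contact_pattern(t, period=0.5, duty_factor=0.5, offsets=TROT_OFFSETS):
    """Map each leg name to True (stance) or False (swing) at time t."""
    return {leg: in_stance(t, period, off, duty_factor) for leg, off in offsets.items()}
```

Raising duty_factor above 0.5 lengthens stance so that the diagonal pairs overlap on the ground; changing the offsets changes the gait type.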

Dynamical system

MPC: $x_d \leftarrow \text{controller} - x \leftarrow \text{observer} - \hat{x}$

2023.07.04

Deep RL (deep reinforcement learning)

Policy Gradient Basics

Environment setup: Ubuntu 22.04 LTS + Python 3.8 + PyTorch + CUDA 11.7

Before setting up, make sure the GPU driver is installed correctly. In the (base) shell, run

nvidia-smi

nvcc -V

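Creating / activating / removing virtual environments

# Create an environment
conda create -n <env_name> python=3.8

# Activate the environment
conda activate <env_name>
source activate <env_name>  # older syntax on Linux

# Clone an environment
conda create -n <env_new> --clone <env_old>

# Remove an environment
conda remove -n <env_name> --all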

PyTorch

Reference sites:

conda install -c nvidia cudatoolkit=11.7

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

The following commands check whether PyTorch can use the GPU

import torch
print(torch.cuda.is_available())  # True if CUDA is usable
print(torch.cuda.get_device_name(0))  # name of GPU 0

TensorBoard

pip install tensorboard

Stable-Baselines3

pip install stable-baselines3
pip install stable-baselines3[extra]

The second command also installs optional dependencies such as TensorBoard, OpenCV, and ale-py (for training on Atari games)

Task

Doxygen Wiki
Run the Stable-Baselines3 examples: Pendulum, Breakout
Tensorboard
Micro Quadruped

Reinforcement Learning Resources — Stable Baselines3 2.1.0a0 documentation

2023.07.05

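Fixing the driver problem

Error: no devices found

(base) ...$ nvidia-smi
No devices were found

The following command shows that the GPU is still present

(base) ...$ lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080 Ti] (rev a1)

Running the following command and installing the recommended driver leads to the no-devices-found problem. According to several forum posts, avoiding the driver variants marked open seems to prevent it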

(base) ...$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : ...
vendor   : NVIDIA Corporation
model    : GA102 [GeForce RTX 3080 Ti]
driver   : nvidia-driver-470-server - distro non-free 
driver   : nvidia-driver-525-open - distro non-free
driver   : nvidia-driver-535-open - distro non-free recommended
driver   : nvidia-driver-535 - distro non-free 
driver   : nvidia-driver-525-server - distro non-free 
driver   : nvidia-driver-470 - distro non-free 
driver   : nvidia-driver-525 - distro non-free 
driver   : xserver-xorg-video-nouveau - distro free builtin

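Install driver 535 and reboot

(base) ...$ sudo apt-get install nvidia-driver-535
(base) ...$ reboot  # restart the system

After this, nvidia-smi prints its status table normally

Sunlogin (向日葵) remote control

Use Sunlogin to remotely control the lab computer

PyBullet

Learn PyBullet from the QuickStart Guide (a VPN may be needed to access it)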

Introduction

pybullet.connect, pybullet.disconnect, loadURDF, loadSDF, setGravity, stepSimulation…

getBasePositionAndOrientation(objectUniqueId), resetBasePositionAndOrientation()…

getQuaternionFromEuler, getEulerFromQuaternion…

2023.07.06

PyBullet

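Learn PyBullet from the QuickStart Guide (a VPN may be needed to access it)

Read the Controlling a Robot chapter

The official guide is essentially a reference manual, so reading it cover to cover has little value

TensorBoard

See Get started with TensorBoard | TensorFlow (a VPN may be needed to access it)

Installing TensorFlow

To install TensorFlow 2, activate the SB3 virtual environment and run pip install tensorflow in the shell; this reported an error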

(SB3) ...$ pip install tensorflow
...
ERROR: pip's dependency...
triton 2.0.0 requires cmake, which is not installed.
triton 2.0.0 requires lit, which is not installed.

Run the following commands in order

(SB3) ...$ pip install cmake
(SB3) ...$ pip install lit

Then install tensorflow with pip again; this time it succeeds

Using TensorBoard

from stable_baselines3 import PPO

...
# Model (vec_env is the environment set up in the elided part above; here it is the Atari game Breakout)
model = PPO('CnnPolicy', vec_env, verbose=1, tensorboard_log='./ppo_breakout_tensorboard/')
# Training: total_timesteps is an integer; the later runs continue the same timestep counter
model.learn(total_timesteps=100_000, tb_log_name='first_run')
model.learn(total_timesteps=100_000, tb_log_name='second_run', reset_num_timesteps=False)
model.learn(total_timesteps=100_000, tb_log_name='third_run', reset_num_timesteps=False)

In a terminal, run

tensorboard --logdir ./ppo_breakout_tensorboard/

# Load several existing logging folders at once
tensorboard --logdir_spec ./ppo_breakout_tensorboard/,./a2c_breakout_tensorboard/

Open the localhost link to view the results; press Ctrl+C to stop

MICRO_Quadruped_ARCHIVE_2023_05_17

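MICRO quadruped robot training and testing platform

Installation and dependencies

OS: Ubuntu 22.04 LTS

Architecture: AMD64

Python version: 3.8

Environment: Anaconda

Packages: numpy, matplotlib, pybullet, pytorch, stable-baselines3

Integrated development environment (IDE): PyCharm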

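Project structure

Training_Results: training process and results, including the TensorBoard data recorded during training and the final pkl training files
Testing_Results: simulation test results, including raw log data, plotted curves, and recorded videos
Optimization_Data: stores the samples from Bayesian optimization, which the algorithm uses to compute the loss function and optimize parameters
Robots: model files for each robot (urdf files and mesh files; the mesh files describe more complex geometry)
- A1
- Go1
- Aliengo
- Laikago
Scripts: main source code
- utils: all class files needed by the project
- CPGs: several kinds of CPG (central pattern generator) used for quadruped gait planning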
- examples
- optimization
- reinforcement_learning

A1 Software Manual v2.0

See A1SoftwareGuidev2.0-en.pdf (a VPN may be needed to access it)

Getting Started

Robot system structure
Setting up the network
Units
Coordinate, kinematics and dynamics
Foot force sensor

API

High-level control mode
- the robot may run in normal mode or motion mode
- levelFlag: the flag of control level, high level:0x00, low level:0xff
- initializing the udp target ip and port
  - normal mode ip: 192.168.123.10, port: 8007
  - motion mode ip: 192.168.123.161, port: 8081
Low-level control mode
- the motor has three modes: torque mode, speed mode, and position mode
Protection mode

Control tutorial

Forbidden flag value, PosStopF = 2.146E+9f, VelStopF = 16000.0f

Torque mode

motorCmd[FL_1].position = PosStopF;
motorCmd[FL_1].positionStiffness = 0;
motorCmd[FL_1].velocity = VelStopF;
motorCmd[FL_1].velocityStiffness = 0;
motorCmd[FL_1].torque = T;

Speed mode

motorCmd[FL_1].position = PosStopF;
motorCmd[FL_1].positionStiffness = 0;
motorCmd[FL_1].velocity = V;
motorCmd[FL_1].velocityStiffness = 4; // just for reference
motorCmd[FL_1].torque = 0;

Position mode

motorCmd[FL_1].position = P;
motorCmd[FL_1].positionStiffness = 5; // just for reference
motorCmd[FL_1].velocity = 0;
motorCmd[FL_1].velocityStiffness = 1; // just for reference
motorCmd[FL_1].torque = 0;
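
The three modes can be read as special cases of one joint-level command in which a zero gain switches a term off. The Python sketch below illustrates that reading. It assumes the common law of feed-forward torque plus position and velocity feedback, which these notes do not quote from the manual, and MotorCmd and output_torque are hypothetical names, not part of the Unitree SDK.

```python
from dataclasses import dataclass

POS_STOP_F = 2.146e9   # forbidden-position flag value quoted in the notes
VEL_STOP_F = 16000.0   # forbidden-velocity flag value quoted in the notes


@dataclass
class MotorCmd:
    """Hypothetical mirror of the five command fields used in the tutorial."""
    position: float = POS_STOP_F
    position_stiffness: float = 0.0
    velocity: float = VEL_STOP_F
    velocity_stiffness: float = 0.0
    torque: float = 0.0


def output_torque(cmd, q, dq):
    """Assumed law: tau = tau_ff + kp * (q_des - q) + kd * (dq_des - dq).

    A term whose gain is zero is skipped, so the flag value in that field has no effect.
    """
    tau = cmd.torque
    if cmd.position_stiffness != 0.0:
        tau += cmd.position_stiffness * (cmd.position - q)
    if cmd.velocity_stiffness != 0.0:
        tau += cmd.velocity_stiffness * (cmd.velocity - dq)
    return tau


# The three modes from the tutorial, with arbitrary example targets
torque_mode = MotorCmd(torque=1.5)
speed_mode = MotorCmd(velocity=2.0, velocity_stiffness=4.0)
position_mode = MotorCmd(position=0.8, position_stiffness=5.0,
                         velocity=0.0, velocity_stiffness=1.0)
```

Under this assumption, torque mode passes the commanded torque through unchanged, speed mode acts as velocity feedback only, and position mode is a PD controller around the target angle.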

A1_ros

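I did not really understand this part

Liao Xuefeng's Python tutorial

Encoding

ASCII, Unicode, and UTF-8 encodings

Formatted string output

r = 2.5
s = 3.14 ** 2
print('This is the output of %s and %.2f' % (r, s))  # %.2f keeps two decimal places
print(f'This is the output of {r} and {s:.2f}')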

Euler angles - Wikipedia (Chinese edition)

Pitch, yaw, and roll are rotations about the y, z, and x axes respectively
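
To make the axis assignment concrete, the sketch below builds a rotation matrix from roll, pitch, and yaw. It assumes the Z-Y-X composition R = Rz(yaw) Ry(pitch) Rx(roll), which is one common convention and is not stated in these notes; rotation_matrix and rotate are hypothetical helper names.

```python
import math


def rotation_matrix(roll, pitch, yaw):
    """Return R = Rz(yaw) * Ry(pitch) * Rx(roll) as a 3x3 nested list (angles in radians)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def rotate(matrix, vector):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

With this convention, a yaw of 90 degrees turns the x axis onto the y axis, and a roll of 90 degrees turns the y axis onto the z axis.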

Doxygen - Wikipedia
无处不在的小土: URDF and Robot Models (Part 1)
<?xml version="1.0"?>
<robot name="myfirst">
    <material name="blue"><color rgba="0 0 0.8 1"/></material>
    <material name="white"><color rgba="1 1 1 1"/></material>

    <link name="base_link">
        <visual>
            <geometry>
                <cylinder length="0.6" radius="0.2"/>
            </geometry>
            <material name="blue"/>
        </visual>
    </link>

    <link name="right_leg">
        <visual>
            <geometry>
                <box size="0.6 0.1 0.2"/>
            </geometry>
            <material name="white"/>
        </visual>
    </link>

    <joint name="base_to_right_leg" type="fixed">
        <parent link="base_link"/>
        <child link="right_leg"/>
        <origin xyz="0 -1.22 0.25" rpy="0 0 0"/>
    </joint>
</robot>
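A URDF file is essentially just a simple XML document. It uses link elements for the parts of the robot and joint elements for the connections between them. The two links connected by a joint have a parent-child relationship, so the whole robot can be described as a tree. The relative position and orientation between links is described by adding an origin tag inside a link or joint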
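
Because a URDF file is plain XML, the link tree can be recovered with the Python standard library alone. The sketch below parses a trimmed copy of the example above (visual elements removed); parse_urdf and root_link are hypothetical helper names, and real projects would more likely rely on a dedicated URDF loader.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the example robot: two links and one fixed joint
URDF_TEXT = """<robot name="myfirst">
    <link name="base_link"/>
    <link name="right_leg"/>
    <joint name="base_to_right_leg" type="fixed">
        <parent link="base_link"/>
        <child link="right_leg"/>
        <origin xyz="0 -1.22 0.25" rpy="0 0 0"/>
    </joint>
</robot>
"""


def parse_urdf(text):
    """Return (link names, parent -> children map, child -> xyz offset of its joint)."""
    root = ET.fromstring(text)
    links = [link.get("name") for link in root.findall("link")]
    children = {}
    origins = {}
    for joint in root.findall("joint"):
        parent = joint.find("parent").get("link")
        child = joint.find("child").get("link")
        children.setdefault(parent, []).append(child)
        origin = joint.find("origin")
        xyz = origin.get("xyz", "0 0 0") if origin is not None else "0 0 0"
        origins[child] = [float(value) for value in xyz.split()]
    return links, children, origins


def root_link(links, children):
    """The root of the tree is the only link that never appears as a child."""
    child_names = {name for kids in children.values() for name in kids}
    return [name for name in links if name not in child_names][0]


links, children, origins = parse_urdf(URDF_TEXT)
```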
Building a Visual Robot Model with URDF from Scratch - ROS Wiki

PyCharm - Wikipedia

Integrated Development Environment (IDE). It provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems, and supports web development with Django

Internet protocol suite - Wikipedia

User Datagram Protocol - Wikipedia

UDP is a simple message-oriented transport layer protocol

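A simple datagram-oriented communication protocol located at the transport layer of the OSI model.

In the TCP/IP model, UDP provides a simple interface between the network layer below and the application layer above. UDP offers only unreliable delivery: once it has handed the application's data to the network layer, it keeps no copy of it (which is why UDP is sometimes called an unreliable datagram protocol)

Some applications have little need for reliability mechanisms, and may even lose performance when such mechanisms are added, so they use UDP despite its lack of reliability. Streaming media, real-time multiplayer games, and voice over IP (VoIP) commonly use UDP; in these applications packet loss is usually not a serious problem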
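
The connectionless nature of UDP is easy to see with Python's socket module. The sketch below sends one datagram over the loopback interface and reads it back; udp_round_trip is a hypothetical helper, and the loopback address and the 2-second timeout are assumptions chosen for the demonstration, unrelated to the robot's actual addresses.

```python
import socket


def udp_round_trip(payload, host="127.0.0.1"):
    """Send one datagram to a local receiver and return the bytes that arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((host, 0))        # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)        # UDP gives no delivery guarantee, so never wait forever
        port = receiver.getsockname()[1]
        sender.sendto(payload, (host, port))   # no connection setup, no handshake
        data, _address = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()
```

If the datagram were lost, recvfrom would raise socket.timeout instead of retrying, which is exactly the reliability the application has to add itself when it needs it.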

Software development kit (SDK) - Wikipedia

A software development kit (SDK) is a collection of software development tools in one installable package. They facilitate the creation of applications by having a compiler, debugger and sometimes a software framework. They are normally specific to a hardware platform and operating system combination.

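A collection of development tools that software engineers use to build applications for a particular software package, software framework, hardware platform, operating system, and so on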

Examples: iOS SDK, Microsoft Windows SDK

Application Programming Interface (API) - Wikipedia

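An application programming interface is the code that an operating system or library exposes for applications to call. Its main purpose is to let application developers call a set of routines without having to consider the underlying source code or understand the details of its internal workings. The API itself is abstract: it only defines an interface and is not concerned with how the application carries out the operation in practice

An API is often part of a software development kit (SDK)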

A1SoftwareGuidev2.0-en.pdf

IP address - Wikipedia

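Left- and right-handed coordinate systems and the positive rotation direction - StubbornHuang Blog

A simple way to determine the positive rotation direction in a 3D coordinate system - Wonderffee’s Blog

无处不在的小土’s Blog | Index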

2023.07.07

PyBullet

Common robot APIs

loadURDF(filename)
getNumJoints(uid)
getJointInfo(uid, jointIndex)
setJointMotorControl2 / setJointMotorControlArray
getJointState(uid, jointIndex)
enableJointForceTorqueSensor
getLinkState(uid, linkIndex)
calculateJacobian
calculateInverseKinematics

Joint types

JOINT_REVOLUTE: revolute joint
JOINT_PRISMATIC: prismatic joint
JOINT_SPHERICAL: spherical joint
JOINT_PLANAR: planar joint
JOINT_FIXED: fixed joint

Stable-Baselines3, continued

Skipped the Vectorized Environment chapter; read Stable-Baselines3 alongside MICRO_Quadruped_ARCHIVE

Policy Networks $*$

SB3 Policy

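In SB3 the word policy is used more loosely than in general RL usage, where it usually means the actor network, that is, the network that predicts actions

An SB3 network has two main parts: a features extractor and a fully connected network. The former extracts features from high-dimensional data (features_extractor_kwargs); the latter maps the features to actions (pi) / values (vf) (net_arch)
Default Network Architecture

On-policy algorithms share the CNN features extractor between actor and critic to reduce computation. Off-policy algorithms use separate features extractors, a configuration that gives the best performance
Custom Network Architecture

A custom network architecture is passed through the policy_kwargs argument when the model is created
Custom Feature Extractor

For a custom features extractor (for example a custom CNN), define a class derived from BaseFeaturesExtractor and pass it to the model at training time
Preprocessing and feature extraction
- Preprocessing the raw observation: if the environment returns an image as an np.ndarray with values in [0, 255], it needs to be normalized to [0, 1]; if the observation space is Discrete(5) and the returned value is 4, it is usually one-hot encoded into the feature vector [0, 0, 0, 0, 1]. The result is the preprocessed raw feature
- The raw features pass through a neural-network feature extractor to form the network representation. The most basic ones are MlpExtractor for vectors, CnnExtractor for images, and CombinedExtractor for nested inputs
- The network representation then passes through a Distribution layer that models the distribution of the quantity of interest, such as the policy distribution (actor) or the V value (critic)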
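
The two preprocessing cases mentioned above can be written out in a few lines. This is a plain-Python stand-in for illustration, not SB3's own implementation; one_hot and normalize_image are hypothetical helper names.

```python
def one_hot(index, n):
    """Encode a Discrete(n) value as a length-n vector with a single 1.0."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside Discrete({n})")
    return [1.0 if i == index else 0.0 for i in range(n)]


def normalize_image(pixels):
    """Scale 8-bit pixel values from [0, 255] to [0, 1], row by row."""
    return [[value / 255.0 for value in row] for row in pixels]
```

For example, one_hot(4, 5) gives [0.0, 0.0, 0.0, 0.0, 1.0], matching the Discrete(5) case in the list.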
Advanced

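For finer control over the action / value architecture, redefine the policy directly as a class derived from nn.Module

Using Custom Environments $*$

Custom environments $*$

Derived from gym.Env

Space types used by environments $*$

Box, Discrete, MultiBinary, MultiDiscrete

Core of an environment $*$

At its core, an environment takes the current state and an action and outputs the next state and a reward. The core data objects are therefore observation_space, action_space, and reward_range, and the main behavior is $\ step(s, a) \rightarrow s', r$
The remaining methods, reset (resets the simulator and returns the agent's initial state in the restarted simulator), close, render, and seed, are only auxiliary behaviors for controlling the environment state

Notes

The state returned by a standard gym environment should preferably be an ndarray. If it is not, check_env reports an error when checking the environment
The step function must return the four values state, reward, done, and info. Here reward corresponds to the reward function in reinforcement learning, and the logic that decides done also matters, because it affects how well the RL algorithm trains in the simulator. In real projects, defining reward and done therefore tends to be the laborious part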
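
The reset/step contract described in the notes can be sketched without any dependency. The class below is a hypothetical toy (LineWorld is not from gym or from the project), it deliberately does not subclass gym.Env, and it returns a plain int state instead of an ndarray, so it would need observation_space, action_space, and array-valued states before check_env could accept it.

```python
class LineWorld:
    """Toy environment: the agent walks on positions 0..size-1 and must reach the right end."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Restart the episode and return the initial state."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right) and return state, reward, done, info."""
        move = 1 if action == 1 else -1
        # Clip the position so the agent stays on the track
        self.position = min(self.size - 1, max(0, self.position + move))
        self.steps += 1
        reached_goal = self.position == self.size - 1
        reward = 1.0 if reached_goal else -0.1   # small step penalty, bonus at the goal
        done = reached_goal or self.steps >= self.max_steps
        info = {"steps": self.steps}
        return self.position, reward, done, info
```

Even in this toy, the two design decisions the notes call laborious are visible: the reward shaping (step penalty versus goal bonus) and the termination rule (goal reached or step budget exhausted).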

Callbacks $*$

A Callback is a set of functions that will be called at given stages of the training procedure. You can use callbacks to access internal state of the RL model during training. It allows one to do monitoring, auto saving, model manipulation, progress bars,…

Custom Callback

A class derived from BaseCallback

Callback Collection

CheckpointCallback $*$: saving the model periodically
EvalCallback $*$: evaluating the model periodically and saving the best one
CallbackList $*$: chaining several callbacks so that they are called in sequence
EventCallback $*$, EveryNTimesteps: triggering callback on events
StopTrainingOnRewardThreshold $*$: stopping the training early based on a reward threshold

Tensorboard Integration

Basic Usage

Pass the tensorboard_log argument when defining the model
To display curves stage by stage, pass the tb_log_name argument to learn

Logging More Values

logging images, figures / plots, videos, hyperparameters

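import numpy as np
from stable_baselines3.common.callbacks import BaseCallback


class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard
    """

    def __init__(self, verbose=0):
        super().__init__(verbose)

    def _on_step(self) -> bool:
        # Log a random scalar at every step; returning True keeps training running
        value = np.random.random()
        self.logger.record("random_value", value)
        return True

# model is assumed to be an SB3 model created earlier with tensorboard_log set
model.learn(50_000, callback=TensorboardCallback())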

NAS

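Export the papers in Zotero as papers.rdf and upload the file to the cloud

utils folder

In Python, utils may be the name of a module, package, folder, or script. It usually holds commonly used functions or classes that are reused across other code. An application might create a utils module containing its common helper functions, such as data parsing and logging

Introduction to PyBullet and Getting Started with Robot Simulation - Zhihu

A typical simulation program has two main parts:

Setup: the PyBullet API is based on a client-server design in which the client sends commands and the physics server returns the state. So the first call is p.connect(xxx), commonly with p.GUI (with rendering) or p.DIRECT (no rendering, suitable for reinforcement learning training). Next come gravity and other simulator parameters, and finally the robot model is loaded

Loop: p.stepSimulation() performs all actions in one forward dynamics simulation step; the default time step is 1/240 s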

Client–server model - Wikipedia

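Welcome to Spinning Up in Deep RL! - Spinning Up documentation [learning resource]

Using stable-baselines3 to run deep reinforcement learning algorithms elegantly and conveniently - Zhihu

PyBullet Notes (8): Introducing and Training Reinforcement Learning Environments - Zhihu

PyBullet Notes (9): Building a Custom gym Reinforcement Learning Environment - Zhihu

Clear and concise; worth reading

Basic Principles and Usage of Stable-baselines3, Part 1 | Nemo555's blog

Covers the implementation logic of the code; hard to follow without prior familiarity

What developers need to know about concurrency, parallelism, serial execution, synchronous, and asynchronous | 涤生大数据's blog

Concurrency: when several threads are running and the system has only one CPU, it cannot truly execute more than one thread at the same moment. It can only divide CPU time into slices and assign the slices to the threads; while one thread's code runs during its slice, the other threads are suspended. This is called concurrency (Concurrent)

Parallelism (Parallel) means that two or more events or activities happen at the same moment. In a multiprogramming environment, parallelism lets several programs execute simultaneously on different CPUs

Concurrency and parallelism are similar but distinct concepts. Parallelism means that two or more events occur at the same instant; concurrency means that two or more events occur within the same time interval. In a multiprogramming environment, concurrency means that over a period of time several programs appear to run simultaneously at the macro level, but on a single-processor system only one program can execute at any instant, so at the micro level the programs take turns in time slices. If the computer has several processors, these concurrently executable programs can be distributed across them and run in parallel, each processor handling one program, so that several programs truly execute at the same time

Synchronous means sequential execution: one task finishes before the next starts, which requires waiting and coordination

(start boiling water $\rightarrow$ wait for it to boil and make tea $\rightarrow$ do housework)

Asynchronous means the tasks are independent: while waiting for an event, you carry on with your own work instead of waiting for the event to complete

(start boiling water $\rightarrow$ do housework $\rightarrow$ water boils, make tea $\rightarrow$ continue housework)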
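
The kettle example maps directly onto Python's asyncio. The sketch below is only an illustration of the two orderings; the function names and the sleep durations (0.05 s for the kettle, 0.01 s for the housework) are assumptions chosen so that the housework finishes first.

```python
import asyncio


async def boil_water(log, seconds=0.05):
    log.append("start boiling water")
    await asyncio.sleep(seconds)      # waiting here does not block other tasks
    log.append("water boiled, make tea")


async def do_housework(log, seconds=0.01):
    log.append("do housework")
    await asyncio.sleep(seconds)
    log.append("housework done")


async def synchronous_style():
    """Finish one task completely before starting the next."""
    log = []
    await boil_water(log)
    await do_housework(log)
    return log


async def asynchronous_style():
    """Start both tasks and let them interleave while each one waits."""
    log = []
    await asyncio.gather(boil_water(log), do_housework(log))
    return log
```

Running asyncio.run(synchronous_style()) records the kettle finishing before any housework starts, while asyncio.run(asynchronous_style()) records the housework happening during the wait for the kettle.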

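Parallel environments speed up sampling by two orders of magnitude: Isaac Gym accelerates reinforcement learning - 深度强化学习实验室 (Deep RL Lab)

Colons and arrows in Python function parameters - 青山牧云人 - cnblogs

def funcName(param) -> type: the arrow annotates the function's return type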
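
A short sketch shows both the colon and the arrow annotations and where Python stores them. The function scale is a hypothetical example; note that the annotations are not enforced at run time.

```python
def scale(value: float, factor: float = 2.0) -> float:
    """Multiply value by factor; the annotations document intent but are not checked."""
    return value * factor


# The annotations are stored on the function object itself
annotations = scale.__annotations__
```

Calling scale with a str still runs, because Python only records the annotations; static checkers such as mypy are what act on them.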

Synology Drive Client - Synology Knowledge Center

RL Baselines3 Zoo | GitHub

The utils folder contains a hyperparameter optimization file that uses the optuna tool to search the hyperparameter space directly

2023.07.08

Python

Dictionary `dict`

Very fast lookup and insertion
Uses a lot of memory

A dict key must be an immutable object

kwargs = dict(name='A', age=20)
kwargs1 = {'name': 'A', 'age': 20}
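
The key restriction comes from hashing: a dict needs a stable hash for every key, and the built-in mutable containers do not provide one. The sketch below checks this directly; can_be_key and joint_limits are hypothetical names used only for illustration.

```python
def can_be_key(obj):
    """Return True if obj is hashable and can therefore be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple of strings is immutable and hashable, so it works as a composite key
joint_limits = {("FL", "hip"): (-0.8, 0.8)}
```

A tuple only qualifies when everything inside it is hashable too, so a tuple that contains a list is rejected just like the list itself.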

Argument checking

def myAbs(x):
    if not isinstance(x, (int, float)):
        raise TypeError('Bad operand type!')
    ...

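Passing function arguments

A default argument must point to an immutable object. The default value is evaluated once, when the function is defined, and the default parameter is itself a variable bound to that value, so a mutable default keeps any changes across calls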
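
The pitfall is easiest to see side by side. In the sketch below, add_end_bad and add_end are hypothetical names; the first shares one list across all calls, and the second uses None as the default and creates a fresh list on each call.

```python
def add_end_bad(items=[]):
    """Buggy: the default list is created once and reused by every call."""
    items.append("END")
    return items


def add_end(items=None):
    """Safe: None is immutable, so a new list is built whenever no argument is given."""
    if items is None:
        items = []
    items.append("END")
    return items
```

Calling add_end_bad() twice with no argument returns a list that grows each time, whereas add_end() always returns a one-element list.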

Stable-Baselines3, continued

Integrations $*$

Weights & Biases $*$ is a visualization tool compatible with many deep learning frameworks, similar to TensorBoard

RL Baselines3 Zoo $*$ Documentation $*$

A training framework for RL. It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Tuning training hyperparameters on Breakout

import torch
from stable_baselines3 import PPO

policy_kwargs = dict(activation_fn=torch.nn.ReLU)
lr_set = [1e-2, 2e-2, 5e-2, 1e-3, 2e-3, 5e-3, 1e-4, 2e-4, 5e-4, 1e-5, 2e-5, 5e-5]
# env is assumed to be the Breakout environment created earlier
for lr in lr_set:
    params = dict(learning_rate=lr, gamma=0.9)
    # One model per learning rate, all logging to the same TensorBoard folder
    model = PPO("CnnPolicy", env, verbose=1, tensorboard_log='./ppo_breakout_tensorboard/',
                policy_kwargs=policy_kwargs, **params)

Zotero + MicroNAS

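References

The most important ones are references 1, 3, 5, and 7

Rough steps

Enable the WebServer service on the MicroNAS
In Zotero's File Sync settings, enter the URL, user name, and password, then verify the server

Getting started with Unitree quadruped robot development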

2023.07.09

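Skimming the literature

Template

Method summary:

Experiment type:

Highlights:

Physical robot type:

Simulation and training platform:

Experimental data / source code:

Video: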

High-speed quadruped locomotion by imitation-relaxation reinforcement learning

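Method summary: introduces Imitation-Relaxation RL to optimize the objectives in stages; introduces stochastic stability to analyze system robustness

Experiment type: physical robot on 8 different terrains (rubber track, lawn, grass, concrete floor, bricks, rock trail, wooden bridge, stairs)

Highlights: optimizes multiple objectives in stages

Physical robot type: MIT Cheetah-like (BlackPanther)

Simulation and training platform: RaiSim (J. Hwangbo)

Experimental data / source code: https://github.com/WoodenJin/High_Speed_Quadrupedal_Locomotion_by_IRRL

Video: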
