![NCERL banner](./media/banner.png)

# Negatively Correlated Ensemble RL

## Installation

Create a conda environment:

```bash
conda create -n ncerl python=3.9
```

Activate the conda environment:

```
conda activate ncerl
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Note: a GPU is not required, but PyTorch must be installed. If your GPU supports CUDA, install the CUDA build of PyTorch; otherwise, install the CPU build. The CUDA build speeds up inference.

## Quick Start

To try the generator interactively, run

```
python app.py
```

and open the link printed in the terminal to use the Gradio demo.

Alternatively, run

```
python generate_and_play.py
```

and view the generated levels in `models/example_policy/samples.png`.

## Training

All training runs are launched by running `train.py` with an option and its arguments. For example, `python train.py ncesac --lbd 0.3 --m 5` trains NCERL with hyperparameters $\lambda = 0.3$ and $m = 5$. The plotting script is `plots.py`.

* `python train.py gan`: trains a decoder that maps a continuous action to a game level segment.
* `python train.py sac`: trains a standard SAC as the policy for online game level generation.
* `python train.py asyncsac`: trains a SAC with an asynchronous evaluation environment as the policy for online game level generation.
* `python train.py ncesac`: trains an NCERL based on SAC as the policy for online game level generation.
* `python train.py egsac`: trains an episodic generative SAC (see paper [*The fun facets of Mario: Multifaceted experience-driven PCG via reinforcement learning*](https://dl.acm.org/doi/abs/10.1145/3555858.3563282)) as the policy for online game level generation.
* `python train.py pmoe`: trains a PMOE (see paper [*Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning*](https://arxiv.org/abs/2104.09122)) as the policy for online game level generation.
* `python train.py sunrise`: trains a SUNRISE (see paper [*SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning*](https://proceedings.mlr.press/v139/lee21g.html)) as the policy for online game level generation.
* `python train.py dvd`: trains a DvD-SAC (see paper [*Effective Diversity in Population Based Reinforcement Learning*](https://proceedings.neurips.cc/paper_files/paper/2020/hash/d1dc3a8270a6f9394f88847d7f0050cf-Abstract.html)) as the policy for online game level generation.

For the full list of arguments accepted by each option, run `python train.py [option] --help`.

## Directory Structure

```
NCERL-DIVERSE-PCG/
* analysis/
    * generate.py         unused
    * tests.py            used for evaluation
* media/                  Markdown assets
* models/
    * example_policy/     policy used for the generation demo
* smb/                    Mario simulator and image resources
* src/
    * ddpm/               DDPM models
    * drl/                DRL models and training
    * env/                Mario gym environment and reward functions
    * gan/                GAN models and training
    * olgen/              online generation environment and policies
    * rlkit/              reinforcement learning components
    * smb/                components for interacting with the Mario simulator, and a multiprocess asynchronous pool
    * utils/              utility files
* training_data/          training data
* README.md               this file
* app.py                  Gradio demo
* generate_and_play.py    non-Gradio demo
* train.py                training entry point
* test_ddpm.py            DDPM training test
* requirements.txt        dependencies
```
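The `ncesac` training command (`python train.py ncesac --lbd 0.3 --m 5`) can be repeated over several values of $\lambda$ and $m$ to compare settings. The sketch below builds and optionally runs those commands. It only uses the `--lbd` and `--m` flags shown in the training example; the helper names `ncesac_command` and `sweep` are hypothetical, the script assumes it is run from the repository root, and the λ values in the usage line are illustrative, not settings from the paper. All other arguments keep whatever defaults `train.py` defines (check `python train.py ncesac --help`).

```python
import subprocess
import sys
from itertools import product


def ncesac_command(lbd: float, m: int) -> list[str]:
    """Build the argv for one NCERL run, using only the --lbd and --m flags from the README example."""
    return [sys.executable, "train.py", "ncesac", "--lbd", str(lbd), "--m", str(m)]


def sweep(lambdas, ms, dry_run=True):
    """Create one command per (lambda, m) pair; print them, and run them sequentially unless dry_run."""
    commands = [ncesac_command(lbd, m) for lbd, m in product(lambdas, ms)]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Hypothetical launcher: runs each training job to completion before starting the next.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    sweep([0.1, 0.3, 0.5], [5])
```

Passing `dry_run=False` launches the runs one after another; running them sequentially keeps the sketch simple and avoids competing for the same CPU or GPU.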