feature(yzj): add ptz ctde pipeline #149
base: main
lzero/model/muzero_model_mlp.py
Outdated
next_latent_state, reward = self.dynamics_network(state_action_encoding)
agent_state_action_encoding = torch.cat((agent_latent_state, action_encoding), dim=1)
global_state_action_encoding = torch.cat((agent_latent_state, global_latent_state, action_encoding), dim=1)
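- Is it necessary to concatenate agent_latent_state into global_state_action_encoding as well?
- After concatenation, action_encoding makes up only 5/(256*2+5) of the input. Is that information density too low?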
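To put the ratio from the comment above into numbers, here is a small sketch. It assumes, as the comment implies, two concatenated latent states of size 256 each and an action encoding of size 5; the function name `action_fraction` is hypothetical and not part of the PR.

```python
def action_fraction(latent_dim: int, num_latents: int, action_dim: int) -> float:
    """Fraction of the concatenated input that the action encoding occupies."""
    total = latent_dim * num_latents + action_dim
    return action_dim / total


# Agent + global latent state (2 x 256) plus a 5-dim action encoding.
both = action_fraction(256, 2, 5)
# A single latent state (256) plus the same action encoding, for comparison.
single = action_fraction(256, 1, 5)

print(f"{both:.4f} {single:.4f}")  # prints "0.0097 0.0192"
```

Under these assumed sizes the action takes up just under 1% of the input when both latent states are concatenated, and about twice that with a single latent state. Whether this matters in practice is the open question raised in the comment.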
This needs to be tested.
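- (s', s1', s2', s3', r) = f(s, s1, s2, s3, a1, a2, a3): model the joint dynamics function with a single network, which has to take the information of every agent in the team into account at once.
- collect should store data per team.
- The data-processing flow in forward_learn needs to change. Unrolling 5 steps means the whole team rolls forward 5 steps together.
- The reward handling in forward_learn.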
It is unreasonable for the input of global_state_dynamic to contain only a single agent's action and not the joint action.
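One way to read the comments above is that the dynamics input should carry every agent's latent state and every agent's action. The sketch below assembles such an input with plain Python lists. The helper names, the one-hot action encoding, and the toy dimensions (3 agents, 4-dim latent states, 5 discrete actions) are assumptions for illustration only, not code from this PR.

```python
from typing import List, Sequence


def one_hot(action: int, action_space_size: int) -> List[float]:
    """One-hot encode a discrete action (hypothetical encoding choice)."""
    if not 0 <= action < action_space_size:
        raise ValueError(f"action {action} outside [0, {action_space_size})")
    return [1.0 if i == action else 0.0 for i in range(action_space_size)]


def joint_dynamics_input(
    global_latent: Sequence[float],
    agent_latents: Sequence[Sequence[float]],
    actions: Sequence[int],
    action_space_size: int,
) -> List[float]:
    """Concatenate (s, s1..sN, a1..aN) into one flat input vector."""
    if len(agent_latents) != len(actions):
        raise ValueError("need exactly one action per agent")
    joint = list(global_latent)
    for latent in agent_latents:
        joint.extend(latent)
    for action in actions:
        joint.extend(one_hot(action, action_space_size))
    return joint


# Toy example: 3 agents, 4-dim latent states, 5 discrete actions.
global_latent = [0.0] * 4
agent_latents = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
x = joint_dynamics_input(global_latent, agent_latents, [0, 2, 4], 5)
print(len(x))  # 4 + 3*4 + 3*5 = 31
```

With this layout the joint action occupies 15 of the 31 input entries in the toy case, so each agent's action is visible to the single network that predicts the team's next states and reward.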
lzero/policy/muzero.py
Outdated
policy_logits = policy_logits.detach().cpu().numpy().tolist()
legal_actions = [[i for i, x in enumerate(action_mask[j]) if x == 1] for j in range(active_collect_env_num)]
reward_roots = [[reward_root]*self.cfg.model.agent_num for reward_root in reward_roots]
Here reward_roots is just a list of length 24, so why transform it this way? 24 = 8*3, so each group of 3 corresponding rewards should already be the same team_reward, shouldn't it?
This is where each group of 3 agents is made to search with the same reward.
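The two readings in this exchange can be compared on toy data. The sketch assumes 8 environments and 3 agents (taken from 24 = 8*3) and uses made-up reward values. Here `agent_num` stands in for `self.cfg.model.agent_num`, and the chunking variant only illustrates the reviewer's expectation; it is not code from the PR.

```python
agent_num = 3
env_num = 8

# Case A: reward_roots holds one team reward per environment (length 8).
team_rewards = [float(i) for i in range(env_num)]
repeated = [[r] * agent_num for r in team_rewards]  # 8 rows of 3 equal values

# Case B: reward_roots is already flat (length 24), with every 3 consecutive
# entries carrying the same team reward.
flat_rewards = [r for r in team_rewards for _ in range(agent_num)]

# Applying the same expression to the flat list gives 24 rows, not 8.
repeated_flat = [[r] * agent_num for r in flat_rewards]

# Grouping consecutive entries instead recovers one row per team.
grouped = [
    flat_rewards[i:i + agent_num]
    for i in range(0, len(flat_rewards), agent_num)
]

print(len(repeated), len(repeated_flat), len(grouped))  # prints "8 24 8"
```

In this toy setup `grouped` equals `repeated`, so which expression is appropriate depends on whether reward_roots arrives with one entry per environment or one entry per agent.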
@@ -0,0 +1,116 @@
from easydict import EasyDict
8d71f96 to 829d86d
No description provided.