蓝里小窝

Column author
62 articles
36063 reads
14 subscribers
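A Brief Survey of the State of Research on Reinforcement Learning in Generative Pre-trained Language Models
This article explores the application of reinforcement learning to generative pre-trained language models, focusing on concrete practice in alignment optimization, prompt optimization, and experience-memory-augmented prompting. By surveying existing research, it shows the key role reinforcement learning plays in improving the performance of generative language models and their dialogue interaction with humans. These applications show great potential, and the article also discusses the challenges of existing methods and possible future directions.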
Ranlychan
2024-01-10
2730
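Reinforcement Learning: Playing Atari Breakout with Double DQN, Complete Implementation Code and Evaluation (PyTorch)
Breakout is a classic Atari game, better known to many as "brick breaker". The player moves a short paddle left and right at the bottom of the screen to bounce a ball back toward the top, breaking a wall made of six rows of rectangular bricks while keeping the ball from falling off the bottom of the screen. In the Atari 2600 version of Breakout, the player may lose the ball 5 times before the game ends. Each brick destroyed scores 1 point, and clearing every brick ends the game with a win.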
Ranlychan
2024-01-10
3950
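Paper Reproduction | Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy
Introduces the initial conditions and the necessary preparation. The code comes from https://github.com/thuml/Anomaly-Transformer, and the paper's data comes from the Google Cloud provided by the authors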
Ranlychan
2023-12-24
7781
Reading Notes | Privacy vs. Efficiency: Achieving Both Through Adaptive Hierarchical Federated Learning
The paper argues that, from the perspective of model training, the efficiency and data privacy of Federated Learning are non-orthogonal, meaning that each restricts the other. The paper therefore first formulates the problem rigorously, then designs a cloud-edge-end hierarchical FL system with an adaptive control algorithm that embeds a two-level Differential Protection method to relieve both the resource and privacy concerns. The design follows these ideas:
Ranlychan
2023-11-29
1130
OpenCloudOS | yum Repository Configuration
Ranlychan
2023-10-31
5910
Reading Notes | Life on the Edge: Unraveling Policies into Configurations
info: W. X. Zhao et al., “A Survey of Large Language Models.” arXiv, Sep. 11, 2023. Accessed: Sep. 18, 2023. [Online]. Available: http://arxiv.org/abs/2303.18223
Ranlychan
2023-10-29
1130
Reading Notes | SIMPLE-fying Middlebox Policy Enforcement Using SDN
info: Qazi, Zafar Ayyub, Rui Miao, Cheng-Chun Tu, Vyas Sekar, Luis Chiang, and Minlan Yu. “SIMPLE-Fying Middlebox Policy Enforcement Using SDN,” n.d.
Ranlychan
2023-10-29
1090
Reading Notes | P4: programming protocol-independent packet processors
info: Bosshart, Pat, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, et al. “P4: Programming Protocol-Independent Packet Processors.” ACM SIGCOMM Computer Communication Review 44, no. 3 (July 28, 2014): 87–95. https://doi.org/10.1145/2656877.2656890.
Ranlychan
2023-10-29
1700
Reading Notes | Verifying and Monitoring IoTs Network Behavior Using MUD Profiles
info: A. Hamza, D. Ranathunga, H. H. Gharakheili, T. A. Benson, M. Roughan, and V. Sivaraman, “Verifying and Monitoring IoTs Network Behavior Using MUD Profiles,” IEEE Trans. Dependable and Secure Comput., vol. 19, no. 1, pp. 1–18, Jan. 2022, doi: 10.1109/TDSC.2020.2997898.
Ranlychan
2023-10-15
1110
Reading Notes | Attention Is All You Need
info: A. Vaswani et al., “Attention Is All You Need,” 2017, doi: 10.48550/ARXIV.1706.03762.
Ranlychan
2023-10-15
2450
Reading Notes | Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills
info: T. Benson, A. Akella, and D. A. Maltz, “Mining policies from enterprise network configuration,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement, Chicago Illinois USA: ACM, Nov. 2009, pp. 136–142. doi: 10.1145/1644893.1644909.
Ranlychan
2023-10-15
1750
Reading Notes | Language Models are Few-Shot Learners
info: T. B. Brown et al., “Language Models are Few-Shot Learners,” 2020, doi: 10.48550/ARXIV.2005.14165.
Ranlychan
2023-10-15
2750
Reading Notes | A Survey of Large Language Models
info: W. X. Zhao et al., “A Survey of Large Language Models.” arXiv, Sep. 11, 2023. Accessed: Sep. 18, 2023. [Online]. Available: http://arxiv.org/abs/2303.18223
Ranlychan
2023-10-15
2980
Reading Notes | Mining policies from enterprise network configuration
info: T. Benson, A. Akella, and D. A. Maltz, “Mining policies from enterprise network configuration,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement, Chicago Illinois USA: ACM, Nov. 2009, pp. 136–142. doi: 10.1145/1644893.1644909.
Ranlychan
2023-10-15
1060
Reading Notes | The evolution of network configuration: a tale of two campuses
info: H. Kim, T. Benson, A. Akella, and N. Feamster, “The evolution of network configuration: a tale of two campuses,” in Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, Berlin Germany: ACM, Nov. 2011, pp. 499–514. doi: 10.1145/2068816.2068863.
Ranlychan
2023-10-15
1360
Reading Notes | DeepConfig: Automating Data Center Network Topologies Management with Machine Learning
info: C. Streiffer, H. Chen, T. Benson, and A. Kadav, “DeepConfig: Automating Data Center Network Topologies Management with Machine Learning.” arXiv, Dec. 11, 2017. Accessed: Aug. 06, 2023. [Online]. Available: http://arxiv.org/abs/1712.03890
Ranlychan
2023-10-15
1310
Reading Notes | Demystifying configuration challenges and trade-offs in network-based ISP services
info: T. Benson, A. Akella, and A. Shaikh, “Demystifying configuration challenges and trade-offs in network-based ISP services,” in Proceedings of the ACM SIGCOMM 2011 conference, Toronto Ontario Canada: ACM, Aug. 2011, pp. 302–313. doi: 10.1145/2018436.2018471.
Ranlychan
2023-10-15
1470
Reading Notes | Efficient and Safe Network Updates with Suffix Causal Consistency
info: S. Liu, T. A. Benson, and M. K. Reiter, “Efficient and Safe Network Updates with Suffix Causal Consistency,” in Proceedings of the Fourteenth EuroSys Conference 2019, Dresden Germany: ACM, Mar. 2019, pp. 1–15. doi: 10.1145/3302424.3303965.
Ranlychan
2023-10-15
900
Reading Notes | Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing
info: E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 447–457, Jan. 2020, doi: 10.1109/TWC.2019.2946140.
Ranlychan
2023-10-15
1830
Reading Notes | Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
info: J. Yao et al., “Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI,” IEEE Trans. Knowl. Data Eng., pp. 1–1, 2022, doi: 10.1109/TKDE.2022.3178211.
Ranlychan
2023-10-15
3463