Xiang Long (SwordFaith)
Researcher and Algorithm Engineer, Large Language Model Post-Training
About
I am an algorithm engineer and researcher focused on post-training techniques for large language models. My main research directions include Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Supervised Fine-Tuning (SFT), and I have extensive experience in distributed training framework development and high-quality data construction.
I am committed to advancing the alignment and safety of AI systems through technical innovation, so that large language models can better serve society.
Research Directions
Post-Training Techniques
- RLHF (Reinforcement Learning from Human Feedback)
- DPO (Direct Preference Optimization; see the sketch after these lists)
- SFT (Supervised Fine-Tuning)
- Constitutional AI
Reinforcement Learning Algorithms
- PPO (Proximal Policy Optimization)
- SAC (Soft Actor-Critic)
- DQN (Deep Q-Network)
- Multi-agent RL
Distributed Systems
- Large-scale model training frameworks
- Distributed inference systems
- Model parallelism and data parallelism
- Mixed-precision training
Data Engineering
- High-quality training data construction
- Data cleaning and preprocessing
- Preference data annotation
- Data augmentation techniques
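For readers unfamiliar with DPO, here is a minimal sketch of its objective (Rafailov et al., 2023). This is an illustrative assumption, not code from any of my projects: the function name, argument names, and shapes are made up for exposition, and it presumes summed per-response log-probabilities have already been computed under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss sketch (names/shapes illustrative).

    Each argument is a tensor of shape (batch,) holding the summed
    log-probability of a response; `beta` scales the implicit
    KL-style penalty toward the frozen reference model.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the chosen response ahead of the rejected one
    # by a margin in log-ratio space
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Illustrative usage with random log-probabilities
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```

In practice the per-response log-probabilities come from summing token log-probs over each response, and beta is commonly set in the 0.1 to 0.5 range.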
Publications
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
arXiv preprint arXiv:2404.06395, 2024 | 324 citations
Authors: S Hu, Y Tu, X Han, C He, G Cui, X Long, Z Zheng, Y Fang, Y Huang, et al.
Introduces MiniCPM, a series of end-side language models with only 2.4B non-embedding parameters that significantly outperform Llama2-7B on comprehensive benchmarks, demonstrating that scalable training strategies allow small language models to achieve strong performance.
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv preprint arXiv:2506.07900, 2025
Authors: MiniCPM Team, C Xiao, Y Li, X Han, Y Bai, J Cai, H Chen, W Chen, X Cong, X Long, et al.
Latest advancement in the MiniCPM series, focusing on ultra-efficient deployment of large language models on end devices with enhanced performance and reduced computational requirements.
IntTower: The Next Generation of Two-Tower Model for Pre-ranking System
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022 | 25 citations
Authors: X Li, B Chen, HF Guo, J Li, C Zhu, X Long, S Li, Y Wang, W Guo, L Mao, et al.
Proposes IntTower, a novel two-tower architecture for large-scale pre-ranking systems that significantly improves efficiency and accuracy in recommendation systems through innovative interaction modeling techniques.
Exploring Text-Transformers in AAAI 2021 Shared Task: Covid-19 Fake News Detection in English
International Workshop on Combating Online Hostile Posts in Regional Languages, 2021 | 52 citations
Authors: X Li, Y Xia, X Long, Z Li, S Li
Develops transformer-based approaches for COVID-19 fake news detection, achieving state-of-the-art performance in the AAAI 2021 shared task through advanced natural language processing techniques.
FenceMask: A Data Augmentation Approach for Pre-extracted Image Features
arXiv preprint arXiv:2006.07877, 2020 | 37 citations
Authors: P Li, X Li, X Long
Introduces FenceMask, a novel data augmentation technique for pre-extracted image features that improves model robustness and generalization in computer vision tasks.
Low Resource Style Transfer via Domain Adaptive Meta Learning
arXiv preprint arXiv:2205.12475, 2022 | 9 citations
Authors: X Li, X Long, Y Xia, S Li
Addresses the challenge of style transfer in low-resource settings using domain adaptive meta learning techniques.
KDD CUP 2021 MAG240M-LSC Team Passages Winner Solution
KDD CUP 2021 Competition, 2021 | 🏆 Winner
Authors: K Li, X Long, Z Feng, M Wang, X Liu, P Wang, Q Lin, K Zhao, B Ai
Winning solution for the KDD CUP 2021 MAG240M-LSC challenge, demonstrating excellence in large-scale graph learning and academic paper analysis.
Citation Statistics: 498 total citations | h-index: 8 | i10-index: 7
View Full Google Scholar Profile →
GitHub Statistics
Visit GitHub Profile ↗
Contact
If you are interested in my research or would like to collaborate, feel free to contact me. Email: mid.of.change@gmail.com