Xiang Long (SwordFaith)

Researcher and Algorithm Engineer in Large Language Model Post-Training

About Me

I am an algorithm engineer and researcher focused on post-training techniques for large language models. My main research interests include post-training methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Supervised Fine-Tuning (SFT). I also have extensive experience in developing distributed training frameworks and building high-quality datasets.

I am committed to advancing the alignment and safety of AI systems through technical innovation, so that large language models can better serve human society.

Research Interests

Post-Training Techniques

  • RLHF (Reinforcement Learning from Human Feedback)
  • DPO (Direct Preference Optimization; see the loss sketch below)
  • SFT (Supervised Fine-Tuning)
  • Constitutional AI
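
To make the DPO item above concrete, here is a minimal PyTorch sketch of the pairwise DPO loss from Rafailov et al. (2023). The function name dpo_loss and its log-probability inputs are illustrative assumptions for this page, not code from any project or publication listed here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Pairwise DPO loss (illustrative sketch).

    Each argument is a tensor of summed token log-probabilities for a
    batch of (chosen, rejected) completions under the trainable policy
    or the frozen reference model.
    """
    # Implicit reward margins: how much more the policy prefers each
    # completion relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the chosen-vs-rejected margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The beta coefficient acts as a temperature on the implicit reward: larger values penalize deviation from the reference model more strongly.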

Reinforcement Learning Algorithms

  • PPO (Proximal Policy Optimization; see the clipped-objective sketch below)
  • SAC (Soft Actor-Critic)
  • DQN (Deep Q-Network)
  • Multi-agent RL
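
As a pointer for the PPO item above, below is a minimal sketch of the clipped surrogate objective (Schulman et al., 2017) in PyTorch; ppo_clip_loss and its tensor arguments are hypothetical names for illustration only.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (illustrative sketch)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s) per sampled action.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum; negate it to use as a loss.
    return -torch.min(unclipped, clipped).mean()
```

Clipping keeps the probability ratio near 1, which bounds the size of each policy update and stabilizes training.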

Distributed Systems

  • Large-scale model training frameworks
  • Distributed inference systems
  • Model parallelism and data parallelism
  • Mixed-precision training (see the sketch below)
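
The mixed-precision item can be illustrated with a minimal PyTorch training-loop sketch using torch.cuda.amp. It assumes a CUDA device is available; the tiny linear model and synthetic loss are placeholders, not any framework discussed on this page.

```python
import torch

# Placeholder model and optimizer for illustration.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():   # run the forward pass in mixed precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()     # backprop through the scaled loss
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()                   # adjusts the scale factor for next step
```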

Data Engineering

  • High-quality training data construction
  • Data cleaning and preprocessing
  • Preference data annotation (see the example record below)
  • Data augmentation techniques
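
As an example of what preference data annotation produces, here is a minimal sketch that appends one pairwise preference record to a JSONL file. The field names (prompt, chosen, rejected, annotator) follow a common convention and are assumed here for illustration, not taken from any specific pipeline.

```python
import json

# Hypothetical pairwise preference record of the kind used for RLHF/DPO.
record = {
    "prompt": "Explain mixed-precision training in one sentence.",
    "chosen": "It runs most operations in fp16/bf16 while keeping master "
              "weights in fp32, cutting memory and time with little "
              "accuracy loss.",
    "rejected": "It just makes the model smaller.",
    "annotator": "anno_042",
}

# One JSON object per line is the usual JSONL layout for training data.
with open("preferences.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```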

Publications

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

arXiv preprint arXiv:2404.06395, 2024 | 324 citations

Authors: S Hu, Y Tu, X Han, C He, G Cui, X Long, Z Zheng, Y Fang, Y Huang, et al.

Introduces MiniCPM, a series of end-side language models with only 2.4B parameters that significantly outperform Llama2-7B on comprehensive benchmarks. This work demonstrates efficient training strategies for small-scale language models that achieve strong performance through innovative architectural designs and training methodologies.

MiniCPM4: Ultra-Efficient LLMs on End Devices

arXiv preprint arXiv:2506.07900, 2025

Authors: M Team, C Xiao, Y Li, X Han, Y Bai, J Cai, H Chen, W Chen, X Cong, X Long, et al.

Latest advancement in the MiniCPM series, focusing on ultra-efficient deployment of large language models on end devices with enhanced performance and reduced computational requirements.

IntTower: The Next Generation of Two-Tower Model for Pre-ranking System

Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022 | 25 citations

Authors: X Li, B Chen, HF Guo, J Li, C Zhu, X Long, S Li, Y Wang, W Guo, L Mao, et al.

Proposes IntTower, a novel two-tower architecture for large-scale pre-ranking systems that significantly improves efficiency and accuracy in recommendation systems through innovative interaction modeling techniques.

Exploring Text-Transformers in AAAI 2021 Shared Task: Covid-19 Fake News Detection in English

International Workshop on Combating Online Hostile Posts in Regional Languages, 2021 | 52 citations

Authors: X Li, Y Xia, X Long, Z Li, S Li

Develops transformer-based approaches for COVID-19 fake news detection, achieving state-of-the-art performance in the AAAI 2021 shared task through advanced natural language processing techniques.

FenceMask: A Data Augmentation Approach for Pre-extracted Image Features

arXiv preprint arXiv:2006.07877, 2020 | 37 citations

Authors: P Li, X Li, X Long

Introduces FenceMask, a novel data augmentation technique for pre-extracted image features that improves model robustness and generalization in computer vision tasks.

Low Resource Style Transfer via Domain Adaptive Meta Learning

arXiv preprint arXiv:2205.12475, 2022 | 9 citations

Authors: X Li, X Long, Y Xia, S Li

Addresses the challenge of style transfer in low-resource settings using domain adaptive meta learning techniques.

KDD CUP 2021 MAG240M-LSC Team Passages Winner Solution

KDD CUP 2021 Competition, 2021 | 🏆 Winner

Authors: K Li, X Long, Z Feng, M Wang, X Liu, P Wang, Q Lin, K Zhao, B Ai

Winning solution for the KDD CUP 2021 MAG240M-LSC challenge, demonstrating excellence in large-scale graph learning and academic paper analysis.

Citation Statistics: 498 total citations | h-index: 8 | i10-index: 7

View Full Google Scholar Profile →

Contact

If you are interested in my research or would like to collaborate, feel free to get in touch. Email: mid.of.change@gmail.com

Send Email