Open Access Article

Journal of Engineering Research. 2025; 4(9): 18-22. DOI: 10.12208/j.jer.20250394

Application of real-time feature lakes in large-scale risk control: high-concurrency writing and low-latency recall
实时特征湖在大规模风控中的落地:高并发写入与低时延召回

Author: Chao Liu (刘超) *

Devz AI technologies Inc., America

*Corresponding author: Chao Liu (刘超), Devz AI technologies Inc., America

Published: 2025-12-31

Abstract

With the rapid growth of e-commerce and financial services, real-time risk control systems face stricter requirements for the timeliness, consistency, and scalability of data processing. Traditional architectures built on Hive, Kafka, and Flink, which separate stream and batch processing, show clear bottlenecks in feature management, data updates, and query efficiency. Drawing on the author's hands-on experience building big data and risk control platforms at several internet companies, this paper proposes and implements a real-time feature lake architecture based on Apache Paimon that balances high-concurrency data writes with low-latency feature recall. The system has been deployed in the risk control scenarios of several global e-commerce platforms, where it processes on the order of hundreds of billions of events per day, reduces feature update latency to the order of seconds, and improves query performance by more than 3x, offering a reusable architectural pattern for large-scale real-time risk control. The paper describes the system architecture, the key technology choices, and the optimization strategies in detail, and summarizes the challenges encountered during deployment and how they were resolved.

Keywords: Real-time feature lake; Risk control system; Apache Paimon; High-concurrency writes; Low-latency queries; Stream-batch integration; Feature engineering
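
The trade-off named in the abstract, absorbing many concurrent writes while keeping point lookups fast, can be illustrated with a toy model of a primary-key table that buffers upserts, flushes them as sorted runs, and resolves lookups newest-first. The Python sketch below is only an illustration under stated assumptions: it does not use the Apache Paimon API and is not the author's implementation. The class `ToyFeatureTable`, its methods, the `flush_threshold` parameter, and the feature name `orders_1h` are all hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class ToyFeatureTable:
    """Toy primary-key feature table (hypothetical, not the Paimon API).

    Writes land in an in-memory buffer; a full buffer is flushed as an
    immutable run sorted by key. Lookups check the buffer first, then the
    runs from newest to oldest, so the latest value for a key wins.
    """

    def __init__(self, flush_threshold: int = 2) -> None:
        self.flush_threshold = flush_threshold
        self.buffer: Dict[str, Row] = {}
        self.runs: List[Dict[str, Row]] = []  # oldest run first

    def upsert(self, key: str, features: Row) -> None:
        # Latest write for a key replaces the buffered one (cheap write path).
        self.buffer[key] = dict(features)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the buffer as a new sorted run and start a fresh buffer.
        if self.buffer:
            self.runs.append(dict(sorted(self.buffer.items())))
            self.buffer = {}

    def lookup(self, key: str) -> Optional[Row]:
        # Newest data first: buffer, then runs from newest to oldest.
        if key in self.buffer:
            return self.buffer[key]
        for run in reversed(self.runs):
            if key in run:
                return run[key]
        return None

    def compact(self) -> None:
        # Merge all runs into one; later (newer) runs overwrite older ones.
        merged: Dict[str, Row] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    table = ToyFeatureTable(flush_threshold=2)
    table.upsert("user:1", {"orders_1h": 3})
    table.upsert("user:2", {"orders_1h": 1})  # buffer is full, so it is flushed
    table.upsert("user:1", {"orders_1h": 4})  # newer value stays in the buffer
    print(table.lookup("user:1"))
    print(table.lookup("user:2"))
```

In this toy model, writes never rewrite existing runs, which keeps the write path cheap, while `compact` bounds the number of runs a lookup has to scan. A production system would add persistence, partitioning, and concurrency control, none of which is modeled here.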

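Cite this article

Chao Liu (刘超). Application of real-time feature lakes in large-scale risk control: high-concurrency writing and low-latency recall [J]. Journal of Engineering Research, 2025; 4(9): 18-22.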