Open Access Article

Journal of Engineering Research. 2022; 1(3): 49-52. DOI: 10.12208/j.jer.20220062.

Data Capture and Analysis Based on Python
(Chinese title: 基于Python的数据抓取和用户特征分析, "Python-Based Data Scraping and User Feature Analysis")

Authors: 甘长笑*, 孙立太

Wuhan Donghu University, Wuhan, Hubei

*Corresponding author: 甘长笑, Wuhan Donghu University, Wuhan, Hubei.

Published: 2022-09-09

Abstract

With the rapid development of big data in recent years, data has become the most valuable asset of the internet industry. In today's B2C era in particular, the use of data is increasingly meaningful: large volumes of data carry high research value, and data mining plays a pivotal role in the industry. As social networks grow more popular, information spreads across them ever faster, and much of this data is provided directly or indirectly. Any longitudinal study needs to capture and analyze the target website in real time and deliver the resulting information to the target users. This paper discusses the basic theory and components of the web crawler, describes those components in detail, and takes Taobao as a representative app for data collection and analysis, with the aim of optimizing various features of Taobao.

Keywords: Python web crawler; data analysis; user feature analysis

Cite this article

甘长笑, 孙立太. 基于Python的数据抓取和用户特征分析 (Data Capture and Analysis Based on Python) [J]. Journal of Engineering Research, 2022, 1(3): 49-52.