WeChat ID: PythonTZXY

About: Updated daily with Python-related knowledge. We hope you find something useful!

Scraping Douyu real-time danmu (bullet comments) with Python

2019-06-14 15:45 obeina
Basic environment setup

Version: Python 3

System: Windows

Required modules:

requests, BeautifulSoup4, lxml

Install the modules:

pip install requests beautifulsoup4 lxml


Implementation


'''
Save the danmu (bullet comments) of a Douyu live room to a JSON Lines file.
Original file name: 爬取斗鱼直播间信息到jsonline文件.py
'''
import json
import re
import socket
import threading
import time

import requests
from bs4 import BeautifulSoup

# Host and port of the danmu server
HOST = 'openbarrage.douyutv.com'
PORT = 8601

# One socket shared by the reader and the heartbeat; connected in __main__
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Keeps a heartbeat from being interleaved with another message
send_lock = threading.Lock()

# Regular expression that extracts nickname, text, level, badge name and badge level
danmu = re.compile(b'type@=chatmsg.*?/nn@=(.*?)/txt@=(.*?)/.*?/level@=(.*?)/.*?/bnn@=(.*?)/bl@=(.*?)/')


def sendmsg(msgstr):
    '''
    Send a request to the server, prefixed with the protocol header.
    msg_head: the message length written twice (4 bytes each, little-endian),
    followed by 4 bytes holding the message type (689), the encryption field
    and the reserved field.
    '''
    msg = msgstr.encode('utf-8')
    data_length = len(msg) + 8
    code = 689
    msg_head = (data_length.to_bytes(4, 'little')
                + data_length.to_bytes(4, 'little')
                + code.to_bytes(4, 'little'))
    with send_lock:
        # sendall() keeps sending until every byte has gone out
        client.sendall(msg_head + msg)


def start(roomid):
    '''
    Send the login and join-group requests, then read the danmu returned by
    the server and extract the nickname and text of each message.
    The login and join-group messages must end with a NUL character.
    '''
    sendmsg('type@=loginreq/roomid@={}/\0'.format(roomid))
    sendmsg('type@=joingroup/rid@={}/gid@=-9999/\0'.format(roomid))

    # Fetch the name once instead of once per received packet
    name = get_name(roomid)
    print('---------------Connected to the live room of {}---------------'.format(name))
    filename = name + time.strftime('%Y.%m.%d') + '直播弹幕'
    danmu_num = 0
    # Explicit utf-8 so that Chinese text can be written on Windows
    with open(filename, 'a', encoding='utf-8') as f:
        while True:
            data = client.recv(1024)
            if not data:
                break
            for i in danmu.findall(data):
                dm_dict = {
                    '昵称': i[0].decode('utf-8', errors='ignore'),      # nickname
                    '弹幕内容': i[1].decode('utf-8', errors='ignore'),  # danmu text
                    '等级': i[2].decode('utf-8', errors='ignore'),      # level
                    '徽章昵称': i[3].decode('utf-8', errors='ignore'),  # badge name
                    '徽章等级': i[4].decode('utf-8', errors='ignore'),  # badge level
                }
                print(dm_dict['弹幕内容'])
                f.write(json.dumps(dm_dict, ensure_ascii=False) + '\n')
                danmu_num += 1
            f.flush()


def keeplive():
    '''
    Send a heartbeat every 45 seconds to keep the TCP connection alive.
    The heartbeat message must end with a NUL character.
    '''
    while True:
        sendmsg('type@=mrkl/\0')
        time.sleep(45)


def get_name(roomid):
    '''
    Use BeautifulSoup to read the streamer's name (the Title-anchorName
    element) from the live-room page.
    '''
    r = requests.get('http://www.douyu.com/' + roomid)
    soup = BeautifulSoup(r.text, 'lxml')
    return soup.find('a', {'class': 'Title-anchorName'}).string


# Start the program
if __name__ == '__main__':
    room_id = input('Enter the room ID: ')
    client.connect((socket.gethostbyname(HOST), PORT))
    # Threads share the single socket. With multiprocessing on Windows each
    # child process re-imports this module and opens its own connection,
    # so the heartbeat would never reach the connection that reads danmu.
    t1 = threading.Thread(target=start, args=(room_id,))
    t2 = threading.Thread(target=keeplive, daemon=True)
    t1.start()
    t2.start()
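The header layout used by sendmsg can be checked without a network connection. The following is a minimal sketch, not part of the original script: it assumes the framing shown above (the length written twice as 4-byte little-endian integers, then a 4-byte field carrying the type code) and the `key@=value/` field layout visible in the login and chat messages. The helper names `pack_message`, `split_packets` and `parse_stt` are hypothetical, and the escaping rule (`@S` for `/`, `@A` for `@`) is an assumption based on how Douyu's serialization is commonly described. Buffering complete packets before parsing avoids losing a message that is split across two recv calls, which the regular expression on raw 1024-byte chunks cannot handle.

```python
def pack_message(text, code=689):
    """Frame one message the same way sendmsg() does."""
    body = text.encode('utf-8')
    length = len(body) + 8  # second length field + type field + body
    head = length.to_bytes(4, 'little') * 2 + code.to_bytes(4, 'little')
    return head + body


def split_packets(buffer):
    """Return (complete message bodies, leftover bytes) from a byte buffer."""
    bodies = []
    while len(buffer) >= 4:
        length = int.from_bytes(buffer[:4], 'little')
        if length < 8:
            raise ValueError('corrupt length field')
        total = 4 + length  # the first length field does not count itself
        if len(buffer) < total:
            break  # wait for more bytes
        bodies.append(buffer[12:total].rstrip(b'\0'))
        buffer = buffer[total:]
    return bodies, buffer


def unescape(value):
    # Assumed escaping: '/' is sent as '@S' and '@' as '@A'
    return value.replace('@S', '/').replace('@A', '@')


def parse_stt(body):
    """Turn b'key@=value/key@=value/' into a dict of strings."""
    fields = {}
    for item in body.decode('utf-8', errors='ignore').split('/'):
        key, sep, value = item.partition('@=')
        if sep:
            fields[unescape(key)] = unescape(value)
    return fields


# Made-up chat message, delivered in two pieces followed by a second copy
packet = pack_message('type@=chatmsg/nn@=alice/txt@=hello@Sworld/level@=3/\0')
bodies, rest = split_packets(packet[:10])             # incomplete so far
bodies, rest = split_packets(rest + packet[10:] + packet)
messages = [parse_stt(b) for b in bodies]
print(messages[0]['nn'], messages[0]['txt'])
```

In a receive loop, the leftover bytes returned by `split_packets` would be prepended to the next chunk read from the socket.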


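Result


The script also creates a file in the current directory, named after the streamer followed by the date and the suffix 直播弹幕. Each line of the file is one JSON object describing a single danmu.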
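Because the script writes one JSON object per line, the output can be loaded back line by line for simple analysis. The following is a minimal sketch, assuming the file is UTF-8 encoded and uses the keys written by the script (昵称 for the nickname, 弹幕内容 for the text). The function name `count_by_nickname`, the sample records and the sample file name are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def count_by_nickname(path):
    """Count how many danmu each nickname sent in a JSON Lines file."""
    counts = Counter()
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            counts[record['昵称']] += 1
    return counts


# Made-up sample data in the same shape the script writes
sample = [
    {'昵称': 'alice', '弹幕内容': 'hello'},
    {'昵称': 'bob', '弹幕内容': '666'},
    {'昵称': 'alice', '弹幕内容': 'nice play'},
]
path = os.path.join(tempfile.mkdtemp(), 'sample_danmu.jsonl')
with open(path, 'w', encoding='utf-8') as f:
    for record in sample:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

counts = count_by_nickname(path)
print(counts.most_common(1))
```

To analyze a real capture, pass the path of the file generated by the script instead of the sample file.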



 