I taught myself Python for a while and, with some free time, scraped "Journey to the West" to share with everyone

The script runs without problems as posted, but adding comments to it easily introduces errors.
[Python]

import time

import requests
from fake_useragent import UserAgent
from lxml import etree

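def get_html(url):
    # Send the request with a random User-Agent string.
    ua = UserAgent()
    headers = {'User-Agent': ua.random}  # the header name needs the hyphen
    response = requests.get(url, headers=headers, timeout=10)
    # Let requests guess the encoding from the body to avoid garbled text.
    response.encoding = response.apparent_encoding
    return response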

def be_tree(url):
    # Fetch the page and parse it into an lxml element tree.
    r = get_html(url)
    tree = etree.HTML(r.text)
    return tree

def get_mulu_lists(mulu_url):
    # "mulu" is the table of contents: return the book title,
    # the chapter titles, and the chapter links.
    tree = be_tree(mulu_url)
    novel_name = tree.xpath('//h1/span[1]/b/text()')[0]
    cha_urls = tree.xpath('//ul/span/a/@href')
    titles = tree.xpath('//ul/span/a/text()')
    return novel_name, titles, cha_urls

def down_onechapter(novel_name, down_url):
    # Parse the chapter page passed in as down_url.
    tree = be_tree(down_url)
    datas = tree.xpath('//div[1]/div/p/text()')
    # Open the file once per chapter and append every paragraph.
    with open(f'./{novel_name}.txt', 'a', encoding='utf-8') as f:
        for data in datas:
            f.write(data)
        # Write two newlines so the chapters are visually separated.
        f.write('\n\n')
    print('Chapter downloaded')

if __name__ == '__main__':
    start = time.time()
    # Table of contents for Journey to the West; swap in another book's link to download that book instead.
    url = 'https://so.gushiwen.cn/guwen/book_46653FD803893E4FBF8761BEF60CD7D9.aspx'
    base_url = url.split('/guwen')[0]
    novel_name, titles, cha_urls = get_mulu_lists(url)
    for title, cha_url in zip(titles, cha_urls):
        dow_url = base_url + cha_url
        print(title, dow_url)
        with open(f'./{novel_name}.txt', 'a', encoding='utf-8') as f:
            f.write(title)
            f.write('\n')
        down_onechapter(novel_name, dow_url)
    # Report completion once, after every chapter has been written.
    print('Whole book downloaded')
    end = time.time()
    use_time = int(end - start)
    print(f'Download took {use_time} seconds')
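
The script builds each chapter link by cutting the table-of-contents URL at '/guwen' and prepending the result to the scraped href, which only works if every href is a root-relative path. As a rough sketch of a more forgiving alternative, the standard library's urljoin can resolve the href against the page it was found on. The helper name to_absolute and the sample hrefs below are made up for illustration; the real hrefs come from the site and may look different.

```python
from urllib.parse import urljoin

# Table-of-contents URL taken from the script above.
mulu_url = 'https://so.gushiwen.cn/guwen/book_46653FD803893E4FBF8761BEF60CD7D9.aspx'


def to_absolute(page_url, href):
    """Resolve an href against the URL of the page it was scraped from."""
    return urljoin(page_url, href)


# Hypothetical hrefs, for illustration only.
print(to_absolute(mulu_url, '/guwen/bookv_1.aspx'))         # root-relative path
print(to_absolute(mulu_url, 'bookv_1.aspx'))                # path relative to the page
print(to_absolute(mulu_url, 'https://example.com/a.aspx'))  # already absolute
```

In the main loop this would stand in for base_url + cha_url, for example dow_url = to_absolute(url, cha_url), assuming the rest of the script stays as it is.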
