kWordSentry: A Python Tool That Watches Web Pages for Keyword Changes and Reminds You to Grab Tickets

休止千鹤 | 23/06/2022

kWordSentry

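This is a simple script that checks web pages on a schedule for keyword changes and notifies you when one happens. You can use it to get an email or WeChat reminder when tickets or goods become available.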

GitHub: kWordSentry

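Design

I wanted to buy something for my partner. Plenty of websites list it, but it is in such demand that it is never actually in stock. Even when it does come back, by the time I hear about it and check, the page says "Out of stock" again.

So I spent two hours writing this script.

I need to deploy the script on a server, and it is not very complicated, so Python is an easy fit for the job.

My idea is to check each page directly for "Out of stock" or a similar string that only appears while the item is unavailable. The wording may differ a little between sites, which is fine because we can match it with a regular expression. Conveniently, this information is easy to obtain: a single HTTP GET request is enough.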
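To make the matching idea concrete, here is a minimal sketch. The sample HTML strings and the helper name marker_present are made up for illustration, and the case-insensitive flag is an extra that the script itself does not use.

```python
import re

def marker_present(html, pattern):
    """Return True while the out-of-stock marker still matches the page."""
    return re.search(pattern, html, flags=re.IGNORECASE) is not None

# Hypothetical page fragments, not taken from any real shop.
sold_out_page = '<span class="stock">Out of Stock</span>'
restocked_page = '<button>Add to cart</button>'

# One pattern can tolerate small wording differences between sites.
pattern = r'out\s+of\s+stock|sold\s+out'

print(marker_present(sold_out_page, pattern))   # True
print(marker_present(restocked_page, pattern))  # False
```

As long as the marker matches, nothing has changed; the interesting moment is when it stops matching.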

I like to use the requests library for this:

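def getContent(url):
    """Fetch a page and return its HTML, or '' if anything goes wrong."""
    try:
        # requests has no default timeout; 10 s keeps a stalled server from hanging the script.
        r = requests.get(url, headers=cfg.HEADER, timeout=10)
        r.raise_for_status()
        return r.text
    except Exception as e:
        # Log the problem, but never let it stop the script.
        logging.warning('Something went wrong: {0}'.format(e))
        return ''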

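This gives us the HTML text behind the URL. Errors are logged, but they must not interrupt the script. The headers mainly cover the User-Agent, language and so on, to mimic a browser as closely as possible. I put them in cfg.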
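As an illustration of what such a header set might look like, here is a sketch of the HEADER entry in config.py. The User-Agent string and language preference are placeholder values chosen for the example, not the settings from the repository.

```python
# config.py (sketch): values are placeholders, not the real settings.

# Headers that make the request look like it comes from an ordinary browser.
HEADER = {
    'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'),
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept': 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
}

# Header names and values must all be strings.
print(all(isinstance(k, str) and isinstance(v, str) for k, v in HEADER.items()))
```

Copying the headers your own browser sends (visible in its developer tools) is usually the simplest way to fill this in.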

Next we need a function that checks whether the content matches the regular expression.

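def checkContent(url, content, kWords):
    # An empty string means the fetch failed, so there is nothing to check.
    if content == '':
        return
    # kWords is already a regex pattern; no match means the marker is gone.
    if re.search(kWords, content) is None:
        logging.info('!!!kWord not found, page might be changed!!!')
        logging.info("!!!@URL: %s" % (url,))
        alarm.trigger(url)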

If the out-of-stock marker is not matched, the alarm is triggered to notify me.
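This behaviour can be tried offline by swapping the real alarm for a stub. The sketch below re-implements the same check with the alarm passed in as a parameter; the on_missing argument, the example URLs and the sample page text are assumptions made for the demo, not part of the script.

```python
import re

def check_content(url, content, pattern, on_missing):
    """Call on_missing(url) when the out-of-stock pattern no longer matches."""
    if content == '':
        return  # fetch failed; stay silent rather than raise a false alarm
    if re.search(pattern, content) is None:
        on_missing(url)

fired = []  # the stub alarm just records which URLs triggered

check_content('https://example.com/a', 'Sorry, out of stock.', r'out of stock', fired.append)
check_content('https://example.com/b', 'In stock: add to cart', r'out of stock', fired.append)
check_content('https://example.com/c', '', r'out of stock', fired.append)

print(fired)  # ['https://example.com/b']
```

Only the page whose marker disappeared fires the alarm; the failed fetch is ignored.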

We now need to combine these two functions and visit the sites in the list on a schedule.

urllist = {'https://xxx/xxx.html':'out of stock.'} # defined in a separate file

def loop():
    while True:
        logging.info('Checking...Starting a loop')
        for url in urllist:
            content = getContent(url)
            checkContent(url, content, urllist[url])
        logging.info("Done. Goto sleep for %ssec." % (cfg.DURATION))
        sleep(cfg.DURATION)

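We read each URL and its keyword from the URL list and use an infinite loop to keep checking.
At the end of each pass, sleep suspends the script for a while.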
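Because urllist is a plain dict that maps each URL to its own pattern, one file can watch several pages with different markers. Here is a sketch of what urllist.py might contain, with invented URLs and patterns, followed by a single pass of the loop that uses a fake fetcher so it runs without network access. The names run_once and fake_html are hypothetical.

```python
import re

# urllist.py (sketch): the URLs and patterns below are invented examples.
urllist = {
    'https://shop-a.example/item/1': r'[Oo]ut of stock',
    'https://shop-b.example/item/2': r'Sold out|Notify me',
}

def run_once(pages, fetch, check):
    """One pass of the loop: fetch every URL and hand the result to check."""
    for url, pattern in pages.items():
        check(url, fetch(url), pattern)

# Fake fetcher standing in for getContent, so the demo runs offline.
fake_html = {
    'https://shop-a.example/item/1': 'Currently out of stock',
    'https://shop-b.example/item/2': 'Add to basket',
}

missing = []

def check(url, html, pattern):
    if re.search(pattern, html) is None:
        missing.append(url)

run_once(urllist, fake_html.get, check)
print(missing)  # ['https://shop-b.example/item/2']
```

Iterating with .items() yields the URL and its pattern together, which avoids the second lookup urllist[url].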

So the whole thing looks roughly like this:

import requests
import logging
from time import sleep
import re
import config as cfg
from urllist import urllist
import alarm

logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')

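def getContent(url):
    """Fetch a page and return its HTML, or '' if anything goes wrong."""
    try:
        # requests has no default timeout; 10 s keeps a stalled server from hanging the script.
        r = requests.get(url, headers=cfg.HEADER, timeout=10)
        r.raise_for_status()
        return r.text
    except Exception as e:
        # Log the problem, but never let it stop the script.
        logging.warning('Something went wrong: {0}'.format(e))
        return ''

def checkContent(url, content, kWords):
    # An empty string means the fetch failed, so there is nothing to check.
    if content == '':
        return
    # kWords is already a regex pattern; no match means the marker is gone.
    if re.search(kWords, content) is None:
        logging.info('!!!kWord not found, page might be changed!!!')
        logging.info("!!!@URL: %s" % (url,))
        alarm.trigger(url)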


def loop():
    while True:
        logging.info('Checking...Starting a loop')
        for url in urllist:
            content = getContent(url)
            checkContent(url, content, urllist[url])
        logging.info("Done. Goto sleep for %ssec." % (cfg.DURATION))
        sleep(cfg.DURATION)


if __name__ == "__main__":
    print(r'''
  _   __        __            _ ____             _              
 | | _\ \      / /__  _ __ __| / ___|  ___ _ __ | |_ _ __ _   _ 
 | |/ /\ \ /\ / / _ \| '__/ _` \___ \ / _ \ '_ \| __| '__| | | |
 |   <  \ V  V / (_) | | | (_| |___) |  __/ | | | |_| |  | |_| |
 |_|\_\  \_/\_/ \___/|_|  \__,_|____/ \___|_| |_|\__|_|   \__, |
                                                          |___/ 
        ''')
    logging.info('Starting sentry duty...')
    loop()

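Loading the alarms

I have written two alert modules: one sends email, the other pushes to WeChat through ServerChan (Server酱).

The code above calls the alarm module. In the config file, ALARM can be either a string or a list. We load each module dynamically by its module name (the .py file name) and then call its alert function, which every module provides.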

# alarm.py
from config import ALARM
import importlib

def trigger(url):
    # ALARM may name one module or several; each module must expose alert(url).
    if isinstance(ALARM, list):
        for a in ALARM:
            alarm = importlib.import_module(a)
            alarm.alert(url)
    else:
        alarm = importlib.import_module(ALARM)
        alarm.alert(url)
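To watch the dynamic loading work without real alarm modules, the sketch below registers two throwaway modules in sys.modules and dispatches to them by name. The module names fake_mail and fake_wechat and the helper load_and_alert are invented for this demo; importlib.import_module is the same call the script uses.

```python
import importlib
import sys
import types

calls = []

def make_fake_module(name):
    """Build an in-memory module exposing alert(url), like a real alarm module."""
    mod = types.ModuleType(name)
    mod.alert = lambda url: calls.append((name, url))
    sys.modules[name] = mod  # import_module finds it here, so no file is needed

def load_and_alert(alarm_setting, url):
    # Normalise "one name" and "list of names" to the same shape.
    names = alarm_setting if isinstance(alarm_setting, list) else [alarm_setting]
    for name in names:
        importlib.import_module(name).alert(url)

make_fake_module('fake_mail')
make_fake_module('fake_wechat')

load_and_alert('fake_mail', 'https://example.com/x')
load_and_alert(['fake_mail', 'fake_wechat'], 'https://example.com/y')
print(calls)
```

Wrapping a single name in a list first removes the need for two separate branches, at the cost of one extra line.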

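Sending email

Reference: https://docs.python.org/zh-cn/3/library/smtplib.html

I use smtplib here. First we build the message itself.

Note that for mailboxes such as QQ Mail or 163 you have to apply for SMTP access; they will then tell you the port, the encryption mode and a special password. And yes, that is not your login password.

For Gmail you need to turn on two-step verification, create an app-specific password, and use that password to send mail here.

An email is a MIME-formatted object. We load the class with from email.mime.text import MIMEText and then build the message. The settings are loaded from the external config file.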

import logging
from email.mime.text import MIMEText
from config import MAIL

def makeMsg(url,recv):
    msg = '''
Hello:

kWordSentry found a page might be changed:
{0}

kWordSentry
    '''.format(url)
    mail = MIMEText(msg,'plain','utf-8')
    mail['From']='''"{0}" <{1}>'''.format(MAIL['USER_NICKNAME'], MAIL['USER_ADDR'])
    mail['To']=recv
    mail['Subject']='[kWordSentry] Page changes'
    logging.debug(mail.as_string())
    return mail

This gives us a MIME mail object.
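It can help to look at what MIMEText actually produces. This sketch builds a similar message and reads it back. It uses email.utils.formataddr from the standard library as an alternative way to build the From header, and all addresses are placeholders.

```python
from email.mime.text import MIMEText
from email.utils import formataddr

def make_msg(url, sender_name, sender_addr, recv):
    body = 'kWordSentry found a page might be changed:\n{0}\n'.format(url)
    mail = MIMEText(body, 'plain', 'utf-8')
    # formataddr quotes or encodes the display name when that is needed.
    mail['From'] = formataddr((sender_name, sender_addr))
    mail['To'] = recv
    mail['Subject'] = '[kWordSentry] Page changes'
    return mail

mail = make_msg('https://example.com/item', 'kWordSentry',
                'sentry@example.com', 'you@example.com')

print(mail['From'])             # kWordSentry <sentry@example.com>
print(mail.get_content_type())  # text/plain
# With the utf-8 charset the body is base64-encoded; decode it to read it back.
print(mail.get_payload(decode=True).decode('utf-8'))
```

Printing mail.as_string() shows the full message as it would be handed to the SMTP server, which is handy when a provider rejects a mail.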

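Next we need to send the mail. To work with more kinds of mailboxes, the config file lets you choose between an SSL connection and a TLS (STARTTLS) connection, so you need to set it to match your provider.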

def sendmail(url):
    for recv in MAIL['RECV_ADDR']:
        logging.info("Sending mail to %s" % (recv,))
        try:
            mail = makeMsg(url, recv)
            if MAIL['SSL']:
                s = smtplib.SMTP_SSL(MAIL['HOST'], MAIL['PORT'])
            else:
                s = smtplib.SMTP(MAIL['HOST'], MAIL['PORT'])
                if MAIL['TLS']:
                    s.starttls()
            s.login(MAIL['USER_ADDR'], MAIL['USER_PASS'])
            s.sendmail(MAIL['USER_ADDR'], recv, mail.as_string())
            s.quit()
        except Exception as e:
            logging.warning("Error: %s" % (e,))
    logging.info('Done.')
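The two connection styles usually use different ports: implicit SSL is commonly offered on 465 and STARTTLS on 587, though your provider's documentation is the authority. The sketch below shows two hypothetical MAIL settings and a connect helper with the same branching as sendmail. The helper takes the SMTP classes as parameters so it can be exercised with fakes instead of a real server; the host name, the helper and the fake classes are all assumptions.

```python
import smtplib

# Hypothetical settings; check your provider for the real host and port.
MAIL_SSL = {'HOST': 'smtp.example.com', 'PORT': 465, 'SSL': True, 'TLS': False}
MAIL_STARTTLS = {'HOST': 'smtp.example.com', 'PORT': 587, 'SSL': False, 'TLS': True}

def connect(cfg, smtp_cls=smtplib.SMTP, smtp_ssl_cls=smtplib.SMTP_SSL):
    """Open a connection as sendmail does: SSL, or plain with optional STARTTLS."""
    if cfg['SSL']:
        return smtp_ssl_cls(cfg['HOST'], cfg['PORT'])
    s = smtp_cls(cfg['HOST'], cfg['PORT'])
    if cfg['TLS']:
        s.starttls()
    return s

class FakeSMTP:
    """Stand-in that records what happened instead of touching the network."""
    def __init__(self, host, port):
        self.log = ['connect {0}:{1}'.format(host, port)]

    def starttls(self):
        self.log.append('starttls')

class FakeSMTPSSL(FakeSMTP):
    def __init__(self, host, port):
        super().__init__(host, port)
        self.log.insert(0, 'ssl')

print(connect(MAIL_SSL, FakeSMTP, FakeSMTPSSL).log)
print(connect(MAIL_STARTTLS, FakeSMTP, FakeSMTPSSL).log)
```

With the SSL flag set, the TLS flag is never consulted, which mirrors the if/else structure in sendmail.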

Here is the whole module. alert() is kept as the entry point that the alarm module calls; test() is there for testing.

import logging
import smtplib
from config import MAIL
from email.mime.text import MIMEText

def alert(url):
    logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendmail(url)

def makeMsg(url,recv):
    msg = '''
Hello:

kWordSentry found a page might be changed:
{0}

kWordSentry
    '''.format(url)
    mail = MIMEText(msg,'plain','utf-8')
    mail['From']='''"{0}" <{1}>'''.format(MAIL['USER_NICKNAME'], MAIL['USER_ADDR'])
    mail['To']=recv
    mail['Subject']='[kWordSentry] Page changes'
    logging.debug(mail.as_string())
    return mail

def sendmail(url):
    for recv in MAIL['RECV_ADDR']:
        logging.info("Sending mail to %s" % (recv,))
        try:
            mail = makeMsg(url, recv)
            if MAIL['SSL']:
                s = smtplib.SMTP_SSL(MAIL['HOST'], MAIL['PORT'])
            else:
                s = smtplib.SMTP(MAIL['HOST'], MAIL['PORT'])
                if MAIL['TLS']:
                    s.starttls()
            s.login(MAIL['USER_ADDR'], MAIL['USER_PASS'])
            s.sendmail(MAIL['USER_ADDR'], recv, mail.as_string())
            s.quit()
        except Exception as e:
            logging.warning("Error: %s" % (e,))
    logging.info('Done.')

def test():
    logging.basicConfig(level=logging.DEBUG, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendmail('http://<test-parameter>')

if __name__ == "__main__":
    print('TESTING...')
    test()

Sending to WeChat

This part uses the ServerChan (Server酱) service; thanks to them.
https://sct.ftqq.com/

import requests
import logging
import urllib.parse
from config import APIKEY

def alert(url):
    logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendMsg(url)

def test():
    logging.basicConfig(level=logging.DEBUG, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendMsg('http://<test-parameter>')

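def sendMsg(url):
    # URL-encode both fields so special characters survive in the query string.
    title = urllib.parse.quote_plus('[kWordSentry]')
    msg = "kWordSentry noticed that a keyword disappeared: {0}".format(url)
    msg = urllib.parse.quote_plus(msg)
    api = "https://sctapi.ftqq.com/{0}.send?title={1}&desp={2}".format(APIKEY, title, msg)
    try:
        r = requests.get(api, timeout=10)
        # Treat HTTP error codes as failures too, not just connection errors.
        r.raise_for_status()
    except Exception as e:
        logging.warning('Failed to send via ServerChan: {0}'.format(e))
        return
    logging.info('Done.')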

if __name__ == "__main__":
    print('TESTING...')
    test()
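An alternative to formatting the query string by hand is to let urllib.parse.urlencode encode every field. The sketch below only builds the URL and sends nothing. The key is a dummy and build_url is a name chosen for the example; the endpoint pattern is the one used in the module above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_url(apikey, title, desp):
    """Return the ServerChan send URL with both fields percent-encoded."""
    query = urlencode({'title': title, 'desp': desp})
    return 'https://sctapi.ftqq.com/{0}.send?{1}'.format(apikey, query)

url = build_url('DUMMY_KEY', '[kWordSentry]',
                'keyword gone: https://example.com/a?b=1&c=2')
print(url)

# Decoding the query again gives back the original text, '&' and '=' included.
print(parse_qs(urlsplit(url).query)['desp'][0])
```

When requests is already in use, passing the same dict as params= to requests.get has the same effect.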
