休止千鹤 | I am still just an ordinary student
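This is a simple script that periodically checks web pages for a keyword and notifies you when it changes. You can use it to get email or WeChat alerts when tickets or goods you are waiting for become available.
GitHub: kWordSentry
There are things I want to buy for my partner. Plenty of sites list them, but they are in such demand that they are never in stock. Even when stock does appear, by the time I hear about it and check, the page only says Out of stock.
So I spent two hours writing this script.
I need to deploy the script on a server, and it is not complicated, so Python is an easy choice for the job.
My approach is to check whether each page contains Out of stock, or a similar string that appears only when the item is unavailable. The string may vary a little, because we can match it with a regular expression. Conveniently, this information is easy to get: a single HTTP GET is enough.
I like to use the requests library: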
def getContent(url):
    try:
        r = requests.get(url, headers=cfg.HEADER)
        r.raise_for_status()
        return r.text
    except Exception as e:
        logging.warning('Something went wrong: {0}'.format(e))
        return ''
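This gives us the HTML text behind the URL. If something goes wrong, the error is logged, but it must not stop the script. The headers mainly cover the User-Agent, the language and so on, to mimic a browser as closely as possible. I put them in cfg.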
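For illustration, a config.py along the following lines would satisfy the cfg.HEADER and cfg.DURATION lookups used in this post. The header values and the 300-second interval are assumptions made up for this sketch, not the values from the real config file.

```python
# config.py (hypothetical sketch; every value here is an illustrative assumption)

# Headers sent with each GET so the request looks like it came from a browser.
HEADER = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

# Seconds to sleep between two passes over the URL list.
DURATION = 300
```

A longer DURATION is kinder to the sites being watched; a shorter one notices restocks sooner.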
Next we need a function that checks whether the content matches the regular expression.
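def checkContent(url, content, kWords):
    # An empty string means the fetch failed, so there is nothing to check.
    if content == '':
        return
    # kWords is a regular expression that matches the out-of-stock marker.
    if re.search(kWords, content) is None:
        logging.info('!!!kWord not found, page might be changed!!!')
        logging.info("!!!@URL: %s" % (url,))
        alarm.trigger(url)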
If the out-of-stock marker is not matched, we trigger the alarm to notify me.
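One detail worth keeping in mind: the keyword is used as a regular expression, so a trailing "." matches any character, not only a full stop. The sketch below shows a literal pattern and a more tolerant one. The helper name keyword_present, the sample pages and the use of re.IGNORECASE are assumptions for illustration; the script itself matches case-sensitively.

```python
import re


def keyword_present(content, pattern):
    """Return True if the out-of-stock marker is found in the page text."""
    return re.search(pattern, content, flags=re.IGNORECASE) is not None


# Made-up page fragments standing in for real HTML.
page_sold_out = "<span class='status'>Out of Stock.</span>"
page_available = "<span class='status'>Add to cart</span>"

# re.escape makes every character literal, including the final '.'.
literal = re.escape("out of stock.")

# A looser pattern that tolerates extra whitespace and a second wording.
flexible = r"out\s+of\s+stock|sold\s+out"

print(keyword_present(page_sold_out, literal))
print(keyword_present(page_available, flexible))
```

When keyword_present returns False for a page that was fetched successfully, that is the moment the script would raise the alarm.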
We need to combine these two functions and visit the sites in the list on a schedule.
urllist = {'https://xxx/xxx.html':'out of stock.'} # defined in another file

def loop():
    while True:
        logging.info('Checking...Starting a loop')
        for url in urllist:
            content = getContent(url)
            checkContent(url, content, urllist[url])
        logging.info("Done. Goto sleep for %ssec." % (cfg.DURATION,))
        sleep(cfg.DURATION)
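We read each URL and its keyword from the URL list, and use an infinite loop to keep checking.
At the end of each pass we call sleep to pause for a while.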
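To see how the pieces fit together without touching the network, here is a sketch of a single pass with the fetch and notify steps passed in as parameters. The name check_once, the fake pages and the example URLs are hypothetical; the real script calls getContent and alarm.trigger directly.

```python
import re


def check_once(urllist, fetch, notify):
    """Run one pass over the URL list and return the URLs that raised an alert."""
    alerted = []
    for url, pattern in urllist.items():
        content = fetch(url)
        if content == '':
            # Fetch failed; skip this URL rather than raise a false alarm.
            continue
        if re.search(pattern, content) is None:
            notify(url)
            alerted.append(url)
    return alerted


# Fake pages keyed by URL, standing in for real HTTP responses.
fake_pages = {
    "https://example.com/a": "<p>out of stock.</p>",
    "https://example.com/b": "<p>Add to cart</p>",
    "https://example.com/c": "",  # simulates a failed request
}
watch = {url: r"out of stock" for url in fake_pages}
sent = []

result = check_once(watch, fake_pages.get, sent.append)
print(result)
```

Only the page that was fetched successfully and no longer shows the marker ends up in the result, which mirrors the behaviour of the loop above.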
So the whole thing looks roughly like this:
import requests
import logging
from time import sleep
import re
import config as cfg
from urllist import urllist
import alarm

logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')

def getContent(url):
    try:
        r = requests.get(url, headers=cfg.HEADER)
        r.raise_for_status()
        return r.text
    except Exception as e:
        logging.warning('Something went wrong: {0}'.format(e))
        return ''

def checkContent(url, content, kWords):
    # An empty string means the fetch failed, so there is nothing to check.
    if content == '':
        return
    # kWords is a regular expression that matches the out-of-stock marker.
    if re.search(kWords, content) is None:
        logging.info('!!!kWord not found, page might be changed!!!')
        logging.info("!!!@URL: %s" % (url,))
        alarm.trigger(url)

def loop():
    while True:
        logging.info('Checking...Starting a loop')
        for url in urllist:
            content = getContent(url)
            checkContent(url, content, urllist[url])
        logging.info("Done. Goto sleep for %ssec." % (cfg.DURATION,))
        sleep(cfg.DURATION)

if __name__ == "__main__":
    # Raw string, so the backslashes in the banner are not read as escapes.
    print(r'''
_ __ __ _ ____ _
| | _\ \ / /__ _ __ __| / ___| ___ _ __ | |_ _ __ _ _
| |/ /\ \ /\ / / _ \| '__/ _` \___ \ / _ \ '_ \| __| '__| | | |
| < \ V V / (_) | | | (_| |___) | __/ | | | |_| | | |_| |
|_|\_\ \_/\_/ \___/|_| \__,_|____/ \___|_| |_|\__|_| \__, |
|___/
''')
    logging.info('Starting sentry duty...')
    loop()
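I have written two notification modules so far: one sends email, and one pushes to WeChat through ServerChan (Server酱).
In the code above we called the alarm module. ALARM in the config file can be either a string or a list. Here we load each module dynamically by its name (the .py file name) and then call its alert function, which every notification module provides.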
# alarm.py
from config import ALARM
import importlib

def trigger(url):
    if isinstance(ALARM, list):
        for a in ALARM:
            alarm = importlib.import_module(a)
            alarm.alert(url)
    else:
        alarm = importlib.import_module(ALARM)
        alarm.alert(url)
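The dynamic loading can be tried in isolation. In the sketch below, a throwaway module named fake_alarm is registered in sys.modules so that importlib.import_module can find it without a file on disk. The module name, its alert function body and the alarm_names parameter are all hypothetical; the real trigger reads ALARM from the config file instead.

```python
import importlib
import sys
import types


def trigger(url, alarm_names):
    """Call alert(url) on each named module; accepts one name or a list of names."""
    if isinstance(alarm_names, str):
        alarm_names = [alarm_names]
    results = []
    for name in alarm_names:
        module = importlib.import_module(name)
        results.append(module.alert(url))
    return results


# Build a stand-in notification module in memory (hypothetical, for the demo only).
fake_alarm = types.ModuleType("fake_alarm")
fake_alarm.sent = []


def _alert(url):
    # Record the URL instead of sending a real notification.
    fake_alarm.sent.append(url)
    return "fake_alarm notified about " + url


fake_alarm.alert = _alert
sys.modules["fake_alarm"] = fake_alarm

print(trigger("https://example.com/item", "fake_alarm"))
print(trigger("https://example.com/item", ["fake_alarm", "fake_alarm"]))
```

Normalising a single name into a one-element list keeps the string and list cases on the same code path, so a new notification channel only needs a module with an alert(url) function.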
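Reference: https://docs.python.org/zh-cn/3/library/smtplib.html
Here I use smtplib. First we build the email itself.
Note that with mailboxes such as QQ or 163 you have to apply for SMTP authorization. They will tell you the port, the encryption mode and your special password. Yes, that is not your login password.
With Gmail you have to turn on two-step verification, create an app password for this script, and then send mail here with that app password.
An email is a MIME-formatted object. We bring in the class with from email.mime.text import MIMEText and then construct the message. The configuration is loaded from an external file.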
from config import MAIL

def makeMsg(url,recv):
    msg = '''
Hello:
kWordSentry found a page might be changed:
{0}
kWordSentry
'''.format(url)
    mail = MIMEText(msg,'plain','utf-8')
    mail['From']='''"{0}" <{1}>'''.format(MAIL['USER_NICKNAME'], MAIL['USER_ADDR'])
    mail['To']=recv
    mail['Subject']='[kWordSentry] Page changes'
    logging.debug(mail.as_string())
    return mail
This gives us a MIME mail object.
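To check what such an object contains without sending anything, a message can be built and inspected locally. This is a minimal sketch: the function name make_message, the addresses and the nickname are placeholders, and email.utils.formataddr is used here as one possible way to format the From header, not the way the script does it.

```python
from email.mime.text import MIMEText
from email.utils import formataddr


def make_message(url, sender_name, sender_addr, recv):
    """Build a plain-text UTF-8 notification mail for one recipient."""
    body = "Hello:\nkWordSentry found a page might be changed:\n{0}\nkWordSentry\n".format(url)
    mail = MIMEText(body, 'plain', 'utf-8')
    # formataddr quotes the display name when it contains special characters.
    mail['From'] = formataddr((sender_name, sender_addr))
    mail['To'] = recv
    mail['Subject'] = '[kWordSentry] Page changes'
    return mail


mail = make_message(
    "https://example.com/item",   # placeholder URL
    "kWordSentry",                # placeholder nickname
    "sentry@example.com",         # placeholder sender
    "buyer@example.com",          # placeholder recipient
)

# The body is stored encoded for transport; decode it to read it back.
print(mail['Subject'])
print(mail.get_payload(decode=True).decode('utf-8'))
```

Calling mail.as_string() on the result produces the text that is later handed to sendmail.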
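Then we need to send the mail. To support more kinds of mailboxes, the configuration lets the script work with servers that expect either an SSL connection from the start (SMTP_SSL) or a plain connection upgraded with STARTTLS. You will of course have to edit the config file to match your provider.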
def sendmail(url):
    for recv in MAIL['RECV_ADDR']:
        logging.info("Sending mail to %s" % (recv,))
        try:
            mail = makeMsg(url, recv)
            if MAIL['SSL']:
                s = smtplib.SMTP_SSL(MAIL['HOST'], MAIL['PORT'])
            else:
                s = smtplib.SMTP(MAIL['HOST'], MAIL['PORT'])
                if MAIL['TLS']:
                    s.starttls()
            s.login(MAIL['USER_ADDR'], MAIL['USER_PASS'])
            s.sendmail(MAIL['USER_ADDR'], recv, mail.as_string())
            s.quit()
        except Exception as e:
            logging.warning("Error: %s" % (e,))
    logging.info('Done.')
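The SSL and TLS flags can be confusing, so here is a sketch of how two typical configurations might look. Port 465 is commonly used for SSL from the start and port 587 for STARTTLS, but the host, the ports your provider actually uses, the helper names transport_name and connect, and the 10-second timeout are all assumptions for illustration.

```python
import smtplib

# Hypothetical settings for a provider that expects SSL from the first byte.
MAIL_IMPLICIT_SSL = {"HOST": "smtp.example.com", "PORT": 465, "SSL": True, "TLS": False}

# Hypothetical settings for a provider that upgrades a plain connection.
MAIL_STARTTLS = {"HOST": "smtp.example.com", "PORT": 587, "SSL": False, "TLS": True}


def transport_name(mail_cfg):
    """Describe which kind of connection the flags select."""
    if mail_cfg["SSL"]:
        return "implicit SSL"
    if mail_cfg["TLS"]:
        return "STARTTLS"
    return "plaintext"


def connect(mail_cfg, timeout=10):
    """Open an SMTP connection according to the flags, without logging in."""
    if mail_cfg["SSL"]:
        return smtplib.SMTP_SSL(mail_cfg["HOST"], mail_cfg["PORT"], timeout=timeout)
    server = smtplib.SMTP(mail_cfg["HOST"], mail_cfg["PORT"], timeout=timeout)
    if mail_cfg["TLS"]:
        server.starttls()
    return server


print(transport_name(MAIL_IMPLICIT_SSL))
print(transport_name(MAIL_STARTTLS))
```

The connect function is only defined here, not called, because smtp.example.com is a placeholder host. With real settings, the returned object is what login and sendmail are called on.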
Here is the whole module. The alert() function is kept as the entry point that the alarm module calls, and test() is there for testing.
import logging
import smtplib
from config import MAIL
from email.mime.text import MIMEText

def alert(url):
    logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendmail(url)

def makeMsg(url,recv):
    msg = '''
Hello:
kWordSentry found a page might be changed:
{0}
kWordSentry
'''.format(url)
    mail = MIMEText(msg,'plain','utf-8')
    mail['From']='''"{0}" <{1}>'''.format(MAIL['USER_NICKNAME'], MAIL['USER_ADDR'])
    mail['To']=recv
    mail['Subject']='[kWordSentry] Page changes'
    logging.debug(mail.as_string())
    return mail

def sendmail(url):
    for recv in MAIL['RECV_ADDR']:
        logging.info("Sending mail to %s" % (recv,))
        try:
            mail = makeMsg(url, recv)
            if MAIL['SSL']:
                s = smtplib.SMTP_SSL(MAIL['HOST'], MAIL['PORT'])
            else:
                s = smtplib.SMTP(MAIL['HOST'], MAIL['PORT'])
                if MAIL['TLS']:
                    s.starttls()
            s.login(MAIL['USER_ADDR'], MAIL['USER_PASS'])
            s.sendmail(MAIL['USER_ADDR'], recv, mail.as_string())
            s.quit()
        except Exception as e:
            logging.warning("Error: %s" % (e,))
    logging.info('Done.')

def test():
    logging.basicConfig(level=logging.DEBUG, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendmail('http://<test-parameter>')

if __name__ == "__main__":
    print('TESTING...')
    test()
For WeChat we use the ServerChan (Server酱) service. Thanks to them.
https://sct.ftqq.com/
import requests
import logging
import urllib.parse
from config import APIKEY

def alert(url):
    logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendMsg(url)

def test():
    logging.basicConfig(level=logging.DEBUG, format='%(levelname)s - %(asctime)s - %(name)s - %(message)s')
    sendMsg('http://<test-parameter>')

def sendMsg(url):
    # Both query values are URL-encoded; the title contains square brackets.
    title = urllib.parse.quote_plus('[kWordSentry]')
    msg = "kWordSentry noticed that a keyword disappeared: {0}".format(url)
    msg = urllib.parse.quote_plus(msg)
    api = "https://sctapi.ftqq.com/{0}.send?title={1}&desp={2}".format(APIKEY, title, msg)
    try:
        r = requests.get(api)
        # Treat HTTP error statuses as failures too, not only connection errors.
        r.raise_for_status()
    except Exception as e:
        logging.warning('Failed to connect to ServerChan: {0}'.format(e))
        return
    logging.info('Done.')

if __name__ == "__main__":
    print('TESTING...')
    test()
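The request URL can be checked without calling the service. The sketch below builds it with urllib.parse.urlencode, which encodes every query value in one step. The function name build_send_url and the key SCT_FAKE_KEY are placeholders; the endpoint layout is the one used in the code above.

```python
from urllib.parse import urlencode


def build_send_url(api_key, title, desp):
    """Return the ServerChan send URL with both query values URL-encoded."""
    query = urlencode({"title": title, "desp": desp})
    return "https://sctapi.ftqq.com/{0}.send?{1}".format(api_key, query)


# SCT_FAKE_KEY is a placeholder, not a real key.
url = build_send_url("SCT_FAKE_KEY", "[kWordSentry]", "a b")
print(url)
# prints https://sctapi.ftqq.com/SCT_FAKE_KEY.send?title=%5BkWordSentry%5D&desp=a+b
```

Letting urlencode handle the whole query avoids forgetting to quote one of the values, such as the square brackets in the title.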