I've recently started looking at the awesome flexget and thought it would solve a problem with my son's school newsletter.
Like most schools, the one our son attends has a weekly newsletter to inform parents of upcoming activities and events. When he started, the school gave us the option of either having a physical printout sent home with our child, or downloading the pdf from their website.
We opted out of the printed newsletter and were happy to check the website for the pdf version. As we're both busy and sometimes forget to check the website, we'd occasionally miss things.
I wanted to download the pdf automatically when it appeared on the website and email it to us so we wouldn't have to remember to check for the latest version.
Initially I used wget called from cron to check the website like this:
```
/usr/bin/wget -r -l1 -N --no-verbose --continue --no-parent \
  --no-directories --no-host-directories --reject html,htm,txt \
  --accept .pdf -o /var/log/newsletters.log \
  --directory-prefix=/srv/samba/newsletters \
  http://www.quakershie-p.schools.nsw.edu.au/newsletters
```
I used the `--continue` flag so that it wouldn't download the same pdfs over and over. Even taking this into consideration, this method still felt like a brute force approach.
(I won't go into it here, but I then use incron to watch for changes to `/srv/samba/newsletters`, which calls another script that emails the file as an attachment.)
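For the curious, a minimal sketch of what such an incrontab entry could look like; the script path `/usr/local/bin/mail-newsletter.sh` is hypothetical, but `$@` (watched directory) and `$#` (file name) are standard incron placeholders:

```
# hypothetical incrontab entry: when a file finishes being written in
# the newsletters directory, hand its full path to a mail script
/srv/samba/newsletters IN_CLOSE_WRITE /usr/local/bin/mail-newsletter.sh $@/$#
```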
I like how flexget remembers what it has seen in a database and won't download the same file again. I thought this would solve the problem very elegantly, and as I couldn't find much info about fetching files from a URL automatically, I thought I'd share my config here so others looking to do the same could benefit.
```
presets:
  global:
    free_space:
      path: /srv/samba/newsletters
      space: 1 # make sure there's Xgb free before downloading more
    domain_delay:
      www.quakershie-p.schools.nsw.edu.au: 10 seconds
    email:
      active: True
      from: davidmarsh
      to:
        - email@example.com

feeds:
  newsletter:
    interval: 6 hours
    html:
      url: http://www.quakershie-p.schools.nsw.edu.au/newsletters/
      title_from: link
    regexp:
      accept:
        - quakers_whisper*
      rest: reject
    download: /srv/samba/newsletters
```
- In the global section:
  - Check there's 1gb free on `/srv/samba/newsletters` (which is on the same disk as /)
  - Wait 10 seconds between checks of www.quakershie-p.schools.nsw.edu.au (even though there's only one check, I wanted this here in case I add more later)
  - Email me if it downloads something
- In the feeds section:
  - Wait 6 hours between checks (if called sooner it will not run the check)
  - Use the url for checks
  - Name the downloaded files from their link title
  - Accept any file starting with "quakers_whisper" (the name of the newsletter)
  - Reject any other links it finds that don't match the above
  - Download the matching links to the `/srv/samba/newsletters` directory
I call it from cron with this command:
```
/usr/local/bin/flexget --cron -c /home/davidmarsh/.flexget/newsletter.yml
```
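For example, a crontab entry like this would work; the every-15-minutes schedule is just an illustration, since flexget's own interval decides whether a check actually runs:

```
# run flexget every 15 minutes; the interval option in the config
# means the website is still only hit once every 6 hours
*/15 * * * * /usr/local/bin/flexget --cron -c /home/davidmarsh/.flexget/newsletter.yml
```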
It doesn't really matter how often it runs, as it will only actually hit the website every 6 hours due to the `interval: 6 hours` option in the config.
(Like before, I'm still using incron to call scripts that email the files.)
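As a rough idea of that emailing step, here's a minimal sketch of the kind of script incron could call; the script name, subject line, and use of mutt are assumptions for illustration, not necessarily what I run:

```
#!/bin/sh
# hypothetical mail-newsletter.sh: incron passes the new pdf's path as $1
FILE="$1"
# send the pdf as an attachment; assumes mutt is installed and configured
mutt -s "New school newsletter" -a "$FILE" -- email@example.com < /dev/null
```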
Now we get an email with the newsletter attached within 6 hours[^1] of a new newsletter appearing on the school's website.
[^1]: Of course I could make it more frequent, but 6 hours seemed good enough.