Thursday, January 19, 2012

Email reminder for new TV Series episodes using Google Apps script

For a long time, I wanted a service that emails me a reminder when a new episode of a TV series is about to air. As I couldn't find one, I sat down and wrote one for myself using Google Apps Script.

Why did I choose Google Apps Script?
  • I could use an RSS feed, but it also contains info about episodes I am not interested in. Filtering the feed and mailing myself only the episodes I want is possible, but the script would need a computer to run on, so it wouldn't work whenever my desktop/laptop is switched off.
  • I don't own a VPS, so I cannot run the script there either. This script runs on Google's servers.
  • I wanted to get started with JavaScript. (So my code may look crappy to an experienced JavaScript programmer.)
Basically, I use macros in a Google Docs spreadsheet to do the work.
The macros read the series names from the cells in the first column (column A) and then scrape the data from tvrage.com. The other cells are updated accordingly.

Video of the script in action (sorry for the blurry video, YouTube is the culprit):-


Requirements:- 

The name of the series should match its tvrage link. For example, for Breaking Bad, it should be Breaking_Bad.
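
As a rough illustration, the conversion from a display name to that form can be sketched in Python (Python 3 here). The helper name to_tvrage_slug is made up for this example, and it only covers the space-to-underscore case shown above; titles with punctuation may need to be checked manually against the actual link.

```python
def to_tvrage_slug(title):
    # Hypothetical helper: split on whitespace and join the words with underscores.
    return "_".join(title.split())

print(to_tvrage_slug("Breaking Bad"))
```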



A typical spreadsheet looks like this: 

Setting Up:- 
  • Go to this link and do File->Make a copy.
  • Before running the script for the first time, make sure that column D is empty, because the values in that column decide whether an email should be sent.
  • The first time you run it, you will need to authorize the script. It's a one-time thing.

  • Go to the GDocs spreadsheet menu and select Tools->Script Editor. Then, Triggers->Current Script Triggers.
  • Add a trigger as shown in the image. It simply runs the main function every 6 hours. (The main function checks the site and sends the email.)

  • Now, go to Run->main to execute the main function. As it's the first time, column D should not contain anything from row 2 onwards. I have configured the script to send an email around 24 hours before the episode starts. You can play with the time_limit variable & Triggers to suit your needs.
  • After the main function has executed, column D should be filled with EMAIL_SENT or EMAIL_NOT_SENT. For every series whose remaining time is less than time_limit, an email is also sent to your email address.
  • If the script is run again, the email will not be sent again because the cell now contains EMAIL_SENT.

If you want to have a quick look at the script:-

var emailAddress = Session.getUser().getEmail();
var time_limit = 24 //In hours
var ep_no = Array() //I know it's dirty but be it.
var time_rem = Array()
var sheet = SpreadsheetApp.getActiveSheet();

function onOpen() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var menuEntries = [ {name: "Calculate", functionName: "main"}];
  ss.addMenu("Manual", menuEntries);
}


function getEpisodeList(){
  var sheet = SpreadsheetApp.getActiveSheet();  
  var all_episodes = new Array();
  for (var i=2; i<=sheet.getLastRow(); i++) //This loop gets all the labels
  {
    var cell = sheet.getRange("A"+i);
    all_episodes = all_episodes.concat(cell.getValue());
  }
  return all_episodes;
}

function main() {
  var all_episodes = getEpisodeList();
  sheet.getRange("B2:C"+sheet.getLastRow()).setValue("Calculating...");

  SpreadsheetApp.flush();
  for (var i=2; i<=all_episodes.length+1; i++){
     time_rem = getTimeRemaining(all_episodes[i-2]);
     sheet.getRange(i,2).setValue(time_rem);
     sheet.getRange(i,3).setValue(ep_no);

    if (isSendNoti(time_rem,i) == true){
      if (sheet.getRange(i,4).getValue() != "EMAIL_SENT"){
         sendEmail(all_episodes[i-2], ep_no, time_rem); 
         sheet.getRange(i,4).setValue("EMAIL_SENT");
      }      
    }
    else sheet.getRange(i,4).setValue("EMAIL_NOT_SENT");
    SpreadsheetApp.flush();
  }
  
}

function getTimeRemaining(series_name){
  var response = UrlFetchApp.fetch("http://www.tvrage.com/"+series_name);
  var content = response.getContentText();
  var pos = content.search(series_name+"/episodes");
  var link_end_pos = pos+series_name.length+20;
  var link = content.substring(pos,link_end_pos);
  ep_no = content.substring(link_end_pos+2,link_end_pos+6);
  var response = UrlFetchApp.fetch("http://www.tvrage.com/"+link);
  var content = response.getContentText();
  var start = content.search("Voting Closed") + 67;
  if (start == 66) return "Season recently finished";
  var end = content.indexOf("<",start);
  return content.substring(start,end);
}

function isSendNoti(time_left,i){

  var days = time_left.match(/^(\d+) Days/i);    
  var hours = time_left.match(/^(\d+) Hours/i);
  var mins = time_left.match(/(\d+) Min.$/i);
  
  if (days != null) {
    //Means reset the email thing
    sheet.getRange(i,4).setValue("EMAIL_NOT_SENT");
    return false; //No notification if no_of_days included in time remaining
  }
  if ((hours == null) && (mins == null)) return false;

  //Only minutes are left (no hours part), so we are within the limit
  if (hours == null) return true;

  return parseInt(hours[1], 10) <= time_limit;
}


function sendEmail(series_name, ep_no, time_rem){
  Logger.log(series_name + " : " + ep_no);
  MailApp.sendEmail(emailAddress, "EPISODE ALERT: " +series_name+ " : "+ep_no, "Episode to be aired in : "+time_rem);
}
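
The notification rule in isSendNoti can also be tried outside the spreadsheet. Below is a minimal Python 3 sketch of the intended decision, assuming the scraped countdown looks like "1 Days 3 Hours 20 Min." or "5 Hours 12 Min." (the exact tvrage wording is an assumption inferred from the regular expressions in the script). The names should_notify and TIME_LIMIT_HOURS are made up for the sketch, and it treats a countdown with only minutes left as due.

```python
import re

TIME_LIMIT_HOURS = 24  # mirrors time_limit in the script

def should_notify(time_left, limit=TIME_LIMIT_HOURS):
    # A "Days" part means the episode is still too far away.
    if re.match(r"^\d+ Days", time_left, re.I):
        return False
    hours = re.match(r"^(\d+) Hours", time_left, re.I)
    mins = re.search(r"(\d+) Min.$", time_left, re.I)
    if hours is None and mins is None:
        return False  # no countdown at all, e.g. "Season recently finished"
    if hours is None:
        return True   # only minutes left
    return int(hours.group(1)) <= limit

# Sample strings are invented for the demonstration.
for sample in ["1 Days 3 Hours 20 Min.", "5 Hours 12 Min.", "40 Min.", "Season recently finished"]:
    print(sample, "->", should_notify(sample))
```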

Sunday, November 6, 2011

Google Dictionary API example in python - gets primaries and webDefinitions

As Google pointed out earlier, it has an official dictionary API.

The response from the server is a JSON string. I wrote two scripts: define.py (which fetches the meaning and converts it to a dictionary) & pretty_print.py (which prints the meanings in a pretty way).

These scripts are part of a GUI application that is under development. You can also get the sources from https://github.com/shadyabhi/godict
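
Two details of the response matter before it can be parsed: the JSON arrives wrapped in a callback call, and it uses \xNN escapes, which JSON does not allow. Here is a minimal Python 3 sketch of that cleanup step, run on a made-up payload rather than a real server response. The function names and the sample payload are assumptions for illustration; the slice offsets are the ones define.py uses.

```python
import json
import re

def strip_callback(raw):
    # Drop the leading 'a(' and the trailing ',200,null)' (10 characters).
    return raw[2:-10]

def fix_hex_escapes(text):
    # Rewrite each \xNN escape as the JSON-compatible \u00NN.
    return re.sub(r'\\x([0-9a-fA-F]{2})', r'\\u00\1', text)

# Hypothetical payload in the same wrapped shape.
raw = r'a({"query":"caf\xe9","sourceLanguage":"en"},200,null)'
data = json.loads(fix_hex_escapes(strip_callback(raw)))
print(data["query"])
```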

define.py

#!/usr/bin/python2

import json
import urllib
import re

def asciirepl(match):
  #Turns a \xNN escape (not valid in JSON) into the equivalent \u00NN escape
  return '\\u00' + match.group()[2:]

def get_meaning(query):
    p = urllib.urlopen('http://www.google.com/dictionary/json?callback=a&q='+query+'&sl=en&tl=en&restrict=pr,de&client=te')
    page = p.read()[2:-10] #Strip the callback wrapper, as the JSON is returned as a function call
    
    #Replace \xNN hex escapes (invalid in JSON) with \u00NN escapes
    p = re.compile(r'\\x(\w{2})')
    ascii_string = p.sub(asciirepl, page)

    #Now decoding cleaned json response
    data = json.loads(ascii_string)
    
    #Assumes that we always receive webDefinitions. ??Yet to check??
    if "webDefinitions" not in data:
        return None

    no_of_meanings = len(data['webDefinitions'][0]['entries']) 
    all_meanings = dict()
    all_meanings['primaries'] = dict()
    all_meanings['webDefinitions'] = list()

    if 'primaries' in data:
        #Creating list() for each types: adj, verb, noun
        for bunch in data['primaries']:
            #This list contains meanings and examples
            all_meanings['primaries'][bunch['terms'][0]['labels'][0]['text']] = list()
            means = all_meanings['primaries'][bunch['terms'][0]['labels'][0]['text']]
            
            for i in range(len(bunch['entries'])):
                #Keep the chosen meaning only; other entry types can be related terms
                if bunch['entries'][i]['type'] != "meaning": continue
                meaning = bunch['entries'][i]['terms'][0]['text']
                try:    
                    example = list()
                    #Examples start with ZERO index
                    for i_ex in range(0, len(bunch['entries'][i]['entries'])):
                        example.append(bunch['entries'][i]['entries'][i_ex]['terms'][0]['text'])
                        
                except:
                    example = None
                means.append([meaning, example])
                
    #Web definitions
    for meaning in data['webDefinitions'][0]['entries']:
        all_meanings['webDefinitions'].append(meaning['terms'][0]['text'])
    
    return all_meanings

The test script for the above module.

pretty_print.py

#!/usr/bin/python2

import define
import sys
import httplib
import xml.dom.minidom


means = define.get_meaning(sys.argv[1])

if means is not None:
    #Short Summary
    for sec in means['primaries'].keys():
        meanings = means['primaries'][sec]
        print sec, "\n---------------"
        for m in meanings:
            print "\n\t", m[0]
            try: 
                for e in m[1]: print "\t\t--",e
            except: pass
    #Web Definitions
    print "\nWeb Definitions","\n---------------"
    for defs in means['webDefinitions']:
        print "\t",defs
else:
    print "Word not found. These are the suggestions"
    data = """ 
    <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
    <text> %s </text>
    </spellrequest>
    """

    word_to_spell = sys.argv[1]
    con = httplib.HTTPSConnection("www.google.com")
    con.request("POST", "/tbproxy/spell?lang=en", data % word_to_spell)
    response = con.getresponse()

    dom = xml.dom.minidom.parseString(response.read())
    dom_data = dom.getElementsByTagName('spellresult')[0]

    for child_node in dom_data.childNodes:
        result = child_node.firstChild.data.split()
        print result


When I execute pretty_print:


In a few days, I plan to make a GUI for this that also reminds you of the words you searched for, and hence helps improve your vocabulary.

Friday, October 14, 2011

Get torrent info like seeds/peers/completed from tracker (UDP) aka scraping torrent

The previous script I made adds trackers to a .torrent file. After making it, I thought it would be nice if I could weed out dead torrents by checking how many seeds/peers a particular tracker reports. So, the following script fetches seed/peer information given a tracker URL & torrent hashes. It also looks up the torrent name from the torrent hash using torrentz.me.

You can read more about the protocol from here:
http://bittorrent.org/beps/bep_0015.html#udp-tracker-protocol
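
To make the packet layout from that page easier to follow, here is a minimal Python 3 sketch of the connect and scrape messages, kept separate from any socket code so it can be tried offline. The function names are made up for this sketch, and the response bytes in the demonstration are fabricated locally, not received from a tracker.

```python
import struct

PROTOCOL_ID = 0x41727101980  # magic constant defined by BEP 15
ACTION_CONNECT, ACTION_SCRAPE = 0, 2

def build_connect_request(transaction_id):
    # 64-bit protocol id, 32-bit action, 32-bit transaction id (big-endian)
    return struct.pack(">QLL", PROTOCOL_ID, ACTION_CONNECT, transaction_id)

def parse_connect_response(data, transaction_id):
    action, tid, connection_id = struct.unpack(">LLQ", data[:16])
    if action != ACTION_CONNECT or tid != transaction_id:
        raise ValueError("unexpected connect response")
    return connection_id

def build_scrape_request(connection_id, transaction_id, hex_hashes):
    header = struct.pack(">QLL", connection_id, ACTION_SCRAPE, transaction_id)
    # Each info hash is sent as 20 raw bytes after the header.
    return header + b"".join(bytes.fromhex(h) for h in hex_hashes)

def parse_scrape_response(data, transaction_id, count):
    action, tid = struct.unpack(">LL", data[:8])
    if action != ACTION_SCRAPE or tid != transaction_id:
        raise ValueError("unexpected scrape response")
    stats = []
    for n in range(count):
        # Per torrent: seeders, completed, leechers (12 bytes)
        seeders, completed, leechers = struct.unpack_from(">LLL", data, 8 + 12 * n)
        stats.append({"seeders": seeders, "completed": completed, "leechers": leechers})
    return stats

# Offline demonstration with a locally built, fake response.
fake = struct.pack(">LL", ACTION_SCRAPE, 7) + struct.pack(">LLL", 10, 200, 3)
print(parse_scrape_response(fake, 7, 1))
```

Checking the echoed transaction id, as the sketch does, is a cheap way to reject stray or stale datagrams.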


Code:
"""
Author: shadyabhi abhijeet.1989@gmail.com
For protocol description(not mine), check http://bittorrent.org/beps/bep_0015.html#udp-tracker-protocol
"""

import socket
import struct   
from random import randrange #to generate random transaction_id
from urllib import urlopen
import re

tracker = "tracker.istole.it"
port = 80
torrent_hash = ["3ebde329f208b9e2e81c8e0f80d14384d5f416e4", "3ac9002ce1a7d5dde2c02b7cf9dc9e0f15eda7cb", "00e058f6629a19b42458af4dea5f6b9e2ebe8e25"]
torrent_details = {}

def get_torrent_name(infohash):
    url = "http://torrentz.me/" + infohash
    p = urlopen(url)
    page = p.read()
    c = re.compile(r'<h2><span>(.*?)</span>')
    return c.search(page).group(1)

def pretty_show(infohash):
    print "Torrent Hash: ", infohash
    try:
        print "Torrent Name (from torrentz): ", get_torrent_name(infohash)
    except:
        print "Couldn't find torrent name"
    print "Seeds, Leechers, Completed", torrent_details[infohash] 
    print

#Create the socket
clisocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
clisocket.connect((tracker, port))

#The protocol requires this magic constant as the initial connection_id
connection_id=0x41727101980
#The tracker should echo the same transaction_id in its response
transaction_id = randrange(1,65535)

packet=struct.pack(">QLL",connection_id, 0,transaction_id)
clisocket.send(packet)
res = clisocket.recv(16)
action,transaction_id,connection_id=struct.unpack(">LLQ",res)

packet_hashes = ""
for infohash in torrent_hash:
    packet_hashes = packet_hashes + infohash.decode('hex')

packet = struct.pack(">QLL", connection_id, 2, transaction_id) + packet_hashes

clisocket.send(packet)
res = clisocket.recv(8 + 12*len(torrent_hash))

index = 8
for infohash in torrent_hash:
    seeders, completed, leechers = struct.unpack(">LLL", res[index:index+12])
    torrent_details[infohash] = (seeders, leechers, completed)
    pretty_show(infohash)
    index = index + 12 

Usage (The above script has 3 hashes for demonstration, you can change them):
 shadyabhi@archlinux ~ $ python2 check_trackers.py 
Torrent Hash:  3ebde329f208b9e2e81c8e0f80d14384d5f416e4
Torrent Name (from torrentz):  House.S08E02.HDTV.XviD-LOL.avi
Seeds, Leechers, Completed (10297, 1051, 172274)

Torrent Hash:  3ac9002ce1a7d5dde2c02b7cf9dc9e0f15eda7cb
Torrent Name (from torrentz):  Dexter.S06E02.Once.Upon.a.Time.HDTV.XviD-FQM.avi
Seeds, Leechers, Completed (10962, 1328, 248032)

Torrent Hash:  00e058f6629a19b42458af4dea5f6b9e2ebe8e25
Torrent Name (from torrentz):  Breaking.Bad.S04E13.Face.Off.HDTV.XviD-FQM.avi
Seeds, Leechers, Completed (7751, 495, 183809)

shadyabhi@archlinux ~ $ 


Add trackers to .torrent files in linux

Windows users have BEncode Editor to edit their torrent files without changing the info hash of the torrent. But Linux doesn't have a torrent editor yet.

I use rtorrent for downloading torrents and it doesn't have a feature for adding trackers. To me, that is essential because in my institute many trackers are blocked and I have to rely on UDP trackers to get the seeds/peers stats.

There is a Python library named bencode that helps you edit torrent files, but there are no usage examples on the internet.

So, here is an example script to add trackers to a .torrent file using bencode.
  • This script assumes that you have a file that contains the list of trackers separated by new-lines. (first argument to the script)
  • This script creates a new torrent file in the current directory with the added trackers. 
  • The script takes the input torrent file as the second argument.
WARNING: Use it only on public trackers, otherwise it will get you banned.
#!/usr/bin/python2

"""
Author: shadyabhi (abhijeet[dot]1989[at]gmail[dot]com)

"""

import bencode
import sys
from os import path

if len(sys.argv) < 3:
    print "First argument: File containing trackers"
    print "Second argument: .torrent file"
    sys.exit(0)

tracker_file = open(sys.argv[1])
torrent_file = open(sys.argv[2])

#Creating the list of trackers.
trackers_list = [value for value in tracker_file.read().split("\n") if value != '']
decoded_data = bencode.bdecode(torrent_file.read())

print "Trackers Before: "
for tracker in decoded_data['announce-list']:
    print tracker

for tracker in trackers_list:
    if tracker not in decoded_data['announce-list']:
        decoded_data['announce-list'].append([tracker])

print "Now, the trackers in the torrent are: "
for tracker in decoded_data['announce-list']:
    print tracker

#Writing the torrent file
f = open("new_"+path.basename(sys.argv[2]), "w")
f.write(bencode.bencode(decoded_data))
f.close() 

Sample Usage:

shadyabhi@archlinux ~/github/bencode_py $ python2 add_tracker.py /media/abhijeet/Misc/trackers ~/house.torrent 
Trackers Before: 
['http://10.rarbg.com:80/announce']
['http://9.rarbg.com:2710/announce']
['udp://11.rarbg.com:80/announce']
Now, the trackers in the torrent are: 
['http://10.rarbg.com:80/announce']
['http://9.rarbg.com:2710/announce']
['udp://11.rarbg.com:80/announce']
['udp://tracker.openbittorrent.com:80']
['udp://tracker.publicbt.com:80']
['udp://tracker.ccc.de:80']
['udp://tracker.istole.it:80']
['udp://tracker.1337x.org:80/announce']
['udp://tracker.torrentbox.com:2710']
['udp://tracker.openbittorrent.com:80']
['udp://tracker.torrentbox.com:2710']
['udp://tracker.openbittorrent.com:80/announce']
['udp://tracker.publicbt.com:80/announce']
shadyabhi@archlinux ~/github/bencode_py $ ls
total 36K
-rwxr-xr-x 1 shadyabhi users  960 Oct 14 02:20 add_tracker.py
-rw-r--r-- 1 shadyabhi users 3.2K Oct 13 20:12 bencode.py
-rw-r--r-- 1 shadyabhi users 4.4K Oct 13 20:12 bencode.pyc
-rw-r--r-- 1 shadyabhi users  786 Oct 14 02:05 check_trackers.py
-rw-r--r-- 1 shadyabhi users  15K Oct 14 02:51 new_house.torrent
shadyabhi@archlinux ~/github/bencode_py $

 You can also get the script at https://github.com/shadyabhi/Bencode-Torrent-Editor.
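
One detail worth noting in the script above: decoded_data['announce-list'] is a list of lists (one inner list per tier, as the sample output shows), so comparing a plain URL string against it never finds a match. That is why some trackers appear twice in the sample output. Below is a minimal Python 3 sketch of a merge step that skips URLs already present. It works on plain Python lists only, and merge_trackers is a name made up for this example, not part of the bencode library.

```python
def merge_trackers(announce_list, new_trackers):
    # Collect every URL that is already present in any tier.
    known = {url for tier in announce_list for url in tier}
    merged = [list(tier) for tier in announce_list]
    for url in new_trackers:
        url = url.strip()
        if url and url not in known:
            merged.append([url])  # each new tracker gets its own tier
            known.add(url)
    return merged

existing = [["http://10.rarbg.com:80/announce"], ["udp://11.rarbg.com:80/announce"]]
wanted = ["udp://tracker.ccc.de:80", "udp://11.rarbg.com:80/announce", "udp://tracker.ccc.de:80", ""]
for tier in merge_trackers(existing, wanted):
    print(tier)
```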

Sunday, September 11, 2011

pLibraryOrganizer - Manage your music library like iTunes does

I made a script to manage my music. By managing, I mean it will sort the files according to the rules you give as a command-line argument.

Project page:- https://github.com/shadyabhi/pLibraryOrganizer

shadyabhi@archlinux ~/github/pLibraryOrganizer $ ./pLibraryOrganizer.py -h
usage: pLibraryOrganizer.py [-h] -f FORMAT -d DIRECTORY [-D FINALDIRECTORY]
                            [-v] [-et EDITTITLE EDITTITLE]
                            [-ea EDITARTIST EDITARTIST]
                            [-eA EDITALBUM EDITALBUM] [-dr]

Organizes your library

optional arguments:
  -h, --help            show this help message and exit
  -f FORMAT, --format FORMAT
                        Enter format for organizing the music
  -d DIRECTORY, --directory DIRECTORY
                        Enter the directory root.
  -D FINALDIRECTORY, --finaldirectory FINALDIRECTORY
                        Directory to finally move the mp3 files too
  -v, --verbose         For more verbose output
  -et EDITTITLE EDITTITLE, --edittitle EDITTITLE EDITTITLE
                        Replace in Title
  -ea EDITARTIST EDITARTIST, --editartist EDITARTIST EDITARTIST
                        Replace in Artist
  -eA EDITALBUM EDITALBUM, --editalbum EDITALBUM EDITALBUM
                        Replace in Album
  -dr, --dryrun         Don't move the files. Just show what you are doing
shadyabhi@archlinux ~/github/pLibraryOrganizer $ 

Usage:-
  • If you want to delete all the folders and keep all the mp3 files in the root with the name %artist% - %title%, execute:
python2 pLibraryOrganizer.py -f "%artist% - %title%" -d "/home/shadyabhi/music/"
  • Suppose you want your music sorted such that all the mp3s are named %artist% - %title% and each artist has a separate folder. Also, in the process you want to remove or replace " - www.Songs.PK" in the titles. [All files downloaded from songs.Pk have those annoying titles]
python2 pLibraryOrganizer.py -f "%artist%/%artist% - %title%" -d "/home/shadyabhi/music/" -et " - www.Songs.PK" ""
  • If you want your music to move to a new directory, let's say /tmp/music:
python2 pLibraryOrganizer.py -f "%artist%/%artist% - %title%" -d "/home/shadyabhi/music/" -D "/tmp/music/"
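
To make the -f format and the -et replacement concrete, here is a small Python 3 sketch of how such a pattern could be expanded into a target path. It is not taken from pLibraryOrganizer's source; render_path, its parameters and the sample tags are all made up for illustration.

```python
import os
import re

def render_path(fmt, tags, title_replace=None):
    tags = dict(tags)
    if title_replace:
        old, new = title_replace
        # Same idea as "-et OLD NEW": substitute text inside the title tag.
        tags["title"] = tags["title"].replace(old, new)
    # Swap every %name% placeholder for the matching tag value.
    relative = re.sub(r"%(\w+)%", lambda m: tags[m.group(1)], fmt)
    # A "/" in the format separates folders from the file name.
    return os.path.join(*relative.split("/")) + ".mp3"

tags = {"artist": "Some Artist", "title": "Some Song - www.Songs.PK"}
print(render_path("%artist%/%artist% - %title%", tags, (" - www.Songs.PK", "")))
```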

Known issues: If your music directory has files other than mp3 files, this script will fail to delete that directory. You can remove such files with a command like the following, which removes all *.jpg files: [ Caution: Run this command only from the root of your music directory. It deletes files recursively, starting from the current folder ]
find . -type f -name "*.jpg" -exec rm -v {} \;
Hope the script will be of use to you. :)