nl88 Member
bean Julian Delphiki Member
chocolate Vanilla Chocolate Administrator
sannie Member
Benjiiiiiiii Member
pascool Member
Sterke-Jerke Member
Woose Member
Frozen Member
reeng Member
paulanna Member
alioushk Member
kris2099 Member
    • CommentAuthornl88
    • CommentTimeDec 27th 2008 edited
     

    The BBC have managed to improve the availability of subtitles in the iPlayer's streamed videos.
    http://www.bbc.co.uk/blogs/bbcinternet/2008/12/iplayer_subtitles_increase_our.html
    Indeed, before this week I hardly saw any subtitled videos but now every one I've tried has working subtitles almost straightaway. I thought it would be rather handy if I could somehow download these to keep.
    So I did a bit of research and it turns out they use xml files to store the subtitles so the flash videos can load them on the fly. Using the programme meta data (http://beebhack.wikia.com/wiki/IPlayer_TV#Subtitles), the xml file is easily accessible. Inside the xml file each line is listed in this format:

    <p begin="00:15:16.52" id="p245" end="00:15:19.60">Oh, er...</p>

    All the elements are there: the id is the subtitle line number, and we have the start and end times (although for srt they need a comma instead of the dot and an extra digit in the last set of numbers) and the line itself.
    I thought it would be possible to download these xml files and quickly convert these into correct synchronised srt subtitles. Unfortunately my meagre knowledge of programming isn't up to it.
    Does anyone have the skills to take this further? If someone could develop either a desktop or online tool to do it, it might be possible to create accurate subtitles for BBC programmes within minutes rather than hours and days it sometimes takes.
    nl
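
    The timestamp change described above (a comma instead of the dot, plus an extra digit) can be sketched in Python. This is only an illustration: the function name to_srt_time is made up here, and it assumes the times always look like the sample line, with at most a fractional part after the seconds.

```python
def to_srt_time(xml_time: str) -> str:
    """Convert a time such as '00:15:16.52' to the srt form '00:15:16,520'."""
    clock, _, fraction = xml_time.partition(".")
    # srt wants exactly three digits of milliseconds after a comma,
    # so pad short fractions with zeros and cut long ones.
    millis = (fraction + "000")[:3]
    return f"{clock},{millis}"


print(to_srt_time("00:15:16.52"))  # 00:15:16,520
print(to_srt_time("00:15:19.60"))  # 00:15:19,600
```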

    •  
      CommentAuthorbean
    • CommentTimeDec 27th 2008
     
    Posted By: nl88

    I thought it would be possible to download these xml files and quickly convert these into correct synchronised srt subtitles. Unfortunately my meagre knowledge of programming isn't up to it.
    Does anyone have the skills to take this further? If someone could develop either a desktop or online tool to do it, it might be possible to create accurate subtitles for BBC programmes within minutes rather than hours and days it sometimes takes.
    nl

    It should be, indeed.
    But first, to access BBC iPlayer, you need a UK IP address, which is not the case for many people here; otherwise they would watch the shows on TV :).

    About the hours/days, that's simply not true.
    With the stuff available here, for BBC shows (no ads in them), it's just a matter of minutes to delay the start of the srt or ssa subtitle file (if you want coloured subs) by the right number of seconds to get something exactly as it was broadcast and/or the same as it could be on iPlayer.
    The thing is, the result might not be very well synchronized compared to high-end sync like some in sw2/sw3 on swsub.com, but to watch the show, it's more than enough.

    But it could provide lots of stuff, instead of just a few, and a 7-day window(? or more days, don't know how long it stays on iPlayer) to get subs.

    Apparently, the caption files are available even outside of the UK, so I might just try to do something (at least converting the xml file into an srt from a user-given sub link such as http://www.bbc.co.uk/iplayer/subtitles/b0008dc8rstreaming89808204.xml). I'll see if I've got time at work to check this out this week.

    •  
      CommentAuthorbean
    • CommentTimeDec 28th 2008
     

    @nl88: Could you get me the url of a current subtitle file?
    For instance, Outnumbered 2x06?
    Because the file name might be easy to figure out, ie. b0008dc8rstreaming89808204.xml seems to be b0008dc8r + streaming + 89808204 + .xml
    The first part seems to be the id of the video in iplayer (http://www.bbc.co.uk/iplayer/episode/b00g7qzp/Outnumbered_Series_2_Episode_6/ ie. b00g7qzp for Outnumbered 2x06), maybe the other numbered part is easy to find out, even from a non-UK ip. So that all subtitles from iplayer could be easily converted automatically.

    Thanks.

    •  
      CommentAuthorbean
    • CommentTimeDec 28th 2008 edited
     

    OK, found it (thanks to beebhack and a UK proxy): http://www.bbc.co.uk/iplayer/subtitles/b00g7qyv.xml
    Apparently they've changed their naming scheme, and it's the plain version_id now, so the file is retrievable even without a UK proxy (if they keep that naming scheme; finding out the names for sure still needs a UK IP address).
    Time to go to bed now.
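
    The naming scheme described above can be sketched as follows. It is a guess based on the single example in this post: the helper names are hypothetical, and the BBC may change the URL layout at any time. Note that the version_id (b00g7qyv) is not the episode ID from the page url (b00g7qzp); it still has to be looked up in the programme metadata.

```python
import re

# URL pattern inferred from the one example in the thread (an assumption).
SUBTITLE_URL = "http://www.bbc.co.uk/iplayer/subtitles/{version_id}.xml"


def episode_id_from_url(url: str) -> str:
    """Pull the episode ID out of an iPlayer episode page URL."""
    match = re.search(r"/iplayer/episode/([a-z0-9]+)", url)
    if match is None:
        raise ValueError(f"no episode ID found in {url!r}")
    return match.group(1)


def subtitle_url(version_id: str) -> str:
    """Build the subtitle file URL from a version_id."""
    return SUBTITLE_URL.format(version_id=version_id)


page = "http://www.bbc.co.uk/iplayer/episode/b00g7qzp/Outnumbered_Series_2_Episode_6/"
print(episode_id_from_url(page))  # b00g7qzp
print(subtitle_url("b00g7qyv"))   # http://www.bbc.co.uk/iplayer/subtitles/b00g7qyv.xml
```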

    •  
      CommentAuthorbean
    • CommentTimeDec 29th 2008
     

    Not that much time available today, but after some quick and dirty processing of the already downloaded file, it looks like this:

    475
    00:28:24,840 --> 00:28:28,960
    I had bruises and a pattern of unexplained injuries.

    476
    00:28:29,000 --> 00:28:30,600
    Mrs Brockman?

    477
    00:28:38,000 --> 00:28:40,000
    Subtitles by Red Bee Media Ltd

    • CommentAuthornl88
    • CommentTimeDec 29th 2008
     

    That looks promising. How did you process it? By hand or do you have a script?

    •  
      CommentAuthorbean
    • CommentTimeDec 31st 2008
     

    I've made an alpha version available:
    http://tools.swsub.com/bbciplayerxml_to_srt.html

    I only tested it on the default values (pre-filled).
    Feel free to test and report back here.

    Obviously the next step will be to just give the PID and have the program do the rest, but I've got no time for that right now.

    •  
      CommentAuthorchocolate
    • CommentTimeDec 31st 2008
     

    Kewl!
    Thanks.

    If anybody is willing to upload some stuff here, especially for the shows that have incomplete transcripts from the live recording, they're welcome.
    Just let me know.

    •  
      CommentAuthorsannie
    • CommentTimeDec 31st 2008 edited
     

    (edited to exclude the trailing dot from the url)
    39 Steps seems to be http://www.bbc.co.uk/iplayer/subtitles/ng/b00gd1qq_101380926.xml
    Judging from the outputted srt, Bean's tool is not (yet) using xsl. It shouldn't be too hard to write the right code for this.

    •  
      CommentAuthorbean
    • CommentTimeJan 1st 2009
     

    Indeed, it was just a simple regex on the p stanza.
    This sub (The 39 steps) is nothing like the 206 of Outnumbered.
    I would certainly use some xml tools next time.

    For the time being, I tweaked the regex a little bit: it's still OK with the old file and better with the new one. The result for the new one is not a proper sub yet, but at least all the dialogue is there :smile:
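
    A regex-only conversion of the p elements, in the spirit of what is described above, might look like the sketch below. It is not the actual code of the tool: the names are invented, it assumes every p element carries begin and end attributes, and a real xml parser is the safer route for the more complicated files.

```python
import re
from html import unescape

# One <p ...>text</p> element; the attribute order is not assumed.
P_ELEMENT = re.compile(r"<p\b([^>]*)>(.*?)</p>", re.DOTALL)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def srt_time(value: str) -> str:
    """'00:15:16.52' -> '00:15:16,520'."""
    clock, _, fraction = value.partition(".")
    return f"{clock},{(fraction + '000')[:3]}"


def xml_to_srt(xml_text: str) -> str:
    cues = []
    for number, match in enumerate(P_ELEMENT.finditer(xml_text), start=1):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        # Turn <br/> into line breaks, then drop any other inline tags.
        text = re.sub(r"<br\s*/?>", "\n", match.group(2))
        text = unescape(re.sub(r"<[^>]+>", "", text)).strip()
        timing = f"{srt_time(attrs['begin'])} --> {srt_time(attrs['end'])}"
        cues.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(cues)


sample = '<p begin="00:15:16.52" id="p245" end="00:15:19.60">Oh, er...</p>'
print(xml_to_srt(sample))
```

    The cues are renumbered from 1 rather than reusing the id attribute, since srt players expect a plain running counter.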

    •  
      CommentAuthorbean
    • CommentTimeJan 5th 2009
     

    I updated it at work, added a radio button to choose between "Simple" xml file or "Complicated".
    I tested with Simple for outnumbered 206 and Complicated for the 39 steps, and it produces correct srt AFAIK.

    I'll put it on the website tonight (once back home), this way the person(s) who tested on Top Gear will be able to do it again as "Complicated" and get a correct srt.
    It's still alpha, but it works for me, and it's better than nothing :smile:

    •  
      CommentAuthorbean
    • CommentTimeJan 6th 2009
     
    •  
      CommentAuthorbean
    • CommentTimeMar 8th 2009
     

    I've been able to work on that this weekend. Basically, all that is needed now is to provide the first ID, "b00g7qzp", found in the url of the episode (for instance, http://www.bbc.co.uk/iplayer/episode/b00g7qzp/Outnumbered_Series_2_Episode_6/ for Outnumbered 2x06).
    The main issue, though, is to get a working UK proxy to get the second url with the link to the subtitles. Does anyone have one?
    I've seen quite a few uses of the tool :), so I guess that maybe somebody has one :).
    At the moment, I'm going through a proxy that works only 1 try out of 4; the other 3 tries just time out. If it's a web proxy, I should be able to use that as well. But it will be much easier with a real proxy.
    Obviously, it's not something for the public eye, so you can just "whisper" it to me (that's the private message feature here) :). No need to create a new discussion, just whisper in this one.

    Thanks for the help!

    •  
      CommentAuthorbean
    • CommentTimeMar 9th 2009 edited
     

    Well, I haven't got any response about a better proxy, so for the moment it will run with the current, shaky one.

    Page is: http://subtitles.toh.info/bbciplayerid_to_zip.html

    You just need to enter the ID of the episode and click the button.

    For instance, for http://www.bbc.co.uk/iplayer/episode/b00cgprk/Top_Gear_Series_11_Episode_3/, you just enter b00cgprk.
    The episode name will be figured out, and you will get a zip file with 2 srt files (only one of those will be OK; that's an enhancement I still have to do).
    If you get (after a long while, most of the time) a download of bbciplayerid_to_zip.php, it's because the proxy timed out and the tool couldn't retrieve the proper streamID file with the captions link in it.
    If the subtitles for an episode ID were already successfully created, later requests don't use the proxy, so it's very fast.
    I already did the example above, so it should be pretty fast to get the subs for this one.

    •  
      CommentAuthorchocolate
    • CommentTimeMar 10th 2009
     

    Hi All,

    If it proves to be working correctly for most series, I will no longer record anything on BBC for the site.

  1.  

    :smile:

    •  
      CommentAuthorbean
    • CommentTimeMar 13th 2009
     

    It's just been updated.

    Now you should also get a unified srt; this one should be OK all the time if you get it. (At least, that's what I intend to create :)).
    But at the moment, the xml parsing for this one raises an exception if there's a "£" in the file, which happens quite a lot I guess, but unfortunately not in the 2 files I used to test during the day!
    So I'll have to debug this a bit more, but I'm heading in the right direction :).

    Please give feedback on anything you may find.

    •  
      CommentAuthorbean
    • CommentTimeMar 16th 2009
     

    New update: now if there's no "captions" (ie. subtitles) link in the stream definition, the tool will say so.

    There's more error management coming, for instance, the next one is a message if there's nothing because the proxy timed out.

    •  
      CommentAuthorbean
    • CommentTimeMar 18th 2009 edited
     

    I found some time to work on this.

    Now I think I have all the errors managed, though you will get an ugly message with the reason :)
    And it doesn't cache (anymore) if no captions are available, so you can try again later on. If you had nothing for some program, you can try again now, twice: the first time it will try to use the previous cache, display the error, and remove the cache, so the 2nd time it will go and get the data again.
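
    The caching rule described above (keep successful lookups, never keep a "no captions" result) could look roughly like this. Everything here is hypothetical: fetch_captions_url stands in for whatever goes through the proxy, and a plain dict stands in for the xml files the real tool caches locally.

```python
from typing import Callable, Dict, Optional


def captions_url(
    episode_id: str,
    fetch_captions_url: Callable[[str], Optional[str]],
    cache: Dict[str, str],
) -> Optional[str]:
    """Return the captions link for an episode, caching only successful lookups."""
    if episode_id in cache:
        return cache[episode_id]  # no proxy round trip needed
    url = fetch_captions_url(episode_id)  # slow path: goes through the proxy
    if url is not None:
        cache[episode_id] = url
    # A missing link is not stored, so a later retry asks the BBC again.
    return url
```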

    •  
      CommentAuthorbean
    • CommentTimeMar 23rd 2009 edited
     

    OK, a few things to make life easier for everybody.

    These pages are recreated automatically (every hour, at 5 minutes past) from the BBC RSS feeds of what's available on BBC iPlayer.
    They contain the date/time of the last update for each episode listed.
    They're sorted by date (newest on top, oldest at the bottom).
    Clicking on an episode name calls the other tool with the appropriate info, and tries to make the subs out of what's available on the BBC website.

    BBC One links from RSS feed
    BBC Two links from RSS feed
    BBC Three links from RSS feed
    BBC Four links from RSS feed

    For episodes not listed in there, you just have to do it the old-fashioned way: browse the BBC iPlayer website (if the episode was requested before, you should be able to get subs for it).

    • CommentAuthorpascool
    • CommentTimeMar 23rd 2009
     

    Thanks Bean, it works great!

  2.  

    Works!

    •  
      CommentAuthorbean
    • CommentTimeMar 29th 2009 edited
     

    As those pages seem to behave correctly, they're now on the homepage (http://subtitles.toh.info), just one address to remember now :)

    •  
      CommentAuthorbean
    • CommentTimeApr 3rd 2009 edited
     

    Feed pages are updated with a "status", at the moment only 2 statuses:
    - Cached, the xml files are cached locally, so subtitles generation should be pretty instantaneous.
    - Unknown, it can be many things:
    1. Nobody asked for the subs yet.
    2. Some people did, but it wasn't available for some reason.
    3. It's cached, but the page hasn't been regenerated yet (once an hour @ xx hours 5 min).

    • CommentAuthorWoose
    • CommentTimeApr 11th 2009
     

    Wow, Great!
    Thanx Bean XD

    •  
      CommentAuthorbean
    • CommentTimeApr 21st 2009
     

    A few more updates.
    - It's updated every 15 minutes now.
    - There are pages for CBBC and CBeebies.
    - Corrected a bug in the generation. It used the update date/time as a key, and it turns out that's not good as an ID, as yesterday the BBC updated many of their entries in exactly the same second. It uses the ID in the entry now, so the listings are complete again.
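
    The key bug described in the last item is easy to reproduce. In this sketch the entries and their IDs are made up; only the idea (two entries sharing one update time) comes from the post.

```python
# Two feed entries updated in exactly the same second (invented sample data).
entries = [
    {"id": "b00aaaaa", "updated": "2009-04-20T10:00:00Z", "title": "Episode A"},
    {"id": "b00bbbbb", "updated": "2009-04-20T10:00:00Z", "title": "Episode B"},
]

by_time = {entry["updated"]: entry for entry in entries}
by_id = {entry["id"]: entry for entry in entries}

print(len(by_time))  # 1: the second entry overwrote the first
print(len(by_id))    # 2: both episodes survive

# Sorting by date still works when the ID is the key.
newest_first = sorted(by_id.values(), key=lambda e: e["updated"], reverse=True)
```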

    • CommentAuthornl88
    • CommentTimeApr 22nd 2009
     

    Excellent job, thanks. You've done far more than I ever imagined possible when I first made the post.

    Worth stickying this thread to keep it at the top (or putting some sort of notice up) so others, particularly new members know the score?

    I was also going to suggest adding the new BBC HD channel given the number of repeats on it. Unfortunately, none of the older repeats seem to be available and of course the newer programmes are all on the BBC 1/2/3/4 lists anyway so it doesn't seem worth the bother at the moment.

    •  
      CommentAuthorsannie
    • CommentTimeApr 22nd 2009
     

    Great job indeed. However I "miss" chocolate's updates. It was good to learn about new (crime/drama) series and one-offs here...

    • CommentAuthorFrozen
    • CommentTimeApr 22nd 2009
     

    Sorry guys, but I can't understand how to use iplayer for subtitles !

    I live in Jordan, can I do that ?

    I need subs for james may big ideas if possible

    Thanks all

    •  
      CommentAuthorbean
    • CommentTimeApr 23rd 2009
     
    Posted By: nl88

    Excellent job, thanks. You've done far more than I ever imagined possible when I first made the post.

    Worth stickying this thread to keep it at the top (or putting some sort of notice up) so others, particularly new members know the score?

    I was also going to suggest adding the new BBC HD channel given the number of repeats on it. Unfortunately, none of the older repeats seem to be available and of course the newer programmes are all on the BBC 1/2/3/4 lists anyway so it doesn't seem worth the bother at the moment.

    BBC HD didn't have an RSS feed yet.
    I remember checking again last week, and it wasn't there.

    Today it's available, so I've just added it.

    •  
      CommentAuthorbean
    • CommentTimeApr 23rd 2009 edited
     
    Posted By: Frozen

    Sorry guys, but I can't understand how to use iplayer for subtitles !

    I live in Jordan, can I do that ?

    I need subs for james may big ideas if possible

    Thanks all

    To get stuff from the current things available on BBC iPlayer, just go to http://subtitles.toh.info and, at the bottom, pick the channel where your program was aired recently.
    Then click on the link whose name matches the video you'd like.

    To get things not currently available, you need to get the BBC iPlayer ID for the episode. Then go to http://subtitles.toh.info/bbciplayerid_to_zip.html enter the ID and pray that it has been requested in the past, so the link to the xml caption file can be retrieved from the locally cached xml file, and that the program will be able to create the subtitles from it.

    Episode IDs are found on the program pages (http://www.bbc.co.uk/programmes/b00dvqll/episodes for your James May's Big Ideas). If you click on the first episode:
    http://www.bbc.co.uk/programmes/b00dtl3f the ID is the last part of the link, ie. b00dtl3f

    •  
      CommentAuthorchocolate
    • CommentTimeApr 23rd 2009
     
    Posted By: nl88

    Excellent job, thanks. You've done far more than I ever imagined possible when I first made the post.

    Worth stickying this thread to keep it at the top (or putting some sort of notice up) so others, particularly new members know the score?

    Yep, I've made it sticky.

    • CommentAuthorFrozen
    • CommentTimeApr 23rd 2009
     

    Thank you bean.

    The result is negative :

    ERROR: Episode is either not available anymore, or you've entered a wrong ID: b00dtl3f
    ERROR: No stream ID for James.May's.Big.Ideas-Come.Fly.with.Me

    but now I know how to do it.

    • CommentAuthorreeng
    • CommentTimeApr 26th 2009
     

    thanks a lot:smile:

    •  
      CommentAuthorchocolate
    • CommentTimeMay 21st 2009
     

    I've moved the discussion to the Public area, it will be available for all to see.

    •  
      CommentAuthorbean
    • CommentTimeJun 4th 2009
     

    Due to some issues with all the sub-way stuff, the site is now accessible from http://subtitles.toh.info/

    •  
      CommentAuthorbean
    • CommentTimeJun 4th 2009
     

    The unified sub had small issues, and "Complicated" was most of the time the way to go.
    boomer2 told me about it, but it completely slipped my mind. I've been reminded today, and I've made the necessary correction.

    Now, on the 2 files tested that had issues previously (Ashes to Ashes s02E07, and Robin Hood S03E09), it's the same as "Complicated".

    So, if no more complaints arise in the next few weeks, 'Simple' and 'Complicated' will be obsoleted, and only 'Unified' will remain, without the 'Unified' part in the name, as there will be only one file in the zip.

    So if you notice issues, please notify me.

    •  
      CommentAuthorpaulanna
    • CommentTimeJun 12th 2009
     

    Really a great job.
    Everyday I check the new subtitles but how can I see previous days? I mean on BBC1 now I can see sub from june 4th
    to today. If I want to see June 3rd?
    Thanks

    •  
      CommentAuthorbean
    • CommentTimeJun 12th 2009
     
    Posted By: paulanna

    Really a great job.
    Everyday I check the new subtitles but how can I see previous days? I mean on BBC1 now I can see sub from june 4th
    to today. If I want to see June 3rd?
    Thanks

    You can't.
    It's from the BBC RSS feed, so it's what's currently available on their website (that's on purpose; this way it follows their guidelines about availability).

    If you want stuff from before, and it has been on since March/April (I don't remember exactly), there's a good chance the necessary locations are cached, but you'll need to get the ID of the episode on your own and go to http://subtitles.toh.info/bbciplayerid_to_zip.html

    • CommentAuthoralioushk
    • CommentTimeJun 13th 2009
     

    Works great.Thanks.
    I just got DW confidential from april (desert storm), that wasn't available when I checked in when it was aired (not cached). Then I forgot, :smile:and now it's there.
    One question, which of the files do I use? The unified one?
    Thanks

    •  
      CommentAuthorbean
    • CommentTimeJun 13th 2009
     
    Posted By: alioushk

    Works great.Thanks.
    I just got DW confidential from april (desert storm), that wasn't available when I checked in when it was aired (not cached). Then I forgot,:smile:and now it's there.
    One question, which of the files do I use? The unified one?
    Thanks

    Unified now, yes. Normally, it should be OK every time.
    I recently modified it as it was missing text sometimes because of tags inside of tags.
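
    The "tags inside of tags" problem can be shown with minidom: reading only the direct text of a p element loses anything wrapped in a nested tag, so the text has to be collected recursively. This is a sketch under assumptions, not the tool's code; the span/br markup in the sample is invented to show the case.

```python
from xml.dom import minidom


def cue_text(node) -> str:
    """Collect the text of a <p> element, including text inside nested tags."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # <br/> marks a line break; any other tag (e.g. <span>) is descended into.
            parts.append("\n" if child.tagName == "br" else cue_text(child))
    return "".join(parts)


doc = minidom.parseString(
    '<p begin="00:00:01.00" end="00:00:03.00">'
    'Hello <span style="s1">there <span style="s2">my</span></span><br/>friend</p>'
)
print(cue_text(doc.documentElement))
```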

    • CommentAuthoralioushk
    • CommentTimeJun 15th 2009
     

    Ok great, thanks for your answer.

    •  
      CommentAuthorbean
    • CommentTimeAug 2nd 2009
     

    New site layout and design has been installed.

    Direct link to the BBC iPlayer stuff is http://subtitles.toh.info/bbc-iplayer.html
    As of today, Simple and Complicated subs have been removed from the zip file, only Unified remains.
    If you were still accessing the site with tools.swsub.com, you will see that it automatically redirects you to the new domain name. As swsub.com no longer exists, the domain name will not be renewed, so the redirect will stop working when it expires.
    You have access to all the tools from the homepage http://subtitles.toh.info/
    The new design allows direct access via a menu to all the BBC pages (and tools too) once you've opened one (basic web stuff I wasn't doing till now :))

    •  
      CommentAuthorbean
    • CommentTimeAug 2nd 2009
     
    Posted By: bean

    I've made an alpha version available:
    http://tools.swsub.com/bbciplayerxml_to_srt.html

    I only tested it on the default values (pre-filled).
    Feel free to test and report back here.

    Obviously the next step will be to just give the PID and have the program do the rest, but I've got no time for that right now.

    Posted By: bean

    It's live now:)
    http://tools.swsub.com/bbciplayerxml_to_srt.html

    This page doesn't exist anymore as it was no longer useful.

    All other links with tools.swsub.com have been updated with http://subtitles.toh.info (I've edited all my posts, normally, except for those 2, as the page does not exist in the new design)

    •  
      CommentAuthorbean
    • CommentTimeAug 4th 2009 edited
     

    I've had a report that one program ID (b00lz3gd - Rivers with Griff Rhys Jones: 2. North) didn't produce a working file.
    It's because the "1/2" sign used in the file does not follow the utf-8 encoding that the file specifies.

    I will surely do a quick "fix" of transcoding to utf-8 prior to parsing, just to be sure, in a day or 2.

    I might (later on) try to use lxml instead of minidom, as it's (normally) more forgiving of such things, so that even if the xml file is not completely valid, the subs will be generated.

    •  
      CommentAuthorbean
    • CommentTimeAug 7th 2009
     

    Well, it's not so easy :).

    I've just put in new error management code, so it will let you know when you can't get the subs because there was some parsing issue with the xml file.

    I've tried lxml, but even with the "recover" parameter, which was supposed to be more forgiving, it just stops processing where the error occurs (so I'm able to get the start of the file, but there's not much point to it).
    Apparently some intern for the summer is creating files using windows-1252 encoding while tagging them as utf-8, so no proper xml parsing library will process them correctly.

    I'll do some workaround later (convert the xml file to utf-8 from windows-1252) if the first parsing is not OK, and try to parse again.
    Basically, it doesn't work because the BBC says the tin contains mac&cheese while they give me a tin with something else in. But the analogy stops here, as there's no way for me to be sure what's in the tin even after opening it, that's the way character encodings are... A pain! :sick:
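
    The workaround described above (parse, and if that fails, convert from windows-1252 to utf-8 and parse again) can be sketched with minidom. The function name is invented, and treating every parse failure as a windows-1252 problem is exactly the kind of guess the tin analogy warns about: if the file is broken for another reason, the second parse will simply fail too.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_subtitles(raw: bytes):
    """Parse the xml; if that fails, assume mislabelled windows-1252 and retry."""
    try:
        return minidom.parseString(raw)
    except ExpatError:
        # The declaration still says utf-8, so re-encode the bytes to match it.
        repaired = raw.decode("windows-1252").encode("utf-8")
        return minidom.parseString(repaired)


# A pound sign and a half sign stored as single windows-1252 bytes.
raw = b'<?xml version="1.0" encoding="utf-8"?><tt><p>Only \xa35 for \xbd a pint</p></tt>'
doc = parse_subtitles(raw)
print(doc.getElementsByTagName("p")[0].firstChild.data)
```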

    •  
      CommentAuthorbean
    • CommentTimeAug 10th 2009
     
    Posted By: bean

    I'll do some workaround later (convert the xml file to utf-8 from windows-1252) if the first parsing is not OK, and try to parse again.

    Done.

    It works on the files that were not working previously.
    Hopefully they'll stick with the same wrong encoding when they do it badly.

    • CommentAuthorkris2099
    • CommentTimeNov 4th 2009
     

    Hi. Whereabouts can I download English subtitles for "The Sarah Jane Adventures" Series 3 Episode 6, etc? :smile:

    •  
      CommentAuthorbean
    • CommentTimeNov 5th 2009
     
    Posted By: kris2099

    Hi. Whereabouts can I download English subtitles for "The Sarah Jane Adventures" Series 3 Episode 6, etc?:smile:

    It's aired on CBBC, so check the CBBC page.
    It's there, though not with the episode number; you have to find the episode name.

    • CommentAuthorkris2099
    • CommentTimeNov 5th 2009
     

    Hi bean. I could not find it. I went to do a search and nothing came up. Could you if you have the time give me the direct link please so that I can download the subtitles? Thanks. :smile: