I’ve been enjoying Jan Ozer’s new book, Video Compression for Flash, Apple Devices and HTML5. It’s the comprehensive how-to tutorial on video encoding you’d expect from Jan, as well as a lot of insight about best practices and all the things you should be paying attention to when you’re publishing video online.
Chock full of examples, test results, tables of useful data, and technical information you can put to use right away, this is a great resource for anyone, novice to expert.
HTML5’s value proposition today, and for the foreseeable future, is “encode in more formats that offer no advantage over H.264, and play on fewer computers, and distribute your on-demand content to vastly fewer viewers with lower quality of service, less features and a reduced ability to monetize than you can with Flash or Silverlight. Oh, and forget live.”
Don’t get me wrong – he still covers everything you need to know (in great detail!) about targeting HTML5 players. But he explains in practical terms what it really means to do so, and when and why you ought to.
Here’s a nice one-stop shop for comparing HTML5-capable video players: VideoSWS (where SWS apparently means “See What Sucks”).
The chart provides a rough view of player capabilities, but clicking the name of each player brings you to a working example of that player. It’s not an extensive analysis of each, but it’s great for a quick survey of what’s out there in embeddable players.
The hype around HTML5 video is finally getting pierced with a dose of
reality. That reality, as far as I can see, is that HTML5 is a
nascent idea of something that will undoubtedly be useful some day.
But at the moment, for many of us publishing video to the
'Net, it's more of a problem than a solution.
Some great thoughts on the issue have come from Jan Ozer at the
Learning Center, technical analyst extraordinaire at streamingmedia.com.
In his article, The
Five Key Myths About HTML5, Jan points out that in practice,
supporting HTML5 means encoding everything in multiple formats, an
inability to do live streaming or on-demand streaming using a true
streaming protocol, working around numerous browser incompatibilities,
and no adaptive/dynamic streaming. He summarizes:
No major media site presents HTML5 as their primary viewing option
HTML5-compatible browser penetration is low, and will continue to be well into the future
Though HTML5 is great for low volume video playback, it lacks many critical features currently available in plug-in based technologies
Full HTML5 support will require 2 or 3 times the encoding chores of Flash support
The video tag is still in
its infancy and lacks certain core functionality. As developers
demand these features, browser vendors are tempted to implement
incompatible solutions instead of agreeing on standards.
These hasty developments, already underway, are setting HTML5 video up
for the same chaos as HTML styling in the pre-CSS era.
We remember those days: coding and testing separately for every possible
browser combination, when any web application with an interesting,
innovative, or especially responsive UI (using CSS and
DOM manipulation) was fragile and expensive to maintain.
Eventually, standards got better and better-supported, and
libraries like ExtJS
provided abstraction that made authoring powerful and reliable
applications easier. Things in a web app that used to be done
with a Flash or Java applet UI are now routinely done with these
libraries.
So there's hope for HTML5 video, but it's not there yet and it won't be
there for years. The hype around HTML5 isn't matched by the reality -
which is that it's a pain that complicates our work in streaming; and that
Flash or Silverlight are going to be better choices for most purposes
for some time to come.
In the direction of standard libraries to make life easier for the
streaming publisher, Longtail
Video has just released a Beta of their JW Player 5.3, which
seamlessly integrates Flash and HTML5 support. It's got a
setting that lets you configure the HTML5 failover in either of two ways:
Use HTML5 wherever it's supported, otherwise fail over to Flash.
Use Flash unless it's not supported, then fail over to HTML5.
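A rough sketch of what that might look like with the 5.x JavaScript embedder (the option names and file paths here are my guesses, not tested against the beta): the order of the modes array sets the failover priority.

```javascript
// Hypothetical JW Player 5.3 setup objects; "modes" order = priority.
// File paths and stream URLs are placeholders.
var setupFlashFirst = {
  file: "rtmp://example.com/live/stream",
  modes: [
    { type: "flash", src: "player.swf" }, // try Flash first...
    { type: "html5" }                     // ...then fall back to HTML5
  ]
};

var setupHtml5First = {
  file: "http://example.com/video.mp4",
  modes: [
    { type: "html5" },                    // try HTML5 first...
    { type: "flash", src: "player.swf" }  // ...then fall back to Flash
  ]
};

// In the page, you'd pass one of these to the embedder, e.g.:
// jwplayer("container").setup(setupFlashFirst);
```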
I'll be testing the 5.3 Beta player over the next few days and will
post my impressions.
Our adventures in live mobile streaming continue. If anyone should happen to read this post on Thursday May 27, you can see the results of this effort at http://harvard.edu/commencement2010/
So what are the lessons learned so far? Here’s a preliminary list in no particular order:
Setting up the server side of things is the easiest part. Configuring for FMS delivery from Limelight, and for Wowza on Amazon EC2 was a breeze. Multiple bitrates, the RSS playlist for JW Player, the SMIL playlist for Wowza….once you figure out the moving parts, it works almost just like it’s supposed to.
Adaptive streaming from Limelight and other CDNs that use the ‘fcsubscribe’ method for load-balancing can cause a problem when switching to a stream that comes from a new edge node. More on this later…
Mobile devices: Make sure you’re encoding H.264 with Baseline profile, with the level as low as you can go. iPhones and iPads turned out to be the easiest to support fully. Blackberries and Droids work…or they don’t. It seems to depend on the phone model, and on the network you’re on. My personal Blackberry gets the RTSP stream just fine. Others around the office with different Blackberries can’t play the stream. Same with Droids – some people are able to play it, some not. I haven’t discovered why just yet. Codec issues are a likely possibility, but it’ll take some digging to find out. I have not found any useful documentation on the differences between Blackberry models in terms of live video streaming support.
Encoders – this has been the headache of all headaches, and it took many, many man-hours to get right.
Encoding three bitrates (100k, 500k, 1000k) to two different CDNs (Limelight, Wowza/EC2) takes a lot of horsepower.
One brand new 8-core Cisco machine with a brand new Osprey 240 proved unsuitable for capturing video at all.
A 2-core IBM/Windows/Osprey system running FMLE gave us better encoding performance than an 8-core Mac Pro/AJA system running Wirecast.
All of the above systems had issues with audio/video sync, either being off from the start, or drifting as the webcast went on. Only on the Mac/AJA system were we able to resolve these in time for a successful webcast.
Ordinary desktop PCs running consumer USB video capture devices are the easiest to set up and the machines most likely to work right off the bat. No audio/video sync issues occurred with these, even though we were capturing video on one of a couple of $50 USB devices and audio using the PC’s built-in audio support. The more expensive and industrial-grade the hardware, the more trouble it gave us.
Our final encoding configuration included an 8-core MacPro/Wirecast for the 1Mbps and 500kbps streams, a single-core desktop PC running FMLE for the 100k streams, and a dual-core desktop PC with FMLE for capturing a 1.2Mbps H.264 archive file.
Some of our partner schools are using our infrastructure for mobile streaming. They’ve got Digital Rapids TouchStream appliances, and have had no encoding issues doing multiple bitrates from HD down to 3G/mobile. I’m quickly becoming a big fan of purpose-built appliances for encoding.
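For reference, the Wowza SMIL playlist for three bitrates like the ones above looks roughly like this (the stream names are placeholders, not our actual configuration; system-bitrate is in bits per second):

```xml
<smil>
  <head></head>
  <body>
    <switch>
      <!-- one entry per encoded stream, lowest to highest bitrate -->
      <video src="mp4:livestream_100k" system-bitrate="100000"/>
      <video src="mp4:livestream_500k" system-bitrate="500000"/>
      <video src="mp4:livestream_1000k" system-bitrate="1000000"/>
    </switch>
  </body>
</smil>
```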
That’s about it for now…I’ll follow up on some of these as we do some analysis and learn more.
For an upcoming university commencement, I’ve been looking into doing live streaming in H.264/Flash, as well as http streaming to Apple iPhone/iPad/iPod devices (herein referred to as iP* devices) and rtsp streaming to Droids and Blackberrys. It’s been an experience piecing it all together, and I’ll be writing about some of the surprises and pitfalls as we figure out how to best do it.
In a nutshell, we’re using Limelight Networks’ Flash Media Server 3.5 for delivery to browsers on PCs and Macs. For mobile streaming, I provisioned and started up an instance of Wowza on Amazon EC2. One stream in (or several, for multiple bitrate support) via RTMP, and Wowza delivers in all the right formats – whether it’s chunked HTTP (Apple devices), RTSP (Droids and Blackberrys), or RTMP (Flash). Setting that up involved an awful lot of moving parts, but half a day later, it was up and running and has been flawless in testing. We’ve been streaming multiple bitrates (100kbps, 500kbps, 900kbps) from Adobe Flash Media Live Encoder on a PC, as well as from Telestream Wirecast on a Mac.
We’ve developed a page that uses the JW Player (Flash) as the default, and falls back to HTML5 if it’s an iP* device, or provides an rtsp:// link if it’s a Droid or a Blackberry. Yes…Flash is the default for all browsers that will allow it, as it provides a uniform experience for all users, and a single thing to worry about from a user-support perspective.
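The detection itself is simple user-agent sniffing. A minimal sketch of the idea (illustrative only; the function name and the exact regexes are mine, not our production page):

```javascript
// Decide which player/link to present based on the browser's user agent:
// HTML5 <video> for Apple iP* devices, an rtsp:// link for Droids and
// Blackberries, and the JW Player (Flash) for everyone else.
function pickPlayer(userAgent) {
  var ua = userAgent.toLowerCase();
  if (/iphone|ipad|ipod/.test(ua)) {
    return "html5"; // Apple devices: <video> tag pointing at the HTTP stream
  }
  if (/android|blackberry/.test(ua)) {
    return "rtsp";  // hand the device an rtsp:// link
  }
  return "flash";   // default: uniform experience via the Flash player
}
```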
What’s been interesting to me is how quickly it all went together. In a couple of days, starting with no deep mobile experience, we’ve provisioned infrastructure in the cloud, configured it, and are up and running with live Flash and mobile streaming for short money. More details to follow in the coming days…
Flash security constraints can prevent a SWF hosted on one domain from reading data hosted on another domain. Users trying out the SlideSync and SlideScroller plugins might encounter this issue if the XML data file that contains the slide URLs and timing is on a different website from the one that hosts the JW FLV Player itself.
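The usual fix is to publish a crossdomain.xml policy file at the web root of the site hosting the XML data, granting read access to the player’s domain. A minimal sketch (the domain below is a placeholder):

```xml
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM
  "http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
  <!-- replace with the domain that hosts the JW FLV Player SWF,
       or use domain="*" to allow any domain (less secure) -->
  <allow-access-from domain="player.example.com" />
</cross-domain-policy>
```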
But I always thought there oughtta be a simpler way. So I made one. Honestly, I didn’t know if it was possible using the JW Plugin API, and while I’m a pretty good Java/Web programmer, I’m definitely not a Flash/ActionScript ace. So I decided to give it a try as a learning experience. The result is two plugins for the JW FLV Player: SlideSync and SlideScroller. These are free for commercial and non-commercial use.
In the course of researching my article on Dynamic Streaming in Flash, I ended up doing way more testing than I’d initially intended. But things didn’t work the way I expected right away, and being the way I am (foolish? glutton for punishment?), I had to find out why.
There’ll be more on that in the article when it comes out on streamingmedia.com, but for now, I wanted to make a note about how to simulate fluctuating bandwidth conditions.
On Windows, Netlimiter 3 Lite works OK, especially if you’re just doing bandwidth detection to select the appropriate stream at startup. Shunra VE Desktop seemed to create more realistic test conditions for fluctuating bandwidth and stream-switching during playback, an impression that was validated by colleagues I spoke with. At $850 a pop, it certainly ought to be better than the $20 NetLimiter.
But on the Mac, it all worked for free. It’s already built in to the OS’s Unix roots. It’s in the ipfw command. You set it up by creating filters with bandwidth limits, then associating those filters with the ports you want limited. Here’s how to set up a bandwidth limiter for testing streaming over all ports. Note that if you’re not logged in as root, you will need to use sudo to run these:
sudo ipfw pipe 1 config bw 400Kbit/s
sudo ipfw add 10 pipe 1 tcp from any to me
sudo ipfw add 11 pipe 1 tcp from me to any
Change it at will by issuing the pipe config command again with a new bandwidth. When you’re done testing, remove the rules with sudo ipfw delete 10 11.
With good quality audio, YouTube made a much better caption file. To be fair, in the beginning I throw around a few company names which aren’t real words, and I didn’t expect those to be right in the caption. But YouTube seems to be unable to recognize “YouTube”, which is kind of funny in its own way.
The other issue is the awful audio/video sync problem I’ve had recording directly from webcam into YouTube. I downloaded the video and corrected the problem using QT Sync, but oddly, when I re-uploaded the corrected file to YouTube, the sync was off again.
Anyway, the captions are the interesting part. Here’s the clip:
The industry has seen a variety of solutions for using speech recognition to create a transcript of a video or podcast. Virage, Pictron, Streamsage, and Podzinger have all done this. Only Pictron is more or less the same company it was at the start. Virage was acquired by Autonomy and has languished there as a Web product, Streamsage was acquired by Comcast and turned into an internal division, and Podzinger has become Ramp…I’m not sure what they do at this point, but it’s not the podcast transcription service they used to be. Virage and Streamsage go back almost ten years in this space, but their systems are still running in various enterprise and educational settings.
But back to YouTube… I use Google Voice, and the speech recognition is pretty good. I rarely have to actually listen to a voice mail, since it shows up in my email as a text message that’s almost always easily decipherable, if not perfect. So just for fun, I tried YouTube’s captioning. Here’s the result.
Usually, speech-recognition provides a good set of words for searching, if nothing else. I’ve used speech-to-text to create searchable text from a video with very good results. It makes the video file, which is essentially opaque to a search engine, into something transparent. OK…in this case, maybe translucent.
I’m sure this would do better with better audio, and I will test that. In the meantime, YouTube does provide the means to download and edit the caption file, which is probably what this is best suited for, anyway. It’s a head start on a caption file, complete with time markers already in place. For those of us who are not professional transcriptionists, that has to beat making one from scratch.