(21 Jan 2003) I am still busy doing other work,
but it looks like I may have found a way to be paid for work that is related
to OIC. So now I can work on my code and pretend it's work! I don't have
much to report, but things are happening behind the scenes. Rewriting
everything from scratch was worth it. It is so much easier to add stuff now.
O.I.C. - A new video codec
OIC is a video compressor and decompressor that I have been working
on for some months now. I aim to release it as an open-source, patent-free
codec, and hope that it will be good enough that everyone chooses to use
it. I have been working on it long enough that I think a web page is in
order. I will probably dual-license it in a way that lets me
sell it to people who want to use it as part of a proprietary, closed-source
product.
Why O.I.C?
The name or the codec? :-) I decided that I wanted a format that
allows me to archive video on my computer, or on CD. MPEG-1 and MPEG-2 aren't
good enough, and all of the MPEGs look to be a mess of patents and licences.
DivX is not free and open either, and could be sued out of existence
one day, as it is more or less an MPEG-4 implementation. I don't really know,
but I decided to take no chances.
I was looking at a bunch of low-bitrate video I downloaded and got
very annoyed with the blocky artifacts and things. Yech. I decreed that
There Had To Be A Better Way, and, in an entirely in-character display of
hubris, declared That I Had Found It. The idea is to use much higher-resolution
motion vectors than normal, and literally warp one frame into the next.
Coupled with wavelet compression, which is a lot better than the block-based
DCT methods of the MPEGs when it comes to avoiding the blockies, I figured I had a decent chance of
making it work. It was my duty to humanity and the world to set things right
and do it The Right Way.
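To make the warping idea concrete, here is a minimal sketch of dense backward warping, assuming one real-valued motion vector per pixel and bilinear interpolation between the four nearest source pixels. OIC's actual vector density, precision and interpolation aren't described on this page, so `warp` and `bilinear_sample` are hypothetical names for illustration only.

```python
def bilinear_sample(frame, x, y):
    """Sample a greyscale frame (list of rows) at real-valued (x, y).

    Coordinates are clamped to the image, so edge pixels are repeated.
    """
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then blend the rows vertically.
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(prev, dx, dy):
    """Backward-warp prev using a dense per-pixel motion field (dx, dy).

    Output pixel (x, y) is fetched from prev at (x + dx[y][x], y + dy[y][x]),
    so the vectors point from the current frame back into the previous one.
    """
    h, w = len(prev), len(prev[0])
    return [[bilinear_sample(prev, x + dx[y][x], y + dy[y][x]) for x in range(w)]
            for y in range(h)]
```

Backward warping is used here because every output pixel gets exactly one value; forward warping (pushing source pixels along their vectors) leaves holes and overlaps that then need patching.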
The fact that I got all excited about the idea of using much more
detailed motion fields as a form of image representation and needed an
excuse to try it has
nothing to do with anything. :-)
It was pretty easy coming up with code that could warp one frame
into another. Of course, in the real world, it becomes obvious that you
need to encode your vectors pretty cleanly to stop the current frame looking
like the previous frame after it has been eaten and vomited up by your pet
mutt. I have not yet found The Perfect Way, but I have something good enough
at the moment. I have high hopes that I will significantly improve
it over time.
Why the name? Well, I needed one and this one appeals to me. It really,
really beats the other ones my friends and I came up with on IRC.
It works as a distinctive three-letter extension too. You can pronounce
it "Oh I see", or "oyk!" if you are in a hurry. Either way I like it. Hopefully
it's what you'll say when you see the results of the final product.
OIC - See it and say it!
Technical stuff
OIC is a wavelet-based codec. I use Scalar Quantisation at the moment,
'cos it's easy and was a good stepping stone. I'll use Vector Quantisation
eventually, when I figure out which way is up on this bloody great big
textbook I just bought. I have my own little algorithm that takes the
quadtree of wavelet coefficients and spews out bits to define it. Each
frame consists of several sections of data, each encoded in an embedded
manner (the most important bits come first, so a section can be cut short
and still decode, just at lower quality). I dunno why I bothered to make it
embedded, but I figured I would if I could, and maybe I'll be able to make
use of it for bitrate peeling one day.
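For anyone who hasn't met the pieces, here is a minimal sketch of one level of a 2D wavelet transform followed by scalar quantisation. It uses the Haar wavelet purely because it is the simplest; the page doesn't say which wavelet OIC uses, and the dead-zone quantiser with mid-interval reconstruction is one common choice, not necessarily OIC's. All function names are hypothetical.

```python
def haar_1d(values):
    """One level of the (unnormalised) Haar transform: averages, then differences."""
    half = len(values) // 2
    avg = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    diff = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def haar_2d(image):
    """One 2D Haar level: transform every row, then every column.

    Both sides must be even. The result holds the LL band in the top-left
    quadrant and the three detail bands in the other quadrants.
    """
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def quantise(coeff, step):
    """Uniform scalar quantiser with a dead zone: |coeff| < step maps to 0."""
    q = int(abs(coeff) // step)
    return q if coeff >= 0 else -q


def dequantise(q, step):
    """Reconstruct at the middle of the quantisation interval; 0 stays 0."""
    if q == 0:
        return 0.0
    mag = (abs(q) + 0.5) * step
    return mag if q > 0 else -mag
```

On smooth image areas the detail coefficients are near zero, so most of them fall into the dead zone; that run of zeros is what a quadtree coder then exploits.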
The quadtree encoder is designed so I can literally drop in a VQ
when I figure out exactly which type I'll be using. VQ should improve
its performance a lot; at the moment the encoder wastes bits that I haven't
bothered cleaning up, 'cos it isn't final anyway.
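OIC's own quadtree algorithm isn't spelled out here, so this is only a sketch of the general shape of a quadtree significance coder with a pluggable leaf coder, which is the property that would let a VQ be dropped in later. All names are hypothetical, and the output is a flat list mixing significance flags (0/1) with leaf symbols, where a real coder would entropy-code both.

```python
def quadtree_encode(coeffs, x, y, size, leaf_size, code_leaf, out):
    """Emit a significance flag for a square region, recursing into quadrants.

    coeffs:    2D list of already-quantised coefficients
    leaf_size: region size at which the leaf coder takes over
    code_leaf: callable(block) -> list of symbols; this is the pluggable part
               where a scalar or vector quantiser's output would go
    out:       list that flags and leaf symbols are appended to
    """
    block = [row[x:x + size] for row in coeffs[y:y + size]]
    significant = any(v != 0 for row in block for v in row)
    out.append(1 if significant else 0)
    if not significant:
        return  # one flag covers the whole all-zero region
    if size <= leaf_size:
        out.extend(code_leaf(block))
        return
    half = size // 2
    for oy, ox in ((0, 0), (0, half), (half, 0), (half, half)):
        quadtree_encode(coeffs, x + ox, y + oy, half, leaf_size, code_leaf, out)


def scalar_leaf(block):
    """Hypothetical SQ leaf coder: just list the quantised values row by row."""
    return [v for row in block for v in row]
```

For a 4x4 block whose only nonzero value is a 3 at row 0, column 1, with `leaf_size=2`, the output is `[1, 1, 0, 3, 0, 0, 0, 0, 0]`: the root is significant, the first quadrant is significant and sent as its four values, and each of the other three quadrants costs a single 0. Swapping `scalar_leaf` for a function that returns a codebook index is where a VQ would slot in.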
OIC is a VBR codec. That means "Variable Bit Rate". It is primarily
designed so that you, the user, specify a quality for the video, and it makes
the smallest stream it can at that quality. When you want to stream video
over the net or something, you generally want a Constant Bit Rate encoder,
to make sure you don't overflow buffers. OIC can be made CBR to a useful
degree with a little massaging, but I'm not going to bother now.
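The VBR idea (fix the quality, let the bitrate float) can be sketched as a search for the coarsest quantiser step that still meets a PSNR target. This is a toy under stated assumptions: `round_trip` stands in for the real encode/decode path with plain uniform rounding, PSNR uses an 8-bit peak of 255, and the candidate step list is arbitrary. None of these names come from OIC.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for an exact match."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def round_trip(samples, step):
    """Hypothetical stand-in for encode + decode: uniform rounding to a step."""
    return [round(s / step) * step for s in samples]


def pick_step(samples, target_db, candidates):
    """VBR-style choice: the coarsest step whose round trip still meets target_db.

    Coarser steps mean fewer bits, so the first candidate (largest first)
    that passes gives the smallest stream at the requested quality.
    Returns None if no candidate reaches the target.
    """
    for step in sorted(candidates, reverse=True):
        if psnr(samples, round_trip(samples, step)) >= target_db:
            return step
    return None
```

A CBR encoder would flip this around: fix the bit budget per frame and search for the best quality that fits inside it.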
Oh yes. This is all written in Pascal, by the way. Hah! Bet that
scares you. Seriously, I needed a language that compiles all my code in
milliseconds so I can experiment a lot. Of course, things like range checking
help a lot as well. I don't have the time to waste looking for unforced
programmer errors. I've been using Kylix, and haven't done anything particularly
esoteric, so it should compile in the free version of Kylix and, I hope,
Free Pascal as well. If you are wondering, yes, I am a Bondage And Discipline
programmer. I know I make mistakes, and I choose my preferred language
accordingly.
I may produce a C version if there is any demand for it.
Screenshot
Well, we all know you have to have a screenshot, now don't we? :-) The
different buttons execute different functions within
the code, allowing me to play with different ideas just by clicking around.
This also allows me to examine the state of just about everything at any
point in the compression. This is important so I can tweak my algorithms.
Current Status
Well, I have code that will take streams of JPEGs, GIFs, or PNGs
and output a stream of PPMs after compressing and decompressing them. It
works at 320x256, black and white (greyscale) only. This is because I want my code
to be readable and easy to modify. It really needs to be cleaned up, and
bolting on resolution independence and colour will just confuse the issue
at the moment. I know how to do it, and roughly how many extra bits colour
will eat per frame, and will leave it at that till later. Colour will be
a relatively low-bitrate addition.
If I fix the quality at around 40 dB PSNR in each frame (which is
pretty good quality for my input sequences), a talking head sequence (at
30 fps) comes out at around 200 kbit/s. Talking heads with a busy background
(like people talking at a party) come out at around 500 kbit/s. Two people
beating the crap out of each other with the camera doing its nut panning
around them is around 700 kbit/s. This is using SQ, and I think I can do much better with VQ.
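To put those rates in per-pixel terms, here is a small worked calculation at the 320x256, 30 fps greyscale format described above. It assumes 1 kbit = 1000 bits; the labels are just shorthand for the three sequence types.

```python
WIDTH, HEIGHT, FPS = 320, 256, 30


def bits_per_pixel(kbit_per_s, width=WIDTH, height=HEIGHT, fps=FPS):
    """Average coded bits per pixel for a bitrate given in kbit/s."""
    return kbit_per_s * 1000 / fps / (width * height)


for name, rate in (("talking head", 200), ("busy background", 500), ("fight scene", 700)):
    bits_per_frame = rate * 1000 / FPS
    print(f"{name}: {bits_per_frame:.0f} bits/frame, {bits_per_pixel(rate):.3f} bpp")
```

That works out to about 0.081, 0.203 and 0.285 bits per pixel respectively. Against 8 bits per pixel for raw 8-bit greyscale, the talking head is roughly 98:1 and the fight scene roughly 28:1.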
How much better is going to be the killer question. I have my own
theories, and they make me warm and fuzzy, but I'll not say anything till
I have it working.
Known Bugs/Issues
A problem is that I haven't figured out a nice way to get real streams
into the encoder yet, without going via a bunch of still frames. I'm looking
at the ffmpeg stuff for this. It looks really simple to use, and well done
too.
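As a stopgap that needs no linking against the ffmpeg libraries, a sketch can drive the ffmpeg command-line tool instead and have it emit raw 8-bit greyscale frames at the 320x256 size the encoder currently works at. This assumes a reasonably current ffmpeg build on the PATH (the `-vf scale`, `-pix_fmt gray` and `-f rawvideo` options); the function names are hypothetical, and the whole decoded stream is held in memory, which only suits short clips.

```python
import subprocess

WIDTH, HEIGHT = 320, 256
FRAME_BYTES = WIDTH * HEIGHT  # one byte per pixel for 8-bit greyscale


def ffmpeg_command(path):
    """Build an ffmpeg command that writes scaled 8-bit grey frames to stdout."""
    return ["ffmpeg", "-loglevel", "error", "-i", path,
            "-vf", f"scale={WIDTH}:{HEIGHT}", "-pix_fmt", "gray",
            "-f", "rawvideo", "pipe:1"]


def split_frames(raw):
    """Cut a raw byte stream into frames, each a list of rows of pixel values.

    Any trailing partial frame is dropped.
    """
    frames = []
    for start in range(0, len(raw) - FRAME_BYTES + 1, FRAME_BYTES):
        chunk = raw[start:start + FRAME_BYTES]
        frames.append([list(chunk[r * WIDTH:(r + 1) * WIDTH]) for r in range(HEIGHT)])
    return frames


def read_frames(path):
    """Run ffmpeg on a video file and return every decoded frame."""
    result = subprocess.run(ffmpeg_command(path), capture_output=True, check=True)
    return split_frames(result.stdout)
```

Raw grey frames need no image decoding at all on the encoder side, which is the main win over going via a pile of JPEGs or PNGs.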
I send the LL band of my wavelet transform as raw data at the moment.
Absolutely zero compression. As I improve other things, this becomes more
and more important. I'll do something after the VQ stuff is done. Dunno
what yet.
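One common option for a band like LL, which is basically a shrunken copy of the image, is lossless predictive (DPCM) coding: send each value as its difference from a neighbour, then entropy-code the small residuals. This is only an illustration of that general technique, not a decision about what OIC will do. It assumes integer LL coefficients, and all names are hypothetical.

```python
import math
from collections import Counter


def _prediction(rows, x, y):
    """Left neighbour; the value above for the first column; 0 for the very first."""
    if x > 0:
        return rows[y][x - 1]
    if y > 0:
        return rows[y - 1][0]
    return 0


def dpcm_residuals(band):
    """Replace each value by its difference from the prediction."""
    return [[v - _prediction(band, x, y) for x, v in enumerate(row)]
            for y, row in enumerate(band)]


def undo_dpcm(res):
    """Rebuild the band exactly, predicting from already-decoded values."""
    band = []
    for y, row in enumerate(res):
        band.append([])
        for x, r in enumerate(row):
            band[y].append(r + _prediction(band, x, y))
    return band


def entropy_bits(values):
    """Zeroth-order empirical entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

On a smooth 2x4 ramp such as `[[10, 11, 12, 13], [14, 15, 16, 17]]` the raw values have an empirical entropy of 3 bits per symbol, while the residuals `[[10, 1, 1, 1], [4, 1, 1, 1]]` come in at about 1.06, and the decoder gets the band back exactly.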
Error is strictly held below a certain threshold at the moment. This
is wasteful, as I could probably throw all sorts of stuff away if I had a better
idea of what won't be noticed. I have a cunning plan for how to do this,
but have not implemented it yet. I could probably squeeze another 20-40%
out of the current code (SQ based) if I added this.
Current Plans
My current plan is to add a Vector Quantiser to my wavelet codec. This
should dramatically improve the visual quality, and give me a lot more leeway
for picking lower bitrates. I intend to use a hybrid classification/finite
state/gain shape Pruned-TSVQ eventually.
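The "TSVQ" at the core of that plan is a tree-structured vector quantiser: instead of comparing a vector against every codeword, the encoder walks down a tree, choosing the nearer child at each level and emitting one bit per step. Here is a minimal sketch of a plain binary TSVQ grown by recursive 2-means splitting. It leaves out the pruning, classification, finite-state and gain-shape parts entirely, and all names and parameters are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def split(vectors, c0, c1):
    """Partition vectors by the nearer of two centroids (ties go to c0)."""
    g0 = [v for v in vectors if dist2(v, c0) <= dist2(v, c1)]
    g1 = [v for v in vectors if dist2(v, c0) > dist2(v, c1)]
    return g0, g1


def two_means(vectors, iters=10):
    """Find two centroids with a few Lloyd iterations, seeded far apart."""
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: dist2(v, c0))
    for _ in range(iters):
        g0, g1 = split(vectors, c0, c1)
        if not g0 or not g1:
            break
        c0, c1 = mean(g0), mean(g1)
    return c0, c1


def build_tsvq(vectors, depth, min_size=2):
    """Grow a tree of nodes {'centroid', 'children'}; leaves have children None."""
    node = {"centroid": mean(vectors), "children": None}
    if depth == 0 or len(vectors) < min_size:
        return node
    g0, g1 = split(vectors, *two_means(vectors))
    if g0 and g1:
        node["children"] = (build_tsvq(g0, depth - 1, min_size),
                            build_tsvq(g1, depth - 1, min_size))
    return node


def tsvq_encode(tree, v):
    """Descend the tree, one bit per level; return (bits, reproduction vector)."""
    bits, node = [], tree
    while node["children"] is not None:
        left, right = node["children"]
        bit = 0 if dist2(v, left["centroid"]) <= dist2(v, right["centroid"]) else 1
        bits.append(bit)
        node = node["children"][bit]
    return bits, node["centroid"]
```

The search cost grows with the tree depth rather than the codebook size, which is the usual reason for choosing a tree over a full search; pruning then trims branches that don't earn their bits.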
I need to get a good supply of varied video streams to train my VQ codebook:
high-quality, interlace-artifact-free stuff. At the moment I am using some
Buffy episodes encoded with MPEG-1 and a few video clips I extracted from
a DVD before Broadcast 2000 weirded out on me. I may add an ffmpeg section
to my code just to get training sequences.
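Whatever the source of the clips, training data for a block VQ usually boils down to lots of small blocks cut from the signal being quantised. A minimal sketch, assuming non-overlapping square blocks cut from any 2D array (a frame, or a wavelet subband, since the VQ here would sit on wavelet coefficients); the block size of 4 is an arbitrary assumption and `training_blocks` is a hypothetical name.

```python
def training_blocks(frame, size=4):
    """Cut a 2D array into non-overlapping size x size blocks, each flattened.

    Blocks are taken row by row; partial blocks at the right and bottom
    edges are skipped.
    """
    h, w = len(frame), len(frame[0])
    return [[frame[y + dy][x + dx] for dy in range(size) for dx in range(size)]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]
```

At 30 frames a clip and 320x256 frames, 4x4 blocks give 5,120 training vectors per frame from the full image, or a quarter of that per subband at each level down, so even a modest pile of donated clips adds up quickly.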
At some point, I will get around to adding the error handling I mentioned
in the issues section. It's not a priority because it's relatively simple
and I have more difficult things to do.
Finally, I need to spend some time cleaning things up before I plug
in the new quantiser. The code is monolithic and ugly at the moment, as
if someone just hacked it together as a prototype. Gee. I wonder why? :-)
Anyone who would like to help with donations of short high-quality video
clips, please feel free to mail me and we'll make a plan. I need lots of
different samples, of maybe 30 frames apiece.
Conclusion
I am writing my code in between other things at the moment, and will
start releasing source when I have stabilised the format. I'll be using
Ogg as the stream encapsulation format, and thus probably an Ogg codec for the audio too.