Summary: Fixing subtitle encoding in DivX videos is easy... once you know how to do it.
I have been hunting for a copy of Moi Ivan, toi Abraham (AKA "Ivan and Abram", "Я - Иван, ты - Абрам") since I saw the movie on cable in the mid-90s. The movie has not been released on DVD, and I do not have a VHS player, but fortunately, I got a decent DivX version of the movie with Russian subtitles (the movie is mostly in Yiddish).
Unfortunately, instead of legitimate Cyrillic, the subtitle captions displayed garbage (accented characters). As I later found out, the subtitle file was encoded in the Windows-1251 (Cyrillic) code page, but my system read it as a Western code page (such as Windows-1252), so the subtitles appear fine only on a Russian version of Windows. So, what's a girl to do? I ran a few Google searches and found some posts from people running into a similar problem, but none of them contained any answers. I thought I would write a post explaining how I fixed the problem (really easy), hoping that it would help someone.
First, a quick intro to subtitles in DivX. Well, I do not really know much about this, but this is how much you -- a typical movie viewer -- need to know (if I misstate or omit something important, feel free to correct me). A typical DivX (AVI) file does not contain embedded subtitles. Subtitles normally come from a separate file, such as SRT, SUB, SSA/ASS. Usually, a subtitle file has the same name as the DivX file, with a different extension. For example, this would be a matching pair of DivX (AVI) and subtitle (SRT) files:
Moi Ivan, Moi Abraham.avi
Moi Ivan, Moi Abraham.srt
There is nothing magic about a subtitle file: it's just a text file that conforms to a certain data format. Here is the format of the SubRip (SRT) subtitle file (directly from Wikipedia):
Subtitle number
Start time --> End time
Text of subtitle (one or more lines)
Here is an example:
1
00:00:18,700 --> 00:00:21,889
00:03:16,190 --> 00:03:21,760
Я - ИВАН, ТЫ - АБРАМ
Many popular video players (KMPlayer, VLC, etc), as well as DVD players, will automatically load and display the default subtitles from the file with the same name as the DivX file in the same folder, but you can also load additional subtitle files manually (e.g. you may have subtitles translated into several languages). In my favorite KMPlayer, you can load non-default subtitles via the Subtitles - Load Subtitle menu.
The original subtitle file I got looked like this:
1
00:00:18,700 --> 00:00:21,889
00:03:16,190 --> 00:03:21,760
ß - ÈÂÀÍ, ÒÛ - ÀÁÐÀÌ
Although this text looks like garbage, it's not useless: it just needs to be re-encoded from one code page to another (and desirably, to something non-code-page-specific, e.g. to Unicode). But how do you do it?
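For the curious, here is a minimal Python sketch of why the garbage is recoverable. It assumes the file's bytes are Windows-1251 and that the player read them as Windows-1252 (my guess at what happened, as described above); the variable names are just for illustration.

```python
# The title line as it should appear.
original = "Я - ИВАН, ТЫ - АБРАМ"

# What is actually stored on disk: one byte per character, Windows-1251.
raw_bytes = original.encode("cp1251")

# What a Western system shows when it reads those same bytes as Windows-1252.
garbled = raw_bytes.decode("cp1252")
print(garbled)

# Nothing was lost: turning the garbage back into bytes and reading them
# with the right code page restores the text. (This round trip can fail for
# the few byte values that Windows-1252 leaves undefined.)
repaired = garbled.encode("cp1252").decode("cp1251")
print(repaired)
```

The bytes never change; only the code page used to interpret them does, which is why picking the right encoding is enough to fix the file.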
Help comes from Mozilla Firefox (and, I suspect, from any other web browser). If you need to fix the encoding of a subtitle file, here is what you need to do (you can use a similar approach to recover text in other types of documents, such as email, text files, and so on).
- Launch Firefox (or your favorite web browser).
- Open the subtitle file. To locate the file in Firefox 3.5, use the File - Open File menu; in IE 8, use the File - Open menu, and click the Browse button; in Google Chrome 4.0, press the CTRL + O keys (when using Google Chrome, you need to change the extension of the subtitle file to .TXT before opening the file; otherwise, it will launch the default program associated with the original file extension instead of displaying the file text in the browser).
- Once the browser opens the file, it may automatically adjust the encoding. If you still see garbage, select a different encoding option until the text appears correctly. To change the encoding in Firefox 3.5, select the appropriate encoding from the View - Character Encoding menu (the Auto-Detect menu for the appropriate language can be helpful); in IE 8, use the View - Encoding menu; in Google Chrome, click the Control the current page toolbar button and pick the appropriate option from the Encoding menu (again, the Auto detect option may help).
- Once you select the correct encoding option and verify that the text is displayed correctly, highlight all text (you can use CTRL + A), and copy the selected text to the clipboard (press CTRL + C).
- Open Notepad (or your favorite plain text editor, such as Notepad++, PSPad, etc), create a new file (File - New menu option in Notepad) and paste the contents of the clipboard in the new file (press CTRL + V).
- Save the text file as the new subtitle file. If you decide to overwrite the original subtitle file, make sure that you first make a backup in case something goes wrong. When saving the file, you will most likely be prompted to change the default ANSI encoding, so pick the Unicode encoding.
- Close the newly created subtitle file in Notepad (or your text editor), and reopen it to verify that encoding is still intact and text appears correctly, and if so, use it as a new subtitle file.
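If you are comfortable with a little scripting, the same steps can be done without a browser. The following is a minimal Python sketch, not a polished tool: the function names (preview, convert_to_utf8), the list of candidate encodings, and the demo file names are all my own assumptions for illustration. preview plays the role of trying different encodings until the text looks right, and convert_to_utf8 plays the role of saving the result as Unicode.

```python
import tempfile
from pathlib import Path


def preview(path, encodings=("cp1251", "cp1252", "koi8_r", "utf-8"), lines=5):
    """Decode the first few lines of a file with each candidate encoding.

    Returns a dict mapping encoding name to the decoded preview text,
    or None when the bytes are not valid in that encoding.
    """
    raw = Path(path).read_bytes()
    results = {}
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
            continue
        results[enc] = "\n".join(text.splitlines()[:lines])
    return results


def convert_to_utf8(src, dst, source_encoding="cp1251", target_encoding="utf-8"):
    """Re-encode a text file; line endings are left untouched.

    Pass target_encoding="utf-8-sig" to write a byte order mark, which
    some players may need to recognize the file as UTF-8 (an assumption;
    check your own player).
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode(target_encoding))


if __name__ == "__main__":
    # Demo on a throwaway file so that nothing real gets overwritten.
    with tempfile.TemporaryDirectory() as folder:
        broken = Path(folder) / "broken.srt"   # hypothetical file name
        fixed = Path(folder) / "fixed.srt"     # hypothetical file name
        broken.write_bytes("1\r\nЯ - ИВАН, ТЫ - АБРАМ\r\n".encode("cp1251"))

        # Look at each candidate and pick the one that reads correctly.
        for enc, sample in preview(broken).items():
            print(enc, "->", repr(sample))

        convert_to_utf8(broken, fixed, source_encoding="cp1251")
        print(fixed.read_text(encoding="utf-8"))
```

As with the manual steps, write the result to a new file (or back up the original first), and look at the previews yourself before converting: the script cannot tell which encoding is the right one, only which ones produce readable text.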
UPDATE: As I recently found out, correcting code page issues in subtitles can be even easier, assuming that you have the free text editor Notepad++ installed. What you need to do is:
- Back up the subtitle file (just in case something goes wrong).
- Open the subtitle file in Notepad++.
- From the Encoding menu, select the Character Sets option.
- Under Character Sets, select the appropriate language family and then the code page (you may need to try a few code pages if you don't know which one to use).
- When you see the characters appearing correctly, select the Convert to UTF-8 option under the Encoding menu.
- Save the file.
See also:
- The 3 Best Subtitle Sites For Your Movies & TV Series
- How To Add Subtitles To A Movie Or TV Series
- SubDownloader: Fast and Easy Subtitle Downloader
- DivXLand Media Subtitler Embeds Subtitles into Movie Files
- Sublight Labs: Searching subtitles has never been this easy
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)