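Whisper Transcription/MacWhisper v8.0, a transcription app built on OpenAI's Whisper speech recognition model, has been released. It adds a video player for checking generated subtitles and support for WhisperKit, which offers a choice of AI models.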

2024.05.21

Whisper Transcription/MacWhisper v8.0, a transcription app built on OpenAI's Whisper speech recognition model, has been released. Details follow.

On May 20, 2024 (local time), Jordi Bruin of Good Snooze announced the release of Whisper Transcription/MacWhisper v8.0, a major update to the speech-to-text app built on OpenAI's Whisper speech recognition model. The app is distributed as Whisper Transcription on the Mac App Store and as MacWhisper on Gumroad.

Video Player

Whisper Transcription/MacWhisper v8.0 adds a video player. When you transcribe a video file, the transcribed text can be displayed on the video as subtitles.

WhisperKit

The newly supported WhisperKit lets you choose among several Whisper engines. It is currently treated as an experimental feature. To use it, open [Settings…] → [Advanced] from the Whisper Transcription/MacWhisper menu and enable [Show WhisperKit Models].

With WhisperKit enabled, the Whisper model selection screen gains additional AI models (Whisper engines), including WhisperKit models optimized for the Core ML framework and Hugging Face's Distil-Whisper, which you can then download and use.

ChatGPT-4o

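The following feature is limited to the Gumroad version, MacWhisper v8.0. After you enter an OpenAI API key, MacWhisper v8.0 can use the latest ChatGPT-4o to transcribe audio and to summarize, convert to bullet points, or proofread transcripts.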

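Whisper Transcription/MacWhisper v8.0 also speeds up video downloads when translating YouTube videos, adds an audio-only download option and a choice between high and low video quality, and includes more than 10 improvements and bug fixes. Users of the app may want to update.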

Whisper Transcription v8.0

New Features:

Video Player

Experience an inline video player when transcribing video files! It offers a pop-out option for a separate window experience. Subtitles are displayed directly on the video, with translations appearing as additional subtitles.

WhisperKit Support

Select from various Whisper engines for your transcriptions. WhisperKit provides streamlined models for enhanced speed, and transcriptions are streamed in real time. Activate WhisperKit under Settings > Advanced.

Improved:

The app now avoids cutting off words mid-way if a character limit is set.
Introduced a new menubar icon to avoid confusion with the standard microphone icon.
Moved quality and language selectors to the toolbar; expand your window to see them if hidden.
Opening .whisper files is now feasible even as models are loading.
Upgraded to the latest Whisper C++ engine featuring Flash Attention (enable in Settings > Advanced).
The Manage Models screen has been redesigned for simpler model selection. Your feedback is appreciated.
Improved error handling for model downloads.
Removed “MS Teams Virtual Mic” from microphone choices due to it not being an actual microphone.
Fixed an issue where invalid license error codes were not shown.
Fixed a crash occurring when non-pro users added more than two speakers.
The Esc key now won’t close screens during ongoing activities like recording or batch transcription.
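
The first item under "Improved" describes respecting a character limit without cutting words in half. The Python sketch below shows one way such behavior could work. It is not MacWhisper's implementation: the function name `split_at_word_boundaries` and the greedy packing strategy are assumptions made for illustration.

```python
def split_at_word_boundaries(text: str, limit: int) -> list[str]:
    """Split text into segments of at most `limit` characters, breaking only between words.

    Hypothetical helper for illustration. A single word longer than `limit`
    is kept whole in its own segment rather than being cut.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")

    segments: list[str] = []
    current = ""
    for word in text.split():
        # Try to append the next word to the segment being built.
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            # The word does not fit: close this segment and start a new one.
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for line in split_at_word_boundaries("The quick brown fox jumps", 10):
        print(line)
```

With a limit of 10 characters, the sample sentence is split after "quick" and after "fox" instead of in the middle of "brown" or "jumps".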