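Warning

Torchaudio's C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.

Note

The top-level namespace has been changed from torchaudio to torio. StreamReader has been renamed to StreamingMediaDecoder.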

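torio::io::StreamingMediaDecoder

StreamingMediaDecoder is the class that the equivalent Python class is implemented with, and it provides a similar interface. When working with custom I/O, such as in-memory data, the StreamingMediaDecoderCustomIO class can be used.

Both classes define the same methods, so they are used the same way.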

Constructors

StreamingMediaDecoder

class StreamingMediaDecoder

Fetch and decode audio/video streams chunk by chunk.

Subclassed by torio::io::StreamingMediaDecoderCustomIO

torio::io::StreamingMediaDecoder::StreamingMediaDecoder(const std::string &src, const std::optional<std::string> &format = c10::nullopt, const std::optional<OptionDict> &option = c10::nullopt)

StreamingMediaDecoderCustomIO

class StreamingMediaDecoderCustomIO : private detail::CustomInput, public torio::io::StreamingMediaDecoder

A subclass of StreamingMediaDecoder that works with a custom read function. It can be used to decode media from memory or from custom objects.

torio::io::StreamingMediaDecoderCustomIO::StreamingMediaDecoderCustomIO(void *opaque, const std::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const std::optional<OptionDict> &option = c10::nullopt)

Construct a StreamingMediaDecoder with custom read and seek functions.

Parameters:

opaque – Custom data used by the read_packet and seek functions.
format – Specify the input format.
buffer_size – The size of the intermediate buffer that FFmpeg uses to pass data to the read_packet function.
read_packet – Custom read function that FFmpeg calls to read data from the underlying source.
seek – Optional seek function used to seek within the source.
option – Custom options passed when initializing the format context.
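To make the read_packet and seek parameters concrete, here is a minimal sketch of two callbacks that serve bytes from an in-memory buffer. The names MemorySource, read_from_memory and seek_in_memory are hypothetical, and the three constants are assumed to mirror FFmpeg's AVERROR_EOF, AVSEEK_SIZE and AVSEEK_FORCE so that the sketch compiles without FFmpeg headers. Real code should include the FFmpeg headers and use the macros directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Assumed values that mirror FFmpeg macros; use the real macros in practice.
constexpr int kAssumedAverrorEof = -541478725;  // AVERROR_EOF
constexpr int kAssumedAvseekSize = 0x10000;     // AVSEEK_SIZE
constexpr int kAssumedAvseekForce = 0x20000;    // AVSEEK_FORCE

// Hypothetical state object handed to the callbacks through `opaque`.
struct MemorySource {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Shape of the `read_packet` parameter: copy up to buf_size bytes into buf
// and return the number of bytes copied, or the EOF code when exhausted.
int read_from_memory(void* opaque, uint8_t* buf, int buf_size) {
  auto* src = static_cast<MemorySource*>(opaque);
  assert(src != nullptr && buf_size >= 0);
  const int64_t remaining = static_cast<int64_t>(src->data.size()) - src->pos;
  if (remaining <= 0) {
    return kAssumedAverrorEof;
  }
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, src->data.data() + src->pos, static_cast<std::size_t>(n));
  src->pos += n;
  return n;
}

// Shape of the `seek` parameter: return the new position, the total size for
// a size query, or -1 when the request is invalid.
int64_t seek_in_memory(void* opaque, int64_t offset, int whence) {
  auto* src = static_cast<MemorySource*>(opaque);
  assert(src != nullptr);
  const int64_t size = static_cast<int64_t>(src->data.size());
  if (whence & kAssumedAvseekSize) {
    return size;  // size query: report the length without moving
  }
  int64_t target = 0;
  switch (whence & ~kAssumedAvseekForce) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = src->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;
  }
  if (target < 0 || target > size) {
    return -1;
  }
  src->pos = target;
  return target;
}
```

Going by the constructor signature above, these would be passed as StreamingMediaDecoderCustomIO(&source, c10::nullopt, 4096, read_from_memory, seek_in_memory), where the buffer size of 4096 is an arbitrary illustrative value. The MemorySource object must outlive the decoder, since the decoder only holds the opaque pointer.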

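Query methods

find_best_audio_stream

int64_t torio::io::StreamingMediaDecoder::find_best_audio_stream() const

Find a suitable audio stream using FFmpeg's heuristics.

On success, returns the index of the best stream (>=0). Otherwise, returns a negative value.

find_best_video_stream

int64_t torio::io::StreamingMediaDecoder::find_best_video_stream() const

Find a suitable video stream using FFmpeg's heuristics.

On success, returns the index of the best stream (>=0). Otherwise, returns a negative value.

get_metadata

OptionDict torio::io::StreamingMediaDecoder::get_metadata() const

Fetch the metadata of the source media.

num_src_streams

int64_t torio::io::StreamingMediaDecoder::num_src_streams() const

Fetch the number of source streams found in the input media.

Source streams include not only audio/video streams but also subtitle and other streams.

get_src_stream_info

SrcStreamInfo torio::io::StreamingMediaDecoder::get_src_stream_info(int i) const

Fetch information about the specified source stream.

The valid range of i is [0, num_src_streams()).

num_out_streams

int64_t torio::io::StreamingMediaDecoder::num_out_streams() const

Fetch the number of output streams defined by client code.

get_out_stream_info

OutputStreamInfo torio::io::StreamingMediaDecoder::get_out_stream_info(int i) const

Fetch information about the specified output stream.

The valid range of i is [0, num_out_streams()).

is_buffer_ready

bool torio::io::StreamingMediaDecoder::is_buffer_ready() const

Check whether the buffers of all output streams hold enough decoded frames.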

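Configure methods

add_audio_stream

void torio::io::StreamingMediaDecoder::add_audio_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const std::optional<std::string> &filter_desc = c10::nullopt, const std::optional<std::string> &decoder = c10::nullopt, const std::optional<OptionDict> &decoder_option = c10::nullopt)

Define an output audio stream.

Parameters:

i – The index of the source stream.
frames_per_chunk – Number of frames returned as one chunk.
If the source stream is exhausted before frames_per_chunk frames are buffered, the chunk is returned as-is. The chunk may therefore contain fewer than frames_per_chunk frames.

Providing -1 disables chunking, in which case the pop_chunks() method returns all the buffered frames as one chunk.
num_chunks – Internal buffer size.
When the number of buffered chunks exceeds this number, old chunks are dropped. For example, if frames_per_chunk is 5 and num_chunks is 3, frames older than the most recent 15 are dropped.

Providing -1 disables this behavior and forces all chunks to be retained.
filter_desc – Description of the filter graph applied to the source stream.
decoder – The name of the decoder to use. When provided, the specified decoder is used instead of the default one.
decoder_option – Options passed to the decoder.
To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They take effect only if the decoder supports them. If neither of them is provided, StreamingMediaDecoder defaults to single-threaded decoding.
- "threads": The number of threads, or the value "0" to let FFmpeg decide based on its heuristics.
- "thread_type": Which multithreading method to use. The valid values are "frame" and "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
  - "frame": Decode more than one frame at once. Each thread handles one frame. This increases decoding delay by one frame per thread.
  - "slice": Decode more than one part of a single frame at once.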
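The interaction between frames_per_chunk and num_chunks is easier to see in a small model. The sketch below is a hypothetical stand-in, not the library's buffer implementation: ChunkBufferModel groups frame indices into chunks and drops the oldest chunk once more than num_chunks are held. The exact moment at which the real decoder drops a chunk may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical model of the chunk buffer described for num_chunks.
class ChunkBufferModel {
 public:
  ChunkBufferModel(int64_t frames_per_chunk, int64_t num_chunks)
      : frames_per_chunk_(frames_per_chunk), num_chunks_(num_chunks) {
    assert(frames_per_chunk > 0);
  }

  // Buffer one decoded frame, identified here only by its index.
  void push_frame(int64_t frame_index) {
    if (chunks_.empty() ||
        static_cast<int64_t>(chunks_.back().size()) == frames_per_chunk_) {
      chunks_.emplace_back();  // start a new chunk
    }
    chunks_.back().push_back(frame_index);
    // A negative num_chunks keeps every chunk, like the documented -1.
    while (num_chunks_ >= 0 &&
           static_cast<int64_t>(chunks_.size()) > num_chunks_) {
      chunks_.pop_front();  // drop the oldest chunk
    }
  }

  // Index of the oldest frame still buffered, or -1 if the buffer is empty.
  int64_t oldest_frame() const {
    return chunks_.empty() ? -1 : chunks_.front().front();
  }

  int64_t num_buffered_frames() const {
    int64_t total = 0;
    for (const auto& chunk : chunks_) {
      total += static_cast<int64_t>(chunk.size());
    }
    return total;
  }

 private:
  int64_t frames_per_chunk_;
  int64_t num_chunks_;
  std::deque<std::vector<int64_t>> chunks_;
};
```

With frames_per_chunk of 5 and num_chunks of 3, pushing frames 0 through 19 leaves frames 5 through 19 in the model, which matches the 15-frame figure in the parameter description.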

add_video_stream

void torio::io::StreamingMediaDecoder::add_video_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const std::optional<std::string> &filter_desc = c10::nullopt, const std::optional<std::string> &decoder = c10::nullopt, const std::optional<OptionDict> &decoder_option = c10::nullopt, const std::optional<std::string> &hw_accel = c10::nullopt)

Define an output video stream.

Parameters:

i, frames_per_chunk, num_chunks, filter_desc, decoder, decoder_option – See add_audio_stream().
hw_accel – Enable hardware acceleration.
When video is decoded on CUDA hardware (for example, by specifying the "h264_cuvid" decoder), passing a CUDA device indicator to hw_accel (i.e. hw_accel="cuda:0") makes StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as a CUDA tensor.

If hw_accel is not provided (None in Python, c10::nullopt here), the chunk is moved to CPU memory.

remove_stream

void torio::io::StreamingMediaDecoder::remove_stream(int64_t i)

Remove an output stream.

Parameters:

i – The index of the output stream to remove. The valid range is [0, num_out_streams()).

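Stream methods

seek

void torio::io::StreamingMediaDecoder::seek(double timestamp, int64_t mode)

Seek to the given timestamp.

Parameters:

timestamp – Target timestamp, in seconds.
mode – Seek mode.
- 0: Keyframe mode. Seek to the nearest keyframe before the given timestamp.
- 1: Any mode. Seek to any frame (including non-keyframes) before the given timestamp.
- 2: Precise mode. First seek to the nearest keyframe before the given timestamp, then decode frames until reaching the frame closest to the given timestamp.

process_packet

int torio::io::StreamingMediaDecoder::process_packet()

Demultiplex and process one packet.

Returns:

0: A packet was processed successfully and there are still packets left in the stream, so client code can call this method again.
1: A packet was processed successfully and EOF was reached. Client code should not call this method again.
<0: An error occurred.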
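The three return codes suggest a simple driving loop. The sketch below shows one way to write it. ScriptedDecoder is a hypothetical stand-in that replays a fixed list of codes so the loop can be exercised without media input, and drain is a hypothetical helper name. It is a template so that it could equally be given any object exposing a process_packet() method with the contract described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in that replays a scripted sequence of return codes.
struct ScriptedDecoder {
  std::vector<int> codes;
  std::size_t next = 0;

  int process_packet() {
    assert(next < codes.size());
    return codes[next++];
  }
};

// Call process_packet() until EOF is reported and return how many packets
// were processed. Throws if a negative (error) code is returned.
template <typename Decoder>
int drain(Decoder& decoder) {
  int processed = 0;
  while (true) {
    const int code = decoder.process_packet();
    if (code < 0) {
      throw std::runtime_error("process_packet reported an error");
    }
    ++processed;
    if (code == 1) {
      return processed;  // EOF reached: do not call process_packet() again
    }
  }
}
```

In practice a loop like this usually also pops chunks between calls. Otherwise, with a bounded num_chunks, older chunks are dropped before client code sees them.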

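process_packet_block

int torio::io::StreamingMediaDecoder::process_packet_block(const double timeout, const double backoff)

Similar to process_packet(), but retries automatically if it fails because a resource is temporarily unavailable.

This behavior is helpful when using device input, such as a microphone, because the buffer may be busy while samples are being acquired.

Parameters:

timeout – Timeout in milliseconds.
- >=0: Keep retrying until the given time has passed.
- <0: Keep retrying forever.
backoff – Time to wait before retrying, in milliseconds.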
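The timeout and backoff semantics can be sketched as a generic retry loop. This is an illustration under stated assumptions, not the library's implementation: retry_with_backoff and kTryAgain are hypothetical names, kTryAgain stands in for whatever "resource temporarily unavailable" code the underlying call reports, and returning that code on timeout is a choice made for this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical code meaning "resource temporarily unavailable, try again".
constexpr int kTryAgain = -11;

// Call `attempt` until it returns something other than kTryAgain.
// timeout_ms >= 0: give up once that many milliseconds have elapsed.
// timeout_ms <  0: retry forever.
// backoff_ms: how long to sleep between attempts, in milliseconds.
int retry_with_backoff(const std::function<int()>& attempt,
                       double timeout_ms,
                       double backoff_ms) {
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::duration<double, std::milli>;
  assert(backoff_ms >= 0);
  const auto start = Clock::now();
  while (true) {
    const int code = attempt();
    if (code != kTryAgain) {
      return code;  // success, EOF, or a non-retryable error
    }
    if (timeout_ms >= 0 &&
        Millis(Clock::now() - start).count() >= timeout_ms) {
      return kTryAgain;  // timed out while the resource was still busy
    }
    std::this_thread::sleep_for(Millis(backoff_ms));
  }
}
```

A steady clock is used so that wall-clock adjustments do not shorten or extend the timeout.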

process_all_packets

void torio::io::StreamingMediaDecoder::process_all_packets()

Process packets until EOF.

fill_buffer

int torio::io::StreamingMediaDecoder::fill_buffer(const std::optional<double> &timeout = c10::nullopt, const double backoff = 10.)

Process packets until every chunk buffer holds at least one chunk.

Parameters:

timeout – See process_packet_block().
backoff – See process_packet_block().

Retrieval methods

pop_chunks

std::vector<std::optional<Chunk>> torio::io::StreamingMediaDecoder::pop_chunks()

Pop one chunk from each output stream, if available.
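Because pop_chunks() returns one optional entry per output stream, client code has to check each entry before using it. The sketch below shows that check with hypothetical names: FakeChunk stands in for Chunk and carries only a pts field, and chunk_for_stream is an illustrative helper rather than part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for Chunk; only the pts field is modeled.
struct FakeChunk {
  double pts;
};

// Return a pointer to the chunk popped for output stream i, or nullptr if
// that stream had no chunk available.
template <typename ChunkT>
const ChunkT* chunk_for_stream(
    const std::vector<std::optional<ChunkT>>& chunks,
    std::size_t i) {
  assert(i < chunks.size());
  return chunks[i].has_value() ? &*chunks[i] : nullptr;
}
```

The returned pointer refers into the vector, so it is only valid while the vector returned by pop_chunks() is alive and unmodified.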

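Support structures

Chunk

struct Chunk

Stores decoded frames and metadata.

Public members

torch::Tensor frames

Audio/video frames.

For audio, the shape is [time, num_channels], and the dtype depends on the output stream configuration.

For video, the shape is [time, channel, height, width], and the dtype is torch.uint8.

double pts

Presentation timestamp of the first frame, in seconds.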
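Since pts describes only the first frame of a chunk, timestamps for later frames have to be derived. A minimal sketch, assuming the frames are evenly spaced at a known rate (the sample rate for audio, the frame rate for video); frame_timestamp is a hypothetical helper, and the even-spacing assumption does not hold for variable-frame-rate video.

```cpp
#include <cassert>
#include <cstdint>

// Timestamp, in seconds, of the frame at `frame_index` within a chunk whose
// first frame has timestamp `chunk_pts`, assuming a constant rate.
double frame_timestamp(double chunk_pts,
                       int64_t frame_index,
                       double frames_per_second) {
  assert(frames_per_second > 0 && frame_index >= 0);
  return chunk_pts + static_cast<double>(frame_index) / frames_per_second;
}
```

For example, in a 16 kHz audio chunk whose pts is 1.0, the sample at index 8000 falls at 1.5 seconds.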

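SrcStreamInfo

struct SrcStreamInfo

Information about a source stream found in the input media.

Common members

AVMediaType media_type

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo: Introduce own enum and get rid of the FFmpeg dependency.

const char *codec_name = "N/A"

The name of the codec.

const char *codec_long_name = "N/A"

The full, human-readable name of the codec.

const char *fmt_name = "N/A"

For audio, it is the sample format.

Common values are:

"u8", "u8p": 8-bit unsigned integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"s64", "s64p": 64-bit signed integer.
"flt", "fltp": 32-bit floating point.
"dbl", "dblp": 64-bit floating point.

For video, it is the color channel format.

Common values include:

"gray8": grayscale
"rgb24": RGB
"bgr24": BGR
"yuv420p": YUV420p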

int64_t bit_rate = 0

Bit rate.

int64_t num_frames = 0

Number of frames.

Note

In some formats, this value is unreliable or unavailable.

int bits_per_sample = 0

Number of bits per sample.
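The audio values listed for fmt_name each imply a storage width, and in FFmpeg's naming the trailing "p" marks the planar variant of a sample format. The sketch below encodes just the names listed in this section. bytes_per_sample and is_planar are hypothetical helpers, and the result describes the decoded sample format, not necessarily the bits_per_sample value reported for the stream.

```cpp
#include <string>

// Bytes per sample for the audio sample format names listed for fmt_name.
// Returns 0 for any other name (including video formats such as "yuv420p").
int bytes_per_sample(const std::string& fmt_name) {
  std::string base = fmt_name;
  // Planar variants carry a trailing 'p'; strip it before the lookup.
  if (base.size() > 1 && base.back() == 'p') {
    base.pop_back();
  }
  if (base == "u8") return 1;
  if (base == "s16") return 2;
  if (base == "s32" || base == "flt") return 4;
  if (base == "s64" || base == "dbl") return 8;
  return 0;
}

// True if fmt_name is one of the listed audio formats in planar layout.
bool is_planar(const std::string& fmt_name) {
  return bytes_per_sample(fmt_name) != 0 && fmt_name.back() == 'p';
}
```

Checking the lookup result first keeps is_planar from misreading video names that also happen to end in "p".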

OptionDict metadata = {}

Metadata.

For MP3, for example, this holds the ID3 tags.

Example

{
  "title": "foo",
  "artist": "bar",
  "date": "2017"
}

Audio-specific members

double sample_rate = 0

Sample rate.

int num_channels = 0

Number of channels.

Video-specific members

int width = 0

Width.

int height = 0

Height.

double frame_rate = 0

Frame rate.

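OutputStreamInfo

struct OutputStreamInfo

Information about an output stream configured by user code.

Audio-specific members

double sample_rate = -1

Sample rate.

int num_channels = -1

Number of channels.

Video-specific members

int width = -1

Width.

int height = -1

Height.

AVRational frame_rate = {0, 1}

Frame rate.

Public members

int source_index

The index of the input source stream.

AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo: Introduce own enum and get rid of the FFmpeg dependency.

int format = -1

Media format. AVSampleFormat for audio, AVPixelFormat for video.

std::string filter_description = {}

Filter graph definition, such as "aresample=16000,aformat=sample_fmts=fltp".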
