StreamingMediaEncoder¶

class torio.io.StreamingMediaEncoder(dst: Union[str, Path, BinaryIO], format: Optional[str] = None, buffer_size: int = 4096)[source]¶

Encode and write audio/video streams chunk by chunk.

Parameters:

dst (str, path-like or file-like object) –
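The destination where the encoded data are written. If it is a string, it must be a resource indicator that FFmpeg can handle. The supported values depend on the FFmpeg found in the system.

If it is a file-like object, it must support a write method with the signature write(data: bytes) -> int.

Please refer to the following for the expected signature and behavior of the write method.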
- https://docs.python.club.tw/3/library/io.html#io.BufferedIOBase.write
format (str or None, optional) –
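Override the output format, or specify the output media device. Default: None (no format override and no device output).

This argument serves two different use cases.
1. Override the output format. This is useful when writing raw data or when using a format different from the file extension.
2. Specify the output device. This allows media streams to be output to hardware devices, such as speakers and video screens.
Note

This option roughly corresponds to the -f option of the ffmpeg command. Please refer to the ffmpeg documentation for possible values.

https://ffmpeg.org/ffmpeg-formats.html#Muxers

Please use get_muxers() to list the multiplexers (muxers) available in the current environment.

For device access, the available values vary based on hardware (AV device) and software configuration (ffmpeg build). Please refer to the ffmpeg documentation for possible values.

https://ffmpeg.org/ffmpeg-devices.html#Output-Devices

Please use get_output_devices() to list the output devices available in the current environment.
buffer_size (int) –
The internal buffer size in bytes. Used only when dst is a file-like object.

Default: 4096.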
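
As a rough illustration of the file-like case, the sketch below defines a minimal destination object whose write method follows the write(data: bytes) -> int signature. The names CountingSink and encode_silence are hypothetical and exist only for this example. The encode_silence helper assumes that torch and torio are installed with a working FFmpeg, and it passes format="wav" explicitly because a file-like object carries no file name to infer the format from.

```python
import io


class CountingSink:
    """Hypothetical file-like destination exposing write(data: bytes) -> int."""

    def __init__(self) -> None:
        self._buffer = io.BytesIO()
        self.num_writes = 0

    def write(self, data: bytes) -> int:
        # Return the number of bytes accepted, as io.BufferedIOBase.write does.
        self.num_writes += 1
        return self._buffer.write(data)

    def getvalue(self) -> bytes:
        return self._buffer.getvalue()


def encode_silence(sink, sample_rate: int = 16000, seconds: int = 1) -> None:
    """Encode silence as WAV into `sink` (assumes torch, torio and FFmpeg are available)."""
    import torch
    from torio.io import StreamingMediaEncoder

    encoder = StreamingMediaEncoder(dst=sink, format="wav", buffer_size=4096)
    encoder.add_audio_stream(sample_rate=sample_rate, num_channels=1)
    with encoder.open():
        # The default input format is "flt", so the chunk is float32
        # with shape (frame, channel).
        chunk = torch.zeros(sample_rate * seconds, 1, dtype=torch.float32)
        encoder.write_audio_chunk(0, chunk)
```

Calling encode_silence(CountingSink()) would then leave the encoded bytes in the sink, available through getvalue().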

Methods¶

add_audio_stream¶

StreamingMediaEncoder.add_audio_stream(sample_rate: int, num_channels: int, format: str = 'flt', *, encoder: Optional[str] = None, encoder_option: Optional[Dict[str, str]] = None, encoder_sample_rate: Optional[int] = None, encoder_num_channels: Optional[int] = None, encoder_format: Optional[str] = None, codec_config: Optional[CodecConfig] = None, filter_desc: Optional[str] = None)[source]¶

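Add an output audio stream.

Parameters:

sample_rate (int) – The sample rate.
num_channels (int) – The number of channels.
format (str, optional) –
Input sample format, which determines the dtype of the input tensor.
- "u8": The input tensor must be of torch.uint8 type.
- "s16": The input tensor must be of torch.int16 type.
- "s32": The input tensor must be of torch.int32 type.
- "s64": The input tensor must be of torch.int64 type.
- "flt": The input tensor must be of torch.float32 type.
- "dbl": The input tensor must be of torch.float64 type.
Default: "flt".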
encoder (str or None, optional) –
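The name of the encoder to be used. When provided, the specified encoder is used instead of the default one.

To list the available encoders, please use get_audio_encoders() for audio, and get_video_encoders() for video.

Default: None.
encoder_option (dict or None, optional) –
Options passed to the encoder. Both keys and values are strings.

To list the options of a specific encoder, you can use the command ffmpeg -h encoder=<ENCODER>.

Default: None.

In addition to encoder-specific options, you can also pass options related to multithreading. They are effective only if the encoder supports them. If neither of them is provided, StreamingMediaEncoder defaults to a single thread.

"threads": The number of threads (as a string). Providing the value "0" lets FFmpeg decide based on its heuristics.

"thread_type": Which multithreading method to use. The valid values are "frame" or "slice". Note that each encoder supports a different set of methods. If not provided, a default value is used.
- "frame": Encode more than one frame at once. Each thread handles one frame. This increases the encoding delay by one frame per thread.
- "slice": Encode more than one part of a single frame at once.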
encoder_sample_rate (int or None, optional) –
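Override the sample rate used for encoding. Some encoders pose restrictions on the sample rate used for encoding. If the source sample rate is supported by the encoder, the source sample rate is used, otherwise a default one is picked.

For example, the "opus" encoder only supports 48 kHz, so a waveform encoded with the "opus" encoder is always encoded at 48 kHz. Meanwhile "mp3" ("libmp3lame") supports 44.1k, 48k, 32k, 22.05k, 24k, 16k, 11.025k, 12k and 8k Hz. If the original sample rate is one of these, then the original sample rate is used, otherwise the audio is resampled to the default one (44.1k). When encoding into WAV format, there is no restriction on the sample rate, so the original sample rate is used.

Providing encoder_sample_rate overrides this behavior and makes the encoder attempt to use the provided sample rate. The provided value must be one supported by the encoder.
encoder_num_channels (int or None, optional) –
Override the number of channels used for encoding.

Similar to the sample rate, some encoders (such as "opus", "vorbis" and "g722") pose restrictions on the number of channels that can be used for encoding.

If the original number of channels is supported by the encoder, then it is used, otherwise the encoder attempts to remix the channels to one of the supported numbers of channels.

Providing encoder_num_channels overrides this behavior and makes the encoder attempt to use the provided number of channels. The provided value must be one supported by the encoder.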
encoder_format (str or None, optional) –
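The format used to encode the media. When the encoder supports multiple formats, passing this argument overrides the format used for encoding.

To list the formats supported by an encoder, you can use the command ffmpeg -h encoder=<ENCODER>.

Default: None.

Note

When the encoder_format option is not provided, the encoder uses its default format.

For example, when encoding audio into wav format, 16-bit signed integer is used, and when encoding video into mp4 format (h264 encoder), one of the YUV formats is used.

This is because 32-bit or 16-bit floating point is typically used in audio models, but it is not commonly used in audio formats. Similarly, RGB24 is commonly used in vision models, but video formats usually (and better) support YUV formats.
codec_config (CodecConfig or None, optional) –
Codec configuration. Please refer to CodecConfig for the configuration options.

Default: None.
filter_desc (str or None, optional) – Additional processing to apply before encoding the input media.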
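
The sample-rate selection rule described for encoder_sample_rate can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: the function resolve_sample_rate and the constants MP3_RATES and OPUS_RATES are hypothetical names, and the rate lists are copied from the text above.

```python
from typing import Optional, Sequence

# Rates quoted in the encoder_sample_rate description above.
MP3_RATES = (44100, 48000, 32000, 22050, 24000, 16000, 11025, 12000, 8000)
OPUS_RATES = (48000,)


def resolve_sample_rate(
    source_rate: int,
    supported_rates: Optional[Sequence[int]],
    default_rate: int,
    encoder_sample_rate: Optional[int] = None,
) -> int:
    """Illustrate which sample rate ends up being used for encoding.

    `supported_rates=None` stands for an encoder without restrictions (e.g. WAV).
    """
    if encoder_sample_rate is not None:
        # An explicit override must itself be supported by the encoder.
        if supported_rates is not None and encoder_sample_rate not in supported_rates:
            raise ValueError(f"{encoder_sample_rate} Hz is not supported")
        return encoder_sample_rate
    if supported_rates is None or source_rate in supported_rates:
        # The source rate is usable as-is.
        return source_rate
    # Otherwise fall back to the encoder's default rate.
    return default_rate
```

With these assumptions, a 16 kHz source stays at 16 kHz for MP3, a 96 kHz source falls back to 44.1 kHz for MP3, and any source ends up at 48 kHz for Opus.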

add_video_stream¶

StreamingMediaEncoder.add_video_stream(frame_rate: float, width: int, height: int, format: str = 'rgb24', *, encoder: Optional[str] = None, encoder_option: Optional[Dict[str, str]] = None, encoder_frame_rate: Optional[float] = None, encoder_width: Optional[int] = None, encoder_height: Optional[int] = None, encoder_format: Optional[str] = None, codec_config: Optional[CodecConfig] = None, filter_desc: Optional[str] = None, hw_accel: Optional[str] = None)[source]¶

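Add an output video stream.

This method must be called before open is called.

Parameters:

frame_rate (float) – Frame rate of the video.
width (int) – Width of the video frame.
height (int) – Height of the video frame.
format (str, optional) –
Input pixel format, which determines the color channel order of the input tensor.
- "gray8": One channel, grayscale.
- "rgb24": Three channels in the order of RGB.
- "bgr24": Three channels in the order of BGR.
- "yuv444p": Three channels in the order of YUV.
Default: "rgb24".

In any case, the input tensor must be of torch.uint8 type and its shape must be (frame, channel, height, width).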
encoder (str or None, optional) –
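The name of the encoder to be used. When provided, the specified encoder is used instead of the default one.

To list the available encoders, please use get_audio_encoders() for audio, and get_video_encoders() for video.

Default: None.
encoder_option (dict or None, optional) –
Options passed to the encoder. Both keys and values are strings.

To list the options of a specific encoder, you can use the command ffmpeg -h encoder=<ENCODER>.

Default: None.

In addition to encoder-specific options, you can also pass options related to multithreading. They are effective only if the encoder supports them. If neither of them is provided, StreamingMediaEncoder defaults to a single thread.

"threads": The number of threads (as a string). Providing the value "0" lets FFmpeg decide based on its heuristics.

"thread_type": Which multithreading method to use. The valid values are "frame" or "slice". Note that each encoder supports a different set of methods. If not provided, a default value is used.
- "frame": Encode more than one frame at once. Each thread handles one frame. This increases the encoding delay by one frame per thread.
- "slice": Encode more than one part of a single frame at once.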
encoder_frame_rate (float or None, optional) –
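Override the frame rate used for encoding.

Some encoders (such as "mpeg1" and "mpeg2") pose restrictions on the frame rate that can be used for encoding. In such cases, if the source frame rate (provided as frame_rate) is not one of the supported frame rates, then a default one is picked, and the frame rate is changed on the fly. Otherwise the source frame rate is used.

Providing encoder_frame_rate overrides this behavior and makes the encoder attempt to use the provided frame rate. The provided value must be one supported by the encoder.
encoder_width (int or None, optional) – Width of the image used for encoding. This allows the image size to be changed during encoding.
encoder_height (int or None, optional) – Height of the image used for encoding. This allows the image size to be changed during encoding.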
encoder_format (str or None, optional) –
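The format used to encode the media. When the encoder supports multiple formats, passing this argument overrides the format used for encoding.

To list the formats supported by an encoder, you can use the command ffmpeg -h encoder=<ENCODER>.

Default: None.

Note

When the encoder_format option is not provided, the encoder uses its default format.

For example, when encoding audio into wav format, 16-bit signed integer is used, and when encoding video into mp4 format (h264 encoder), one of the YUV formats is used.

This is because 32-bit or 16-bit floating point is typically used in audio models, but it is not commonly used in audio formats. Similarly, RGB24 is commonly used in vision models, but video formats usually (and better) support YUV formats.
codec_config (CodecConfig or None, optional) –
Codec configuration. Please refer to CodecConfig for the configuration options.

Default: None.
filter_desc (str or None, optional) – Additional processing to apply before encoding the input media.
hw_accel (str or None, optional) –
Enable hardware acceleration.

For example, when the video is encoded on CUDA hardware (e.g. encoder="h264_nvenc"), passing a CUDA device indicator to hw_accel (i.e. hw_accel="cuda:0") makes StreamingMediaEncoder expect the video chunk to be a CUDA Tensor. Passing a CPU Tensor results in an error.

If None, the video chunk Tensor must be a CPU Tensor. Default: None.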
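
The shape requirement on video chunks can be checked before writing. The helper below is a hypothetical pre-flight check, not part of torio: the name check_video_chunk_shape is an assumption, and the channel counts are taken from the format list above.

```python
from typing import Tuple

# Channel counts per input pixel format, as listed for the `format` parameter.
CHANNELS = {"gray8": 1, "rgb24": 3, "bgr24": 3, "yuv444p": 3}


def check_video_chunk_shape(
    shape: Tuple[int, ...], fmt: str, width: int, height: int
) -> int:
    """Return the number of frames if `shape` is (frame, channel, height, width)."""
    if fmt not in CHANNELS:
        raise ValueError(f"unknown pixel format: {fmt!r}")
    if len(shape) != 4:
        raise ValueError(f"expected a 4D shape, got {len(shape)} dimensions")
    frames, channels, chunk_height, chunk_width = shape
    if channels != CHANNELS[fmt]:
        raise ValueError(f"{fmt!r} expects {CHANNELS[fmt]} channel(s), got {channels}")
    if (chunk_height, chunk_width) != (height, width):
        raise ValueError("frame size does not match the configured width and height")
    return frames
```

For a torch tensor named chunk, the check could be invoked as check_video_chunk_shape(tuple(chunk.shape), "rgb24", width, height). The dtype requirement (torch.uint8) is not covered by this sketch.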

close¶

StreamingMediaEncoder.close()[source]¶

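Close the output.

StreamingMediaEncoder is also a context manager and therefore supports the with statement. Using the context manager is recommended, as the file is closed automatically when exiting the with clause.

See StreamingMediaEncoder.open() for more details.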

flush¶

StreamingMediaEncoder.flush()[source]¶: Flush the frames held in the encoders and write them to the destination.

open¶

StreamingMediaEncoder.open(option: Optional[Dict[str, str]] = None) → StreamingMediaEncoder[source]¶

Open the output file / device and write the header.

StreamingMediaEncoder is also a context manager and therefore supports the with statement. This method returns the instance on which it is called (i.e. self), so that it can be used in a with statement. Using the context manager is recommended, as the file is closed automatically when exiting the with clause.

Parameters: option (dict or None, optional) – Private options for protocol, device and muxer. See the examples.

Example - Protocol option

>>> s = StreamingMediaEncoder(dst="rtmp://:1234/live/app", format="flv")
>>> s.add_video_stream(...)
>>> # Passing protocol option `listen=1` makes StreamingMediaEncoder act as RTMP server.
>>> with s.open(option={"listen": "1"}) as f:
...     f.write_video_chunk(...)

Example - Device option

>>> s = StreamingMediaEncoder("-", format="sdl")
>>> s.add_video_stream(..., encoder_format="rgb24")
>>> # Open SDL video player with fullscreen
>>> with s.open(option={"window_fullscreen": "1"}) as f:
...     f.write_video_chunk(...)

Example - Muxer option

>>> s = StreamingMediaEncoder("foo.flac")
>>> s.add_audio_stream(...)
>>> s.set_metadata({"artist": "torio contributors"})
>>> # FLAC muxer has a private option to not write the header.
>>> # The resulting file does not contain the above metadata.
>>> with s.open(option={"write_header": "false"}) as f:
...     f.write_audio_chunk(...)

set_metadata¶

StreamingMediaEncoder.set_metadata(metadata: Dict[str, str])[source]¶

Set file-level metadata.

Parameters: metadata (dict or None, optional) – File-level metadata.

write_audio_chunk¶

StreamingMediaEncoder.write_audio_chunk(i: int, chunk: Tensor, pts: Optional[float] = None)[source]¶

Write audio data

Parameters:

i (int) – Stream index.
chunk (Tensor) – Waveform tensor. Shape: (frame, channel). The dtype must match the format passed to the add_audio_stream() method.
pts (float or None, optional) –
If provided, overwrite the presentation timestamp.

Note

The provided value is converted to an integer expressed in units of 1 / sample_rate. It is therefore truncated to a value of the form n / sample_rate.
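
The truncation in the note can be worked through numerically. The sketch below assumes the conversion behaves like int(pts * sample_rate) for non-negative timestamps, and the helper name quantize_pts is hypothetical. The library's exact rounding, especially in floating-point edge cases, may differ.

```python
from typing import Tuple


def quantize_pts(pts: float, sample_rate: int) -> Tuple[int, float]:
    """Return (n, n / sample_rate), where n is pts expressed in samples."""
    # Assumption: truncation toward zero, which matches floor for pts >= 0.
    n = int(pts * sample_rate)
    return n, n / sample_rate


# 1.00004 s at 16 kHz is 16000.64 samples, which truncates to sample 16000.
n, effective_pts = quantize_pts(1.00004, 16000)
```

Under this assumption, a requested timestamp of 1.00004 seconds at 16 kHz takes effect as exactly 1.0 second. The same arithmetic applies to video with the frame rate in place of the sample rate.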

write_video_chunk¶

StreamingMediaEncoder.write_video_chunk(i: int, chunk: Tensor, pts: Optional[float] = None)[source]¶

Write video/image data

Parameters:

i (int) – Stream index.
chunk (Tensor) – Video/image tensor. Shape: (time, channel, height, width). The dtype must be torch.uint8. The shape (height, width and the number of channels) must match what was configured when calling add_video_stream().
pts (float or None, optional) –
If provided, overwrite the presentation timestamp.

Note

The provided value is converted to an integer expressed in units of 1 / frame_rate. It is therefore truncated to a value of the form n / frame_rate.

Support Structures¶

CodecConfig¶

class torio.io.CodecConfig(bit_rate: int = -1, compression_level: int = -1, qscale: Optional[int] = None, gop_size: int = -1, max_b_frames: int = -1)[source]¶

Codec configuration.

bit_rate: int = -1¶: Bit rate

compression_level: int = -1¶: Compression level

qscale: Optional[int] = None¶

Global quality factor. Enables variable bit rate. Valid values depend on encoder.

For example, MP3 takes 0 to 9 (https://trac.ffmpeg.org/wiki/Encode/MP3), while libvorbis takes -1 to 10.

gop_size: int = -1¶: The number of pictures in a group of pictures, or 0 for intra_only

max_b_frames: int = -1¶: Maximum number of B-frames between non-B-frames.
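
A possible way to guard the qscale value before building a CodecConfig is sketched below. The helper names check_qscale and make_vbr_config, and the QSCALE_RANGES table, are hypothetical. The ranges are the two quoted above, and applying the MP3 range to the "libmp3lame" encoder name is an assumption. make_vbr_config imports torio lazily and therefore requires torio to be installed.

```python
# qscale ranges quoted in the text above; other encoders are not covered here.
QSCALE_RANGES = {"libmp3lame": (0, 9), "libvorbis": (-1, 10)}


def check_qscale(encoder: str, qscale: int) -> int:
    """Return qscale if it lies within the range quoted for `encoder`."""
    low, high = QSCALE_RANGES[encoder]
    if not low <= qscale <= high:
        raise ValueError(f"qscale for {encoder} must be within [{low}, {high}]")
    return qscale


def make_vbr_config(encoder: str, qscale: int):
    """Build a CodecConfig with a validated qscale (requires torio)."""
    from torio.io import CodecConfig

    # The remaining fields keep their defaults (-1).
    return CodecConfig(qscale=check_qscale(encoder, qscale))
```

The resulting object would be passed as codec_config to add_audio_stream() together with the matching encoder name.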
