Separate Anything: Trying AudioSep for Audio Source Separation [Python]

2023.11.10

Introduction
Prerequisites
Installing the required libraries
Preparing AudioSep
Running source separation
Closing thoughts

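Introduction

Last time, I covered invokeAI, an image-generation AI that runs with 4 GB of VRAM.

This time, I'll try AudioSep, which can separate virtually any sound source from a mixture based on a text description.

The GitHub repository is here: https://github.com/Audio-AGI/AudioSep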

Prerequisites

The prerequisites are as follows.

Python 3.10.6
CUDA 11.7 (matching the +cu117 PyTorch builds in requirements.txt below)
Windows 11
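To confirm that a machine roughly matches the list above before installing anything, a small check like the following can help. It is a sketch of my own, not part of AudioSep: python_matches and describe_environment are hypothetical names, and torch is only inspected if it is already installed, so the script also runs on a fresh environment.

```python
import platform
import sys
from importlib.util import find_spec


def python_matches(required=(3, 10), version_info=None):
    """Return True if the interpreter's major.minor version equals `required`."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) == tuple(required)


def describe_environment():
    """Collect the prerequisite items; works even before torch is installed."""
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "torch": None,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    if find_spec("torch") is not None:
        import torch  # imported lazily so this check runs before installation too

        env["torch"] = torch.__version__
        env["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        env["cuda_available"] = torch.cuda.is_available()
    return env


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    if not python_matches((3, 10)):
        print("note: this article used Python 3.10.6")
```

After installing the +cu117 wheels, torch_cuda_build should read 11.7 and cuda_available should be True if the GPU driver is set up correctly.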

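Installing the required libraries

I prepared the following requirements.txt. The torch, torchaudio, and torchvision pins carry the +cu117 suffix, so they are served from PyTorch's own package index rather than PyPI; install with pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117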

aiohttp==3.8.6
aiosignal==1.3.1
ansicon==1.89.0
anyio==4.0.0
arrow==1.3.0
async-timeout==4.0.3
attrs==23.1.0
audioread==3.0.1
beautifulsoup4==4.12.2
blessed==1.20.0
boto3==1.28.72
botocore==1.31.72
braceexpand==0.1.7
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.1
click==8.1.7
colorama==0.4.6
croniter==1.3.15
dateutils==0.6.12
decorator==5.1.1
deepdiff==6.6.1
exceptiongroup==1.1.3
fastapi==0.88.0
filelock==3.12.4
frozenlist==1.4.0
fsspec==2023.10.0
ftfy==6.1.1
h11==0.14.0
h5py==3.8.0
huggingface-hub==0.18.0
idna==3.4
inquirer==3.1.3
itsdangerous==2.1.2
Jinja2==3.1.2
jinxed==1.2.0
jmespath==1.0.1
joblib==1.3.2
lazy_loader==0.3
librosa==0.10.1
lightning==2.0.0
lightning-cloud==0.5.44
lightning-utilities==0.9.0
llvmlite==0.41.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
msgpack==1.0.7
multidict==6.0.4
numba==0.58.1
numpy==1.26.1
ordered-set==4.1.0
packaging==23.2
pandas==1.5.3
Pillow==10.1.0
platformdirs==3.11.0
pooch==1.8.0
psutil==5.9.6
pycparser==2.21
pydantic==1.10.13
Pygments==2.16.1
PyJWT==2.8.0
python-dateutil==2.8.2
python-editor==1.0.4
python-multipart==0.0.6
pytorch-lightning==2.0.1
pytz==2023.3.post1
PyYAML==6.0.1
readchar==4.0.5
regex==2023.10.3
requests==2.31.0
rich==13.6.0
s3transfer==0.7.0
scikit-learn==1.3.2
scipy==1.11.3
six==1.16.0
sniffio==1.3.0
soundfile==0.12.1
soupsieve==2.5
soxr==0.3.7
starlette==0.22.0
starsessions==1.3.0
threadpoolctl==3.2.0
tokenizers==0.13.3
torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchlibrosa==0.1.0
torchmetrics==1.2.0
torchvision==0.14.1+cu117
tqdm==4.66.1
traitlets==5.12.0
transformers==4.28.1
types-python-dateutil==2.8.19.14
typing_extensions==4.8.0
urllib3==2.0.7
uvicorn==0.23.2
wcwidth==0.2.8
webdataset==0.2.48
websocket-client==1.6.4
websockets==11.0.3
wget==3.2
yarl==1.9.2
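Because every package above is pinned with ==, it is easy to check afterwards whether the installed versions really match. The following is a hypothetical helper (parse_pins, installed_version, and find_mismatches are my own names) built on the standard library's importlib.metadata. It compares version strings exactly, without PEP 440 normalization, so treat its output as a hint rather than a verdict.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Parse 'name==version' lines from requirements text into a dict."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and entries without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, get_version=installed_version):
    """Return {name: (pinned, installed)} for every package whose version differs."""
    mismatches = {}
    for name, pinned in pins.items():
        installed = get_version(name)
        if installed != pinned:  # exact string comparison, no PEP 440 normalization
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.is_file():
        problems = find_mismatches(parse_pins(req.read_text(encoding="utf-8")))
        for name, (pinned, installed) in problems.items():
            print(f"{name}: pinned {pinned}, installed {installed}")
        print(f"{len(problems)} package(s) differ from requirements.txt")
    else:
        print("requirements.txt not found in the current directory")
```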

Preparing AudioSep

Next, prepare the repository files and the model checkpoint.

git clone https://github.com/Audio-AGI/AudioSep.git
cd AudioSep
mkdir checkpoint

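Save this model into the checkpoint folder you just created. The script in the next section loads it as checkpoint/audiosep_base_4M_steps.ckpt.

That completes the setup.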
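Before running the separation, it can be worth confirming that the config and checkpoint sit where the script expects them. This is a minimal sketch, assuming it is run from the AudioSep repository root; missing_files and REQUIRED_FILES are hypothetical names, and the two paths are simply the ones used by do_separate.py in this article.

```python
from pathlib import Path

# Relative paths used by do_separate.py in this article (run from the AudioSep root).
REQUIRED_FILES = (
    "config/audiosep_base.yaml",
    "checkpoint/audiosep_base_4M_steps.ckpt",
)


def missing_files(repo_root=".", required=REQUIRED_FILES):
    """Return the entries of `required` that are not files under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("config and checkpoint found; ready to run")
```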

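Running source separation

Create a do_separate.py file in the AudioSep folder. It reads its input from source/test.mp3, so put the audio you want to separate there (or change audio_file).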

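# pipeline.py comes from the cloned AudioSep repository, so run this from its root
from pipeline import build_audiosep, inference
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Build the model from the base config and the checkpoint saved earlier
model = build_audiosep(
      config_yaml='config/audiosep_base.yaml', 
      checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt', 
      device=device)

audio_file = 'source/test.mp3'     # input mixture to separate
text = "human voice"               # text description of the sound to extract
output_file = 'separated_audio.wav'  # where the separated audio is written

# AudioSep processes the audio at 32 kHz sampling rate  
inference(model, audio_file, text, output_file, device)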
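Building the model is the expensive step, so if you want to pull several kinds of sound out of the same file, it is reasonable to build once and call inference repeatedly. Below is a sketch under that assumption; separate_many and output_name are hypothetical helpers, and the inference function is passed in as infer (assumed to take the same arguments as the call above) so the loop can be exercised without a GPU.

```python
import re
from pathlib import Path


def output_name(text, out_dir="separated"):
    """Turn a text query such as 'human voice' into 'separated/human_voice.wav'."""
    slug = re.sub(r"[^0-9A-Za-z]+", "_", text).strip("_").lower() or "query"
    return str(Path(out_dir) / f"{slug}.wav")


def separate_many(infer, model, audio_file, queries, device, out_dir="separated"):
    """Run one separation per text query, reusing the already-built model.

    `infer` is assumed to accept (model, audio_file, text, output_file, device),
    like the inference call in do_separate.py.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    outputs = {}
    for text in queries:
        out_path = output_name(text, out_dir)
        infer(model, audio_file, text, out_path, device)
        outputs[text] = out_path
    return outputs
```

In do_separate.py it would replace the final call, for example separate_many(inference, model, audio_file, ["human voice", "piano"], device). Queries that differ only in case or punctuation map to the same file name, so later ones overwrite earlier ones.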

In the program above,