Music and Audio Generation

In this guide, we will use LSTM (Long Short-Term Memory) networks in PyTorch to build a simple music-generation model. We'll load a set of MIDI files, preprocess them, train the model, and then generate new music.

Step 1: Setup and Import Necessary Libraries

First, we install and import the required packages.

# Install necessary libraries
!pip install torch music21 numpy

# Import libraries
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from music21 import converter, instrument, note, chord, stream
import glob
import os

We install PyTorch, music21, and numpy with pip.

We import PyTorch along with its neural-network (torch.nn) and optimizer (torch.optim) modules.

We use numpy for numerical array operations.

We import music21 to parse MIDI files and work with music-theory objects such as notes, chords, and streams.
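
As a quick, optional sanity check that everything imported correctly, you can print the PyTorch version, check for a GPU, and build a note by name with music21:

# Optional sanity check for the setup
print(torch.__version__)             # PyTorch version
print(torch.cuda.is_available())     # True if a GPU is visible
print(note.Note('C4').pitch)         # music21 builds a Note from its pitch name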

Step 2: Preprocess the Data

We load the MIDI files, extract the notes and chords, and prepare the data for training.

# Function to get notes and chords from MIDI files
def get_notes(midi_files):
    notes = []
    for file in midi_files:
        midi = converter.parse(file)
        notes_to_parse = None
        parts = instrument.partitionByInstrument(midi)
        if parts:  # File has instrument parts
            notes_to_parse = parts.parts[0].recurse()
        else:  # File has flat notes
            notes_to_parse = midi.flat.notes  # use midi.flatten().notes on newer music21 releases
        
        for element in notes_to_parse:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append('.'.join(str(n) for n in element.normalOrder))
    return notes

# Load the dataset
midi_files = glob.glob('path_to_midi_files/*.mid')
notes = get_notes(midi_files)

# Save unique notes
unique_notes = sorted(set(notes))

# Create a mapping from notes to integers
note_to_int = {note: num for num, note in enumerate(unique_notes)}

# Prepare the sequences used by the Neural Network
sequence_length = 100
network_input = []
network_output = []

for i in range(len(notes) - sequence_length):
    seq_in = notes[i:i + sequence_length]
    seq_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in seq_in])
    network_output.append(note_to_int[seq_out])

n_patterns = len(network_input)

# Reshape and normalize the input
network_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
network_input = network_input / float(len(unique_notes))
network_output = np.array(network_output)

We define the get_notes function to extract notes and chords from the MIDI files.

We point glob at the directory holding the MIDI files to load the dataset.

Single notes are stored by pitch name, while chords are stored as dot-separated pitch classes taken from normalOrder.

We build a mapping from each unique note or chord string to an integer.

We then build the training pairs: each input is a sequence of 100 consecutive notes, and the target is the note that follows it.

Finally, the input is reshaped to (n_patterns, sequence_length, 1) and normalized to the [0, 1) range by dividing by the vocabulary size.
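
To make the windowing concrete, here is a toy example with sequence_length = 3 instead of 100 (the note names are made up for illustration):

# Toy illustration of the sliding-window encoding
toy_notes = ['C4', 'E4', 'G4', 'C4', 'E4']
toy_map = {n: i for i, n in enumerate(sorted(set(toy_notes)))}  # {'C4': 0, 'E4': 1, 'G4': 2}
# Window 1: input ['C4', 'E4', 'G4'] -> [0, 1, 2], target 'C4' -> 0
# Window 2: input ['E4', 'G4', 'C4'] -> [1, 2, 0], target 'E4' -> 1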

Step 3: Define the LSTM Model

We define an LSTM model for generating music.

# Define the LSTM Model
class MusicLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MusicLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h_0 = torch.zeros(2, x.size(0), self.hidden_size).to(device)
        c_0 = torch.zeros(2, x.size(0), self.hidden_size).to(device)
        out, _ = self.lstm(x, (h_0, c_0))
        out = self.fc(out[:, -1, :])
        return out

# Initialize the model
input_size = 1
hidden_size = 256
output_size = len(unique_notes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MusicLSTM(input_size, hidden_size, output_size).to(device)

The MusicLSTM class is configured with input, hidden, and output sizes.

The network stacks a two-layer LSTM followed by a fully connected layer that maps the final hidden state to one logit per note class.

We initialize the model with an input size of 1, a hidden size of 256, and an output size equal to the vocabulary size.

The model is then moved to the device (GPU if available).
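
A quick way to verify the wiring is a forward pass with a dummy batch; the output should have one logit per note class:

# Optional shape check: batch of 4 dummy sequences
dummy = torch.zeros(4, sequence_length, input_size).to(device)
print(model(dummy).shape)  # expected: torch.Size([4, output_size])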

Step 4: Train the Model

We define the loss function and optimizer, then train the model.

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Prepare data for training
train_x = torch.tensor(network_input, dtype=torch.float32).to(device)
train_y = torch.tensor(network_output, dtype=torch.long).to(device)

# Train the model
num_epochs = 200
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    output = model(train_x)
    loss = criterion(output, train_y)
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

We use cross-entropy loss, which fits this multi-class next-note prediction task.

We choose the Adam optimizer with a learning rate of 0.001.

To prepare the training data, we convert it to PyTorch tensors and move it to the device.

We train for 200 epochs, printing the loss every 10 epochs.
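
Note that the loop above feeds the entire dataset as a single batch, which is simple but can exhaust GPU memory on larger corpora. A minimal mini-batch variant is sketched below; the batch size of 64 is an arbitrary choice you may want to tune:

from torch.utils.data import TensorDataset, DataLoader

# Mini-batch training: same objective, bounded memory use per step
loader = DataLoader(TensorDataset(train_x, train_y), batch_size=64, shuffle=True)
for epoch in range(num_epochs):
    model.train()
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()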

Step 5: Generate New Music

We use the trained model to generate new music.

# Function to generate new music
def generate_music(model, start_sequence, int_to_note, num_notes=500):
    model.eval()
    generated = []
    sequence = np.array(start_sequence)  # shape (sequence_length, 1), normalized

    with torch.no_grad():
        for _ in range(num_notes):
            sequence_input = torch.tensor(sequence, dtype=torch.float32).unsqueeze(0).to(device)
            prediction = model(sequence_input)
            _, top_note = torch.max(prediction, 1)
            generated_note = top_note.item()

            # Slide the window: drop the oldest step and append the new note,
            # normalized the same way as the training input
            next_step = np.array([[generated_note / float(len(int_to_note))]])
            sequence = np.concatenate((sequence[1:], next_step), axis=0)
            generated.append(int_to_note[generated_note])

    return generated

# Map integers back to notes
int_to_note = {num: note for num, note in enumerate(unique_notes)}

# Generate new music
start_sequence = network_input[0]
generated_notes = generate_music(model, start_sequence, int_to_note)

# Convert generated notes to MIDI
output_notes = []
for pattern in generated_notes:
    if ('.' in pattern) or pattern.isdigit():  # pattern encodes a chord
        chord_pitches = pattern.split('.')
        chord_notes = [note.Note(int(n)) for n in chord_pitches]
        new_chord = chord.Chord(chord_notes)
        output_notes.append(new_chord)
    else:  # pattern is a single note name
        new_note = note.Note(pattern)
        output_notes.append(new_note)

output_midi = stream.Stream(output_notes)
output_midi.write('midi', fp='generated_music.mid')

The generate_music function runs the trained model autoregressively: it predicts one note at a time and feeds each prediction back into the input window.

We build the reverse mapping from integers back to note strings.

We seed generation with the first training sequence.

The generated notes are converted back to music21 objects and written out as a MIDI file.
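
One caveat: picking the argmax at every step, as generate_music does, is greedy and tends to get stuck repeating a few notes. A common variation, sketched below as a hypothetical helper, samples the next note from the softmax distribution with a temperature parameter (1.0 is neutral; higher values are more random):

# Hypothetical helper: sample the next note instead of taking the argmax
def sample_note(prediction, temperature=1.0):
    probs = torch.softmax(prediction.squeeze(0) / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

Inside the generation loop, generated_note = sample_note(prediction, temperature=0.8) would then replace the torch.max line.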
