Waveform Studio Workbench


Table of Contents

nGene Media Player

Development Consultation

  1. Media Format and Codec Overview
  2. Meta Information Extraction (Audio and Video)
  3. Design and UX Improvements for Desktop

Script

Editing Videos

Trimming Videos with FFmpeg or Graphical Editors on Windows and macOS, and Resolving Homebrew PATH Issues

➥ ★ Trimming Videos on macOS with FFmpeg


Meta Information

Python Script for BPM & Tempo Extraction from Multiple M4A Files (Written May 18, 2025)

Python Script for BPM & Tempo Extraction from Multiple Media Files (Written June 21, 2025)


Mathematical Modes

Summing Audio Tracks in Logic Pro (Written May 31, 2025)

Digital waveform amplitude & bidirectional dynamics (Written May 31, 2025)

Perceptual loudness normalization for multitrack mixing (Written June 7, 2025)

Bit depth and sample rate in digital audio (Written June 7, 2025)

Logarithmic perception of pitch and loudness in human hearing (Written June 7, 2025)

The mathematical foundations of musical harmony (Written June 8, 2025)



Guide to nGene Media Player v 2.4

Purpose: Self-contained, resizable HTML5 player for audio (MP3/M4A) and video (MP4/MOV/WEBM). Pure vanilla JS—no frameworks required.
New since v 1.8: tempo-aware track-list showing BPM (integer-rounded), auto-loading from tempo_meta.json; initial volume defaults to 17 % at page-load.
File locations: Place nmp.html anywhere.
Media files live in a sibling /media/ folder.
Ensure readable permissions with chmod 644 *.
Playlist: Optional /media/playlist.json—an array of paths (order preserved). If absent, the player simply waits for user uploads.
Tempo metadata: Run extract_meta_from_media.py v 2.4 to generate tempo_meta.json (single integer-rounded bpm). Player displays it beside each track and in the title-bar as “### BPM”.
Uploads: Upload button and drag-&-drop. Files become blob-URLs, so nothing is written to disk.
First-30-second attention cue: Uploader border, hint-text and container gently pulse, glow and scale every 2 s for the first 30 s after page-load.
A-B Looping: Seek-bar sports two cerulean “brackets”:
A handle “[” — left edge marks loop-start.
B handle “]” — right edge marks loop-end.
Drag to set; ultramarine bar fills the loop range. ✖ Clear button instantly resets the loop.
Click-to-toggle video: Click anywhere on the visible video to play/pause; the ⏸︎/▶︎ button stays synchronised.
Autoplay: The first track auto-starts; subsequent behaviour follows Repeat Mode.
Repeat Mode: Begins at 🔂 One (loop current). Button cycles: 🔂 One → 🔁 All → 🔁 Off.
Controls: ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • Repeat — plus ✖ Loop-Clear beside the seek-bar.
Seek & Time: Sleek seek-bar with live “elapsed / total” timer, integrated A-B loop handles and ultramarine fill.
Volume: Smooth 0–100 % slider with live percentage label; initial default 17 % (0.17).
Speed: 0.70× – 2.00× slider with − / + step buttons and 1× reset. Applies to audio & video.
Resizable wrapper: Outer .wrapper uses resize:both; default width governed by --w (360 px). Track-list is vertically resizable.
Accent colour: Edit --accent (default #1e90ff) to rebrand buttons, slider thumbs, active-track row and uploader pulse.
Source-code reveal: Built-in “Full Source Code” accordion shows the entire page, syntax-highlighted via Highlight.js.
Namespace: All logic wrapped in an IIFE; CSS uses local class names—safe to embed anywhere.

Guide to nGene Media Player v 1.8 (c)

Purpose: Self‑contained, resizable HTML5 player for audio (MP3/M4A) and video (MP4/MOV/WEBM). Pure vanilla JS—no frameworks.
New since v 1.6 (c): draggable cerulean‑blue “bracket” handles for precise A‑B looping, ultramarine loop‑fill, and click‑to‑toggle playback directly on the video surface.
File locations: Place nmp.html anywhere.
Media files live in a sibling /media/ folder.
Ensure readable permissions with chmod 644 *.
Playlist: Optional /media/playlist.json—an array of paths (order preserved). If absent, the player simply waits for user uploads.
Uploads: ➕ Upload button and drag‑&‑drop. Files become blob‑URLs, so nothing is written to disk.
First‑30‑second attention cue: Uploader border, hint‑text and container gently pulse, glow and scale every 2 s for the first 30 s after page‑load.
A‑B Looping (1.8 series): Seek‑bar sports two cerulean “brackets”:
A handle “[” — left edge marks loop‑start.
B handle “]” — right edge marks loop‑end.
Drag to set; ultramarine bar fills the loop range. ✖ Clear button instantly resets the loop.
Click‑to‑toggle video: Click anywhere on the visible video to play/pause; the ⏸︎/▶︎ button stays synchronised.
Autoplay: The first track auto‑starts; subsequent behaviour follows Repeat Mode.
Repeat Mode (default): Begins at 🔂 One (loop current). Button cycles: 🔂 One → 🔁 All → 🔁 Off.
Controls: ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • Repeat — plus ✖ Loop‑Clear beside the seek‑bar.
Seek & Time: Sleek seek‑bar with live “elapsed / total” timer. Integrates A‑B loop handles and ultramarine fill described above.
Volume: Smooth 0–100 % slider with live percentage label.
Resizable wrapper: Outer .wrapper uses resize:both; default width governed by --w (360 px). Track‑list is vertically resizable.
Accent colour: Edit --accent (default #1e90ff) to rebrand buttons, slider thumbs, active‑track row and uploader pulse.
Source‑code reveal: Built‑in “Full Source Code” accordion shows the entire page, syntax‑highlighted via Highlight.js.
Namespace: All logic wrapped in an IIFE; CSS uses local class names—safe to embed anywhere.

Media Format and Codec Overview

Modern media players should support a variety of audio and video file formats. Below is an overview of commonly used formats, including their typical use cases, compatibility considerations, licensing issues, technical notes, and recommendations for use. Emphasis is placed on desktop and HTML5/JavaScript environments.

Common Audio Formats

MP3 (MPEG Audio Layer III)

AAC / M4A (Advanced Audio Coding)

Ogg Vorbis (and Opus)

FLAC (Free Lossless Audio Codec)

WAV (Waveform Audio File Format / PCM)

Common Video Formats

MP4 (H.264 Video in MP4 Container)

WebM (VP8/VP9 Video in WebM Container)

AV1 (Next-Generation Open Video Codec)

MKV (Matroska Video Container)

AVI (Audio Video Interleave)

MOV (QuickTime File Format)

Recommended Default Formats: Considering the above, for broadest compatibility and ease of use in a web-based desktop player, the recommended default formats are MP3 for audio and MP4 (H.264/AAC) for video. These two cover nearly all browsers and platforms with no special setup. In practice, this means the player should primarily handle MP3 for music and MP4 for video. However, to make nGene Media Player more robust and appealing, it should also support the common alternatives: including AAC (M4A) ensures high-quality audio support, Ogg Vorbis/Opus provides open-format options, and FLAC allows for lossless audio playback. On the video side, adding support for WebM (VP8/VP9) is advisable for modern browsers, and being mindful of AV1 will keep the player up-to-date with emerging standards. Less common or legacy formats like MKV, AVI, and MOV can be acknowledged, but the strategy should be to handle them via conversion or not at all, rather than as primary supported formats. By focusing on MP3 and MP4 as the core, and supplementing with the next tier of formats, the player will cater to most use cases while maintaining reliability.
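
The tiering described above can be made concrete with a small helper that classifies files by extension; a sketch (the tier names and the support_tier helper are illustrative, not part of the player):

```python
import os

# Tiers follow the recommendation above: MP3/MP4 as the core, common
# alternatives as a second tier, and legacy containers flagged for conversion.
CORE = {".mp3", ".mp4"}
EXTENDED = {".m4a", ".aac", ".ogg", ".opus", ".flac", ".wav", ".webm"}
LEGACY = {".mkv", ".avi", ".mov"}

def support_tier(filename: str) -> str:
    """Return 'core', 'extended', 'convert', or 'unsupported' for a media file."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in CORE:
        return "core"
    if ext in EXTENDED:
        return "extended"
    if ext in LEGACY:
        return "convert"
    return "unsupported"
```

A player UI could accept 'core' and 'extended' files directly and prompt for conversion on 'convert'.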

Written on March 9, 2025


Meta Information Extraction (Audio and Video)

A media player like nGene Media Player not only plays audio and video but often also presents information about the media to the user. This includes basic details (duration, title) and possibly more advanced metadata (like album name, video resolution, etc.). Below, we outline what metadata can be obtained from media files and discuss methods to extract this information using web technologies (JavaScript in the browser) and Python (which could be used server-side or via PyScript in-browser). We also provide guidance on when to use client-side vs. server-side (or local) analysis based on the depth of metadata required.

Types of Media Metadata

Most of the above metadata can be accessed or computed with the right tools. The next sections describe how to retrieve these details using JavaScript in the browser and using Python, respectively.

Client-Side JavaScript Methods

In a purely browser-based environment (vanilla JavaScript), one can extract a subset of the above information. The HTML5 media elements and additional libraries are the primary means to do so:

Using the above methods, a web-based media player can gather a wealth of information without leaving the browser. For instance, on loading a file, the player could immediately display the duration via the duration property, show the title/artist by parsing tags with music-metadata, show the resolution via videoWidth/videoHeight, and perhaps generate a waveform preview using Web Audio – all done client-side. The main constraints are performance (very large files or very detailed analysis can be slow) and the necessity to include libraries or WASM modules (increasing app size). When extremely detailed info or heavy computation is needed, one might then consider Python or server-side tools, as described next.

Python and PyScript Approaches

Python has a rich ecosystem for media processing, and it can be used in two ways: on a backend server (or a local machine, outside the browser) to preprocess or analyze media, or via PyScript/WebAssembly to run Python code in the browser. Here we outline how Python libraries can extract metadata and do deeper analysis, and how that might fit into the architecture of the media player.

Architectural Considerations

When implementing metadata extraction in nGene Media Player, it’s important to choose the right tool for the job to provide a good user experience without unnecessary overhead. Here are some guidelines on when to use client-side JS vs. Python/back-end solutions:

In conclusion, the strategy for metadata should match the needs of the user base and the resources available. For a relatively small-scale or personal project, sticking to client-side solutions keeps things simple and respects user privacy. For a larger-scale application with many users and files, investing in backend services for richer metadata could greatly enhance the user experience. nGene Media Player can start by extracting what’s easy (duration, basic tags via JS) and progressively incorporate more advanced metadata features using Python tools as needed, ensuring that the architecture remains flexible for such upgrades.

Written on March 9, 2025


Design and UX Improvements for Desktop

With the functionality in place, attention turns to improving the user interface and experience of nGene Media Player. A desktop-focused web media player should leverage the larger screen and input options (mouse, keyboard) to provide an engaging and efficient experience. Below are suggestions for design and UX enhancements, organized into layout/visual improvements, interaction improvements, and the use of modern libraries to add polish. The tone of these suggestions is to enhance usability and aesthetics in a professional, subtle way without overwhelming the user.

Enhanced Layout and Visualizations

Improved User Interaction

Modern UI Libraries and Frameworks

By implementing these design and UX improvements, nGene Media Player will not only be functionally robust but also user-friendly and visually appealing. It will feel like a modern desktop application, with responsive controls, rich visuals like waveforms, and thoughtful details (like shortcuts and drag-drop) that desktop users appreciate. The use of web technologies and libraries means the player can achieve a high level of polish comparable to native apps, while remaining customizable and lightweight. As always, incremental enhancement is wise: features can be added step by step, gathering user feedback to refine the UX. Over time, these improvements can significantly elevate the user’s enjoyment and efficiency when using the media player, fulfilling the goal of a comprehensive and professional media playback experience.

Written on May 9, 2025


Script


Editing Videos


Trimming Videos with FFmpeg or Graphical Editors on Windows and macOS, and Resolving Homebrew PATH Issues

This guide provides a structured approach to trimming a specific section from a video using either FFmpeg or a graphical editor on Windows and macOS. It explains how to install and use FFmpeg, explores alternative editing methods, and offers troubleshooting steps to resolve PATH issues when installing FFmpeg via Homebrew on macOS. Every step and consideration is presented to ensure a smooth and professional workflow.

Trimming Videos with FFmpeg

  1. Installing FFmpeg

    FFmpeg is a free, open-source tool that supports a wide range of audio and video operations. It is available on both Windows and macOS.

    Platform Installation Steps
    Windows
    1. Download the latest release from ffmpeg.org.
    2. Extract or install the package.
    3. (Optional) Add the FFmpeg bin folder to the system’s PATH for easier command-line usage.
    macOS
    1. Use Homebrew if installed, by running:
      brew install ffmpeg
    2. Or manually download from ffmpeg.org.
    3. Ensure that the FFmpeg directory is included in the system’s PATH.

    Note: If Homebrew is used on Apple Silicon (M1/M2) Macs, binaries often reside in /opt/homebrew/bin. On Intel Macs, they often reside in /usr/local/bin.

  2. Basic Command to Trim a Video

    Once FFmpeg is installed, the following command trims a segment from abc.mp4—starting at 00:00:49 and ending at 00:04:41—and saves the trimmed content into abc_edited.mp4:

    ffmpeg -i abc.mp4 -ss 00:00:49 -to 00:04:41 -c copy abc_edited.mp4
    • -i abc.mp4: Specifies the input file.
    • -ss 00:00:49: Sets the start time at 49 seconds.
    • -to 00:04:41: Sets the end time at 4 minutes, 41 seconds.
    • -c copy: Performs a “stream copy,” preserving the original quality without re-encoding.
    • abc_edited.mp4: Specifies the output filename.
  3. Additional FFmpeg Usage Notes

    Feature          | Stream Copy                        | Re-encoding
    Quality          | Original (no quality loss)         | May degrade slightly depending on settings
    Speed            | Very fast (no compression needed)  | Slower (requires processing and compression)
    Editing          | Limited to cutting/trimming        | Flexible (supports format conversion, resizing, etc.)
    Command Example  | -c copy                            | -c:v libx264 -c:a aac (or other codecs)
    • Re-encoding:
      • Removing -c copy and specifying codecs (e.g., -c:v libx264 -c:a aac) will force FFmpeg to re-encode:
        ffmpeg -i abc.mp4 -ss 00:00:49 -to 00:04:41 -c:v libx264 -c:a aac abc_edited.mp4
      • Re-encoding may reduce quality and increase processing time, but it allows changing codecs or formats.
    • Keyframe Alignment:
      • Using -ss before -i can cause FFmpeg to seek to the nearest keyframe, which occasionally introduces slight timing differences. If necessary, experiment with placing -ss either before or after -i:
        ffmpeg -ss 00:00:49 -i abc.mp4 -to 00:04:41 -c copy abc_edited.mp4
      • In most scenarios, -ss after -i is sufficient with -c copy.
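
Because these flags compose mechanically, the trim invocation can also be assembled in Python for batch work. This sketch only builds the argument list (build_trim_cmd is an illustrative name; actually running it via subprocess assumes ffmpeg is on the PATH):

```python
def build_trim_cmd(src, start, end, dst, reencode=False, fast_seek=False):
    """Return an ffmpeg argv list that trims src between start and end.

    reencode=False uses stream copy (-c copy); fast_seek=True places -ss
    before -i, which seeks to the nearest keyframe as noted above.
    """
    cmd = ["ffmpeg"]
    if fast_seek:
        cmd += ["-ss", start, "-i", src, "-to", end]
    else:
        cmd += ["-i", src, "-ss", start, "-to", end]
    if reencode:
        cmd += ["-c:v", "libx264", "-c:a", "aac"]
    else:
        cmd += ["-c", "copy"]
    return cmd + [dst]
```

For example, subprocess.run(build_trim_cmd("abc.mp4", "00:00:49", "00:04:41", "abc_edited.mp4"), check=True) reproduces the first command in this section.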

Trimming Videos Using Graphical Editors

Although FFmpeg is command-line based, some may prefer graphical methods. These editors typically re-encode video, which can take longer and potentially reduce quality, but they offer an intuitive visual interface.

  1. Windows (Built-In Photos/Video Editor on Windows 10/11)

    1. Right-click abc.mp4 and select Open with → Photos (or Video Editor).
    2. Choose the Trim option (in Photos) or create a New video project (in Video Editor).
    3. Drag the slider handles to select content from 00:49 to 04:41.
    4. Save the trimmed section by choosing Save a copy (in Photos) or Finish video (in Video Editor).
    5. Export the result as abc_edited.mp4.
  2. macOS (Using iMovie)

    1. Launch iMovie and import abc.mp4 into a new or existing project timeline.
    2. Position the playhead at 00:49 and use Command + B (or the Split Clip command) to split the clip.
    3. Repeat at 00:04:41 to isolate the desired segment.
    4. Delete content before 00:49 and after 04:41.
    5. Go to File → Share → File to export the trimmed result as abc_edited.mp4.

Troubleshooting FFmpeg PATH Issues on macOS with Homebrew

Occasionally, macOS users who install FFmpeg via Homebrew experience “command not found” errors. This typically indicates that the shell cannot locate the installed FFmpeg binary, often due to PATH misconfiguration.

  1. Verifying FFmpeg Installation

    brew list ffmpeg

    or

    brew info ffmpeg

    These commands display details about the FFmpeg package. If no information appears, consider reinstalling:

    brew reinstall ffmpeg
  2. Locating the FFmpeg Binary

    Homebrew generally installs software in one of the following directories:

    • Apple Silicon (M1/M2): /opt/homebrew/bin
    • Intel Macs: /usr/local/bin

    To confirm the exact location of FFmpeg, run:

    find "$(brew --prefix)" -name ffmpeg -type f

    This command returns the full path to the installed ffmpeg binary (for example, /opt/homebrew/bin/ffmpeg).

  3. Checking the PATH Environment Variable

    To see if the correct installation directory is in the PATH, run:

    echo $PATH

    If /opt/homebrew/bin (Apple Silicon) or /usr/local/bin (Intel) is absent, the shell will not be able to locate FFmpeg.

  4. Updating the PATH

    Apple Silicon (M1/M2) Macs
    If /opt/homebrew/bin is missing, add the following line to the shell configuration file (e.g., ~/.zshrc), then reload:

    export PATH="/opt/homebrew/bin:$PATH"
    source ~/.zshrc

    Intel Macs
    If /usr/local/bin is missing (uncommon, but possible), add the following line to the shell configuration file (e.g., ~/.zshrc or ~/.bash_profile):

    export PATH="/usr/local/bin:$PATH"
    source ~/.zshrc   # or source ~/.bash_profile if using bash

    Running ffmpeg -version afterward verifies a successful configuration.

Written on February 12, 2025


Trimming Videos on macOS with FFmpeg

FFmpeg is a powerful, open‐source multimedia framework capable of handling a wide range of video and audio operations. On macOS, it provides an efficient way to trim, concatenate, and re‐encode video clips via command‐line instructions. This guide focuses on installing and configuring FFmpeg on macOS, trimming videos (both single and multiple segments), and verifying the tool’s installation path.

Installing and Configuring FFmpeg on macOS

  1. Homebrew Installation (Recommended)

    1. Install Homebrew if it is not already present. Instructions are available at https://brew.sh/.
    2. Install FFmpeg using Homebrew:
      brew install ffmpeg
    3. Default Homebrew Paths
      • Apple Silicon (M1/M2): /opt/homebrew/bin
      • Intel-based Macs: /usr/local/bin
  2. Manual Download (Alternative)

    1. Download a macOS build of FFmpeg from https://ffmpeg.org/download.html.
    2. Unzip or install the downloaded package according to the official instructions.
    3. Optionally, move the ffmpeg binary to a convenient directory, such as ~/ffmpeg.
  3. Adding FFmpeg to the PATH

    1. Edit the relevant shell configuration file (e.g., ~/.zshrc or ~/.bash_profile).
    2. Append one of the following lines, depending on system architecture:
      • Apple Silicon (M1/M2):
        echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
        source ~/.zshrc
      • Intel Macs:
        echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bash_profile
        source ~/.bash_profile
    3. Confirm the installation and path configuration:
      ffmpeg -version
      A successful output indicates FFmpeg is correctly installed and accessible.
  4. Locating the FFmpeg Installation

    In some cases, it may be necessary to confirm the exact path where FFmpeg has been installed (for example, when configuring external tools or diagnosing “command not found” errors). The following command uses Homebrew’s prefix to locate the ffmpeg binary:

    find "$(brew --prefix)" -name ffmpeg -type f
    • brew --prefix returns the base directory where Homebrew is installed. On Apple Silicon systems, this is commonly /opt/homebrew; on Intel-based Macs, /usr/local.
    • Command substitution, $(...), instructs the shell to execute brew --prefix and insert its output into the find command.
    • find "$(brew --prefix)" -name ffmpeg -type f searches all subdirectories under Homebrew’s prefix for any file named ffmpeg, restricting results to regular files (-type f).
    • Outcome: Provides the precise file path to the FFmpeg binary, enabling verification or troubleshooting of installation and PATH issues.

Trimming a Single Segment with FFmpeg

Trimming one continuous portion of a video is simple using -ss (start time), -to (end time), and -c copy (stream copy). Stream copy avoids re-encoding, preserving original quality and saving time.

ffmpeg -i abc.mp4 -ss 00:00:49 -to 00:04:41 -c copy abc_edited.mp4

Trimming Multiple Segments from One File

When multiple non-contiguous sections of a video need to be combined into a single output, there are two primary approaches:

  1. Approach A: Concat Demuxer (Two-Step Process)

    Step A: Extract Each Desired Segment

    Assume three segments are required from abc.mp4:

    • 2:03 to 3:12
    • 3:40 to 4:03
    • 5:02 to 5:55
    # Segment 1: 2:03–3:12
    ffmpeg -i abc.mp4 -ss 00:02:03 -to 00:03:12 -c copy part1.mp4
    
    # Segment 2: 3:40–4:03
    ffmpeg -i abc.mp4 -ss 00:03:40 -to 00:04:03 -c copy part2.mp4
    
    # Segment 3: 5:02–5:55
    ffmpeg -i abc.mp4 -ss 00:05:02 -to 00:05:55 -c copy part3.mp4
    Step B: Concatenate the Segments
    1. Create a text file (e.g., mylist.txt) with each extracted segment in order:
      file 'part1.mp4'
      file 'part2.mp4'
      file 'part3.mp4'
    2. Run FFmpeg with the concat demuxer:
      ffmpeg -f concat -safe 0 -i mylist.txt -c copy abc_edited.mp4
      • -f concat: Uses the concat demuxer.
      • -safe 0: Allows absolute or relative paths in mylist.txt.
      • -c copy: Maintains source quality by copying streams without re-encoding.

    Note: This two-step method is fast and lossless but requires creating multiple intermediate files.

  2. Approach B: Filter Complex (Single Command)

    For a one-step method or when advanced processing (like overlays, resizing, or format changes) is needed, FFmpeg’s filter_complex can be used. This process involves re-encoding:

    ffmpeg -i abc.mp4 \
      -filter_complex "
        [0:v]trim=start=123:end=192,setpts=PTS-STARTPTS[v0];
        [0:a]atrim=start=123:end=192,asetpts=PTS-STARTPTS[a0];
        [0:v]trim=start=220:end=243,setpts=PTS-STARTPTS[v1];
        [0:a]atrim=start=220:end=243,asetpts=PTS-STARTPTS[a1];
        [0:v]trim=start=302:end=355,setpts=PTS-STARTPTS[v2];
        [0:a]atrim=start=302:end=355,asetpts=PTS-STARTPTS[a2];
        [v0][a0][v1][a1][v2][a2]concat=n=3:v=1:a=1[v][a]
      " \
      -map "[v]" -map "[a]" \
      -c:v libx264 -c:a aac -crf 18 -preset veryfast abc_edited.mp4
    • trim / atrim: Select specified time ranges for video/audio.
    • setpts / asetpts: Reset timestamps for seamless concatenation.
    • concat=n=3: Concatenates three segments.
    • -c:v libx264 -c:a aac: Encodes video with H.264 and audio with AAC.
    • -crf 18 -preset veryfast: Manages output quality and encoding speed.

    Note: Re-encoding can reduce quality unless CRF or bitrate settings are high, and it generally takes longer than stream copy.

  3. Comparison of the Two Methods

    Criteria     | Concat Demuxer                    | Filter Complex
    Workflow     | Two-step (extract → concatenate)  | Single command
    Re-encoding  | No (lossless)                     | Yes (may affect quality unless configured carefully)
    Speed        | Faster (stream copy only)         | Slower (due to re-encoding)
    Flexibility  | Limited to trimming and joining   | Supports resizing, overlays, format changes, etc.

Additional FFmpeg Notes

Written on February 21, 2025


Meta Information


Python Script for BPM & Tempo Extraction from Multiple M4A Files (Written May 18, 2025)

This document describes extract_meta_from_media.py (v1.1), an enhanced Python script that computes the global BPM of every .m4a file in ~/Desktop/m4a and—new in this release—extracts tempo metadata and an instantaneous tempo curve for deeper musical analysis.

1. Objective

The script will:

  1. Locate all .m4a files in the m4a folder on your Desktop.
  2. For each file:
    • Estimate its global BPM with librosa.
    • Read any embedded BPM tag (iTunes “tmpo” atom).
    • Generate a frame-level tempo curve to reveal fluctuations over time.
  3. Print a clean report to the console for every track.

2. Prerequisites

  1. Python 3.8+ (macOS ships with an older Python—install a recent one via Homebrew if needed).
  2. Virtual-environment setup (recommended)
    Execute these commands from ~/Desktop:
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
  3. Libraries
    Install the three required packages inside the venv:
    pip install librosa mutagen numpy
    Optional but wise: librosa benefits from FFmpeg for broad codec support:
    brew install ffmpeg
  4. Folder structure
    Ensure your Desktop looks like:
    Desktop/
    ├── extract_meta_from_media.py
    └── m4a/
        ├── song1.m4a
        ├── song2.m4a
        └── …

3. Implementation

The complete v1.1 source code is reproduced below.

#!/usr/bin/env python3
"""
Filename  : extract_meta_from_media.py
Version   : 1.1
Author    : Hyunsuk Frank Roh

Description
-----------
Walk through ~/Desktop/m4a, estimate the *global* BPM of every .m4a file,
**and** (new in v1.1) extract extra tempo information:

•  Embedded tempo/BPM tag from the file’s metadata (iTunes ‘tmpo’ atom).  
•  An instantaneous tempo curve so you can see how BPM fluctuates over time.

Dependencies
------------
    pip install librosa mutagen numpy

Usage
-----
    python extract_meta_from_media.py
"""
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import os
from typing import List, Tuple, Optional

import numpy as np
import librosa
from mutagen.mp4 import MP4


# --------------------------------------------------------------------------- #
#                               Core routines                                 #
# --------------------------------------------------------------------------- #
def compute_tempo(
    audio_file_path: str,
    sr_target: Optional[int] = None
) -> Tuple[float, List[float]]:
    """
    Return (global_bpm, tempo_curve).

    Parameters
    ----------
    audio_file_path : str
        Path to an audio file (.m4a).
    sr_target : Optional[int]
        Target sample-rate for librosa.load (None = original file rate).

    Returns
    -------
    global_bpm : float
        Single BPM estimate from librosa’s beat tracker.
    tempo_curve : list[float]
        Frame-level BPMs returned by librosa.beat.tempo(..., aggregate=None).
    """
    y, sr = librosa.load(audio_file_path, sr=sr_target)

    # Global BPM via beat tracking
    global_bpm, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Instantaneous tempo curve
    tempo_curve = librosa.beat.tempo(y=y, sr=sr, aggregate=None)

    return float(global_bpm), tempo_curve.tolist()


def read_tagged_tempo(audio_file_path: str) -> Optional[float]:
    """
    Fetch embedded tempo/BPM tag (iTunes ‘tmpo’ atom) if present.
    Returns None when no tag is found or the file type is unsupported.
    """
    try:
        audio = MP4(audio_file_path)
        if "tmpo" in audio.tags:          # ‘tmpo’ is usually a single int
            return float(audio.tags["tmpo"][0])
    except Exception:
        pass                              # Unsupported container or no tag
    return None


# --------------------------------------------------------------------------- #
#                                Main driver                                  #
# --------------------------------------------------------------------------- #
def main() -> None:
    desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
    m4a_folder   = os.path.join(desktop_path, "m4a")

    if not os.path.isdir(m4a_folder):
        print(f"Folder not found: {m4a_folder}")
        return

    m4a_files = sorted(
        f for f in os.listdir(m4a_folder) if f.lower().endswith(".m4a")
    )
    if not m4a_files:
        print(f"No .m4a files found in {m4a_folder}")
        return

    for filename in m4a_files:
        file_path = os.path.join(m4a_folder, filename)
        print(f"\nProcessing {filename} …")
        try:
            global_bpm, tempo_curve = compute_tempo(file_path)
            tagged_tempo = read_tagged_tempo(file_path)

            print(f"Estimated global BPM    : {global_bpm:.2f}")
            if tagged_tempo is not None:
                print(f"Embedded tempo tag      : {tagged_tempo:.2f} BPM")
            else:
                print("Embedded tempo tag      : – (none)")

            if tempo_curve:
                arr = np.array(tempo_curve)
                print(
                    "Instantaneous tempo stats:"
                    f" min {arr.min():.2f}"
                    f" | mean {arr.mean():.2f}"
                    f" | max {arr.max():.2f} BPM"
                )
                # Uncomment if you want to peek at the first few entries
                # print('Tempo curve (first 10):', ', '.join(f'{v:.2f}' for v in arr[:10]))

        except Exception as exc:
            print(f"Error processing {filename}: {exc}")


if __name__ == "__main__":
    main()  

4. Explanation of Key Enhancements

Component            | v1.0 Behaviour               | v1.1 Upgrade
read_tagged_tempo()  | (not present)                | Uses mutagen to pull the iTunes BPM tag (tmpo) if it exists.
compute_tempo()      | Returned a single BPM value. | Also returns a frame-level tempo curve via librosa.beat.tempo(..., aggregate=None).
Console output       | Only global BPM printed.     | Adds the embedded tag (if present) plus min/mean/max of the tempo curve for quick insight.
Dependencies         | librosa, soundfile           | Now librosa, mutagen, numpy (soundfile is still auto-pulled by librosa).

5. Program Flow Diagram (Updated)

┌────────────────────────────┐
│   Start Script             │
└────────────────────────────┘
            │
            ▼
┌────────────────────────────┐
│ 1. Verify ~/Desktop/m4a    │
└────────────────────────────┘
            │
            ▼
┌────────────────────────────┐
│ 2. List all .m4a files     │
└────────────────────────────┘
            │
   ┌────────┴─────────┐
   │ Any files found? │
   └────────┬─────────┘
      Yes   │   No
            │
            ▼
┌────────────────────────────────────┐
│ 3. For each file:                  │
│    • Estimate global BPM           │
│    • Read embedded BPM tag         │
│    • Compute tempo curve           │
│    • Print results                 │
└────────────────────────────────────┘
            │
            ▼
┌────────────────────────────┐
│          End               │
└────────────────────────────┘

6. Usage Instructions

  1. Activate your venv each session (from ~/Desktop):
    source venv/bin/activate
  2. Run the script:
    python extract_meta_from_media.py
  3. Inspect output—for each track you’ll see:
    Processing song1.m4a …
    Estimated global BPM    : 128.12
    Embedded tempo tag      : 128.00 BPM
    Instantaneous tempo stats: min 127.50 | mean 128.05 | max 128.60 BPM
  4. When finished, deactivate:
    deactivate

Written on May 18, 2025


Python Script for BPM & Tempo Extraction from Multiple Media Files (Written June 21, 2025)

This document presents extract_meta_from_media.py (v1.2), an upgraded Python script that scans ~/Desktop/media for audio-capable files (.m4a, .mp3, .mp4), computes each track’s global BPM, and extracts embedded tempo tags plus an instantaneous tempo curve for detailed musical analysis.

1. Objective

The script will:

  1. Locate all supported files (.m4a, .mp3, .mp4) in the media folder on your Desktop.
  2. For each file:
    • Estimate its global BPM using librosa.
    • Read any embedded BPM tag:
      – iTunes tmpo atom for .m4a/.mp4
      – ID3 TBPM frame (or EasyID3 “bpm”) for .mp3
    • Generate a frame-level tempo curve to reveal BPM fluctuations over time.
  3. Print a concise report to the console for every track.

2. Prerequisites

  1. Python 3.8+
  2. Virtual environment (recommended)
    From ~/Desktop:
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
  3. Libraries
    pip install librosa mutagen numpy
    Tip: Install FFmpeg for wider codec support:
    # macOS (Homebrew)
    brew install ffmpeg
  4. Folder structure
    Desktop/
    ├── extract_meta_from_media.py
    └── media/
        ├── song1.m4a
        ├── track2.mp3
        ├── clip3.mp4
        └── …

3. Implementation

The complete v1.2 source code is reproduced below.

#!/usr/bin/env python3
"""
Filename  : extract_meta_from_media.py
Version   : 1.2
Author    : Hyunsuk Frank Roh

Description
-----------
Walk through ~/Desktop/media, estimate the *global* BPM of every audio-capable
file (.m4a, .mp3, .mp4), **and** extract extra tempo information:

•  Embedded tempo/BPM tag from the file’s metadata  
   – iTunes 'tmpo' atom for .m4a / .mp4  
   – ID3 'TBPM' (or EasyID3 "bpm") for .mp3  
•  An instantaneous tempo curve so you can see how BPM fluctuates over time.

Dependencies
------------
    pip install librosa mutagen numpy

Usage
-----
    python extract_meta_from_media.py
"""
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import os
from typing import List, Tuple, Optional

import numpy as np
import librosa
from mutagen.mp4 import MP4
from mutagen import File as MutagenFile


# --------------------------------------------------------------------------- #
#                               Core routines                                 #
# --------------------------------------------------------------------------- #
def compute_tempo(
    audio_file_path: str,
    sr_target: Optional[int] = None
) -> Tuple[float, List[float]]:
    """
    Return (global_bpm, tempo_curve).
    """
    y, sr = librosa.load(audio_file_path, sr=sr_target, mono=True)

    # Global BPM via beat tracking
    global_bpm, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Instantaneous tempo curve
    tempo_curve = librosa.beat.tempo(y=y, sr=sr, aggregate=None)

    return float(global_bpm), tempo_curve.tolist()


def read_tagged_tempo(audio_file_path: str) -> Optional[float]:
    """
    Return embedded BPM tag (if any) or None.
    """
    ext = os.path.splitext(audio_file_path)[1].lower()
    try:
        if ext in {".m4a", ".mp4"}:
            audio = MP4(audio_file_path)
            if "tmpo" in audio.tags:
                return float(audio.tags["tmpo"][0])
        elif ext == ".mp3":
            audio = MutagenFile(audio_file_path)
            if audio and audio.tags:
                if "bpm" in audio.tags:
                    return float(audio.tags["bpm"][0])
                if "TBPM" in audio.tags:
                    return float(audio.tags["TBPM"].text[0])
    except Exception:
        pass
    return None


# --------------------------------------------------------------------------- #
#                                Main driver                                  #
# --------------------------------------------------------------------------- #
def main() -> None:
    desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
    media_folder = os.path.join(desktop_path, "media")

    if not os.path.isdir(media_folder):
        print(f"Folder not found: {media_folder}")
        return

    audio_exts = {".m4a", ".mp3", ".mp4"}

    media_files = sorted(
        f for f in os.listdir(media_folder)
        if os.path.splitext(f)[1].lower() in audio_exts
    )
    if not media_files:
        print(f"No supported audio files found in {media_folder}")
        return

    for filename in media_files:
        file_path = os.path.join(media_folder, filename)
        print(f"\nProcessing {filename} …")
        try:
            global_bpm, tempo_curve = compute_tempo(file_path)
            tagged_tempo = read_tagged_tempo(file_path)

            print(f"Estimated global BPM    : {global_bpm:.2f}")
            if tagged_tempo is not None:
                print(f"Embedded tempo tag      : {tagged_tempo:.2f} BPM")
            else:
                print("Embedded tempo tag      : – (none)")

            if tempo_curve:
                arr = np.array(tempo_curve)
                print(
                    "Instantaneous tempo stats:"
                    f" min {arr.min():.2f}"
                    f" | mean {arr.mean():.2f}"
                    f" | max {arr.max():.2f} BPM"
                )
        except Exception as exc:
            print(f"Error processing {filename}: {exc}")


if __name__ == "__main__":
    main()

4. Key Enhancements over v1.1

Component            | v1.1 Behavior            | v1.2 Upgrade
Target folder        | ~/Desktop/m4a            | ~/Desktop/media with mixed formats
Supported extensions | .m4a                     | .m4a, .mp3, .mp4
read_tagged_tempo()  | iTunes tmpo only         | Adds ID3 TBPM / EasyID3 “bpm” for .mp3
Error handling       | Basic                    | Robust across multiple formats
Console output       | Per-track stats for .m4a | Same stats for all supported formats

5. Program Flow Diagram (Updated)

┌────────────────────────────┐
│        Start Script        │
└────────────────────────────┘
            │
            ▼
┌────────────────────────────┐
│ 1. Verify ~/Desktop/media  │
└────────────────────────────┘
            │
            ▼
┌────────────────────────────┐
│ 2. List .m4a/.mp3/.mp4     │
└────────────────────────────┘
            │
   ┌────────┴─────────┐
   │ Any files found? │
   └────────┬─────────┘
      Yes   │   No
            │
            ▼
┌──────────────────────────────────────────────┐
│ 3. For each file:                            │
│    • Estimate global BPM                     │
│    • Read embedded BPM tag (if any)          │
│    • Compute tempo curve                     │
│    • Print results                           │
└──────────────────────────────────────────────┘
            │
            ▼
┌────────────────────────────┐
│           End              │
└────────────────────────────┘

6. Usage Instructions

  1. Activate your venv (each session):
    source venv/bin/activate
  2. Run the script:
    python extract_meta_from_media.py
  3. Inspect output — example:
    Processing track2.mp3 …
    Estimated global BPM    : 124.37
    Embedded tempo tag      : 125.00 BPM
    Instantaneous tempo stats: min 123.90 | mean 124.25 | max 125.10 BPM
  4. When finished, deactivate:
    deactivate

Happy beat tracking!

Written on June 21, 2025


Mathematical Models


Summing Audio Tracks in Logic Pro (Written May 31, 2025)

Logic Pro carries out calculations in the linear domain (floating-point amplitudes) but shows levels in dBFS. Each track’s gain, pan law, and plug-in chain are applied linearly, the results are summed, and only then is the value converted back to dB for the master fader.

The Core Equation 🔬

\[ S_{\text{mix}}(t)=\sum_{i=1}^{N} g_i\,s_i(t) \] \[ \text{dBFS}=20\log_{10}\!\bigl(\lvert S_{\text{mix}}(t)\rvert\bigr) \]

Because decibels are logarithmic, dB values cannot be added directly; each track must first be converted to linear amplitude (or power) before summation.
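
To make the linear-domain rule concrete, here is a minimal illustrative sketch; the helper names `db_to_linear` and `linear_to_db` are mine, not part of Logic Pro:

```python
import math

def db_to_linear(db):
    """Convert a fader level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)

def linear_to_db(amplitude):
    """Convert a linear amplitude back to dBFS."""
    return 20 * math.log10(amplitude)

# Two identical, phase-aligned tracks peaking at -6 dBFS each are summed
# in the linear domain, NOT by adding the dB figures:
combined = db_to_linear(-6.0) + db_to_linear(-6.0)
print(round(linear_to_db(combined), 2))   # 0.02 — a coherent sum gains ~6 dB
```

Adding the dB figures directly (−6 + −6 = −12) would be meaningless; the linear sum shows the mix actually lands back near full scale.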

Equal vs. Weighted Summation

  1. Equal Weighting (Default)

    • A fader at 0 dB means a linear gain of 1. Two identical, phase-aligned mono tracks at 0 dB rise by +3 dB at the stereo output (pan law accounted for).
    • Real-world material seldom aligns perfectly, so typical boosts are closer to +1 to +2 dB.
  2. Custom Weighting with Faders

    • Lowering a track to -6 dB multiplies its samples by \(10^{-6/20}\approx 0.5\). In the equation above the term becomes \(0.5\,s_i(t)\), effectively halving that track’s influence.
    • Dynamics processors, sends, and other inserts introduce further, track-specific weighting before the mix bus.

Pan Law Considerations 🌀

Logic Pro’s default pan law is -3 dB center. A mono track panned hard left or right keeps full amplitude on one side, whereas a centered mono signal is attenuated (0.707×) on each side to preserve perceived loudness.
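
A constant-power (-3 dB centre) pan law can be modeled with a quarter circle of cosine/sine gains. This is a common textbook construction, not Logic Pro's internal code, and `constant_power_pan` is a hypothetical helper:

```python
import math

def constant_power_pan(pos):
    """-3 dB centre pan law.

    Maps a pan position pos in [-1, +1] (hard left .. hard right)
    to a (left_gain, right_gain) pair with constant total power.
    """
    theta = (pos + 1) * math.pi / 4        # [-1, 1] -> [0, pi/2]
    return math.cos(theta), math.sin(theta)

left, right = constant_power_pan(0.0)      # centred mono source
# left and right are each ~0.707 (-3 dB); hard-panned sides keep unity gain
```

Because cos² + sin² = 1, total power stays constant as the source sweeps across the stereo field.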

Worked Example 📊

Track                | Fader (dB) | Linear Gain (g) | Peak (dBFS) | Contribution to Mix (dBFS)
Kick                 | 0          | 1.00            | -6          | -6.0
Bass                 | -4.5       | 0.60            | -9          | -13.2
Pads (stereo)        | -6         | 0.50            | -12         | -18.0
Summed Peak (linear) |            |                 |             | ≈ -4.0 dBFS

Practical Guidance 🎚️

  1. Maintain head-room: keep master peaks between -6 dBFS and -3 dBFS to avoid inter-sample clipping when tracks reinforce one another.
  2. If the mix bus clips, trim individual faders rather than lowering the master fader to preserve plug-in gain staging.
  3. Use VU-style meters for perceived loudness; peak meters alone cannot reveal RMS energy buildup.

Written on May 31, 2025


Digital waveform amplitude & bidirectional dynamics (Written May 31, 2025)

Acoustic events are stored as waveforms. The vertical axis shows instantaneous amplitude; the horizontal axis shows time. Greater distance from the mid-line (zero) means greater air-pressure deviation and therefore louder perceived sound.

I. Digital full-scale reference (0 dBFS)

In PCM systems every sample is a signed number between -1.0 and +1.0. Both limits equal 0 dB full scale (0 dBFS). Attempts to exceed them cause quantization overflow; data are truncated and clipping distortion occurs.

When |sample| ≥ 1.0 (0 dBFS) the waveform is clipped. Logic Pro peak meters turn red to indicate this condition.

II. Ideal sinusoid and amplitude limit

An ideal sine of frequency \(f\) and phase \(\phi\) is \[ A(t)=A_{\max}\sin\!\bigl(2\pi f t+\phi\bigr) \]. To avoid clipping, require \(A_{\max}\le 1.0\).
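
A quick sketch of the sinusoid and the clip test; the helper names are illustrative only:

```python
import math

def sine_sample(t, freq, a_max, phase=0.0):
    """Instantaneous amplitude A(t) = A_max * sin(2*pi*f*t + phase)."""
    return a_max * math.sin(2 * math.pi * freq * t + phase)

def is_clipped(sample):
    """A sample clips once |value| reaches full scale (0 dBFS)."""
    return abs(sample) >= 1.0

# Sample both sines at the positive peak of a 1 kHz cycle (quarter period):
safe = sine_sample(0.00025, 1000, 0.9)   # peaks at 0.9 — inside full scale
hot = sine_sample(0.00025, 1000, 1.2)    # peaks beyond full scale
print(is_clipped(safe), is_clipped(hot))   # False True
```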

Chart 1 — Sine wave approaching 0 dBFS

III. Bidirectional amplitude and the mid-line

A. Physical interpretation

A loudspeaker diaphragm moves forward (compression) and backward (rarefaction). Digital audio encodes this as a signed-value stream:

Sample value | Acoustic state | Perceptual result
+1.0 → 0.0   | Compression    | Loud phase
0.0          | Equilibrium    | Silence / zero crossing
0.0 → -1.0   | Rarefaction    | Equally loud, opposite polarity

B. Why polarity sounds identical 🙌

The ear senses the magnitude of pressure deviation rather than its direction; inverting a waveform's polarity leaves its amplitude spectrum unchanged, so an isolated track sounds identical either way.

C. Mid-line (0) as a diagnostic reference ✨

  1. Zero crossings reveal fundamental frequency.
  2. DC offset lifts the whole waveform, wasting headroom and inviting clipping; apply high-pass or DC-removal.
  3. Digital silence = continuous zeros; any non-zero sample creates audible output.
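
The DC-offset fix in item 2 can be illustrated by subtracting the mean sample value. This is a naive sketch for intuition; real tools use a high-pass filter so the offset cannot drift back:

```python
def remove_dc(samples):
    """Centre a waveform on the mid-line by subtracting its mean (DC offset)."""
    offset = sum(samples) / len(samples)
    return [s - offset for s in samples]

# A small cycle riding on a +0.3 DC offset wastes positive headroom:
biased = [0.3, 0.8, 0.3, -0.2]
centred = remove_dc(biased)   # ≈ [0.0, 0.5, 0.0, -0.5] — symmetric again
```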

Chart 2 — Compression (v ≥ 0) vs rarefaction (v < 0)

IV. Practical gain-staging recommendations 🚀

  1. Record peaks at least 3 dB below 0 dBFS to preserve headroom.
  2. Insert a brick-wall limiter on the master bus if track summation risks clipping.
  3. React immediately to red peak indicators by lowering track gain.

V. Engineering takeaways

VI. Summary

Waveform height from the mid-line encodes loudness. Exceeding ±1.0 causes clipping at 0 dBFS. Because ears sense absolute pressure change, positive and negative peaks sound the same. Thoughtful gain staging—keeping ample headroom and monitoring polarity symmetry—prevents distortion and maintains audio quality.

Compiled May 31, 2025

Written on June 7, 2025


Perceptual loudness normalization for multitrack mixing (Written June 7, 2025)

Balancing track levels by perceived loudness relies on two pillars: the Equal-Loudness Contour (ISO 226) that models frequency sensitivity and the ITU-R BS.1770 algorithm that outputs integrated loudness in LUFS. A streamlined workflow:

  1. Process every stem through the BS.1770 K-weighting filter and read its integrated LUFS.
  2. Select a platform-appropriate target, for example −16 LUFS for podcasts.
  3. Apply the simple gain offset  \( \Delta G_{\text{dB}} = L_{\text{target}} - L_{\text{track}} \) via a fader or Gain plug-in.

Advanced scripts replace step 3 with a Zwicker specific-loudness or partial-loudness routine that respects critical-band masking. Logic Pro’s Loudness Meter + Gain plug-ins are sufficient, while commercial tools such as iZotope Neutron and Sonible smart:limit automate the entire process internally.
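
Step 3 of the workflow is a one-line computation; a trivial sketch with a hypothetical helper name:

```python
def gain_offset_db(target_lufs, track_lufs):
    """Delta-G = L_target - L_track: the static gain (dB) to reach target.

    Valid as a dB figure because 1 LU corresponds to 1 dB under BS.1770.
    """
    return target_lufs - track_lufs

# A stem measured at -18.1 LUFS, aiming for the -16 LUFS podcast target:
print(round(gain_offset_db(-16.0, -18.1), 1))   # 2.1 — apply +2.1 dB of gain
```

Running this over every stem (and writing the result to a fader or Gain plug-in) is exactly the normalization loop described above.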

I. Frequency-dependent human hearing

II. Practical standard — ITU-R BS.1770 K-weighting / LUFS

  1. Core measurement formula

    \( L_{\text{LKFS}} = -0.691 + 10 \log_{10}\!\Bigl(\displaystyle\sum_{i} G_i \, \overline{x_{i,K}^2}\Bigr) \)

    Integrated loudness sums K-weighted mean-square energy across channels, converts the result to decibels referenced to full scale, and applies an empirically derived −0.691 dB offset so that calibrated pink noise reads 0 LU.

  2. Term-by-term breakdown

    • \( x_{i,K}(t) \): sample of channel i after the K-weighting filter (a high-pass near 60 Hz plus a ≈ +4 dB high-shelf boosting content above roughly 2 kHz).
    • \( \overline{x_{i,K}^2} \): mean-square energy inside a 400 ms analysis block.
    • \( G_i \): channel weight that compensates for surround placement (see matrix below).
    • 10 log10: converts summed power to decibels relative to digital full scale.
    • −0.691 dB: bias aligning the objective value with subjective loudness tests.
  3. Channel weight matrix \(G_i\)

    Channel            | Weight   | Rationale
    L / R / C          | 1.00     | On-axis reference
    LS / RS            | 1.41     | Rear speakers radiate off-axis
    LFE                | excluded | The LFE channel is not counted in BS.1770 loudness
    Height (immersive) | 1.00     | Elevation is inherently prominent
  4. Dual-gate time integration

    Each 400 ms block first passes an absolute gate at −70 LKFS, then a relative gate 10 dB below the running average. This rejects silence and low-level ambience, focusing the metric on program-relevant loudness.

  5. LU, LKFS, and LUFS

    One Loudness Unit (LU) equals 1 dB when measured with BS.1770. LUFS (loudness units relative to full scale) is therefore numerically identical to LKFS; for example, YouTube targets about −14 LUFS.

  6. Origin of the −0.691 dB offset

    Listening tests with full-band pink noise revealed a systematic 0.691 dB gap between perceived loudness and calculated energy, prompting inclusion of the constant for perceptual alignment.

  7. Worked example

    A stereo mix's K-weighted channel energies sum to an integrated reading of
    \( L_{\text{mix}} \approx -18.1 \text{ LUFS} \).
    To hit a podcast target of −16 LUFS:
    \( \Delta G = -16 - (-18.1) = +2.1 \text{ dB} \) of gain is required.
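
The dual-gate time integration of item 4 can be sketched over a list of per-block loudness readings. This is a simplified model, not a full BS.1770 meter (real meters use 400 ms blocks with 75 % overlap), and the function names are mine:

```python
import math

def _mean_lkfs(blocks):
    """Average block loudness in the energy domain, then back to LKFS."""
    energy = sum(10 ** ((l + 0.691) / 10) for l in blocks) / len(blocks)
    return -0.691 + 10 * math.log10(energy)

def gated_integrated(block_loudness):
    """Dual-gated integrated loudness over per-block values (LKFS).

    Blocks under the -70 LKFS absolute gate are dropped first; a relative
    gate 10 dB below the survivors' average then rejects low-level ambience.
    Assumes at least one block survives both gates.
    """
    survivors = [l for l in block_loudness if l >= -70.0]
    rel_gate = _mean_lkfs(survivors) - 10.0
    return _mean_lkfs([l for l in survivors if l >= rel_gate])

# Silence (-90) falls under the absolute gate; quiet ambience (-45)
# falls under the relative gate, leaving only program-relevant blocks:
print(round(gated_integrated([-18.0, -18.2, -90.0, -45.0]), 1))   # -18.1
```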

III. Per-track automatic gain equation

Step | Operation                                              | Purpose
1    | K-weighting                                            | Mimic human frequency response
2    | Short-term LUFS (400 ms)                               | Estimate perceived level
3    | \( \Delta G = L_{\text{target}} - L_{\text{track}} \)  | Compute gain offset
4    | Apply Gain / write fader automation                    | Normalize track loudness

Typical targets: −23 LUFS (broadcast), −16 LUFS (streaming & podcasts), −14 LUFS (mainstream music video).

IV. Spectral fine-tuning — Zwicker & partial loudness

V. Logic Pro practical workflow

  1. Insert Loudness Meter on each stem, solo, and read the integrated LUFS.
  2. Match the target by trimming Gain or the channel fader by \( \Delta G \).
  3. Use Volume Relative automation for section-specific offsets without altering the static fader position.
  4. Finish with Loudness Range checks to confirm macro-dynamics.
  5. Optional: engage an AI assistant (Neutron Mix Assistant, smart:limit) for one-click loudness alignment and masking analysis.

VI. Limitations & best practice

Key equation recap ✏️

\( \boxed{\; \Delta G_{\text{dB}} = L_{\text{target (LUFS)}} - L_{\text{track (LUFS)}} \;} \)

Running this subtraction in a loop or script updates every fader so the mix starts from a scientifically grounded loudness foundation, ready for creative processing.

Written on June 7, 2025


Bit depth and sample rate in digital audio (Written June 7, 2025)

I. Core definitions

Bit depth determines how finely amplitude is described; sample rate determines how often it is recorded. Together, the two define both the numerical fidelity a machine can store and the perceptual fidelity a human can hear.

II. Mathematical consequences

III. Practical meaning for devices 🖥️

IV. Perceptual meaning for listeners 👂

V. Comparison table

Configuration | Sample rate | Bit depth | Theoretical dynamic range | Primary use case
CD Audio      | 44.1 kHz    | 16-bit    | ≈ 98 dB                   | Consumer music distribution
Broadcast WAV | 48 kHz      | 24-bit    | ≈ 146 dB                  | Film / streaming production
Hi-Res        | 96 kHz      | 24-bit    | ≈ 146 dB                  | Archival & audio restoration
DXD           | 352.8 kHz   | 24-bit    | ≈ 146 dB                  | Hybrid PCM/DSD workflows

VI. Best-practice guidelines ✅

Key formulas recap ✏️

\( f_s \ge 2 f_{\max} \)  — Nyquist criterion

\( \text{SQNR} \approx 6.02 N + 1.76 \;\text{dB} \)  — dynamic range per bit depth
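
Both formulas are one-liners; a minimal sketch (helper names are illustrative):

```python
def sqnr_db(bits):
    """Theoretical dynamic range of an N-bit quantizer: 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

def nyquist_min_rate(f_max):
    """Smallest sample rate satisfying the Nyquist criterion fs >= 2*f_max."""
    return 2 * f_max

print(round(sqnr_db(16), 2))      # 98.08 — the "~98 dB" CD Audio figure
print(nyquist_min_rate(20_000))   # 40000 — why 44.1 kHz covers human hearing
```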

Bit depth determines how finely amplitude is described; sample rate determines how often those descriptions occur. Together they define both the numerical fidelity a machine can store and the perceptual fidelity a human can hear.

Written on June 7, 2025


Logarithmic perception of pitch and loudness in human hearing (Written June 7, 2025)

I. Frequency and perceived pitch

A. Octave equivalence

The auditory system interprets pitch on a base-2 logarithmic axis. An octave step is defined by (\(P = \log_{2}\! \bigl(f / f_{0}\bigr)\)), so doubling frequency raises pitch by exactly one octave. For example, 27.5 Hz (A0) → 55 Hz (A1) → 110 Hz (A2).

B. Psychoacoustic refinements

The mel scale offers finer resolution: (\(\text{mel} \approx 2595 \log_{10} (1 + f/700)\)). Low-frequency bins appear densely packed, while spacing widens toward the treble, mirroring subjective pitch growth.
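
Both pitch mappings translate directly into code; a small sketch with illustrative helper names:

```python
import math

def octave_index(f, f0=27.5):
    """P = log2(f / f0): pitch height in octaves above a reference (A0 here)."""
    return math.log2(f / f0)

def hz_to_mel(f):
    """Mel-scale approximation: 2595 * log10(1 + f / 700)."""
    return 2595 * math.log10(1 + f / 700)

print(octave_index(110.0))      # 2.0 — A2 sits exactly two octaves above A0
print(round(hz_to_mel(1000)))   # 1000 — the scale pins 1 kHz near 1000 mel
```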

II. Sound-pressure level and perceived loudness

A. Decibel definition

Sound-pressure level (SPL) employs a base-10 logarithm: (\(L_{\text{dB}} = 20 \log_{10} (p / p_{0})\)), with \(p_{0} = 20\;\mu\text{Pa}\) as the threshold-of-hearing reference. A 6 dB increase doubles pressure amplitude yet is judged only “slightly louder,” honoring the Weber–Fechner law (\(S = k \log (I / I_{0})\)).
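
The SPL definition in code form (a sketch; `spl_db` is a hypothetical helper):

```python
import math

P_REF = 20e-6   # 20 micropascals — the threshold-of-hearing reference p0

def spl_db(pressure_pa):
    """Sound-pressure level: L_dB = 20 * log10(p / p0)."""
    return 20 * math.log10(pressure_pa / P_REF)

print(round(spl_db(2 * P_REF), 2))   # 6.02 — doubling pressure adds only ~6 dB
```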

III. Piano keyboard versus auditory limits 🎹

Key position Frequency (Hz) Perceptual notes
A027.5Lowest practical musical pitch; borderline tactile
A4440Concert-pitch reference
C8≈ 4186Highest piano key; clearly audible to most listeners
+1 octave≈ 8 kHzAudible but devoid of distinct melodic identity
+2 octaves≈ 16 kHzPerceived by youth; sensitivity declines with age

Frequencies below 20 Hz (e.g., 13.75 Hz, one octave beneath A0) exceed the cochlea’s temporal-resolution limit; vibrations are sensed as rhythmic flutter rather than tonal pitch.

IV. Rationale for sub-20 Hz filtration 🛠️

V. Age-related high-frequency decline 👂

Key formulas recap ✏️

\(P = \log_{2} (f / f_{0})\) — octave-based pitch index

\(L_{\text{dB}} = 20 \log_{10} (p / p_{0})\) — sound-pressure level

Pitch and loudness are transduced through logarithmic mappings, enabling the auditory system to condense an enormous dynamic and spectral span into a manageable perceptual range. Musical instrument design, audio metering, and mix-engineering practices therefore align with base-2 and base-10 log scales to remain compatible with human hearing.

Written on June 7, 2025


The mathematical foundations of musical harmony (Written June 8, 2025)

Musical harmony rests upon deep mathematical principles. The present overview respectfully examines the key equations and structures that underlie tonal organization, tuning, and chordal relationships, offering a concise yet comprehensive synthesis for scholarly publication.

Frequency, pitch, and the harmonic series

When a resonant body vibrates at a fundamental frequency \(f_{0}\), overtones arise at integer multiples \(n\,f_{0}\). This integer progression, termed the harmonic series, shapes consonance perception and tonal color.

Figure — Harmonic series frequencies for the first sixteen partials \((f_{0}=100\text{ Hz})\).
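
The integer progression can be generated directly (an illustrative helper, not from any particular library):

```python
def harmonic_series(f0, n_partials=16):
    """Frequencies of the first n integer-multiple partials above f0."""
    return [k * f0 for k in range(1, n_partials + 1)]

print(harmonic_series(100.0)[:4])   # [100.0, 200.0, 300.0, 400.0]
```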

Tuning systems and frequency equations

  1. Just intonation

    Just intonation defines every interval by a simple rational ratio \(p:q\). For example, the perfect fifth employs \(3:2\). Given a fundamental \(f_{0}\), any pitch in a just system is \(f = \tfrac{p}{q}\,f_{0}\).

  2. Equal temperament

    In twelve-tone equal temperament (12-TET) the octave is divided logarithmically. The frequency of a note \(n\) semitones above the reference is \(f(n) = f_{0}\,2^{\,n/12}\). This exponential equation ensures transpositional symmetry but introduces minute deviations from just ratios.

    • Octave invariance: doubling frequency every twelve steps.
    • Modular arithmetic: pitch classes operate in \( \mathbb{Z}_{12} \).
    • Circle of fifths: successive seven-semitone moves generate all of \( \mathbb{Z}_{12} \), since 7 is coprime to 12.
  3. Cents and logarithmic measurement

    Pitch distance is often expressed in cents, where one cent equals \(1/100\) of a semitone: \(c = 1200 \log_{2}\!\bigl(\tfrac{f_{2}}{f_{1}}\bigr).\)

    Interval       | Just intonation ratio | Equal temperament ratio | Cent difference (JI – ET)
    Unison         | 1/1                   | 1.000000                | +0.00
    Minor second   | 16/15                 | 1.059463                | +11.73
    Major second   | 9/8                   | 1.122462                | +3.91
    Minor third    | 6/5                   | 1.189207                | +15.64
    Major third    | 5/4                   | 1.259921                | −13.69
    Perfect fourth | 4/3                   | 1.334840                | −1.96
    Tritone        | 45/32                 | 1.414214                | −9.78
    Perfect fifth  | 3/2                   | 1.498307                | +1.96
    Minor sixth    | 8/5                   | 1.587401                | +13.69
    Major sixth    | 5/3                   | 1.681793                | −15.64
    Minor seventh  | 9/5                   | 1.781797                | +17.60
    Major seventh  | 15/8                  | 1.887749                | −11.73
    Octave         | 2/1                   | 2.000000                | +0.00
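
The 12-TET and cent formulas above can be checked numerically; a minimal sketch (function names are mine):

```python
import math

def equal_temperament_freq(n, f0=440.0):
    """12-TET: frequency n semitones above the reference f0."""
    return f0 * 2 ** (n / 12)

def cents(f2, f1):
    """Interval size in cents between two frequencies."""
    return 1200 * math.log2(f2 / f1)

# The 12-TET perfect fifth (7 semitones) versus the just 3:2 ratio:
et_fifth = equal_temperament_freq(7) / 440.0   # ≈ 1.498307
print(round(cents(1.5, et_fifth), 2))          # 1.96 — matches the table row
```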

Chord structures and vector spaces

  1. Pitch-class set theory

    Chordal identity may be encoded as ordered or unordered pitch-class sets within \(\mathbb{Z}_{12}\). Operations of transposition \(T_{n}\) and inversion \(I_{n}\) correspond to affine transformations preserving set equivalence classes.

  2. Fourier representations

    The discrete Fourier transform (DFT) of pitch-class occurrences yields phase-angle spectra, illuminating interval content and aiding similarity measures between chords or scales.
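
The \(T_{n}\) and \(I_{n}\) operations of pitch-class set theory reduce to modular arithmetic; a minimal sketch (function names are mine):

```python
def transpose(pcs, n):
    """T_n: shift every pitch class by n semitones, modulo 12."""
    return sorted((p + n) % 12 for p in pcs)

def invert(pcs, n=0):
    """I_n: reflect each pitch class about 0, then transpose by n, modulo 12."""
    return sorted((n - p) % 12 for p in pcs)

c_major = [0, 4, 7]
print(transpose(c_major, 2))   # [2, 6, 9] — the D major triad
print(invert(c_major))         # [0, 5, 8] — the F minor triad
```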

Transformational theory and group operations

  1. Neo-Riemannian PLR group

    Transformations Parallel (P), Leittonwechsel (L), and Relative (R) act on the 24 major and minor triads, together generating a dihedral group of order 24. Matrix encoding facilitates algebraic navigation through triadic space, modeling smooth harmonic progressions.

Mathematical models of voice leading

  1. Geometric chord space

    Recent studies embed voice leading as geodesic motion within high-dimensional orbifolds, where distance metrics correspond to total voice displacement. This geometric framework explicates common-tone retention and parsimonious motion.

Written on June 8, 2025


Back to Top