maturaarbeit/MISE_GUIDE.md
MM4go c7ff6a8a29 Add PDF to Markdown converter with mise task runner
- Implement pdf_to_markdown.py script with pypdf for text extraction
- Extract metadata (title, author, creation date) from PDFs
- Generate clean Markdown files with YAML front matter
- Add comprehensive error handling and logging
- Create mise.toml with 10+ convenient tasks for conversion
- Provide detailed documentation (4 guides + quick reference)
- Successfully convert all 18 PDF files in artikel/ folder to Markdown
- Include .gitignore for Python cache and local config
2026-02-23 14:58:58 +01:00


Mise en Place - PDF to Markdown Converter

A modern task runner configuration for the PDF to Markdown conversion project using mise.

Overview

Mise is a polyglot tool manager that handles tool installations and task execution. This project uses it to:

  • Automatically install Python 3.11 and dependencies
  • Provide convenient commands for PDF conversion tasks
  • Manage development workflows
  • Track conversion status

Installation

Prerequisites

Quick install:

curl https://mise.jdx.dev/install.sh | sh

Setup

# Clone or navigate to the project
cd maturaarbeit

# Trust the configuration files (one-time setup)
mise trust

# Verify installation
mise tasks

Quick Start

Convert All PDFs

mise run convert

This will:

  1. Install dependencies (if not already installed)
  2. Run the PDF to Markdown converter
  3. Process all PDFs in artikel/ folder
  4. Output Markdown files to artikel/converted/
  5. Display a conversion summary
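The converter script itself is not reproduced in this guide. As a rough sketch of what step 2 might look like, the example below extracts text and metadata with pypdf and writes a Markdown file with YAML front matter. The function names `build_markdown` and `convert_pdf`, the front matter keys, and the fallback values are assumptions for illustration, not the actual contents of pdf_to_markdown.py.

```python
from pathlib import Path


def build_markdown(title, author, created, pages):
    """Return Markdown text that starts with a YAML front matter header."""
    def quote(value):
        # Double-quote values so colons or '#' inside a title stay valid YAML.
        return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["---", f"title: {quote(title)}", f"author: {quote(author)}"]
    if created is not None:
        lines.append(f"created: {quote(created.isoformat())}")
    lines += ["---", ""]
    # Skip empty pages and separate the remaining ones with a blank line.
    body = "\n\n".join(text.strip() for text in pages if text and text.strip())
    return "\n".join(lines) + "\n" + body + "\n"


def convert_pdf(pdf_path, output_dir):
    """Hypothetical helper: convert one PDF and return the written path."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    pdf_path = Path(pdf_path)
    reader = PdfReader(str(pdf_path))
    meta = reader.metadata  # may be None if the PDF carries no metadata
    title = (meta.title if meta else None) or pdf_path.stem
    author = (meta.author if meta else None) or "Unknown"
    created = meta.creation_date if meta else None
    pages = [page.extract_text() or "" for page in reader.pages]

    out = Path(output_dir) / (pdf_path.stem + ".md")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(build_markdown(title, author, created, pages), encoding="utf-8")
    return out
```

Keeping the front matter builder separate from the pypdf call means the formatting can be checked without a PDF at hand.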

Check Conversion Status

mise run status

Shows:

  • Number of PDFs in artikel/
  • Number of converted Markdown files
  • ✓ All PDFs converted (if done)
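How the status task is implemented is not shown here. A minimal sketch of an equivalent check, assuming that each PDF maps to a Markdown file with the same base name, could look like this (`conversion_status` is a hypothetical helper name):

```python
from pathlib import Path


def conversion_status(input_dir="artikel", output_dir="artikel/converted"):
    """Count source PDFs and converted Markdown files, and list missing ones."""
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    converted = {p.stem for p in Path(output_dir).glob("*.md")}
    # A PDF counts as converted when a .md file with the same stem exists.
    missing = [p.name for p in pdfs if p.stem not in converted]
    return {"pdfs": len(pdfs), "markdown": len(converted), "missing": missing}


if __name__ == "__main__":
    status = conversion_status()
    print(f"PDFs in artikel/: {status['pdfs']}")
    print(f"Converted Markdown files: {status['markdown']}")
    if status["pdfs"] and not status["missing"]:
        print("✓ All PDFs converted")
```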

Preview Without Writing

mise run dry-run

Shows which PDFs would be converted without actually writing any files.
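One common way to implement a dry run is to compute the full list of source and target paths first and then skip the write step. The sketch below illustrates that pattern; `plan_conversion` and `run` are hypothetical names, and the real script may handle its dry-run option differently.

```python
from pathlib import Path


def plan_conversion(input_dir, output_dir):
    """Return (source, target) path pairs without writing anything."""
    out = Path(output_dir)
    return [(pdf, out / (pdf.stem + ".md"))
            for pdf in sorted(Path(input_dir).glob("*.pdf"))]


def run(input_dir, output_dir, convert, dry_run=False):
    """Call convert(src, dst) for every pair, or only report when dry_run is set."""
    plan = plan_conversion(input_dir, output_dir)
    for src, dst in plan:
        if dry_run:
            print(f"would convert {src} -> {dst}")
        else:
            convert(src, dst)
    return plan
```

Because the plan is built the same way in both modes, the dry run reports exactly the files a real run would process.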

Available Tasks

| Task | Description |
|------|-------------|
| install | Install Python 3.11 and project dependencies |
| convert | Convert all PDFs to Markdown (main task) |
| convert-verbose | Convert with detailed logging output |
| convert-quiet | Convert silently (errors only) |
| dry-run | Preview conversion without writing files |
| convert-custom | Convert from custom input/output folders |
| status | Show conversion status and progress |
| clean | Remove converted Markdown files |
| clean-all | Remove all build artifacts and cache |
| help | List all available tasks |

Usage Examples

Basic Conversion

# Convert all PDFs using defaults
mise run convert

# Convert with verbose logging
mise run convert-verbose

# Convert silently
mise run convert-quiet

Custom Paths

# Convert from custom input directory
INPUT_DIR=/path/to/pdfs mise run convert-custom

# Specify both input and output directories
INPUT_DIR=/path/to/pdfs OUTPUT_DIR=/path/to/output mise run convert-custom

Cleanup

# Remove only converted markdown files
mise run clean

# Remove all artifacts (markdown files, cache, __pycache__)
mise run clean-all
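The exact commands behind the two cleanup tasks are defined in mise.toml and not repeated here. As an illustration of the difference between them, the sketch below assumes that `clean` only deletes Markdown files in the output folder, while `clean-all` additionally removes `__pycache__` directories. The function names and default paths are assumptions.

```python
import shutil
from pathlib import Path


def clean(output_dir="artikel/converted"):
    """Delete converted Markdown files and return how many were removed."""
    removed = 0
    for md in Path(output_dir).glob("*.md"):
        md.unlink()
        removed += 1
    return removed


def clean_all(project_root=".", output_dir="artikel/converted"):
    """Run clean(), then delete every __pycache__ directory below project_root."""
    removed = clean(Path(project_root) / output_dir)
    for cache in list(Path(project_root).rglob("__pycache__")):
        if cache.is_dir():
            shutil.rmtree(cache)
    return removed
```

Neither function touches the source PDFs, so a cleanup can always be followed by a fresh conversion.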

Configuration Files

mise.toml

Main configuration file with all tasks, environment variables, and tool versions.

Key sections:

  • [env] - Environment variables (e.g., PYTHONUNBUFFERED)
  • [tasks.*] - Task definitions with descriptions and commands
  • [tools.python] - Python version specification (3.11)
  • [tools.pipenv] - Package manager version

.mise.local.toml

Local overrides for environment-specific configuration. Git-ignored file for personal settings.

Example customizations:

# Environment overrides belong under the [env] table
[env]
# Override input/output directories
INPUT_DIR = "./my_pdfs"
OUTPUT_DIR = "./my_output"

# Custom Python path
PYTHON_PATH = "/usr/local/bin/python3"

.gitignore

Excludes mise cache and local configuration from version control.

How It Works

Automatic Tool Installation

When you run a task, mise automatically:

  1. Detects required tools (Python 3.11)
  2. Downloads and installs them if missing
  3. Creates an isolated environment
  4. Executes the task in that environment

Task Execution

  1. Setup phase - Install dependencies via pip install -r requirements.txt
  2. Execution phase - Run the Python script with appropriate arguments
  3. Cleanup phase - Report results and summary

Environment Variables

PYTHONUNBUFFERED=1  # Real-time output (no buffering)
INPUT_DIR           # Custom input folder (default: ./artikel)
OUTPUT_DIR          # Custom output folder (default: ./artikel/converted)
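On the Python side, these variables can be read with the defaults listed above. The following is a minimal sketch; `resolve_dirs` is a hypothetical helper, and the actual script may read its paths from command-line arguments instead.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Return (input_dir, output_dir), falling back to the documented defaults."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default folder.
    input_dir = Path(env.get("INPUT_DIR") or "./artikel")
    output_dir = Path(env.get("OUTPUT_DIR") or "./artikel/converted")
    return input_dir, output_dir
```

Passing a plain dictionary as `env` makes the lookup easy to try out without changing the real environment.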

Advantages Over Traditional Approach

Before (Manual Setup)

# Install Python globally
# Install pip
# Install dependencies
# Hope everything works
python3 pdf_to_markdown.py

After (Mise)

# One command - everything handled
mise run convert

Benefits:

  • Reproducible - Same environment every time
  • Isolated - Tools don't affect system Python
  • Fast - Caches installed tools
  • Easy - Single command to run tasks
  • Portable - Works on any system with mise
  • Documented - Task descriptions built-in
  • Flexible - Environment variables for customization

Troubleshooting

Issue: "mise: command not found"

Solution: Install mise first

curl https://mise.jdx.dev/install.sh | sh

Issue: "Config files are not trusted"

Solution: Trust the configuration

mise trust

Issue: Python dependencies not installing

Solution: Manually install in the mise environment

mise run install

Issue: "No PDF files found"

Solution: Check the input directory path

# Verify PDFs exist
ls -la artikel/*.pdf

# If in different location, use custom path
INPUT_DIR=/path/to/pdfs mise run convert-custom

Issue: Slow first run

Solution: First run downloads and installs tools (one-time). Subsequent runs are fast.

Advanced Usage

Running Tasks from Shell Scripts

#!/bin/bash
# Run the conversion once and branch on its exit status
if mise run convert; then
  echo "Conversion successful"
  mise run status
else
  echo "Conversion failed"
  exit 1
fi

Integrating with CI/CD

# GitHub Actions example
- name: Convert PDFs
  run: |
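    curl https://mise.jdx.dev/install.sh | sh
    # The installer places the binary at ~/.local/bin/mise by default,
    # which may not be on PATH yet within this step
    ~/.local/bin/mise run convert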

Custom Task Definition

To add a new task, edit mise.toml:

[tasks.my-custom-task]
description = "My custom task description"
run = "echo 'Running custom task'"
depends = ["install"]  # Depends on install task

Then run:

mise run my-custom-task

Documentation

  • Project Guide - See PDF_CONVERTER_GUIDE.md
  • Mise Docs - https://mise.jdx.dev/
  • Python Script - See pdf_to_markdown.py

Support

For issues or questions:


Version: 1.0
Last Updated: 2024-02-23