Mise en Place - PDF to Markdown Converter
A modern task runner configuration for the PDF to Markdown conversion project using mise.
Overview
Mise is a polyglot tool manager that handles tool installations and task execution. This project uses it to:
- Automatically install Python 3.11 and dependencies
- Provide convenient commands for PDF conversion tasks
- Manage development workflows
- Track conversion status
Installation
Prerequisites
- mise CLI installed: https://mise.jdx.dev/getting-started.html
Quick install:
```bash
curl https://mise.jdx.dev/install.sh | sh
```
Setup
```bash
# Clone or navigate to the project
cd maturaarbeit

# Trust the configuration files (one-time setup)
mise trust

# Verify installation
mise tasks
```
Quick Start
Convert All PDFs
```bash
mise run convert
```
This will:
- Install dependencies (if not already installed)
- Run the PDF to Markdown converter
- Process all PDFs in the `artikel/` folder
- Output Markdown files to `artikel/converted/`
- Display a conversion summary
Check Conversion Status
```bash
mise run status
```
Shows:
- Number of PDFs in `artikel/`
- Number of converted Markdown files
- A "✓ All PDFs converted" line once every PDF has been converted
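The status report amounts to comparing the two folders. Below is a minimal Python sketch, assuming a PDF counts as converted when a Markdown file with the same base name exists in `artikel/converted/`; both that matching rule and the `conversion_status` helper are hypothetical, not taken from the project's actual task.

```python
from pathlib import Path


def conversion_status(input_dir: Path, output_dir: Path) -> tuple[int, int, bool]:
    """Hypothetical helper: count PDFs and Markdown files, and report whether
    every PDF has a Markdown file with the same stem (assumed matching rule)."""
    pdf_stems = {p.stem for p in input_dir.glob("*.pdf")}
    md_stems = {p.stem for p in output_dir.glob("*.md")}
    # "Done" only if there is at least one PDF and all of them have a .md twin
    all_done = bool(pdf_stems) and pdf_stems <= md_stems
    return len(pdf_stems), len(md_stems), all_done


if __name__ == "__main__":
    pdfs, mds, done = conversion_status(Path("artikel"), Path("artikel/converted"))
    print(f"PDFs in artikel/: {pdfs}")
    print(f"Markdown files in artikel/converted/: {mds}")
    if done:
        print("✓ All PDFs converted")
```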
Preview Without Writing
```bash
mise run dry-run
```
Shows what PDFs would be converted without actually writing files.
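A dry run only needs the list of planned input/output pairs. The sketch below assumes each `name.pdf` becomes `name.md` in the output folder; `plan_conversions` is a hypothetical helper, not part of `pdf_to_markdown.py`.

```python
from pathlib import Path


def plan_conversions(input_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Hypothetical helper: pair each PDF with its assumed Markdown target.
    Only lists the input folder; nothing is created or written."""
    return [
        (pdf, output_dir / f"{pdf.stem}.md")  # assumed naming rule: same stem, .md suffix
        for pdf in sorted(input_dir.glob("*.pdf"))
    ]


if __name__ == "__main__":
    for pdf, md in plan_conversions(Path("artikel"), Path("artikel/converted")):
        print(f"[dry-run] {pdf} -> {md}")
```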
Available Tasks
| Task | Description |
|---|---|
| `install` | Install Python 3.11 and project dependencies |
| `convert` | Convert all PDFs to Markdown (main task) |
| `convert-verbose` | Convert with detailed logging output |
| `convert-quiet` | Convert silently (errors only) |
| `dry-run` | Preview conversion without writing files |
| `convert-custom` | Convert from custom input/output folders |
| `status` | Show conversion status and progress |
| `clean` | Remove converted Markdown files |
| `clean-all` | Remove all build artifacts and cache |
| `help` | List all available tasks |
Usage Examples
Basic Conversion
```bash
# Convert all PDFs using defaults
mise run convert

# Convert with verbose logging
mise run convert-verbose

# Convert silently
mise run convert-quiet
```
Custom Paths
```bash
# Convert from custom input directory
INPUT_DIR=/path/to/pdfs mise run convert-custom

# Specify both input and output directories
INPUT_DIR=/path/to/pdfs OUTPUT_DIR=/path/to/output mise run convert-custom
```
Cleanup
```bash
# Remove only converted markdown files
mise run clean

# Remove all artifacts (markdown files, cache, __pycache__)
mise run clean-all
```
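The two cleanup tasks could be approximated as follows. This sketch assumes `clean` deletes only `*.md` files in the output folder and `clean-all` additionally removes `__pycache__` directories; the helper names are hypothetical, and the real tasks may remove more (for example other caches).

```python
import shutil
from pathlib import Path


def clean_converted(output_dir: Path) -> int:
    """Hypothetical `clean`: delete generated Markdown files, return the count."""
    removed = 0
    # Collect matches first so deleting files does not interfere with the glob
    for md_file in list(output_dir.glob("*.md")):
        md_file.unlink()
        removed += 1
    return removed


def clean_all(project_dir: Path, output_dir: Path) -> int:
    """Hypothetical `clean-all`: Markdown files plus every __pycache__ directory."""
    removed = clean_converted(output_dir)
    for cache_dir in list(project_dir.rglob("__pycache__")):
        shutil.rmtree(cache_dir, ignore_errors=True)
        removed += 1
    return removed
```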
Configuration Files
mise.toml
Main configuration file with all tasks, environment variables, and tool versions.
Key sections:
- `[env]` - Environment variables (e.g., `PYTHONUNBUFFERED`)
- `[tasks.*]` - Task definitions with descriptions and commands
- `[tools.python]` - Python version specification (3.11)
- `[tools.pipenv]` - Package manager version
.mise.local.toml
Local overrides for environment-specific configuration. The file is git-ignored, so it is the place for personal settings.
Example customizations:
```toml
[env]
# Override input/output directories
INPUT_DIR = "./my_pdfs"
OUTPUT_DIR = "./my_output"
# Custom Python path
PYTHON_PATH = "/usr/local/bin/python3"
```
.gitignore
Excludes mise cache and local configuration from version control.
How It Works
Automatic Tool Installation
When you run a task, mise automatically:
- Detects the required tools (Python 3.11) from `mise.toml`
- Downloads and installs any that are missing
- Puts the pinned versions first on `PATH` for the task
- Executes the task with those tool versions
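The "detects required tools" step boils down to checking whether an installed version satisfies the pin. The sketch below assumes prefix matching, where a pin of `3.11` accepts any `3.11.x` release; that is an assumption about how short pins are read, and `satisfies_pin` is a hypothetical helper, not mise's resolver.

```python
import sys


def satisfies_pin(pin: str, version: tuple[int, ...]) -> bool:
    """True if `version` starts with the components of `pin` ("3.11" matches 3.11.x)."""
    wanted = tuple(int(part) for part in pin.split("."))
    return tuple(version[: len(wanted)]) == wanted


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "ok" if satisfies_pin("3.11", current) else "mismatch"
    print(f"Python {'.'.join(map(str, current))}: {status} for pin 3.11")
```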
Task Execution
- Setup phase - Install dependencies via `pip install -r requirements.txt`
- Execution phase - Run the Python script with the task's arguments
- Reporting phase - Print the results and a conversion summary
Environment Variables
- `PYTHONUNBUFFERED=1` - Real-time output (no buffering)
- `INPUT_DIR` - Custom input folder (default: `./artikel`)
- `OUTPUT_DIR` - Custom output folder (default: `./artikel/converted`)
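A script can read `INPUT_DIR` and `OUTPUT_DIR` with fallbacks to the documented defaults. This is a minimal sketch; `resolve_dirs` is a hypothetical name, not taken from `pdf_to_markdown.py`, and treating an empty value as unset is a design choice of the sketch.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

DEFAULT_INPUT_DIR = "./artikel"
DEFAULT_OUTPUT_DIR = "./artikel/converted"


def resolve_dirs(env: Optional[Mapping[str, str]] = None) -> tuple[Path, Path]:
    """Hypothetical helper: read INPUT_DIR/OUTPUT_DIR, else use the defaults."""
    env = os.environ if env is None else env
    # `or` also treats an empty string (e.g. INPUT_DIR=) as "not set"
    input_dir = Path(env.get("INPUT_DIR") or DEFAULT_INPUT_DIR)
    output_dir = Path(env.get("OUTPUT_DIR") or DEFAULT_OUTPUT_DIR)
    return input_dir, output_dir


if __name__ == "__main__":
    print(resolve_dirs())
```

Passing an explicit mapping (instead of relying on `os.environ`) keeps the function easy to test.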
Advantages Over Traditional Approach
Before (Manual Setup)
```bash
# Install Python globally
# Install pip
# Install dependencies
# Hope everything works
python3 pdf_to_markdown.py
```
After (Mise)
```bash
# One command - everything handled
mise run convert
```
Benefits:
- ✅ Reproducible - Same environment every time
- ✅ Isolated - Tools don't affect system Python
- ✅ Fast - Caches installed tools
- ✅ Easy - Single command to run tasks
- ✅ Portable - Works on any system with mise
- ✅ Documented - Task descriptions built-in
- ✅ Flexible - Environment variables for customization
Troubleshooting
Issue: "mise: command not found"
Solution: Install mise first
```bash
curl https://mise.jdx.dev/install.sh | sh
```
Issue: "Config files are not trusted"
Solution: Trust the configuration
```bash
mise trust
```
Issue: Python dependencies not installing
Solution: Run the install task manually
```bash
mise run install
```
Issue: "No PDF files found"
Solution: Check the input directory path
```bash
# Verify PDFs exist
ls -la artikel/*.pdf

# If they are in a different location, use a custom path
INPUT_DIR=/path/to/pdfs mise run convert-custom
```
Issue: Slow first run
Solution: First run downloads and installs tools (one-time). Subsequent runs are fast.
Advanced Usage
Running Tasks from Shell Scripts
```bash
#!/bin/bash
# Run the conversion once and branch on its exit code
if mise run convert; then
  echo "Conversion successful"
  mise run status
else
  echo "Conversion failed" >&2
  exit 1
fi
```
Integrating with CI/CD
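```yaml
# GitHub Actions example (job steps)
- uses: actions/checkout@v4
- name: Convert PDFs
  run: |
    curl https://mise.jdx.dev/install.sh | sh
    # The installer puts mise in ~/.local/bin, which is not necessarily on PATH yet
    export PATH="$HOME/.local/bin:$PATH"
    mise trust
    mise run convert
```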
Custom Task Definition
To add a new task, edit mise.toml:
```toml
[tasks.my-custom-task]
description = "My custom task description"
run = "echo 'Running custom task'"
depends = ["install"]  # Depends on install task
```
Then run:
```bash
mise run my-custom-task
```
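The `depends` key means `install` finishes before `my-custom-task` starts. That ordering rule can be sketched as a depth-first topological sort over the task table; this illustrates the constraint only (the helper `run_order` is hypothetical), since mise itself may run independent dependencies in parallel.

```python
def run_order(tasks: dict[str, dict], target: str) -> list[str]:
    """Return task names so that every dependency comes before the task needing it."""
    order: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in tasks[name].get("depends", []):
            visit(dep)
        visiting.remove(name)
        order.append(name)

    visit(target)
    return order


# Dict mirroring the [tasks.*] tables from the TOML example above.
tasks = {
    "install": {"run": "pip install -r requirements.txt"},
    "my-custom-task": {
        "description": "My custom task description",
        "run": "echo 'Running custom task'",
        "depends": ["install"],
    },
}
print(run_order(tasks, "my-custom-task"))  # ['install', 'my-custom-task']
```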
Documentation
- Project Guide - See `PDF_CONVERTER_GUIDE.md`
- Mise Docs - https://mise.jdx.dev/
- Python Script - See `pdf_to_markdown.py`
Support
For issues or questions:
- Mise documentation: https://mise.jdx.dev/
- Project issues: https://github.com/anomalyco/opencode
Version: 1.0
Last Updated: 2024-02-23