MM4go 520ebf0950 Clean up nested artikel folder structure
- Remove duplicated artikel/artikel/ nested directory
- Move PDFs and resources to artikel/ root level
- Maintain clean directory structure for PDF conversion
2026-02-23 16:08:54 +01:00
2026-02-23 14:04:24 +01:00

PDF to Markdown Converter - Complete Setup

A production-ready Python script with mise task runner for converting PDF files to Markdown format.

🚀 Quick Start

One-Command Setup

# Install mise (if not already installed)
curl https://mise.jdx.dev/install.sh | sh

# Navigate to project
cd maturaarbeit

# Convert all PDFs to Markdown
mise run convert

That's it!

📦 What's Included

Core Files

File Purpose
pdf_to_markdown.py Main conversion script (373 lines)
requirements.txt Python dependencies (pypdf, python-dateutil)
mise.toml Task runner configuration with 10+ tasks
.mise.local.toml Local environment overrides (git-ignored)
.gitignore Git exclusions for cache and build artifacts

Documentation

File Purpose
README.md This file - overview and quick start
PDF_CONVERTER_GUIDE.md Complete usage guide for the Python script
MISE_GUIDE.md Detailed mise task runner documentation

Converted Files

  • artikel/converted/ - 18 Markdown files (one per PDF)
  • All PDFs successfully converted ✓

🎯 Key Features

PDF Conversion

Extract text from all pages
Preserve page structure with page headers
Extract metadata (title, author, creation date)
Generate YAML front matter
Handle errors gracefully
Progress reporting and summary

Mise Task Runner

Automatic Python installation (3.11)
Automatic dependency installation
Reproducible builds
Isolated environment
10+ convenient tasks
Custom path support

📋 Available Tasks

Run with: mise run <task-name>

Main Tasks

mise run convert           # Convert all PDFs (main task)
mise run convert-verbose   # Convert with detailed logging
mise run convert-quiet     # Convert silently
mise run dry-run          # Preview without writing

Utilities

mise run status           # Show conversion progress
mise run install          # Install dependencies
mise run clean            # Remove converted markdown
mise run clean-all        # Remove all artifacts

Custom Conversion

INPUT_DIR=/path/to/pdfs mise run convert-custom
INPUT_DIR=/path OUTPUT_DIR=/out mise run convert-custom

📖 Documentation Guide

For Quick Start

👉 Read this file (README.md)

For Python Script Details

👉 See PDF_CONVERTER_GUIDE.md for:

  • Installation instructions
  • Usage examples
  • Troubleshooting
  • How the script works
  • Customization options

For Mise Task Runner

👉 See MISE_GUIDE.md for:

  • Mise installation and setup
  • Task configuration
  • Advanced usage
  • CI/CD integration
  • Custom task creation

🔧 Usage Examples

Convert All PDFs (Default)

mise run convert

Output: 18 Markdown files in artikel/converted/

Convert with Verbose Logging

mise run convert-verbose

Shows detailed progress for each PDF.

Preview Conversion

mise run dry-run

Shows what would be converted without writing files.

Check Status

mise run status

Output:

=== PDF Conversion Status ===
PDF files in artikel/: 18
Markdown files in artikel/converted/: 18
✓ All PDFs converted!

📁 Output Format

Each converted PDF becomes a Markdown file with:

---
title: Document Title
author: Author Name
created: 2024-02-23
converted: 2024-02-23 14:57:05
source: original.pdf
---

# Document Title

## Page 1
[Extracted text...]

## Page 2
[Extracted text...]

🛠️ Technical Stack

  • Language: Python 3.11
  • PDF Library: pypdf 6.7.2
  • Date Parsing: python-dateutil 2.9.0
  • Task Runner: mise 2026.2.19
  • Total Script Size: 12 KB
  • Converted Files: 3.5 MB (18 PDFs → Markdown)

Conversion Results

Status: ✓ All 18 PDFs successfully converted

Metric Value
Total PDFs 18
Converted 18
Failed 0
Conversion Time ~28 seconds
Output Size 3.5 MB

Converted Documents

  • bewegendeGefühle.md
  • ChoreografiealsKulturteknik.md
  • Choreografie Handwerk und Vision.md
  • Handout-Choreografieren.md
  • Klänge in Bewegung.md
  • PersoenlichkeitsentwicklungdurchTanzUniBE.md
  • PsychologyofSport&Exercise.md
  • SinnundSinneimTanz.md
  • Sportschule.pdf
  • Sportunterricht.md
  • TanzPsychotherapeutischeHilfe.md
  • TanzpraxisinderForschung.md
  • WirkfaktorenvonTanz.md
  • Zwischen Rhythmus und Leistung.md
  • bewegendeGefühle.md
  • choreo.md
  • choreografiekonzepte_kurz.md
  • studienpsychischergesundheittanztherapie.md

🔄 Workflows

Standard Workflow

# Check status before
mise run status

# Convert PDFs
mise run convert

# Verify conversion
mise run status

# Clean if needed
mise run clean-all

Development Workflow

# Preview what would happen
mise run dry-run

# Run with verbose logging
mise run convert-verbose

# Review results
ls -lh artikel/converted/

# Check specific file
cat artikel/converted/choreo.md | head -20

CI/CD Integration

# In GitHub Actions, GitLab CI, etc.
curl https://mise.jdx.dev/install.sh | sh
mise run convert
mise run status

🚨 Troubleshooting

Common Issues

Issue: "mise: command not found"
Solution: Install mise: curl https://mise.jdx.dev/install.sh | sh

Issue: "Config files are not trusted"
Solution: Run mise trust

Issue: "No PDF files found"
Solution: Check input folder: ls artikel/*.pdf

Issue: Python dependencies not installing
Solution: Run mise run install manually

For detailed troubleshooting, see PDF_CONVERTER_GUIDE.md or MISE_GUIDE.md.

📚 Additional Resources

📝 Project Structure

maturaarbeit/
├── pdf_to_markdown.py          # Main script
├── requirements.txt             # Dependencies
├── mise.toml                    # Task configuration
├── .mise.local.toml             # Local overrides (git-ignored)
├── .gitignore                   # Git exclusions
│
├── README.md                    # This file
├── PDF_CONVERTER_GUIDE.md       # Python script guide
├── MISE_GUIDE.md                # Task runner guide
│
├── artikel/                     # Input PDFs
│   ├── *.pdf                    # 18 PDF files
│   └── converted/               # Output Markdown
│       └── *.md                 # 18 Markdown files
│
└── .git/                        # Version control

🎓 Learning Path

For Users:

  1. Read this README
  2. Run mise run convert
  3. View results in artikel/converted/
  4. Read PDF_CONVERTER_GUIDE.md for details

For Developers:

  1. Read MISE_GUIDE.md for task runner
  2. Examine mise.toml for configuration
  3. Review pdf_to_markdown.py for implementation
  4. Customize as needed

🔐 Security

  • No external API calls
  • All processing local
  • No data transmission
  • Git-ignored local config
  • Standard Python libraries

📄 License

This project is provided as-is for your use.

👥 Support


Project Version: 1.0
Last Updated: February 23, 2026
Status: Complete and Tested

Description
No description provided
Readme 60 MiB
Languages
Python 100%