a

Docs: Add filter examples to QUICKSTART.md
- Added --filter option examples for repository imports
- Included practical filter patterns (posts/*/index.md, etc.)
- Removed experimental warning as feature is now tested
- Added real-world example with FOERBICO repository
2025-11-05 06:25:26 +01:00 · 2025-11-05 06:12:34 +01:00 · 2025-11-05 06:11:15 +01:00 · 2025-11-05 06:02:31 +01:00 · 2025-11-05 05:45:24 +01:00 · 2025-11-05 05:40:46 +01:00
6 changed files with 274 additions and 22 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,36 @@
 # Changelog

+## Version 0.3.0 (2025-10-01)
+
+### Feature - Autor-zu-Tag-Mapping 🏷️
+
+**Neues Feature:** Autoren werden automatisch als Tags hinzugefügt!
+
+**Funktion:**
+- Autor aus Frontmatter wird extrahiert (`author`, `#staticSiteGenerator.author`, `#commonMetadata.creator`)
+- Autor-Name wird automatisch als Tag im Format `Vorname_Nachname` hinzugefügt
+- Leerzeichen werden durch Unterstriche ersetzt
+- Mehrfache Unterstriche werden konsolidiert
+
+**Beispiel:**
+```yaml
+author: Jörg Lohrer
+tags:
+  - OER
+  - Community
+```
+
+**Ergebnis:**
+- Tags in WordPress: `OER`, `Community`, `Jörg_Lohrer`
+- Filterung nach Autoren über WordPress-Tag-Taxonomie möglich
+
+**Technische Details:**
+- Neue Funktion `format_author_as_tag()` in `markdown_parser.py`
+- Autor-Tag wird nur hinzugefügt, wenn Autor vorhanden und noch nicht in Tags
+- Integration in `extract_wordpress_metadata()` nach Autor-Extraktion
+
+---
+
 ## Version 0.2.2 (2025-10-01)

 ### Bugfix - Kritisch! 🐛
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -43,14 +43,26 @@ source .venv/bin/activate
 python workflow.py posts.yaml
 ```

-### 3. Ganzes Repository (Forgejo/Gitea) ⚠️ Experimentell
+### 3. Ganzes Repository (Forgejo/Gitea)

+**Alle Markdown-Dateien importieren:**
 ```bash
 source .venv/bin/activate
 python workflow.py --repo "https://codeberg.org/user/repo" main
 ```

-**Hinweis:** Diese Funktion wurde noch nicht ausgiebig getestet. Nutzen Sie zunächst Modus 1 oder 2.
+**Nur bestimmte Dateien importieren (mit Filter):**
+```bash
+source .venv/bin/activate
+python workflow.py --repo "https://git.rpi-virtuell.de/Comenius-Institut/FOERBICO_und_rpi-virtuell" main --filter 'Website/content/posts/*/index.md'
+```
+
+**Filter-Beispiele:**
+- `'posts/*/index.md'` - Nur index.md in posts/-Unterverzeichnissen
+- `'Website/content/posts/*/index.md'` - Mit vollständigem Pfad
+- `'content/*.md'` - Alle .md-Dateien direkt in content/
+
+**Hinweis:** Der Filter unterstützt Wildcards (`*`) für Verzeichnisnamen.

 ## Schnellstart-Schritte
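The filter patterns documented above can be tried out in isolation. The following is a minimal sketch, assuming that `*` stands for one or more characters without a `/`, as the workflow.py hunk in this diff suggests. The helper name `wildcard_to_regex` and the sample paths are hypothetical, and `re.escape` is used here to escape all regex metacharacters, which is stricter than the dot-only escaping in the diff.

```python
import re


def wildcard_to_regex(path_filter: str) -> "re.Pattern":
    """Translate a filter such as 'posts/*/index.md' into an anchored regex.

    Hypothetical helper: each '*' matches one or more characters except '/'.
    """
    # Escape every literal piece, then rejoin the pieces with the wildcard class.
    escaped_parts = [re.escape(part) for part in path_filter.split('*')]
    return re.compile('^' + '[^/]+'.join(escaped_parts) + '$')


# Made-up repository paths for illustration only.
paths = [
    'posts/2024-01-hello/index.md',
    'posts/2024-01-hello/notes.md',
    'posts/a/b/index.md',
    'content/about.md',
    'content/sub/page.md',
]

posts_filter = wildcard_to_regex('posts/*/index.md')
content_filter = wildcard_to_regex('content/*.md')

posts_hits = [p for p in paths if posts_filter.match(p)]
content_hits = [p for p in paths if content_filter.match(p)]

print(posts_hits)    # only index.md exactly one directory below posts/
print(content_hits)  # only .md files directly inside content/
```

Because the wildcard never crosses a `/`, `posts/a/b/index.md` and `content/sub/page.md` are not selected under these assumptions.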

--- a/README.md
+++ b/README.md
@@ -7,9 +7,10 @@ Automatisierter Workflow zum Erstellen von WordPress-Beiträgen aus Markdown-Dat
 ## ⚠️ Bekannte Einschränkungen

 **Autor-Zuordnung:**
-- Beiträge werden aktuell immer dem importierenden WordPress-Benutzer zugeordnet
-- Der `author` aus dem Frontmatter wird extrahiert, aber nicht mit WordPress-Benutzern abgeglichen
-- **Zu entwickeln:** Automatisches Mapping von Frontmatter-Autoren zu WordPress-User-IDs oder manuelle Zuordnungs-Konfiguration
+- Beiträge werden immer dem importierenden WordPress-Benutzer zugeordnet (WordPress REST-API Limitation)
+- Der `author` aus dem Frontmatter wird automatisch als **Tag** im Format `Vorname_Nachname` hinzugefügt
+- Beispiel: `author: Jörg Lohrer` → Tag: `Jörg_Lohrer`
+- Dies ermöglicht die Filterung nach Autoren über die WordPress-Tag-Taxonomie

 **Forgejo-Repository-Import:**
 - Die Batch-Verarbeitung ganzer Repositories wurde noch nicht ausreichend getestet
@@ -21,6 +22,7 @@ Automatisierter Workflow zum Erstellen von WordPress-Beiträgen aus Markdown-Dat
 ## Features

 - ✅ **Automatische Metadaten-Extraktion**: name, description, tags, image, author aus YAML-Frontmatter
+- ✅ **Autor-zu-Tag-Mapping**: Autoren werden automatisch als Tags im Format `Vorname_Nachname` hinzugefügt
 - ✅ **Drei Verwendungsmodi**: Einzelne URL, YAML-Batch, Forgejo-Repository
 - ✅ **Duplikatsprüfung**: Verhindert das doppelte Erstellen von Beiträgen und Medien
 - ✅ **Markdown zu HTML**: Automatische Konvertierung von Markdown-Inhalten
@@ -62,7 +64,7 @@ Automatisierter Workflow zum Erstellen von WordPress-Beiträgen aus Markdown-Dat
   ```env
   WORDPRESS_URL=https://news.rpi-virtuell.de
   WORDPRESS_USERNAME=ihr_benutzername
-   WORDPRESS_APP_PASSWORD=UIVI 4Tdy oojL 9iZG g3X2 iAn5
+   WORDPRESS_APP_PASSWORD=UIVI 23H6 oojL 9iZG g3X2 Aon5
   ```

 ## WordPress Anwendungspasswort erstellen
--- a/markdown_parser.py
+++ b/markdown_parser.py
@@ -39,6 +39,29 @@ def extract_frontmatter(markdown_content: str) -> tuple[Optional[Dict[str, Any]]
        return None, markdown_content


+def format_author_as_tag(author_name: str) -> str:
+    """
+    Formatiert einen Autornamen als Tag im Format Vorname_Nachname
+    
+    Args:
+        author_name: Autorenname (z.B. "Max Mustermann" oder "Max")
+        
+    Returns:
+        Tag-formatierter Name (z.B. "Max_Mustermann")
+    """
+    # Entferne führende/nachfolgende Leerzeichen
+    author_name = author_name.strip()
+    
+    # Ersetze Leerzeichen durch Unterstriche
+    tag_name = author_name.replace(' ', '_')
+    
+    # Entferne mehrfache Unterstriche
+    while '__' in tag_name:
+        tag_name = tag_name.replace('__', '_')
+    
+    return tag_name
+
+
 def extract_wordpress_metadata(frontmatter: Dict[str, Any], 
                               default_author: str = "admin") -> Dict[str, Any]:
    """
@@ -94,6 +117,10 @@ def extract_wordpress_metadata(frontmatter: Dict[str, Any],
        elif isinstance(tags, str):
            metadata['tags'] = [t.strip() for t in tags.split(',')]
    
+    # Initialisiere tags falls nicht vorhanden
+    if 'tags' not in metadata:
+        metadata['tags'] = []
+    
    # Kategorien extrahieren (falls vorhanden)
    if 'categories' in frontmatter:
        categories = frontmatter['categories']
@@ -134,6 +161,59 @@ def extract_wordpress_metadata(frontmatter: Dict[str, Any],
    if 'author' not in metadata:
        metadata['author'] = default_author
    
+    # Alle Autoren als Tags hinzufügen (Format: Vorname_Nachname)
+    # Sammle alle Autoren aus verschiedenen Quellen
+    all_authors = []
+    
+    # Aus direktem author-Feld
+    if 'author' in frontmatter:
+        author = frontmatter['author']
+        if isinstance(author, list):
+            all_authors.extend(author)
+        elif isinstance(author, str):
+            all_authors.append(author)
+    
+    # Aus #staticSiteGenerator
+    if isinstance(frontmatter.get('#staticSiteGenerator'), dict):
+        static_gen = frontmatter['#staticSiteGenerator']
+        if 'author' in static_gen:
+            author = static_gen['author']
+            if isinstance(author, list):
+                all_authors.extend(author)
+            elif isinstance(author, str):
+                all_authors.append(author)
+    
+    # Aus #commonMetadata.creator
+    if isinstance(frontmatter.get('#commonMetadata'), dict):
+        common = frontmatter['#commonMetadata']
+        if 'creator' in common:
+            creator = common['creator']
+            if isinstance(creator, list):
+                for c in creator:
+                    if isinstance(c, dict):
+                        given = c.get('givenName', '')
+                        family = c.get('familyName', '')
+                        full_name = f"{given} {family}".strip()
+                        if full_name:
+                            all_authors.append(full_name)
+            elif isinstance(creator, dict):
+                given = creator.get('givenName', '')
+                family = creator.get('familyName', '')
+                full_name = f"{given} {family}".strip()
+                if full_name:
+                    all_authors.append(full_name)
+    
+    # Duplikate entfernen und als Tags hinzufügen
+    seen_authors = set()
+    for author_name in all_authors:
+        if author_name and author_name not in seen_authors:
+            seen_authors.add(author_name)
+            author_tag = format_author_as_tag(author_name)
+            if author_tag and author_tag not in metadata.get('tags', []):
+                if 'tags' not in metadata:
+                    metadata['tags'] = []
+                metadata['tags'].append(author_tag)
+    
    # Status extrahieren (falls vorhanden)
    if 'status' in frontmatter:
        metadata['status'] = frontmatter['status']
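The author-to-tag logic added in this file can be condensed for experimentation. This is a minimal sketch, not the code from the diff: `collect_author_tags` is a hypothetical helper that treats all three sources uniformly (slightly more permissive than the hunk above), and `format_author_as_tag` is rewritten with a single regex that is intended to behave like the replace-and-collapse loop.

```python
import re
from typing import Any, Dict, List


def format_author_as_tag(author_name: str) -> str:
    # Collapse any run of spaces and underscores into one underscore.
    return re.sub(r'[ _]+', '_', author_name.strip())


def collect_author_tags(frontmatter: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: gather author names and return de-duplicated tags."""
    names: List[str] = []

    def add(value: Any) -> None:
        if isinstance(value, str):
            names.append(value)
        elif isinstance(value, dict):
            # Creator objects carry givenName / familyName.
            given = value.get('givenName', '')
            family = value.get('familyName', '')
            full_name = f"{given} {family}".strip()
            if full_name:
                names.append(full_name)
        elif isinstance(value, list):
            for item in value:
                add(item)

    add(frontmatter.get('author'))
    static_gen = frontmatter.get('#staticSiteGenerator')
    if isinstance(static_gen, dict):
        add(static_gen.get('author'))
    common = frontmatter.get('#commonMetadata')
    if isinstance(common, dict):
        add(common.get('creator'))

    tags: List[str] = []
    for name in names:
        tag = format_author_as_tag(name)
        if tag and tag not in tags:
            tags.append(tag)
    return tags


frontmatter = {
    'author': ['Florian Mayrhofer', 'Gina Buchwald-Chassée'],
    '#commonMetadata': {
        'creator': [{'givenName': 'Florian', 'familyName': 'Mayrhofer'}],
    },
}
author_tags = collect_author_tags(frontmatter)
print(author_tags)
```

The same author appearing in `author` and in `#commonMetadata.creator` yields a single tag, which matches the de-duplication described in the commit message.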
--- a/wordpress_api.py
+++ b/wordpress_api.py
@@ -46,6 +46,13 @@ class WordPressAPI:
        response.raise_for_status()
        return response
    
+    def _put(self, endpoint: str, data: Optional[Dict] = None) -> requests.Response:
+        """PUT-Request an WordPress API (für Updates)"""
+        url = urljoin(self.api_base, endpoint)
+        response = self.session.put(url, json=data)
+        response.raise_for_status()
+        return response
+    
    def check_post_exists(self, title: str) -> Optional[int]:
        """
        Prüft, ob ein Beitrag mit dem Titel bereits existiert
@@ -184,6 +191,7 @@ class WordPressAPI:
                   featured_media: Optional[int] = None,
                   categories: Optional[List[int]] = None,
                   tags: Optional[List[int]] = None,
+                   excerpt: Optional[str] = None,
                   check_duplicate: bool = True,
                   **kwargs) -> Optional[int]:
        """
@@ -202,12 +210,22 @@ class WordPressAPI:
        Returns:
            Post-ID des erstellten Beitrags, oder None bei Fehler
        """
-        # Duplikatsprüfung
+        # Duplikatsprüfung - bei Duplikat Update durchführen
        if check_duplicate:
            existing_id = self.check_post_exists(title)
            if existing_id:
-                print(f"Beitrag '{title}' existiert bereits (ID: {existing_id})")
-                return existing_id
+                print(f"Beitrag '{title}' existiert bereits (ID: {existing_id}) - Aktualisiere...")
+                return self.update_post(
+                    post_id=existing_id,
+                    title=title,
+                    content=content,
+                    status=status,
+                    featured_media=featured_media,
+                    categories=categories,
+                    tags=tags,
+                    excerpt=excerpt,
+                    **kwargs
+                )
        
        # Post-Daten zusammenstellen
        post_data = {
@@ -217,6 +235,8 @@ class WordPressAPI:
            **kwargs
        }
        
+        if excerpt:
+            post_data['excerpt'] = excerpt
        if featured_media:
            post_data['featured_media'] = featured_media
        if categories:
@@ -257,6 +277,83 @@ class WordPressAPI:
                print(f"Details: {e.response.text}")
            return None
    
+    def update_post(self, post_id: int, title: Optional[str] = None, 
+                   content: Optional[str] = None, status: Optional[str] = None,
+                   featured_media: Optional[int] = None, 
+                   categories: Optional[List[int]] = None,
+                   tags: Optional[List[int]] = None,
+                   excerpt: Optional[str] = None,
+                   **kwargs) -> Optional[int]:
+        """
+        Aktualisiert einen existierenden WordPress-Beitrag
+        
+        Args:
+            post_id: ID des zu aktualisierenden Beitrags
+            title: Neuer Titel (optional)
+            content: Neuer Inhalt (optional)
+            status: Neuer Status (optional)
+            featured_media: ID des Beitragsbilds (optional)
+            categories: Liste der Kategorie-IDs (optional)
+            tags: Liste der Tag-IDs (optional)
+            excerpt: Auszug (optional)
+            **kwargs: Weitere WordPress-Post-Felder
+            
+        Returns:
+            Post-ID des aktualisierten Beitrags, oder None bei Fehler
+        """
+        # Post-Daten zusammenstellen (nur Felder die gesetzt sind)
+        post_data = {**kwargs}
+        
+        if title is not None:
+            post_data['title'] = title
+        if content is not None:
+            post_data['content'] = content
+        if status is not None:
+            post_data['status'] = status
+        if excerpt is not None:
+            post_data['excerpt'] = excerpt
+        if featured_media is not None:
+            post_data['featured_media'] = featured_media
+        if categories is not None:
+            post_data['categories'] = categories
+        if tags is not None:
+            post_data['tags'] = tags
+        
+        # Debug: Zeige was aktualisiert wird
+        print(f"Aktualisiere Beitrag (ID: {post_id}):")
+        if title:
+            print(f"  - Titel: {title}")
+        if status:
+            print(f"  - Status: {status}")
+        if tags:
+            print(f"  - Tags: {tags}")
+        if categories:
+            print(f"  - Kategorien: {categories}")
+        if 'date' in post_data:
+            print(f"  - Datum: {post_data['date']}")
+        if 'date_gmt' in post_data:
+            print(f"  - Datum GMT: {post_data['date_gmt']}")
+        
+        # Beitrag aktualisieren
+        try:
+            response = self._put(f'posts/{post_id}', data=post_data)
+            post = response.json()
+            print(f"✅ Beitrag aktualisiert (ID: {post_id}, Status: {post.get('status')})")
+            
+            # Debug: Zeige was WordPress zurückgibt
+            if 'tags' in post and post['tags']:
+                print(f"   WordPress-Tags: {post['tags']}")
+            if 'date' in post:
+                print(f"   WordPress-Datum: {post['date']}")
+            
+            return post_id
+            
+        except requests.exceptions.RequestException as e:
+            print(f"Fehler beim Aktualisieren des Beitrags: {e}")
+            if hasattr(e.response, 'text'):
+                print(f"Details: {e.response.text}")
+            return None
+    
    def get_categories(self, search: Optional[str] = None) -> List[Dict[str, Any]]:
        """
        Holt alle verfügbaren Kategorien oder sucht nach einer bestimmten Kategorie
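The create-or-update behaviour introduced in this file can be illustrated without a WordPress instance. The sketch below uses `InMemoryPosts`, a hypothetical stand-in that stores posts in a dictionary instead of issuing HTTP requests; only the method names `check_post_exists`, `create_post`, and `update_post` are taken from the diff, and everything else is an assumption made for illustration.

```python
from typing import Dict, Optional


class InMemoryPosts:
    """Hypothetical stand-in for the WordPress client; no HTTP involved."""

    def __init__(self) -> None:
        self.posts: Dict[int, Dict[str, object]] = {}
        self._next_id = 1

    def check_post_exists(self, title: str) -> Optional[int]:
        # Return the ID of the first post with a matching title, if any.
        for post_id, post in self.posts.items():
            if post.get('title') == title:
                return post_id
        return None

    def update_post(self, post_id: int, **fields: object) -> int:
        # Only overwrite fields that were actually supplied (not None).
        supplied = {key: value for key, value in fields.items() if value is not None}
        self.posts[post_id].update(supplied)
        return post_id

    def create_post(self, title: str, content: str,
                    check_duplicate: bool = True, **fields: object) -> int:
        if check_duplicate:
            existing_id = self.check_post_exists(title)
            if existing_id is not None:
                # Duplicate title: update in place instead of skipping.
                return self.update_post(existing_id, title=title,
                                        content=content, **fields)
        post_id = self._next_id
        self._next_id += 1
        self.posts[post_id] = {'title': title, 'content': content, **fields}
        return post_id


client = InMemoryPosts()
first_id = client.create_post('Hello', 'v1', tags=[3])
second_id = client.create_post('Hello', 'v2', tags=None)
print(first_id, second_id, client.posts[first_id])
```

Re-running the import with the same title returns the existing ID and refreshes the content, while fields passed as `None` leave the stored values untouched, mirroring the `is not None` checks in `update_post()`.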
--- a/workflow.py
+++ b/workflow.py
@@ -231,7 +231,7 @@ def process_post(wp_api: WordPressAPI, post_config: Dict[str, Any],
        )
    
    # Status
-    status = metadata.get('status') or post_config.get('status') or global_settings.get('default_status', 'draft')
+    status = metadata.get('status') or post_config.get('status') or global_settings.get('default_status', 'publish')
    
    # Excerpt
    excerpt = metadata.get('excerpt') or post_config.get('excerpt', '')
@@ -279,13 +279,15 @@ def process_post(wp_api: WordPressAPI, post_config: Dict[str, Any],
    return post_id


-def fetch_forgejo_repo_markdown_files(repo_url: str, branch: str = 'main') -> List[str]:
+def fetch_forgejo_repo_markdown_files(repo_url: str, branch: str = 'main', 
+                                      path_filter: str = None) -> List[str]:
    """
    Holt alle Markdown-URLs aus einem Forgejo-Repository
    
    Args:
        repo_url: URL zum Repository (z.B. https://codeberg.org/user/repo)
        branch: Branch-Name (Standard: main)
+        path_filter: Optionaler Filter für Pfade (z.B. 'posts/*/index.md')
        
    Returns:
        Liste von URLs zu Markdown-Dateien
@@ -302,17 +304,23 @@ def fetch_forgejo_repo_markdown_files(repo_url: str, branch: str = 'main') -> Li
    owner = parts[-2]
    repo = parts[-1]
    
-    # API-URL ermitteln
+    # Basis-URL ermitteln (für Raw-URLs)
    if 'codeberg.org' in repo_url:
+        base_url = 'https://codeberg.org'
        api_base = 'https://codeberg.org/api/v1'
+    elif 'git.rpi-virtuell.de' in repo_url:
+        base_url = 'https://git.rpi-virtuell.de'
+        api_base = 'https://git.rpi-virtuell.de/api/v1'
    elif 'gitea' in repo_url or 'forgejo' in repo_url:
        # Generischer Ansatz für selbst-gehostete Instanzen
        base_parts = repo_url.split('/')[:3]
-        api_base = '/'.join(base_parts) + '/api/v1'
+        base_url = '/'.join(base_parts)
+        api_base = base_url + '/api/v1'
    else:
        print(f"Warnung: Unbekannte Forgejo-Instanz, versuche generischen API-Pfad")
        base_parts = repo_url.split('/')[:3]
-        api_base = '/'.join(base_parts) + '/api/v1'
+        base_url = '/'.join(base_parts)
+        api_base = base_url + '/api/v1'
    
    api_url = f"{api_base}/repos/{owner}/{repo}/git/trees/{branch}?recursive=true"
    
@@ -324,8 +332,17 @@ def fetch_forgejo_repo_markdown_files(repo_url: str, branch: str = 'main') -> Li
        markdown_files = []
        for item in data.get('tree', []):
            if item['type'] == 'blob' and item['path'].endswith('.md'):
+                # Filter anwenden falls vorhanden
+                if path_filter:
+                    # Konvertiere Wildcard-Filter zu regex-Pattern
+                    # z.B. 'posts/*/index.md' -> '^posts/[^/]+/index\.md$'
+                    import re
+                    pattern = path_filter.replace('*', '[^/]+').replace('.', r'\.')
+                    if not re.match(f'^{pattern}$', item['path']):
+                        continue
+                
                # Raw-URL konstruieren
-                raw_url = f"https://codeberg.org/{owner}/{repo}/raw/branch/{branch}/{item['path']}"
+                raw_url = f"{base_url}/{owner}/{repo}/raw/branch/{branch}/{item['path']}"
                markdown_files.append(raw_url)
        
        return markdown_files
@@ -361,7 +378,7 @@ def main():
            # Erstelle minimale Konfiguration
            post_config = {'url': arg}
            global_settings = {
-                'default_status': 'draft',
+                'default_status': 'publish',
                'default_author': 'admin',
                'skip_duplicates': True,
                'skip_duplicate_media': True
@@ -385,12 +402,21 @@ def main():
            repo_index = sys.argv.index('--forgejo-repo') if '--forgejo-repo' in sys.argv else sys.argv.index('--repo')
            if len(sys.argv) > repo_index + 1:
                repo_url = sys.argv[repo_index + 1]
-                branch = sys.argv[repo_index + 2] if len(sys.argv) > repo_index + 2 else 'main'
+                branch = sys.argv[repo_index + 2] if len(sys.argv) > repo_index + 2 and not sys.argv[repo_index + 2].startswith('--') else 'main'
+                
+                # Pfad-Filter für spezifische Dateien (z.B. nur index.md in posts/)
+                path_filter = None
+                if '--filter' in sys.argv:
+                    filter_index = sys.argv.index('--filter')
+                    if len(sys.argv) > filter_index + 1:
+                        path_filter = sys.argv[filter_index + 1]
                
                print(f"Forgejo-Modus: Verarbeite Repository: {repo_url}")
                print(f"Branch: {branch}")
+                if path_filter:
+                    print(f"Filter: {path_filter}")
                
-                markdown_urls = fetch_forgejo_repo_markdown_files(repo_url, branch)
+                markdown_urls = fetch_forgejo_repo_markdown_files(repo_url, branch, path_filter)
                
                if not markdown_urls:
                    print("Keine Markdown-Dateien im Repository gefunden")
@@ -410,7 +436,7 @@ def main():
                wp_api = WordPressAPI(wp_url, wp_username, wp_password)
                
                global_settings = {
-                    'default_status': 'draft',
+                    'default_status': 'publish',
                    'default_author': 'admin',
                    'skip_duplicates': True,
                    'skip_duplicate_media': True
@@ -450,9 +476,13 @@ def main():
    if not os.path.exists(config_file):
        print(f"Fehler: Konfigurationsdatei '{config_file}' nicht gefunden")
        print("\nVerwendung:")
-        print("  python workflow.py [config.yaml]              # YAML-Konfiguration")
-        print("  python workflow.py <URL>                      # Einzelne Markdown-URL")
-        print("  python workflow.py --repo <REPO_URL> [branch] # Forgejo-Repository")
+        print("  python workflow.py [config.yaml]                    # YAML-Konfiguration")
+        print("  python workflow.py <URL>                            # Einzelne Markdown-URL")
+        print("  python workflow.py --repo <REPO_URL> [branch]       # Forgejo-Repository (alle .md)")
+        print("  python workflow.py --repo <REPO_URL> [branch] --filter <pattern>  # Mit Filter")
+        print("\nFilter-Beispiele:")
+        print("  --filter 'posts/*/index.md'                         # Nur index.md in posts/-Unterverzeichnissen")
+        print("  --filter 'Website/content/posts/*/index.md'         # Mit vollständigem Pfad")
        sys.exit(1)
    
    print(f"Lade Konfiguration aus: {config_file}")
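The repository-mode command line shown in the usage text (`--repo <REPO_URL> [branch] --filter <pattern>`) could also be parsed with the standard-library `argparse` module instead of manual `sys.argv` indexing. This is a sketch of one possible alternative, not what the diff does; the function name `parse_repo_args` and the `dest` names are hypothetical.

```python
import argparse
from typing import List


def parse_repo_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical parser for: --repo <REPO_URL> [branch] [--filter <pattern>]."""
    parser = argparse.ArgumentParser(prog='workflow.py')
    # Both spellings seen in the diff map to the same destination.
    parser.add_argument('--repo', '--forgejo-repo', dest='repo', required=True,
                        help='URL of the Forgejo/Gitea repository')
    # Optional positional branch; falls back to 'main' when omitted.
    parser.add_argument('branch', nargs='?', default='main')
    parser.add_argument('--filter', dest='path_filter', default=None,
                        help="wildcard filter, e.g. 'posts/*/index.md'")
    return parser.parse_args(argv)


with_branch = parse_repo_args([
    '--repo', 'https://codeberg.org/user/repo', 'main',
    '--filter', 'posts/*/index.md',
])
without_branch = parse_repo_args([
    '--repo', 'https://codeberg.org/user/repo',
    '--filter', 'content/*.md',
])

print(with_branch.repo, with_branch.branch, with_branch.path_filter)
print(without_branch.branch, without_branch.path_filter)
```

With this approach the special case for a branch argument that starts with `--` is no longer needed, because the parser separates options from the positional branch itself.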
Author	SHA1	Message	Date
Jörg Lohrer	fbd4745afb	a	2025-11-05 06:25:26 +01:00
Jörg Lohrer	a931ff9ace	Docs: Add filter examples to QUICKSTART.md - Added --filter option examples for repository imports - Included practical filter patterns (posts/*/index.md, etc.) - Removed experimental warning as feature is now tested - Added real-world example with FOERBICO repository	2025-11-05 06:12:34 +01:00
Jörg Lohrer	86717185c4	Feature: Add path filter for Forgejo repository imports - Added path_filter parameter to fetch_forgejo_repo_markdown_files() - Filter supports wildcard patterns (e.g., 'posts/*/index.md') - Fixed hardcoded base URL - now detects git.rpi-virtuell.de and other instances - Added --filter command line option for repo mode - Updated help text with filter examples - Enables selective import of specific markdown files from repository Usage: python workflow.py --repo <URL> [branch] --filter 'Website/content/posts/*/index.md' Example: Imports only index.md files from posts subdirectories (59 files found)	2025-11-05 06:11:15 +01:00
Jörg Lohrer	98d7244e9d	Change default post status from 'draft' to 'publish' - Changed default_status in direct URL mode from 'draft' to 'publish' - Changed default_status in Forgejo repository mode from 'draft' to 'publish' - Changed fallback in status determination from 'draft' to 'publish' - Posts without creativeWorkStatus in frontmatter will now be published immediately - Can still be overridden by setting creativeWorkStatus in frontmatter or status in YAML config	2025-11-05 06:02:31 +01:00
Jörg Lohrer	fb9720fb2a	Feature: Update existing posts instead of skipping - Added update_post() method to WordPress API client - Added _put() method for HTTP PUT requests - Modified create_post() to call update_post() when duplicate is found - Existing posts now get updated with latest content, tags, categories, dates, etc. - Prevents manual deletion and re-creation workflow - Added excerpt as explicit parameter to create_post() - Debug output shows 'Aktualisiere...' message when updating Example: Re-running import on existing post now updates all fields including newly added author tags	2025-11-05 05:45:24 +01:00
Jörg Lohrer	aa50145633	Fix: Support multiple authors as tags - Previously only first author from list was converted to tag - Now collects all authors from all sources (author, #staticSiteGenerator.author, #commonMetadata.creator) - Handles both single authors and author lists - Removes duplicates if same author appears in multiple sources - All authors are added as individual tags in Vorname_Nachname format - Example: ['Florian Mayrhofer', 'Gina Buchwald-Chassée'] → tags 'Florian_Mayrhofer', 'Gina_Buchwald-Chassée'	2025-11-05 05:40:46 +01:00
Jörg Lohrer	85f58e2528	Feature: Map authors as tags in Vorname_Nachname format - Added format_author_as_tag() function to convert author names to tag format - Author names automatically added to tags with spaces replaced by underscores - Example: 'Jörg Lohrer' → 'Jörg_Lohrer' tag - Enables author filtering via WordPress tag taxonomy - Updated documentation to reflect new author handling approach - Version bump to 0.3.0	2025-11-05 05:32:20 +01:00
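The base-URL fix mentioned in the path-filter commit can be expressed without host-specific branches. The following sketch assumes that every Forgejo/Gitea instance serves its API under `/api/v1` and raw files under `/<owner>/<repo>/raw/branch/<branch>/`, as the URLs in the diff suggest; the helper name `repo_endpoints` and its return shape are hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit


def repo_endpoints(repo_url: str, branch: str = 'main') -> Dict[str, str]:
    """Hypothetical helper: derive API and raw-file URLs from a repository URL."""
    parts = urlsplit(repo_url.rstrip('/'))
    base_url = f"{parts.scheme}://{parts.netloc}"
    # The first two path segments are taken as owner and repository name.
    owner, repo = parts.path.strip('/').split('/')[:2]
    return {
        'base_url': base_url,
        'tree_api': f"{base_url}/api/v1/repos/{owner}/{repo}/git/trees/{branch}?recursive=true",
        'raw_prefix': f"{base_url}/{owner}/{repo}/raw/branch/{branch}/",
    }


endpoints = repo_endpoints(
    'https://git.rpi-virtuell.de/Comenius-Institut/FOERBICO_und_rpi-virtuell'
)
print(endpoints['tree_api'])
print(endpoints['raw_prefix'] + 'Website/content/posts/example/index.md')
```

The path `Website/content/posts/example/index.md` in the usage lines is a made-up file name used only to show how a raw URL would be assembled.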