# 20min.ch Comment Analysis System

A complete system for fetching, analyzing, and monitoring comments from 20min.ch articles.

The system consists of two main components:

1. **Comment Fetcher** (`comment_fetcher.py`): A script that fetches comments for specific articles via the 20min.ch API
2. **Comment Pipeline** (`comment_pipeline.py`): An automated pipeline that finds recent articles, fetches their comments, and performs analysis

## Features

- **Article Discovery**: Automatically find recent articles from 20min.ch
- **API-based Comment Fetching**: Use official APIs rather than web scraping for reliability
- **Comment Search**: Search for specific keywords within comments
- **Automated Analysis**: Generate statistics and insights from comments
- **Data Export**: Save all data as structured JSON files for further processing

## Requirements

- Python 3.6+
- Required packages:
  - requests
  - beautifulsoup4
  - tqdm
  - python-dotenv

## Installation

1. Clone this repository or download the files.
2. Create a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Comment Fetcher

The comment fetcher allows you to fetch comments for specific 20min.ch articles using their ID or URL.

```bash
# Fetch comments by article ID
python comment_fetcher.py --id 103339848 --output trump_article_comments.json

# Search for specific keywords
python comment_fetcher.py --id 103339848 --search EU SVP

# Use case-sensitive search
python comment_fetcher.py --id 103339848 --search EU --case-sensitive
```

For more options:

```bash
python comment_fetcher.py --help
```

### Comment Pipeline

The comment pipeline automates the entire process of finding articles, fetching comments, and analyzing them.
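Conceptually, one pipeline run chains three stages: discover articles, fetch and save each article's comments, then analyze them. The sketch below illustrates that flow under stated assumptions and is not the code in `comment_pipeline.py`: `run_pipeline`, its parameters, the `<article_id>.json` file naming, and the timestamp format are hypothetical, and the three stages are passed in as functions rather than implemented. The `articles`, `category`, and `skip_analysis` parameters mirror the `--articles`, `--category`, and `--skip-analysis` options, and the output file name follows the `analysis_[timestamp].json` pattern listed under Output Files.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stage signatures; the real comment_pipeline.py may be organized differently.
FindArticles = Callable[[int, str], List[str]]  # (count, category) -> article IDs
FetchComments = Callable[[str], dict]           # article ID -> comments payload
Analyze = Callable[[Dict[str, dict]], dict]     # {article ID: payload} -> analysis


def run_pipeline(find_articles: FindArticles, fetch_comments: FetchComments,
                 analyze: Analyze, data_dir: Path, analysis_dir: Path,
                 articles: int = 10, category: str = "",
                 skip_analysis: bool = False) -> Dict[str, dict]:
    """Discover articles, save each article's comments as JSON, then optionally analyze."""
    data_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, dict] = {}
    # Stage 1: article discovery (mirrors --articles and --category).
    for article_id in find_articles(articles, category):
        # Stage 2: fetch comments and write one JSON file per article (assumed naming).
        payload = fetch_comments(article_id)
        results[article_id] = payload
        with open(data_dir / f"{article_id}.json", "w", encoding="utf-8") as fh:
            json.dump(payload, fh, ensure_ascii=False, indent=2)
    # Stage 3: analysis (mirrors --skip-analysis); the timestamp format is an assumption.
    if not skip_analysis:
        analysis_dir.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        with open(analysis_dir / f"analysis_{stamp}.json", "w", encoding="utf-8") as fh:
            json.dump(analyze(results), fh, ensure_ascii=False, indent=2)
    return results
```

Injecting the stages as functions keeps the orchestration testable with stubs and no network access; in a real run they would wrap the 20min.ch API calls.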
```bash
# Run the complete pipeline with default settings (10 latest articles)
python comment_pipeline.py

# Process a specific number of articles
python comment_pipeline.py --articles 20

# Process articles from a specific category
python comment_pipeline.py --category wirtschaft

# Skip the analysis step
python comment_pipeline.py --skip-analysis
```

For more options:

```bash
python comment_pipeline.py --help
```

## Output Files

### Comment Fetcher

- Saves comment data to a JSON file (default: `comments.json`, or the file given with `--output`)
- Prints a summary of the comments to the terminal

### Comment Pipeline

- **Data directory**: Contains JSON files with the comments for each article
- **Analysis directory**: Contains the analysis results:
  - `analysis_[timestamp].json`: Complete analysis data in JSON format
  - `summary_[timestamp].txt`: Human-readable summary of key insights

## Analysis

The system generates the following insights:

- Total comment and reply counts
- Top articles by comment count
- Top commenters (users with the most comments)
- Reaction statistics (awesome, bad, nonsense, smart, exact, unnecessary)

## Scheduling Regular Analysis

You can run the pipeline on a regular schedule with cron (Linux/macOS) or Windows Task Scheduler.

### Example Cron Job (Linux/macOS)

```bash
# Run daily at 6 AM
0 6 * * * cd /path/to/comment-system && /path/to/python comment_pipeline.py --articles 50 > /path/to/logs/pipeline.log 2>&1
```

### Example Windows Task

1. Create a batch file (.bat):

   ```bat
   @echo off
   rem /d also switches drive if the project is not on the current drive
   cd /d C:\path\to\comment-system
   C:\path\to\python.exe comment_pipeline.py --articles 50
   ```

2. Schedule the batch file with Windows Task Scheduler.

## API Endpoints

The system uses the following 20min.ch API endpoints:

- Comments API: `https://api.20min.ch/comment/v1/comments`
- Comment Reactions API: `https://api.20min.ch/comment/v2/reactions`
- User Reactions API: `https://api.20min.ch/comment/v1/user-reactions`

## Data Structure

### Article JSON Structure

Each article's comments are saved in a JSON file with the following structure:

```json
{
  "commentingEnabled": true|false,
  "comments": [
    {
      "id": "comment_id",
      "authorNickname": "author name",
      "body": "comment text",
      "createdAt": "timestamp",
      "reactions": {
        "awesome": 10,
        "bad": 5,
        "nonsense": 3,
        "smart": 7,
        "exact": 12,
        "unnecessary": 2
      },
      "replies": [
        {
          "id": "reply_id",
          "authorNickname": "author name",
          "body": "reply text",
          "createdAt": "timestamp",
          "reactions": { ... }
        }
      ]
    }
  ]
}
```

### Analysis JSON Structure

The analysis is saved in a JSON file with the following structure:

```json
{
  "total_comments": 124,
  "total_replies": 53,
  "total_interactions": 177,
  "top_articles": [ ... ],
  "top_commenters": [ ... ],
  "reaction_stats": { ... }
}
```

## License

This project is for educational purposes only. Use responsibly and respect the terms of service of 20min.ch.
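## Example: Computing the Analysis Statistics

The insights listed under Analysis can be derived from article files in the Article JSON Structure. The sketch below is one possible approach, not the project's actual analysis code: `analyze_articles` and `top_n` are hypothetical names, top articles and top commenters are assumed to be ranked by top-level comments only, and reaction totals are assumed to include replies.

```python
from collections import Counter
from typing import Dict

# Reaction keys as listed in the Article JSON Structure.
REACTION_KEYS = ["awesome", "bad", "nonsense", "smart", "exact", "unnecessary"]


def analyze_articles(articles: Dict[str, dict], top_n: int = 5) -> dict:
    """Compute summary statistics from a {article_id: article_json} mapping."""
    total_comments = 0
    total_replies = 0
    per_article: Counter = Counter()  # top-level comments per article (assumption)
    commenters: Counter = Counter()   # top-level comments per nickname (assumption)
    reactions = {key: 0 for key in REACTION_KEYS}

    def count_reactions(entry: dict) -> None:
        # Add this comment's or reply's reaction counts to the running totals.
        for key, value in (entry.get("reactions") or {}).items():
            if key in reactions:
                reactions[key] += value

    for article_id, data in articles.items():
        for comment in data.get("comments", []):
            total_comments += 1
            per_article[article_id] += 1
            commenters[comment.get("authorNickname", "")] += 1
            count_reactions(comment)
            for reply in comment.get("replies", []):
                total_replies += 1
                count_reactions(reply)

    return {
        "total_comments": total_comments,
        "total_replies": total_replies,
        "total_interactions": total_comments + total_replies,
        "top_articles": per_article.most_common(top_n),
        "top_commenters": commenters.most_common(top_n),
        "reaction_stats": reactions,
    }
```

The returned keys match the Analysis JSON Structure. Because that section elides the contents of `top_articles` and `top_commenters`, returning them as `(name, count)` pairs is also an assumption; `json.dump` writes each pair as a two-element array.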