# 20min.ch Comment Analysis System

A complete system for fetching, analyzing, and monitoring comments from 20min.ch articles. The system consists of two main components:

1. **Comment Fetcher** (`comment_fetcher.py`): A script that fetches comments for specific articles via the 20min.ch API
2. **Comment Pipeline** (`comment_pipeline.py`): An automated pipeline that finds recent articles, fetches their comments, and performs analysis

## Features

- **Article Discovery**: Automatically find recent articles from 20min.ch
- **API-based Comment Fetching**: Fetch comments through the 20min.ch comment API rather than scraping HTML, which is more reliable
- **Comment Search**: Search for specific keywords within comments
- **Automated Analysis**: Generate statistics and insights from comments
- **Data Export**: Save all data as structured JSON files for further processing

## Requirements

- Python 3.6+
- Required packages:
  - requests
  - beautifulsoup4
  - tqdm
  - python-dotenv

## Installation

1. Clone this repository or download the files.
2. Create a virtual environment (optional but recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Comment Fetcher

The comment fetcher allows you to fetch comments for specific 20min.ch articles using their ID or URL.

```bash
# Fetch comments by article ID
python comment_fetcher.py --id 103339848 --output trump_article_comments.json

# Search for specific keywords
python comment_fetcher.py --id 103339848 --search EU SVP

# Use case-sensitive search
python comment_fetcher.py --id 103339848 --search EU --case-sensitive
```

For more options:
```bash
python comment_fetcher.py --help
```
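
The keyword search can be pictured as a substring match over comment and reply bodies. The following is a minimal sketch, not the code in `comment_fetcher.py`: the function name `search_comments` is hypothetical, and it assumes a comment matches when *any* keyword occurs in its `body`.

```python
from typing import Dict, Iterable, List


def search_comments(comments: List[Dict], keywords: Iterable[str],
                    case_sensitive: bool = False) -> List[Dict]:
    """Return comments and replies whose body contains any keyword.

    Hypothetical helper: the real script may match differently.
    """
    keywords = list(keywords)
    if not case_sensitive:
        keywords = [k.lower() for k in keywords]

    matches = []
    for comment in comments:
        # Check the top-level comment and each of its replies.
        for item in [comment] + comment.get("replies", []):
            body = item.get("body", "")
            haystack = body if case_sensitive else body.lower()
            if any(k in haystack for k in keywords):
                matches.append(item)
    return matches


# Made-up sample data following the documented comment structure.
sample = [
    {"id": "1", "body": "Die EU sollte handeln.", "replies": [
        {"id": "2", "body": "Die SVP sieht das anders."},
    ]},
    {"id": "3", "body": "Neues aus Europa.", "replies": []},
]

print([c["id"] for c in search_comments(sample, ["EU", "SVP"])])
print([c["id"] for c in search_comments(sample, ["EU"], case_sensitive=True)])
```

With plain substring matching, a case-insensitive search for `EU` also hits words such as "Neues" and "Europa", which is one reason a case-sensitive option is useful for short acronyms.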

### Comment Pipeline

The comment pipeline automates the entire process of finding articles, fetching comments, and analyzing them.

```bash
# Run the complete pipeline with default settings (10 latest articles)
python comment_pipeline.py

# Process a specific number of articles
python comment_pipeline.py --articles 20

# Process articles from a specific category
python comment_pipeline.py --category wirtschaft

# Skip the analysis step
python comment_pipeline.py --skip-analysis
```

For more options:
```bash
python comment_pipeline.py --help
```

## Output Files

### Comment Fetcher

- Saves comment data to a JSON file (`comments.json` by default, or the path given with `--output`)
- Shows comment summary in the terminal

### Comment Pipeline

- **Data directory**: Contains JSON files with comments for each article
- **Analysis directory**: Contains analysis results
  - `analysis_[timestamp].json`: Complete analysis data in JSON format
  - `summary_[timestamp].txt`: Human-readable summary of key insights
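
As an illustration of the file naming above, the sketch below writes one analysis file and one summary file that share a timestamp. The function name `write_analysis`, the `analysis` directory name, and the `%Y%m%d_%H%M%S` timestamp format are assumptions; the pipeline may use different ones.

```python
import json
from datetime import datetime
from pathlib import Path


def write_analysis(analysis: dict, summary: str, out_dir: str = "analysis"):
    """Write analysis_[timestamp].json and summary_[timestamp].txt.

    Hypothetical helper; directory name and timestamp format are assumed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # One timestamp for both files so they can be paired later.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    json_path = out / f"analysis_{stamp}.json"
    txt_path = out / f"summary_{stamp}.txt"

    json_path.write_text(
        json.dumps(analysis, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    txt_path.write_text(summary, encoding="utf-8")
    return json_path, txt_path
```

`ensure_ascii=False` keeps umlauts in German comment text readable in the saved JSON.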

## Analysis

The system generates the following insights:

- Total comment and reply counts
- Top articles by comment count
- Top commenters (users with most comments)
- Reaction statistics (awesome, bad, nonsense, smart, exact, unnecessary)
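
The insights listed above can be derived from the saved comment data with a few counters. This is a rough sketch under stated assumptions, not the pipeline's actual implementation: the function name `analyze` is hypothetical, articles are passed as a dict keyed by article ID, and "top articles" is assumed to rank by comments plus replies.

```python
from collections import Counter
from typing import Dict


def analyze(articles: Dict[str, dict], top_n: int = 3) -> dict:
    """Aggregate comment statistics over several articles (hypothetical helper)."""
    per_article = Counter()
    commenters = Counter()
    reactions = Counter()
    total_comments = 0
    total_replies = 0

    for article_id, data in articles.items():
        for comment in data.get("comments", []):
            replies = comment.get("replies", [])
            total_comments += 1
            total_replies += len(replies)
            per_article[article_id] += 1 + len(replies)
            # Count authors and reactions for the comment and its replies.
            for item in [comment] + replies:
                commenters[item.get("authorNickname", "unknown")] += 1
                reactions.update(item.get("reactions", {}))

    return {
        "total_comments": total_comments,
        "total_replies": total_replies,
        "total_interactions": total_comments + total_replies,
        "top_articles": per_article.most_common(top_n),
        "top_commenters": commenters.most_common(top_n),
        "reaction_stats": dict(reactions),
    }


# Made-up sample data for illustration only.
articles = {
    "a1": {"comments": [
        {"authorNickname": "anna", "reactions": {"awesome": 2, "bad": 1},
         "replies": [{"authorNickname": "ben", "reactions": {"awesome": 1}}]},
        {"authorNickname": "anna", "reactions": {}, "replies": []},
    ]},
    "a2": {"comments": []},
}
result = analyze(articles)
print(result["total_interactions"], result["top_commenters"])
```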

## Scheduling Regular Analysis

You can run the pipeline on a schedule with a cron job (Linux/Mac) or the Windows Task Scheduler:

### Example Cron Job (Linux/Mac)

```bash
# Run daily at 6 AM
0 6 * * * cd /path/to/comment-system && /path/to/python comment_pipeline.py --articles 50 > /path/to/logs/pipeline.log 2>&1
```

### Example Windows Task

1. Create a batch file (.bat):
   ```bat
   @echo off
   rem /d also switches drives if the project is not on the current one
   cd /d C:\path\to\comment-system
   C:\path\to\python.exe comment_pipeline.py --articles 50
   ```
2. Schedule it using Windows Task Scheduler

## API Endpoints

The system uses the following 20min.ch API endpoints:

- Comments API: `https://api.20min.ch/comment/v1/comments`
- Comment Reactions API: `https://api.20min.ch/comment/v2/reactions`
- User Reactions API: `https://api.20min.ch/comment/v1/user-reactions`

## Data Structure

### Article JSON Structure

Each article's comments are saved in a JSON file with the following structure:

```json
{
  "commentingEnabled": true,
  "comments": [
    {
      "id": "comment_id",
      "authorNickname": "author name",
      "body": "comment text",
      "createdAt": "timestamp",
      "reactions": {
        "awesome": 10,
        "bad": 5,
        "nonsense": 3,
        "smart": 7,
        "exact": 12,
        "unnecessary": 2
      },
      "replies": [
        {
          "id": "reply_id",
          "authorNickname": "author name",
          "body": "reply text",
          "createdAt": "timestamp",
          "reactions": { ... }
        }
      ]
    }
  ]
}
```
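
Because replies are nested one level under their parent comment, downstream code usually needs to walk both levels. A minimal sketch of such a traversal follows; the generator name `iter_comments` is hypothetical and the sample data is made up to match the structure above.

```python
import json
from typing import Iterator, Optional, Tuple


def iter_comments(article: dict) -> Iterator[Tuple[dict, Optional[str]]]:
    """Yield (item, parent_id) for every comment and reply.

    parent_id is None for top-level comments.
    """
    for comment in article.get("comments", []):
        yield comment, None
        for reply in comment.get("replies", []):
            yield reply, comment.get("id")


raw = """
{
  "commentingEnabled": true,
  "comments": [
    {"id": "c1", "authorNickname": "anna", "body": "First", "createdAt": "t1",
     "reactions": {"awesome": 1},
     "replies": [{"id": "r1", "authorNickname": "ben", "body": "Reply",
                  "createdAt": "t2", "reactions": {}}]}
  ]
}
"""
article = json.loads(raw)
rows = [(item["id"], parent) for item, parent in iter_comments(article)]
print(rows)
```

Flattening to `(item, parent_id)` pairs keeps the reply relationship while letting the rest of the code treat comments and replies uniformly.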

### Analysis JSON Structure

The analysis is saved in a JSON file with the following structure:

```json
{
  "total_comments": 124,
  "total_replies": 53,
  "total_interactions": 177,
  "top_articles": [ ... ],
  "top_commenters": [ ... ],
  "reaction_stats": { ... }
}
```

## License

This project is for educational purposes only. Use responsibly and respect the terms of service of 20min.ch.