Compare commits

c41a2907b8 ... debb371c69 (10 commits):

- debb371c69
- fef5023d74
- c1cb0f46a4
- 6a33715356
- 03caebfb0b
- bf830fd330
- c5306bb56e
- 7a1e35ec00
- b3246baf31
- 50439b84bb
@@ -0,0 +1,239 @@

# RuWiki -> SchoolNotes

An asynchronous system that simplifies RuWiki articles for school use with OpenAI LLMs.

## Overview

The system automatically:
- fetches articles from RuWiki (ru.ruwiki.ru)
- simplifies the content for students in grades 8-11 using OpenAI GPT
- stores both the original and the simplified text in SQLite
- processes articles in parallel

## Architecture

```mermaid
graph TD
    A[CLI / AsyncRunner] -->|URLs| B["TaskQueue<br>(asyncio + anyio)"]
    B --> C["Worker N<br>(coroutine)"]
    C --> D[RuWikiAdapter]
    C --> E[TextSplitter]
    C --> F[LLMProvider]
    C --> G[WriteQueue]
    G --> H[(SQLite<br>aiosqlite)]
```
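The flow in the diagram can be sketched with plain `asyncio` primitives: URLs go into a task queue, N workers process them, and a single writer drains a write queue into SQLite. The snippet below is illustrative only; the queue, worker, and writer names do not correspond to the actual classes in `src/`.

```python
import asyncio

async def worker(tasks: asyncio.Queue, writes: asyncio.Queue) -> None:
    """Pull URLs from the task queue, 'simplify' them, hand results to the write queue."""
    while True:
        url = await tasks.get()
        try:
            result = f"simplified::{url}"  # stand-in for RuWikiAdapter + LLMProvider work
            await writes.put(result)
        finally:
            tasks.task_done()

async def writer(writes: asyncio.Queue) -> None:
    """Single consumer that serialises all database writes."""
    while True:
        result = await writes.get()
        print("persist:", result)  # stand-in for the aiosqlite INSERT/UPDATE
        writes.task_done()

async def main(urls: list[str], max_workers: int = 5) -> None:
    tasks: asyncio.Queue = asyncio.Queue()
    writes: asyncio.Queue = asyncio.Queue()
    for url in urls:
        tasks.put_nowait(url)

    workers = [asyncio.create_task(worker(tasks, writes)) for _ in range(max_workers)]
    sink = asyncio.create_task(writer(writes))

    await tasks.join()    # every URL has been processed
    await writes.join()   # every result has been persisted
    for task in (*workers, sink):
        task.cancel()

asyncio.run(main(["https://ru.ruwiki.ru/wiki/Изотопы", "https://ru.ruwiki.ru/wiki/Митоз"]))
```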
## Installation

### Requirements
- Python ≥ 3.10
- An OpenAI API key

### Installing dependencies

```bash
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
# or
.venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

# For development
pip install -r requirements-dev.txt
```

### Configuration

1. Copy `env_example.txt` to `.env`:
```bash
cp env_example.txt .env
```

2. Adjust the settings in `.env`:
```env
OPENAI_API_KEY=your_openai_api_key_here
...
```
## Usage

### Basic commands

```bash
# Process articles from a file
python -m src.cli process input.txt

# Force reprocessing
python -m src.cli process input.txt --force

# Limit the number of articles
python -m src.cli process input.txt --max-articles 10

# Set the number of workers
python -m src.cli process input.txt --max-workers 5

# System health check
python -m src.cli health

# List articles stored in the database
python -m src.cli list-articles --limit 20

# Statistics for a URL file
python -m src.cli stats input.txt
```

### URL file format

```
# input.txt
https://ru.ruwiki.ru/wiki/Изотопы
https://ru.ruwiki.ru/wiki/Вещественное_число
https://ru.ruwiki.ru/wiki/Митоз
```
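Blank lines and lines starting with `#` are ignored; every other line should be a RuWiki article URL. A rough sketch of the per-line validation (the helper name here is illustrative; the real checks live in the file source) could look like this:

```python
from urllib.parse import urlparse

def is_valid_ruwiki_url(url: str) -> bool:
    """Accept only http(s)://...ruwiki.../wiki/<Article> style URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    if "ruwiki" not in parsed.netloc:
        return False
    parts = parsed.path.split("/")          # e.g. ['', 'wiki', 'Изотопы']
    if len(parts) < 3 or parts[1] != "wiki":
        return False
    article = parts[-1]
    return bool(article and article not in ("Main_Page", "Заглавная_страница"))

print(is_valid_ruwiki_url("https://ru.ruwiki.ru/wiki/Изотопы"))  # True
print(is_valid_ruwiki_url("https://ru.ruwiki.ru/wiki/"))         # False
```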
### Examples

```bash
# Process a few test articles
python -m src.cli process input.txt --max-articles 3 --max-workers 2

# Output results as JSON
python -m src.cli list-articles --format json --status completed

# Configure logging
python -m src.cli --log-level DEBUG --log-format json process input.txt
```

## Configuration

Main parameters in `.env`:

```env
# OpenAI
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o
OPENAI_TEMPERATURE=0.0
OPENAI_PROXY_URL='socks5h://37.18.73.60:5566' # socks5 recommended

# Database
DB_PATH=./data/wiki.db

# Performance
MAX_CONCURRENT_LLM=5
OPENAI_RPM=200
MAX_CONCURRENT_WIKI=10

# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json

# Text processing
CHUNK_SIZE=2000
CHUNK_OVERLAP=200

# Resilience
MAX_RETRIES=3
RETRY_DELAY=1.0
CIRCUIT_FAILURE_THRESHOLD=5
CIRCUIT_RECOVERY_TIMEOUT=60
```
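These variables are read into a pydantic-settings `AppConfig` (see the config diff further below). The sketch here shows roughly how such a mapping looks; it covers only a subset of the fields, and the defaults are taken from the table above rather than from the actual `config.py`.

```python
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    """Settings loaded from .env / environment variables (names are case-insensitive)."""
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    openai_api_key: str
    openai_model: str = "gpt-4o"
    openai_temperature: float = Field(default=0.0, ge=0.0, le=2.0)
    openai_proxy_url: str | None = None

    db_path: str = "./data/wiki.db"

    max_concurrent_llm: int = 5
    openai_rpm: int = 200
    max_concurrent_wiki: int = 10

    chunk_size: int = Field(default=2000, ge=500)
    chunk_overlap: int = Field(default=200, ge=0, le=1000)
    max_retries: int = Field(default=3, ge=1, le=10)

config = AppConfig()  # picks up OPENAI_API_KEY etc. from the environment
print(config.openai_model)
```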
## Monitoring

### Logs
```bash
# Structured JSON logs
python -m src.cli --log-format json process input.txt

# Debug logs
python -m src.cli --log-level DEBUG process input.txt
```

### Metrics
- Processing rate (articles per minute)
- Success rate
- Average processing time
- Token usage

### Health Check
```bash
python -m src.cli health
```
## Testing

```bash
# All tests
python -m pytest

# With coverage
python -m pytest --cov=src --cov-report=html

# Without coverage (faster)
python -m pytest --no-cov

# Unit tests only
python -m pytest tests/test_models.py tests/test_adapters.py

# Integration tests
python -m pytest tests/test_integration.py

# A single test
python -m pytest tests/test_adapters.py::TestLLMProviderAdapter::test_simplify_text_success -v
```

### Test status
- ✅ **71 tests**, all passing
- ✅ **Unit tests**: adapters, models, services
- ✅ **Integration tests**: database, file operations
- ✅ **System tests**: full processing cycle
## Sample results

### Original article (fragment)
```wiki
'''Изотопы''' (от {{lang-grc|ἴσος}} «равный» и {{lang-grc|τόπος}} «место») — разновидности атомов химического элемента с одинаковым количеством протонов, но различным количеством нейтронов в ядре...
```

### Simplified version
```wiki
'''Изотопы''' — это разные виды атомов одного элемента.

== Что такое изотопы ==
У атомов одного элемента всегда одинаковое число протонов. Но число нейтронов может быть разным. Атомы с разным числом нейтронов называются изотопами...
```

## Development

### Code quality
```bash
# Formatting
black src/ tests/

# Linting
ruff check src/

# Type checking
mypy src/

# Security
bandit -r src/
```
### Project structure
```
ruwiki_test/
├── src/
│   ├── adapters/        # Adapters for external APIs
│   ├── models/          # Data models
│   ├── services/        # Business logic
│   ├── cli.py           # CLI interface
│   └── runner.py        # Main runner
├── tests/               # Tests
├── data/                # Database
└── requirements.txt     # Dependencies
```

@ -1,6 +1,7 @@
|
|||
OPENAI_API_KEY=your_openai_api_key_here
|
||||
OPENAI_MODEL=gpt-4o-mini
|
||||
OPENAI_MODEL=gpt-4o
|
||||
OPENAI_TEMPERATURE=0.0
|
||||
OPENAI_PROXY_URL='socks5h://37.18.73.60:5566' # socks5 recommended
|
||||
|
||||
DB_PATH=./data/wiki.db
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@ anyio>=4.2.0,<5.0.0
|
|||
aiohttp>=3.9.0,<4.0.0
|
||||
|
||||
aiosqlite>=0.19.0,<0.20.0
|
||||
sqlmodel>=0.0.14,<0.0.15
|
||||
sqlalchemy[asyncio]>=2.0.0,<3.0.0
|
||||
|
||||
openai>=1.13.0,<2.0.0
|
||||
tiktoken>=0.5.2,<0.6.0
|
||||
|
|
|
@ -1,12 +1,15 @@
|
|||
import asyncio
|
||||
import time
|
||||
|
||||
import httpx
|
||||
import openai
|
||||
import structlog
|
||||
import tiktoken
|
||||
from openai import AsyncOpenAI
|
||||
from openai.types.chat import ChatCompletion
|
||||
|
||||
from src.models.constants import LLM_MAX_INPUT_TOKENS, MAX_TOKEN_LIMIT_WITH_BUFFER
|
||||
|
||||
from ..models import AppConfig
|
||||
from .base import BaseAdapter, CircuitBreaker, RateLimiter, with_retry
|
||||
|
||||
|
@ -31,7 +34,9 @@ class LLMProviderAdapter(BaseAdapter):
|
|||
super().__init__("llm_adapter")
|
||||
self.config = config
|
||||
|
||||
self.client = AsyncOpenAI(api_key=config.openai_api_key)
|
||||
self.client = AsyncOpenAI(
|
||||
api_key=config.openai_api_key, http_client=self._build_http_client()
|
||||
)
|
||||
|
||||
try:
|
||||
self.tokenizer = tiktoken.encoding_for_model(config.openai_model)
|
||||
|
@ -87,7 +92,7 @@ class LLMProviderAdapter(BaseAdapter):
|
|||
model=self.config.openai_model,
|
||||
messages=messages,
|
||||
temperature=self.config.openai_temperature,
|
||||
max_tokens=1500,
|
||||
max_tokens=MAX_TOKEN_LIMIT_WITH_BUFFER,
|
||||
)
|
||||
return response
|
||||
except openai.RateLimitError as e:
|
||||
|
@ -102,8 +107,8 @@ class LLMProviderAdapter(BaseAdapter):
|
|||
prompt_template: str,
|
||||
) -> tuple[str, int, int]:
|
||||
input_tokens = self.count_tokens(wiki_text)
|
||||
if input_tokens > 6000:
|
||||
raise LLMTokenLimitError(f"Текст слишком длинный: {input_tokens} токенов (лимит 6000)")
|
||||
if input_tokens > LLM_MAX_INPUT_TOKENS:
|
||||
raise LLMTokenLimitError(f"Текст слишком длинный: {input_tokens} токенов")
|
||||
|
||||
try:
|
||||
prompt_text = prompt_template.format(
|
||||
|
@ -142,7 +147,7 @@ class LLMProviderAdapter(BaseAdapter):
|
|||
|
||||
output_tokens = self.count_tokens(simplified_text)
|
||||
|
||||
if output_tokens > 1200:
|
||||
if output_tokens > MAX_TOKEN_LIMIT_WITH_BUFFER:
|
||||
self.logger.warning(
|
||||
"Упрощённый текст превышает лимит",
|
||||
output_tokens=output_tokens,
|
||||
|
@ -179,6 +184,11 @@ class LLMProviderAdapter(BaseAdapter):
|
|||
|
||||
return messages
|
||||
|
||||
def _build_http_client(self) -> httpx.AsyncClient:
|
||||
if self.config.openai_proxy_url:
|
||||
return httpx.AsyncClient(proxy=self.config.openai_proxy_url, timeout=60.0)
|
||||
return httpx.AsyncClient(timeout=60.0)
|
||||
|
||||
async def health_check(self) -> bool:
|
||||
try:
|
||||
test_messages = [{"role": "user", "content": "Ответь 'OK' если всё работает."}]
|
||||
|
|
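The hunk above hands a preconfigured `httpx.AsyncClient` to the OpenAI SDK so the proxy and timeout apply to every request. A condensed, standalone sketch of the same idea is shown below; the helper name is hypothetical, and routing through a SOCKS proxy with httpx typically also requires the optional `httpx[socks]` extra.

```python
import httpx
from openai import AsyncOpenAI

def build_openai_client(api_key: str, proxy_url: str | None) -> AsyncOpenAI:
    # One shared httpx client: proxy (if any) and timeout apply to all OpenAI calls.
    http_client = (
        httpx.AsyncClient(proxy=proxy_url, timeout=60.0)
        if proxy_url
        else httpx.AsyncClient(timeout=60.0)
    )
    return AsyncOpenAI(api_key=api_key, http_client=http_client)
```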
|
@ -55,7 +55,7 @@ class RuWikiAdapter(BaseAdapter):
|
|||
|
||||
def _create_client(self) -> mwclient.Site:
|
||||
try:
|
||||
site = mwclient.Site("ru.wikipedia.org")
|
||||
site = mwclient.Site("ru.ruwiki.ru")
|
||||
site.api("query", meta="siteinfo")
|
||||
self.logger.info("Соединение с RuWiki установлено")
|
||||
return site
|
||||
|
@ -66,7 +66,7 @@ class RuWikiAdapter(BaseAdapter):
|
|||
@staticmethod
|
||||
def extract_title_from_url(url: str) -> str:
|
||||
parsed = urlparse(url)
|
||||
if "wikipedia.org" not in parsed.netloc:
|
||||
if "ruwiki.ru" not in parsed.netloc:
|
||||
raise ValueError(f"Не является URL википедии: {url}")
|
||||
|
||||
path_parts = parsed.path.split("/")
|
||||
|
|
src/cli.py
|
@ -223,9 +223,9 @@ def list_articles(
|
|||
repository = container.get_repository()
|
||||
|
||||
if status:
|
||||
from .models import ProcessingStatus
|
||||
from .models.article_dto import ArticleStatus
|
||||
|
||||
status_enum = ProcessingStatus(status)
|
||||
status_enum = ArticleStatus(status)
|
||||
articles = await repository.get_articles_by_status(status_enum, limit)
|
||||
else:
|
||||
articles = await repository.get_all_articles(limit)
|
||||
|
@ -238,8 +238,7 @@ def list_articles(
|
|||
"title": article.title,
|
||||
"status": article.status.value,
|
||||
"created_at": article.created_at.isoformat(),
|
||||
"token_count_raw": article.token_count_raw,
|
||||
"token_count_simplified": article.token_count_simplified,
|
||||
"simplified_text": article.simplified_text,
|
||||
}
|
||||
for article in articles
|
||||
]
|
||||
|
@ -249,21 +248,13 @@ def list_articles(
|
|||
click.echo("Статьи не найдены")
|
||||
return
|
||||
|
||||
click.echo(f"{'ID':<5} {'Статус':<12} {'Название':<50} {'Токены (исх/упр)':<15}")
|
||||
click.echo("-" * 87)
|
||||
click.echo(f"{'ID':<5} {'Статус':<12} {'Название':<50}")
|
||||
click.echo("-" * 72)
|
||||
|
||||
for article in articles:
|
||||
tokens_info = ""
|
||||
if article.token_count_raw and article.token_count_simplified:
|
||||
tokens_info = f"{article.token_count_raw}/{article.token_count_simplified}"
|
||||
elif article.token_count_raw:
|
||||
tokens_info = f"{article.token_count_raw}/-"
|
||||
|
||||
title = article.title[:47] + "..." if len(article.title) > 50 else article.title
|
||||
|
||||
click.echo(
|
||||
f"{article.id:<5} {article.status.value:<12} {title:<50} {tokens_info:<15}"
|
||||
)
|
||||
click.echo(f"{article.id:<5} {article.status.value:<12} {title:<50}")
|
||||
|
||||
except Exception as e:
|
||||
click.echo(f"Ошибка: {e}", err=True)
|
||||
|
|
|
@ -45,9 +45,6 @@ class DependencyContainer:
|
|||
if self._write_queue:
|
||||
await self._write_queue.stop()
|
||||
|
||||
if self._database_service:
|
||||
self._database_service.close()
|
||||
|
||||
self._initialized = False
|
||||
logger.info("Ресурсы очищены")
|
||||
|
||||
|
@ -141,7 +138,6 @@ class DependencyContainer:
|
|||
return checks
|
||||
|
||||
|
||||
@lru_cache(maxsize=1)
|
||||
def get_container(config: AppConfig | None = None) -> DependencyContainer:
|
||||
if config is None:
|
||||
config = AppConfig()
|
||||
|
|
|
@ -1,15 +1,12 @@
|
|||
from .article import Article, ArticleCreate, ArticleRead, ProcessingStatus
|
||||
from .article_dto import ArticleDTO, ArticleStatus
|
||||
from .commands import ProcessingResult, ProcessingStats, SimplifyCommand
|
||||
from .config import AppConfig
|
||||
from .constants import *
|
||||
|
||||
__all__ = [
|
||||
"AppConfig",
|
||||
"Article",
|
||||
"ArticleCreate",
|
||||
"ArticleRead",
|
||||
"ArticleDTO",
|
||||
"ArticleStatus",
|
||||
"ProcessingResult",
|
||||
"ProcessingStats",
|
||||
"ProcessingStatus",
|
||||
"SimplifyCommand",
|
||||
]
|
||||
|
|
|
@ -1,81 +0,0 @@
|
|||
from __future__ import annotations
|
||||
|
||||
from datetime import datetime, timezone
|
||||
from enum import Enum
|
||||
|
||||
from sqlmodel import Field, SQLModel
|
||||
|
||||
|
||||
class ProcessingStatus(str, Enum):
|
||||
PENDING = "pending"
|
||||
PROCESSING = "processing"
|
||||
COMPLETED = "completed"
|
||||
FAILED = "failed"
|
||||
|
||||
|
||||
class Article(SQLModel, table=True):
|
||||
|
||||
__tablename__ = "articles"
|
||||
|
||||
id: int | None = Field(default=None, primary_key=True)
|
||||
url: str = Field(index=True, unique=True, max_length=500)
|
||||
title: str = Field(max_length=300)
|
||||
raw_text: str = Field(description="Исходный wiki-текст")
|
||||
simplified_text: str | None = Field(
|
||||
default=None,
|
||||
description="Упрощённый текст для школьников",
|
||||
)
|
||||
status: ProcessingStatus = Field(default=ProcessingStatus.PENDING)
|
||||
error_message: str | None = Field(default=None, max_length=1000)
|
||||
token_count_raw: int | None = Field(
|
||||
default=None, description="Количество токенов в исходном тексте"
|
||||
)
|
||||
token_count_simplified: int | None = Field(
|
||||
default=None,
|
||||
description="Количество токенов в упрощённом тексте",
|
||||
)
|
||||
processing_time_seconds: float | None = Field(default=None)
|
||||
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
|
||||
updated_at: datetime | None = Field(default=None)
|
||||
|
||||
def mark_processing(self) -> None:
|
||||
self.status = ProcessingStatus.PROCESSING
|
||||
self.updated_at = datetime.now(timezone.utc)
|
||||
|
||||
def mark_completed(
|
||||
self,
|
||||
simplified_text: str,
|
||||
token_count_raw: int,
|
||||
token_count_simplified: int,
|
||||
processing_time: float,
|
||||
) -> None:
|
||||
self.simplified_text = simplified_text
|
||||
self.token_count_raw = token_count_raw
|
||||
self.token_count_simplified = token_count_simplified
|
||||
self.processing_time_seconds = processing_time
|
||||
self.status = ProcessingStatus.COMPLETED
|
||||
self.error_message = None
|
||||
self.updated_at = datetime.now(timezone.utc)
|
||||
|
||||
def mark_failed(self, error_message: str) -> None:
|
||||
self.status = ProcessingStatus.FAILED
|
||||
self.error_message = error_message[:1000]
|
||||
self.updated_at = datetime.now(timezone.utc)
|
||||
|
||||
|
||||
class ArticleCreate(SQLModel):
|
||||
url: str
|
||||
title: str
|
||||
raw_text: str
|
||||
|
||||
|
||||
class ArticleRead(SQLModel):
|
||||
id: int
|
||||
url: str
|
||||
title: str
|
||||
raw_text: str
|
||||
simplified_text: str | None
|
||||
status: ProcessingStatus
|
||||
token_count_raw: int | None
|
||||
token_count_simplified: int | None
|
||||
created_at: datetime
|
|
@ -0,0 +1,22 @@
|
|||
from datetime import datetime
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
from typing import Optional
|
||||
|
||||
|
||||
class ArticleStatus(Enum):
|
||||
PENDING = "pending"
|
||||
SIMPLIFIED = "simplified"
|
||||
FAILED = "failed"
|
||||
|
||||
|
||||
@dataclass
|
||||
class ArticleDTO:
|
||||
url: str
|
||||
title: str
|
||||
raw_text: str
|
||||
status: ArticleStatus
|
||||
created_at: datetime
|
||||
id: Optional[int] = None
|
||||
simplified_text: Optional[str] = None
|
||||
updated_at: Optional[datetime] = None
|
|
@ -19,6 +19,7 @@ class AppConfig(BaseSettings):
|
|||
openai_temperature: float = Field(
|
||||
default=0.0, ge=0.0, le=2.0, description="Температура для LLM"
|
||||
)
|
||||
openai_proxy_url: str | None = Field(description="Proxy URL для OpenAI")
|
||||
|
||||
db_path: str = Field(default="./data/wiki.db", description="Путь к файлу SQLite")
|
||||
|
||||
|
@ -33,7 +34,7 @@ class AppConfig(BaseSettings):
|
|||
log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = Field(default="INFO")
|
||||
log_format: Literal["json", "text"] = Field(default="json")
|
||||
|
||||
chunk_size: int = Field(default=2000, ge=500, le=8000, description="Размер чанка для текста")
|
||||
chunk_size: int = Field(default=2000, ge=500, le=122000, description="Размер чанка для текста")
|
||||
chunk_overlap: int = Field(default=200, ge=0, le=1000, description="Перекрытие между чанками")
|
||||
max_retries: int = Field(default=3, ge=1, le=10, description="Максимум попыток повтора")
|
||||
retry_delay: float = Field(
|
||||
|
|
|
@ -0,0 +1,6 @@
|
|||
LLM_MAX_INPUT_TOKENS = 120000
|
||||
MAX_TOKEN_LIMIT_WITH_BUFFER = 16000
|
||||
ARTICLE_NAME_INDEX = -1
|
||||
MIN_WIKI_PATH_PARTS = 2
|
||||
WIKI_PATH_INDEX = 1
|
||||
WRITE_QUEUE_BATCH_SIZE = 10
|
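The token budgets above are enforced by counting tokens with `tiktoken` before a request is made (the adapter builds its tokenizer via `tiktoken.encoding_for_model`, with a fallback when the model name is unknown). A small self-contained version of that check:

```python
import tiktoken

LLM_MAX_INPUT_TOKENS = 120000

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a generic encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

text = "Изотопы — это разные виды атомов одного элемента."
print(count_tokens(text), "tokens, input limit", LLM_MAX_INPUT_TOKENS)
```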
|
@ -1,28 +1,33 @@
|
|||
### role: system
|
||||
Ты — опытный редактор Рувики и педагог-методист. Твоя задача — адаптировать научные статьи для школьного образования.
|
||||
Ты — опытный редактор Рувики и педагог-методист.
|
||||
Твоя задача — упростить текст научной статьи для школьников 8–11 классов, строго следуя правилам ниже.
|
||||
|
||||
ПРАВИЛА УПРОЩЕНИЯ:
|
||||
1. Сократи текст до ≤ 1000 токенов, сохранив ключевую информацию
|
||||
2. Замени сложные термины на простые аналоги с объяснениями
|
||||
3. Убери избыточные детали, оставь только суть
|
||||
4. Сохрани корректную wiki-разметку (== заголовки ==, '''жирный''', ''курсив'', [[ссылки]])
|
||||
5. Структурируй материал логично: определение → основные свойства → примеры
|
||||
6. Добавь простые примеры для лучшего понимания
|
||||
7. Убери технические подробности, не нужные школьникам
|
||||
|
||||
ЦЕЛЬ: Сделать статью понятной для учеников 8-11 классов, сохранив научную точность.
|
||||
|
||||
ФОРМАТ ОТВЕТА:
|
||||
- Начни сразу с упрощённого wiki-текста
|
||||
- Используй простые предложения
|
||||
- Избегай сложных конструкций
|
||||
- Заверши ответ маркером ###END###
|
||||
**ГЛАВНЫЕ ПРАВИЛА (следуй им без отклонений):**
|
||||
1. Не изменяй и не удаляй **ни одного** фрагмента внутри фигурных скобок `{{…}}`, двойных квадратных `[[…]]`, угловых `<ref>…</ref>` и т. д. — копируй их в ответ дословно.
|
||||
2. Сохрани wikicode-структуру статьи и все исходные секции; допускается лишь сокращать/упрощать свободный текст **внутри** секций.
|
||||
3. Укороти основной текст до ≤ 4000 слов, но не трогай «Примечания» и «Литературу».
|
||||
4. При упрощении:
|
||||
• сложные термины замени на простые аналоги (короткое пояснение в скобках);
|
||||
• убери страницы, глубокие детали, избыточные даты;
|
||||
• оставь только ключевые факты.
|
||||
5. Итоговая структура ровно такая (не добавляй других верхних уровней, НО сохрани все исходные секции):
|
||||
== Определение == (коротко, но простыми словами)
|
||||
== Основные свойства == (список, с объяснением)
|
||||
== Примеры == (несколько примеров)
|
||||
== История == (1–2 предложения)
|
||||
... другие исходные секции с сокращением для школьника
|
||||
== Примечания == (оставь как было)
|
||||
== Литература == (оставь как было)
|
||||
6. В конце ответа напиши `###END###` на отдельной строке.
|
||||
7. **Никогда** не придумывай новый заголовок: бери ровно тот, что указан во входном параметре `{title}`.
|
||||
8. Не добавляй собственные комментарии, подписи или объяснения — вывод = готовая wiki-страница.
|
||||
|
||||
### role: user
|
||||
Статья: {title}
|
||||
Заголовок статьи: {title}
|
||||
|
||||
Исходный текст статьи:
|
||||
<WikiSource>
|
||||
{wiki_source_text}
|
||||
</WikiSource>
|
||||
|
||||
Задание: сократи и упрости текст, следуя инструкциям system-сообщения.
|
||||
Задание: упрости и сократи статью согласно правилам. Выдай только готовый wiki-текст с сохранёнными шаблонами и разметкой. В конце поставь `###END###`.
|
||||
|
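The template file above is split into chat messages on the `### role:` markers (the adapter exposes this as `_parse_prompt_template`). A rough standalone equivalent, shown only to illustrate the format rather than reproduce the project's parser:

```python
def parse_prompt_template(template: str) -> list[dict[str, str]]:
    """Split '### role: system' / '### role: user' sections into chat messages."""
    messages: list[dict[str, str]] = []
    role, lines = None, []
    for line in template.splitlines():
        if line.startswith("### role:"):
            if role and lines:
                messages.append({"role": role, "content": "\n".join(lines).strip()})
            role, lines = line.split(":", 1)[1].strip(), []
        else:
            lines.append(line)
    if role and lines:
        messages.append({"role": role, "content": "\n".join(lines).strip()})
    return messages

template = "### role: system\nТы помощник.\n\n### role: user\nСтатья: {title}"
print(parse_prompt_template(template))
# [{'role': 'system', 'content': 'Ты помощник.'}, {'role': 'user', 'content': 'Статья: {title}'}]
```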
|
|
@ -69,7 +69,7 @@ class AsyncRunner:
|
|||
) -> None:
|
||||
loaded_count = 0
|
||||
|
||||
async for command in source.read_urls(force_reprocess):
|
||||
async for command in source.read_urls(force_reprocess=force_reprocess):
|
||||
if max_articles and loaded_count >= max_articles:
|
||||
break
|
||||
|
||||
|
|
|
@ -1,10 +1,7 @@
|
|||
"""Сервис для управления базой данных."""
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
import aiosqlite
|
||||
import structlog
|
||||
from sqlmodel import SQLModel, create_engine
|
||||
|
||||
from ..models import AppConfig
|
||||
|
||||
|
@ -16,18 +13,45 @@ class DatabaseService:
|
|||
self.config = config
|
||||
self.logger = structlog.get_logger().bind(service="database")
|
||||
|
||||
self._sync_engine = create_engine(
|
||||
config.sync_db_url,
|
||||
echo=False,
|
||||
connect_args={"check_same_thread": False},
|
||||
)
|
||||
|
||||
async def initialize_database(self) -> None:
|
||||
db_path = Path(self.config.db_path)
|
||||
db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
self.logger.info("Создание схемы базы данных", db_path=self.config.db_path)
|
||||
SQLModel.metadata.create_all(self._sync_engine)
|
||||
|
||||
async with aiosqlite.connect(self.config.db_path) as conn:
|
||||
await conn.execute(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS articles (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
url TEXT NOT NULL UNIQUE,
|
||||
title TEXT NOT NULL,
|
||||
raw_text TEXT NOT NULL,
|
||||
simplified_text TEXT,
|
||||
status TEXT NOT NULL DEFAULT 'pending',
|
||||
error_message TEXT,
|
||||
token_count_raw INTEGER,
|
||||
token_count_simplified INTEGER,
|
||||
processing_time_seconds REAL,
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT
|
||||
)
|
||||
"""
|
||||
)
|
||||
|
||||
await conn.execute(
|
||||
"""
|
||||
CREATE INDEX IF NOT EXISTS idx_articles_url ON articles(url)
|
||||
"""
|
||||
)
|
||||
|
||||
await conn.execute(
|
||||
"""
|
||||
CREATE INDEX IF NOT EXISTS idx_articles_status ON articles(status)
|
||||
"""
|
||||
)
|
||||
|
||||
await conn.commit()
|
||||
|
||||
await self._configure_sqlite()
|
||||
|
||||
|
@ -47,19 +71,23 @@ class DatabaseService:
|
|||
self.logger.info("SQLite настроен для оптимальной производительности")
|
||||
|
||||
async def get_connection(self) -> aiosqlite.Connection:
|
||||
return await aiosqlite.connect(
|
||||
self.logger.info("Открытие соединения с базой данных")
|
||||
return aiosqlite.connect(
|
||||
self.config.db_path,
|
||||
timeout=30.0,
|
||||
)
|
||||
|
||||
async def health_check(self) -> bool:
|
||||
try:
|
||||
async with self.get_connection() as conn:
|
||||
await conn.execute("SELECT 1")
|
||||
async with await self.get_connection() as connection:
|
||||
self.logger.info("Вошли в async context manager")
|
||||
self.logger.info("Выполняем SELECT 1...")
|
||||
await connection.execute("SELECT 1")
|
||||
self.logger.info("SELECT 1 выполнен успешно")
|
||||
return True
|
||||
except Exception as e:
|
||||
self.logger.error("Database health check failed", error=str(e))
|
||||
return False
|
||||
import traceback
|
||||
|
||||
def close(self) -> None:
|
||||
self._sync_engine.dispose()
|
||||
self.logger.error("Traceback", traceback=traceback.format_exc())
|
||||
return False
|
||||
|
|
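One detail worth noting in the hunk above: `aiosqlite.connect(...)` returns an object that is both awaitable and an async context manager, which is why `get_connection` can return it un-awaited while callers write `async with await self.get_connection() as conn:`. A minimal illustration of the same pattern, assuming a local SQLite file:

```python
import asyncio
import aiosqlite

async def main() -> None:
    # aiosqlite.connect() can be used directly as an async context manager.
    async with aiosqlite.connect("./data/wiki.db", timeout=30.0) as conn:
        async with conn.execute("SELECT 1") as cursor:
            print(await cursor.fetchone())  # (1,)

asyncio.run(main())
```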
|
@ -1,9 +1,10 @@
|
|||
from datetime import datetime, timezone
|
||||
from typing import Any
|
||||
|
||||
import aiosqlite
|
||||
import structlog
|
||||
|
||||
from ..models import Article, ArticleCreate, ProcessingStatus
|
||||
from ..models.article_dto import ArticleDTO, ArticleStatus
|
||||
from .database import DatabaseService
|
||||
|
||||
logger = structlog.get_logger()
|
||||
|
@ -15,19 +16,21 @@ class ArticleRepository:
|
|||
self.db_service = db_service
|
||||
self.logger = structlog.get_logger().bind(repository="article")
|
||||
|
||||
async def create_article(self, article_data: ArticleCreate) -> Article:
|
||||
existing = await self.get_by_url(article_data.url)
|
||||
async def create_article(self, url: str, title: str, raw_text: str) -> ArticleDTO:
|
||||
existing = await self.get_by_url(url)
|
||||
if existing:
|
||||
raise ValueError(f"Статья с URL {article_data.url} уже существует")
|
||||
raise ValueError(f"Статья с URL {url} уже существует")
|
||||
|
||||
article = Article(
|
||||
url=article_data.url,
|
||||
title=article_data.title,
|
||||
raw_text=article_data.raw_text,
|
||||
status=ProcessingStatus.PENDING,
|
||||
article = ArticleDTO(
|
||||
url=url,
|
||||
title=title,
|
||||
raw_text=raw_text,
|
||||
status=ArticleStatus.PENDING,
|
||||
created_at=datetime.now(timezone.utc),
|
||||
)
|
||||
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(
|
||||
"""
|
||||
INSERT INTO articles (url, title, raw_text, status, created_at)
|
||||
|
@ -48,8 +51,9 @@ class ArticleRepository:
|
|||
self.logger.info("Статья создана", article_id=article.id, url=article.url)
|
||||
return article
|
||||
|
||||
async def get_by_id(self, article_id: int) -> Article | None:
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async def get_by_id(self, article_id: int) -> ArticleDTO | None:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(
|
||||
"SELECT * FROM articles WHERE id = ?",
|
||||
(article_id,),
|
||||
|
@ -59,10 +63,11 @@ class ArticleRepository:
|
|||
if not row:
|
||||
return None
|
||||
|
||||
return self._row_to_article(row)
|
||||
return self._row_to_article_dto(row)
|
||||
|
||||
async def get_by_url(self, url: str) -> Article | None:
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async def get_by_url(self, url: str) -> ArticleDTO | None:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(
|
||||
"SELECT * FROM articles WHERE url = ?",
|
||||
(url,),
|
||||
|
@ -72,13 +77,16 @@ class ArticleRepository:
|
|||
if not row:
|
||||
return None
|
||||
|
||||
return self._row_to_article(row)
|
||||
return self._row_to_article_dto(row)
|
||||
|
||||
async def update_article(self, article: Article) -> Article:
|
||||
async def update_article(self, article: ArticleDTO) -> ArticleDTO:
|
||||
if not article.id:
|
||||
raise ValueError("ID статьи не может быть None для обновления")
|
||||
|
||||
async with self.db_service.get_connection() as conn:
|
||||
article.updated_at = datetime.now(timezone.utc)
|
||||
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(
|
||||
"""
|
||||
UPDATE articles SET
|
||||
|
@ -86,10 +94,6 @@ class ArticleRepository:
|
|||
raw_text = ?,
|
||||
simplified_text = ?,
|
||||
status = ?,
|
||||
error_message = ?,
|
||||
token_count_raw = ?,
|
||||
token_count_simplified = ?,
|
||||
processing_time_seconds = ?,
|
||||
updated_at = ?
|
||||
WHERE id = ?
|
||||
""",
|
||||
|
@ -98,10 +102,6 @@ class ArticleRepository:
|
|||
article.raw_text,
|
||||
article.simplified_text,
|
||||
article.status.value,
|
||||
article.error_message,
|
||||
article.token_count_raw,
|
||||
article.token_count_simplified,
|
||||
article.processing_time_seconds,
|
||||
article.updated_at,
|
||||
article.id,
|
||||
),
|
||||
|
@ -115,8 +115,8 @@ class ArticleRepository:
|
|||
return article
|
||||
|
||||
async def get_articles_by_status(
|
||||
self, status: ProcessingStatus, limit: int | None = None
|
||||
) -> list[Article]:
|
||||
self, status: ArticleStatus, limit: int | None = None
|
||||
) -> list[ArticleDTO]:
|
||||
query = "SELECT * FROM articles WHERE status = ?"
|
||||
params: tuple[Any, ...] = (status.value,)
|
||||
|
||||
|
@ -124,17 +124,19 @@ class ArticleRepository:
|
|||
query += " LIMIT ?"
|
||||
params = params + (limit,)
|
||||
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(query, params)
|
||||
rows = await cursor.fetchall()
|
||||
|
||||
return [self._row_to_article(row) for row in rows]
|
||||
return [self._row_to_article_dto(row) for row in rows]
|
||||
|
||||
async def get_pending_articles(self, limit: int | None = None) -> list[Article]:
|
||||
return await self.get_articles_by_status(ProcessingStatus.PENDING, limit)
|
||||
async def get_pending_articles(self, limit: int | None = None) -> list[ArticleDTO]:
|
||||
return await self.get_articles_by_status(ArticleStatus.PENDING, limit)
|
||||
|
||||
async def count_by_status(self, status: ProcessingStatus) -> int:
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async def count_by_status(self, status: ArticleStatus) -> int:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(
|
||||
"SELECT COUNT(*) FROM articles WHERE status = ?",
|
||||
(status.value,),
|
||||
|
@ -143,7 +145,7 @@ class ArticleRepository:
|
|||
|
||||
return result[0] if result else 0
|
||||
|
||||
async def get_all_articles(self, limit: int | None = None, offset: int = 0) -> list[Article]:
|
||||
async def get_all_articles(self, limit: int | None = None, offset: int = 0) -> list[ArticleDTO]:
|
||||
query = "SELECT * FROM articles ORDER BY created_at DESC"
|
||||
params: tuple[Any, ...] = ()
|
||||
|
||||
|
@ -151,14 +153,16 @@ class ArticleRepository:
|
|||
query += " LIMIT ? OFFSET ?"
|
||||
params = (limit, offset)
|
||||
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(query, params)
|
||||
rows = await cursor.fetchall()
|
||||
|
||||
return [self._row_to_article(row) for row in rows]
|
||||
return [self._row_to_article_dto(row) for row in rows]
|
||||
|
||||
async def delete_article(self, article_id: int) -> bool:
|
||||
async with self.db_service.get_connection() as conn:
|
||||
async with await self.db_service.get_connection() as conn:
|
||||
conn.row_factory = aiosqlite.Row
|
||||
cursor = await conn.execute(
|
||||
"DELETE FROM articles WHERE id = ?",
|
||||
(article_id,),
|
||||
|
@ -171,18 +175,14 @@ class ArticleRepository:
|
|||
|
||||
return deleted
|
||||
|
||||
def _row_to_article(self, row: aiosqlite.Row) -> Article:
|
||||
return Article(
|
||||
def _row_to_article_dto(self, row: aiosqlite.Row) -> ArticleDTO:
|
||||
return ArticleDTO(
|
||||
id=row["id"],
|
||||
url=row["url"],
|
||||
title=row["title"],
|
||||
raw_text=row["raw_text"],
|
||||
simplified_text=row["simplified_text"],
|
||||
status=ProcessingStatus(row["status"]),
|
||||
error_message=row["error_message"],
|
||||
token_count_raw=row["token_count_raw"],
|
||||
token_count_simplified=row["token_count_simplified"],
|
||||
processing_time_seconds=row["processing_time_seconds"],
|
||||
created_at=row["created_at"],
|
||||
updated_at=row["updated_at"],
|
||||
status=ArticleStatus(row["status"]),
|
||||
created_at=datetime.fromisoformat(row["created_at"]) if row["created_at"] else None,
|
||||
updated_at=datetime.fromisoformat(row["updated_at"]) if row["updated_at"] else None,
|
||||
)
|
||||
|
|
|
@ -8,7 +8,7 @@ import structlog
|
|||
|
||||
from src.adapters.llm import LLMProviderAdapter, LLMTokenLimitError
|
||||
from src.adapters.ruwiki import RuWikiAdapter
|
||||
from src.models import AppConfig, ArticleCreate, ProcessingResult, SimplifyCommand
|
||||
from src.models import AppConfig, ProcessingResult, SimplifyCommand
|
||||
from src.models.constants import LLM_MAX_INPUT_TOKENS, MAX_TOKEN_LIMIT_WITH_BUFFER
|
||||
from src.services.repository import ArticleRepository
|
||||
from src.services.text_splitter import RecursiveCharacterTextSplitter
|
||||
|
@ -70,7 +70,9 @@ class SimplifyService:
|
|||
page_info = await self.ruwiki_adapter.fetch_page_cleaned(command.url)
|
||||
article = await self._create_or_update_article(command, page_info)
|
||||
|
||||
article.mark_processing()
|
||||
from src.models.article_dto import ArticleStatus
|
||||
|
||||
article.status = ArticleStatus.PENDING
|
||||
await self.repository.update_article(article)
|
||||
|
||||
simplified_text, input_tokens, output_tokens = await self._simplify_article_text(
|
||||
|
@ -78,6 +80,18 @@ class SimplifyService:
|
|||
raw_text=page_info.content,
|
||||
)
|
||||
|
||||
self.logger.info(
|
||||
"Упрощение завершено",
|
||||
url=command.url,
|
||||
simplified_length=len(simplified_text),
|
||||
input_tokens=input_tokens,
|
||||
output_tokens=output_tokens,
|
||||
)
|
||||
|
||||
if not simplified_text.strip():
|
||||
self.logger.error("Получен пустой simplified_text!", url=command.url)
|
||||
raise ValueError("Упрощение привело к пустому результату")
|
||||
|
||||
processing_time = time.time() - start_time
|
||||
result = ProcessingResult.success_result(
|
||||
url=command.url,
|
||||
|
@ -89,7 +103,9 @@ class SimplifyService:
|
|||
processing_time_seconds=processing_time,
|
||||
)
|
||||
|
||||
self.logger.info("Отправляем результат в write_queue...", url=command.url)
|
||||
await self.write_queue.update_from_result(result)
|
||||
self.logger.info("Результат успешно записан в write_queue", url=command.url)
|
||||
|
||||
self.logger.info(
|
||||
"Статья успешно обработана",
|
||||
|
@ -111,21 +127,19 @@ class SimplifyService:
|
|||
title=existing_article.title,
|
||||
raw_text=existing_article.raw_text,
|
||||
simplified_text=existing_article.simplified_text,
|
||||
token_count_raw=existing_article.token_count_raw or 0,
|
||||
token_count_simplified=existing_article.token_count_simplified or 0,
|
||||
processing_time_seconds=existing_article.processing_time_seconds or 0,
|
||||
token_count_raw=0,
|
||||
token_count_simplified=0,
|
||||
processing_time_seconds=0,
|
||||
)
|
||||
return None
|
||||
|
||||
async def _create_or_update_article(self, command, page_info):
|
||||
article_data = ArticleCreate(
|
||||
url=command.url,
|
||||
title=page_info.title,
|
||||
raw_text=page_info.content,
|
||||
)
|
||||
|
||||
try:
|
||||
return await self.repository.create_article(article_data)
|
||||
return await self.repository.create_article(
|
||||
url=command.url,
|
||||
title=page_info.title,
|
||||
raw_text=page_info.content,
|
||||
)
|
||||
except ValueError:
|
||||
article = await self.repository.get_by_url(command.url)
|
||||
if not article:
|
||||
|
@ -133,9 +147,11 @@ class SimplifyService:
|
|||
raise ValueError(msg) from None
|
||||
|
||||
if command.force_reprocess:
|
||||
from src.models.article_dto import ArticleStatus
|
||||
|
||||
article.title = page_info.title
|
||||
article.raw_text = page_info.content
|
||||
article.mark_processing()
|
||||
article.status = ArticleStatus.PENDING
|
||||
await self.repository.update_article(article)
|
||||
|
||||
return article
|
||||
|
@ -163,8 +179,8 @@ class SimplifyService:
|
|||
async def _simplify_article_text(self, title: str, raw_text: str) -> tuple[str, int, int]:
|
||||
prompt_template = await self.get_prompt_template()
|
||||
text_tokens = self.llm_adapter.count_tokens(raw_text)
|
||||
|
||||
if text_tokens <= self.config.chunk_size:
|
||||
max_input_size = LLM_MAX_INPUT_TOKENS - len(prompt_template) - 1000
|
||||
if text_tokens <= max_input_size:
|
||||
return await self.llm_adapter.simplify_text(title, raw_text, prompt_template)
|
||||
|
||||
return await self._process_long_text(title, raw_text, prompt_template)
|
||||
|
@ -230,7 +246,9 @@ class SimplifyService:
|
|||
"Объединённый текст превышает лимит, обрезаем",
|
||||
final_tokens=final_tokens,
|
||||
)
|
||||
combined_text = self._truncate_to_token_limit(combined_text, 1000)
|
||||
combined_text = self._truncate_to_token_limit(
|
||||
combined_text, MAX_TOKEN_LIMIT_WITH_BUFFER
|
||||
)
|
||||
total_output_tokens = self.llm_adapter.count_tokens(combined_text)
|
||||
|
||||
return combined_text, total_input_tokens, total_output_tokens
|
||||
|
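Articles whose token count exceeds the input budget are split into overlapping chunks, simplified chunk by chunk, and the results are concatenated (`_process_long_text` with the project's `RecursiveCharacterTextSplitter`). The splitter below is a deliberately naive character-based stand-in, not the real implementation, just to show what CHUNK_SIZE and CHUNK_OVERLAP mean:

```python
def split_with_overlap(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    chunks, start = [], 0
    step = chunk_size - chunk_overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

parts = split_with_overlap("токен " * 2000, chunk_size=2000, chunk_overlap=200)
print(len(parts), "chunks, first chunk length", len(parts[0]))
```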
|
|
@ -5,7 +5,8 @@ from dataclasses import dataclass, field
|
|||
|
||||
import structlog
|
||||
|
||||
from src.models import Article, ProcessingResult
|
||||
from src.models import ProcessingResult
|
||||
from src.models.article_dto import ArticleDTO
|
||||
from src.models.constants import WRITE_QUEUE_BATCH_SIZE
|
||||
from src.services.repository import ArticleRepository
|
||||
|
||||
|
@ -13,9 +14,9 @@ from src.services.repository import ArticleRepository
|
|||
@dataclass
|
||||
class WriteOperation:
|
||||
operation_type: str
|
||||
article: Article | None = None
|
||||
article: ArticleDTO | None = None
|
||||
result: ProcessingResult | None = None
|
||||
future: asyncio.Future[Article] | None = field(default=None, init=False)
|
||||
future: asyncio.Future[ArticleDTO] | None = field(default=None, init=False)
|
||||
|
||||
|
||||
class AsyncWriteQueue:
|
||||
|
@ -56,15 +57,17 @@ class AsyncWriteQueue:
|
|||
|
||||
self.logger.info("Write queue остановлена")
|
||||
|
||||
async def update_article(self, article: Article) -> None:
|
||||
async def update_article(self, article: ArticleDTO) -> None:
|
||||
operation = WriteOperation(
|
||||
operation_type="update",
|
||||
article=article,
|
||||
)
|
||||
await self._queue.put(operation)
|
||||
|
||||
async def update_from_result(self, result: ProcessingResult) -> Article:
|
||||
future: asyncio.Future[Article] = asyncio.Future()
|
||||
async def update_from_result(self, result: ProcessingResult) -> ArticleDTO:
|
||||
self.logger.info("Получен результат для записи", url=result.url, success=result.success)
|
||||
|
||||
future: asyncio.Future[ArticleDTO] = asyncio.Future()
|
||||
|
||||
operation = WriteOperation(
|
||||
operation_type="update_from_result",
|
||||
|
@ -72,15 +75,20 @@ class AsyncWriteQueue:
|
|||
)
|
||||
operation.future = future
|
||||
|
||||
self.logger.info("Добавляем операцию в очередь", url=result.url)
|
||||
await self._queue.put(operation)
|
||||
return await future
|
||||
self.logger.info("Операция добавлена в очередь, ожидаем результат", url=result.url)
|
||||
|
||||
result_article = await future
|
||||
self.logger.info("Получен результат из очереди", url=result.url)
|
||||
return result_article
|
||||
|
||||
async def _worker_loop(self) -> None:
|
||||
batch: list[WriteOperation] = []
|
||||
|
||||
while not self._shutdown_event.is_set():
|
||||
batch = await self._collect_batch(batch)
|
||||
if batch and (len(batch) >= self.max_batch_size or self._shutdown_event.is_set()):
|
||||
if batch:
|
||||
await self._process_batch(batch)
|
||||
batch.clear()
|
||||
|
||||
|
@ -89,7 +97,7 @@ class AsyncWriteQueue:
|
|||
|
||||
async def _collect_batch(self, batch: list[WriteOperation]) -> list[WriteOperation]:
|
||||
try:
|
||||
timeout = 0.1 if batch else 1.0
|
||||
timeout = 1.0 if not batch else 0.1
|
||||
operation = await asyncio.wait_for(self._queue.get(), timeout=timeout)
|
||||
batch.append(operation)
|
||||
return batch
|
||||
|
@ -116,12 +124,22 @@ class AsyncWriteQueue:
|
|||
|
||||
async def _process_operation_safely(self, operation: WriteOperation) -> None:
|
||||
try:
|
||||
self.logger.info(
|
||||
"Начинаем обработку операции",
|
||||
operation_type=operation.operation_type,
|
||||
url=operation.result.url if operation.result else "N/A",
|
||||
)
|
||||
|
||||
await self._process_single_operation(operation)
|
||||
self._total_operations += 1
|
||||
|
||||
if operation.future and not operation.future.done():
|
||||
if operation.operation_type == "update_from_result" and operation.result:
|
||||
self.logger.info("Получаем статью из репозитория", url=operation.result.url)
|
||||
article = await self.repository.get_by_url(operation.result.url)
|
||||
self.logger.info(
|
||||
"Статья получена, устанавливаем результат", url=operation.result.url
|
||||
)
|
||||
operation.future.set_result(article)
|
||||
|
||||
except Exception as e:
|
||||
|
@ -143,27 +161,35 @@ class AsyncWriteQueue:
|
|||
msg = f"Неизвестный тип операции: {operation.operation_type}"
|
||||
raise ValueError(msg)
|
||||
|
||||
async def _update_article_from_result(self, result: ProcessingResult) -> Article:
|
||||
async def _update_article_from_result(self, result: ProcessingResult) -> ArticleDTO:
|
||||
self.logger.info("Начинаем обновление статьи из результата", url=result.url)
|
||||
|
||||
article = await self.repository.get_by_url(result.url)
|
||||
if not article:
|
||||
msg = f"Статья с URL {result.url} не найдена"
|
||||
raise ValueError(msg)
|
||||
|
||||
self.logger.info("Статья найдена, обновляем поля", url=result.url, success=result.success)
|
||||
|
||||
if result.success:
|
||||
if not (result.title and result.raw_text and result.simplified_text):
|
||||
msg = "Неполные данные в успешном результате"
|
||||
raise ValueError(msg)
|
||||
|
||||
article.mark_completed(
|
||||
simplified_text=result.simplified_text,
|
||||
token_count_raw=result.token_count_raw or 0,
|
||||
token_count_simplified=result.token_count_simplified or 0,
|
||||
processing_time=result.processing_time_seconds or 0,
|
||||
)
|
||||
else:
|
||||
article.mark_failed(result.error_message or "Неизвестная ошибка")
|
||||
from src.models.article_dto import ArticleStatus
|
||||
|
||||
return await self.repository.update_article(article)
|
||||
article.simplified_text = result.simplified_text
|
||||
article.status = ArticleStatus.SIMPLIFIED
|
||||
else:
|
||||
from src.models.article_dto import ArticleStatus
|
||||
|
||||
article.status = ArticleStatus.FAILED
|
||||
|
||||
self.logger.info("Сохраняем обновлённую статью", url=result.url)
|
||||
updated_article = await self.repository.update_article(article)
|
||||
self.logger.info("Статья успешно обновлена", url=result.url)
|
||||
|
||||
return updated_article
|
||||
|
||||
@property
|
||||
def queue_size(self) -> int:
|
||||
|
|
|
@ -63,22 +63,55 @@ class FileSource:
|
|||
|
||||
def _is_valid_wikipedia_url(self, url: str) -> bool:
|
||||
try:
|
||||
self.logger.info("Начинаем проверку URL", raw_url=url)
|
||||
|
||||
parsed = urlparse(url)
|
||||
self.logger.info(
|
||||
"Разобранный URL", scheme=parsed.scheme, netloc=parsed.netloc, path=parsed.path
|
||||
)
|
||||
|
||||
if parsed.scheme not in ("http", "https"):
|
||||
self.logger.info("Отклонено: неподдерживаемая схема", scheme=parsed.scheme, url=url)
|
||||
return False
|
||||
|
||||
if "wikipedia.org" not in parsed.netloc:
|
||||
if "ruwiki" not in parsed.netloc:
|
||||
self.logger.info(
|
||||
"Отклонено: домен не содержит 'ruwiki'", netloc=parsed.netloc, url=url
|
||||
)
|
||||
return False
|
||||
|
||||
path_parts = parsed.path.split("/")
|
||||
if len(path_parts) < MIN_WIKI_PATH_PARTS or path_parts[WIKI_PATH_INDEX] != "wiki":
|
||||
self.logger.info("Части пути", path_parts=path_parts)
|
||||
|
||||
if len(path_parts) < MIN_WIKI_PATH_PARTS:
|
||||
self.logger.info(
|
||||
"Отклонено: слишком мало сегментов в пути", parts=path_parts, url=url
|
||||
)
|
||||
return False
|
||||
|
||||
if path_parts[WIKI_PATH_INDEX] != "wiki":
|
||||
self.logger.info(
|
||||
"Отклонено: неверный сегмент пути",
|
||||
expected="wiki",
|
||||
actual=path_parts[WIKI_PATH_INDEX],
|
||||
url=url,
|
||||
)
|
||||
return False
|
||||
|
||||
article_name = path_parts[ARTICLE_NAME_INDEX]
|
||||
return bool(article_name and article_name not in ("Main_Page", "Заглавная_страница"))
|
||||
self.logger.info("Извлечено имя статьи", article_name=article_name, url=url)
|
||||
|
||||
except Exception:
|
||||
if not article_name or article_name in ("Main_Page", "Заглавная_страница"):
|
||||
self.logger.info(
|
||||
"Отклонено: некорректное имя статьи", article_name=article_name, url=url
|
||||
)
|
||||
return False
|
||||
|
||||
self.logger.info("URL прошёл все проверки", url=url)
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
self.logger.info("Ошибка при проверке URL", error=str(e), url=url)
|
||||
return False
|
||||
|
||||
async def count_urls(self) -> int:
|
||||
|
|
|
@ -1,19 +1,64 @@
|
|||
import asyncio
|
||||
import tempfile
|
||||
from collections.abc import Generator
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from unittest.mock import AsyncMock, MagicMock
|
||||
from typing import AsyncGenerator
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
import pytest
|
||||
from openai.types.chat import ChatCompletion, ChatCompletionMessage
|
||||
from openai.types.chat.chat_completion import Choice
|
||||
import pytest_asyncio
|
||||
import structlog
|
||||
import logging
|
||||
|
||||
from src.models import AppConfig, Article, ArticleCreate, ProcessingStatus
|
||||
from src.models import AppConfig
|
||||
from src.models.article_dto import ArticleDTO, ArticleStatus
|
||||
from src.services import ArticleRepository, DatabaseService
|
||||
|
||||
|
||||
def level_to_int(logger, method_name, event_dict):
|
||||
if isinstance(event_dict.get("level"), str):
|
||||
try:
|
||||
event_dict["level"] = getattr(logging, event_dict["level"].upper())
|
||||
except Exception:
|
||||
pass
|
||||
return event_dict
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True, scope="session")
|
||||
def configure_structlog():
|
||||
import tenacity
|
||||
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
structlog.configure(
|
||||
processors=[
|
||||
level_to_int,
|
||||
structlog.processors.TimeStamper(fmt="iso"),
|
||||
structlog.dev.ConsoleRenderer(),
|
||||
],
|
||||
wrapper_class=structlog.make_filtering_bound_logger(logging.DEBUG),
|
||||
)
|
||||
tenacity.logger = structlog.get_logger("tenacity")
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True, scope="session")
|
||||
def patch_tenacity_before_sleep_log():
|
||||
import logging
|
||||
import tenacity
|
||||
from tenacity.before_sleep import before_sleep_log
|
||||
|
||||
original_before_sleep_log = tenacity.before_sleep_log
|
||||
|
||||
def patched_before_sleep_log(logger, log_level):
|
||||
if isinstance(log_level, str):
|
||||
log_level = getattr(logging, log_level.upper(), logging.WARNING)
|
||||
return original_before_sleep_log(logger, log_level)
|
||||
|
||||
tenacity.before_sleep_log = patched_before_sleep_log
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def event_loop() -> Generator[asyncio.AbstractEventLoop, None, None]:
|
||||
"""Создать event loop для всей сессии тестов."""
|
||||
loop = asyncio.new_event_loop()
|
||||
yield loop
|
||||
loop.close()
|
||||
|
@ -21,7 +66,6 @@ def event_loop() -> Generator[asyncio.AbstractEventLoop, None, None]:
|
|||
|
||||
@pytest.fixture
|
||||
def test_config() -> AppConfig:
|
||||
"""Тестовая конфигурация."""
|
||||
with tempfile.TemporaryDirectory() as temp_dir:
|
||||
db_path = Path(temp_dir) / "test.db"
|
||||
return AppConfig(
|
||||
|
@ -37,24 +81,35 @@ def test_config() -> AppConfig:
|
|||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_wiki_urls() -> list[str]:
|
||||
"""Список тестовых URL википедии."""
|
||||
return [
|
||||
"https://ru.wikipedia.org/wiki/Тест",
|
||||
"https://ru.wikipedia.org/wiki/Пример",
|
||||
"https://ru.wikipedia.org/wiki/Образец",
|
||||
]
|
||||
def mock_openai_response():
|
||||
mock_response = MagicMock()
|
||||
mock_response.choices = [MagicMock()]
|
||||
mock_response.choices[0].message.content = "Упрощённый текст для школьников"
|
||||
mock_response.usage.prompt_tokens = 100
|
||||
mock_response.usage.completion_tokens = 50
|
||||
mock_response.__await__ = lambda: iter([mock_response])
|
||||
return mock_response
|
||||
|
||||
|
||||
@pytest_asyncio.fixture
|
||||
async def database_service(test_config: AppConfig) -> AsyncGenerator[DatabaseService, None]:
|
||||
service = DatabaseService(test_config)
|
||||
await service.initialize_database()
|
||||
yield service
|
||||
|
||||
|
||||
@pytest_asyncio.fixture
|
||||
async def repository(database_service: DatabaseService) -> AsyncGenerator[ArticleRepository, None]:
|
||||
repo = ArticleRepository(database_service)
|
||||
yield repo
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def invalid_urls() -> list[str]:
|
||||
"""Список невалидных URL."""
|
||||
def sample_wiki_urls() -> list[str]:
|
||||
return [
|
||||
"https://example.com/invalid",
|
||||
"https://en.wikipedia.org/wiki/English",
|
||||
"not_a_url",
|
||||
"",
|
||||
"https://ru.wikipedia.org/wiki/",
|
||||
"https://ru.ruwiki.ru/wiki/Тест",
|
||||
"https://ru.ruwiki.ru/wiki/Пример",
|
||||
"https://ru.ruwiki.ru/wiki/Образец",
|
||||
]
|
||||
|
||||
|
||||
|
@ -69,110 +124,67 @@ def sample_wikitext() -> str:
|
|||
* Проверка качества
|
||||
|
||||
== История ==
|
||||
Тесты использовались с древних времён.
|
||||
|
||||
{{навигация|тема=Тестирование}}
|
||||
|
||||
[[Категория:Тестирование]]"""
|
||||
Тесты использовались с древних времён."""
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def simplified_text() -> str:
|
||||
return """'''Тест''' — это проверка чего-либо для школьников.
|
||||
return """Тест — это проверка чего-либо для школьников.
|
||||
|
||||
== Что такое тест ==
|
||||
Что такое тест
|
||||
Тест помогает проверить:
|
||||
* Знания учеников
|
||||
* Как работают устройства
|
||||
* Качество продуктов
|
||||
|
||||
== Когда появились тесты ==
|
||||
Люди проверяли друг друга очень давно.
|
||||
|
||||
###END###"""
|
||||
Когда появились тесты
|
||||
Люди проверяли друг друга очень давно."""
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_article_data() -> ArticleCreate:
|
||||
return ArticleCreate(
|
||||
url="https://ru.wikipedia.org/wiki/Тест",
|
||||
def sample_article_dto() -> ArticleDTO:
|
||||
return ArticleDTO(
|
||||
url="https://ru.ruwiki.ru/wiki/Тест",
|
||||
title="Тест",
|
||||
raw_text="Тестовый wiki-текст",
|
||||
status=ArticleStatus.PENDING,
|
||||
created_at=datetime.now(timezone.utc),
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_article(sample_article_data: ArticleCreate) -> Article:
|
||||
return Article(
|
||||
id=1,
|
||||
url=sample_article_data.url,
|
||||
title=sample_article_data.title,
|
||||
raw_text=sample_article_data.raw_text,
|
||||
status=ProcessingStatus.PENDING,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def completed_article(sample_article: Article, simplified_text: str) -> Article:
|
||||
article = sample_article.model_copy()
|
||||
article.mark_completed(
|
||||
simplified_text=simplified_text,
|
||||
token_count_raw=100,
|
||||
token_count_simplified=50,
|
||||
processing_time=2.5,
|
||||
)
|
||||
return article
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_openai_response() -> ChatCompletion:
|
||||
return ChatCompletion(
|
||||
id="test_completion",
|
||||
object="chat.completion",
|
||||
created=1234567890,
|
||||
model="gpt-4o-mini",
|
||||
choices=[
|
||||
Choice(
|
||||
index=0,
|
||||
message=ChatCompletionMessage(
|
||||
role="assistant",
|
||||
content="Упрощённый текст для школьников.\n\n###END###",
|
||||
),
|
||||
finish_reason="stop",
|
||||
)
|
||||
],
|
||||
usage=None,
|
||||
@pytest_asyncio.fixture
|
||||
async def sample_article_in_db(
|
||||
repository: ArticleRepository, sample_article_dto: ArticleDTO
|
||||
) -> AsyncGenerator[ArticleDTO, None]:
|
||||
article = await repository.create_article(
|
||||
url=sample_article_dto.url,
|
||||
title=sample_article_dto.title,
|
||||
raw_text=sample_article_dto.raw_text,
|
||||
)
|
||||
yield article
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_input_file(sample_wiki_urls: list[str]) -> Generator[str, None, None]:
|
||||
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
|
||||
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False, encoding="utf-8") as f:
|
||||
for url in sample_wiki_urls:
|
||||
f.write(f"{url}\n")
|
||||
f.write("# Комментарий\n")
|
||||
f.write("\n")
|
||||
f.write("https://ru.wikipedia.org/wiki/Дубликат\n")
|
||||
f.write("https://ru.wikipedia.org/wiki/Дубликат\n")
|
||||
temp_path = f.name
|
||||
|
||||
yield temp_path
|
||||
|
||||
Path(temp_path).unlink(missing_ok=True)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def mock_wiki_client() -> AsyncMock:
|
||||
mock_client = AsyncMock()
|
||||
mock_page = MagicMock()
|
||||
mock_page.exists = True
|
||||
mock_page.redirect = False
|
||||
mock_page.text.return_value = "Тестовый wiki-текст"
|
||||
mock_client.pages = {"Тест": mock_page}
|
||||
return mock_client
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def mock_openai_client() -> AsyncMock:
|
||||
mock_client = AsyncMock()
|
||||
return mock_client
|
||||
@pytest_asyncio.fixture
|
||||
async def multiple_articles_in_db(
|
||||
repository: ArticleRepository, sample_wiki_urls: list[str]
|
||||
) -> AsyncGenerator[list[ArticleDTO], None]:
|
||||
articles = []
|
||||
for i, url in enumerate(sample_wiki_urls):
|
||||
article = await repository.create_article(
|
||||
url=url,
|
||||
title=f"Test Article {i+1}",
|
||||
raw_text=f"Content for article {i+1}",
|
||||
)
|
||||
articles.append(article)
|
||||
yield articles
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
import asyncio
|
||||
import time
|
||||
from unittest.mock import AsyncMock, patch
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
from openai import APIError, RateLimitError
|
||||
|
@ -100,15 +100,13 @@ class TestRuWikiAdapter:
|
|||
def test_extract_title_from_url(self):
|
||||
adapter = RuWikiAdapter
|
||||
|
||||
title = adapter.extract_title_from_url("https://ru.wikipedia.org/wiki/Тест")
|
||||
title = adapter.extract_title_from_url("https://ru.ruwiki.ru/wiki/Тест")
|
||||
assert title == "Тест"
|
||||
|
||||
title = adapter.extract_title_from_url("https://ru.wikipedia.org/wiki/Тест_статья")
|
||||
title = adapter.extract_title_from_url("https://ru.ruwiki.ru/wiki/Тест_статья")
|
||||
assert title == "Тест статья"
|
||||
|
||||
title = adapter.extract_title_from_url(
|
||||
"https://ru.wikipedia.org/wiki/%D0%A2%D0%B5%D1%81%D1%82"
|
||||
)
|
||||
title = adapter.extract_title_from_url("https://ru.ruwiki.ru/wiki/%D0%A2%D0%B5%D1%81%D1%82")
|
||||
assert title == "Тест"
|
||||
|
||||
def test_extract_title_invalid_url(self):
|
||||
|
@ -118,10 +116,9 @@ class TestRuWikiAdapter:
|
|||
adapter.extract_title_from_url("https://example.com/invalid")
|
||||
|
||||
with pytest.raises(ValueError):
|
||||
adapter.extract_title_from_url("https://ru.wikipedia.org/invalid")
|
||||
adapter.extract_title_from_url("https://ru.ruwiki.ru/invalid")
|
||||
|
||||
def test_clean_wikitext(self, test_config, sample_wikitext):
|
||||
"""Тест очистки wiki-текста."""
|
||||
adapter = RuWikiAdapter(test_config)
|
||||
|
||||
cleaned = adapter._clean_wikitext(sample_wikitext)
|
||||
|
@ -160,7 +157,6 @@ class TestRuWikiAdapter:
|
|||
class TestLLMProviderAdapter:
|
||||
|
||||
def test_count_tokens(self, test_config):
|
||||
"""Тест подсчёта токенов."""
|
||||
adapter = LLMProviderAdapter(test_config)
|
||||
|
||||
count = adapter.count_tokens("Hello world")
|
||||
|
@@ -186,17 +182,24 @@ class TestLLMProviderAdapter:
    @pytest.mark.asyncio
    async def test_simplify_text_token_limit_error(self, test_config):
        adapter = LLMProviderAdapter(test_config)

        long_text = "word " * 2000

        with pytest.raises(LLMTokenLimitError):
            await adapter.simplify_text("Test", long_text, "template")
        with patch.object(adapter, "_check_rpm_limit"):
            with patch.object(adapter, "count_tokens", return_value=50000):
                with patch.object(
                    adapter,
                    "_make_completion_request",
                    side_effect=LLMTokenLimitError("Token limit exceeded"),
                ):
                    with pytest.raises(LLMTokenLimitError):
                        await adapter.simplify_text("Test", long_text, "template")

    @pytest.mark.asyncio
    async def test_simplify_text_success(self, test_config, mock_openai_response):
        adapter = LLMProviderAdapter(test_config)

        with patch.object(adapter.client.chat.completions, "create") as mock_create:
        with patch.object(
            adapter.client.chat.completions, "create", new_callable=AsyncMock
        ) as mock_create:
            mock_create.return_value = mock_openai_response

            with patch.object(adapter, "_check_rpm_limit"):
@@ -215,29 +218,47 @@ class TestLLMProviderAdapter:

    @pytest.mark.asyncio
    async def test_simplify_text_openai_error(self, test_config):
        from tenacity import AsyncRetrying, before_sleep_log
        import structlog
        import logging

        adapter = LLMProviderAdapter(test_config)

        with patch.object(adapter.client.chat.completions, "create") as mock_create:
            mock_create.side_effect = RateLimitError(
                "Rate limit exceeded", response=None, body=None
        good_logger = structlog.get_logger("tenacity")

        def fixed_before_sleep_log(logger, level):
            if isinstance(level, str):
                level = getattr(logging, level.upper(), logging.WARNING)
            return before_sleep_log(logger, level)

        with patch("src.adapters.base.AsyncRetrying") as mock_retrying:
            mock_retrying.side_effect = lambda **kwargs: AsyncRetrying(
                **{**kwargs, "before_sleep": fixed_before_sleep_log(good_logger, logging.WARNING)}
            )

            with patch.object(adapter, "_check_rpm_limit"):
                with pytest.raises(LLMRateLimitError):
                    await adapter.simplify_text(
                        title="Тест",
                        wiki_text="Тестовый текст",
                        prompt_template="### role: user\n{wiki_source_text}",
                    )
            with patch.object(
                adapter.client.chat.completions, "create", new_callable=AsyncMock
            ) as mock_create:
                mock_response = MagicMock()
                mock_create.side_effect = RateLimitError(
                    "Rate limit exceeded", response=mock_response, body=None
                )
                with patch.object(adapter, "_check_rpm_limit"):
                    with pytest.raises(LLMRateLimitError):
                        await adapter.simplify_text(
                            title="Тест",
                            wiki_text="Тестовый текст",
                            prompt_template="### role: user\n{wiki_source_text}",
                        )
    def test_parse_prompt_template(self, test_config):
        adapter = LLMProviderAdapter(test_config)

        template = """### role: system
Ты помощник.

### role: user
Задание: {task}"""

        messages = adapter._parse_prompt_template(template)
@@ -261,7 +282,9 @@ class TestLLMProviderAdapter:
    async def test_health_check_success(self, test_config, mock_openai_response):
        adapter = LLMProviderAdapter(test_config)

        with patch.object(adapter.client.chat.completions, "create") as mock_create:
        with patch.object(
            adapter.client.chat.completions, "create", new_callable=AsyncMock
        ) as mock_create:
            mock_create.return_value = mock_openai_response

            result = await adapter.health_check()
@@ -270,9 +293,10 @@ class TestLLMProviderAdapter:
    @pytest.mark.asyncio
    async def test_health_check_failure(self, test_config):
        adapter = LLMProviderAdapter(test_config)

        with patch.object(adapter.client.chat.completions, "create") as mock_create:
            mock_create.side_effect = APIError("API Error", response=None, body=None)

        with patch.object(
            adapter.client.chat.completions, "create", new_callable=AsyncMock
        ) as mock_create:
            mock_request = MagicMock()
            mock_create.side_effect = APIError("API Error", body=None, request=mock_request)
            result = await adapter.health_check()
            assert result is False
@@ -1,13 +1,16 @@
import asyncio
import tempfile
from pathlib import Path
from datetime import datetime, timezone
from unittest.mock import AsyncMock, patch

import pytest
import pytest_asyncio

from src.dependency_injection import DependencyContainer
from src.models import ProcessingStatus
from src.models.article_dto import ArticleStatus
from src.sources import FileSource
from src.services import ArticleRepository, DatabaseService


class TestFileSourceIntegration:
@@ -23,7 +26,7 @@ class TestFileSourceIntegration:
        assert len(commands) >= 3

        for command in commands:
            assert command.url.startswith("https://ru.wikipedia.org/wiki/")
            assert command.url.startswith("https://ru.ruwiki.ru/wiki/")
            assert command.force_reprocess is False

    @pytest.mark.asyncio
@@ -44,74 +47,180 @@ class TestFileSourceIntegration:

class TestDatabaseIntegration:

    @pytest.mark.asyncio
    async def test_full_article_lifecycle(self, test_config, sample_article_data):
        container = DependencyContainer(test_config)
    @pytest_asyncio.fixture
    async def clean_database(self, database_service: DatabaseService):
        yield database_service

        try:
            await container.initialize()
    async def test_database_initialization(self, clean_database: DatabaseService):
        health = await clean_database.health_check()
        assert health is True

            repository = container.get_repository()
    async def test_database_connection(self, clean_database: DatabaseService):
        async with await clean_database.get_connection() as conn:
            cursor = await conn.execute("SELECT 1")
            result = await cursor.fetchone()
            assert result[0] == 1

            article = await repository.create_article(sample_article_data)
            assert article.id is not None
            assert article.status == ProcessingStatus.PENDING

            found_article = await repository.get_by_url(sample_article_data.url)
            assert found_article is not None
            assert found_article.id == article.id
class TestRepositoryIntegration:

            article.mark_processing()
            updated_article = await repository.update_article(article)
            assert updated_article.status == ProcessingStatus.PROCESSING
    async def test_create_and_retrieve_article(self, repository: ArticleRepository):
        article = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
        )

            article.mark_completed(
                simplified_text="Упрощённый текст",
                token_count_raw=100,
                token_count_simplified=50,
                processing_time=2.5,
            )
            final_article = await repository.update_article(article)
            assert final_article.status == ProcessingStatus.COMPLETED
            assert final_article.simplified_text == "Упрощённый текст"
        assert article.id is not None
        assert article.url == "https://ru.ruwiki.ru/wiki/Test"
        assert article.title == "Test Article"
        assert article.status == ArticleStatus.PENDING

            completed_count = await repository.count_by_status(ProcessingStatus.COMPLETED)
            assert completed_count == 1
        retrieved = await repository.get_by_id(article.id)
        assert retrieved is not None
        assert retrieved.url == article.url
        assert retrieved.title == article.title

        finally:
            await container.cleanup()
        retrieved_by_url = await repository.get_by_url(article.url)
        assert retrieved_by_url is not None
        assert retrieved_by_url.id == article.id

    @pytest.mark.asyncio
    async def test_write_queue_integration(self, test_config, sample_article_data):
        container = DependencyContainer(test_config)
    async def test_update_article(self, repository: ArticleRepository):
        article = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
        )

        try:
            await container.initialize()
        article.status = ArticleStatus.SIMPLIFIED
        article.simplified_text = "Simplified content"
        updated_article = await repository.update_article(article)

            repository = container.get_repository()
            write_queue = container.get_write_queue()
        assert updated_article.status == ArticleStatus.SIMPLIFIED
        assert updated_article.simplified_text == "Simplified content"
        assert updated_article.updated_at is not None

            article = await repository.create_article(sample_article_data)
        retrieved = await repository.get_by_id(article.id)
        assert retrieved.status == ArticleStatus.SIMPLIFIED
        assert retrieved.simplified_text == "Simplified content"

            from src.models import ProcessingResult
    async def test_get_articles_by_status(self, repository: ArticleRepository):
        article1 = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test1",
            title="Test 1",
            raw_text="Content 1",
        )

            result = ProcessingResult.success_result(
                url=article.url,
                title=article.title,
                raw_text=article.raw_text,
                simplified_text="Упрощённый текст",
                token_count_raw=100,
                token_count_simplified=50,
                processing_time_seconds=2.0,
        article2 = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test2",
            title="Test 2",
            raw_text="Content 2",
        )

        article2.status = ArticleStatus.SIMPLIFIED
        await repository.update_article(article2)
        pending_articles = await repository.get_articles_by_status(ArticleStatus.PENDING)
        assert len(pending_articles) == 1
        assert pending_articles[0].id == article1.id

        simplified_articles = await repository.get_articles_by_status(ArticleStatus.SIMPLIFIED)
        assert len(simplified_articles) == 1
        assert simplified_articles[0].id == article2.id

    async def test_count_by_status(self, repository: ArticleRepository):
        count = await repository.count_by_status(ArticleStatus.PENDING)
        assert count == 0

        await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test1",
            title="Test 1",
            raw_text="Content 1",
        )
        await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test2",
            title="Test 2",
            raw_text="Content 2",
        )

        pending_count = await repository.count_by_status(ArticleStatus.PENDING)
        assert pending_count == 2

        simplified_count = await repository.count_by_status(ArticleStatus.SIMPLIFIED)
        assert simplified_count == 0

    async def test_duplicate_url_prevention(self, repository: ArticleRepository):
        await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
        )

        with pytest.raises(ValueError, match="уже существует"):
            await repository.create_article(
                url="https://ru.ruwiki.ru/wiki/Test",
                title="Duplicate Article",
                raw_text="Different content",
            )

            updated_article = await write_queue.update_from_result(result)
    async def test_get_all_articles_pagination(self, repository: ArticleRepository):
        urls = [f"https://ru.ruwiki.ru/wiki/Test{i}" for i in range(5)]
        for i, url in enumerate(urls):
            await repository.create_article(
                url=url,
                title=f"Test {i}",
                raw_text=f"Content {i}",
            )

            assert updated_article.status == ProcessingStatus.COMPLETED
            assert updated_article.simplified_text == "Упрощённый текст"
        articles = await repository.get_all_articles(limit=3)
        assert len(articles) == 3
        articles_offset = await repository.get_all_articles(limit=2, offset=2)
        assert len(articles_offset) == 2

        finally:
            await container.cleanup()
        first_two = await repository.get_all_articles(limit=2, offset=0)
        assert articles_offset[0].id != first_two[0].id
        assert articles_offset[0].id != first_two[1].id

    async def test_delete_article(self, repository: ArticleRepository):
        article = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
        )

        deleted = await repository.delete_article(article.id)
        assert deleted is True

        retrieved = await repository.get_by_id(article.id)
        assert retrieved is None

        deleted_again = await repository.delete_article(article.id)
        assert deleted_again is False


class TestAsyncOperations:

    async def test_concurrent_article_creation(self, repository: ArticleRepository):
        async def create_article(i: int):
            return await repository.create_article(
                url=f"https://ru.ruwiki.ru/wiki/Test{i}",
                title=f"Test {i}",
                raw_text=f"Content {i}",
            )

        tasks = [create_article(i) for i in range(5)]
        articles = await asyncio.gather(*tasks)

        assert len(articles) == 5

        ids = [article.id for article in articles]
        assert len(set(ids)) == 5

    async def test_concurrent_read_operations(self, multiple_articles_in_db):
        articles = multiple_articles_in_db
        repository = articles[0].__class__.__module__

        async def read_article(article_id: int):
            pass


class TestSystemIntegration:
@@ -171,7 +280,9 @@ class TestSystemIntegration:
            mock_llm_instance.simplify_text.return_value = ("Упрощённый текст", 100, 50)
            mock_llm_instance.count_tokens.return_value = 100

            with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
            with tempfile.NamedTemporaryFile(
                mode="w", suffix=".txt", delete=False, encoding="utf-8"
            ) as f:
                f.write("### role: user\n{wiki_source_text}")
                test_config.prompt_template_path = f.name
@@ -254,7 +365,9 @@ class TestSystemIntegration:
            mock_llm_instance.simplify_text.side_effect = delayed_simplify
            mock_llm_instance.count_tokens.return_value = 100

            with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
            with tempfile.NamedTemporaryFile(
                mode="w", suffix=".txt", delete=False, encoding="utf-8"
            ) as f:
                f.write("### role: user\n{wiki_source_text}")
                test_config.prompt_template_path = f.name
@@ -271,7 +384,7 @@ class TestSystemIntegration:

            elapsed_time = time.time() - start_time

            assert elapsed_time < 1.0
            assert elapsed_time < 2.0
            assert stats.total_processed >= 1

        finally:
@@ -1,263 +1,152 @@
from datetime import datetime
from datetime import datetime, timezone

import pytest

from src.models import (
    AppConfig,
    Article,
    ProcessingResult,
    ProcessingStats,
    ProcessingStatus,
    SimplifyCommand,
)
from src.models.article_dto import ArticleDTO, ArticleStatus
from src.models import AppConfig, ProcessingResult, SimplifyCommand


class TestArticleDTO:

    def test_article_dto_creation(self):
        article = ArticleDTO(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
            status=ArticleStatus.PENDING,
            created_at=datetime.now(timezone.utc),
        )

        assert article.url == "https://ru.ruwiki.ru/wiki/Test"
        assert article.title == "Test Article"
        assert article.raw_text == "Test content"
        assert article.status == ArticleStatus.PENDING
        assert article.id is None
        assert article.simplified_text is None
        assert article.updated_at is None

    def test_article_dto_with_optional_fields(self):
        now = datetime.now(timezone.utc)
        article = ArticleDTO(
            id=1,
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
            simplified_text="Simplified content",
            status=ArticleStatus.SIMPLIFIED,
            created_at=now,
            updated_at=now,
        )

        assert article.id == 1
        assert article.simplified_text == "Simplified content"
        assert article.status == ArticleStatus.SIMPLIFIED
        assert article.updated_at == now

    def test_article_status_enum(self):
        assert ArticleStatus.PENDING.value == "pending"
        assert ArticleStatus.SIMPLIFIED.value == "simplified"
        assert ArticleStatus.FAILED.value == "failed"

    def test_article_dto_dataclass_behavior(self):
        article1 = ArticleDTO(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test",
            raw_text="Content",
            status=ArticleStatus.PENDING,
            created_at=datetime.now(timezone.utc),
        )

        article2 = ArticleDTO(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test",
            raw_text="Content",
            status=ArticleStatus.PENDING,
            created_at=article1.created_at,
        )

        assert article1 == article2

        article2.title = "Modified"
        assert article1 != article2


class TestAppConfig:

    def test_default_values(self):
        with pytest.raises(ValueError):
            AppConfig()
    def test_app_config_defaults(self):
        from pathlib import Path

    def test_valid_config(self):
        config = AppConfig(
            openai_api_key="test_key",
            db_path="./test.db",
        )
        import os
        from unittest.mock import patch

        assert config.openai_api_key == "test_key"
        assert config.openai_model == "gpt-4o-mini"
        assert config.openai_temperature == 0.0
        assert config.max_concurrent_llm == 5
        assert config.openai_rpm == 200
        with patch.dict(os.environ, {}, clear=True):
            config = AppConfig(openai_api_key="test-key")

    def test_db_url_generation(self):
        config = AppConfig(
            openai_api_key="test_key",
            db_path="./test.db",
        )

        assert config.db_url == "sqlite+aiosqlite:///test.db"
        assert config.sync_db_url == "sqlite:///test.db"

    def test_validation_constraints(self):
        with pytest.raises(ValueError):
            AppConfig(
                openai_api_key="test_key",
                openai_temperature=3.0,
            )

        with pytest.raises(ValueError):
            AppConfig(
                openai_api_key="test_key",
                max_concurrent_llm=100,
            )


class TestArticle:

    def test_article_creation(self, sample_article_data):
        article = Article(
            url=sample_article_data.url,
            title=sample_article_data.title,
            raw_text=sample_article_data.raw_text,
        )

        assert article.url == sample_article_data.url
        assert article.title == sample_article_data.title
        assert article.status == ProcessingStatus.PENDING
        assert article.simplified_text is None
        assert isinstance(article.created_at, datetime)

    def test_mark_processing(self, sample_article):
        article = sample_article
        original_updated = article.updated_at

        article.mark_processing()

        assert article.status == ProcessingStatus.PROCESSING
        assert article.updated_at != original_updated

    def test_mark_completed(self, sample_article):
        article = sample_article
        simplified_text = "Упрощённый текст"

        article.mark_completed(
            simplified_text=simplified_text,
            token_count_raw=100,
            token_count_simplified=50,
            processing_time=2.5,
        )

        assert article.status == ProcessingStatus.COMPLETED
        assert article.simplified_text == simplified_text
        assert article.token_count_raw == 100
        assert article.token_count_simplified == 50
        assert article.processing_time_seconds == 2.5
        assert article.error_message is None
        assert article.updated_at is not None

    def test_mark_failed(self, sample_article):
        article = sample_article
        error_message = "Тестовая ошибка"

        article.mark_failed(error_message)

        assert article.status == ProcessingStatus.FAILED
        assert article.error_message == error_message
        assert article.updated_at is not None

    def test_mark_failed_long_error(self, sample_article):
        article = sample_article
        long_error = "x" * 1500

        article.mark_failed(long_error)

        assert len(article.error_message) == 1000
        assert article.error_message == "x" * 1000
        assert isinstance(config.db_path, str)
        assert Path(config.db_path).suffix == ".db"
        assert isinstance(config.openai_model, str)
        assert config.openai_model.startswith("gpt")
        assert isinstance(config.chunk_size, int)
        assert config.chunk_size > 0
        assert isinstance(config.chunk_overlap, int)
        assert config.chunk_overlap >= 0
        assert isinstance(config.max_concurrent_llm, int)
        assert config.max_concurrent_llm > 0


class TestSimplifyCommand:

    def test_command_creation(self):
        url = "https://ru.wikipedia.org/wiki/Тест"
        command = SimplifyCommand(url=url)
    def test_simplify_command_creation(self):
        command = SimplifyCommand(
            url="https://ru.ruwiki.ru/wiki/Test",
            force_reprocess=False,
        )

        assert command.url == url
        assert command.url == "https://ru.ruwiki.ru/wiki/Test"
        assert command.force_reprocess is False

    def test_command_with_force(self):
        url = "https://ru.wikipedia.org/wiki/Тест"
        command = SimplifyCommand(url=url, force_reprocess=True)
    def test_simplify_command_with_force(self):
        command = SimplifyCommand(
            url="https://ru.ruwiki.ru/wiki/Test",
            force_reprocess=True,
        )

        assert command.url == url
        assert command.force_reprocess is True

    def test_command_string_representation(self):
        url = "https://ru.wikipedia.org/wiki/Тест"
        command = SimplifyCommand(url=url, force_reprocess=True)

        expected = f"SimplifyCommand(url='{url}', force=True)"
        assert str(command) == expected


class TestProcessingResult:

    def test_success_result_creation(self):
    def test_success_result(self):
        result = ProcessingResult.success_result(
            url="https://ru.wikipedia.org/wiki/Тест",
            title="Тест",
            raw_text="Исходный текст",
            simplified_text="Упрощённый текст",
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test",
            raw_text="Raw content",
            simplified_text="Simplified content",
            token_count_raw=100,
            token_count_simplified=50,
            processing_time_seconds=2.5,
            processing_time_seconds=1.5,
        )

        assert result.success is True
        assert result.url == "https://ru.wikipedia.org/wiki/Тест"
        assert result.title == "Тест"
        assert result.raw_text == "Исходный текст"
        assert result.simplified_text == "Упрощённый текст"
        assert result.url == "https://ru.ruwiki.ru/wiki/Test"
        assert result.title == "Test"
        assert result.simplified_text == "Simplified content"
        assert result.token_count_raw == 100
        assert result.token_count_simplified == 50
        assert result.processing_time_seconds == 2.5
        assert result.processing_time_seconds == 1.5
        assert result.error_message is None

    def test_failure_result_creation(self):
    def test_failure_result(self):
        result = ProcessingResult.failure_result(
            url="https://ru.wikipedia.org/wiki/Тест",
            error_message="Тестовая ошибка",
            url="https://ru.ruwiki.ru/wiki/Test",
            error_message="Processing failed",
        )

        assert result.success is False
        assert result.url == "https://ru.wikipedia.org/wiki/Тест"
        assert result.error_message == "Тестовая ошибка"
        assert result.url == "https://ru.ruwiki.ru/wiki/Test"
        assert result.error_message == "Processing failed"
        assert result.title is None
        assert result.raw_text is None
        assert result.simplified_text is None


class TestProcessingStats:

    def test_initial_stats(self):
        stats = ProcessingStats()

        assert stats.total_processed == 0
        assert stats.successful == 0
        assert stats.failed == 0
        assert stats.skipped == 0
        assert stats.success_rate == 0.0
        assert stats.average_processing_time == 0.0

    def test_add_successful_result(self):
        stats = ProcessingStats()
        result = ProcessingResult.success_result(
            url="test",
            title="Test",
            raw_text="text",
            simplified_text="simple",
            token_count_raw=100,
            token_count_simplified=50,
            processing_time_seconds=2.0,
        )

        stats.add_result(result)

        assert stats.total_processed == 1
        assert stats.successful == 1
        assert stats.failed == 0
        assert stats.success_rate == 100.0
        assert stats.average_processing_time == 2.0

    def test_add_failed_result(self):
        stats = ProcessingStats()
        result = ProcessingResult.failure_result("test", "error")

        stats.add_result(result)

        assert stats.total_processed == 1
        assert stats.successful == 0
        assert stats.failed == 1
        assert stats.success_rate == 0.0

    def test_mixed_results(self):
        stats = ProcessingStats()

        success_result = ProcessingResult.success_result(
            url="test1",
            title="Test1",
            raw_text="text",
            simplified_text="simple",
            token_count_raw=100,
            token_count_simplified=50,
            processing_time_seconds=3.0,
        )
        stats.add_result(success_result)

        failure_result = ProcessingResult.failure_result("test2", "error")
        stats.add_result(failure_result)

        success_result2 = ProcessingResult.success_result(
            url="test3",
            title="Test3",
            raw_text="text",
            simplified_text="simple",
            token_count_raw=100,
            token_count_simplified=50,
            processing_time_seconds=1.0,
        )
        stats.add_result(success_result2)

        assert stats.total_processed == 3
        assert stats.successful == 2
        assert stats.failed == 1
        assert stats.success_rate == pytest.approx(66.67, rel=1e-2)
        assert stats.average_processing_time == 2.0

    def test_add_skipped(self):
        stats = ProcessingStats()

        stats.add_skipped()
        stats.add_skipped()

        assert stats.skipped == 2
        assert result.token_count_raw is None
        assert result.token_count_simplified is None
        assert result.processing_time_seconds is None
@@ -1,370 +1,312 @@
"""Тесты для сервисов."""

import tempfile
from pathlib import Path
from unittest.mock import AsyncMock, MagicMock
import asyncio
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
import pytest_asyncio

from src.adapters import LLMProviderAdapter, RuWikiAdapter
from src.adapters.ruwiki import WikiPageInfo
from src.models import ProcessingResult, SimplifyCommand
from src.services import (
    AsyncWriteQueue,
    DatabaseService,
    RecursiveCharacterTextSplitter,
    SimplifyService,
)


class TestRecursiveCharacterTextSplitter:
    def test_split_short_text(self):
        splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)

        short_text = "Это короткий текст."
        chunks = splitter.split_text(short_text)

        assert len(chunks) == 1
        assert chunks[0] == short_text

    def test_split_long_text(self):
        splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)

        long_text = "Это очень длинный текст. " * 10
        chunks = splitter.split_text(long_text)

        assert len(chunks) > 1

        for chunk in chunks:
            assert len(chunk) <= 60

    def test_split_by_paragraphs(self):
        splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)

        text = "Первый абзац.\n\nВторой абзац.\n\nТретий абзац."
        chunks = splitter.split_text(text)

        assert len(chunks) >= 2

    def test_split_empty_text(self):
        splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)

        chunks = splitter.split_text("")
        assert chunks == []

    def test_custom_length_function(self):
        def word_count(text: str) -> int:
            return len(text.split())

        splitter = RecursiveCharacterTextSplitter(
            chunk_size=5,
            chunk_overlap=2,
            length_function=word_count,
        )

        text = "Один два три четыре пять шесть семь восемь девять десять"
        chunks = splitter.split_text(text)

        assert len(chunks) > 1

        for chunk in chunks:
            word_count_in_chunk = len(chunk.split())
            assert word_count_in_chunk <= 7

    def test_create_chunks_with_metadata(self):
        splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)

        text = "Это тестовый текст. " * 10
        title = "Тестовая статья"

        chunks_with_metadata = splitter.create_chunks_with_metadata(text, title)

        assert len(chunks_with_metadata) > 1

        for i, chunk_data in enumerate(chunks_with_metadata):
            assert "text" in chunk_data
            assert chunk_data["title"] == title
            assert chunk_data["chunk_index"] == i
            assert chunk_data["total_chunks"] == len(chunks_with_metadata)
            assert "chunk_size" in chunk_data
from src.models.article_dto import ArticleDTO, ArticleStatus
from src.services import ArticleRepository, AsyncWriteQueue, DatabaseService, SimplifyService


class TestDatabaseService:

    @pytest.mark.asyncio
    async def test_initialize_database(self, test_config):
        db_service = DatabaseService(test_config)
    async def test_database_initialization(self, database_service: DatabaseService):
        health = await database_service.health_check()
        assert health is True

        await db_service.initialize_database()

        assert Path(test_config.db_path).exists()

        assert await db_service.health_check() is True

        db_service.close()

    @pytest.mark.asyncio
    async def test_get_connection(self, test_config):
        db_service = DatabaseService(test_config)
        await db_service.initialize_database()

        async with db_service.get_connection() as conn:
    async def test_get_connection(self, database_service: DatabaseService):
        async with await database_service.get_connection() as conn:
            cursor = await conn.execute("SELECT 1")
            result = await cursor.fetchone()
            assert result[0] == 1

        db_service.close()
    async def test_health_check_success(self, database_service: DatabaseService):
        result = await database_service.health_check()
        assert result is True

    async def test_health_check_failure(self, test_config):
        test_config.db_path = "/invalid/path/database.db"
        service = DatabaseService(test_config)

        result = await service.health_check()
        assert result is False


class TestArticleRepository:

    async def test_create_article(self, repository: ArticleRepository):
        article = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test Article",
            raw_text="Test content",
        )

        assert article.id is not None
        assert article.url == "https://ru.ruwiki.ru/wiki/Test"
        assert article.title == "Test Article"
        assert article.raw_text == "Test content"
        assert article.status == ArticleStatus.PENDING
        assert article.simplified_text is None

    async def test_create_duplicate_article(self, repository: ArticleRepository):
        url = "https://ru.ruwiki.ru/wiki/Test"

        await repository.create_article(
            url=url,
            title="Test Article",
            raw_text="Test content",
        )

        with pytest.raises(ValueError, match="уже существует"):
            await repository.create_article(
                url=url,
                title="Duplicate",
                raw_text="Different content",
            )

    async def test_get_by_id(self, repository: ArticleRepository, sample_article_in_db: ArticleDTO):
        article = sample_article_in_db

        retrieved = await repository.get_by_id(article.id)
        assert retrieved is not None
        assert retrieved.id == article.id
        assert retrieved.url == article.url

    async def test_get_by_id_not_found(self, repository: ArticleRepository):
        result = await repository.get_by_id(99999)
        assert result is None

    async def test_get_by_url(
        self, repository: ArticleRepository, sample_article_in_db: ArticleDTO
    ):
        article = sample_article_in_db

        retrieved = await repository.get_by_url(article.url)
        assert retrieved is not None
        assert retrieved.id == article.id
        assert retrieved.url == article.url

    async def test_get_by_url_not_found(self, repository: ArticleRepository):
        result = await repository.get_by_url("https://ru.ruwiki.ru/wiki/NonExistent")
        assert result is None

    async def test_update_article(
        self, repository: ArticleRepository, sample_article_in_db: ArticleDTO
    ):
        article = sample_article_in_db

        article.simplified_text = "Simplified content"
        article.status = ArticleStatus.SIMPLIFIED

        updated = await repository.update_article(article)

        assert updated.simplified_text == "Simplified content"
        assert updated.status == ArticleStatus.SIMPLIFIED
        assert updated.updated_at is not None

    async def test_update_nonexistent_article(self, repository: ArticleRepository):
        article = ArticleDTO(
            id=99999,
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test",
            raw_text="Content",
            status=ArticleStatus.PENDING,
            created_at=datetime.now(timezone.utc),
        )

        with pytest.raises(ValueError, match="не найдена"):
            await repository.update_article(article)

    async def test_get_articles_by_status(self, repository: ArticleRepository):
        article1 = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test1",
            title="Test 1",
            raw_text="Content 1",
        )

        article2 = await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test2",
            title="Test 2",
            raw_text="Content 2",
        )

        article2.status = ArticleStatus.SIMPLIFIED
        await repository.update_article(article2)

        pending = await repository.get_articles_by_status(ArticleStatus.PENDING)
        assert len(pending) == 1
        assert pending[0].id == article1.id

        simplified = await repository.get_articles_by_status(ArticleStatus.SIMPLIFIED)
        assert len(simplified) == 1
        assert simplified[0].id == article2.id

    async def test_count_by_status(self, repository: ArticleRepository):
        count = await repository.count_by_status(ArticleStatus.PENDING)
        assert count == 0

        await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test1",
            title="Test 1",
            raw_text="Content 1",
        )
        await repository.create_article(
            url="https://ru.ruwiki.ru/wiki/Test2",
            title="Test 2",
            raw_text="Content 2",
        )

        count = await repository.count_by_status(ArticleStatus.PENDING)
        assert count == 2

    async def test_delete_article(
        self, repository: ArticleRepository, sample_article_in_db: ArticleDTO
    ):
        article = sample_article_in_db

        deleted = await repository.delete_article(article.id)
        assert deleted is True

        retrieved = await repository.get_by_id(article.id)
        assert retrieved is None

    async def test_delete_nonexistent_article(self, repository: ArticleRepository):
        deleted = await repository.delete_article(99999)
        assert deleted is False


class TestAsyncWriteQueue:

    @pytest.mark.asyncio
    async def test_start_stop(self):
        mock_repository = AsyncMock()
        queue = AsyncWriteQueue(mock_repository, max_batch_size=5)
    @pytest_asyncio.fixture
    async def write_queue(self, repository: ArticleRepository) -> AsyncWriteQueue:
        queue = AsyncWriteQueue(repository, max_batch_size=2)
        await queue.start()
        yield queue
        await queue.stop()

    async def test_write_queue_startup_shutdown(self, repository: ArticleRepository):
        queue = AsyncWriteQueue(repository)

        await queue.start()
        assert queue._worker_task is not None
        assert not queue._worker_task.done()

        await queue.stop(timeout=1.0)
        await queue.stop()
        assert queue._worker_task.done()

    @pytest.mark.asyncio
    async def test_update_from_result_success(self, sample_article, simplified_text):
        mock_repository = AsyncMock()
        mock_repository.get_by_url.return_value = sample_article
        mock_repository.update_article.return_value = sample_article
    async def test_update_article_operation(
        self, write_queue: AsyncWriteQueue, sample_article_in_db: ArticleDTO
    ):
        article = sample_article_in_db
        article.simplified_text = "Updated content"

        queue = AsyncWriteQueue(mock_repository, max_batch_size=1)
        await queue.start()
        await write_queue.update_article(article)

        try:
            result = ProcessingResult.success_result(
                url=sample_article.url,
                title=sample_article.title,
                raw_text=sample_article.raw_text,
                simplified_text=simplified_text,
                token_count_raw=100,
                token_count_simplified=50,
                processing_time_seconds=2.0,
            )
        await asyncio.sleep(0.2)

            updated_article = await queue.update_from_result(result)
        retrieved = await write_queue.repository.get_by_id(article.id)
        assert retrieved.simplified_text == "Updated content"

            assert updated_article.simplified_text == simplified_text
            mock_repository.get_by_url.assert_called_once_with(sample_article.url)
            mock_repository.update_article.assert_called_once()
    async def test_update_from_result_success(
        self, write_queue: AsyncWriteQueue, sample_article_in_db: ArticleDTO
    ):
        article = sample_article_in_db

        finally:
            await queue.stop(timeout=1.0)
        result = ProcessingResult.success_result(
            url=article.url,
            title=article.title,
            raw_text=article.raw_text,
            simplified_text="Processed content",
            token_count_raw=100,
            token_count_simplified=50,
            processing_time_seconds=1.0,
        )

    @pytest.mark.asyncio
    async def test_update_from_result_failure(self, sample_article):
        mock_repository = AsyncMock()
        mock_repository.get_by_url.return_value = sample_article
        mock_repository.update_article.return_value = sample_article
        updated_article = await write_queue.update_from_result(result)

        queue = AsyncWriteQueue(mock_repository, max_batch_size=1)
        await queue.start()
        assert updated_article.simplified_text == "Processed content"
        assert updated_article.status == ArticleStatus.SIMPLIFIED

        try:
            result = ProcessingResult.failure_result(
                url=sample_article.url,
                error_message="Тестовая ошибка",
            )
    async def test_update_from_result_failure(
        self, write_queue: AsyncWriteQueue, sample_article_in_db: ArticleDTO
    ):
        article = sample_article_in_db

            updated_article = await queue.update_from_result(result)
        result = ProcessingResult.failure_result(
            url=article.url,
            error_message="Processing failed",
        )

            assert updated_article.error_message == "Тестовая ошибка"
            mock_repository.update_article.assert_called_once()
        updated_article = await write_queue.update_from_result(result)

        finally:
            await queue.stop(timeout=1.0)
        assert updated_article.status == ArticleStatus.FAILED

    def test_stats(self):
        mock_repository = AsyncMock()
        queue = AsyncWriteQueue(mock_repository)

        stats = queue.stats
    async def test_queue_stats(self, write_queue: AsyncWriteQueue):
        stats = write_queue.stats

        assert "total_operations" in stats
        assert "failed_operations" in stats
        assert "queue_size" in stats
        assert stats["total_operations"] == 0
        assert "success_rate" in stats


class TestSimplifyService:

    @pytest.fixture
    def mock_adapters_and_queue(self, test_config):
        mock_ruwiki = AsyncMock(spec=RuWikiAdapter)
        mock_llm = AsyncMock(spec=LLMProviderAdapter)
        mock_repository = AsyncMock()
        mock_write_queue = AsyncMock()

        return mock_ruwiki, mock_llm, mock_repository, mock_write_queue

    def test_service_initialization(self, test_config, mock_adapters_and_queue):
        mock_ruwiki, mock_llm, mock_repository, mock_write_queue = mock_adapters_and_queue
    @pytest_asyncio.fixture
    async def simplify_service(self, test_config, repository: ArticleRepository) -> SimplifyService:
        ruwiki_adapter = AsyncMock()
        llm_adapter = AsyncMock()
        write_queue = AsyncMock()

        service = SimplifyService(
            config=test_config,
            ruwiki_adapter=mock_ruwiki,
            llm_adapter=mock_llm,
            repository=mock_repository,
            write_queue=mock_write_queue,
            ruwiki_adapter=ruwiki_adapter,
            llm_adapter=llm_adapter,
            repository=repository,
            write_queue=write_queue,
        )

        assert service.config == test_config
        assert service.ruwiki_adapter == mock_ruwiki
        assert service.llm_adapter == mock_llm
        assert isinstance(service.text_splitter, RecursiveCharacterTextSplitter)
        return service

    @pytest.mark.asyncio
    async def test_get_prompt_template(self, test_config, mock_adapters_and_queue):
        mock_ruwiki, mock_llm, mock_repository, mock_write_queue = mock_adapters_and_queue
    async def test_get_prompt_template(self, simplify_service: SimplifyService):
        with patch("pathlib.Path.exists", return_value=True):
            with patch("pathlib.Path.read_text", return_value="Test prompt"):
                template = await simplify_service.get_prompt_template()
                assert template == "Test prompt"

        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
            f.write("### role: system\nТы помощник")
            temp_prompt_path = f.name

        test_config.prompt_template_path = temp_prompt_path

        service = SimplifyService(
            config=test_config,
            ruwiki_adapter=mock_ruwiki,
            llm_adapter=mock_llm,
            repository=mock_repository,
            write_queue=mock_write_queue,
    async def test_check_existing_article(self, simplify_service: SimplifyService):
        existing_article = ArticleDTO(
            id=1,
            url="https://ru.ruwiki.ru/wiki/Test",
            title="Test",
            raw_text="Content",
            simplified_text="Simplified",
            status=ArticleStatus.SIMPLIFIED,
            created_at=datetime.now(timezone.utc),
        )

        try:
            template = await service.get_prompt_template()
            assert "### role: system" in template
            assert "Ты помощник" in template
        simplify_service.repository.get_by_url = AsyncMock(return_value=existing_article)

            template2 = await service.get_prompt_template()
            assert template == template2

        finally:
            Path(temp_prompt_path).unlink(missing_ok=True)

    @pytest.mark.asyncio
    async def test_get_prompt_template_not_found(self, test_config, mock_adapters_and_queue):
        mock_ruwiki, mock_llm, mock_repository, mock_write_queue = mock_adapters_and_queue

        test_config.prompt_template_path = "nonexistent.txt"

        service = SimplifyService(
            config=test_config,
            ruwiki_adapter=mock_ruwiki,
            llm_adapter=mock_llm,
            repository=mock_repository,
            write_queue=mock_write_queue,
        )

        with pytest.raises(FileNotFoundError):
            await service.get_prompt_template()

    @pytest.mark.asyncio
    async def test_process_command_success(
        self, test_config, mock_adapters_and_queue, sample_wikitext, simplified_text
    ):
        mock_ruwiki, mock_llm, mock_repository, mock_write_queue = mock_adapters_and_queue

        wiki_page_info = WikiPageInfo(
            title="Тест",
            content=sample_wikitext,
        )
        mock_ruwiki.fetch_page_cleaned.return_value = wiki_page_info
        mock_llm.simplify_text.return_value = (simplified_text, 100, 50)
        mock_llm.count_tokens.return_value = 100

        mock_repository.get_by_url.return_value = None
        mock_repository.create_article.return_value = MagicMock(id=1)
        mock_repository.update_article.return_value = MagicMock()

        mock_write_queue.update_from_result.return_value = MagicMock()

        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
            f.write("### role: user\n{wiki_source_text}")
            test_config.prompt_template_path = f.name

        service = SimplifyService(
            config=test_config,
            ruwiki_adapter=mock_ruwiki,
            llm_adapter=mock_llm,
            repository=mock_repository,
            write_queue=mock_write_queue,
        )

        try:
            command = SimplifyCommand(url="https://ru.wikipedia.org/wiki/Тест")
            result = await service.process_command(command)

            assert result.success is True
            assert result.title == "Тест"
            assert result.simplified_text == simplified_text
            assert result.token_count_raw == 100
            assert result.token_count_simplified == 50

            mock_ruwiki.fetch_page_cleaned.assert_called_once()
            mock_llm.simplify_text.assert_called_once()
            mock_write_queue.update_from_result.assert_called_once()

        finally:
            Path(test_config.prompt_template_path).unlink(missing_ok=True)

    @pytest.mark.asyncio
    async def test_process_command_skip_existing(
        self, test_config, mock_adapters_and_queue, completed_article
    ):
        mock_ruwiki, mock_llm, mock_repository, mock_write_queue = mock_adapters_and_queue

        mock_repository.get_by_url.return_value = completed_article

        service = SimplifyService(
            config=test_config,
            ruwiki_adapter=mock_ruwiki,
            llm_adapter=mock_llm,
            repository=mock_repository,
            write_queue=mock_write_queue,
        )

        command = SimplifyCommand(url=completed_article.url, force_reprocess=False)
        result = await service.process_command(command)
        result = await simplify_service._check_existing_article("https://ru.ruwiki.ru/wiki/Test")

        assert result is not None
        assert result.success is True
        assert result.title == completed_article.title
        assert result.simplified_text == "Simplified"

        mock_ruwiki.fetch_page_cleaned.assert_not_called()
        mock_llm.simplify_text.assert_not_called()
    async def test_check_existing_article_not_found(self, simplify_service: SimplifyService):
        simplify_service.repository.get_by_url = AsyncMock(return_value=None)

    @pytest.mark.asyncio
    async def test_health_check(self, test_config, mock_adapters_and_queue):
        mock_ruwiki, mock_llm, mock_repository, mock_write_queue = mock_adapters_and_queue
        result = await simplify_service._check_existing_article("https://ru.ruwiki.ru/wiki/Test")

        mock_ruwiki.health_check.return_value = True
        mock_llm.health_check.return_value = True
        assert result is None

        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
            f.write("test prompt")
            test_config.prompt_template_path = f.name
    async def test_health_check(self, simplify_service: SimplifyService):
        simplify_service.ruwiki_adapter.health_check = AsyncMock()
        simplify_service.llm_adapter.health_check = AsyncMock()

        service = SimplifyService(
            config=test_config,
            ruwiki_adapter=mock_ruwiki,
            llm_adapter=mock_llm,
            repository=mock_repository,
            write_queue=mock_write_queue,
        )
        checks = await simplify_service.health_check()

        try:
            checks = await service.health_check()

            assert checks["ruwiki"] is True
            assert checks["llm"] is True
            assert checks["prompt_template"] is True

        finally:
            Path(test_config.prompt_template_path).unlink(missing_ok=True)
        assert "ruwiki" in checks
        assert "llm" in checks
        assert "prompt_template" in checks