diff --git a/我的论文/中间稿 v1.docx b/10_我的论文/中间稿 v1.docx
similarity index 100%
rename from 我的论文/中间稿 v1.docx
rename to 10_我的论文/中间稿 v1.docx
diff --git a/我的论文/中间稿.docx b/10_我的论文/中间稿.docx
similarity index 100%
rename from 我的论文/中间稿.docx
rename to 10_我的论文/中间稿.docx
diff --git a/我的论文/飞机稿_20260130.docx b/10_我的论文/飞机稿_20260130.docx
similarity index 100%
rename from 我的论文/飞机稿_20260130.docx
rename to 10_我的论文/飞机稿_20260130.docx
diff --git a/标杆论文/大纲与逻辑结构.md b/20_标杆论文/大纲与逻辑结构.md
similarity index 100%
rename from 标杆论文/大纲与逻辑结构.md
rename to 20_标杆论文/大纲与逻辑结构.md
diff --git a/标杆论文.pdf b/20_标杆论文/标杆论文.pdf
similarity index 100%
rename from 标杆论文.pdf
rename to 20_标杆论文/标杆论文.pdf
diff --git a/标杆论文/标杆论文的大纲.md b/20_标杆论文/标杆论文的大纲.md
similarity index 100%
rename from 标杆论文/标杆论文的大纲.md
rename to 20_标杆论文/标杆论文的大纲.md
diff --git a/评审输出/README.md b/30_评审输出/README.md
similarity index 100%
rename from 评审输出/README.md
rename to 30_评审输出/README.md
diff --git a/评审输出/标杆论文/00_MBA学位论文深度评价体系_基于标杆论文提炼_v1.md b/30_评审输出/标杆论文/00_MBA学位论文深度评价体系_基于标杆论文提炼_v1.md
similarity index 100%
rename from 评审输出/标杆论文/00_MBA学位论文深度评价体系_基于标杆论文提炼_v1.md
rename to 30_评审输出/标杆论文/00_MBA学位论文深度评价体系_基于标杆论文提炼_v1.md
diff --git a/评审输出/飞机稿_20260130/00_标杆画像_结构-论证-文风.md b/30_评审输出/飞机稿_20260130/00_标杆画像_结构-论证-文风.md
similarity index 100%
rename from 评审输出/飞机稿_20260130/00_标杆画像_结构-论证-文风.md
rename to 30_评审输出/飞机稿_20260130/00_标杆画像_结构-论证-文风.md
diff --git a/评审输出/飞机稿_20260130/01_对标评审_评分-问题清单.md b/30_评审输出/飞机稿_20260130/01_对标评审_评分-问题清单.md
similarity index 100%
rename from 评审输出/飞机稿_20260130/01_对标评审_评分-问题清单.md
rename to 30_评审输出/飞机稿_20260130/01_对标评审_评分-问题清单.md
diff --git a/评审输出/飞机稿_20260130/02_修改意见_按章节-按优先级.md b/30_评审输出/飞机稿_20260130/02_修改意见_按章节-按优先级.md
similarity index 100%
rename from 评审输出/飞机稿_20260130/02_修改意见_按章节-按优先级.md
rename to 30_评审输出/飞机稿_20260130/02_修改意见_按章节-按优先级.md
diff --git a/评审输出/飞机稿_20260130/03_修改后大纲_v01.md b/30_评审输出/飞机稿_20260130/03_修改后大纲_v01.md
similarity index 100%
rename from 评审输出/飞机稿_20260130/03_修改后大纲_v01.md
rename to 30_评审输出/飞机稿_20260130/03_修改后大纲_v01.md
diff --git a/评审输出/飞机稿_20260130/04_修改说明_v01_改动映射表.md b/30_评审输出/飞机稿_20260130/04_修改说明_v01_改动映射表.md
similarity index 100%
rename from 评审输出/飞机稿_20260130/04_修改说明_v01_改动映射表.md
rename to 30_评审输出/飞机稿_20260130/04_修改说明_v01_改动映射表.md
diff --git a/评审输出/飞机稿_20260130/raw/benchmark.txt b/30_评审输出/飞机稿_20260130/raw/benchmark.txt
similarity index 100%
rename from 评审输出/飞机稿_20260130/raw/benchmark.txt
rename to 30_评审输出/飞机稿_20260130/raw/benchmark.txt
diff --git a/评审输出/飞机稿_20260130/raw/paper.txt b/30_评审输出/飞机稿_20260130/raw/paper.txt
similarity index 100%
rename from 评审输出/飞机稿_20260130/raw/paper.txt
rename to 30_评审输出/飞机稿_20260130/raw/paper.txt
diff --git a/01_论文重构需求与结构分析.md b/40_写作过程文档/10_大纲与结构/01_论文重构需求与结构分析.md
similarity index 100%
rename from 01_论文重构需求与结构分析.md
rename to 40_写作过程文档/10_大纲与结构/01_论文重构需求与结构分析.md
diff --git a/02_评委反馈分析与新版大纲优化方案.md b/40_写作过程文档/10_大纲与结构/02_评委反馈分析与新版大纲优化方案.md
similarity index 100%
rename from 02_评委反馈分析与新版大纲优化方案.md
rename to 40_写作过程文档/10_大纲与结构/02_评委反馈分析与新版大纲优化方案.md
diff --git a/03_论文大纲_v3_深度优化版.md b/40_写作过程文档/10_大纲与结构/03_论文大纲_v3_深度优化版.md
similarity index 100%
rename from 03_论文大纲_v3_深度优化版.md
rename to 40_写作过程文档/10_大纲与结构/03_论文大纲_v3_深度优化版.md
diff --git a/04_大纲逻辑自查与标杆对比报告.md b/40_写作过程文档/10_大纲与结构/04_大纲逻辑自查与标杆对比报告.md
similarity index 100%
rename from 04_大纲逻辑自查与标杆对比报告.md
rename to 40_写作过程文档/10_大纲与结构/04_大纲逻辑自查与标杆对比报告.md
diff --git a/05_论文大纲_v4_终极定稿.md b/40_写作过程文档/10_大纲与结构/05_论文大纲_v4_终极定稿.md
similarity index 100%
rename from 05_论文大纲_v4_终极定稿.md
rename to 40_写作过程文档/10_大纲与结构/05_论文大纲_v4_终极定稿.md
diff --git a/06_论文大纲_v5_融合诊断版.md b/40_写作过程文档/10_大纲与结构/06_论文大纲_v5_融合诊断版.md
similarity index 100%
rename from 06_论文大纲_v5_融合诊断版.md
rename to 40_写作过程文档/10_大纲与结构/06_论文大纲_v5_融合诊断版.md
diff --git a/07_论文大纲_v6_标杆复刻版.md b/40_写作过程文档/10_大纲与结构/07_论文大纲_v6_标杆复刻版.md
similarity index 100%
rename from 07_论文大纲_v6_标杆复刻版.md
rename to 40_写作过程文档/10_大纲与结构/07_论文大纲_v6_标杆复刻版.md
diff --git a/标杆论文-大纲与写作风格拆解.md b/40_写作过程文档/20_对标与拆解/10_标杆论文-大纲与写作风格拆解.md
similarity index 100%
rename from 标杆论文-大纲与写作风格拆解.md
rename to 40_写作过程文档/20_对标与拆解/10_标杆论文-大纲与写作风格拆解.md
diff --git a/MBA_学位论文评审专家_Agent.md b/40_写作过程文档/30_方法与提示/10_MBA_学位论文评审专家_Agent.md
similarity index 100%
rename from MBA_学位论文评审专家_Agent.md
rename to 40_写作过程文档/30_方法与提示/10_MBA_学位论文评审专家_Agent.md
diff --git a/常用提示词.md b/40_写作过程文档/30_方法与提示/20_常用提示词.md
similarity index 100%
rename from 常用提示词.md
rename to 40_写作过程文档/30_方法与提示/20_常用提示词.md
diff --git a/scripts/extract_paper_text.py b/90_scripts/extract_paper_text.py
similarity index 100%
rename from scripts/extract_paper_text.py
rename to 90_scripts/extract_paper_text.py
diff --git a/90_scripts/test_docx_upload.py b/90_scripts/test_docx_upload.py
new file mode 100644
index 0000000..9ce0688
--- /dev/null
+++ b/90_scripts/test_docx_upload.py
@@ -0,0 +1,258 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+"""test_docx_upload.py
+
+Test whether an OpenAI-compatible /v1/chat/completions endpoint supports
+direct docx upload + content analysis.
+
+Flow:
+1) ping chat/completions
+2) probe /v1/files (many gateways return 404)
+3) try two "direct docx" variants:
+   A) messages[].content as an array with {type: input_file, mime_type, data}
+   B) top-level files: [{filename, mime_type, data}]
+4) fallback: extract docx text locally and send as plain text
+
+Security:
+- API key is read from an environment variable only; never written to disk.
+"""
+
+from __future__ import annotations
+
+import argparse
+import base64
+import datetime as dt
+import json
+import os
+import sys
+import textwrap
+from pathlib import Path
+
+import requests
+
+try:
+    from docx import Document
+except Exception:
+    Document = None
+
+
+DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+
+
+def now_stamp() -> str:
+    return dt.datetime.now().strftime("%Y%m%d_%H%M%S")
+
+
+def safe_write(path: Path, content: str) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(content, encoding="utf-8")
+
+
+def dump_json(path: Path, obj) -> None:
+    safe_write(path, json.dumps(obj, ensure_ascii=False, indent=2))
+
+
+def http_head(url: str, headers: dict, timeout: int = 30) -> requests.Response:
+    return requests.head(url, headers=headers, timeout=timeout, allow_redirects=False)
+
+
+def http_post_json(url: str, headers: dict, payload: dict, timeout: int = 120) -> requests.Response:
+    return requests.post(url, headers=headers, json=payload, timeout=timeout)
+
+
+def read_docx_text(docx_path: Path) -> str:
+    if Document is None:
+        raise RuntimeError("python-docx not installed: pip install python-docx")
+    doc = Document(str(docx_path))
+    paras = [p.text.strip() for p in doc.paragraphs if p.text and p.text.strip()]
+    return "\n".join(paras)
+
+
+def truncate_text(s: str, max_chars: int) -> str:
+    if len(s) <= max_chars:
+        return s
+    return s[:max_chars] + "\n\n[TRUNCATED] original_len=%d truncated_len=%d" % (len(s), max_chars)
+
+
+def summarize_response(resp: requests.Response) -> str:
+    ct = resp.headers.get("Content-Type", "")
+    return "HTTP %s Content-Type=%s len=%d" % (resp.status_code, ct, len(resp.content))
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        description="Test docx upload support for chat/completions gateways",
+        epilog=textwrap.dedent(
+            """\
+            Example:
+              export API_KEY='sk-***'
+              python 90_scripts/test_docx_upload.py \\
+                --api-base 'http://120.24.249.39:18317' \\
+                --model 'gemini-3-pro-preview' \\
+                --docx './10_我的论文/飞机稿_20260130.docx' \\
+                --out './_upload_test_out'
+            """
+        ),
+    )
+    ap.add_argument("--api-base", required=True, help="e.g. http://120.24.249.39:18317")
+    ap.add_argument("--model", required=True, help="e.g. gemini-3-pro-preview")
+    ap.add_argument("--docx", required=True, help="Path to .docx")
+    ap.add_argument(
+        "--prompt",
+        default="请用中文提炼该文档的四级大纲,并说明论证逻辑与文风特点。",
+        help="Prompt to run against the document",
+    )
+    ap.add_argument(
+        "--api-key-env",
+        default="API_KEY",
+        help="Environment variable name containing API key (default: API_KEY)",
+    )
+    ap.add_argument("--timeout", type=int, default=180, help="POST timeout seconds (default: 180)")
+    ap.add_argument(
+        "--max-text-chars",
+        type=int,
+        default=60000,
+        help="Fallback mode: max extracted text characters to send (default: 60000)",
+    )
+    ap.add_argument(
+        "--out",
+        default="./_upload_test_out_%s" % now_stamp(),
+        help="Output directory (default: timestamped)",
+    )
+    args = ap.parse_args()
+
+    api_key = os.getenv(args.api_key_env)
+    if not api_key:
+        print("[ERROR] env %s is empty; export %s='...'" % (args.api_key_env, args.api_key_env), file=sys.stderr)
+        return 2
+
+    api_base = args.api_base.rstrip("/")
+    chat_url = api_base + "/v1/chat/completions"
+    files_url = api_base + "/v1/files"
+
+    outdir = Path(args.out).resolve()
+    outdir.mkdir(parents=True, exist_ok=True)
+
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": "Bearer %s" % api_key,
+    }
+    head_headers = {"Authorization": "Bearer %s" % api_key}
+
+    docx_path = Path(args.docx)
+    if not docx_path.exists():
+        print("[ERROR] docx not found: %s" % docx_path, file=sys.stderr)
+        return 2
+
+    docx_bytes = docx_path.read_bytes()
+    docx_b64 = base64.b64encode(docx_bytes).decode("ascii")
+    size_mb = len(docx_bytes) / (1024.0 * 1024.0)
+    print("[INFO] docx=%s size=%.2fMB out=%s" % (docx_path, size_mb, outdir))
+
+    meta = {
+        "api_base": api_base,
+        "chat_url": chat_url,
+        "files_url": files_url,
+        "model": args.model,
+        "docx": str(docx_path),
+        "docx_size_bytes": len(docx_bytes),
+        "prompt": args.prompt,
+        "note": "API key is read from env only; never written to output files.",
+    }
+    dump_json(outdir / "00_meta.json", meta)
+
+    # 1) ping
+    ping_payload = {"model": args.model, "messages": [{"role": "user", "content": "ping"}]}
+    dump_json(outdir / "01_ping_request.json", ping_payload)
+    print("[STEP 1] POST %s ping ..." % chat_url)
+    ping_resp = http_post_json(chat_url, headers=headers, payload=ping_payload, timeout=args.timeout)
+    safe_write(outdir / "01_ping_response.txt", ping_resp.text)
+    print("[STEP 1] %s" % summarize_response(ping_resp))
+    if ping_resp.status_code != 200:
+        print("[ERROR] ping failed; see %s" % (outdir / "01_ping_response.txt"), file=sys.stderr)
+        return 1
+
+    # 2) probe /v1/files
+    print("[STEP 2] HEAD %s ..." % files_url)
+    try:
+        files_head = http_head(files_url, headers=head_headers, timeout=30)
+        safe_write(outdir / "02_files_head_status.txt", "%s\n%s\n" % (files_head.status_code, dict(files_head.headers)))
+        print("[STEP 2] HTTP %s" % files_head.status_code)
+    except Exception as e:
+        safe_write(outdir / "02_files_head_status.txt", "EXCEPTION: %r\n" % (e,))
+        print("[STEP 2] exception: %r" % (e,))
+
+    # 3A) input_file in content array
+    print("[STEP 3A] try messages[].content input_file ...")
+    payload_a = {
+        "model": args.model,
+        "messages": [
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": args.prompt},
+                    {"type": "input_file", "mime_type": DOCX_MIME, "data": docx_b64},
+                ],
+            }
+        ],
+    }
+    dump_json(outdir / "03A_input_file_request.json", payload_a)
+    resp_a = http_post_json(chat_url, headers=headers, payload=payload_a, timeout=args.timeout)
+    safe_write(outdir / "03A_input_file_response.txt", resp_a.text)
+    print("[STEP 3A] %s" % summarize_response(resp_a))
+    if resp_a.status_code == 200:
+        print("[RESULT] supports variant A (input_file). See 03A_input_file_response.txt")
+        return 0
+
+    # 3B) top-level files field
+    print("[STEP 3B] try top-level files field ...")
+    payload_b = {
+        "model": args.model,
+        "messages": [{"role": "user", "content": args.prompt}],
+        "files": [{"filename": docx_path.name, "mime_type": DOCX_MIME, "data": docx_b64}],
+    }
+    dump_json(outdir / "03B_files_field_request.json", payload_b)
+    resp_b = http_post_json(chat_url, headers=headers, payload=payload_b, timeout=args.timeout)
+    safe_write(outdir / "03B_files_field_response.txt", resp_b.text)
+    print("[STEP 3B] %s" % summarize_response(resp_b))
+    if resp_b.status_code == 200:
+        print("[RESULT] supports variant B (top-level files). See 03B_files_field_response.txt")
+        return 0
+
+    # 4) fallback: extract text and send as plain content
+    print("[STEP 4] fallback: extract docx text and send as plain text ...")
+    try:
+        extracted = read_docx_text(docx_path)
+    except Exception as e:
+        safe_write(outdir / "04_fallback_extract_error.txt", "%r\n" % (e,))
+        print("[RESULT] direct upload A/B failed; fallback extraction failed: %r" % (e,), file=sys.stderr)
+        return 1
+
+    extracted_trunc = truncate_text(extracted, args.max_text_chars)
+    safe_write(outdir / "04_fallback_extracted_text.txt", extracted_trunc)
+
+    payload_c = {
+        "model": args.model,
+        "messages": [
+            {
+                "role": "user",
+                "content": "%s\n\n=== 文档正文(从 docx 提取)===\n%s" % (args.prompt, extracted_trunc),
+            }
+        ],
+    }
+    dump_json(outdir / "04_fallback_text_request.json", payload_c)
+    resp_c = http_post_json(chat_url, headers=headers, payload=payload_c, timeout=args.timeout)
+    safe_write(outdir / "04_fallback_text_response.txt", resp_c.text)
+    print("[STEP 4] %s" % summarize_response(resp_c))
+    if resp_c.status_code == 200:
+        print("[RESULT] direct upload A/B failed; fallback text mode succeeded.")
+        return 0
+
+    print("[RESULT] direct upload A/B failed and fallback failed; inspect output dir: %s" % outdir, file=sys.stderr)
+    return 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
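
Review note: steps 3A and 3B treat any HTTP 200 as proof that the gateway accepted the docx. A gateway that silently drops unknown fields would also return 200 and answer from the prompt alone. Below is a minimal sketch of a stricter check. It assumes the response body follows the usual chat-completions shape (`choices[0].message.content`) and that the caller supplies a few distinctive phrases taken from the document. `reply_mentions_document`, `probe_phrases`, and `min_hits` are hypothetical names and not part of the script.

```python
from __future__ import annotations

import json


def reply_mentions_document(response_text: str, probe_phrases: list[str], min_hits: int = 1) -> bool:
    """Return True if the model reply contains at least `min_hits` probe phrases.

    Assumption: the body looks like {"choices": [{"message": {"content": "..."}}]}.
    Any other shape (HTML error page, error JSON) counts as "not confirmed".
    """
    try:
        body = json.loads(response_text)
        reply = body["choices"][0]["message"]["content"]
    except (ValueError, KeyError, IndexError, TypeError):
        return False  # not JSON, or not the assumed response shape
    if not isinstance(reply, str):
        return False
    # Count how many non-empty probe phrases appear verbatim in the reply.
    hits = sum(1 for phrase in probe_phrases if phrase and phrase in reply)
    return hits >= min_hits
```

One possible use is to require `resp_a.status_code == 200 and reply_mentions_document(resp_a.text, phrases)` before reporting variant A as supported, with `phrases` chosen by hand from headings that appear only in the docx. A verbatim match is a weak heuristic, because a model may paraphrase, so a miss should trigger manual inspection of the saved response and should not be treated as a hard failure.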