2026-04-26 13:39:19 +08:00
parent fa25bfd784
commit a89703ea72
12 changed files with 2154 additions and 25 deletions
--- a/jd/QUICKSTART_UBUNTU.md
+++ b/jd/QUICKSTART_UBUNTU.md
@@ -0,0 +1,132 @@
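+# Ubuntu Quick Start Guide
+
+## Quick install (recommended)
+
+Use the automated setup script:
+
+```bash
+cd ~/project/jdpl  # change to the project directory
+chmod +x jd/setup_ubuntu.sh
+./jd/setup_ubuntu.sh
+```
+
+The script automatically:
+1. ✅ Checks for Python 3 and its dependencies, installing them if missing
+2. ✅ Checks for Chrome/Chromium, installing it if missing
+3. ✅ Installs the Chrome runtime dependencies
+4. ✅ Creates a Python virtual environment
+5. ✅ Installs DrissionPage
+6. ✅ Creates a convenience run script
+
+## Manual installation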
+
+### 1. Install Chrome
+
+```bash
+# Google Chrome (recommended)
+wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
+sudo apt install -y ./google-chrome-stable_current_amd64.deb
+
+# Or Chromium
+sudo apt install -y chromium-browser
+```
+
+### 2. Install system dependencies
+
+```bash
+sudo apt update
+sudo apt install -y python3 python3-pip python3-venv \
+    libnss3 libatk-bridge2.0-0 libdrm2 libxkbcommon0 \
+    libxcomposite1 libxdamage1 libxfixes3 libxrandr2 \
+    libgbm1 libasound2
+```
+
+### 3. Create a virtual environment
+
+```bash
+cd ~/project/jdpl
+python3 -m venv venv
+source venv/bin/activate
+pip install DrissionPage flask  # fetch_logistics_ubuntu.py also imports flask
+deactivate
+```
+
+### 4. Run the script
+
+```bash
+# Option 1: use the convenience script (created by setup_ubuntu.sh)
+./run_logistics.sh
+
+# Option 2: run manually
+source venv/bin/activate
+python jd/fetch_logistics_ubuntu.py
+deactivate
+```
+
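+## FAQ
+
+### Q: I get an "externally-managed-environment" error. What now?
+
+A: This is the PEP 668 protection that Ubuntu enforces from 23.04 onward (including 24.04 LTS). **Use a virtual environment**; do not pass `--break-system-packages`.
+
+### Q: Where is the virtual environment?
+
+A: In the `venv` folder under the project directory. Activate it before each run: `source venv/bin/activate`
+
+### Q: Chrome cannot be found?
+
+A: The script searches for it automatically; you can also install it manually. Common paths:
+- `/usr/bin/google-chrome`
+- `/usr/bin/chromium-browser`
+
+### Q: Headless mode or headed mode?
+
+A: Set one of the following in `fetch_logistics_ubuntu.py`:
+```python
+USE_HEADLESS = True   # headless mode (server environments)
+USE_HEADLESS = False  # headed mode (requires a graphical session)
+```
+
+### Q: How do I change the default URL?
+
+A: Edit `fetch_logistics_ubuntu.py` and find:
+```python
+tracking_url = "https://3.cn/2t-Iibig"
+```
+Change it to the URL you want.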
+
+## Verify the installation
+
+Run a quick test:
+
+```bash
+source venv/bin/activate
+python -c "
+from DrissionPage import ChromiumPage, ChromiumOptions
+import os
+chrome_path = '/usr/bin/google-chrome'
+if not os.path.exists(chrome_path):
+    chrome_path = '/usr/bin/chromium-browser'
+options = ChromiumOptions()
+options.set_browser_path(chrome_path)
+options.headless(True)
+page = ChromiumPage(options)
+page.get('https://www.baidu.com')
+print('✅ Test succeeded!')
+page.quit()
+"
+deactivate
+```
+
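+## Project layout
+
+```
+jdpl/
+├── jd/
+│   ├── fetch_logistics_ubuntu.py  # main script for Ubuntu
+│   ├── setup_ubuntu.sh            # automated setup script
+│   └── UBUNTU_SETUP.md            # detailed documentation
+├── venv/                          # Python virtual environment (created by the setup script)
+└── run_logistics.sh               # convenience run script (created by the setup script)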
+```
+
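Review note on jd/QUICKSTART_UBUNTU.md: the FAQ entries about the virtual environment and about locating Chrome can both be checked from Python before anything is launched. The following is a minimal sketch only. `in_virtualenv` and `find_browser` are hypothetical helper names that do not exist in the project's scripts, and the candidate paths are simply the two listed in the FAQ.

```python
import os
import shutil
import sys

# Candidate locations taken from the FAQ; extend as needed (assumption of this sketch).
CANDIDATE_PATHS = ("/usr/bin/google-chrome", "/usr/bin/chromium-browser")
CANDIDATE_NAMES = ("google-chrome", "chromium-browser")


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def find_browser(paths=CANDIDATE_PATHS, names=CANDIDATE_NAMES):
    """Return the first usable browser executable, or None if none is found."""
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    for name in names:
        found = shutil.which(name)  # searches PATH, like the `which` command
        if found:
            return found
    return None
```

A caller could print a warning when `in_virtualenv()` is false, and pass the result of `find_browser()` to `ChromiumOptions().set_browser_path(...)` only when it is not `None`.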
--- a/jd/UBUNTU_SETUP.md
+++ b/jd/UBUNTU_SETUP.md
@@ -0,0 +1,309 @@
+# Ubuntu Environment Setup Guide
+
+## 1. Install Google Chrome or Chromium
+
+### Option 1: Install Google Chrome (recommended)
+
+```bash
+# Download and install Google Chrome
+wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
+sudo apt install -y ./google-chrome-stable_current_amd64.deb
+```
+
+### Option 2: Install Chromium (open-source build)
+
+```bash
+sudo apt update
+sudo apt install -y chromium-browser
+```
+
+## 2. Install the required libraries
+
+```bash
+# Refresh the package lists
+sudo apt update
+
+# Install the Chrome/Chromium runtime dependencies (on Ubuntu 24.04+, libasound2 is packaged as libasound2t64)
+sudo apt install -y \
+    libnss3 \
+    libatk-bridge2.0-0 \
+    libdrm2 \
+    libxkbcommon0 \
+    libxcomposite1 \
+    libxdamage1 \
+    libxfixes3 \
+    libxrandr2 \
+    libgbm1 \
+    libasound2 \
+    libpango-1.0-0 \
+    libcairo2 \
+    libatk1.0-0 \
+    libgdk-pixbuf2.0-0 \
+    libgtk-3-0
+```
+
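+## 3. Install the Python dependencies
+
+⚠️ **Note**: From Ubuntu 23.04 onward (including 24.04 LTS), pip refuses to install packages into the system Python by default (PEP 668). Use one of the following methods:
+
+### Method 1: Use a virtual environment (recommended) ✅
+
+```bash
+# 1. Make sure python3-venv is installed
+sudo apt install -y python3-venv python3-pip
+
+# 2. Change to the project directory
+cd ~/project/jdpl  # or your own project path
+
+# 3. Create the virtual environment
+python3 -m venv venv
+
+# 4. Activate the virtual environment
+source venv/bin/activate
+
+# 5. Install the dependencies (fetch_logistics_ubuntu.py also imports flask)
+pip install DrissionPage flask
+
+# If you use a database
+pip install sqlalchemy pymysql
+
+# 6. Run the script (inside the virtual environment)
+python jd/fetch_logistics_ubuntu.py
+
+# 7. Leave the virtual environment when you are done
+deactivate
+```
+
+### Method 2: Use pipx (suited to standalone command-line tools)
+
+```bash
+# 1. Install pipx
+sudo apt install -y pipx
+pipx ensurepath
+
+# 2. Install with pipx (if you want the tool available globally)
+# Note: pipx is meant for installing applications; it is a poor fit for libraries such as DrissionPage
+```
+
+### Method 3: Use --break-system-packages (quick, but not recommended)
+
+```bash
+# ⚠️ Warning: this can break the system Python environment; avoid it in production
+
+# Install DrissionPage
+pip3 install --break-system-packages DrissionPage
+
+# If you use a database
+pip3 install --break-system-packages sqlalchemy pymysql
+```
+
+### Method 4: Install with apt (if available)
+
+```bash
+# Some packages can be installed through apt (DrissionPage usually cannot)
+sudo apt install -y python3-drissionpage  # usually not available
+```
+
+**Method 1 (virtual environment) is recommended**; it is the safest and most standard approach.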
+
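+## 4. Configuration
+
+### Headless mode vs. headed mode
+
+In `fetch_logistics_ubuntu.py`, set the `USE_HEADLESS` variable to one of:
+
+```python
+USE_HEADLESS = True   # headless mode: suits servers, no browser window is shown
+USE_HEADLESS = False  # headed mode: requires a graphical session
+```
+
+### When to use headless mode:
+- Servers without a desktop environment
+- SSH sessions
+- Docker containers
+- Background jobs
+
+### When to use headed mode:
+- A local Ubuntu desktop
+- Debugging, when you need to watch what the browser does
+- An X11 or Wayland display server is available
+
+## 5. Run the script
+
+```bash
+# Change to the script directory
+cd /path/to/jdpl/jd
+
+# Run the script (activate the virtual environment first if you installed into one)
+python3 fetch_logistics_ubuntu.py
+```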
+
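+## 6. Troubleshooting
+
+### Issue 1: Chrome/Chromium cannot be found
+
+```bash
+# Check whether it is installed
+which google-chrome
+which chromium-browser
+
+# If neither is found, look in the common locations
+ls -la /usr/bin/google-chrome*
+ls -la /usr/bin/chromium*
+```
+
+### Issue 2: Permission errors
+
+```bash
+# Sandbox-related permission errors (for example when running as root) may require the --no-sandbox flag
+# The script already adds this flag automatically
+```
+
+### Issue 3: The browser window does not appear in headed mode
+
+If you set `USE_HEADLESS = False` and still see no browser window, you may need to:
+
+```bash
+# Check the DISPLAY environment variable
+echo $DISPLAY
+
+# If it is empty, set the display (on a local desktop)
+export DISPLAY=:0
+
+# Or use Xvfb (a virtual display)
+sudo apt install -y xvfb
+xvfb-run -a python3 fetch_logistics_ubuntu.py
+```
+
+### Issue 4: Not enough shared memory
+
+If you see errors that mention `/dev/shm`:
+
+```bash
+# Check the size of /dev/shm
+df -h /dev/shm
+
+# If it is too small, remount it with more space (temporary; resets on reboot)
+sudo mount -o remount,size=2G /dev/shm
+```
+
+### Issue 5: Missing libraries
+
+If the browser reports missing libraries at run time:
+
+```bash
+# Install the full set of likely dependencies (package names can differ between Ubuntu releases)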
+sudo apt install -y \
+    fonts-liberation \
+    libappindicator3-1 \
+    libasound2 \
+    libatk-bridge2.0-0 \
+    libatk1.0-0 \
+    libcairo2 \
+    libcups2 \
+    libdbus-1-3 \
+    libexpat1 \
+    libfontconfig1 \
+    libgbm1 \
+    libgcc1 \
+    libglib2.0-0 \
+    libgtk-3-0 \
+    libnspr4 \
+    libnss3 \
+    libpango-1.0-0 \
+    libpangocairo-1.0-0 \
+    libstdc++6 \
+    libx11-6 \
+    libx11-xcb1 \
+    libxcb1 \
+    libxcomposite1 \
+    libxcursor1 \
+    libxdamage1 \
+    libxext6 \
+    libxfixes3 \
+    libxi6 \
+    libxrandr2 \
+    libxrender1 \
+    libxss1 \
+    libxtst6 \
+    lsb-release \
+    wget \
+    xdg-utils
+```
+
+## 7. Docker (optional)
+
+If you need to run in a container:
+
+```dockerfile
+FROM ubuntu:22.04
+
+RUN apt-get update && apt-get install -y \
+    python3 \
+    python3-pip \
+    wget \
+    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
+    && apt-get install -y ./google-chrome-stable_current_amd64.deb \
+    && pip3 install DrissionPage flask
+
+WORKDIR /app
+COPY jd/fetch_logistics_ubuntu.py .
+CMD ["python3", "fetch_logistics_ubuntu.py"]
+```
+
+## 8. Verify the installation
+
+### If you use a virtual environment:
+
+```bash
+# Activate the virtual environment
+source venv/bin/activate
+
+# Run the test
+python -c "
+from DrissionPage import ChromiumPage, ChromiumOptions
+import os
+chrome_path = '/usr/bin/google-chrome'
+if not os.path.exists(chrome_path):
+    chrome_path = '/usr/bin/chromium-browser'
+options = ChromiumOptions()
+options.set_browser_path(chrome_path)
+options.headless(True)
+page = ChromiumPage(options)
+page.get('https://www.baidu.com')
+print('✅ Browser test succeeded!')
+page.quit()
+"
+```
+
+### If you use a system-wide install (--break-system-packages):
+
+```bash
+python3 -c "
+from DrissionPage import ChromiumPage, ChromiumOptions
+import os
+chrome_path = '/usr/bin/google-chrome'
+if not os.path.exists(chrome_path):
+    chrome_path = '/usr/bin/chromium-browser'
+options = ChromiumOptions()
+options.set_browser_path(chrome_path)
+options.headless(True)
+page = ChromiumPage(options)
+page.get('https://www.baidu.com')
+print('✅ Browser test succeeded!')
+page.quit()
+"
+```
+
+## Summary of common paths
+
+Common Chrome/Chromium paths on Ubuntu:
+- `/usr/bin/google-chrome` - Google Chrome
+- `/usr/bin/google-chrome-stable` - Google Chrome (stable channel)
+- `/usr/bin/chromium-browser` - Chromium
+- `/usr/bin/chromium` - Chromium (short name)
+- `/snap/bin/chromium` - Chromium installed via Snap
+- `/opt/google/chrome/chrome` - path used by some installation methods
+
+The script checks these paths automatically.
+
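Review note on jd/UBUNTU_SETUP.md: the display check (Issue 3) and the shared-memory check (Issue 4) can be combined into one pre-flight step. The sketch below is an illustration under stated assumptions, not code from this commit: `has_display`, `shm_is_small` and `suggest_flags` are hypothetical names, and the 256 MiB threshold is an arbitrary value chosen for the example rather than a documented Chrome requirement. The two flags it returns are the ones the script already uses.

```python
import os
import shutil

# Assumed threshold for this sketch only; tune it for your workload.
MIN_SHM_BYTES = 256 * 1024 * 1024


def has_display(env=None) -> bool:
    """Return True if an X11 or Wayland display is advertised in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def shm_is_small(total_bytes: int, minimum: int = MIN_SHM_BYTES) -> bool:
    """Return True when the shared-memory filesystem is below the chosen minimum."""
    return total_bytes < minimum


def suggest_flags(env=None, shm_path="/dev/shm"):
    """Suggest Chrome flags based on the display and shared-memory checks."""
    flags = []
    if not has_display(env):
        flags.append("--headless=new")
    try:
        total = shutil.disk_usage(shm_path).total
    except OSError:
        total = 0  # treat a missing /dev/shm as too small
    if shm_is_small(total):
        flags.append("--disable-dev-shm-usage")
    return flags
```

Each returned flag could then be passed to `options.set_argument(...)`, which is how the script applies its Linux-specific flags.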
--- a/jd/UBUNTU_TERMINAL_DRAG_DROP.md
+++ b/jd/UBUNTU_TERMINAL_DRAG_DROP.md
@@ -0,0 +1,238 @@
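+# Guide to Dragging Files into the Ubuntu Terminal
+
+## Method 1: GNOME Terminal (the default terminal)
+
+GNOME Terminal supports dragging files in **by default**.
+
+### How to use it:
+1. Open a terminal
+2. Drag a file from the file manager (Nautilus) straight into the terminal
+3. The file path is inserted at the cursor position
+
+### If dragging does not work, check the following:
+
+#### 1. Confirm that you are using GNOME Terminal
+```bash
+# Show the parent process of the shell, which is usually the terminal emulator
+ps -p $PPID -o comm=
+# $TERM only reports the terminal type (for example xterm-256color), not the application
+echo $TERM
+```
+
+#### 2. Check the terminal preferences
+- Open GNOME Terminal
+- Choose `Edit` → `Preferences` → `General` from the menu
+- Make sure the relevant options are enabled
+
+#### 3. Use a fallback instead
+If dragging does not work, you can:
+- Right-click in the terminal → `Paste as Filenames` (supported in some versions)
+- Or type a command and drop the file after it: `cat <drag the file here>`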
+
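+## Method 2: Other terminal applications
+
+### Tilix (tiling terminal)
+```bash
+# Install
+sudo apt install -y tilix
+
+# Tilix supports dragging files in by default
+```
+
+### Konsole (KDE terminal)
+```bash
+# Install
+sudo apt install -y konsole
+
+# Konsole supports dragging files in
+```
+
+### Alacritty
+```bash
+# Install
+sudo apt install -y alacritty
+
+# May need configuration; drag and drop may not be supported by default
+```
+
+### Terminator
+```bash
+# Install
+sudo apt install -y terminator
+
+# Supports dragging files in
+```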
+
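+## Method 3: Use a file selection dialog
+
+If dragging does not work, you can pick the file interactively:
+
+### Using a file picker in a script
+```bash
+# With zenity (GNOME file picker)
+FILE=$(zenity --file-selection --title="Select a file")
+echo "Selected file: $FILE"
+
+# Or with kdialog (KDE file picker)
+FILE=$(kdialog --getopenfilename)
+
+# The same is possible from Python
+# python -c "from tkinter.filedialog import askopenfilename; print(askopenfilename())"
+```
+
+## Method 4: Use the clipboard
+
+### Copy the file in the file manager
+1. Right-click the file in the file manager
+2. Choose "Copy" or press `Ctrl+C`
+3. Paste in the terminal with `Ctrl+Shift+V` (or the middle mouse button)
+
+### Copy the full path to the clipboard
+```bash
+# In the file manager:
+# right-click → Properties → Location (copy the full path)
+
+# Or get the path with a command (requires xclip: sudo apt install -y xclip)
+realpath filename.txt | xclip -selection clipboard
+```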
+
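+## Method 5: Define a shell alias or function
+
+Create a convenience function:
+
+```bash
+# Add to ~/.bashrc or ~/.zshrc
+file_path() {
+    if [ $# -eq 0 ]; then
+        # No arguments: open the file picker
+        FILE=$(zenity --file-selection --title="Select a file")
+        if [ -n "$FILE" ]; then
+            echo "$FILE"
+        fi
+    else
+        # An argument was given: print it as-is
+        echo "$1"
+    fi
+}
+
+# Usage
+# file_path              # opens the file selection dialog
+# file_path ~/test.txt   # prints the path directly
+```
+
+## Method 6: Use Tab completion
+
+Ubuntu terminals support Tab completion by default:
+1. Type part of a path, for example `~/proj`
+2. Press `Tab` to complete it
+3. If there are several matches, press `Tab` twice to list them all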
+
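+## Checking that drag and drop works
+
+### Test steps:
+1. Open GNOME Terminal
+2. Open the file manager (Nautilus)
+3. Find a file (for example `test.txt`)
+4. Drag the file onto the terminal window
+5. The file path should be typed in automatically
+
+### If dragging does not work:
+
+#### 1. Check the desktop environment
+```bash
+echo $XDG_CURRENT_DESKTOP
+# Should print a value containing GNOME (for example ubuntu:GNOME)
+```
+
+#### 2. Restart the terminal
+```bash
+# Close every terminal window completely, then open a new one
+```
+
+#### 3. Update the system
+```bash
+sudo apt update
+sudo apt upgrade -y
+```
+
+#### 4. Check the file manager
+Make sure you are using Nautilus (the GNOME file manager):
+```bash
+# List file manager processes
+ps aux | grep nautilus
+```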
+
+## Alternative: add a file picker to your own code
+
+If you are developing the application yourself, it can prompt for the file instead of relying on drag and drop:
+
+### Python + Tkinter example
+```python
+import tkinter as tk
+from tkinter import filedialog
+
+def select_file():
+    root = tk.Tk()
+    root.withdraw()  # hide the main window
+    file_path = filedialog.askopenfilename()
+    root.destroy()
+    return file_path if file_path else None
+
+# Usage
+path = select_file()
+print(f"Selected file: {path}")
+```
+
+### Bash script + file picker
+```bash
+#!/bin/bash
+FILE=$(zenity --file-selection --title="Select the logistics link file")
+if [ -n "$FILE" ]; then
+    echo "Processing file: $FILE"
+    # your processing logic
+fi
+```
+
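+## Quick reference
+
+| Action | Method |
+|------|------|
+| Drag a file | Drag it from the file manager onto the terminal (supported by GNOME Terminal by default) |
+| Copy a path | `Ctrl+C` → `Ctrl+Shift+V` |
+| File picker | `zenity --file-selection` |
+| Tab completion | Press `Tab` while typing a path |
+| Paste as filenames | Available from the right-click menu in some terminals |
+
+## FAQ
+
+### Q: Nothing happens after I drag a file?
+A:
+1. Confirm that you are using GNOME Terminal
+2. Try restarting the terminal
+3. Check for permission problems
+
+### Q: Dragging inserts the file contents instead of the path?
+A: Some terminals may require holding `Shift` or `Ctrl` while dragging to insert the path
+
+### Q: How do I drag files into a remote SSH terminal?
+A: Dragging a local file only inserts its local path, which does not exist on the remote host. Instead:
+- Upload the file with `scp`
+- Paste the content by hand with `cat << EOF`
+- Use an SFTP client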
+
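+## Recommended workflow
+
+For the logistics extraction script, the suggested options are:
+
+```bash
+# Option 1: drag a file that contains the URL onto the terminal
+# The file path appears automatically at the cursor
+python jd/fetch_logistics_ubuntu.py <dragged file>
+
+# Option 2: pass the URL as an argument
+python jd/fetch_logistics_ubuntu.py https://3.cn/2t-Iibig
+
+# Option 3: modify the script to accept interactive input
+# by adding a file picker to the script
+```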
+
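Review note on jd/UBUNTU_TERMINAL_DRAG_DROP.md: the recommended workflow passes the script either a URL or the path of a dropped file, so the script needs a way to tell the two apart. The sketch below shows one possible approach. `clean_dropped_path` and `resolve_targets` are hypothetical helpers, not functions from this commit, and the sketch assumes the file is UTF-8 text containing URLs. Quote stripping matters mainly when the path is read interactively (for example via `input()`), because the shell already removes quotes from command-line arguments.

```python
import os
import re

# Matches an http(s) URL up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")


def clean_dropped_path(text: str) -> str:
    """Strip the whitespace and surrounding quotes that a drag-and-drop may add."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        text = text[1:-1]
    return os.path.expanduser(text)


def resolve_targets(arg: str) -> list:
    """Return the URLs named by `arg`: the URL itself, or every URL found in the file."""
    arg = clean_dropped_path(arg)
    if URL_RE.fullmatch(arg):
        return [arg]
    with open(arg, encoding="utf-8") as handle:
        return URL_RE.findall(handle.read())
```

With this in place, a main block could loop over `resolve_targets(sys.argv[1])` and call the extraction function once per URL.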
--- a/jd/fetch_logistics.py
+++ b/jd/fetch_logistics.py
@@ -0,0 +1,382 @@
+import time
+import json
+import re
+from DrissionPage import ChromiumPage, ChromiumOptions
+
+# Path to the Chrome executable
+CHROME_PATH = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
+
+# Global browser instance
+global_page = None
+
+def get_global_browser():
+    """Return the global browser instance, creating it on first use."""
+    global global_page
+    if global_page is None:
+        print("Initializing the browser...")
+        print(f"Browser path: {CHROME_PATH}")
+        
+        # Import os to check that the executable exists
+        import os
+        if not os.path.exists(CHROME_PATH):
+            raise FileNotFoundError(f"Chrome not found at: {CHROME_PATH}")
+        
+        options = ChromiumOptions()
+        options.set_browser_path(CHROME_PATH)
+        
+        # DrissionPage launches a headed browser by default,
+        # so, as in jd.py and tb.py, the page can be created directly.
+        # Maximizing the window is optional.
+        try:
+            options.set_argument('--start-maximized')
+        except Exception:
+            pass  # ignore a failure here; it does not prevent the browser from starting
+        
+        print("Starting the browser, please wait...")
+        print("If the browser does not open, check that Chrome is installed correctly")
+        
+        try:
+            global_page = ChromiumPage(options)
+            print("✅ Browser started successfully!")
+            print(f"Current page URL: {global_page.url}")
+            # Give the browser time to finish starting
+            time.sleep(2)
+        except Exception as e:
+            print(f"❌ Browser failed to start: {e}")
+            import traceback
+            traceback.print_exc()
+            raise
+    else:
+        print("Reusing the existing browser instance")
+    return global_page
+
+
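+def extract_logistics_info(tracking_url):
+    """
+    Extract the waybill number, carrier and related details from a JD logistics tracking page.
+    
+    Args:
+        tracking_url: URL of the tracking page, for example https://3.cn/2t-Iibig
+    
+    Returns:
+        dict: waybill number, carrier, carrier phone and tracking events; None on failure
+    """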
+    page = get_global_browser()
+    
+    try:
+        print(f"\nOpening the logistics tracking page: {tracking_url}")
+        page.get(tracking_url)
+        print("Page loading, please wait...")
+        time.sleep(5)  # wait for the page to load
+        
+        # Check that the page loaded
+        current_url = page.url
+        print(f"Current page URL: {current_url}")
+        
+        # Check the page title
+        try:
+            title = page.title
+            print(f"Page title: {title}")
+        except Exception:
+            print("Could not read the page title")
+        
+        # Check that the page has content
+        try:
+            html_length = len(page.html)
+            print(f"Page HTML length: {html_length} characters")
+            if html_length < 100:
+                print("⚠️ Warning: the page may not have finished loading")
+        except Exception as e:
+            print(f"⚠️ Could not read the page HTML: {e}")
+        
+        result = {
+            "waybill_no": None,           # waybill number
+            "carrier": None,              # domestic carrier
+            "carrier_phone": None,         # domestic carrier phone
+            "tracking_info": [],          # list of tracking events
+            "raw_html": None              # raw HTML (for debugging)
+        }
+        
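+        # Method 1: listen for a logistics data API (requests sent before listen.start() are not captured)
+        print("Method 1: listening to network requests...")
+        page.listen.start()
+        
+        # Scroll to trigger any lazy-loaded requests
+        page.scroll.down(300)
+        time.sleep(2)
+        page.scroll.to_bottom()
+        
+        # Collect the captured packets. Assumes the DrissionPage 4.x listener API,
+        # where listen.steps() yields packets until none arrives within `timeout` seconds.
+        responses = list(page.listen.steps(timeout=3))
+        print(f"Captured {len(responses)} requests")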
+        
+        # 查找可能的物流数据接口
+        possible_urls = [
+            'track', 'logistics', 'waybill', 'express', 
+            'delivery', '3.cn', 'jd.com/logistics',
+            'api.m.jd.com', 'mapi.jd.com'
+        ]
+        
+        for resp in responses:
+            url = resp.url if hasattr(resp, 'url') else ''
+            url_lower = url.lower()
+            
+            # 检查是否可能是物流相关的 API
+            if any(keyword in url_lower for keyword in possible_urls):
+                print(f"发现可能的物流 API: {url[:100]}")
+                try:
+                    if hasattr(resp, 'response') and hasattr(resp.response, 'body'):
+                        body = resp.response.body
+                        
+                        # 处理 JSON 响应
+                        if isinstance(body, dict):
+                            json_data = body
+                        elif isinstance(body, str):
+                            try:
+                                json_data = json.loads(body)
+                            except:
+                                continue
+                        else:
+                            continue
+                        
+                        # 尝试从 JSON 中提取运单号等信息
+                        extracted = extract_from_json(json_data)
+                        if extracted:
+                            result.update(extracted)
+                            print("成功从 API 响应中提取数据")
+                            return result
+                except Exception as e:
+                    print(f"解析 API 响应时出错: {e}")
+        
+        # 方法2: 从页面 HTML/DOM 中提取
+        print("\n方法2: 从页面 DOM 提取数据...")
+        
+        html = page.html
+        result['raw_html'] = html[:5000]  # 保存部分 HTML 用于调试
+        
+        # 从 HTML 文本中提取运单号
+        waybill_patterns = [
+            r'运单号[：:\s]*(\d+)',
+            r'waybill[_\s]*no["\']?\s*[:：]\s*["\']?(\d+)',
+            r'tracking[_\s]*number["\']?\s*[:：]\s*["\']?(\d+)',
+            r'"waybillNo"\s*[:：]\s*["\']?(\d+)',
+            r'"trackingNumber"\s*[:：]\s*["\']?(\d+)',
+        ]
+        
+        for pattern in waybill_patterns:
+            matches = re.findall(pattern, html, re.IGNORECASE)
+            if matches:
+                result['waybill_no'] = matches[0]
+                print(f"找到运单号: {result['waybill_no']}")
+                break
+        
+        # 提取承运人
+        carrier_patterns = [
+            r'国内承运人[：:\s]*([^\s<，,]+)',
+            r'carrier[：:\s]*([^\s<，,]+)',
+            r'"carrier"\s*[:：]\s*["\']?([^"\']+)',
+        ]
+        
+        for pattern in carrier_patterns:
+            matches = re.findall(pattern, html, re.IGNORECASE)
+            if matches:
+                result['carrier'] = matches[0].strip()
+                print(f"找到承运人: {result['carrier']}")
+                break
+        
+        # 提取承运人电话
+        phone_patterns = [
+            r'国内承运人电话[：:\s]*(\d+)',
+            r'carrier[_\s]*phone[：:\s]*(\d+)',
+            r'"carrierPhone"\s*[:：]\s*["\']?(\d+)',
+        ]
+        
+        for pattern in phone_patterns:
+            matches = re.findall(pattern, html, re.IGNORECASE)
+            if matches:
+                result['carrier_phone'] = matches[0]
+                print(f"找到承运人电话: {result['carrier_phone']}")
+                break
+        
+        # 方法3: 从 DOM 元素中提取
+        print("\n方法3: 从 DOM 元素提取数据...")
+        
+        # 尝试查找运单号元素
+        waybill_elements = page.eles('xpath=//*[contains(text(), "运单号") or contains(text(), "运单")]')
+        for elem in waybill_elements:
+            text = elem.text
+            parent_text = elem.parent().text if elem.parent() else ""
+            full_text = text + " " + parent_text
+            
+            # 从文本中提取数字作为运单号
+            numbers = re.findall(r'\d{8,}', full_text)
+            if numbers and not result['waybill_no']:
+                result['waybill_no'] = numbers[0]
+                print(f"从元素文本中找到运单号: {result['waybill_no']}")
+            
+            # 提取承运人
+            if '承运人' in text and not result['carrier']:
+                carrier_match = re.search(r'承运人[：:\s]*([^\s<，,]+)', full_text)
+                if carrier_match:
+                    result['carrier'] = carrier_match.group(1).strip()
+                    print(f"从元素文本中找到承运人: {result['carrier']}")
+            
+            # 提取电话
+            if '电话' in text and not result['carrier_phone']:
+                phone_match = re.search(r'电话[：:\s]*(\d+)', full_text)
+                if phone_match:
+                    result['carrier_phone'] = phone_match.group(1)
+                    print(f"从元素文本中找到电话: {result['carrier_phone']}")
+        
+        # Extract the tracking events (timeline)
+        print("\nExtracting tracking events...")
+        tracking_elements = page.eles('xpath=//*[contains(@class, "track") or contains(@class, "logistics") or contains(@class, "timeline")]')
+        
+        if not tracking_elements:
+            # Fall back to elements whose text contains the current year or a tracking keyword
+            tracking_elements = page.eles('xpath=//*[contains(text(), "%s") or contains(text(), "货物") or contains(text(), "到达")]' % time.strftime('%Y'))
+        
+        tracking_info = []
+        for elem in tracking_elements[:20]:  # cap the number of elements inspected
+            text = elem.text
+            if text and len(text) > 5:
+                # Try to extract a timestamp
+                time_match = re.search(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})', text)
+                if time_match or any(keyword in text for keyword in ['货物', '到达', '揽收', '运输', '配送', '签收']):
+                    tracking_info.append({
+                        'text': text.strip(),
+                        'time': time_match.group(1) if time_match else None
+                    })
+        
+        result['tracking_info'] = tracking_info[:10]  # keep at most 10 entries
+        
+        return result
+        
+    except Exception as e:
+        print(f"提取物流信息时出错: {e}")
+        import traceback
+        traceback.print_exc()
+        return None
+
+
+def extract_from_json(json_data):
+    """
+    Extract logistics details from JSON data.
+    
+    Args:
+        json_data: parsed JSON (dict or list)
+    
+    Returns:
+        dict: the extracted logistics details, or None if nothing matched
+    """
+    result = {}
+    
+    def search_dict(d, key_patterns):
+        """递归搜索字典中的值"""
+        if isinstance(d, dict):
+            for k, v in d.items():
+                # 检查键名
+                for pattern in key_patterns:
+                    if re.search(pattern, k, re.IGNORECASE):
+                        return v
+                # 递归搜索值
+                if isinstance(v, (dict, list)):
+                    found = search_dict(v, key_patterns)
+                    if found:
+                        return found
+        elif isinstance(d, list):
+            for item in d:
+                found = search_dict(item, key_patterns)
+                if found:
+                    return found
+        return None
+    
+    # Look up the waybill number
+    waybill = search_dict(json_data, [r'waybill', r'tracking.*number', r'运单号'])
+    if waybill:
+        result['waybill_no'] = str(waybill)
+    
+    # Look up the carrier; anchored so that keys such as carrierPhone do not match
+    carrier = search_dict(json_data, [r'^carrier(_?name)?$', r'承运人$'])
+    if carrier:
+        result['carrier'] = str(carrier)
+    
+    # Look up the carrier phone; a bare 'phone' pattern would also match unrelated phone fields
+    phone = search_dict(json_data, [r'carrier.*phone', r'承运人电话'])
+    if phone:
+        result['carrier_phone'] = str(phone)
+    
+    # 搜索物流跟踪信息
+    tracking = search_dict(json_data, [r'track', r'logistics', r'物流', r'轨迹', r'history'])
+    if tracking:
+        if isinstance(tracking, list):
+            result['tracking_info'] = tracking
+        elif isinstance(tracking, dict):
+            result['tracking_info'] = [tracking]
+    
+    return result if result else None
+
+
+def print_result(result):
+    """Print the extraction result; fields that are missing or None print as 'not found'."""
+    if not result:
+        print("No logistics information could be extracted")
+        return
+    
+    print("\n" + "="*50)
+    print("Logistics extraction result:")
+    print("="*50)
+    print(f"Waybill number: {result.get('waybill_no') or 'not found'}")
+    print(f"Domestic carrier: {result.get('carrier') or 'not found'}")
+    print(f"Domestic carrier phone: {result.get('carrier_phone') or 'not found'}")
+    
+    if result.get('tracking_info'):
+        print(f"\nTracking events ({len(result['tracking_info'])} in total):")
+        for idx, info in enumerate(result['tracking_info'], 1):
+            if isinstance(info, dict):
+                text = info.get('text', str(info))
+                time_str = info.get('time', '')
+                print(f"  {idx}. {text}")
+                if time_str:
+                    print(f"     Time: {time_str}")
+            else:
+                print(f"  {idx}. {info}")
+    else:
+        print("\nTracking events: none found")
+    
+    print("="*50)
+
+
+# Main program
+if __name__ == '__main__':
+    # Test URL
+    tracking_url = "https://3.cn/2t-Iibig"
+    
+    print("="*60)
+    print("JD logistics extraction tool")
+    print("="*60)
+    print(f"Target URL: {tracking_url}")
+    print("Starting extraction...\n")
+    
+    try:
+        result = extract_logistics_info(tracking_url)
+    except Exception as e:
+        print(f"\n❌ Error during execution: {e}")
+        import traceback
+        traceback.print_exc()
+        result = None
+    
+    if result:
+        print_result(result)
+        
+        # Save the result to a file
+        output_file = "logistics_result.json"
+        with open(output_file, 'w', encoding='utf-8') as f:
+            json.dump(result, f, ensure_ascii=False, indent=2)
+        print(f"\nResult saved to: {output_file}")
+    else:
+        print("Extraction failed")
+    
+    print("\nScript finished; the browser is left open for debugging")
+
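Review note on jd/fetch_logistics.py: `search_dict` matches keys by regular expression and treats falsy values as "not found", so a pattern such as `carrier` can return the value stored under `carrierPhone`. A stricter lookup could match key names exactly and use a sentinel for "missing". The sketch below is one possible shape for that, not the author's implementation. `find_key` is a hypothetical name, and `SAMPLE_RESPONSE` is invented data for illustration, since the real API response format is not shown in this commit.

```python
from typing import Any, Iterable

_MISSING = object()  # sentinel, so that legitimate falsy values (0, "") are not lost

# Invented payload for illustration only; not a real JD API response.
SAMPLE_RESPONSE = {
    "code": 0,
    "data": {
        "carrierPhone": "95311",
        "carrier": "Example Express",
        "waybillNo": "JD0000000001",
    },
}


def find_key(data: Any, names: Iterable[str]) -> Any:
    """Depth-first search for the first scalar stored under one of `names` (case-insensitive)."""
    wanted = {name.lower() for name in names}

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            # Prefer an exact key match with a scalar value at this level.
            for key, value in node.items():
                if isinstance(key, str) and key.lower() in wanted and not isinstance(value, (dict, list)):
                    return value
            for value in node.values():
                found = walk(value)
                if found is not _MISSING:
                    return found
        elif isinstance(node, list):
            for item in node:
                found = walk(item)
                if found is not _MISSING:
                    return found
        return _MISSING

    result = walk(data)
    return None if result is _MISSING else result
```

With the sample payload, `find_key(SAMPLE_RESPONSE, ["carrier", "carrierName"])` returns the carrier name even though `carrierPhone` appears first in the dictionary. That ordering is the case where a substring match on `carrier` picks the wrong field.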
--- a/jd/fetch_logistics_ubuntu.py
+++ b/jd/fetch_logistics_ubuntu.py
@@ -0,0 +1,454 @@
+import time
+import json
+import re
+import os
+import platform
+import threading
+from flask import Flask, request, jsonify
+from DrissionPage import ChromiumPage, ChromiumOptions
+
+# Common Chrome/Chromium paths on Ubuntu
+UBUNTU_CHROME_PATHS = [
+    '/usr/bin/google-chrome',
+    '/usr/bin/google-chrome-stable',
+    '/usr/bin/chromium-browser',
+    '/usr/bin/chromium',
+    '/snap/bin/chromium',
+    '/opt/google/chrome/chrome',
+]
+
+# Whether to run headless
+# True: no browser window; suits server environments
+# False: headed mode; requires X11 or Wayland
+USE_HEADLESS = True  # change as needed
+
+# Global browser instance
+global_page = None
+
+
+def find_chrome_path():
+    """Locate the Chrome/Chromium executable on an Ubuntu system."""
+    print("Looking for a Chrome/Chromium browser...")
+    
+    # Try the common paths first
+    for path in UBUNTU_CHROME_PATHS:
+        if os.path.exists(path):
+            print(f"✅ Found browser: {path}")
+            return path
+    
+    # Fall back to the `which` command
+    import subprocess
+    try:
+        result = subprocess.run(['which', 'google-chrome'], 
+                               capture_output=True, text=True, timeout=5)
+        if result.returncode == 0 and os.path.exists(result.stdout.strip()):
+            path = result.stdout.strip()
+            print(f"✅ Found browser via which: {path}")
+            return path
+    except Exception:
+        pass
+    
+    try:
+        result = subprocess.run(['which', 'chromium-browser'], 
+                               capture_output=True, text=True, timeout=5)
+        if result.returncode == 0 and os.path.exists(result.stdout.strip()):
+            path = result.stdout.strip()
+            print(f"✅ Found browser via which: {path}")
+            return path
+    except Exception:
+        pass
+    
+    # Nothing found: fall back to the most common path
+    default_path = '/usr/bin/google-chrome'
+    print(f"⚠️ No browser found; falling back to the default path: {default_path}")
+    print("Make sure Google Chrome or Chromium is installed:")
+    print("  wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb")
+    print("  sudo apt install -y ./google-chrome-stable_current_amd64.deb")
+    print("  or")
+    print("  sudo apt install -y chromium-browser")
+    return default_path
+
+
+def get_global_browser():
+    """Return the global browser instance, creating it on first use (Ubuntu version)."""
+    global global_page
+    if global_page is None:
+        print("="*60)
+        print("Ubuntu browser initialization")
+        print("="*60)
+        
+        # Check the operating system
+        if platform.system() != 'Linux':
+            print(f"⚠️ Warning: the current system is {platform.system()}; this script is designed for Ubuntu")
+        
+        # Locate Chrome
+        chrome_path = find_chrome_path()
+        
+        options = ChromiumOptions()
+        options.set_browser_path(chrome_path)
+        
+        # Ubuntu servers usually run headless
+        if USE_HEADLESS:
+            print("Configuring headless mode...")
+            try:
+                options.headless(True)
+            except Exception:
+                # If the headless() method is unavailable, fall back to the command-line flag
+                try:
+                    options.set_argument('--headless=new')
+                    options.set_argument('--no-sandbox')
+                    options.set_argument('--disable-dev-shm-usage')
+                except Exception:
+                    pass
+        else:
+            print("Configuring headed mode...")
+            # Check for a display
+            display = os.environ.get('DISPLAY')
+            if not display:
+                print("⚠️ Warning: the DISPLAY environment variable is not set")
+                print("If the browser cannot be displayed:")
+                print("  1. Set USE_HEADLESS = True")
+                print("  2. Or set the DISPLAY environment variable (for example DISPLAY=:0)")
+                print("  3. Or use Xvfb (a virtual display)")
+        
+        # Linux-specific flags
+        try:
+            options.set_argument('--no-sandbox')  # required in some environments, e.g. when running as root
+            options.set_argument('--disable-dev-shm-usage')  # avoid running out of /dev/shm space
+            options.set_argument('--disable-gpu')  # disable the GPU (optional; useful in headless mode)
+        except Exception:
+            pass
+        
+        print("Starting the browser...")
+        print(f"Browser path: {chrome_path}")
+        if USE_HEADLESS:
+            print("Mode: headless (runs in the background)")
+        else:
+            print("Mode: headed")
+        
+        try:
+            global_page = ChromiumPage(options)
+            print("✅ Browser started successfully!")
+            time.sleep(2)  # give the browser time to finish starting
+        except Exception as e:
+            print(f"❌ Browser failed to start: {e}")
+            print("\nPossible fixes:")
+            print("1. Make sure Chrome/Chromium is installed:")
+            print("   wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb")
+            print("   sudo apt install -y ./google-chrome-stable_current_amd64.deb")
+            print("2. If headless mode fails, try setting USE_HEADLESS = False")
+            print("3. Make sure you have sufficient permissions")
+            print("4. Check for missing dependencies:")
+            print("   sudo apt install -y libnss3 libatk-bridge2.0-0 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2")
+            import traceback
+            traceback.print_exc()
+            raise
+    else:
+        print("Reusing the existing browser instance")
+    
+    return global_page
+
+
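+def extract_logistics_info(tracking_url):
+    """
+    Extract the waybill number, carrier and related details from a JD logistics tracking page (Ubuntu version).
+    
+    Args:
+        tracking_url: URL of the tracking page, for example https://3.cn/2t-Iibig
+    
+    Returns:
+        dict: waybill number, carrier, carrier phone and tracking events; None on failure
+    """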
+    page = get_global_browser()
+    
+    try:
+        print(f"\nOpening the logistics tracking page: {tracking_url}")
+        page.get(tracking_url)
+        print("Page loading, please wait...")
+        time.sleep(5)  # wait for the page to load
+        
+        # Check that the page loaded
+        current_url = page.url
+        print(f"Current page URL: {current_url}")
+        
+        # Check the page title
+        try:
+            title = page.title
+            print(f"Page title: {title}")
+        except Exception:
+            print("Could not read the page title")
+        
+        # Check that the page has content
+        try:
+            html_length = len(page.html)
+            print(f"Page HTML length: {html_length} characters")
+            if html_length < 100:
+                print("⚠️ Warning: the page may not have finished loading")
+        except Exception as e:
+            print(f"⚠️ Could not read the page HTML: {e}")
+        
+        result = {
+            "waybill_no": None,           # waybill number
+            "carrier": None,              # domestic carrier
+            "carrier_phone": None,         # domestic carrier phone
+            "tracking_info": [],          # list of tracking events
+        }
+        
+        # Extract data from DOM elements
+        print("\nExtracting data from DOM elements...")
+        
+        # Look for elements that mention the waybill number
+        waybill_elements = page.eles('xpath=//*[contains(text(), "运单号") or contains(text(), "运单")]')
+        for elem in waybill_elements:
+            text = elem.text
+            parent_text = elem.parent().text if elem.parent() else ""
+            full_text = text + " " + parent_text
+            
+            # Take a run of 8 or more digits as the waybill number
+            numbers = re.findall(r'\d{8,}', full_text)
+            if numbers and not result['waybill_no']:
+                result['waybill_no'] = numbers[0]
+                print(f"✅ Found waybill number: {result['waybill_no']}")
+            
+            # Extract the carrier
+            if '承运人' in text and not result['carrier']:
+                carrier_match = re.search(r'承运人[：:\s]*([^\s<，,]+)', full_text)
+                if carrier_match:
+                    result['carrier'] = carrier_match.group(1).strip()
+                    print(f"✅ Found carrier: {result['carrier']}")
+            
+            # Extract the phone number
+            if '电话' in text and not result['carrier_phone']:
+                phone_match = re.search(r'电话[：:\s]*(\d+)', full_text)
+                if phone_match:
+                    result['carrier_phone'] = phone_match.group(1)
+                    print(f"✅ Found carrier phone: {result['carrier_phone']}")
+        
+        # Extract the tracking events (timeline)
+        print("\nExtracting tracking events...")
+        tracking_elements = page.eles('xpath=//*[contains(@class, "track") or contains(@class, "logistics") or contains(@class, "timeline")]')
+        
+        if not tracking_elements:
+            # Fall back to elements whose text contains the current year or a tracking keyword
+            tracking_elements = page.eles('xpath=//*[contains(text(), "%s") or contains(text(), "货物") or contains(text(), "到达")]' % time.strftime('%Y'))
+        
+        tracking_info = []
+        for elem in tracking_elements[:20]:  # cap the number of elements inspected
+            text = elem.text
+            if text and len(text) > 5:
+                # Try to extract a timestamp
+                time_match = re.search(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})', text)
+                if time_match or any(keyword in text for keyword in ['货物', '到达', '揽收', '运输', '配送', '签收']):
+                    tracking_info.append({
+                        'text': text.strip(),
+                        'time': time_match.group(1) if time_match else None
+                    })
+        
+        result['tracking_info'] = tracking_info[:10]  # keep at most 10 entries
+        
+        if result['tracking_info']:
+            print(f"✅ Found {len(result['tracking_info'])} tracking events")
+        
+        return result
+        
+    except Exception as e:
+        print(f"Error while extracting logistics information: {e}")
+        import traceback
+        traceback.print_exc()
+        return None
+
+
+def print_result(result):
+    """Print the extracted result."""
+    if not result:
+        print("No logistics information could be extracted")
+        return
+    
+    print("\n" + "="*50)
+    print("Extracted logistics information:")
+    print("="*50)
+    # The keys always exist (possibly as None), so use `or` rather than a .get() default
+    print(f"Waybill number: {result.get('waybill_no') or 'not found'}")
+    print(f"Domestic carrier: {result.get('carrier') or 'not found'}")
+    print(f"Domestic carrier phone: {result.get('carrier_phone') or 'not found'}")
+    if result.get('tracking_info'):
+        print(f"\nTracking information ({len(result['tracking_info'])} entries):")
+        for idx, info in enumerate(result['tracking_info'], 1):
+            if isinstance(info, dict):
+                text = info.get('text', str(info))
+                time_str = info.get('time', '')
+                print(f"  {idx}. {text}")
+                if time_str:
+                    print(f"     Time: {time_str}")
+            else:
+                print(f"  {idx}. {info}")
+    else:
+        print("\nTracking information: not found")
+    
+    print("="*50)
+
+
+# =================== Flask API endpoints ===================
+# Initialize the Flask application
+app = Flask(__name__)
+
+# Lock that serializes access to the shared browser
+fetch_lock = threading.Lock()
+
+
+@app.route('/fetch_logistics', methods=['GET', 'POST'])
+def fetch_logistics():
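+    """
+    Query logistics information.
+    
+    Parameters:
+        tracking_url: URL of the logistics tracking page (GET or POST)
+        Example: https://3.cn/2t-Iibig
+    
+    Returns:
+        JSON with `success` and `message`; on success, `data` contains:
+        - waybill_no: waybill number
+        - carrier: domestic carrier
+        - carrier_phone: domestic carrier phone
+        - tracking_info: list of tracking entries
+        - tracking_count: number of tracking entries
+        The response also carries `url`, and `warning` when fields are missing.
+    """
+    # Read the parameter (GET and POST are both supported)
+    if request.method == 'POST':
+        if request.is_json:
+            data = request.get_json(silent=True) or {}
+            tracking_url = (data.get('tracking_url') or data.get('url')) if isinstance(data, dict) else None
+        else:
+            tracking_url = request.form.get('tracking_url') or request.form.get('url') or request.args.get('tracking_url') or request.args.get('url')
+    else:
+        tracking_url = request.args.get('tracking_url') or request.args.get('url')
+    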
+    if not tracking_url:
+        return jsonify({
+            "success": False,
+            "error": "Missing parameter tracking_url or url",
+            "message": "Please provide the URL of the logistics tracking page"
+        }), 400
+    
+    # Validate the URL scheme
+    if not (tracking_url.startswith('http://') or tracking_url.startswith('https://')):
+        return jsonify({
+            "success": False,
+            "error": "Invalid URL format",
+            "message": "The URL must start with http:// or https://"
+        }), 400
+    
+    try:
+        with fetch_lock:  # serialize requests; only one extraction runs at a time
+            print(f"\nReceived logistics query: {tracking_url}")
+            result = extract_logistics_info(tracking_url)
+        
+        if result:
+            # Build the response payload
+            response_data = {
+                "success": True,
+                "message": "Query succeeded",
+                "data": {
+                    "waybill_no": result.get('waybill_no'),
+                    "carrier": result.get('carrier'),
+                    "carrier_phone": result.get('carrier_phone'),
+                    "tracking_info": result.get('tracking_info', []),
+                    "tracking_count": len(result.get('tracking_info', []))
+                },
+                "url": tracking_url
+            }
+            
+            # Add a warning when some fields could not be found
+            missing_fields = []
+            if not result.get('waybill_no'):
+                missing_fields.append('waybill_no')
+            if not result.get('carrier'):
+                missing_fields.append('carrier')
+            
+            if missing_fields:
+                response_data["warning"] = f"Fields not found: {', '.join(missing_fields)}"
+            
+            return jsonify(response_data), 200
+        else:
+            return jsonify({
+                "success": False,
+                "error": "Extraction failed",
+                "message": "No logistics information could be extracted from the page",
+                "url": tracking_url
+            }), 500
+            
+    except Exception as e:
+        print(f"Error while querying logistics information: {e}")
+        import traceback
+        traceback.print_exc()
+        return jsonify({
+            "success": False,
+            "error": str(e),
+            "message": "Internal server error",
+            "url": tracking_url
+        }), 500
+
+
+@app.route('/health', methods=['GET'])
+def health():
+    """Health check endpoint."""
+    return jsonify({
+        "status": "ok",
+        "service": "JD logistics query service",
+        "version": "1.0.0"
+    }), 200
+
+
+@app.route('/', methods=['GET'])
+def index():
+    """Index page; returns API usage information."""
+    return jsonify({
+        "service": "JD logistics query API",
+        "version": "1.0.0",
+        "endpoints": {
+            "/fetch_logistics": {
+                "method": ["GET", "POST"],
+                "description": "Query logistics information",
+                "parameters": {
+                    "tracking_url": "URL of the logistics tracking page (required)",
+                    "url": "Alias for tracking_url (optional)"
+                },
+                "example_get": "/fetch_logistics?tracking_url=https://3.cn/2t-Iibig",
+                "example_post": "POST /fetch_logistics\n{\"tracking_url\": \"https://3.cn/2t-Iibig\"}"
+            },
+            "/health": {
+                "method": ["GET"],
+                "description": "Health check"
+            }
+        }
+    }), 200
+
+
+# =================== Start the service ===================
+if __name__ == '__main__':
+    # API service mode (default)
+    print("="*60)
+    print("JD logistics query API service (Ubuntu version)")
+    print("="*60)
+    print(f"Headless mode: {'yes' if USE_HEADLESS else 'no'}")
+    print("\nEndpoints:")
+    print("  GET/POST  /fetch_logistics?tracking_url=<URL>  - query logistics information")
+    print("  GET       /health                             - health check")
+    print("  GET       /                                   - API description")
+    print("\nStarting the service...")
+    print("Listening on: http://0.0.0.0:5001")
+    print("Press Ctrl+C to stop the service\n")
+    
+    try:
+        app.run(host='0.0.0.0', port=5001, debug=False, threaded=True)
+    except KeyboardInterrupt:
+        print("\n\nService stopped")
+    finally:
+        if 'global_page' in globals() and global_page:
+            try:
+                global_page.quit()
+                print("Browser closed")
+            except Exception:
+                pass
+
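
Review note on `jd/fetch_logistics_ubuntu.py`: the field extraction above mixes DOM access with regex parsing, which makes it hard to test without a browser. The following is a minimal sketch, not the committed implementation, of the same regexes factored into a pure function. The name `parse_logistics_text` is hypothetical, and the negative lookahead `(?!电话)` is an addition assumed here so that the "承运人电话" label is not captured as a carrier name.

```python
import re


def parse_logistics_text(full_text):
    """Extract waybill number, carrier, and carrier phone from visible page text."""
    result = {"waybill_no": None, "carrier": None, "carrier_phone": None}

    # The first run of 8 or more digits is treated as the waybill number.
    numbers = re.findall(r"\d{8,}", full_text)
    if numbers:
        result["waybill_no"] = numbers[0]

    # Skip the "承运人电话" (carrier phone) label when looking for the carrier name.
    carrier_match = re.search(r"承运人(?!电话)[：:\s]*([^\s<，,]+)", full_text)
    if carrier_match:
        result["carrier"] = carrier_match.group(1).strip()

    # Digits following "电话" (phone) are taken as the carrier phone.
    phone_match = re.search(r"电话[：:\s]*(\d+)", full_text)
    if phone_match:
        result["carrier_phone"] = phone_match.group(1)

    return result


if __name__ == "__main__":
    sample = "运单号：12345678901 国内承运人电话：950616 国内承运人：京东快递"
    print(parse_logistics_text(sample))
```

With a helper like this, `extract_logistics_info` would only need to collect element text and pass it in, and the regexes could be unit-tested against saved page text.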
--- a/jd/jd.py
+++ b/jd/jd.py
@@ -6,8 +6,7 @@ import threading
 from flask import Flask, request, jsonify
 from DrissionPage import ChromiumPage, ChromiumOptions
 from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime
-from sqlalchemy.ext.declarative import declarative_base
-from sqlalchemy.orm import sessionmaker
+from sqlalchemy.orm import declarative_base, sessionmaker

 # =================== 配置部分 ===================
 # 浏览器路径（请根据本地实际路径修改）
@@ -27,6 +26,14 @@ app = Flask(__name__)
 # 初始化锁
 fetch_lock = threading.Lock()

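+# Global crawler control flags
+crawler_running = False
+crawler_thread = None
+current_product_id = None
+
+# product_id of the fetch task currently allowed to run (a new request overwrites it; older threads exit when they see a mismatch)
+active_fetch_product_id = None
+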
+

 # 初始化数据库连接
 db_url = f"mysql+pymysql://{db_config['user']}:{db_config['password']}@{db_config['host']}:{db_config['port']}/{db_config['database']}?charset=utf8mb4"
@@ -65,16 +72,27 @@ def get_global_browser():
    return global_page


+def _is_fetch_cancelled(product_id):
+    """Return True if a newer request has replaced this task (only the latest product_id stays active)."""
+    global active_fetch_product_id
+    return active_fetch_product_id is not None and active_fetch_product_id != product_id
+
+
 def fetch_jd_comments(product_id):
+    global active_fetch_product_id
    page = get_global_browser()  # 使用全局浏览器
    try:
        # 打开商品页面
        page.get(f'https://item.jd.com/{product_id}.html#crumb-wrap')
        time.sleep(random.uniform(5, 8))
+        if _is_fetch_cancelled(product_id):
+            return 0

        # 向下滚动主页面
        page.scroll.down(150)
        time.sleep(random.uniform(3, 5))
+        if _is_fetch_cancelled(product_id):
+            return 0

        # 点击“买家赞不绝口”
        element1 = page.ele('xpath=//div[contains(text(), "买家赞不绝口")]')
@@ -86,16 +104,20 @@ def fetch_jd_comments(product_id):
            if element1:
                element1.click()
                time.sleep(random.uniform(3, 5))
+        if _is_fetch_cancelled(product_id):
+            return 0
        # 点击“当前商品”
        element2 = page.ele('xpath=//div[contains(text(), "当前商品")]')
        if element2:
            element2.click()
            time.sleep(random.uniform(3, 5))

+        if _is_fetch_cancelled(product_id):
+            return 0
        # 定位弹窗区域
        popup = page.ele('xpath=//*[@id="rateList"]/div/div[3]')
        if not popup:
-            return []
+            return 0

        # 点击“视频”
        element3 = page.ele('xpath=//div[contains(text(), "视频")]')
@@ -103,20 +125,28 @@ def fetch_jd_comments(product_id):
            element3.click()
            time.sleep(random.uniform(3, 5))

+        if _is_fetch_cancelled(product_id):
+            return 0
        # 监听请求
        page.listen.start('https://api.m.jd.com/client.action')

-        max_retries = 10  # 最多尝试 5 次无新数据
        retry_count = 0
        new_comments = []  # 存储最终的新评论
        seen_ids = set()  # 已处理过的 comment_id
+        total_comments_saved = 0  # total number of comments saved so far

-        while retry_count < max_retries and len(new_comments) < 10:
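+        # Keep fetching until a newer request cancels this task (NOTE: crawler_running is not checked here, so /stop_crawler alone does not end this loop)
+        while True:
+            if _is_fetch_cancelled(product_id):
+                print(f"[fetch_jd_comments] product {product_id} was cancelled by a newer request, exiting")
+                break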
            scroll_amount = random.randint(10000, 100000)
            popup.scroll.down(scroll_amount)
            print(f"弹窗向下滚动了 {scroll_amount} 像素")

            time.sleep(random.uniform(3, 5))
+            if _is_fetch_cancelled(product_id):
+                break

            resp = page.listen.wait(timeout=5)
            if resp and 'getCommentListPage' in resp.request.postData:
@@ -161,6 +191,12 @@ def fetch_jd_comments(product_id):
                            print(f"本次获取到 {len(fresh_comments)} 条新评论")
                            new_comments.extend(fresh_comments)
                            retry_count = 0  # 有新数据，重置重试计数器
+
+                            # Save this batch to the database right away
+                            save_comments_to_db(product_id, fresh_comments)
+                            total_comments_saved += len(fresh_comments)
+                            print(f"Saved {len(fresh_comments)} comments to the database, {total_comments_saved} in total")
+
                        else:
                            print("本次无新评论，继续滚动...")
                            retry_count += 1
@@ -173,16 +209,35 @@ def fetch_jd_comments(product_id):
            else:
                print("未捕获到新的评论数据，继续滚动...")
                retry_count += 1
+            if _is_fetch_cancelled(product_id):
+                break

-        print(f"共抓取到 {len(new_comments)} 条新评论（最多需要10条）")
-        return new_comments[:10]  # 只保留前10条
+        print(f"Crawler stopped, {total_comments_saved} comments saved in total")
+        return total_comments_saved

    except Exception as e:
        print("发生错误:", e)
-        return []
+        return 0



+# =================== Continuous background crawler ===================
+def continuous_crawler(product_id):
+    """Background function that keeps crawling comments."""
+    global crawler_running
+    try:
+        print(f"Starting continuous crawl of comments for product {product_id}...")
+        while crawler_running:
+            result = fetch_jd_comments(product_id)
+            if not crawler_running:
+                break
+            # Pause before the next round (this runs whether or not data was fetched)
+            time.sleep(10)
+        print(f"Continuous crawl for product {product_id} has stopped")
+    except Exception as e:
+        print(f"Continuous crawler error: {e}")
+        crawler_running = False
+
 # =================== 提取评论并保存到数据库 ===================
 def save_comments_to_db(product_id, comments):
    session = Session()
@@ -229,33 +284,144 @@ def save_comments_to_db(product_id, comments):


 # =================== Flask API 接口 ===================
-@app.route('/fetch_comments', methods=['POST'])
-def fetch_comments():
+@app.route('/start_crawler', methods=['POST'])
+def start_crawler():
+    """Start the continuous crawler."""
+    global crawler_running, crawler_thread, current_product_id
+
    product_id = request.args.get('product_id')
    if not product_id:
-        return jsonify({"error": "缺少 product_id"}), -200
+        return jsonify({"error": "Missing product_id"}), 400
+
+    if crawler_running:
+        return jsonify({
+            "message": f"Crawler is already running, current product ID: {current_product_id}",
+            "status": "already_running"
+        }), 200

    try:
-        with fetch_lock:  # 加锁，防止并发调用
-            comments = fetch_jd_comments(product_id)
-            if not comments:
-                return jsonify({"message": "未获取到评论数据"}), -200
-
-            save_comments_to_db(product_id, comments)
+        with fetch_lock:
+            crawler_running = True
+            current_product_id = product_id
+            crawler_thread = threading.Thread(target=continuous_crawler, args=(product_id,))
+            crawler_thread.daemon = True
+            crawler_thread.start()

        return jsonify({
-            "message": f"成功保存 {len(comments)} 条评论",
+            "message": f"Continuous crawler started, product ID: {product_id}",
+            "status": "started",
            "product_id": product_id
        }), 200

    except Exception as e:
-        return jsonify({"error": str(e)}), -200
+        crawler_running = False
+        return jsonify({"error": str(e)}), 500
+
+
+@app.route('/stop_crawler', methods=['POST'])
+def stop_crawler():
+    """Stop the continuous crawler."""
+    global crawler_running, crawler_thread, current_product_id
+
+    if not crawler_running:
+        return jsonify({
+            "message": "Crawler is not running",
+            "status": "not_running"
+        }), 200
+
+    try:
+        with fetch_lock:
+            crawler_running = False
+            stopped_product_id = current_product_id
+            current_product_id = None
+
+        # Wait up to 10 seconds for the thread to finish
+        if crawler_thread and crawler_thread.is_alive():
+            crawler_thread.join(timeout=10)
+
+        return jsonify({
+            "message": f"Continuous crawler stopped, product ID: {stopped_product_id}",
+            "status": "stopped",
+            "product_id": stopped_product_id
+        }), 200
+
+    except Exception as e:
+        return jsonify({"error": str(e)}), 500
+
+
+@app.route('/crawler_status', methods=['GET'])
+def crawler_status():
+    """Return the state of the continuous crawler."""
+    global crawler_running, current_product_id
+
+    return jsonify({
+        "running": crawler_running,
+        "product_id": current_product_id,
+        "status": "running" if crawler_running else "stopped"
+    }), 200
+
+
+@app.route('/test', methods=['GET'])
+def test():
+    """Test endpoint that confirms the server is responding."""
+    print("Test endpoint was accessed")
+    return jsonify({"message": "Server is running normally", "status": "ok"}), 200
+
+
+@app.route('/fetch_comments', methods=['GET', 'POST'])
+def fetch_comments():
+    """Fetch comments once in the background and return immediately. A new request cancels earlier fetches for a different product_id."""
+    global crawler_running, active_fetch_product_id
+    print(f"[fetch_comments] request received, method: {request.method}, args: {request.args}")
+    product_id = request.args.get('product_id')
+
+    if not product_id:
+        print("[fetch_comments] error: missing product_id")
+        return jsonify({"error": "Missing product_id"}), 400
+
+    print(f"[fetch_comments] handling product ID: {product_id}; earlier requests will be cancelled first")
+
+    try:
+        # Cancel earlier work: stop the continuous crawler and mark the new product_id as active; older threads exit when their loop detects the change
+        with fetch_lock:
+            crawler_running = False
+            active_fetch_product_id = product_id
+
+        def run_fetch():
+            try:
+                print(f"[background thread] fetching comments for product {product_id}...")
+                result = fetch_jd_comments(product_id)
+                print(f"[background thread] fetch finished, result: {result}")
+            except Exception as e:
+                import traceback
+                error_msg = f"Error while fetching comments in the background: {e}\n{traceback.format_exc()}"
+                print(f"[background thread] {error_msg}")
+
+        fetch_thread = threading.Thread(target=run_fetch)
+        fetch_thread.daemon = True
+        fetch_thread.start()
+        print(f"[fetch_comments] background thread started (earlier requests marked as cancelled)")
+
+        response_data = {
+            "message": f"Started fetching comments for product {product_id} in the background (earlier requests were cancelled)",
+            "status": "started",
+            "product_id": product_id,
+            "note": "Fetching runs in the background; check the database later. /crawler_status only reports the continuous crawler."
+        }
+        print(f"[fetch_comments] returning response: {response_data}")
+        return jsonify(response_data), 200
+
+    except Exception as e:
+        import traceback
+        error_msg = f"Error while handling the request: {e}\n{traceback.format_exc()}"
+        print(f"[fetch_comments] {error_msg}")
+        return jsonify({"error": str(e)}), 500


 # =================== 启动服务 ===================
 if __name__ == '__main__':
    try:
-        app.run(host='0.0.0.0', port=5000, debug=True)
+        app.run(host='0.0.0.0', port=5008, debug=False)  # debug mode must not be exposed on 0.0.0.0
    finally:
        if 'global_page' in globals() and global_page:
            global_page.quit()
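
Review note on `jd/jd.py`: cancellation in this diff relies on two module-level values (`crawler_running` and `active_fetch_product_id`), and the `while True` loop in `fetch_jd_comments` only consults the second one. A possible alternative, sketched below under stated assumptions, is to give every background task its own `threading.Event`. `CrawlTask`, `CrawlManager`, and the `work_step` callback are hypothetical names introduced for illustration; `work_step` stands in for one scroll-and-capture round.

```python
import threading
import time


class CrawlTask:
    """One cancellable background job driven by its own stop event."""

    def __init__(self, product_id, work_step, pause=0.01):
        self.product_id = product_id
        self.stop_event = threading.Event()
        self._work_step = work_step
        self._pause = pause
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self.stop_event.is_set():
            self._work_step(self.product_id)
            # Event.wait returns early once the event is set, unlike time.sleep.
            self.stop_event.wait(timeout=self._pause)

    def start(self):
        self.thread.start()

    def stop(self, timeout=2.0):
        """Signal the task to stop and report whether the thread has exited."""
        self.stop_event.set()
        self.thread.join(timeout)
        return not self.thread.is_alive()


class CrawlManager:
    """Keeps at most one task alive; starting a new one stops the previous one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = None

    def start(self, product_id, work_step):
        with self._lock:
            if self._current is not None:
                self._current.stop()
            self._current = CrawlTask(product_id, work_step)
            self._current.start()
            return self._current

    def stop(self):
        with self._lock:
            task, self._current = self._current, None
        return task.stop() if task else True


if __name__ == "__main__":
    manager = CrawlManager()
    manager.start("100", lambda pid: print("crawling", pid))
    time.sleep(0.05)
    print("stopped cleanly:", manager.stop())
```

Because each task owns its event, a repeated request for the same product ID also replaces the older thread, and a stop request reaches the inner loop directly.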
--- a/jd/logistics.py
+++ b/jd/logistics.py
@@ -0,0 +1,154 @@
+import time
+import json
+import re
+from DrissionPage import ChromiumPage, ChromiumOptions
+
+# Browser path
+CHROME_PATH = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
+
+# Logistics tracking page URL
+TRACKING_URL = "https://3.cn/2t-Iibig"
+
+# Configure the browser
+options = ChromiumOptions()
+options.set_browser_path(CHROME_PATH)
+
+# Create the browser instance
+page = ChromiumPage(options)
+
+try:
+    print("Opening the logistics tracking page...")
+    page.get(TRACKING_URL)
+    
+    # Wait for the page to load
+    time.sleep(5)
+    
+    print("\n=== Method 1: extract information from page elements ===")
+    
+    # Try to extract the waybill number
+    waybill_elements = page.eles('xpath=//*[contains(text(), "运单号")]')
+    if waybill_elements:
+        print(f"Found {len(waybill_elements)} waybill-related elements")
+        for elem in waybill_elements:
+            print(f"  Text: {elem.text}")
+            # Inspect the parent element as well
+            parent = elem.parent()
+            if parent:
+                print(f"  Parent text: {parent.text[:100]}")
+    
+    # Try to extract the carrier information
+    carrier_elements = page.eles('xpath=//*[contains(text(), "承运人")]')
+    if carrier_elements:
+        print(f"\nFound {len(carrier_elements)} carrier-related elements")
+        for elem in carrier_elements:
+            print(f"  Text: {elem.text}")
+    
+    print("\n=== Method 2: listen to network requests and look for a data API ===")
+    
+    # Listen to all requests
+    print("Listening to network requests...")
+    page.listen.start()
+    
+    # Scroll the page to trigger any lazy requests
+    page.scroll.down(500)
+    time.sleep(3)
+    page.scroll.to_bottom()
+    time.sleep(5)
+    
+    # Collect everything the listener captured
+    all_responses = page.listen.get()
+    print(f"\nCaptured {len(all_responses)} requests")
+    
+    # Keywords that suggest a request may carry logistics data
+    keywords = ['track', 'logistics', 'waybill', 'express', 'delivery', '3.cn', 'jd.com', 'json', 'api']
+    
+    for idx, resp in enumerate(all_responses):
+        url = resp.url if hasattr(resp, 'url') else ''
+        print(f"\nRequest {idx + 1}:")
+        print(f"  URL: {url[:150]}")
+        
+        # Check whether the URL contains any keyword
+        url_lower = url.lower()
+        if any(keyword in url_lower for keyword in keywords):
+            print(f"  ⭐ Possibly relevant request!")
+            try:
+                if hasattr(resp, 'response') and hasattr(resp.response, 'body'):
+                    body = resp.response.body
+                    if isinstance(body, dict):
+                        print(f"  Response data (first 500 chars): {str(body)[:500]}")
+                        # Pretty-print the already-parsed JSON
+                        print(f"  JSON data (first 1000 chars):")
+                        print(json.dumps(body, indent=2, ensure_ascii=False)[:1000])
+                    elif isinstance(body, str):
+                        print(f"  Response data (first 500 chars): {body[:500]}")
+                        # Try to parse the body as JSON
+                        try:
+                            json_data = json.loads(body)
+                            print(f"  Parsed JSON (first 1000 chars):")
+                            print(json.dumps(json_data, indent=2, ensure_ascii=False)[:1000])
+                        except ValueError:
+                            pass
+            except Exception as e:
+                print(f"  Error while parsing the response: {e}")
+    
+    print("\n=== Method 3: extract JSON data embedded in the page HTML ===")
+    
+    # Get the page HTML
+    html = page.html
+    # Candidate patterns for JSON data (typically inside script tags)
+    json_patterns = [
+        r'window\.__INITIAL_STATE__\s*=\s*({.+?});',
+        r'var\s+trackData\s*=\s*({.+?});',
+        r'const\s+trackingInfo\s*=\s*({.+?});',
+        r'data\s*:\s*({.+?})',
+        r'"waybillNo"[:\s]+"([^"]+)"',
+        r'"trackingNumber"[:\s]+"([^"]+)"',
+    ]
+    
+    for pattern in json_patterns:
+        matches = re.findall(pattern, html, re.DOTALL)
+        if matches:
+            print(f"\nPattern matched: {pattern}")
+            for match in matches[:3]:  # show only the first 3
+                print(f"  Match: {str(match)[:200]}")
+    
+    print("\n=== Searching the raw page HTML with regular expressions ===")
+    page_text = page.html
+    # Look for the waybill number (usually digits)
+    waybill_pattern = r'运单号[:\s]*(\d+)'
+    waybill_matches = re.findall(waybill_pattern, page_text)
+    if waybill_matches:
+        print(f"Found waybill number: {waybill_matches}")
+    
+    # Look for the carrier
+    carrier_pattern = r'国内承运人[:\s]*([^\s<]+)'
+    carrier_matches = re.findall(carrier_pattern, page_text)
+    if carrier_matches:
+        print(f"Found carrier: {carrier_matches}")
+    
+    # Look for the phone number
+    phone_pattern = r'国内承运人电话[:\s]*(\d+)'
+    phone_matches = re.findall(phone_pattern, page_text)
+    if phone_matches:
+        print(f"Found phone: {phone_matches}")
+    
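+    print("\n=== Waiting for manual inspection ===")
+    print("The page is open. Inspect the network requests in the browser (F12 -> Network) to find the API that carries the logistics data")
+    print("Press Enter to continue (if no terminal input is available, the script waits 60 seconds instead)...")
+    
+    try:
+        input()
+    except EOFError:
+        time.sleep(60)
+
+except KeyboardInterrupt:
+    print("\nScript interrupted by the user")
+except Exception as e:
+    print(f"\nAn error occurred: {e}")
+    import traceback
+    traceback.print_exc()
+finally:
+    print("\nScript finished; the browser is left open for debugging")
+    # Uncomment to close the browser on exit
+    # page.quit()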
+
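
Review note on `jd/logistics.py`: the non-greedy patterns in `json_patterns` (for example `({.+?});`) stop at the first `};`, so they can cut a nested object short. A minimal sketch of an alternative follows, assuming the page assigns a JSON literal to a `window.<name>` variable; that assumption is unverified for the tracking page, and `extract_embedded_json` is a hypothetical helper. It uses `json.JSONDecoder.raw_decode`, which parses exactly one JSON value and ignores whatever follows it.

```python
import json
import re


def extract_embedded_json(html, var_name="__INITIAL_STATE__"):
    """Return the JSON value assigned to window.<var_name>, or None if absent or invalid."""
    match = re.search(r"window\." + re.escape(var_name) + r"\s*=\s*", html)
    if not match:
        return None
    try:
        # raw_decode starts at the given index and stops after one complete value.
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return value


if __name__ == "__main__":
    sample = '<script>window.__INITIAL_STATE__ = {"waybillNo": "123", "nested": {"a": [1, 2]}};</script>'
    print(extract_embedded_json(sample))
```

This only works when the assigned value is strict JSON; a JavaScript object literal with unquoted keys would still need a different approach.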
--- a/jd/requirements.txt
+++ b/jd/requirements.txt
@@ -0,0 +1,5 @@
+# Dependencies for jd.py
+flask>=2.0.0
+DrissionPage>=4.0.0
+sqlalchemy>=2.0.0
+pymysql>=1.0.0
--- a/jd/run_win.bat
+++ b/jd/run_win.bat
@@ -0,0 +1,35 @@
+@echo off
+chcp 65001 >nul
+title JD Service - One-Click Start
+
+echo ========================================
+echo   JD Service - install dependencies and start
+echo ========================================
+echo.
+
+cd /d "%~dp0"
+
+:: Check for Python
+python --version >nul 2>&1
+if errorlevel 1 (
+    echo [ERROR] Python not found. Install Python and add it to PATH first.
+    pause
+    exit /b 1
+)
+
+echo [1/2] Installing dependencies...
+python -m pip install -r requirements.txt -q
+if errorlevel 1 (
+    echo [ERROR] Dependency installation failed.
+    pause
+    exit /b 1
+)
+echo Dependencies are ready.
+echo.
+
+echo [2/2] Starting the service...
+echo Press Ctrl+C to stop the service.
+echo.
+python jd.py
+
+pause
--- a/jd/setup_ubuntu.sh
+++ b/jd/setup_ubuntu.sh
@@ -0,0 +1,162 @@
+#!/bin/bash
+# Quick setup script for Ubuntu
+
+set -e  # exit immediately on error
+
+# Make sure the script runs under bash (compatibility guard)
+if [ -z "$BASH_VERSION" ]; then
+    echo "Warning: this script must be run with bash"
+    echo "Use: bash $0"
+    exit 1
+fi
+
+echo "=========================================="
+echo "JD logistics extraction tool - Ubuntu setup"
+echo "=========================================="
+echo ""
+
+
+# 1. Check and install system dependencies
+echo "Step 1: checking system dependencies..."
+if ! command -v python3 >/dev/null 2>&1; then
+    echo "Installing Python3..."
+    sudo apt update
+    sudo apt install -y python3 python3-pip python3-venv
+else
+    echo "✅ Python3 is installed"
+fi
+
+# Check for Chrome/Chromium
+CHROME_PATH=""
+if command -v google-chrome >/dev/null 2>&1; then
+    CHROME_PATH=$(command -v google-chrome)
+    echo "✅ Found Google Chrome: $CHROME_PATH"
+elif [ -f "/usr/bin/google-chrome" ]; then
+    CHROME_PATH="/usr/bin/google-chrome"
+    echo "✅ Found Google Chrome: $CHROME_PATH"
+elif command -v chromium-browser >/dev/null 2>&1; then
+    CHROME_PATH=$(command -v chromium-browser)
+    echo "✅ Found Chromium: $CHROME_PATH"
+elif [ -f "/usr/bin/chromium-browser" ]; then
+    CHROME_PATH="/usr/bin/chromium-browser"
+    echo "✅ Found Chromium: $CHROME_PATH"
+else
+    echo "⚠️  Chrome/Chromium not found, attempting to install..."
+    echo "Choose a browser to install:"
+    echo "1) Google Chrome (recommended)"
+    echo "2) Chromium (open-source build)"
+    read -r -p "Select [1-2]: " choice
+    
+    if [ "$choice" = "1" ]; then
+        echo "Installing Google Chrome..."
+        wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
+        sudo apt install -y ./google-chrome-stable_current_amd64.deb
+        rm -f google-chrome-stable_current_amd64.deb
+        CHROME_PATH="/usr/bin/google-chrome"
+    elif [ "$choice" = "2" ]; then
+        echo "Installing Chromium..."
+        sudo apt update
+        sudo apt install -y chromium-browser
+        CHROME_PATH="/usr/bin/chromium-browser"
+    fi
+fi
+
+# 2. Install Chrome runtime dependencies
+echo ""
+echo "Step 2: checking Chrome runtime dependencies..."
+DEPS="libnss3 libatk-bridge2.0-0 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2"
+MISSING_DEPS=""
+
+for dep in $DEPS; do
+    if ! dpkg -l 2>/dev/null | grep -q "^ii.*$dep"; then
+        if [ -z "$MISSING_DEPS" ]; then
+            MISSING_DEPS="$dep"
+        else
+            MISSING_DEPS="$MISSING_DEPS $dep"
+        fi
+    fi
+done
+
+if [ -n "$MISSING_DEPS" ]; then
+    echo "Installing missing dependencies: $MISSING_DEPS"
+    sudo apt install -y $MISSING_DEPS
+else
+    echo "✅ All dependencies are installed"
+fi
+
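+# 3. Create the virtual environment
+echo ""
+echo "Step 3: setting up the Python virtual environment..."
+if [ ! -d "venv" ]; then
+    echo "Creating the virtual environment..."
+    python3 -m venv venv
+    echo "✅ Virtual environment created"
+else
+    echo "✅ Virtual environment already exists"
+fi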
+
+# 4. Activate the virtual environment and install Python packages
+echo ""
+echo "Step 4: installing Python packages..."
+source venv/bin/activate
+
+# Upgrade pip
+pip install --upgrade pip
+
+# Install dependencies
+pip install DrissionPage Flask
+
+# Optional: database support
+read -r -p "Install database support? (sqlalchemy, pymysql) [y/N]: " need_db
+if [ "$need_db" = "y" ] || [ "$need_db" = "Y" ]; then
+    pip install sqlalchemy pymysql
+fi
+
+deactivate
+
+# 5. Create the launcher script
+echo ""
+echo "Step 5: creating the launcher script..."
+# Create the API service launcher
+cat > run_logistics_api.sh << 'EOF'
+#!/bin/bash
+# Start the logistics query API service
+
+cd "$(dirname "$0")"
+source venv/bin/activate
+
+# Start the API service
+python jd/fetch_logistics_ubuntu.py
+
+deactivate
+EOF
+
+chmod +x run_logistics_api.sh
+
+# 6. Done
+echo ""
+echo "=========================================="
+echo "✅ Environment setup complete!"
+echo "=========================================="
+echo ""
+echo "Quick start:"
+echo ""
+echo "Start the API service:"
+echo "  Option 1: use the launcher script"
+echo "    ./run_logistics_api.sh"
+echo ""
+echo "  Option 2: start manually"
+echo "    source venv/bin/activate"
+echo "    python jd/fetch_logistics_ubuntu.py"
+echo "    deactivate"
+echo ""
+echo "  API address: http://localhost:5001"
+echo "  Example queries:"
+echo "    curl 'http://localhost:5001/fetch_logistics?tracking_url=https://3.cn/2t-Iibig'"
+echo "    or"
+echo "    curl -X POST http://localhost:5001/fetch_logistics -H 'Content-Type: application/json' -d '{\"tracking_url\":\"https://3.cn/2t-Iibig\"}'"
+echo ""
+echo "Browser path: $CHROME_PATH"
+echo "Virtual environment: $(pwd)/venv"
+echo ""
+
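
Review note on `jd/setup_ubuntu.sh`: the browser lookup above lives only in the shell script, so the Python service still needs its own way to locate Chrome. The following is a rough Python sketch of the same search order, not code from this commit. `find_browser` is a hypothetical helper, and the extra names `google-chrome-stable` and `chromium` are assumptions added for illustration.

```python
import shutil
from pathlib import Path

# Command names searched on PATH, in order of preference.
CANDIDATE_NAMES = ("google-chrome", "google-chrome-stable", "chromium-browser", "chromium")
# Fixed locations checked when nothing is found on PATH.
CANDIDATE_PATHS = ("/usr/bin/google-chrome", "/usr/bin/chromium-browser")


def find_browser(names=CANDIDATE_NAMES, paths=CANDIDATE_PATHS):
    """Return the path of the first browser found, or None if none is installed."""
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    for path in paths:
        if Path(path).is_file():
            return path
    return None


if __name__ == "__main__":
    print("Browser path:", find_browser() or "not found")
```

The result could be passed to `ChromiumOptions().set_browser_path(...)`, the call already used elsewhere in this commit.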
--- a/jd/tb.py
+++ b/jd/tb.py
@@ -13,8 +13,7 @@ from sqlalchemy.orm import sessionmaker, declarative_base
 CHROME_PATH = r'C:\Program Files\Google\Chrome\Application\chrome.exe'

 # 固定商品详情页 URL
-TARGET_URL = "https://detail.tmall.com/item.htm?abbucket=1&id=735141569627&ltk2=1753093866331wbixx4bjhgx78xdlrpyxq&ns=1&priceTId=213e074d17530938630755244e1109&skuId=5667837161089&spm=a21n57.1.hoverItem.2&utparam=%7B%22aplus_abtest%22%3A%228c55408acbff553514850c28e821c3b4%22%7D&xxc=taobaoSearch"
-# MySQL 配置
+TARGET_URL = "https://detail.tmall.com/item.htm?abbucket=1&id=629109576049&mi_id=0000ug1x7t_mV0K12gYppRSVQ7NozSDtS3YwUTM7oCeMS5w&ns=1&skuId=5800648665359&spm=a21n57.1.hoverItem.1&utparam=%7B%22aplus_abtest%22%3A%2254df76059607f4cb191afc7c675e8349%22%7D&xxc=taobaoSearch"  # MySQL configuration (db_config) follows
 db_config = {
    "host": "192.168.8.88",
    "port": 3306,
@@ -96,7 +95,7 @@ def fetch_taobao_comments():
            return []

        # 开始监听指定请求
-        target_url = 'https://h5api.m.tmall.com/h5/mtop.taobao.rate.detaillist.get/6.0/?jsv=2.7.5'
+        target_url = 'https://h5api.m.tmall.com/h5/mtop.taobao.rate.detaillist.get/6.0/?jsv=2.7.4'
        page.listen.start(target_url)

        seen_ids = set()
@@ -157,6 +156,8 @@ def save_taobao_comments_to_db(comments):
            user_nick = comment.get('userNick', '匿名用户')
            pic_list = comment.get('feedPicPathList', [])
            comment_date = comment.get('feedbackDate', '')
+            # Take skuId from the comment data and use it as product_id
+            sku_id = comment.get('skuId', '')

            exists = session.query(TaobaoComment).filter_by(comment_id=comment_id).first()
            if exists:
@@ -166,7 +167,7 @@ def save_taobao_comments_to_db(comments):
            picture_urls = [url for url in pic_list if url.startswith('//')]

            new_comment = TaobaoComment(
-                product_id="735141569627",
+                product_id=sku_id,  # use skuId instead of the hardcoded product_id
                user_name=user_nick,
                comment_text=feedback,
                comment_id=comment_id,
@@ -174,7 +175,7 @@ def save_taobao_comments_to_db(comments):
                comment_date=comment_date
            )
            session.add(new_comment)
-            print(f"正在写入评论: {comment_id}")
+            print(f"Writing comment: {comment_id}, skuId: {sku_id}")
        session.commit()
    except Exception as e:
        session.rollback()
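
Review note on `jd/tb.py`: after this change `product_id` is empty whenever a comment has no `skuId`. One option, sketched here as an assumption rather than the author's intent, is to fall back to the `id` query parameter of the page URL. `resolve_product_id` is a hypothetical helper. Note that `skuId` identifies a variant while `id` identifies the item, so the two values are not interchangeable for every use.

```python
from urllib.parse import parse_qs, urlparse


def resolve_product_id(comment, page_url):
    """Prefer the comment's skuId; otherwise use the item id from the page URL."""
    sku_id = str(comment.get("skuId") or "")
    if sku_id:
        return sku_id
    # parse_qs maps each query key to a list of values.
    query = parse_qs(urlparse(page_url).query)
    return query.get("id", [""])[0]


if __name__ == "__main__":
    url = "https://detail.tmall.com/item.htm?abbucket=1&id=629109576049&ns=1"
    print(resolve_product_id({"skuId": "5800648665359"}, url))
    print(resolve_product_id({}, url))
```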
--- a/jd/test_browser.py
+++ b/jd/test_browser.py
@@ -0,0 +1,91 @@
+"""Check that the browser can start normally."""
+import time
+from DrissionPage import ChromiumPage, ChromiumOptions
+
+CHROME_PATH = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
+
+print("="*60)
+print("Browser startup test")
+print("="*60)
+
+# Check the Chrome path
+import os
+if not os.path.exists(CHROME_PATH):
+    print(f"❌ Error: Chrome browser not found")
+    print(f"Path: {CHROME_PATH}")
+    print("\nPlease check:")
+    print("1. Whether Chrome is installed")
+    print("2. Whether the Chrome installation path is correct")
+    print("3. If Chrome is installed elsewhere, update the CHROME_PATH variable")
+    raise SystemExit(1)
+else:
+    print(f"✅ Chrome path check passed: {CHROME_PATH}")
+
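+# Configure browser options
+print("\nConfiguring browser options...")
+options = ChromiumOptions()
+options.set_browser_path(CHROME_PATH)
+
+# Try to start the browser
+print("Starting the browser...")
+print("If the browser does not open, possible causes include:")
+print("1. Chrome is already in use by another program")
+print("2. The browser's remote debugging port is unavailable")
+print("3. A firewall or security software is blocking it")
+print("\nPlease wait while the browser starts...\n")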
+
+try:
+    page = ChromiumPage(options)
+    print("✅ Browser started successfully!")
+    
+    # Open a simple test page
+    print("\nOpening test page: https://www.baidu.com")
+    page.get('https://www.baidu.com')
+    time.sleep(3)
+    
+    # Check the page information
+    try:
+        print(f"Current URL: {page.url}")
+        print(f"Page title: {page.title}")
+        html_len = len(page.html)
+        print(f"Page HTML length: {html_len} characters")
+        
+        if html_len > 1000:
+            print("\n✅ Test passed! The browser works normally.")
+            print("\nYou can now run fetch_logistics.py.")
+        else:
+            print("\n⚠️ Warning: the page may not have finished loading")
+    except Exception as e:
+        print(f"\n⚠️ Error while reading page information: {e}")
+    
+    print("\nThe browser will stay open for 30 seconds; check whether you can see its window...")
+    print("If the window is visible, startup succeeded.")
+    time.sleep(30)
+    
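+    # Wait for confirmation before closing
+    print("\nTest complete. The browser stays open until you confirm.")
+    print("Press Enter to close it, or Ctrl+C to exit.")
+    
+    # Do not close automatically; let the user inspect the window
+    try:
+        input("\nPress Enter to close the browser and exit...")
+    except (EOFError, KeyboardInterrupt):
+        pass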
+    
+except Exception as e:
+    print(f"\n❌ Browser failed to start!")
+    print(f"Error message: {e}")
+    print("\nPossible fixes:")
+    print("1. Check that Chrome is installed correctly")
+    print("2. Close all Chrome windows and try again")
+    print("3. Check for permission problems")
+    print("4. Look for error logs")
+    import traceback
+    traceback.print_exc()
+finally:
+    try:
+        page.quit()
+        print("Browser closed")
+    except Exception:
+        pass
+