Merge Frontier, Update to Version 3.72 (#1553)

* Zhipu SDK update: adapt to the latest Zhipu SDK, support GLM-4V (#1502)
* Adapt Google Gemini: extract files from the user input
* Adapt to the latest Zhipu SDK, support glm-4v
* requirements.txt fix
* pending history check
---------
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
* Update "生成多种Mermaid图表" (Generate Multiple Mermaid Charts) plugin: separate out the file-reading function (#1520)
* Update crazy_functional.py with new functionality to handle PDF
* Update crazy_functional.py and Mermaid.py for plugin_kwargs
* Update crazy_functional.py with new chart type: mind map
* Update SELECT_PROMPT and i_say_show_user messages
* Update ArgsReminder message in get_crazy_functions() function
* Update with read md file and update PROMPTS
* Restore the original PROMPTS, since testing found the initial version worked best
* Update Mermaid chart generation function
* version 3.71
* Resolve issue #1510
* Remove unnecessary text from sys_prompt in 解析历史输入 function
* Remove sys_prompt message in 解析历史输入 function
* Update bridge_all.py: support gpt-4-turbo-preview (#1517)
* Update bridge_all.py: support gpt-4-turbo-preview
* Update bridge_all.py
---------
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
* Update config.py: support gpt-4-turbo-preview (#1516)
* Update config.py: support gpt-4-turbo-preview
* Update config.py
---------
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
* Refactor 解析历史输入 function to handle file input
* Update Mermaid chart generation functionality
* rename files and functions
---------
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
* Integrate Mathpix OCR (#1468)
* Update Latex输出PDF结果.py: use Mathpix to translate a PDF into Chinese and recompile it to PDF
* Update config.py: add Mathpix appid & appkey
* Add 'PDF翻译中文并重新编译PDF' feature to plugins.
---------
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
* fix zhipuai
* check picture
* remove glm-4 due to bug
* Modify config
* Check MATHPIX_APPID
* Remove unnecessary code and update function_plugins dictionary
* capture non-standard token overflow
* bug fix #1524
* change mermaid style
* Support Mermaid zoom in/out and reset via mouse scroll and drag (#1530)
* Support Mermaid zoom in/out and reset via mouse scroll and drag
* Fine-tuning unsuccessful; stage it for now
* update
---------
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
* ver 3.72
* change live2d
* save the status of `clear btn` in cookie
* Persist front-end selections
* js ui bug fix
* reset btn bug fix
* update live2d tips
* fix missing get_token_num method
* fix live2d toggle switch
* fix persistent custom btn with cookie
* fix zhipuai feedback with core functionality
* Refactor button update and clean up functions
---------
Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com>
Co-authored-by: Menghuan1918 <menghuan2003@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: Hao Ma <893017927@qq.com>
Co-authored-by: zeyuan huang <599012428@qq.com>
2024-02-14 18:35:09 +08:00
parent e0c5859cf9
commit 2e9b4a5770
42 changed files with 1171 additions and 9635 deletions
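Both translation plugins in the new `Latex输出PDF.py` let the user prefix the advanced argument with `--no-cache` to bypass the arXiv translation cache. Note that Python's `str.lstrip` removes any leading run of characters from the given set rather than a literal prefix, and it returns a new string instead of modifying the variable in place, so it is not a reliable way to drop a flag. A minimal sketch of splitting the flag off, assuming the flag must appear at the very start of the argument; `split_no_cache_flag` and `NO_CACHE_FLAG` are hypothetical names, not part of the project:

```python
from typing import Tuple

NO_CACHE_FLAG = "--no-cache"


def split_no_cache_flag(advanced_arg: str) -> Tuple[bool, str]:
    """Split an optional leading --no-cache flag off the advanced argument.

    Returns (allow_cache, remaining_requirement). Hypothetical helper that
    mirrors the intent of the plugin code; not part of the project.
    """
    if advanced_arg.startswith(NO_CACHE_FLAG):
        # Slice off the exact prefix. advanced_arg.lstrip("--no-cache") would
        # instead strip any run of the characters '-', 'n', 'o', 'c', 'a', 'h', 'e'.
        return False, advanced_arg[len(NO_CACHE_FLAG):].lstrip()
    return True, advanced_arg


# Example: the remaining text becomes the extra prompt requirement.
allow_cache, more_req = split_no_cache_flag("--no-cache Keep the tone formal.")
print(allow_cache, repr(more_req))  # False 'Keep the tone formal.'
```

On Python 3.9+, `advanced_arg.removeprefix(NO_CACHE_FLAG)` is an equivalent alternative to the slice.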
--- a/crazy_functions/Latex输出PDF.py
+++ b/crazy_functions/Latex输出PDF.py
@@ -0,0 +1,484 @@
+from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone
+from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip_result, gen_time_str
+from functools import partial
+import glob, os, requests, time, json, tarfile
+
+pj = os.path.join
+ARXIV_CACHE_DIR = os.path.expanduser("~/arxiv_cache/")
+
+
+# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Utility functions =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
+# Terminology declaration = 'If the term "agent" is used in this section, it should be translated to "智能体". '
+def switch_prompt(pfg, mode, more_requirement):
+    """
+    Generate prompts and system prompts based on the mode for proofreading or translating.
+    Args:
+    - pfg: Proofreader or Translator instance.
+    - mode: A string specifying the mode, either 'proofread_en' or 'translate_zh'.
+    - more_requirement: Extra user instructions inserted into every prompt (may be empty).
+    Returns:
+    - inputs_array: A list of user prompts, one per text fragment in pfg.sp_file_contents.
+    - sys_prompt_array: A list of system prompts, one per text fragment.
+    """
+    n_split = len(pfg.sp_file_contents)
+    if mode == 'proofread_en':
+        inputs_array = [r"Below is a section from an academic paper, proofread this section. " +
+                        r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
+                        r"Answer me only with the revised text:" +
+                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
+        sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
+    elif mode == 'translate_zh':
+        inputs_array = [
+            r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement +
+            r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
+            r"Answer me only with the translated text:" +
+            f"\n\n{frag}" for frag in pfg.sp_file_contents]
+        sys_prompt_array = ["You are a professional translator." for _ in range(n_split)]
+    else:
+        raise ValueError(f"Unknown mode: {mode}")
+    return inputs_array, sys_prompt_array
+
+
+def desend_to_extracted_folder_if_exist(project_folder):
+    """ 
+    Descend into the extracted folder if it exists, otherwise return the original folder.
+
+    Args:
+    - project_folder: A string specifying the folder path.
+
+    Returns:
+    - A string specifying the path to the extracted folder, or the original folder if there is no extracted folder.
+    """
+    maybe_dir = [f for f in glob.glob(f'{project_folder}/*') if os.path.isdir(f)]
+    if len(maybe_dir) == 0: return project_folder
+    if maybe_dir[0].endswith('.extract'): return maybe_dir[0]
+    return project_folder
+
+
+def move_project(project_folder, arxiv_id=None):
+    """
+    Create a new work folder and copy the project folder to it.
+
+    Args:
+    - project_folder: A string specifying the folder path of the project.
+    - arxiv_id: Optional arXiv ID. If given, the work folder is ARXIV_CACHE_DIR/<arxiv_id>/workfolder;
+      otherwise it is a timestamped folder under the log folder.
+
+    Returns:
+    - A string specifying the path to the new work folder.
+    """
+    import shutil
+    time.sleep(2)  # avoid time string conflict
+    if arxiv_id is not None:
+        new_workfolder = pj(ARXIV_CACHE_DIR, arxiv_id, 'workfolder')
+    else:
+        new_workfolder = f'{get_log_folder()}/{gen_time_str()}'
+    # remove a stale work folder left by a previous run, if any
+    shutil.rmtree(new_workfolder, ignore_errors=True)
+
+    # align subfolder if there is a folder wrapper
+    items = glob.glob(pj(project_folder, '*'))
+    items = [item for item in items if os.path.basename(item) != '__MACOSX']
+    if len(glob.glob(pj(project_folder, '*.tex'))) == 0 and len(items) == 1:
+        if os.path.isdir(items[0]): project_folder = items[0]
+
+    shutil.copytree(src=project_folder, dst=new_workfolder)
+    return new_workfolder
+
+
+def arxiv_download(chatbot, history, txt, allow_cache=True):
+    def check_cached_translation_pdf(arxiv_id):
+        translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'translation')
+        if not os.path.exists(translation_dir):
+            os.makedirs(translation_dir)
+        target_file = pj(translation_dir, 'translate_zh.pdf')
+        if os.path.exists(target_file):
+            promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
+            target_file_compare = pj(translation_dir, 'comparison.pdf')
+            if os.path.exists(target_file_compare):
+                promote_file_to_downloadzone(target_file_compare, rename_file=None, chatbot=chatbot)
+            return target_file
+        return False
+
+    def is_float(s):
+        try:
+            float(s)
+            return True
+        except ValueError:
+            return False
+
+    if ('.' in txt) and ('/' not in txt) and is_float(txt):  # is arxiv ID
+        txt = 'https://arxiv.org/abs/' + txt.strip()
+    if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]):  # is arxiv ID
+        txt = 'https://arxiv.org/abs/' + txt[:10]
+
+    if not txt.startswith('https://arxiv.org'):
+        return txt, None    # local file: skip the download
+
+    # <-------------- inspect format ------------->
+    chatbot.append([f"检测到arxiv文档连接", '尝试下载 ...'])
+    yield from update_ui(chatbot=chatbot, history=history)
+    time.sleep(1)  # give the UI a moment to refresh
+
+    url_ = txt  # https://arxiv.org/abs/1707.06690
+    if not txt.startswith('https://arxiv.org/abs/'):
+        msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}。"
+        yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history)  # refresh UI
+        return msg, None
+    # <-------------- set format ------------->
+    arxiv_id = url_.split('/abs/')[-1]
+    if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
+    cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
+    if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
+
+    url_tar = url_.replace('/abs/', '/e-print/')
+    translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
+    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
+    os.makedirs(translation_dir, exist_ok=True)
+
+    # <-------------- download arxiv source file ------------->
+    dst = pj(translation_dir, arxiv_id + '.tar')
+    if os.path.exists(dst):
+        yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history)  # refresh UI ("using cached download")
+    else:
+        yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history)  # refresh UI ("starting download")
+        proxies = get_conf('proxies')
+        r = requests.get(url_tar, proxies=proxies)
+        with open(dst, 'wb+') as f:
+            f.write(r.content)
+    # <-------------- extract file ------------->
+    yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history)  # refresh UI ("download complete")
+    from toolbox import extract_archive
+    extract_archive(file_path=dst, dest_dir=extract_dst)
+    return extract_dst, arxiv_id
+
+
+def pdf2tex_project(pdf_file_path):
+    # Mathpix API credentials
+    app_id, app_key = get_conf('MATHPIX_APPID', 'MATHPIX_APPKEY')
+    headers = {"app_id": app_id, "app_key": app_key}
+
+    # Step 1: Send PDF file for processing
+    options = {
+        "conversion_formats": {"tex.zip": True},
+        "math_inline_delimiters": ["$", "$"],
+        "rm_spaces": True
+    }
+
+    with open(pdf_file_path, "rb") as pdf_file:  # close the file handle after the upload
+        response = requests.post(url="https://api.mathpix.com/v3/pdf", headers=headers,
+                                 data={"options_json": json.dumps(options)},
+                                 files={"file": pdf_file})
+
+    if response.ok:
+        pdf_id = response.json()["pdf_id"]
+        print(f"PDF processing initiated. PDF ID: {pdf_id}")
+
+        # Step 2: Check processing status
+        while True:
+            conversion_response = requests.get(f"https://api.mathpix.com/v3/pdf/{pdf_id}", headers=headers)
+            conversion_data = conversion_response.json()
+
+            if conversion_data["status"] == "completed":
+                print("PDF processing completed.")
+                break
+            elif conversion_data["status"] == "error":
+                print("Error occurred during processing.")
+                return None  # terminal failure: stop polling instead of looping forever
+            print(f"Processing status: {conversion_data['status']}")
+            time.sleep(5)  # wait a few seconds before checking again
+
+        # Step 3: Save results to local files
+        output_dir = os.path.join(os.path.dirname(pdf_file_path), 'mathpix_output')
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+
+        url = f"https://api.mathpix.com/v3/pdf/{pdf_id}.tex"
+        response = requests.get(url, headers=headers)
+        file_name_wo_dot = '_'.join(os.path.basename(pdf_file_path).split('.')[:-1])
+        output_name = f"{file_name_wo_dot}.tex.zip"
+        output_path = os.path.join(output_dir, output_name)
+        with open(output_path, "wb") as output_file:
+            output_file.write(response.content)
+        print(f"tex.zip file saved at: {output_path}")
+
+        import zipfile
+        unzip_dir = os.path.join(output_dir, file_name_wo_dot)
+        with zipfile.ZipFile(output_path, 'r') as zip_ref:
+            zip_ref.extractall(unzip_dir)
+
+        return unzip_dir
+
+    else:
+        print(f"Error sending PDF for processing. Status code: {response.status_code}")
+        return None
+
+
+# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序1 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=    
+
+
+@CatchException
+def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
+    # <-------------- information about this plugin ------------->
+    chatbot.append(["函数插件功能？",
+                    "对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4，其他模型转化效果未知。目前对机器学习类文献转化效果最好，其他类型文献转化效果未知。仅在Windows系统进行了测试，其他操作系统表现未知。"])
+    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+
+    # <-------------- more requirements ------------->
+    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+    more_req = plugin_kwargs.get("advanced_arg", "")
+    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
+
+    # <-------------- check deps ------------->
+    try:
+        import glob, os, time, subprocess
+        subprocess.Popen(['pdflatex', '-version'])
+        from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
+    except Exception as e:
+        chatbot.append([f"解析项目: {txt}",
+                        f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- clear history and read input ------------->
+    history = []
+    if os.path.exists(txt):
+        project_folder = txt
+    else:
+        if txt == "": txt = '空空如也的输入栏'
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
+    if len(file_manifest) == 0:
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- if is a zip/tar file ------------->
+    project_folder = desend_to_extracted_folder_if_exist(project_folder)
+
+    # <-------------- move latex project away from temp folder ------------->
+    project_folder = move_project(project_folder, arxiv_id=None)
+
+    # <-------------- if merge_proofread_en is already generated, skip gpt req ------------->
+    if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
+        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
+                                       chatbot, history, system_prompt, mode='proofread_en',
+                                       switch_prompt=_switch_prompt_)
+
+    # <-------------- compile PDF ------------->
+    success = yield from 编译Latex(chatbot, history, main_file_original='merge',
+                                   main_file_modified='merge_proofread_en',
+                                   work_folder_original=project_folder, work_folder_modified=project_folder,
+                                   work_folder=project_folder)
+
+    # <-------------- zip PDF ------------->
+    zip_res = zip_result(project_folder)
+    if success:
+        chatbot.append((f"成功啦", '请查收结果（压缩包）...'))
+        yield from update_ui(chatbot=chatbot, history=history)
+        time.sleep(1)  # give the UI a moment to refresh
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+    else:
+        chatbot.append((f"失败了",
+                        '虽然PDF生成失败了, 但请查收结果（压缩包）, 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
+        yield from update_ui(chatbot=chatbot, history=history)
+        time.sleep(1)  # give the UI a moment to refresh
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+
+    # <-------------- we are done ------------->
+    return success
+
+
+# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序2 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=    
+
+@CatchException
+def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
+    # <-------------- information about this plugin ------------->
+    chatbot.append([
+        "函数插件功能？",
+        "对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳，Linux下必须使用Docker安装，详见项目主README.md。目前仅支持GPT3.5/GPT4，其他模型转化效果未知。目前对机器学习类文献转化效果最好，其他类型文献转化效果未知。"])
+    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+
+    # <-------------- more requirements ------------->
+    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+    more_req = plugin_kwargs.get("advanced_arg", "")
+    no_cache = more_req.startswith("--no-cache")
+    if no_cache: more_req = more_req[len("--no-cache"):].lstrip()  # drop the exact flag prefix
+    allow_cache = not no_cache
+    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
+
+    # <-------------- check deps ------------->
+    try:
+        import glob, os, time, subprocess
+        subprocess.Popen(['pdflatex', '-version'])
+        from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
+    except Exception as e:
+        chatbot.append([f"解析项目: {txt}",
+                        f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- clear history and read input ------------->
+    history = []
+    try:
+        txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache)
+    except tarfile.ReadError as e:
+        yield from update_ui_lastest_msg(
+            "无法自动下载该论文的Latex源码，请前往arxiv打开此论文下载页面，点other Formats，然后download source手动下载latex源码包。接下来调用本地Latex翻译插件即可。", 
+            chatbot=chatbot, history=history)
+        return
+
+    if txt.endswith('.pdf'):
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"发现已经存在翻译好的PDF文档")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    if os.path.exists(txt):
+        project_folder = txt
+    else:
+        if txt == "": txt = '空空如也的输入栏'
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无法处理: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
+    if len(file_manifest) == 0:
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- if is a zip/tar file ------------->
+    project_folder = desend_to_extracted_folder_if_exist(project_folder)
+
+    # <-------------- move latex project away from temp folder ------------->
+    project_folder = move_project(project_folder, arxiv_id)
+
+    # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
+    if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
+        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
+                                       chatbot, history, system_prompt, mode='translate_zh',
+                                       switch_prompt=_switch_prompt_)
+
+    # <-------------- compile PDF ------------->
+    success = yield from 编译Latex(chatbot, history, main_file_original='merge',
+                                   main_file_modified='merge_translate_zh', mode='translate_zh',
+                                   work_folder_original=project_folder, work_folder_modified=project_folder,
+                                   work_folder=project_folder)
+
+    # <-------------- zip PDF ------------->
+    zip_res = zip_result(project_folder)
+    if success:
+        chatbot.append((f"成功啦", '请查收结果（压缩包）...'))
+        yield from update_ui(chatbot=chatbot, history=history)
+        time.sleep(1)  # give the UI a moment to refresh
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+    else:
+        chatbot.append((f"失败了",
+                        '虽然PDF生成失败了, 但请查收结果（压缩包）, 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux，请检查系统字体（见Github wiki） ...'))
+        yield from update_ui(chatbot=chatbot, history=history)
+        time.sleep(1)  # give the UI a moment to refresh
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+
+    # <-------------- we are done ------------->
+    return success
+
+
+#  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 插件主程序3  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=   
+
+@CatchException
+def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
+    # <-------------- information about this plugin ------------->
+    chatbot.append([
+        "函数插件功能？",
+        "将PDF转换为Latex项目，翻译为中文后重新编译为PDF。函数插件贡献者: Marroh。注意事项: 此插件Windows支持最佳，Linux下必须使用Docker安装，详见项目主README.md。目前仅支持GPT3.5/GPT4，其他模型转化效果未知。目前对机器学习类文献转化效果最好，其他类型文献转化效果未知。"])
+    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+
+    # <-------------- more requirements ------------->
+    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+    more_req = plugin_kwargs.get("advanced_arg", "")
+    no_cache = more_req.startswith("--no-cache")
+    if no_cache: more_req = more_req[len("--no-cache"):].lstrip()  # drop the exact flag prefix
+    allow_cache = not no_cache
+    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
+
+    # <-------------- check deps ------------->
+    try:
+        import glob, os, time, subprocess
+        subprocess.Popen(['pdflatex', '-version'])
+        from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
+    except Exception as e:
+        chatbot.append([f"解析项目: {txt}",
+                        f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- clear history and read input ------------->
+    if os.path.exists(txt):
+        project_folder = txt
+    else:
+        if txt == "": txt = '空空如也的输入栏'
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无法处理: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.pdf', recursive=True)]
+    if len(file_manifest) == 0:
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.pdf文件: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+    if len(file_manifest) != 1:
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"不支持同时处理多个pdf文件: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+    app_id, app_key = get_conf('MATHPIX_APPID', 'MATHPIX_APPKEY')
+    if len(app_id) == 0 or len(app_key) == 0:
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"请配置 MATHPIX_APPID 和 MATHPIX_APPKEY")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- convert pdf into tex ------------->
+    project_folder = pdf2tex_project(file_manifest[0])
+
+    # Translate English Latex to Chinese Latex, and compile it
+    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
+    if len(file_manifest) == 0:
+        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
+        return
+
+    # <-------------- if is a zip/tar file ------------->
+    project_folder = desend_to_extracted_folder_if_exist(project_folder)
+
+    # <-------------- move latex project away from temp folder ------------->
+    project_folder = move_project(project_folder)
+
+    # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
+    if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
+        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
+                                       chatbot, history, system_prompt, mode='translate_zh',
+                                       switch_prompt=_switch_prompt_)
+
+    # <-------------- compile PDF ------------->
+    success = yield from 编译Latex(chatbot, history, main_file_original='merge',
+                                   main_file_modified='merge_translate_zh', mode='translate_zh',
+                                   work_folder_original=project_folder, work_folder_modified=project_folder,
+                                   work_folder=project_folder)
+
+    # <-------------- zip PDF ------------->
+    zip_res = zip_result(project_folder)
+    if success:
+        chatbot.append((f"成功啦", '请查收结果（压缩包）...'))
+        yield from update_ui(chatbot=chatbot, history=history)
+        time.sleep(1)  # give the UI a moment to refresh
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+    else:
+        chatbot.append((f"失败了",
+                        '虽然PDF生成失败了, 但请查收结果（压缩包）, 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux，请检查系统字体（见Github wiki） ...'))
+        yield from update_ui(chatbot=chatbot, history=history)
+        time.sleep(1)  # give the UI a moment to refresh
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+
+    # <-------------- we are done ------------->
+    return success
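`pdf2tex_project` uploads the PDF to Mathpix and then polls `https://api.mathpix.com/v3/pdf/{pdf_id}` until the reported status is `completed`. A loop like this needs a terminal exit for the `error` status and, ideally, an overall timeout so a stuck job cannot hang the plugin. A minimal sketch, assuming only the `completed` and `error` status values that appear in the diff; `poll_until_done` and its parameters are hypothetical, and `get_status` stands in for the HTTP request:

```python
import time
from typing import Callable, Optional


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll get_status() until it reports a terminal state.

    Returns "completed" on success, or None if the job reports "error"
    or does not finish within timeout_s seconds. Hypothetical helper.
    """
    waited = 0.0
    while waited <= timeout_s:
        status = get_status()
        if status == "completed":
            return status
        if status == "error":
            # Terminal failure: stop instead of polling forever.
            return None
        sleep(interval_s)  # still processing: wait before the next check
        waited += interval_s
    return None  # timed out


# Example with a fake status source and a no-op sleep, so nothing waits.
fake_statuses = iter(["processing", "processing", "completed"])
print(poll_until_done(lambda: next(fake_statuses), sleep=lambda s: None))  # completed
```

In the plugin, `get_status` could wrap `requests.get(f"https://api.mathpix.com/v3/pdf/{pdf_id}", headers=headers).json()["status"]`, and a `None` result maps onto the function's existing `return None` failure path. Injecting `sleep` keeps the loop testable without real waiting.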
--- a/crazy_functions/Latex输出PDF结果.py
+++ b/crazy_functions/Latex输出PDF结果.py
@@ -1,313 +0,0 @@
-from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone
-from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip_result, gen_time_str
-from functools import partial
-import glob, os, requests, time, tarfile
-pj = os.path.join
-ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")
-
-# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 工具函数 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-# 专业词汇声明  = 'If the term "agent" is used in this section, it should be translated to "智能体". '
-def switch_prompt(pfg, mode, more_requirement):
-    """
-    Generate prompts and system prompts based on the mode for proofreading or translating.
-    Args:
-    - pfg: Proofreader or Translator instance.
-    - mode: A string specifying the mode, either 'proofread' or 'translate_zh'.
-
-    Returns:
-    - inputs_array: A list of strings containing prompts for users to respond to.
-    - sys_prompt_array: A list of strings containing prompts for system prompts.
-    """
-    n_split = len(pfg.sp_file_contents)
-    if mode == 'proofread_en':
-        inputs_array = [r"Below is a section from an academic paper, proofread this section." + 
-                        r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
-                        r"Answer me only with the revised text:" + 
-                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
-        sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
-    elif mode == 'translate_zh':
-        inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement + 
-                        r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + 
-                        r"Answer me only with the translated text:" + 
-                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
-        sys_prompt_array = ["You are a professional translator." for _ in range(n_split)]
-    else:
-        assert False, "未知指令"
-    return inputs_array, sys_prompt_array
-
-def desend_to_extracted_folder_if_exist(project_folder):
-    """ 
-    Descend into the extracted folder if it exists, otherwise return the original folder.
-
-    Args:
-    - project_folder: A string specifying the folder path.
-
-    Returns:
-    - A string specifying the path to the extracted folder, or the original folder if there is no extracted folder.
-    """
-    maybe_dir = [f for f in glob.glob(f'{project_folder}/*') if os.path.isdir(f)]
-    if len(maybe_dir) == 0: return project_folder
-    if maybe_dir[0].endswith('.extract'): return maybe_dir[0]
-    return project_folder
-
-def move_project(project_folder, arxiv_id=None):
-    """ 
-    Create a new work folder and copy the project folder to it.
-
-    Args:
-    - project_folder: A string specifying the folder path of the project.
-
-    Returns:
-    - A string specifying the path to the new work folder.
-    """
-    import shutil, time
-    time.sleep(2)   # avoid time string conflict
-    if arxiv_id is not None:
-        new_workfolder = pj(ARXIV_CACHE_DIR, arxiv_id, 'workfolder')
-    else:
-        new_workfolder = f'{get_log_folder()}/{gen_time_str()}'
-    try:
-        shutil.rmtree(new_workfolder)
-    except:
-        pass
-
-    # align subfolder if there is a folder wrapper
-    items = glob.glob(pj(project_folder,'*'))
-    items = [item for item in items if os.path.basename(item)!='__MACOSX']
-    if len(glob.glob(pj(project_folder,'*.tex'))) == 0 and len(items) == 1:
-        if os.path.isdir(items[0]): project_folder = items[0]
-
-    shutil.copytree(src=project_folder, dst=new_workfolder)
-    return new_workfolder
-
-def arxiv_download(chatbot, history, txt, allow_cache=True):
-    def check_cached_translation_pdf(arxiv_id):
-        translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'translation')
-        if not os.path.exists(translation_dir):
-            os.makedirs(translation_dir)
-        target_file = pj(translation_dir, 'translate_zh.pdf')
-        if os.path.exists(target_file):
-            promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
-            target_file_compare = pj(translation_dir, 'comparison.pdf')
-            if os.path.exists(target_file_compare):
-                promote_file_to_downloadzone(target_file_compare, rename_file=None, chatbot=chatbot)
-            return target_file
-        return False
-    def is_float(s):
-        try:
-            float(s)
-            return True
-        except ValueError:
-            return False
-    if ('.' in txt) and ('/' not in txt) and is_float(txt): # is arxiv ID
-        txt = 'https://arxiv.org/abs/' + txt.strip()
-    if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]): # is arxiv ID
-        txt = 'https://arxiv.org/abs/' + txt[:10]
-    if not txt.startswith('https://arxiv.org'): 
-        return txt, None    # 是本地文件，跳过下载
-    
-    # <-------------- inspect format ------------->
-    chatbot.append([f"检测到arxiv文档连接", '尝试下载 ...']) 
-    yield from update_ui(chatbot=chatbot, history=history)
-    time.sleep(1) # 刷新界面
-
-    url_ = txt   # https://arxiv.org/abs/1707.06690
-    if not txt.startswith('https://arxiv.org/abs/'): 
-        msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}。"
-        yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # 刷新界面
-        return msg, None
-    # <-------------- set format ------------->
-    arxiv_id = url_.split('/abs/')[-1]
-    if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
-    cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
-    if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
-
-    url_tar = url_.replace('/abs/', '/e-print/')
-    translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
-    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
-    os.makedirs(translation_dir, exist_ok=True)
-    
-    # <-------------- download arxiv source file ------------->
-    dst = pj(translation_dir, arxiv_id+'.tar')
-    if os.path.exists(dst):
-        yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history)  # 刷新界面
-    else:
-        yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history)  # 刷新界面
-        proxies = get_conf('proxies')
-        r = requests.get(url_tar, proxies=proxies)
-        with open(dst, 'wb+') as f:
-            f.write(r.content)
-    # <-------------- extract file ------------->
-    yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history)  # 刷新界面
-    from toolbox import extract_archive
-    extract_archive(file_path=dst, dest_dir=extract_dst)
-    return extract_dst, arxiv_id
-# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序1 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=    
-
-
-@CatchException
-def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
-    # <-------------- information about this plugin ------------->
-    chatbot.append([ "函数插件功能？",
-        "对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4，其他模型转化效果未知。目前对机器学习类文献转化效果最好，其他类型文献转化效果未知。仅在Windows系统进行了测试，其他操作系统表现未知。"])
-    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-    
-    # <-------------- more requirements ------------->
-    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
-    more_req = plugin_kwargs.get("advanced_arg", "")
-    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
-
-    # <-------------- check deps ------------->
-    try:
-        import glob, os, time, subprocess
-        subprocess.Popen(['pdflatex', '-version'])
-        from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
-    except Exception as e:
-        chatbot.append([ f"解析项目: {txt}",
-            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    
-
-    # <-------------- clear history and read input ------------->
-    history = []
-    if os.path.exists(txt):
-        project_folder = txt
-    else:
-        if txt == "": txt = '空空如也的输入栏'
-        report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
-    if len(file_manifest) == 0:
-        report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    
-
-    # <-------------- if is a zip/tar file ------------->
-    project_folder = desend_to_extracted_folder_if_exist(project_folder)
-
-
-    # <-------------- move latex project away from temp folder ------------->
-    project_folder = move_project(project_folder, arxiv_id=None)
-
-
-    # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
-    if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
-        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, 
-                                chatbot, history, system_prompt, mode='proofread_en', switch_prompt=_switch_prompt_)
-
-
-    # <-------------- compile PDF ------------->
-    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread_en', 
-                             work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
-    
-
-    # <-------------- zip PDF ------------->
-    zip_res = zip_result(project_folder)
-    if success:
-        chatbot.append((f"成功啦", '请查收结果（压缩包）...'))
-        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
-        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
-    else:
-        chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果（压缩包）, 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
-        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
-        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
-
-    # <-------------- we are done ------------->
-    return success
-
-# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序2 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=    
-
-@CatchException
-def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
-    # <-------------- information about this plugin ------------->
-    chatbot.append([
-        "函数插件功能？",
-        "对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳，Linux下必须使用Docker安装，详见项目主README.md。目前仅支持GPT3.5/GPT4，其他模型转化效果未知。目前对机器学习类文献转化效果最好，其他类型文献转化效果未知。"])
-    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-
-    # <-------------- more requirements ------------->
-    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
-    more_req = plugin_kwargs.get("advanced_arg", "")
-    no_cache = more_req.startswith("--no-cache")
-    if no_cache: more_req.lstrip("--no-cache")
-    allow_cache = not no_cache
-    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
-
-    # <-------------- check deps ------------->
-    try:
-        import glob, os, time, subprocess
-        subprocess.Popen(['pdflatex', '-version'])
-        from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
-    except Exception as e:
-        chatbot.append([ f"解析项目: {txt}",
-            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    
-
-    # <-------------- clear history and read input ------------->
-    history = []
-    try:
-        txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache)
-    except tarfile.ReadError as e:
-        yield from update_ui_lastest_msg(
-            "无法自动下载该论文的Latex源码，请前往arxiv打开此论文下载页面，点other Formats，然后download source手动下载latex源码包。接下来调用本地Latex翻译插件即可。", 
-            chatbot=chatbot, history=history)
-        return
-
-    if txt.endswith('.pdf'):
-        report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"发现已经存在翻译好的PDF文档")
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    
-
-    if os.path.exists(txt):
-        project_folder = txt
-    else:
-        if txt == "": txt = '空空如也的输入栏'
-        report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无法处理: {txt}")
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    
-    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
-    if len(file_manifest) == 0:
-        report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
-    
-
-    # <-------------- if is a zip/tar file ------------->
-    project_folder = desend_to_extracted_folder_if_exist(project_folder)
-
-
-    # <-------------- move latex project away from temp folder ------------->
-    project_folder = move_project(project_folder, arxiv_id)
-
-
-    # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
-    if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
-        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, 
-                                chatbot, history, system_prompt, mode='translate_zh', switch_prompt=_switch_prompt_)
-
-
-    # <-------------- compile PDF ------------->
-    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_translate_zh', mode='translate_zh', 
-                             work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
-
-    # <-------------- zip PDF ------------->
-    zip_res = zip_result(project_folder)
-    if success:
-        chatbot.append((f"成功啦", '请查收结果（压缩包）...'))
-        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
-        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
-    else:
-        chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果（压缩包）, 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux，请检查系统字体（见Github wiki） ...'))
-        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
-        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
-
-
-    # <-------------- we are done ------------->
-    return success
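Two steps in the removed `arxiv_download` / `Latex翻译中文并重新编译PDF` code are easy to get subtly wrong.

- **The `--no-cache` flag.** `str.lstrip("--no-cache")` strips any leading characters from the set `{-, n, o, c, a, h, e}` rather than the literal prefix. Its return value is also discarded on that line, so `more_req` keeps the flag.
- **arXiv IDs.** The original detects a bare ID with `is_float` on the first 10 characters.

Below is a minimal sketch of both steps. `parse_no_cache_flag`, `normalize_arxiv_input`, `NO_CACHE_FLAG` and `ARXIV_ID_RE` are hypothetical names. The regex is a swapped-in technique, not the original `is_float` check. It assumes only new-style identifiers need recognizing: `YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by `vN`.

```python
import re
from typing import Optional, Tuple

NO_CACHE_FLAG = "--no-cache"  # hypothetical constant holding the flag text


def parse_no_cache_flag(more_req: str) -> Tuple[str, bool]:
    """Split a leading --no-cache flag off the advanced argument.

    Returns (remaining_requirement, allow_cache).
    """
    no_cache = more_req.startswith(NO_CACHE_FLAG)
    if no_cache:
        # Slice off the exact prefix (str.removeprefix does the same on Python 3.9+);
        # str.lstrip would strip a set of characters instead.
        more_req = more_req[len(NO_CACHE_FLAG):].strip()
    return more_req, not no_cache


# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version such as v2.
ARXIV_ID_RE = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def normalize_arxiv_input(txt: str) -> Optional[str]:
    """Return an https://arxiv.org/abs/<id> URL for a bare arXiv ID, else None."""
    match = ARXIV_ID_RE.match(txt.strip())
    if match is None:
        return None  # not a bare ID, e.g. a local path; the caller handles it
    # Drop the version suffix, similar in intent to the arxiv_id[:10] truncation.
    return "https://arxiv.org/abs/" + match.group(1)
```

Returning `allow_cache` together with the cleaned string means the caller never re-checks the raw flag text.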
--- a/crazy_functions/pdf_fns/parse_word.py
+++ b/crazy_functions/pdf_fns/parse_word.py
@@ -0,0 +1,85 @@
+from crazy_functions.crazy_utils import read_and_clean_pdf_text, get_files_from_everything
+import os
+import re
+def extract_text_from_files(txt, chatbot, history):
+    """
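+    Find PDF/Markdown/Word files at the path given in txt, extract their text, and return it with a status flag.
+
+    Args:
+        chatbot: chatbot inputs and outputs (handle of the UI chat window, used to visualize the data flow)
+        history (list): List of chat history
+
+    Returns:
+        file_exist (bool): whether any supported file was found and read
+        final_result (list): text of each file (or just the input text when it names no supported file)
+        page_one (list): first page / summary of each file
+        file_manifest (list): file paths, relative to the folder they were found in
+        excption (str): error code the user must resolve manually ("word", "pdf" or "word_pip"); empty if none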
+    """
+
+    final_result = []
+    page_one = []
+    file_manifest = []
+    excption = ""
+
+    if txt == "": 
+        final_result.append(txt)
+        return False, final_result, page_one, file_manifest, excption   # input is not a file: return the input text as-is
+
+    # look for supported files at the path given in the input area
+    file_pdf,pdf_manifest,folder_pdf = get_files_from_everything(txt, '.pdf')
+    file_md,md_manifest,folder_md = get_files_from_everything(txt, '.md')
+    file_word,word_manifest,folder_word = get_files_from_everything(txt, '.docx')
+    file_doc,doc_manifest,folder_doc = get_files_from_everything(txt, '.doc')
+
+    if file_doc:
+        excption = "word"
+        return False, final_result, page_one, file_manifest, excption
+    
+    file_num = len(pdf_manifest) + len(md_manifest) + len(word_manifest)
+    if file_num == 0:
+        final_result.append(txt)
+        return False, final_result, page_one, file_manifest, excption   # no supported file found: return the input text as-is
+    
+    if file_pdf:
+        try:    # import the dependency; if it is missing, report it so the caller can show an install hint
+            import fitz
+        except ImportError:
+            excption = "pdf"
+            return False, final_result, page_one, file_manifest, excption
+        for index, fp in enumerate(pdf_manifest):
+            file_content, pdf_one = read_and_clean_pdf_text(fp) # (try to) split the PDF into sections
+            file_content = file_content.encode('utf-8', 'ignore').decode()   # avoid reading non-utf8 chars
+            pdf_one = str(pdf_one).encode('utf-8', 'ignore').decode()  # avoid reading non-utf8 chars
+            final_result.append(file_content)
+            page_one.append(pdf_one)
+            file_manifest.append(os.path.relpath(fp, folder_pdf))
+
+    if file_md:
+        for index, fp in enumerate(md_manifest):
+            with open(fp, 'r', encoding='utf-8', errors='replace') as f:
+                file_content = f.read()
+            file_content = file_content.encode('utf-8', 'ignore').decode()
+            headers = re.findall(r'^#{1,2}[ \t](.*)$', file_content, re.MULTILINE)  # use the level-1/level-2 headings as the summary
+            if len(headers) > 0:
+                page_one.append("\n".join(headers)) # join all headings, separated by newlines
+            else:
+                page_one.append("")
+            final_result.append(file_content)
+            file_manifest.append(os.path.relpath(fp, folder_md))
+
+    if file_word:
+        try:    # import the dependency; if it is missing, report it so the caller can show an install hint
+            from docx import Document
+        except ImportError:
+            excption = "word_pip"
+            return False, final_result, page_one, file_manifest, excption
+        for index, fp in enumerate(word_manifest):
+            doc = Document(fp)
+            file_content = '\n'.join([p.text for p in doc.paragraphs])
+            file_content = file_content.encode('utf-8', 'ignore').decode()
+            page_one.append(file_content[:200])
+            final_result.append(file_content)
+            file_manifest.append(os.path.relpath(fp, folder_word))
+            
+    return True, final_result, page_one, file_manifest, excption
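The Markdown branch of `extract_text_from_files` uses the headings of each file as that file's summary (`page_one`). Below is a standalone sketch of that step. It assumes both level-1 and level-2 headings (`#` and `##`) should be kept, as the inline comment describes. `HEADING_RE` and `markdown_summary` are hypothetical names.

```python
import re

# Level-1 ("# ") and level-2 ("## ") ATX headings at the start of a line.
# [ \t] instead of \s so that a bare "#" line cannot match across a newline.
HEADING_RE = re.compile(r"^#{1,2}[ \t]+(.+)$", re.MULTILINE)


def markdown_summary(markdown_text: str) -> str:
    """Join level-1/level-2 headings with newlines; return '' when there are none."""
    headings = [h.strip() for h in HEADING_RE.findall(markdown_text) if h.strip()]
    return "\n".join(headings)
```

Like the regex in the plugin, this sketch does not skip `#` lines inside fenced code blocks. A shell comment in an example snippet, for instance, would be picked up as a heading.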
--- a/crazy_functions/生成多种Mermaid图表.py
+++ b/crazy_functions/生成多种Mermaid图表.py
@@ -1,6 +1,5 @@
 from toolbox import CatchException, update_ui, report_exception
 from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
-from .crazy_utils import read_and_clean_pdf_text
 import datetime

 #以下是每类图表的PROMPT
@@ -162,7 +161,7 @@ mindmap
 ```
 """

-def 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs):
+def 解析历史输入(history,llm_kwargs,file_manifest,chatbot,plugin_kwargs):
    ############################## <第 0 步，切割输入> ##################################
    # 借用PDF切割中的函数对文本进行切割
    TOKEN_LIMIT_PER_FRAGMENT = 2500
@@ -170,8 +169,6 @@ def 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs):
    from crazy_functions.pdf_fns.breakdown_txt import breakdown_text_to_satisfy_token_limit
    txt = breakdown_text_to_satisfy_token_limit(txt=txt, limit=TOKEN_LIMIT_PER_FRAGMENT, llm_model=llm_kwargs['llm_model'])
    ############################## <第 1 步，迭代地历遍整个文章，提取精炼信息> ##################################
-    i_say_show_user = f'首先你从历史记录或文件中提取摘要。'; gpt_say = "[Local Message] 收到。"   # 用户提示
-    chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=history)    # 更新UI
    results = []
    MAX_WORD_TOTAL = 4096
    n_txt = len(txt)
@@ -179,7 +176,7 @@ def 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs):
    if n_txt >= 20: print('文章极长，不能达到预期效果')
    for i in range(n_txt):
        NUM_OF_WORD = MAX_WORD_TOTAL // n_txt
-        i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {txt[i]}"
+        i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words in Chinese: {txt[i]}"
        i_say_show_user = f"[{i+1}/{n_txt}] Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {txt[i][:200]} ...."
        gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say_show_user,  # i_say=真正给chatgpt的提问， i_say_show_user=给用户看的提问
                                                                           llm_kwargs, chatbot, 
@@ -232,34 +229,10 @@ def 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs):
        inputs=i_say,
        inputs_show_user=i_say_show_user,
        llm_kwargs=llm_kwargs, chatbot=chatbot, history=[], 
-        sys_prompt="你精通使用mermaid语法来绘制图表,首先确保语法正确,其次避免在mermaid语法中使用不允许的字符,此外也应当分考虑图表的可读性。"
+        sys_prompt=""
    )
    history.append(gpt_say)
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
-
-def 输入区文件处理(txt):
-    if txt == "": return False, txt
-    success = True
-    import glob
-    from .crazy_utils import get_files_from_everything
-    file_pdf,pdf_manifest,folder_pdf = get_files_from_everything(txt, '.pdf')
-    file_md,md_manifest,folder_md = get_files_from_everything(txt, '.md')
-    if len(pdf_manifest) == 0 and len(md_manifest) == 0:
-        return False, txt   #如输入区内容不是文件则直接返回输入区内容
-    
-    final_result = ""
-    if file_pdf:
-        for index, fp in enumerate(pdf_manifest):
-            file_content, page_one = read_and_clean_pdf_text(fp) # （尝试）按照章节切割PDF
-            file_content = file_content.encode('utf-8', 'ignore').decode()   # avoid reading non-utf8 chars
-            final_result += "\n" + file_content
-    if file_md:
-        for index, fp in enumerate(md_manifest):
-            with open(fp, 'r', encoding='utf-8', errors='replace') as f:
-                file_content = f.read()
-            file_content = file_content.encode('utf-8', 'ignore').decode()
-            final_result += "\n" + file_content
-    return True, final_result
    
 @CatchException
 def 生成多种Mermaid图表(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
@@ -277,26 +250,47 @@ def 生成多种Mermaid图表(txt, llm_kwargs, plugin_kwargs, chatbot, history,
    # 基本信息：功能、贡献者
    chatbot.append([
        "函数插件功能？", 
-        "根据当前聊天历史或文件中(文件内容优先)绘制多种mermaid图表，将会由对话模型首先判断适合的图表类型，随后绘制图表。\
+        "根据当前聊天历史或指定的路径文件(文件内容优先)绘制多种mermaid图表，将会由对话模型首先判断适合的图表类型，随后绘制图表。\
        \n您也可以使用插件参数指定绘制的图表类型,函数插件贡献者: Menghuan1918"])
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-
-    # 尝试导入依赖，如果缺少依赖，则给出安装建议
-    try:
-        import fitz
-    except:
-        report_exception(chatbot, history, 
-            a = f"解析项目: {txt}", 
-            b = f"导入软件依赖失败。使用该模块需要额外依赖，安装方法```pip install --upgrade pymupdf```。")
-        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        return
    
    if os.path.exists(txt):     #如输入区无内容则直接解析历史记录
-        file_exist, txt = 输入区文件处理(txt)
+        from crazy_functions.pdf_fns.parse_word import extract_text_from_files
+        file_exist, final_result, page_one, file_manifest, excption = extract_text_from_files(txt, chatbot, history)
    else:
        file_exist = False
+        excption = ""
+        file_manifest = []

-    if file_exist : history = []    #如输入区内容为文件则清空历史记录
-    history.append(txt)     #将解析后的txt传递加入到历史中
-    
-    yield from 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs)  
+    if excption != "":
+        if excption == "word":
+            report_exception(chatbot, history, 
+                a = f"解析项目: {txt}", 
+                b = f"找到了.doc文件，但是该文件格式不被支持，请先转化为.docx格式。")
+            
+        elif excption == "pdf":
+            report_exception(chatbot, history, 
+                a = f"解析项目: {txt}", 
+                b = f"导入软件依赖失败。使用该模块需要额外依赖，安装方法```pip install --upgrade pymupdf```。")
+
+        elif excption == "word_pip":
+            report_exception(chatbot, history,
+                a=f"解析项目: {txt}",
+                b=f"导入软件依赖失败。使用该模块需要额外依赖，安装方法```pip install --upgrade python-docx pywin32```。")
+
+        yield from update_ui(chatbot=chatbot, history=history) # refresh UI
+
+    else:
+        if not file_exist:
+            history.append(txt)     # input is not a file: add the input text to the chat history
+            i_say_show_user = f'首先你从历史记录中提取摘要。'; gpt_say = "[Local Message] 收到。"   # message shown to the user
+            chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=history)    # update UI
+            yield from 解析历史输入(history,llm_kwargs,file_manifest,chatbot,plugin_kwargs)
+        else:
+            file_num = len(file_manifest)
+            for i in range(file_num):     # process the files one by one
+                i_say_show_user = f"[{i+1}/{file_num}]处理文件{file_manifest[i]}"; gpt_say = "[Local Message] 收到。"   # message shown to the user
+                chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=history)    # update UI
+                history = []    # input is a file: start each file with an empty history
+                history.append(final_result[i])
+                yield from 解析历史输入(history,llm_kwargs,file_manifest,chatbot,plugin_kwargs)
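After the refactor, `生成多种Mermaid图表` makes one of three decisions from the tuple returned by `extract_text_from_files`:

- show an install or format hint;
- summarize the plain input together with the existing chat history;
- summarize each file separately, with a history that holds only that file's text.

Below is a minimal, UI-free sketch of that dispatch. `plan_jobs`, `Job` and `EXCEPTION_HINTS` are hypothetical names. The hint texts are English paraphrases; the plugin itself shows Chinese messages. Unlike the plugin, the sketch builds new lists instead of appending to `history` in place.

```python
from typing import List, Optional, Tuple

# Error codes produced by extract_text_from_files, with English paraphrases of the plugin's hints.
EXCEPTION_HINTS = {
    "word": "Found a .doc file, which is not supported; convert it to .docx first.",
    "pdf": "Missing dependency, install it with: pip install --upgrade pymupdf",
    "word_pip": "Missing dependency, install it with: pip install --upgrade python-docx pywin32",
}

Job = Tuple[str, List[str]]  # (label shown to the user, history passed to the summarizer)


def plan_jobs(txt: str, history: List[str], file_exist: bool, final_result: List[str],
              file_manifest: List[str], excption: str) -> Tuple[Optional[str], List[Job]]:
    """Return (error_hint, jobs); error_hint is None when there is nothing to report."""
    if excption:
        # An error code stops processing; only the hint is reported.
        return EXCEPTION_HINTS.get(excption, f"Unknown error code: {excption!r}"), []
    if not file_exist:
        # Plain input: summarize it together with the existing chat history.
        return None, [("history", history + [txt])]
    # File input: each file gets a fresh history containing only its own text.
    n = len(file_manifest)
    jobs = [(f"[{i + 1}/{n}] {name}", [final_result[i]]) for i, name in enumerate(file_manifest)]
    return None, jobs
```

Resetting the history per file keeps one document's summary from leaking into the chart drawn for the next one.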