merge frontier branch (#1620)

* Zhipu sdk update 适配最新的智谱SDK，支持GLM4v (#1502) * 适配 google gemini 优化为从用户input中提取文件 * 适配最新的智谱SDK、支持glm-4v * requirements.txt fix * pending history check --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com> * Update "生成多种Mermaid图表" plugin: Separate out the file reading function (#1520) * Update crazy_functional.py with new functionality deal with PDF * Update crazy_functional.py and Mermaid.py for plugin_kwargs * Update crazy_functional.py with new chart type: mind map * Update SELECT_PROMPT and i_say_show_user messages * Update ArgsReminder message in get_crazy_functions() function * Update with read md file and update PROMPTS * Return the PROMPTS as the test found that the initial version worked best * Update Mermaid chart generation function * version 3.71 * 解决issues #1510 * Remove unnecessary text from sys_prompt in 解析历史输入 function * Remove sys_prompt message in 解析历史输入 function * Update bridge_all.py: supports gpt-4-turbo-preview (#1517) * Update bridge_all.py: supports gpt-4-turbo-preview supports gpt-4-turbo-preview * Update bridge_all.py --------- Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com> * Update config.py: supports gpt-4-turbo-preview (#1516) * Update config.py: supports gpt-4-turbo-preview supports gpt-4-turbo-preview * Update config.py --------- Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com> * Refactor 解析历史输入 function to handle file input * Update Mermaid chart generation functionality * rename files and functions --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com> Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com> Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com> * 接入mathpix ocr功能 (#1468) * Update Latex输出PDF结果.py 借助mathpix实现了PDF翻译中文并重新编译PDF * Update config.py add mathpix appid & appkey * Add 'PDF翻译中文并重新编译PDF' feature to plugins. --------- Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com> * fix zhipuai * check picture * remove glm-4 due to bug * 修改config * 检查MATHPIX_APPID * Remove unnecessary code and update function_plugins dictionary * capture non-standard token overflow * bug fix #1524 * change mermaid style * 支持mermaid 滚动放大缩小重置,鼠标滚动和拖拽 (#1530) * 支持mermaid 滚动放大缩小重置,鼠标滚动和拖拽 * 微调未果先stage一下 * update --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com> Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com> * ver 3.72 * change live2d * save the status of ``clear btn` in cookie * 前端选择保持 * js ui bug fix * reset btn bug fix * update live2d tips * fix missing get_token_num method * fix live2d toggle switch * fix persistent custom btn with cookie * fix zhipuai feedback with core functionality * Refactor button update and clean up functions * tailing space removal * Fix missing MATHPIX_APPID and MATHPIX_APPKEY configuration * Prompt fix、脑图提示词优化 (#1537) * 适配 google gemini 优化为从用户input中提取文件 * 脑图提示词优化 * Fix missing MATHPIX_APPID and MATHPIX_APPKEY configuration --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com> * 优化“PDF翻译中文并重新编译PDF”插件 (#1602) * Add gemini_endpoint to API_URL_REDIRECT (#1560) * Add gemini_endpoint to API_URL_REDIRECT * Update gemini-pro and gemini-pro-vision model_info endpoints * Update to support new claude models (#1606) * Add anthropic library and update claude models * 更新bridge_claude.py文件，添加了对图片输入的支持。修复了一些bug。 * 添加Claude_3_Models变量以限制图片数量 * Refactor code to improve readability and maintainability * minor claude bug fix * more flexible one-api support * reformat config * fix one-api new access bug * dummy * compat non-standard api * version 3.73 --------- Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com> Co-authored-by: Menghuan1918 <menghuan2003@outlook.com> Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com> Co-authored-by: Hao Ma <893017927@qq.com> Co-authored-by: zeyuan huang <599012428@qq.com>
2024-03-11 17:26:09 +08:00
parent cd18663800
commit c3140ce344
85 changed files with 866 additions and 642 deletions
--- a/crazy_functions/批量总结PDF文档.py
+++ b/crazy_functions/批量总结PDF文档.py
@@ -17,7 +17,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
        file_content, page_one = read_and_clean_pdf_text(file_name) # （尝试）按照章节切割PDF
        file_content = file_content.encode('utf-8', 'ignore').decode()   # avoid reading non-utf8 chars
        page_one = str(page_one).encode('utf-8', 'ignore').decode()  # avoid reading non-utf8 chars
-        
+
        TOKEN_LIMIT_PER_FRAGMENT = 2500

        from crazy_functions.pdf_fns.breakdown_txt import breakdown_text_to_satisfy_token_limit
@@ -25,7 +25,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
        page_one_fragments = breakdown_text_to_satisfy_token_limit(txt=str(page_one), limit=TOKEN_LIMIT_PER_FRAGMENT//4, llm_model=llm_kwargs['llm_model'])
        # 为了更好的效果，我们剥离Introduction之后的部分（如果有）
        paper_meta = page_one_fragments[0].split('introduction')[0].split('Introduction')[0].split('INTRODUCTION')[0]
-        
+
        ############################## <第 1 步，从摘要中提取高价值信息，放到history中> ##################################
        final_results = []
        final_results.append(paper_meta)
@@ -44,10 +44,10 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
            i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i]}"
            i_say_show_user = f"[{i+1}/{n_fragment}] Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i][:200]}"
            gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say_show_user,  # i_say=真正给chatgpt的提问， i_say_show_user=给用户看的提问
-                                                                                llm_kwargs, chatbot, 
+                                                                                llm_kwargs, chatbot,
                                                                                history=["The main idea of the previous section is?", last_iteration_result], # 迭代上一次的结果
                                                                                sys_prompt="Extract the main idea of this section with Chinese."  # 提示
-                                                                                ) 
+                                                                                )
            iteration_results.append(gpt_say)
            last_iteration_result = gpt_say

@@ -67,15 +67,15 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
    - (2):What are the past methods? What are the problems with them? Is the approach well motivated?
    - (3):What is the research methodology proposed in this paper?
    - (4):On what task and what performance is achieved by the methods in this paper? Can the performance support their goals?
-Follow the format of the output that follows:                  
+Follow the format of the output that follows:
 1. Title: xxx\n\n
 2. Authors: xxx\n\n
 3. Affiliation: xxx\n\n
 4. Keywords: xxx\n\n
 5. Urls: xxx or xxx , xxx \n\n
 6. Summary: \n\n
-    - (1):xxx;\n 
-    - (2):xxx;\n 
+    - (1):xxx;\n
+    - (2):xxx;\n
    - (3):xxx;\n
    - (4):xxx.\n\n
 Be sure to use Chinese answers (proper nouns need to be marked in English), statements as concise and academic as possible,
@@ -85,8 +85,8 @@ do not have too much repetitive information, numerical values using the original
        file_write_buffer.extend(final_results)
        i_say, final_results = input_clipping(i_say, final_results, max_token_limit=2000)
        gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
-            inputs=i_say, inputs_show_user='开始最终总结', 
-            llm_kwargs=llm_kwargs, chatbot=chatbot, history=final_results, 
+            inputs=i_say, inputs_show_user='开始最终总结',
+            llm_kwargs=llm_kwargs, chatbot=chatbot, history=final_results,
            sys_prompt= f"Extract the main idea of this paper with less than {NUM_OF_WORD} Chinese characters"
        )
        final_results.append(gpt_say)
@@ -114,8 +114,8 @@ def 批量总结PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
    try:
        import fitz
    except:
-        report_exception(chatbot, history, 
-            a = f"解析项目: {txt}", 
+        report_exception(chatbot, history,
+            a = f"解析项目: {txt}",
            b = f"导入软件依赖失败。使用该模块需要额外依赖，安装方法```pip install --upgrade pymupdf```。")
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return
@@ -134,7 +134,7 @@ def 批量总结PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst

    # 搜索需要处理的文件清单
    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.pdf', recursive=True)]
-    
+
    # 如果没找到任何文件
    if len(file_manifest) == 0:
        report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex或.pdf文件: {txt}")