Compare commits

..

202 Commits

Author SHA1 Message Date
雷欧(林平凡)
0055ea2df7 Merge branch 'master' of https://github.com/binary-husky/gpt_academic
Some checks failed
build-with-all-capacity / build-and-push-image (push) Has been cancelled
build-with-audio-assistant / build-and-push-image (push) Has been cancelled
build-with-chatglm / build-and-push-image (push) Has been cancelled
build-with-latex-arm / build-and-push-image (push) Has been cancelled
build-with-latex / build-and-push-image (push) Has been cancelled
build-without-local-llms / build-and-push-image (push) Has been cancelled
2025-03-04 14:16:24 +08:00
Steven Moder
4a79aa6a93 typo: Fix typos and rename functions across multiple files (#2130)
* typo: Fix typos and rename functions across multiple files

This commit addresses several minor issues:
- Corrected spelling of function names (e.g., `update_ui_lastest_msg` to `update_ui_latest_msg`)
- Fixed typos in comments and variable names
- Corrected capitalization in some strings (e.g., "ArXiv" instead of "Arixv")
- Renamed some variables for consistency
- Corrected some console-related parameter names (e.g., `console_slience` to `console_silence`)

The changes span multiple files across the project, including request LLM bridges, crazy functions, and utility modules.

* fix: f-string expression part cannot include a backslash (#2139)

* raise error when the uploaded tar contain hard/soft link (#2136)

* minor bug fix

* fine tune reasoning css

* upgrade internet gpt plugin

* Update README.md

* fix GHSA-gqp5-wm97-qxcv

* typo fix

* update readme

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2025-03-02 02:16:10 +08:00
binary-husky
5dffe8627f fix GHSA-gqp5-wm97-qxcv 2025-03-02 01:58:45 +08:00
binary-husky
2aefef26db Update README.md 2025-02-21 19:51:09 +08:00
binary-husky
957da731db upgrade internet gpt plugin 2025-02-13 00:19:43 +08:00
雷欧(林平凡)
155e7e0deb Merge remote-tracking branch 'github/master'
Some checks failed
build-with-all-capacity / build-and-push-image (push) Has been cancelled
build-with-audio-assistant / build-and-push-image (push) Has been cancelled
build-with-chatglm / build-and-push-image (push) Has been cancelled
build-with-latex-arm / build-and-push-image (push) Has been cancelled
build-with-latex / build-and-push-image (push) Has been cancelled
build-without-local-llms / build-and-push-image (push) Has been cancelled
# Conflicts:
#	config.py
2025-02-12 15:07:39 +08:00
binary-husky
add29eba08 fine tune reasoning css 2025-02-09 20:26:52 +08:00
binary-husky
163e59c0f3 minor bug fix 2025-02-09 19:33:02 +08:00
binary-husky
07ece29c7c raise error when the uploaded tar contain hard/soft link (#2136) 2025-02-08 20:54:01 +08:00
Steven Moder
991a903fa9 fix: f-string expression part cannot include a backslash (#2139) 2025-02-08 20:50:54 +08:00
Steven Moder
cf7c81170c fix: return 参数数量 及 返回类型考虑 (#2129) 2025-02-07 21:33:06 +08:00
barry
6dda2061dd Update bridge_openrouter.py (#2132)
fix openrouter api 400 post bug

Co-authored-by: lan <56376794+lostatnight@users.noreply.github.com>
2025-02-07 21:28:05 +08:00
雷欧(林平凡)
e9de41b7e8 1
Some checks failed
build-with-all-capacity / build-and-push-image (push) Has been cancelled
build-with-audio-assistant / build-and-push-image (push) Has been cancelled
build-with-chatglm / build-and-push-image (push) Has been cancelled
build-with-latex-arm / build-and-push-image (push) Has been cancelled
build-with-latex / build-and-push-image (push) Has been cancelled
build-without-local-llms / build-and-push-image (push) Has been cancelled
2025-02-07 11:21:24 +08:00
雷欧(林平凡)
b34c79a94b 1
Some checks are pending
build-with-all-capacity / build-and-push-image (push) Waiting to run
build-with-audio-assistant / build-and-push-image (push) Waiting to run
build-with-chatglm / build-and-push-image (push) Waiting to run
build-with-latex-arm / build-and-push-image (push) Waiting to run
build-with-latex / build-and-push-image (push) Waiting to run
build-without-local-llms / build-and-push-image (push) Waiting to run
2025-02-07 11:17:38 +08:00
binary-husky
8a0d96afd3 consider element missing cases in js 2025-02-07 01:21:21 +08:00
binary-husky
37f9b94dee add options to hide ui components 2025-02-07 00:17:36 +08:00
雷欧(林平凡)
95284d859b 1
Some checks are pending
build-with-all-capacity / build-and-push-image (push) Waiting to run
build-with-audio-assistant / build-and-push-image (push) Waiting to run
build-with-chatglm / build-and-push-image (push) Waiting to run
build-with-latex-arm / build-and-push-image (push) Waiting to run
build-with-latex / build-and-push-image (push) Waiting to run
build-without-local-llms / build-and-push-image (push) Waiting to run
2025-02-06 10:46:46 +08:00
雷欧(林平凡)
a552592b5a 1
Some checks are pending
build-with-all-capacity / build-and-push-image (push) Waiting to run
build-with-audio-assistant / build-and-push-image (push) Waiting to run
build-with-chatglm / build-and-push-image (push) Waiting to run
build-with-latex-arm / build-and-push-image (push) Waiting to run
build-with-latex / build-and-push-image (push) Waiting to run
build-without-local-llms / build-and-push-image (push) Waiting to run
2025-02-06 10:32:13 +08:00
雷欧(林平凡)
e305f1b4a8 1
Some checks are pending
build-with-all-capacity / build-and-push-image (push) Waiting to run
build-with-audio-assistant / build-and-push-image (push) Waiting to run
build-with-chatglm / build-and-push-image (push) Waiting to run
build-with-latex-arm / build-and-push-image (push) Waiting to run
build-with-latex / build-and-push-image (push) Waiting to run
build-without-local-llms / build-and-push-image (push) Waiting to run
2025-02-06 10:30:58 +08:00
van
a88497c3ab 1
Some checks are pending
build-with-all-capacity / build-and-push-image (push) Waiting to run
build-with-audio-assistant / build-and-push-image (push) Waiting to run
build-with-chatglm / build-and-push-image (push) Waiting to run
build-with-latex-arm / build-and-push-image (push) Waiting to run
build-with-latex / build-and-push-image (push) Waiting to run
build-without-local-llms / build-and-push-image (push) Waiting to run
2025-02-06 10:23:42 +08:00
雷欧(林平凡)
0f1d2e0e48 1
Some checks are pending
build-with-all-capacity / build-and-push-image (push) Waiting to run
build-with-audio-assistant / build-and-push-image (push) Waiting to run
build-with-chatglm / build-and-push-image (push) Waiting to run
build-with-latex-arm / build-and-push-image (push) Waiting to run
build-with-latex / build-and-push-image (push) Waiting to run
build-without-local-llms / build-and-push-image (push) Waiting to run
2025-02-06 10:03:34 +08:00
binary-husky
936e2f5206 update readme 2025-02-04 16:15:56 +08:00
binary-husky
7f4b87a633 update readme 2025-02-04 16:08:18 +08:00
binary-husky
2ddd1bb634 Merge branch 'memset0-master' 2025-02-04 16:03:53 +08:00
binary-husky
c68285aeac update config and version 2025-02-04 16:03:01 +08:00
Memento mori.
caaebe4296 add support for Deepseek R1 model and display CoT (#2118)
* feat: add support for R1 model and display CoT

* fix unpacking

* feat: customized font & font size

* auto hide tooltip when scoll down

* tooltip glass transparent css

* fix: Enhance API key validation in is_any_api_key function (#2113)

* support qwen2.5-max!

* update minior adjustment

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: Steven Moder <java20131114@gmail.com>
2025-02-04 16:02:02 +08:00
binary-husky
39d50c1c95 update minior adjustment 2025-02-04 15:57:35 +08:00
binary-husky
25dc7bf912 Merge branch 'master' of https://github.com/memset0/gpt_academic into memset0-master 2025-01-30 22:03:31 +08:00
binary-husky
0458590a77 support qwen2.5-max! 2025-01-29 23:29:38 +08:00
Steven Moder
44fe78fff5 fix: Enhance API key validation in is_any_api_key function (#2113) 2025-01-29 21:40:30 +08:00
binary-husky
5ddd657ebc tooltip glass transparent css 2025-01-28 23:50:21 +08:00
binary-husky
9b0b2cf260 auto hide tooltip when scoll down 2025-01-28 23:32:40 +08:00
binary-husky
9f39a6571a feat: customized font & font size 2025-01-28 02:52:56 +08:00
memset0
d07e736214 fix unpacking 2025-01-25 00:00:13 +08:00
memset0
a1f7ae5b55 feat: add support for R1 model and display CoT 2025-01-24 14:43:49 +08:00
binary-husky
1213ef19e5 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2025-01-22 01:50:08 +08:00
binary-husky
aaafe2a797 fix xelatex font problem in all-cap image 2025-01-22 01:49:53 +08:00
binary-husky
2716606f0c Update README.md 2025-01-16 23:40:24 +08:00
binary-husky
286f7303be fix image display bug 2025-01-12 21:54:43 +08:00
binary-husky
7eeab9e376 fix code block display bug 2025-01-09 22:31:59 +08:00
binary-husky
4ca331fb28 prevent html rendering for input 2025-01-05 21:20:12 +08:00
binary-husky
9487829930 change max_chat_preserve = 10 2025-01-03 00:34:36 +08:00
binary-husky
a73074b89e upgrade chat checkpoint 2025-01-03 00:31:03 +08:00
Southlandi
fd93622840 修复Gemini对话错误问题(停用词数量为0的情况) (#2092) 2024-12-28 23:22:10 +08:00
whyXVI
09a82a572d Fix RuntimeError in predict_no_ui_long_connection() (#2095)
Bug fix: Fix RuntimeError in predict_no_ui_long_connection()

In the original code, calling predict_no_ui_long_connection() would trigger a RuntimeError("OpenAI拒绝了请求:" + error_msg) even when the server responded normally. The issue occurred due to incorrect handling of SSE protocol comment lines (lines starting with ":"). 

Modified the parsing logic in both `predict` and `predict_no_ui_long_connection` to handle these lines correctly, making the logic more intuitive and robust.
2024-12-28 23:21:14 +08:00
G.RQ
c53ddf65aa 修复 bug“重置”按钮报错 (#2102)
* fix 重置按钮bug

* fix version control bug

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-12-28 23:19:25 +08:00
binary-husky
ac64a77c2d allow disable openai proxy in WHEN_TO_USE_PROXY 2024-12-28 07:14:54 +08:00
binary-husky
dae8a0affc compat bug fix 2024-12-25 01:21:58 +08:00
binary-husky
97a81e9388 fix temp issue of o1 2024-12-25 00:54:03 +08:00
binary-husky
1dd1d0ed6c fix cookie overflow bug 2024-12-25 00:33:20 +08:00
binary-husky
060af0d2e6 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-12-22 23:33:44 +08:00
binary-husky
a848f714b6 fix welcome card bugs 2024-12-22 23:33:22 +08:00
binary-husky
924f8e30c7 Update issue stale.yml 2024-12-22 14:16:18 +08:00
binary-husky
f40347665b github action change 2024-12-22 14:15:16 +08:00
binary-husky
734c40bbde fix non-localhost javascript error 2024-12-22 14:01:22 +08:00
binary-husky
4ec87fbb54 history ng patch 1 2024-12-21 11:27:53 +08:00
binary-husky
17b5c22e61 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-12-19 22:46:14 +08:00
binary-husky
c6cd04a407 promote the rank of DASHSCOPE_API_KEY 2024-12-19 22:39:14 +08:00
YIQI JIANG
f60a12f8b4 Add o1 and o1-2024-12-17 model support (#2090)
* Add o1 and o1-2024-12-17 model support

* patch api key selection

---------

Co-authored-by: 蒋翌琪 <jiangyiqi99@jiangyiqideMacBook-Pro.local>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-12-19 22:32:57 +08:00
binary-husky
8413fb15ba optimize welcome page 2024-12-18 23:35:25 +08:00
binary-husky
72b2ce9b62 ollama patch 2024-12-18 23:05:55 +08:00
binary-husky
f43ef909e2 roll version to 3.91 2024-12-18 22:56:41 +08:00
binary-husky
9651ad488f Merge branch 'master' into frontier 2024-12-18 22:27:12 +08:00
binary-husky
81da7bb1a5 remove welcome card when layout overflows 2024-12-18 17:48:02 +08:00
binary-husky
4127162ee7 add tts test 2024-12-18 17:47:23 +08:00
binary-husky
98e5cb7b77 update readme 2024-12-09 23:57:10 +08:00
binary-husky
c88d8047dd cookie storage to local storage 2024-12-09 23:52:02 +08:00
binary-husky
e4bebea28d update requirements 2024-12-09 23:40:23 +08:00
YE Ke 叶柯
294df6c2d5 Add ChatGLM4 local deployment support and refactor ChatGLM bridge's path configuration (#2062)
*  feat(request_llms and config.py): ChatGLM4 Deployment

Add support for local deployment of ChatGLM4 model

* 🦄 refactor(bridge_chatglm3.py): ChatGLM3 model path

Added ChatGLM3 path customization (in config.py).
Removed useless quantization model options that have been annotated

---------

Co-authored-by: MarkDeia <17290550+MarkDeia@users.noreply.github.com>
2024-12-07 23:43:51 +08:00
Zhenhong Du
239894544e Add support for grok-beta model from x.ai (#2060)
* Update config.py

add support for `grok-beta` model

* Update bridge_all.py

add support for `grok-beta` model
2024-12-07 23:41:53 +08:00
Menghuan
ed5fc84d4e 添加为windows的环境打包以及一键启动脚本 (#2068)
* 新增自动打包windows下的环境依赖

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-12-07 23:41:02 +08:00
Menghuan
e3f84069ee 改进Doc2X请求,并增加xelatex编译的支持 (#2058)
* doc2x请求函数格式清理

* 更新中间部分

* 添加doc2x超时设置并添加对xelatex编译的支持

* Bug修复以及增加对xelatex安装的检测

* 增强弱网环境下的稳定性

* 修复模型中_无法显示的问题

* add xelatex logs

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-12-07 23:23:59 +08:00
binary-husky
7bf094b6b6 remove 2024-12-07 22:43:03 +08:00
binary-husky
57d7dc33d3 sync common.js 2024-12-07 17:10:01 +08:00
binary-husky
94ccd77480 remove gen restore btn 2024-12-07 16:22:29 +08:00
binary-husky
48e53cba05 update gradio 2024-12-07 16:18:05 +08:00
binary-husky
e9a7f9439f upgrade gradio 2024-12-07 15:59:30 +08:00
binary-husky
a88b119bf0 change urls 2024-12-05 22:13:59 +08:00
binary-husky
eee8115434 add a config note 2024-12-04 23:55:22 +08:00
binary-husky
4f6a272113 remove keyword extraction 2024-12-04 01:33:31 +08:00
binary-husky
cf3dd5ddb6 add fail fallback option for media plugin 2024-12-04 01:06:12 +08:00
binary-husky
f6f10b7230 media plugin update 2024-12-04 00:36:34 +08:00
binary-husky
bd7b219e8f update web search functionality 2024-12-02 01:55:01 +08:00
binary-husky
e62decac21 change some open fn encoding to utf-8 2024-11-19 15:53:50 +00:00
binary-husky
588b22e039 comment remove 2024-11-19 15:05:48 +00:00
binary-husky
ef18aeda81 adjust rag 2024-11-19 14:59:50 +00:00
binary-husky
3520131ca2 public media gpt 2024-11-18 18:38:49 +00:00
binary-husky
ff5901d8c0 Merge branch 'master' into frontier 2024-11-17 18:16:19 +00:00
binary-husky
2305576410 unify mutex button manifest 2024-11-17 18:14:45 +00:00
binary-husky
52f23c505c media-gpt update 2024-11-17 17:45:53 +00:00
binary-husky
34cc484635 chatgpt-4o-latest 2024-11-11 15:58:57 +00:00
binary-husky
d152f62894 renamed plugins 2024-11-11 14:55:05 +00:00
binary-husky
ca35f56f9b fix: media gpt upgrade 2024-11-11 14:48:29 +00:00
binary-husky
d616fd121a update experimental media agent 2024-11-10 16:42:31 +00:00
binary-husky
f3fda0d3fc Merge branch 'master' into frontier 2024-11-10 13:41:44 +00:00
binary-husky
197287fc30 Enhance archive extraction with error handling for tar and gzip formats 2024-11-09 10:10:46 +00:00
Bingchen Jiang
c37fcc9299 Adding support to new openai apikey format (#2030) 2024-11-09 13:41:19 +08:00
binary-husky
91f5e6b8f7 resolve pickle security issue 2024-11-04 13:49:49 +00:00
hcy2206
4f0851f703 增加了对于glm-4-plus的支持 (#2014)
* 增加对于讯飞星火大模型Spark4.0的支持

* Create github action sync.yml

* 增加对于智谱glm-4-plus的支持

* feat: change arxiv io param

* catch comment source code exception

* upgrade auto comment

* add security patch

---------

Co-authored-by: GH Action - Upstream Sync <action@github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-11-03 22:41:16 +08:00
binary-husky
2821f27756 add security patch 2024-11-03 14:34:17 +00:00
binary-husky
8f91a048a8 dfa algo imp 2024-11-03 09:39:14 +00:00
binary-husky
58eac38b4d Merge branch 'master' into frontier 2024-10-30 13:42:17 +00:00
binary-husky
180550b8f0 upgrade auto comment 2024-10-30 13:37:35 +00:00
binary-husky
7497dcb852 catch comment source code exception 2024-10-30 11:40:47 +00:00
binary-husky
23ef2ffb22 feat: change arxiv io param 2024-10-27 16:54:29 +00:00
binary-husky
848d0f65c7 share paper network beta 2024-10-27 16:08:25 +00:00
Menghuan1918
f0b0364f74 修复并改进build with latex的Docker构建 (#2020)
* 改进构建文件

* 修复问题

* 更改docker注释,同时测试拉取大小
2024-10-27 23:17:03 +08:00
binary-husky
d7f0cbe68e Merge branch 'master' into frontier 2024-10-21 14:31:25 +00:00
binary-husky
69f3755682 adjust max_token_limit for pdf translation plugin 2024-10-21 14:31:11 +00:00
binary-husky
04c9077265 Merge branch 'papershare_beta' into frontier 2024-10-21 14:06:52 +00:00
binary-husky
6afd7db1e3 Merge branch 'master' into frontier 2024-10-21 14:06:23 +00:00
binary-husky
4727113243 update doc2x functions 2024-10-21 14:05:42 +00:00
binary-husky
42d10a9481 update doc2x functions 2024-10-21 14:05:05 +00:00
binary-husky
50a1ea83ef control whether to allow sharing translation results with GPTAC academic cloud. 2024-10-18 18:05:50 +00:00
binary-husky
a9c86a7fb8 pre 2024-10-18 14:16:24 +00:00
binary-husky
2b299cf579 Merge branch 'master' into frontier 2024-10-16 15:22:27 +00:00
wsg1873
310122f5a7 solve the concatenate error. (#2011) 2024-10-16 00:56:24 +08:00
binary-husky
0121cacc84 Merge branch 'master' into frontier 2024-10-15 09:10:36 +00:00
binary-husky
c83bf214d0 change arxiv download attempt url order 2024-10-15 09:09:24 +00:00
binary-husky
e34c49dce5 compat: deal with arxiv url change 2024-10-15 09:07:39 +00:00
binary-husky
f2dcd6ad55 compat: arxiv translation src shift 2024-10-15 09:06:57 +00:00
binary-husky
42d9712f20 Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-10-15 08:24:01 +00:00
binary-husky
3890467c84 replace rm with rm -f 2024-10-15 07:32:29 +00:00
binary-husky
074b3c9828 explicitly declare default value 2024-10-15 06:41:12 +00:00
Nextstrain
b8e8457a01 关于o1系列模型无法正常请求的修复,多模型轮询KeyError: 'finish_reason'的修复 (#1992)
* Update bridge_all.py

* Update bridge_chatgpt.py

* Update bridge_chatgpt.py

* Update bridge_all.py

* Update bridge_all.py
2024-10-15 14:36:51 +08:00
binary-husky
2c93a24d7e fix dockerfile: try align python 2024-10-15 06:35:35 +00:00
binary-husky
e9af6ef3a0 fix: github action glitch 2024-10-15 06:32:47 +00:00
wsg1873
5ae8981dbb add the '/Fit' destination (#2009) 2024-10-14 22:50:56 +08:00
Boyin Liu
7f0ffa58f0 Boyin rag (#1983)
* first_version

* rag document support

* RAG interactive prompts added, issues resolved

* Resolve conflicts

* Resolve conflicts

* Resolve conflicts

* more file format support

* move import

* Resolve LlamaIndexRagWorker bug

* new resolve

* Address import  LlamaIndexRagWorker problem

* change import order

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-10-14 22:48:24 +08:00
binary-husky
adbed044e4 fix o1 compat problem 2024-10-13 17:02:07 +00:00
Menghuan1918
2fe5febaf0 为build-with-latex版本Docker构建新增arm64支持 (#1994)
* Add arm64 support

* Bug fix

* Some build bug fix

* Add arm support

* 分离arm和x86构建

* 改进构建文档

* update tags

* Update build-with-latex-arm.yml

* Revert "Update build-with-latex-arm.yml"

This reverts commit 9af92549b5.

* Update

* Add

* httpx

* Addison

* Update GithubAction+NoLocal+Latex

* Update docker-compose.yml and GithubAction+NoLocal+Latex

* Update README.md

* test math anim generation

* solve the pdf concatenate error. (#2006)

* solve the pdf concatenate error.

* add legacy fallback option

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: wsg1873 <wsg0326@163.com>
2024-10-14 00:25:28 +08:00
binary-husky
5888d038aa move import 2024-10-13 16:17:10 +00:00
binary-husky
ee8213e936 Merge branch 'boyin_rag' into frontier 2024-10-13 16:12:51 +00:00
binary-husky
a57dcbcaeb Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-10-13 08:26:06 +00:00
binary-husky
b812392a9d Merge branch 'master' into frontier 2024-10-13 08:25:47 +00:00
wsg1873
f54d8e559a solve the pdf concatenate error. (#2006)
* solve the pdf concatenate error.

* add legacy fallback option

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-10-13 16:16:51 +08:00
lbykkkk
fce4fa1ec7 more file format support 2024-10-12 18:25:33 +00:00
Boyin Liu
d13f1e270c Merge branch 'master' into boyin_rag 2024-10-11 22:31:07 +08:00
lbykkkk
85cf3d08eb Resolve conflicts 2024-10-11 22:29:56 +08:00
lbykkkk
584e747565 Resolve conflicts 2024-10-11 22:27:57 +08:00
lbykkkk
02ba653c19 Resolve conflicts 2024-10-11 22:21:53 +08:00
binary-husky
e68fc2bc69 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-10-11 13:33:05 +00:00
binary-husky
f695d7f1da test math anim generation 2024-10-11 13:32:57 +00:00
lbykkkk
2d12b5b27d RAG interactive prompts added, issues resolved 2024-10-11 01:06:17 +08:00
binary-husky
679352d896 Update README.md 2024-10-10 13:38:35 +08:00
binary-husky
12c9ab1e33 Update README.md 2024-10-10 12:02:12 +08:00
binary-husky
a4bcd262f9 Merge branch 'master' into frontier 2024-10-07 05:20:49 +00:00
binary-husky
da4a5efc49 lazy load llama-index lib 2024-10-06 16:26:26 +00:00
binary-husky
9ac450cfb6 紧急修复 fix httpx breaking bad error 2024-10-06 15:02:14 +00:00
binary-husky
172f9e220b version 3.90 2024-10-05 16:51:08 +00:00
Boyin Liu
748e31102f Merge branch 'master' into boyin_rag 2024-10-05 23:58:43 +08:00
binary-husky
a28b7d8475 Merge branch 'master' of https://github.com/binary-husky/gpt_academic 2024-10-05 19:10:42 +08:00
binary-husky
7d3ed36899 fix: llama index deps verion limit 2024-10-05 19:10:38 +08:00
binary-husky
a7bc5fa357 remove out-dated jittor models 2024-10-05 10:58:45 +00:00
binary-husky
4f5dd9ebcf add temp solution for llama-index compat 2024-10-05 09:53:21 +00:00
binary-husky
427feb99d8 llama-index==0.10.5 2024-10-05 17:34:08 +08:00
binary-husky
a01ca93362 Merge Latest Frontier (#1991)
* logging sys to loguru: stage 1 complete

* import loguru: stage 2

* logging -> loguru: stage 3

* support o1-preview and o1-mini

* logging -> loguru stage 4

* update social helper

* logging -> loguru: final stage

* fix: console output

* update translation matrix

* fix: loguru argument error with proxy enabled (#1977)

* relax llama index version

* remove comment

* Added some modules to support openrouter (#1975)

* Added some modules for supporting openrouter model

Added some modules for supporting openrouter model

* Update config.py

* Update .gitignore

* Update bridge_openrouter.py

* Not changed actually

* Refactor logging in bridge_openrouter.py

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* remove logging extra

---------

Co-authored-by: Steven Moder <java20131114@gmail.com>
Co-authored-by: Ren Lifei <2602264455@qq.com>
2024-10-05 17:09:18 +08:00
binary-husky
97eef45ab7 Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-10-01 11:59:14 +00:00
binary-husky
0c0e2acb9b remove logging extra 2024-10-01 11:57:47 +00:00
Ren Lifei
9fba8e0142 Added some modules to support openrouter (#1975)
* Added some modules for supporting openrouter model

Added some modules for supporting openrouter model

* Update config.py

* Update .gitignore

* Update bridge_openrouter.py

* Not changed actually

* Refactor logging in bridge_openrouter.py

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-09-28 18:05:34 +08:00
binary-husky
7d7867fb64 remove comment 2024-09-23 15:16:13 +00:00
lbykkkk
7ea791d83a rag document support 2024-09-22 21:37:57 +08:00
binary-husky
f9dbaa39fb Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-09-21 15:40:24 +00:00
binary-husky
bbc2288c5b relax llama index version 2024-09-21 15:40:10 +00:00
Steven Moder
64ab916838 fix: loguru argument error with proxy enabled (#1977) 2024-09-21 23:32:00 +08:00
binary-husky
8fe559da9f update translation matrix 2024-09-21 14:56:10 +00:00
binary-husky
09fd22091a fix: console output 2024-09-21 14:41:36 +00:00
lbykkkk
df717f8bba first_version 2024-09-20 00:06:59 +08:00
binary-husky
e296719b23 Merge branch 'purge_print' into frontier 2024-09-16 09:56:25 +00:00
binary-husky
2f343179a2 logging -> loguru: final stage 2024-09-15 15:51:51 +00:00
binary-husky
4d9604f2e9 update social helper 2024-09-15 15:16:36 +00:00
binary-husky
597c320808 fix: system prompt err when using o1 models 2024-09-14 17:04:01 +00:00
binary-husky
18290fd138 fix: support o1 models 2024-09-14 17:00:02 +00:00
binary-husky
bbf9e9f868 logging -> loguru stage 4 2024-09-14 16:00:09 +00:00
binary-husky
0d0575a639 support o1-preview and o1-mini 2024-09-13 03:12:18 +00:00
binary-husky
aa1f967dd7 support o1-preview and o1-mini 2024-09-13 03:11:53 +00:00
binary-husky
0d082327c8 logging -> loguru: stage 3 2024-09-11 08:49:55 +00:00
binary-husky
80acd9c875 import loguru: stage 2 2024-09-11 08:18:01 +00:00
binary-husky
17cd4f8210 logging sys to loguru: stage 1 complete 2024-09-11 03:30:30 +00:00
binary-husky
4e041e1d4e Merge branch 'frontier': windows deps bug fix 2024-09-08 16:32:38 +00:00
binary-husky
7ef39770c7 fallback to simple vs in windows system 2024-09-09 00:27:02 +08:00
binary-husky
8222f638cf Merge branch 'frontier' 2024-09-08 15:46:13 +00:00
binary-husky
ab32c314ab change git ignore 2024-09-08 15:44:02 +00:00
binary-husky
dcfed97054 revise milvus rag 2024-09-08 15:43:01 +00:00
binary-husky
dd66ca26f7 Frontier (#1958)
* update welcome svg

* fix loading chatglm3 (#1937)

* update welcome svg

* update welcome message

* fix loading chatglm3

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* begin rag project with llama index

* rag version one

* rag beta release

* add social worker (proto)

* fix llamaindex version

---------

Co-authored-by: moetayuko <loli@yuko.moe>
2024-09-08 23:20:42 +08:00
binary-husky
8b91d2ac0a add milvus vector store 2024-09-08 15:19:03 +00:00
binary-husky
e4e00b713f fix llamaindex version 2024-09-05 05:21:10 +00:00
binary-husky
710a65522c add social worker (proto) 2024-09-02 15:55:06 +00:00
binary-husky
34784c1d40 Merge branch 'rag' into frontier 2024-09-02 15:01:12 +00:00
binary-husky
80b1a6f99b rag beta release 2024-09-02 15:00:47 +00:00
binary-husky
08c3c56f53 rag version one 2024-08-28 15:14:13 +00:00
binary-husky
294716c832 begin rag project with llama index 2024-08-21 14:24:37 +00:00
binary-husky
16f4fd636e update ref 2024-08-19 16:14:52 +00:00
binary-husky
e07caf7a69 update openai api key pattern 2024-08-19 15:59:20 +00:00
moetayuko
a95b3daab9 fix loading chatglm3 (#1937)
* update welcome svg

* update welcome message

* fix loading chatglm3

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
2024-08-19 23:32:45 +08:00
binary-husky
4873e9dfdc update translation matrix 2024-08-12 13:50:37 +00:00
moetayuko
a119ab36fe fix enabling sparkv4 (#1936) 2024-08-12 21:45:08 +08:00
FatShibaInu
f9384e4e5f Add Support for Gemini 1.5 Pro & Gemini 1.5 Flash (#1926)
* Add Support for Gemini 1.5 Pro & 1.5 Flash.

* Update bridge_all.py

fix a spelling error in comments.

* Add Support for Gemini 1.5 Pro & Gemini 1.5 Flash
2024-08-12 21:44:24 +08:00
binary-husky
6fe5f6ee6e update welcome message 2024-08-05 11:37:06 +00:00
binary-husky
068d753426 update welcome svg 2024-08-04 15:59:09 +00:00
binary-husky
5010537f3c update welcome svg 2024-08-04 15:58:32 +00:00
binary-husky
f35f6633e0 fix: welcome card flip bug 2024-08-02 11:20:41 +00:00
189 changed files with 12027 additions and 2377 deletions

View File

@@ -1,44 +0,0 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-jittorllms
on:
push:
branches:
- 'master'
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}_jittorllms
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
file: docs/GithubAction+JittorLLMs
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

View File

@@ -1,14 +1,14 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages # https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-all-capacity-beta name: build-with-latex-arm
on: on:
push: push:
branches: branches:
- 'master' - "master"
env: env:
REGISTRY: ghcr.io REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}_with_all_capacity_beta IMAGE_NAME: ${{ github.repository }}_with_latex_arm
jobs: jobs:
build-and-push-image: build-and-push-image:
@@ -18,11 +18,17 @@ jobs:
packages: write packages: write
steps: steps:
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@v3 uses: actions/checkout@v4
- name: Log in to the Container registry - name: Log in to the Container registry
uses: docker/login-action@v2 uses: docker/login-action@v3
with: with:
registry: ${{ env.REGISTRY }} registry: ${{ env.REGISTRY }}
username: ${{ github.actor }} username: ${{ github.actor }}
@@ -35,10 +41,11 @@ jobs:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image - name: Build and push Docker image
uses: docker/build-push-action@v4 uses: docker/build-push-action@v6
with: with:
context: . context: .
push: true push: true
file: docs/GithubAction+AllCapacityBeta platforms: linux/arm64
file: docs/GithubAction+NoLocal+Latex
tags: ${{ steps.meta.outputs.tags }} tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }} labels: ${{ steps.meta.outputs.labels }}

View File

@@ -0,0 +1,56 @@
name: Create Conda Environment Package
on:
workflow_dispatch:
jobs:
build:
runs-on: windows-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Setup Miniconda
uses: conda-incubator/setup-miniconda@v3
with:
auto-activate-base: true
activate-environment: ""
- name: Create new Conda environment
shell: bash -l {0}
run: |
conda create -n gpt python=3.11 -y
conda activate gpt
- name: Install requirements
shell: bash -l {0}
run: |
conda activate gpt
pip install -r requirements.txt
- name: Install conda-pack
shell: bash -l {0}
run: |
conda activate gpt
conda install conda-pack -y
- name: Pack conda environment
shell: bash -l {0}
run: |
conda activate gpt
conda pack -n gpt -o gpt.tar.gz
- name: Create workspace zip
shell: pwsh
run: |
mkdir workspace
Get-ChildItem -Exclude "workspace" | Copy-Item -Destination workspace -Recurse
Remove-Item -Path workspace/.git* -Recurse -Force -ErrorAction SilentlyContinue
Copy-Item gpt.tar.gz workspace/ -Force
- name: Upload packed files
uses: actions/upload-artifact@v4
with:
name: gpt-academic-package
path: workspace

View File

@@ -7,7 +7,7 @@
name: 'Close stale issues and PRs' name: 'Close stale issues and PRs'
on: on:
schedule: schedule:
- cron: '*/5 * * * *' - cron: '*/30 * * * *'
jobs: jobs:
stale: stale:
@@ -19,7 +19,6 @@ jobs:
steps: steps:
- uses: actions/stale@v8 - uses: actions/stale@v8
with: with:
stale-issue-message: 'This issue is stale because it has been open 100 days with no activity. Remove stale label or comment or this will be closed in 1 days.' stale-issue-message: 'This issue is stale because it has been open 100 days with no activity. Remove stale label or comment or this will be closed in 7 days.'
days-before-stale: 100 days-before-stale: 100
days-before-close: 1 days-before-close: 7
debug-only: true

3
.gitignore vendored
View File

@@ -160,3 +160,6 @@ test.*
temp.* temp.*
objdump* objdump*
*.min.*.js *.min.*.js
TODO
experimental_mods
search_results

View File

@@ -15,6 +15,7 @@ RUN echo '[global]' > /etc/pip.conf && \
# 语音输出功能以下两行第一行更换阿里源第二行安装ffmpeg都可以删除 # 语音输出功能以下两行第一行更换阿里源第二行安装ffmpeg都可以删除
RUN UBUNTU_VERSION=$(awk -F= '/^VERSION_CODENAME=/{print $2}' /etc/os-release); echo "deb https://mirrors.aliyun.com/debian/ $UBUNTU_VERSION main non-free contrib" > /etc/apt/sources.list; apt-get update RUN UBUNTU_VERSION=$(awk -F= '/^VERSION_CODENAME=/{print $2}' /etc/os-release); echo "deb https://mirrors.aliyun.com/debian/ $UBUNTU_VERSION main non-free contrib" > /etc/apt/sources.list; apt-get update
RUN apt-get install ffmpeg -y RUN apt-get install ffmpeg -y
RUN apt-get clean
# 进入工作路径(必要) # 进入工作路径(必要)
@@ -33,6 +34,7 @@ RUN pip3 install -r requirements.txt
# 非必要步骤,用于预热模块(可以删除) # 非必要步骤,用于预热模块(可以删除)
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()' RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
RUN python3 -m pip cache purge
# 启动(必要) # 启动(必要)

View File

@@ -1,8 +1,14 @@
> [!IMPORTANT] > [!IMPORTANT]
> 2024.6.1: 版本3.80加入插件二级菜单功能详见wiki > `master主分支`最新动态(2025.3.2): 修复大量代码typo / 联网组件支持Jina的api / 增加deepseek-r1支持
> `frontier开发分支`最新动态(2024.12.9): 更新对话时间线功能优化xelatex论文翻译
> `wiki文档`最新动态(2024.12.5): 更新ollama接入指南
>
> 2025.2.2: 三分钟快速接入最强qwen2.5-max[视频](https://www.bilibili.com/video/BV1LeFuerEG4)
> 2025.2.1: 支持自定义字体
> 2024.10.10: 突发停电,紧急恢复了提供[whl包](https://drive.google.com/drive/folders/14kR-3V-lIbvGxri4AHc8TpiA1fqsw7SK?usp=sharing)的文件服务器
> 2024.5.1: 加入Doc2x翻译PDF论文的功能[查看详情](https://github.com/binary-husky/gpt_academic/wiki/Doc2x) > 2024.5.1: 加入Doc2x翻译PDF论文的功能[查看详情](https://github.com/binary-husky/gpt_academic/wiki/Doc2x)
> 2024.3.11: 全力支持Qwen、GLM、DeepseekCoder等中文大语言模型 SoVits语音克隆模块[查看详情](https://www.bilibili.com/video/BV1Rp421S7tF/) > 2024.3.11: 全力支持Qwen、GLM、DeepseekCoder等中文大语言模型 SoVits语音克隆模块[查看详情](https://www.bilibili.com/video/BV1Rp421S7tF/)
> 2024.1.17: 安装依赖时,请选择`requirements.txt`中**指定的版本**。 安装命令:`pip install -r requirements.txt`。本项目完全开源免费,您可通过订阅[在线服务](https://github.com/binary-husky/gpt_academic/wiki/online)的方式鼓励本项目的发展。 > 2024.1.17: 安装依赖时,请选择`requirements.txt`中**指定的版本**。 安装命令:`pip install -r requirements.txt`。
<br> <br>
@@ -123,20 +129,20 @@ Latex论文一键校对 | [插件] 仿Grammarly对Latex文章进行语法、拼
```mermaid ```mermaid
flowchart TD flowchart TD
A{"安装方法"} --> W1("I. 🔑直接运行 (Windows, Linux or MacOS)") A{"安装方法"} --> W1("I 🔑直接运行 (Windows, Linux or MacOS)")
W1 --> W11["1. Python pip包管理依赖"] W1 --> W11["1 Python pip包管理依赖"]
W1 --> W12["2. Anaconda包管理依赖推荐⭐"] W1 --> W12["2 Anaconda包管理依赖推荐⭐"]
A --> W2["II. 🐳使用Docker (Windows, Linux or MacOS)"] A --> W2["II 🐳使用Docker (Windows, Linux or MacOS)"]
W2 --> k1["1. 部署项目全部能力的大镜像(推荐⭐)"] W2 --> k1["1 部署项目全部能力的大镜像(推荐⭐)"]
W2 --> k2["2. 仅在线模型GPT, GLM4等镜像"] W2 --> k2["2 仅在线模型GPT, GLM4等镜像"]
W2 --> k3["3. 在线模型 + Latex的大镜像"] W2 --> k3["3 在线模型 + Latex的大镜像"]
A --> W4["IV. 🚀其他部署方法"] A --> W4["IV 🚀其他部署方法"]
W4 --> C1["1. Windows/MacOS 一键安装运行脚本(推荐⭐)"] W4 --> C1["1 Windows/MacOS 一键安装运行脚本(推荐⭐)"]
W4 --> C2["2. Huggingface, Sealos远程部署"] W4 --> C2["2 Huggingface, Sealos远程部署"]
W4 --> C4["3. ... 其他 ..."] W4 --> C4["3 其他 ..."]
``` ```
### 安装方法I直接运行 (Windows, Linux or MacOS) ### 安装方法I直接运行 (Windows, Linux or MacOS)
@@ -169,26 +175,32 @@ flowchart TD
``` ```
<details><summary>如果需要支持清华ChatGLM2/复旦MOSS/RWKV作为后端请点击展开此处</summary> <details><summary>如果需要支持清华ChatGLM系列/复旦MOSS/RWKV作为后端请点击展开此处</summary>
<p> <p>
【可选步骤】如果需要支持清华ChatGLM3/复旦MOSS作为后端需要额外安装更多依赖前提条件熟悉Python + 用过Pytorch + 电脑配置够强): 【可选步骤】如果需要支持清华ChatGLM系列/复旦MOSS作为后端需要额外安装更多依赖前提条件熟悉Python + 用过Pytorch + 电脑配置够强):
```sh ```sh
# 【可选步骤I】支持清华ChatGLM3。清华ChatGLM备注如果遇到"Call ChatGLM fail 不能正常加载ChatGLM的参数" 错误,参考如下: 1以上默认安装的为torch+cpu版使用cuda需要卸载torch重新安装torch+cuda 2如因本机配置不够无法加载模型可以修改request_llm/bridge_chatglm.py中的模型精度, 将 AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) 都修改为 AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True) # 【可选步骤I】支持清华ChatGLM3。清华ChatGLM备注如果遇到"Call ChatGLM fail 不能正常加载ChatGLM的参数" 错误,参考如下: 1以上默认安装的为torch+cpu版使用cuda需要卸载torch重新安装torch+cuda 2如因本机配置不够无法加载模型可以修改request_llm/bridge_chatglm.py中的模型精度, 将 AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) 都修改为 AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
python -m pip install -r request_llms/requirements_chatglm.txt python -m pip install -r request_llms/requirements_chatglm.txt
# 【可选步骤II】支持复旦MOSS # 【可选步骤II】支持清华ChatGLM4 注意此模型至少需要24G显存
python -m pip install -r request_llms/requirements_chatglm4.txt
# 可使用modelscope下载ChatGLM4模型
# pip install modelscope
# modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir ./THUDM/glm-4-9b-chat
# 【可选步骤III】支持复旦MOSS
python -m pip install -r request_llms/requirements_moss.txt python -m pip install -r request_llms/requirements_moss.txt
git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss # 注意执行此行代码时,必须处于项目根路径 git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss # 注意执行此行代码时,必须处于项目根路径
# 【可选步骤III】支持RWKV Runner # 【可选步骤IV】支持RWKV Runner
参考wikihttps://github.com/binary-husky/gpt_academic/wiki/%E9%80%82%E9%85%8DRWKV-Runner 参考wikihttps://github.com/binary-husky/gpt_academic/wiki/%E9%80%82%E9%85%8DRWKV-Runner
# 【可选步骤IV】确保config.py配置文件的AVAIL_LLM_MODELS包含了期望的模型目前支持的全部模型如下(jittorllms系列目前仅支持docker方案) # 【可选步骤V】确保config.py配置文件的AVAIL_LLM_MODELS包含了期望的模型目前支持的全部模型如下(jittorllms系列目前仅支持docker方案)
AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss"] # + ["jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"] AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss"] # + ["jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
# 【可选步骤V】支持本地模型INT8,INT4量化这里所指的模型本身不是量化版本目前deepseek-coder支持后面测试后会加入更多模型量化选择 # 【可选步骤VI】支持本地模型INT8,INT4量化这里所指的模型本身不是量化版本目前deepseek-coder支持后面测试后会加入更多模型量化选择
pip install bitsandbyte pip install bitsandbyte
# windows用户安装bitsandbytes需要使用下面bitsandbytes-windows-webui # windows用户安装bitsandbytes需要使用下面bitsandbytes-windows-webui
python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
@@ -416,7 +428,6 @@ timeline LR
1. `master` 分支: 主分支,稳定版 1. `master` 分支: 主分支,稳定版
2. `frontier` 分支: 开发分支,测试版 2. `frontier` 分支: 开发分支,测试版
3. 如何[接入其他大模型](request_llms/README.md) 3. 如何[接入其他大模型](request_llms/README.md)
4. 访问GPT-Academic的[在线服务并支持我们](https://github.com/binary-husky/gpt_academic/wiki/online)
### V参考与学习 ### V参考与学习

View File

@@ -1,48 +1,77 @@
from loguru import logger
def check_proxy(proxies, return_ip=False): def check_proxy(proxies, return_ip=False):
"""
检查代理配置并返回结果。
Args:
proxies (dict): 包含http和https代理配置的字典。
return_ip (bool, optional): 是否返回代理的IP地址。默认为False。
Returns:
str or None: 检查的结果信息或代理的IP地址如果`return_ip`为True
"""
import requests import requests
proxies_https = proxies['https'] if proxies is not None else '' proxies_https = proxies['https'] if proxies is not None else ''
ip = None ip = None
try: try:
response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4) response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4) # ⭐ 执行GET请求以获取代理信息
data = response.json() data = response.json()
if 'country_name' in data: if 'country_name' in data:
country = data['country_name'] country = data['country_name']
result = f"代理配置 {proxies_https}, 代理所在地:{country}" result = f"代理配置 {proxies_https}, 代理所在地:{country}"
if 'ip' in data: ip = data['ip'] if 'ip' in data:
ip = data['ip']
elif 'error' in data: elif 'error' in data:
alternative, ip = _check_with_backup_source(proxies) alternative, ip = _check_with_backup_source(proxies) # ⭐ 调用备用方法检查代理配置
if alternative is None: if alternative is None:
result = f"代理配置 {proxies_https}, 代理所在地未知IP查询频率受限" result = f"代理配置 {proxies_https}, 代理所在地未知IP查询频率受限"
else: else:
result = f"代理配置 {proxies_https}, 代理所在地:{alternative}" result = f"代理配置 {proxies_https}, 代理所在地:{alternative}"
else: else:
result = f"代理配置 {proxies_https}, 代理数据解析失败:{data}" result = f"代理配置 {proxies_https}, 代理数据解析失败:{data}"
if not return_ip: if not return_ip:
print(result) logger.warning(result)
return result return result
else: else:
return ip return ip
except: except:
result = f"代理配置 {proxies_https}, 代理所在地查询超时,代理可能无效" result = f"代理配置 {proxies_https}, 代理所在地查询超时,代理可能无效"
if not return_ip: if not return_ip:
print(result) logger.warning(result)
return result return result
else: else:
return ip return ip
def _check_with_backup_source(proxies): def _check_with_backup_source(proxies):
"""
通过备份源检查代理,并获取相应信息。
Args:
proxies (dict): 包含代理信息的字典。
Returns:
tuple: 代理信息(geo)和IP地址(ip)的元组。
"""
import random, string, requests import random, string, requests
random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32)) random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32))
try: try:
res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json() res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json() # ⭐ 执行代理检查和备份源请求
return res_json['dns']['geo'], res_json['dns']['ip'] return res_json['dns']['geo'], res_json['dns']['ip']
except: except:
return None, None return None, None
def backup_and_download(current_version, remote_version): def backup_and_download(current_version, remote_version):
""" """
一键更新协议:备份和下载 一键更新协议:备份当前版本,下载远程版本并解压缩。
Args:
current_version (str): 当前版本号。
remote_version (str): 远程版本号。
Returns:
str: 新版本目录的路径。
""" """
from toolbox import get_conf from toolbox import get_conf
import shutil import shutil
@@ -59,7 +88,7 @@ def backup_and_download(current_version, remote_version):
proxies = get_conf('proxies') proxies = get_conf('proxies')
try: r = requests.get('https://github.com/binary-husky/chatgpt_academic/archive/refs/heads/master.zip', proxies=proxies, stream=True) try: r = requests.get('https://github.com/binary-husky/chatgpt_academic/archive/refs/heads/master.zip', proxies=proxies, stream=True)
except: r = requests.get('https://public.agent-matrix.com/publish/master.zip', proxies=proxies, stream=True) except: r = requests.get('https://public.agent-matrix.com/publish/master.zip', proxies=proxies, stream=True)
zip_file_path = backup_dir+'/master.zip' zip_file_path = backup_dir+'/master.zip' # ⭐ 保存备份文件的路径
with open(zip_file_path, 'wb+') as f: with open(zip_file_path, 'wb+') as f:
f.write(r.content) f.write(r.content)
dst_path = new_version_dir dst_path = new_version_dir
@@ -75,6 +104,17 @@ def backup_and_download(current_version, remote_version):
def patch_and_restart(path): def patch_and_restart(path):
""" """
一键更新协议:覆盖和重启 一键更新协议:覆盖和重启
Args:
path (str): 新版本代码所在的路径
注意事项:
如果您的程序没有使用config_private.py私密配置文件则会将config.py重命名为config_private.py以避免配置丢失。
更新流程:
- 复制最新版本代码到当前目录
- 更新pip包依赖
- 如果更新失败,则提示手动安装依赖库并重启
""" """
from distutils import dir_util from distutils import dir_util
import shutil import shutil
@@ -82,33 +122,44 @@ def patch_and_restart(path):
import sys import sys
import time import time
import glob import glob
from shared_utils.colorful import print亮黄, print亮绿, print亮红 from shared_utils.colorful import log亮黄, log亮绿, log亮红
# if not using config_private, move origin config.py as config_private.py
if not os.path.exists('config_private.py'): if not os.path.exists('config_private.py'):
print亮黄('由于您没有设置config_private.py私密配置现将您的现有配置移动至config_private.py以防止配置丢失', log亮黄('由于您没有设置config_private.py私密配置现将您的现有配置移动至config_private.py以防止配置丢失',
'另外您可以随时在history子文件夹下找回旧版的程序。') '另外您可以随时在history子文件夹下找回旧版的程序。')
shutil.copyfile('config.py', 'config_private.py') shutil.copyfile('config.py', 'config_private.py')
path_new_version = glob.glob(path + '/*-master')[0] path_new_version = glob.glob(path + '/*-master')[0]
dir_util.copy_tree(path_new_version, './') dir_util.copy_tree(path_new_version, './') # ⭐ 将最新版本代码复制到当前目录
print亮绿('代码已经更新即将更新pip包依赖……')
for i in reversed(range(5)): time.sleep(1); print(i) log亮绿('代码已经更新即将更新pip包依赖……')
for i in reversed(range(5)): time.sleep(1); log亮绿(i)
try: try:
import subprocess import subprocess
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt']) subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'])
except: except:
print亮红('pip包依赖安装出现问题需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。') log亮红('pip包依赖安装出现问题需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
print亮绿('更新完成您可以随时在history子文件夹下找回旧版的程序5s之后重启')
print亮红('假如重启失败,您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。') log亮绿('更新完成您可以随时在history子文件夹下找回旧版的程序5s之后重启')
print(' ------------------------------ -----------------------------------') log亮红('假如重启失败,您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
for i in reversed(range(8)): time.sleep(1); print(i) log亮绿(' ------------------------------ -----------------------------------')
os.execl(sys.executable, sys.executable, *sys.argv)
for i in reversed(range(8)): time.sleep(1); log亮绿(i)
os.execl(sys.executable, sys.executable, *sys.argv) # 重启程序
def get_current_version(): def get_current_version():
"""
获取当前的版本号。
Returns:
str: 当前的版本号。如果无法获取版本号,则返回空字符串。
"""
import json import json
try: try:
with open('./version', 'r', encoding='utf8') as f: with open('./version', 'r', encoding='utf8') as f:
current_version = json.loads(f.read())['version'] current_version = json.loads(f.read())['version'] # ⭐ 从读取的json数据中提取版本号
except: except:
current_version = "" current_version = ""
return current_version return current_version
@@ -117,6 +168,12 @@ def get_current_version():
def auto_update(raise_error=False): def auto_update(raise_error=False):
""" """
一键更新协议:查询版本和用户意见 一键更新协议:查询版本和用户意见
Args:
raise_error (bool, optional): 是否在出错时抛出错误。默认为 False。
Returns:
None
""" """
try: try:
from toolbox import get_conf from toolbox import get_conf
@@ -135,22 +192,22 @@ def auto_update(raise_error=False):
current_version = f.read() current_version = f.read()
current_version = json.loads(current_version)['version'] current_version = json.loads(current_version)['version']
if (remote_version - current_version) >= 0.01-1e-5: if (remote_version - current_version) >= 0.01-1e-5:
from shared_utils.colorful import print亮黄 from shared_utils.colorful import log亮黄
print亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}') log亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}') # ⭐ 在控制台打印新版本信息
print('1Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n') logger.info('1Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n')
user_instruction = input('2是否一键更新代码Y+回车=确认,输入其他/无输入+回车=不更新)?') user_instruction = input('2是否一键更新代码Y+回车=确认,输入其他/无输入+回车=不更新)?')
if user_instruction in ['Y', 'y']: if user_instruction in ['Y', 'y']:
path = backup_and_download(current_version, remote_version) path = backup_and_download(current_version, remote_version) # ⭐ 备份并下载文件
try: try:
patch_and_restart(path) patch_and_restart(path) # ⭐ 执行覆盖并重启操作
except: except:
msg = '更新失败。' msg = '更新失败。'
if raise_error: if raise_error:
from toolbox import trimmed_format_exc from toolbox import trimmed_format_exc
msg += trimmed_format_exc() msg += trimmed_format_exc()
print(msg) logger.warning(msg)
else: else:
print('自动更新程序:已禁用') logger.info('自动更新程序:已禁用')
return return
else: else:
return return
@@ -159,10 +216,13 @@ def auto_update(raise_error=False):
if raise_error: if raise_error:
from toolbox import trimmed_format_exc from toolbox import trimmed_format_exc
msg += trimmed_format_exc() msg += trimmed_format_exc()
print(msg) logger.info(msg)
def warm_up_modules(): def warm_up_modules():
print('正在执行一些模块的预热 ...') """
预热模块,加载特定模块并执行预热操作。
"""
logger.info('正在执行一些模块的预热 ...')
from toolbox import ProxyNetworkActivate from toolbox import ProxyNetworkActivate
from request_llms.bridge_all import model_info from request_llms.bridge_all import model_info
with ProxyNetworkActivate("Warmup_Modules"): with ProxyNetworkActivate("Warmup_Modules"):
@@ -172,7 +232,17 @@ def warm_up_modules():
enc.encode("模块预热", disallowed_special=()) enc.encode("模块预热", disallowed_special=())
def warm_up_vectordb(): def warm_up_vectordb():
print('正在执行一些模块的预热 ...') """
执行一些模块的预热操作。
本函数主要用于执行一些模块的预热操作,确保在后续的流程中能够顺利运行。
⭐ 关键作用:预热模块
Returns:
None
"""
logger.info('正在执行一些模块的预热 ...')
from toolbox import ProxyNetworkActivate from toolbox import ProxyNetworkActivate
with ProxyNetworkActivate("Warmup_Modules"): with ProxyNetworkActivate("Warmup_Modules"):
import nltk import nltk

View File

@@ -7,11 +7,16 @@
Configuration reading priority: environment variable > config_private.py > config.py Configuration reading priority: environment variable > config_private.py > config.py
""" """
# [step 1]>> API_KEY = "sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789"。极少数情况下还需要填写组织格式如org-123456789abcdefghijklmno的请向下翻找 API_ORG 设置项 # [step 1-1]>> ( 接入GPT等模型 ) API_KEY = "sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789"。极少数情况下还需要填写组织格式如org-123456789abcdefghijklmno的请向下翻找 API_ORG 设置项
API_KEY = "此处填API密钥" # 可同时填写多个API-KEY用英文逗号分割例如API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey3,azure-apikey4" API_KEY = "此处填APIKEY" # 可同时填写多个API-KEY用英文逗号分割例如API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey3,azure-apikey4"
# [step 1-2]>> ( 接入通义 qwen-max ) 接入通义千问在线大模型api-key获取地址 https://dashscope.console.aliyun.com/
DASHSCOPE_API_KEY = "" # 阿里灵积云API_KEY
# [step 2]>> 改为True应用代理如果直接在海外服务器部署此处不修改如果使用本地或无地域限制的大模型时此处也不需要修改 # [step 1-3]>> ( 接入 deepseek-reasoner, 即 deepseek-r1 ) 深度求索(DeepSeek) API KEY默认请求地址为"https://api.deepseek.com/v1/chat/completions"
DEEPSEEK_API_KEY = ""
# [step 2]>> 改为True应用代理。如果使用本地或无地域限制的大模型时此处不修改如果直接在海外服务器部署此处不修改
USE_PROXY = False USE_PROXY = False
if USE_PROXY: if USE_PROXY:
""" """
@@ -32,30 +37,37 @@ else:
# [step 3]>> 模型选择是 (注意: LLM_MODEL是默认选中的模型, 它*必须*被包含在AVAIL_LLM_MODELS列表中 ) # [step 3]>> 模型选择是 (注意: LLM_MODEL是默认选中的模型, 它*必须*被包含在AVAIL_LLM_MODELS列表中 )
LLM_MODEL = "gpt-3.5-turbo-16k" # 可选 ↓↓↓ LLM_MODEL = "gpt-3.5-turbo-16k" # 可选 ↓↓↓
AVAIL_LLM_MODELS = ["gpt-4-1106-preview", "gpt-4-turbo-preview", "gpt-4-vision-preview", AVAIL_LLM_MODELS = ["qwen-max", "o1-mini", "o1-mini-2024-09-12", "o1", "o1-2024-12-17", "o1-preview", "o1-preview-2024-09-12",
"gpt-4-1106-preview", "gpt-4-turbo-preview", "gpt-4-vision-preview",
"gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4-turbo-2024-04-09", "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4-turbo-2024-04-09",
"gpt-3.5-turbo-1106", "gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5", "gpt-3.5-turbo-1106", "gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5",
"gpt-4", "gpt-4-32k", "azure-gpt-4", "glm-4", "glm-4v", "glm-3-turbo", "gpt-4", "gpt-4-32k", "azure-gpt-4", "glm-4", "glm-4v", "glm-3-turbo",
"gemini-pro", "chatglm3" "gemini-1.5-pro", "chatglm3", "chatglm4",
"deepseek-chat", "deepseek-coder", "deepseek-reasoner"
] ]
EMBEDDING_MODEL = "text-embedding-3-small"
# --- --- --- --- # --- --- --- ---
# P.S. 其他可用的模型还包括 # P.S. 其他可用的模型还包括
# AVAIL_LLM_MODELS = [ # AVAIL_LLM_MODELS = [
# "glm-4-0520", "glm-4-air", "glm-4-airx", "glm-4-flash", # "glm-4-0520", "glm-4-air", "glm-4-airx", "glm-4-flash",
# "qianfan", "deepseekcoder", # "qianfan", "deepseekcoder",
# "spark", "sparkv2", "sparkv3", "sparkv3.5", "sparkv4", # "spark", "sparkv2", "sparkv3", "sparkv3.5", "sparkv4",
# "qwen-turbo", "qwen-plus", "qwen-max", "qwen-local", # "qwen-turbo", "qwen-plus", "qwen-local",
# "moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k", # "moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k",
# "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-3.5-turbo-0125", "gpt-4o-2024-05-13" # "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-3.5-turbo-0125", "gpt-4o-2024-05-13"
# "claude-3-haiku-20240307","claude-3-sonnet-20240229","claude-3-opus-20240229", "claude-2.1", "claude-instant-1.2", # "claude-3-haiku-20240307","claude-3-sonnet-20240229","claude-3-opus-20240229", "claude-2.1", "claude-instant-1.2",
# "moss", "llama2", "chatglm_onnx", "internlm", "jittorllms_pangualpha", "jittorllms_llama", # "moss", "llama2", "chatglm_onnx", "internlm", "jittorllms_pangualpha", "jittorllms_llama",
# "deepseek-chat" ,"deepseek-coder", # "deepseek-chat" ,"deepseek-coder",
# "gemini-1.5-flash",
# "yi-34b-chat-0205","yi-34b-chat-200k","yi-large","yi-medium","yi-spark","yi-large-turbo","yi-large-preview", # "yi-34b-chat-0205","yi-34b-chat-200k","yi-large","yi-medium","yi-spark","yi-large-turbo","yi-large-preview",
# "grok-beta",
# ] # ]
# --- --- --- --- # --- --- --- ---
# 此外您还可以在接入one-api/vllm/ollama时 # 此外您还可以在接入one-api/vllm/ollama/Openroute时,
# 使用"one-api-*","vllm-*","ollama-*"前缀直接使用非标准方式接入的模型,例如 # 使用"one-api-*","vllm-*","ollama-*","openrouter-*"前缀直接使用非标准方式接入的模型,例如
# AVAIL_LLM_MODELS = ["one-api-claude-3-sonnet-20240229(max_token=100000)", "ollama-phi3(max_token=4096)"] # AVAIL_LLM_MODELS = ["one-api-claude-3-sonnet-20240229(max_token=100000)", "ollama-phi3(max_token=4096)","openrouter-openai/gpt-4o-mini","openrouter-openai/chatgpt-4o-latest"]
# --- --- --- --- # --- --- --- ---
@@ -69,7 +81,7 @@ API_URL_REDIRECT = {}
# 多线程函数插件中默认允许多少路线程同时访问OpenAI。Free trial users的限制是每分钟3次Pay-as-you-go users的限制是每分钟3500次 # 多线程函数插件中默认允许多少路线程同时访问OpenAI。Free trial users的限制是每分钟3次Pay-as-you-go users的限制是每分钟3500次
# 一言以蔽之免费5刀用户填3OpenAI绑了信用卡的用户可以填 16 或者更高。提高限制请查询https://platform.openai.com/docs/guides/rate-limits/overview # 一言以蔽之免费5刀用户填3OpenAI绑了信用卡的用户可以填 16 或者更高。提高限制请查询https://platform.openai.com/docs/guides/rate-limits/overview
DEFAULT_WORKER_NUM = 3 DEFAULT_WORKER_NUM = 8
# 色彩主题, 可选 ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast"] # 色彩主题, 可选 ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast"]
@@ -77,6 +89,31 @@ DEFAULT_WORKER_NUM = 3
THEME = "Default" THEME = "Default"
AVAIL_THEMES = ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast", "Gstaff/Xkcd", "NoCrypt/Miku"] AVAIL_THEMES = ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast", "Gstaff/Xkcd", "NoCrypt/Miku"]
FONT = "Theme-Default-Font"
AVAIL_FONTS = [
"默认值(Theme-Default-Font)",
"宋体(SimSun)",
"黑体(SimHei)",
"楷体(KaiTi)",
"仿宋(FangSong)",
"华文细黑(STHeiti Light)",
"华文楷体(STKaiti)",
"华文仿宋(STFangsong)",
"华文宋体(STSong)",
"华文中宋(STZhongsong)",
"华文新魏(STXinwei)",
"华文隶书(STLiti)",
# 备注:以下字体需要网络支持,您可以自定义任意您喜欢的字体,如下所示,需要满足的格式为 "字体昵称(字体英文真名@字体css下载链接)"
"思源宋体(Source Han Serif CN VF@https://chinese-fonts-cdn.deno.dev/packages/syst/dist/SourceHanSerifCN/result.css)",
"月星楷(Moon Stars Kai HW@https://chinese-fonts-cdn.deno.dev/packages/moon-stars-kai/dist/MoonStarsKaiHW-Regular/result.css)",
"珠圆体(MaokenZhuyuanTi@https://chinese-fonts-cdn.deno.dev/packages/mkzyt/dist/猫啃珠圆体/result.css)",
"平方萌萌哒(PING FANG MENG MNEG DA@https://chinese-fonts-cdn.deno.dev/packages/pfmmd/dist/平方萌萌哒/result.css)",
"Helvetica",
"ui-sans-serif",
"sans-serif",
"system-ui"
]
# 默认的系统提示词system prompt # 默认的系统提示词system prompt
INIT_SYS_PROMPT = "Serve me as a writing and programming assistant." INIT_SYS_PROMPT = "Serve me as a writing and programming assistant."
@@ -128,16 +165,15 @@ MULTI_QUERY_LLM_MODELS = "gpt-3.5-turbo&chatglm3"
QWEN_LOCAL_MODEL_SELECTION = "Qwen/Qwen-1_8B-Chat-Int8" QWEN_LOCAL_MODEL_SELECTION = "Qwen/Qwen-1_8B-Chat-Int8"
# 接入通义千问在线大模型 https://dashscope.console.aliyun.com/
DASHSCOPE_API_KEY = "" # 阿里灵积云API_KEY
# 百度千帆LLM_MODEL="qianfan" # 百度千帆LLM_MODEL="qianfan"
BAIDU_CLOUD_API_KEY = '' BAIDU_CLOUD_API_KEY = ''
BAIDU_CLOUD_SECRET_KEY = '' BAIDU_CLOUD_SECRET_KEY = ''
BAIDU_CLOUD_QIANFAN_MODEL = 'ERNIE-Bot' # 可选 "ERNIE-Bot-4"(文心大模型4.0), "ERNIE-Bot"(文心一言), "ERNIE-Bot-turbo", "BLOOMZ-7B", "Llama-2-70B-Chat", "Llama-2-13B-Chat", "Llama-2-7B-Chat", "ERNIE-Speed-128K", "ERNIE-Speed-8K", "ERNIE-Lite-8K" BAIDU_CLOUD_QIANFAN_MODEL = 'ERNIE-Bot' # 可选 "ERNIE-Bot-4"(文心大模型4.0), "ERNIE-Bot"(文心一言), "ERNIE-Bot-turbo", "BLOOMZ-7B", "Llama-2-70B-Chat", "Llama-2-13B-Chat", "Llama-2-7B-Chat", "ERNIE-Speed-128K", "ERNIE-Speed-8K", "ERNIE-Lite-8K"
# 如果使用ChatGLM3或ChatGLM4本地模型请把 LLM_MODEL="chatglm3" 或LLM_MODEL="chatglm4",并在此处指定模型路径
CHATGLM_LOCAL_MODEL_PATH = "THUDM/glm-4-9b-chat" # 例如"/home/hmp/ChatGLM3-6B/"
# 如果使用ChatGLM2微调模型请把 LLM_MODEL="chatglmft",并在此处指定模型路径 # 如果使用ChatGLM2微调模型请把 LLM_MODEL="chatglmft",并在此处指定模型路径
CHATGLM_PTUNING_CHECKPOINT = "" # 例如"/home/hmp/ChatGLM2-6B/ptuning/output/6b-pt-128-1e-2/checkpoint-100" CHATGLM_PTUNING_CHECKPOINT = "" # 例如"/home/hmp/ChatGLM2-6B/ptuning/output/6b-pt-128-1e-2/checkpoint-100"
@@ -231,13 +267,11 @@ MOONSHOT_API_KEY = ""
YIMODEL_API_KEY = "" YIMODEL_API_KEY = ""
# 深度求索(DeepSeek) API KEY默认请求地址为"https://api.deepseek.com/v1/chat/completions"
DEEPSEEK_API_KEY = ""
# 紫东太初大模型 https://ai-maas.wair.ac.cn # 紫东太初大模型 https://ai-maas.wair.ac.cn
TAICHU_API_KEY = "" TAICHU_API_KEY = ""
# Grok API KEY
GROK_API_KEY = ""
# Mathpix 拥有执行PDF的OCR功能但是需要注册账号 # Mathpix 拥有执行PDF的OCR功能但是需要注册账号
MATHPIX_APPID = "" MATHPIX_APPID = ""
@@ -269,8 +303,8 @@ GROBID_URLS = [
] ]
# Searxng互联网检索服务 # Searxng互联网检索服务这是一个huggingface空间请前往huggingface复制该空间然后把自己新的空间地址填在这里
SEARXNG_URL = "https://cloud-1.agent-matrix.com/" SEARXNG_URLS = [ f"https://kaletianlre-beardvs{i}dd.hf.space/" for i in range(1,5) ]
# 是否允许通过自然语言描述修改本页的配置,该功能具有一定的危险性,默认关闭 # 是否允许通过自然语言描述修改本页的配置,该功能具有一定的危险性,默认关闭
@@ -294,8 +328,8 @@ ARXIV_CACHE_DIR = "gpt_log/arxiv_cache"
# 除了连接OpenAI之外还有哪些场合允许使用代理请尽量不要修改 # 除了连接OpenAI之外还有哪些场合允许使用代理请尽量不要修改
WHEN_TO_USE_PROXY = ["Download_LLM", "Download_Gradio_Theme", "Connect_Grobid", WHEN_TO_USE_PROXY = ["Connect_OpenAI", "Download_LLM", "Download_Gradio_Theme", "Connect_Grobid",
"Warmup_Modules", "Nougat_Download", "AutoGen"] "Warmup_Modules", "Nougat_Download", "AutoGen", "Connect_OpenAI_Embedding"]
# 启用插件热加载 # 启用插件热加载
@@ -306,6 +340,12 @@ PLUGIN_HOT_RELOAD = False
NUM_CUSTOM_BASIC_BTN = 4 NUM_CUSTOM_BASIC_BTN = 4
# 媒体智能体的服务地址这是一个huggingface空间请前往huggingface复制该空间然后把自己新的空间地址填在这里
DAAS_SERVER_URLS = [ f"https://niuziniu-biligpt{i}.hf.space/stream" for i in range(1,5) ]
# 在互联网搜索组件中负责将搜索结果整理成干净的Markdown
JINA_API_KEY = ""
""" """
--------------- 配置关联关系说明 --------------- --------------- 配置关联关系说明 ---------------
@@ -365,6 +405,7 @@ NUM_CUSTOM_BASIC_BTN = 4
本地大模型示意图 本地大模型示意图
├── "chatglm4"
├── "chatglm3" ├── "chatglm3"
├── "chatglm" ├── "chatglm"
├── "chatglm_onnx" ├── "chatglm_onnx"
@@ -395,7 +436,7 @@ NUM_CUSTOM_BASIC_BTN = 4
插件在线服务配置依赖关系示意图 插件在线服务配置依赖关系示意图
├── 互联网检索 ├── 互联网检索
│ └── SEARXNG_URL │ └── SEARXNG_URLS
├── 语音功能 ├── 语音功能
│ ├── ENABLE_AUDIO │ ├── ENABLE_AUDIO

444
config_private.py Normal file
View File

@@ -0,0 +1,444 @@
"""
以下所有配置也都支持利用环境变量覆写环境变量配置格式见docker-compose.yml。
读取优先级:环境变量 > config_private.py > config.py
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
All the following configurations also support using environment variables to override,
and the environment variable configuration format can be seen in docker-compose.yml.
Configuration reading priority: environment variable > config_private.py > config.py
"""
# [step 1-1]>> ( 接入GPT等模型 ) API_KEY = "sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789"。极少数情况下还需要填写组织格式如org-123456789abcdefghijklmno的请向下翻找 API_ORG 设置项
API_KEY = "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" # 可同时填写多个API-KEY用英文逗号分割例如API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1,fkxxxx-api2dkey2"
# [step 1-2]>> ( 接入通义 qwen-max ) 接入通义千问在线大模型api-key获取地址 https://dashscope.console.aliyun.com/
DASHSCOPE_API_KEY = "" # 阿里灵积云API_KEY
# [step 1-3]>> ( 接入 deepseek-reasoner, 即 deepseek-r1 ) 深度求索(DeepSeek) API KEY默认请求地址为"https://api.deepseek.com/v1/chat/completions"
DEEPSEEK_API_KEY = "sk-d99b8cc6b7414cc88a5d950a3ff7585e"
# [step 2]>> 改为True应用代理。如果使用本地或无地域限制的大模型时此处不修改如果直接在海外服务器部署此处不修改
USE_PROXY = True
if USE_PROXY:
proxies = {
"http":"socks5h://192.168.8.9:1070", # 再例如 "http": "http://127.0.0.1:7890",
"https":"socks5h://192.168.8.9:1070", # 再例如 "https": "http://127.0.0.1:7890",
}
else:
proxies = None
DEFAULT_WORKER_NUM = 256
# [step 3]>> 模型选择是 (注意: LLM_MODEL是默认选中的模型, 它*必须*被包含在AVAIL_LLM_MODELS列表中 )
LLM_MODEL = "gpt-4-32k" # 可选 ↓↓↓
AVAIL_LLM_MODELS = ["deepseek-chat", "deepseek-coder", "deepseek-reasoner",
"gpt-4-1106-preview", "gpt-4-turbo-preview", "gpt-4-vision-preview",
"gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4-turbo-2024-04-09",
"gpt-3.5-turbo-1106", "gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5",
"gpt-4", "gpt-4-32k", "azure-gpt-4", "glm-4", "glm-4v", "glm-3-turbo",
"gemini-1.5-pro", "chatglm3", "chatglm4",
]
EMBEDDING_MODEL = "text-embedding-3-small"
# --- --- --- ---
# P.S. 其他可用的模型还包括
# AVAIL_LLM_MODELS = [
# "glm-4-0520", "glm-4-air", "glm-4-airx", "glm-4-flash",
# "qianfan", "deepseekcoder",
# "spark", "sparkv2", "sparkv3", "sparkv3.5", "sparkv4",
# "qwen-turbo", "qwen-plus", "qwen-local",
# "moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k",
# "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-3.5-turbo-0125", "gpt-4o-2024-05-13"
# "claude-3-haiku-20240307","claude-3-sonnet-20240229","claude-3-opus-20240229", "claude-2.1", "claude-instant-1.2",
# "moss", "llama2", "chatglm_onnx", "internlm", "jittorllms_pangualpha", "jittorllms_llama",
# "deepseek-chat" ,"deepseek-coder",
# "gemini-1.5-flash",
# "yi-34b-chat-0205","yi-34b-chat-200k","yi-large","yi-medium","yi-spark","yi-large-turbo","yi-large-preview",
# "grok-beta",
# ]
# --- --- --- ---
# 此外您还可以在接入one-api/vllm/ollama/Openroute时
# 使用"one-api-*","vllm-*","ollama-*","openrouter-*"前缀直接使用非标准方式接入的模型,例如
# AVAIL_LLM_MODELS = ["one-api-claude-3-sonnet-20240229(max_token=100000)", "ollama-phi3(max_token=4096)","openrouter-openai/gpt-4o-mini","openrouter-openai/chatgpt-4o-latest"]
# --- --- --- ---
# --------------- 以下配置可以优化体验 ---------------
# 重新URL重新定向实现更换API_URL的作用高危设置! 常规情况下不要修改! 通过修改此设置您将把您的API-KEY和对话隐私完全暴露给您设定的中间人
# 格式: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "在这里填写重定向的api.openai.com的URL"}
# 举例: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "https://reverse-proxy-url/v1/chat/completions", "http://localhost:11434/api/chat": "在这里填写您ollama的URL"}
API_URL_REDIRECT = {}
# 多线程函数插件中默认允许多少路线程同时访问OpenAI。Free trial users的限制是每分钟3次Pay-as-you-go users的限制是每分钟3500次
# 一言以蔽之免费5刀用户填3OpenAI绑了信用卡的用户可以填 16 或者更高。提高限制请查询https://platform.openai.com/docs/guides/rate-limits/overview
DEFAULT_WORKER_NUM = 64
# 色彩主题, 可选 ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast"]
# 更多主题, 请查阅Gradio主题商店: https://huggingface.co/spaces/gradio/theme-gallery 可选 ["Gstaff/Xkcd", "NoCrypt/Miku", ...]
THEME = "Default"
AVAIL_THEMES = ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast", "Gstaff/Xkcd", "NoCrypt/Miku"]
FONT = "Theme-Default-Font"
AVAIL_FONTS = [
"默认值(Theme-Default-Font)",
"宋体(SimSun)",
"黑体(SimHei)",
"楷体(KaiTi)",
"仿宋(FangSong)",
"华文细黑(STHeiti Light)",
"华文楷体(STKaiti)",
"华文仿宋(STFangsong)",
"华文宋体(STSong)",
"华文中宋(STZhongsong)",
"华文新魏(STXinwei)",
"华文隶书(STLiti)",
"思源宋体(Source Han Serif CN VF@https://chinese-fonts-cdn.deno.dev/packages/syst/dist/SourceHanSerifCN/result.css)",
"月星楷(Moon Stars Kai HW@https://chinese-fonts-cdn.deno.dev/packages/moon-stars-kai/dist/MoonStarsKaiHW-Regular/result.css)",
"珠圆体(MaokenZhuyuanTi@https://chinese-fonts-cdn.deno.dev/packages/mkzyt/dist/猫啃珠圆体/result.css)",
"平方萌萌哒(PING FANG MENG MNEG DA@https://chinese-fonts-cdn.deno.dev/packages/pfmmd/dist/平方萌萌哒/result.css)",
"Helvetica",
"ui-sans-serif",
"sans-serif",
"system-ui"
]
# 默认的系统提示词system prompt
INIT_SYS_PROMPT = "Serve me as a writing and programming assistant."
# 对话窗的高度 仅在LAYOUT="TOP-DOWN"时生效)
CHATBOT_HEIGHT = 1115
# 代码高亮
CODE_HIGHLIGHT = True
# 窗口布局
LAYOUT = "LEFT-RIGHT" # "LEFT-RIGHT"(左右布局) # "TOP-DOWN"(上下布局)
# 暗色模式 / 亮色模式
DARK_MODE = True
# 发送请求到OpenAI后等待多久判定为超时
TIMEOUT_SECONDS = 60
# 网页的端口, -1代表随机端口
WEB_PORT = 19998
# 是否自动打开浏览器页面
AUTO_OPEN_BROWSER = True
# 如果OpenAI不响应网络卡顿、代理失败、KEY失效重试的次数限制
MAX_RETRY = 5
# 插件分类默认选项
DEFAULT_FN_GROUPS = ['对话', '编程', '学术', '智能体']
# 定义界面上“询问多个GPT模型”插件应该使用哪些模型请从AVAIL_LLM_MODELS中选择并在不同模型之间用`&`间隔,例如"gpt-3.5-turbo&chatglm3&azure-gpt-4"
MULTI_QUERY_LLM_MODELS = "gpt-3.5-turbo&chatglm3"
# 选择本地模型变体只有当AVAIL_LLM_MODELS包含了对应本地模型时才会起作用
# 如果你选择Qwen系列的模型那么请在下面的QWEN_MODEL_SELECTION中指定具体的模型
# 也可以是具体的模型路径
QWEN_LOCAL_MODEL_SELECTION = "Qwen/Qwen-1_8B-Chat-Int8"
# 百度千帆LLM_MODEL="qianfan"
BAIDU_CLOUD_API_KEY = ''
BAIDU_CLOUD_SECRET_KEY = ''
BAIDU_CLOUD_QIANFAN_MODEL = 'ERNIE-Bot' # 可选 "ERNIE-Bot-4"(文心大模型4.0), "ERNIE-Bot"(文心一言), "ERNIE-Bot-turbo", "BLOOMZ-7B", "Llama-2-70B-Chat", "Llama-2-13B-Chat", "Llama-2-7B-Chat", "ERNIE-Speed-128K", "ERNIE-Speed-8K", "ERNIE-Lite-8K"
# 如果使用ChatGLM3或ChatGLM4本地模型请把 LLM_MODEL="chatglm3" 或LLM_MODEL="chatglm4",并在此处指定模型路径
CHATGLM_LOCAL_MODEL_PATH = "THUDM/glm-4-9b-chat" # 例如"/home/hmp/ChatGLM3-6B/"
# 如果使用ChatGLM2微调模型请把 LLM_MODEL="chatglmft",并在此处指定模型路径
CHATGLM_PTUNING_CHECKPOINT = "" # 例如"/home/hmp/ChatGLM2-6B/ptuning/output/6b-pt-128-1e-2/checkpoint-100"
# 本地LLM模型如ChatGLM的执行方式 CPU/GPU
LOCAL_MODEL_DEVICE = "cpu" # 可选 "cuda"
LOCAL_MODEL_QUANT = "FP16" # 默认 "FP16" "INT4" 启用量化INT4版本 "INT8" 启用量化INT8版本
# 设置gradio的并行线程数不需要修改
CONCURRENT_COUNT = 100
# 是否在提交时自动清空输入框
AUTO_CLEAR_TXT = False
# 加一个live2d装饰
ADD_WAIFU = False
# 设置用户名和密码不需要修改相关功能不稳定与gradio版本和网络都相关如果本地使用不建议加这个
# [("username", "password"), ("username2", "password2"), ...]
AUTHENTICATION = [("van", "L807878712"),("", "L807878712"),("", "L807878712"),("", "L807878712"),("z", "czh123456789")]
# 如果需要在二级路径下运行(常规情况下,不要修改!!
# (举例 CUSTOM_PATH = "/gpt_academic",可以让软件运行在 http://ip:port/gpt_academic/ 下。)
CUSTOM_PATH = "/"
# HTTPS 秘钥和证书(不需要修改)
SSL_KEYFILE = ""
SSL_CERTFILE = ""
# 极少数情况下openai的官方KEY需要伴随组织编码格式如org-xxxxxxxxxxxxxxxxxxxxxxxx使用
API_ORG = ""
# 如果需要使用Slack Claude使用教程详情见 request_llms/README.md
SLACK_CLAUDE_BOT_ID = ''
SLACK_CLAUDE_USER_TOKEN = ''
# 如果需要使用AZURE方法一单个azure模型部署详情请见额外文档 docs\use_azure.md
AZURE_ENDPOINT = "https://你亲手写的api名称.openai.azure.com/"
AZURE_API_KEY = "填入azure openai api的密钥" # 建议直接在API_KEY处填写该选项即将被弃用
AZURE_ENGINE = "填入你亲手写的部署名" # 读 docs\use_azure.md
# 如果需要使用AZURE方法二多个azure模型部署+动态切换)详情请见额外文档 docs\use_azure.md
AZURE_CFG_ARRAY = {}
# 阿里云实时语音识别 配置难度较高
# 参考 https://github.com/binary-husky/gpt_academic/blob/master/docs/use_audio.md
ENABLE_AUDIO = False
ALIYUN_TOKEN="" # 例如 f37f30e0f9934c34a992f6f64f7eba4f
ALIYUN_APPKEY="" # 例如 RoPlZrM88DnAFkZK
ALIYUN_ACCESSKEY="" # (无需填写)
ALIYUN_SECRET="" # (无需填写)
# GPT-SOVITS 文本转语音服务的运行地址(将语言模型的生成文本朗读出来)
TTS_TYPE = "DISABLE" # EDGE_TTS / LOCAL_SOVITS_API / DISABLE
GPT_SOVITS_URL = ""
EDGE_TTS_VOICE = "zh-CN-XiaoxiaoNeural"
# 接入讯飞星火大模型 https://console.xfyun.cn/services/iat
XFYUN_APPID = "00000000"
XFYUN_API_SECRET = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
XFYUN_API_KEY = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
# 接入智谱大模型
ZHIPUAI_API_KEY = ""
ZHIPUAI_MODEL = "" # 此选项已废弃,不再需要填写
# Claude API KEY
ANTHROPIC_API_KEY = ""
# 月之暗面 API KEY
MOONSHOT_API_KEY = ""
# 零一万物(Yi Model) API KEY
YIMODEL_API_KEY = ""
# 紫东太初大模型 https://ai-maas.wair.ac.cn
TAICHU_API_KEY = ""
# Grok API KEY
GROK_API_KEY = ""
# Mathpix 拥有执行PDF的OCR功能但是需要注册账号
MATHPIX_APPID = ""
MATHPIX_APPKEY = ""
# DOC2X的PDF解析服务注册账号并获取API KEY: https://doc2x.noedgeai.com/login
DOC2X_API_KEY = ""
# 自定义API KEY格式
CUSTOM_API_KEY_PATTERN = ""
# Google Gemini API-Key
GEMINI_API_KEY = ''
# HUGGINGFACE的TOKEN下载LLAMA时起作用 https://huggingface.co/docs/hub/security-tokens
HUGGINGFACE_ACCESS_TOKEN = "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"
# GROBID服务器地址填写多个可以均衡负载用于高质量地读取PDF文档
# 获取方法复制以下空间https://huggingface.co/spaces/qingxu98/grobid设为public然后GROBID_URL = "https://(你的hf用户名如qingxu98)-(你的填写的空间名如grobid).hf.space"
GROBID_URLS = [
"https://qingxu98-grobid.hf.space","https://qingxu98-grobid2.hf.space","https://qingxu98-grobid3.hf.space",
"https://qingxu98-grobid4.hf.space","https://qingxu98-grobid5.hf.space", "https://qingxu98-grobid6.hf.space",
"https://qingxu98-grobid7.hf.space", "https://qingxu98-grobid8.hf.space",
]
# Searxng互联网检索服务这是一个huggingface空间请前往huggingface复制该空间然后把自己新的空间地址填在这里
SEARXNG_URLS = [ f"https://kaletianlre-beardvs{i}dd.hf.space/" for i in range(1,5) ]
# 是否允许通过自然语言描述修改本页的配置,该功能具有一定的危险性,默认关闭
ALLOW_RESET_CONFIG = False
# 在使用AutoGen插件时是否使用Docker容器运行代码
AUTOGEN_USE_DOCKER = False
# 临时的上传文件夹位置,请尽量不要修改
PATH_PRIVATE_UPLOAD = "private_upload"
# 日志文件夹的位置,请尽量不要修改
PATH_LOGGING = "gpt_log"
# 存储翻译好的arxiv论文的路径请尽量不要修改
ARXIV_CACHE_DIR = "gpt_log/arxiv_cache"
# 除了连接OpenAI之外还有哪些场合允许使用代理请尽量不要修改
WHEN_TO_USE_PROXY = ["Connect_OpenAI", "Download_LLM", "Download_Gradio_Theme", "Connect_Grobid",
"Warmup_Modules", "Nougat_Download", "AutoGen", "Connect_OpenAI_Embedding"]
# 启用插件热加载
PLUGIN_HOT_RELOAD = False
# 自定义按钮的最大数量限制
NUM_CUSTOM_BASIC_BTN = 4
# 媒体智能体的服务地址这是一个huggingface空间请前往huggingface复制该空间然后把自己新的空间地址填在这里
DAAS_SERVER_URLS = [ f"https://niuziniu-biligpt{i}.hf.space/stream" for i in range(1,5) ]
"""
--------------- 配置关联关系说明 ---------------
在线大模型配置关联关系示意图
├── "gpt-3.5-turbo" 等openai模型
│ ├── API_KEY
│ ├── CUSTOM_API_KEY_PATTERN不常用
│ ├── API_ORG不常用
│ └── API_URL_REDIRECT不常用
├── "azure-gpt-3.5" 等azure模型单个azure模型不需要动态切换
│ ├── API_KEY
│ ├── AZURE_ENDPOINT
│ ├── AZURE_API_KEY
│ ├── AZURE_ENGINE
│ └── API_URL_REDIRECT
├── "azure-gpt-3.5" 等azure模型多个azure模型需要动态切换高优先级
│ └── AZURE_CFG_ARRAY
├── "spark" 星火认知大模型 spark & sparkv2
│ ├── XFYUN_APPID
│ ├── XFYUN_API_SECRET
│ └── XFYUN_API_KEY
├── "claude-3-opus-20240229" 等claude模型
│ └── ANTHROPIC_API_KEY
├── "stack-claude"
│ ├── SLACK_CLAUDE_BOT_ID
│ └── SLACK_CLAUDE_USER_TOKEN
├── "qianfan" 百度千帆大模型库
│ ├── BAIDU_CLOUD_QIANFAN_MODEL
│ ├── BAIDU_CLOUD_API_KEY
│ └── BAIDU_CLOUD_SECRET_KEY
├── "glm-4", "glm-3-turbo", "zhipuai" 智谱AI大模型
│ └── ZHIPUAI_API_KEY
├── "yi-34b-chat-0205", "yi-34b-chat-200k" 等零一万物(Yi Model)大模型
│ └── YIMODEL_API_KEY
├── "qwen-turbo" 等通义千问大模型
│ └── DASHSCOPE_API_KEY
├── "Gemini"
│ └── GEMINI_API_KEY
└── "one-api-...(max_token=...)" 用一种更方便的方式接入one-api多模型管理界面
├── AVAIL_LLM_MODELS
├── API_KEY
└── API_URL_REDIRECT
本地大模型示意图
├── "chatglm4"
├── "chatglm3"
├── "chatglm"
├── "chatglm_onnx"
├── "chatglmft"
├── "internlm"
├── "moss"
├── "jittorllms_pangualpha"
├── "jittorllms_llama"
├── "deepseekcoder"
├── "qwen-local"
├── RWKV的支持见Wiki
└── "llama2"
用户图形界面布局依赖关系示意图
├── CHATBOT_HEIGHT 对话窗的高度
├── CODE_HIGHLIGHT 代码高亮
├── LAYOUT 窗口布局
├── DARK_MODE 暗色模式 / 亮色模式
├── DEFAULT_FN_GROUPS 插件分类默认选项
├── THEME 色彩主题
├── AUTO_CLEAR_TXT 是否在提交时自动清空输入框
├── ADD_WAIFU 加一个live2d装饰
└── ALLOW_RESET_CONFIG 是否允许通过自然语言描述修改本页的配置,该功能具有一定的危险性
插件在线服务配置依赖关系示意图
├── 互联网检索
│ └── SEARXNG_URLS
├── 语音功能
│ ├── ENABLE_AUDIO
│ ├── ALIYUN_TOKEN
│ ├── ALIYUN_APPKEY
│ ├── ALIYUN_ACCESSKEY
│ └── ALIYUN_SECRET
└── PDF文档精准解析
├── GROBID_URLS
├── MATHPIX_APPID
└── MATHPIX_APPKEY
"""

View File

@@ -17,7 +17,7 @@ def get_core_functions():
text_show_english= text_show_english=
r"Below is a paragraph from an academic paper. Polish the writing to meet the academic style, " r"Below is a paragraph from an academic paper. Polish the writing to meet the academic style, "
r"improve the spelling, grammar, clarity, concision and overall readability. When necessary, rewrite the whole sentence. " r"improve the spelling, grammar, clarity, concision and overall readability. When necessary, rewrite the whole sentence. "
r"Firstly, you should provide the polished paragraph. " r"Firstly, you should provide the polished paragraph (in English). "
r"Secondly, you should list all your modification and explain the reasons to do so in markdown table.", r"Secondly, you should list all your modification and explain the reasons to do so in markdown table.",
text_show_chinese= text_show_chinese=
r"作为一名中文学术论文写作改进助理,你的任务是改进所提供文本的拼写、语法、清晰、简洁和整体可读性," r"作为一名中文学术论文写作改进助理,你的任务是改进所提供文本的拼写、语法、清晰、简洁和整体可读性,"

View File

@@ -1,6 +1,6 @@
from toolbox import HotReload # HotReload 的意思是热更新,修改函数插件后,不需要重启程序,代码直接生效 from toolbox import HotReload # HotReload 的意思是热更新,修改函数插件后,不需要重启程序,代码直接生效
from toolbox import trimmed_format_exc from toolbox import trimmed_format_exc
from loguru import logger
def get_crazy_functions(): def get_crazy_functions():
from crazy_functions.读文章写摘要 import 读文章写摘要 from crazy_functions.读文章写摘要 import 读文章写摘要
@@ -16,7 +16,7 @@ def get_crazy_functions():
from crazy_functions.SourceCode_Analyse import 解析一个前端项目 from crazy_functions.SourceCode_Analyse import 解析一个前端项目
from crazy_functions.高级功能函数模板 import 高阶功能模板函数 from crazy_functions.高级功能函数模板 import 高阶功能模板函数
from crazy_functions.高级功能函数模板 import Demo_Wrap from crazy_functions.高级功能函数模板 import Demo_Wrap
from crazy_functions.Latex全文润色 import Latex英文润色 from crazy_functions.Latex_Project_Polish import Latex英文润色
from crazy_functions.询问多个大语言模型 import 同时问询 from crazy_functions.询问多个大语言模型 import 同时问询
from crazy_functions.SourceCode_Analyse import 解析一个Lua项目 from crazy_functions.SourceCode_Analyse import 解析一个Lua项目
from crazy_functions.SourceCode_Analyse import 解析一个CSharp项目 from crazy_functions.SourceCode_Analyse import 解析一个CSharp项目
@@ -32,8 +32,8 @@ def get_crazy_functions():
from crazy_functions.PDF_Translate import 批量翻译PDF文档 from crazy_functions.PDF_Translate import 批量翻译PDF文档
from crazy_functions.谷歌检索小助手 import 谷歌检索小助手 from crazy_functions.谷歌检索小助手 import 谷歌检索小助手
from crazy_functions.理解PDF文档内容 import 理解PDF文档内容标准文件输入 from crazy_functions.理解PDF文档内容 import 理解PDF文档内容标准文件输入
from crazy_functions.Latex全文润色 import Latex中文润色 from crazy_functions.Latex_Project_Polish import Latex中文润色
from crazy_functions.Latex全文润色 import Latex英文纠错 from crazy_functions.Latex_Project_Polish import Latex英文纠错
from crazy_functions.Markdown_Translate import Markdown中译英 from crazy_functions.Markdown_Translate import Markdown中译英
from crazy_functions.虚空终端 import 虚空终端 from crazy_functions.虚空终端 import 虚空终端
from crazy_functions.生成多种Mermaid图表 import Mermaid_Gen from crazy_functions.生成多种Mermaid图表 import Mermaid_Gen
@@ -48,8 +48,17 @@ def get_crazy_functions():
from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2 from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
from crazy_functions.Image_Generate_Wrap import ImageGen_Wrap from crazy_functions.Image_Generate_Wrap import ImageGen_Wrap
from crazy_functions.SourceCode_Comment import 注释Python项目 from crazy_functions.SourceCode_Comment import 注释Python项目
from crazy_functions.SourceCode_Comment_Wrap import SourceCodeComment_Wrap
from crazy_functions.VideoResource_GPT import 多媒体任务
function_plugins = { function_plugins = {
"多媒体智能体": {
"Group": "智能体",
"Color": "stop",
"AsButton": False,
"Info": "【仅测试】多媒体任务",
"Function": HotReload(多媒体任务),
},
"虚空终端": { "虚空终端": {
"Group": "对话|编程|学术|智能体", "Group": "对话|编程|学术|智能体",
"Color": "stop", "Color": "stop",
@@ -70,6 +79,7 @@ def get_crazy_functions():
"AsButton": False, "AsButton": False,
"Info": "上传一系列python源文件(或者压缩包), 为这些代码添加docstring | 输入参数为路径", "Info": "上传一系列python源文件(或者压缩包), 为这些代码添加docstring | 输入参数为路径",
"Function": HotReload(注释Python项目), "Function": HotReload(注释Python项目),
"Class": SourceCodeComment_Wrap,
}, },
"载入对话历史存档(先上传存档或输入路径)": { "载入对话历史存档(先上传存档或输入路径)": {
"Group": "对话", "Group": "对话",
@@ -103,7 +113,7 @@ def get_crazy_functions():
"Group": "学术", "Group": "学术",
"Color": "stop", "Color": "stop",
"AsButton": True, "AsButton": True,
"Info": "Arixv论文精细翻译 | 输入参数arxiv论文的ID比如1812.10695", "Info": "ArXiv论文精细翻译 | 输入参数arxiv论文的ID比如1812.10695",
"Function": HotReload(Latex翻译中文并重新编译PDF), # 当注册Class后Function旧接口仅会在“虚空终端”中起作用 "Function": HotReload(Latex翻译中文并重新编译PDF), # 当注册Class后Function旧接口仅会在“虚空终端”中起作用
"Class": Arxiv_Localize, # 新一代插件需要注册Class "Class": Arxiv_Localize, # 新一代插件需要注册Class
}, },
@@ -342,7 +352,7 @@ def get_crazy_functions():
"ArgsReminder": r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 " "ArgsReminder": r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
r'If the term "agent" is used in this section, it should be translated to "智能体". ', r'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "Arixv论文精细翻译 | 输入参数arxiv论文的ID比如1812.10695", "Info": "ArXiv论文精细翻译 | 输入参数arxiv论文的ID比如1812.10695",
"Function": HotReload(Latex翻译中文并重新编译PDF), # 当注册Class后Function旧接口仅会在“虚空终端”中起作用 "Function": HotReload(Latex翻译中文并重新编译PDF), # 当注册Class后Function旧接口仅会在“虚空终端”中起作用
"Class": Arxiv_Localize, # 新一代插件需要注册Class "Class": Arxiv_Localize, # 新一代插件需要注册Class
}, },
@@ -421,39 +431,9 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
# try:
# from crazy_functions.联网的ChatGPT import 连接网络回答问题
# function_plugins.update(
# {
# "连接网络回答问题(输入问题后点击该插件,需要访问谷歌)": {
# "Group": "对话",
# "Color": "stop",
# "AsButton": False, # 加入下拉菜单中
# # "Info": "连接网络回答问题(需要访问谷歌)| 输入参数是一个问题",
# "Function": HotReload(连接网络回答问题),
# }
# }
# )
# from crazy_functions.联网的ChatGPT_bing版 import 连接bing搜索回答问题
# function_plugins.update(
# {
# "连接网络回答问题中文Bing版输入问题后点击该插件": {
# "Group": "对话",
# "Color": "stop",
# "AsButton": False, # 加入下拉菜单中
# "Info": "连接网络回答问题需要访问中文Bing| 输入参数是一个问题",
# "Function": HotReload(连接bing搜索回答问题),
# }
# }
# )
# except:
# print(trimmed_format_exc())
# print("Load function plugin failed")
try: try:
from crazy_functions.SourceCode_Analyse import 解析任意code项目 from crazy_functions.SourceCode_Analyse import 解析任意code项目
@@ -471,8 +451,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.询问多个大语言模型 import 同时问询_指定模型 from crazy_functions.询问多个大语言模型 import 同时问询_指定模型
@@ -490,8 +470,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
@@ -512,8 +492,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.数学动画生成manim import 动画生成 from crazy_functions.数学动画生成manim import 动画生成
@@ -530,8 +510,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.Markdown_Translate import Markdown翻译指定语言 from crazy_functions.Markdown_Translate import Markdown翻译指定语言
@@ -549,8 +529,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.知识库问答 import 知识库文件注入 from crazy_functions.知识库问答 import 知识库文件注入
@@ -568,8 +548,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.知识库问答 import 读取知识库作答 from crazy_functions.知识库问答 import 读取知识库作答
@@ -587,8 +567,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.交互功能函数模板 import 交互功能模板函数 from crazy_functions.交互功能函数模板 import 交互功能模板函数
@@ -604,8 +584,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
@@ -627,8 +607,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.批量翻译PDF文档_NOUGAT import 批量翻译PDF文档 from crazy_functions.批量翻译PDF文档_NOUGAT import 批量翻译PDF文档
@@ -644,8 +624,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.函数动态生成 import 函数动态生成 from crazy_functions.函数动态生成 import 函数动态生成
@@ -661,8 +641,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.多智能体 import 多智能体终端 from crazy_functions.多智能体 import 多智能体终端
@@ -678,8 +658,8 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try: try:
from crazy_functions.互动小游戏 import 随机小游戏 from crazy_functions.互动小游戏 import 随机小游戏
@@ -695,8 +675,27 @@ def get_crazy_functions():
} }
) )
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
print("Load function plugin failed") logger.error("Load function plugin failed")
try:
from crazy_functions.Rag_Interface import Rag问答
function_plugins.update(
{
"Rag智能召回": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Info": "将问答数据记录到向量库中,作为长期参考。",
"Function": HotReload(Rag问答),
},
}
)
except:
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
# try: # try:
# from crazy_functions.高级功能函数模板 import 测试图表渲染 # from crazy_functions.高级功能函数模板 import 测试图表渲染
@@ -709,22 +708,9 @@ def get_crazy_functions():
# } # }
# }) # })
# except: # except:
# print(trimmed_format_exc()) # logger.error(trimmed_format_exc())
# print('Load function plugin failed') # print('Load function plugin failed')
# try:
# from crazy_functions.chatglm微调工具 import 微调数据集生成
# function_plugins.update({
# "黑盒模型学习: 微调数据集生成 (先上传数据集)": {
# "Color": "stop",
# "AsButton": False,
# "AdvancedArgs": True,
# "ArgsReminder": "针对数据集输入(如 绿帽子*深蓝色衬衫*黑色运动裤)给出指令,例如您可以将以下命令复制到下方: --llm_to_learn=azure-gpt-3.5 --prompt_prefix='根据下面的服装类型提示想象一个穿着者对这个人外貌、身处的环境、内心世界、过去经历进行描写。要求100字以内用第二人称。' --system_prompt=''",
# "Function": HotReload(微调数据集生成)
# }
# })
# except:
# print('Load function plugin failed')
""" """
设置默认值: 设置默认值:
@@ -744,3 +730,26 @@ def get_crazy_functions():
function_plugins[name]["Color"] = "secondary" function_plugins[name]["Color"] = "secondary"
return function_plugins return function_plugins
def get_multiplex_button_functions():
"""多路复用主提交按钮的功能映射
"""
return {
"常规对话":
"",
"查互联网后回答":
"查互联网后回答",
"多模型对话":
"询问多个GPT模型", # 映射到上面的 `询问多个GPT模型` 插件
"智能召回 RAG":
"Rag智能召回", # 映射到上面的 `Rag智能召回` 插件
"多媒体查询":
"多媒体智能体", # 映射到上面的 `多媒体智能体` 插件
}

View File

@@ -171,7 +171,7 @@ def 载入对话历史存档(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
system_prompt 给gpt的静默提醒 system_prompt 给gpt的静默提醒
user_request 当前用户的请求信息IP地址等 user_request 当前用户的请求信息IP地址等
""" """
from .crazy_utils import get_files_from_everything from crazy_functions.crazy_utils import get_files_from_everything
success, file_manifest, _ = get_files_from_everything(txt, type='.html') success, file_manifest, _ = get_files_from_everything(txt, type='.html')
if not success: if not success:

View File

@@ -30,7 +30,7 @@ def gen_image(llm_kwargs, prompt, resolution="1024x1024", model="dall-e-2", qual
if style is not None: if style is not None:
data['style'] = style data['style'] = style
response = requests.post(url, headers=headers, json=data, proxies=proxies) response = requests.post(url, headers=headers, json=data, proxies=proxies)
print(response.content) # logger.info(response.content)
try: try:
image_url = json.loads(response.content.decode('utf8'))['data'][0]['url'] image_url = json.loads(response.content.decode('utf8'))['data'][0]['url']
except: except:
@@ -76,7 +76,7 @@ def edit_image(llm_kwargs, prompt, image_path, resolution="1024x1024", model="da
} }
response = requests.post(url, headers=headers, files=files, proxies=proxies) response = requests.post(url, headers=headers, files=files, proxies=proxies)
print(response.content) # logger.info(response.content)
try: try:
image_url = json.loads(response.content.decode('utf8'))['data'][0]['url'] image_url = json.loads(response.content.decode('utf8'))['data'][0]['url']
except: except:

View File

@@ -7,7 +7,7 @@ from bs4 import BeautifulSoup
from functools import lru_cache from functools import lru_cache
from itertools import zip_longest from itertools import zip_longest
from check_proxy import check_proxy from check_proxy import check_proxy
from toolbox import CatchException, update_ui, get_conf from toolbox import CatchException, update_ui, get_conf, update_ui_latest_msg
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
from request_llms.bridge_all import model_info from request_llms.bridge_all import model_info
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
@@ -49,7 +49,7 @@ def search_optimizer(
mutable = ["", time.time(), ""] mutable = ["", time.time(), ""]
llm_kwargs["temperature"] = 0.8 llm_kwargs["temperature"] = 0.8
try: try:
querys_json = predict_no_ui_long_connection( query_json = predict_no_ui_long_connection(
inputs=query, inputs=query,
llm_kwargs=llm_kwargs, llm_kwargs=llm_kwargs,
history=[], history=[],
@@ -57,31 +57,31 @@ def search_optimizer(
observe_window=mutable, observe_window=mutable,
) )
except Exception: except Exception:
querys_json = "1234" query_json = "null"
#* 尝试解码优化后的搜索结果 #* 尝试解码优化后的搜索结果
querys_json = re.sub(r"```json|```", "", querys_json) query_json = re.sub(r"```json|```", "", query_json)
try: try:
querys = json.loads(querys_json) queries = json.loads(query_json)
except Exception: except Exception:
#* 如果解码失败,降低温度再试一次 #* 如果解码失败,降低温度再试一次
try: try:
llm_kwargs["temperature"] = 0.4 llm_kwargs["temperature"] = 0.4
querys_json = predict_no_ui_long_connection( query_json = predict_no_ui_long_connection(
inputs=query, inputs=query,
llm_kwargs=llm_kwargs, llm_kwargs=llm_kwargs,
history=[], history=[],
sys_prompt=sys_prompt, sys_prompt=sys_prompt,
observe_window=mutable, observe_window=mutable,
) )
querys_json = re.sub(r"```json|```", "", querys_json) query_json = re.sub(r"```json|```", "", query_json)
querys = json.loads(querys_json) queries = json.loads(query_json)
except Exception: except Exception:
#* 如果再次失败,直接返回原始问题 #* 如果再次失败,直接返回原始问题
querys = [query] queries = [query]
links = [] links = []
success = 0 success = 0
Exceptions = "" Exceptions = ""
for q in querys: for q in queries:
try: try:
link = searxng_request(q, proxies, categories, searxng_url, engines=engines) link = searxng_request(q, proxies, categories, searxng_url, engines=engines)
if len(link) > 0: if len(link) > 0:
@@ -115,7 +115,8 @@ def get_auth_ip():
def searxng_request(query, proxies, categories='general', searxng_url=None, engines=None): def searxng_request(query, proxies, categories='general', searxng_url=None, engines=None):
if searxng_url is None: if searxng_url is None:
url = get_conf("SEARXNG_URL") urls = get_conf("SEARXNG_URLS")
url = random.choice(urls)
else: else:
url = searxng_url url = searxng_url
@@ -174,10 +175,17 @@ def scrape_text(url, proxies) -> str:
Returns: Returns:
str: The scraped text str: The scraped text
""" """
from loguru import logger
headers = { headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
'Content-Type': 'text/plain', 'Content-Type': 'text/plain',
} }
# 首先采用Jina进行文本提取
if get_conf("JINA_API_KEY"):
try: return jina_scrape_text(url)
except: logger.debug("Jina API 请求失败,回到旧方法")
try: try:
response = requests.get(url, headers=headers, proxies=proxies, timeout=8) response = requests.get(url, headers=headers, proxies=proxies, timeout=8)
if response.encoding == "ISO-8859-1": response.encoding = response.apparent_encoding if response.encoding == "ISO-8859-1": response.encoding = response.apparent_encoding
@@ -193,6 +201,56 @@ def scrape_text(url, proxies) -> str:
return text return text
def jina_scrape_text(url) -> str:
"jina_39727421c8fa4e4fa9bd698e5211feaaDyGeVFESNrRaepWiLT0wmHYJSh-d"
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
'Content-Type': 'text/plain',
"X-Retain-Images": "none",
"Authorization": f'Bearer {get_conf("JINA_API_KEY")}'
}
response = requests.get("https://r.jina.ai/" + url, headers=headers, proxies=None, timeout=8)
if response.status_code != 200:
raise ValueError("Jina API 请求失败,开始尝试旧方法!" + response.text)
if response.encoding == "ISO-8859-1": response.encoding = response.apparent_encoding
result = response.text
result = result.replace("\\[", "[").replace("\\]", "]").replace("\\(", "(").replace("\\)", ")")
return response.text
def internet_search_with_analysis_prompt(prompt, analysis_prompt, llm_kwargs, chatbot):
from toolbox import get_conf
proxies = get_conf('proxies')
categories = 'general'
searxng_url = None # 使用默认的searxng_url
engines = None # 使用默认的搜索引擎
yield from update_ui_latest_msg(lastmsg=f"检索中: {prompt} ...", chatbot=chatbot, history=[], delay=1)
urls = searxng_request(prompt, proxies, categories, searxng_url, engines=engines)
yield from update_ui_latest_msg(lastmsg=f"依次访问搜索到的网站 ...", chatbot=chatbot, history=[], delay=1)
if len(urls) == 0:
return None
max_search_result = 5 # 最多收纳多少个网页的结果
history = []
for index, url in enumerate(urls[:max_search_result]):
yield from update_ui_latest_msg(lastmsg=f"依次访问搜索到的网站: {url['link']} ...", chatbot=chatbot, history=[], delay=1)
res = scrape_text(url['link'], proxies)
prefix = f"{index}份搜索结果 [源自{url['source'][0]}搜索] {url['title'][:25]}"
history.extend([prefix, res])
i_say = f"从以上搜索结果中抽取信息,然后回答问题:{prompt} {analysis_prompt}"
i_say, history = input_clipping( # 裁剪输入从最长的条目开始裁剪防止爆token
inputs=i_say,
history=history,
max_token_limit=8192
)
gpt_say = predict_no_ui_long_connection(
inputs=i_say,
llm_kwargs=llm_kwargs,
history=history,
sys_prompt="请从搜索结果中抽取信息,对最相关的两个搜索结果进行总结,然后回答问题。",
console_silence=False,
)
return gpt_say
@CatchException @CatchException
def 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request): def 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
optimizer_history = history[:-8] optimizer_history = history[:-8]
@@ -213,23 +271,52 @@ def 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
urls = search_optimizer(txt, proxies, optimizer_history, llm_kwargs, optimizer, categories, searxng_url, engines) urls = search_optimizer(txt, proxies, optimizer_history, llm_kwargs, optimizer, categories, searxng_url, engines)
history = [] history = []
if len(urls) == 0: if len(urls) == 0:
chatbot.append((f"结论:{txt}", chatbot.append((f"结论:{txt}", "[Local Message] 受到限制无法从searxng获取信息请尝试更换搜索引擎。"))
"[Local Message] 受到限制无法从searxng获取信息请尝试更换搜索引擎。"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return return
# ------------- < 第2步依次访问网页 > ------------- # ------------- < 第2步依次访问网页 > -------------
from concurrent.futures import ThreadPoolExecutor
from textwrap import dedent
max_search_result = 5 # 最多收纳多少个网页的结果 max_search_result = 5 # 最多收纳多少个网页的结果
if optimizer == "开启(增强)": if optimizer == "开启(增强)":
max_search_result = 8 max_search_result = 8
chatbot.append(["联网检索中 ...", None]) template = dedent("""
for index, url in enumerate(urls[:max_search_result]): <details>
res = scrape_text(url['link'], proxies) <summary>{TITLE}</summary>
prefix = f"{index}份搜索结果 [源自{url['source'][0]}搜索] {url['title'][:25]}" <div class="search_result">{URL}</div>
history.extend([prefix, res]) <div class="search_result">{CONTENT}</div>
res_squeeze = res.replace('\n', '...') </details>
chatbot[-1] = [prefix + "\n\n" + res_squeeze[:500] + "......", None] """)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
buffer = ""
# 创建线程池
with ThreadPoolExecutor(max_workers=5) as executor:
# 提交任务到线程池
futures = []
for index, url in enumerate(urls[:max_search_result]):
future = executor.submit(scrape_text, url['link'], proxies)
futures.append((index, future, url))
# 处理完成的任务
for index, future, url in futures:
# 开始
prefix = f"正在加载 第{index+1}份搜索结果 [源自{url['source'][0]}搜索] {url['title'][:25]}"
string_structure = template.format(TITLE=prefix, URL=url['link'], CONTENT="正在加载,请稍后 ......")
yield from update_ui_latest_msg(lastmsg=(buffer + string_structure), chatbot=chatbot, history=history, delay=0.1) # 刷新界面
# 获取结果
res = future.result()
# 显示结果
prefix = f"{index+1}份搜索结果 [源自{url['source'][0]}搜索] {url['title'][:25]}"
string_structure = template.format(TITLE=prefix, URL=url['link'], CONTENT=res[:1000] + "......")
buffer += string_structure
# 更新历史
history.extend([prefix, res])
yield from update_ui_latest_msg(lastmsg=buffer, chatbot=chatbot, history=history, delay=0.1) # 刷新界面
# ------------- < 第3步ChatGPT综合 > ------------- # ------------- < 第3步ChatGPT综合 > -------------
if (optimizer != "开启(增强)"): if (optimizer != "开启(增强)"):

View File

@@ -1,4 +1,4 @@
import random
from toolbox import get_conf from toolbox import get_conf
from crazy_functions.Internet_GPT import 连接网络回答问题 from crazy_functions.Internet_GPT import 连接网络回答问题
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
@@ -20,6 +20,9 @@ class NetworkGPT_Wrap(GptAcademicPluginTemplate):
第三个参数,名称`allow_cache`,参数`type`声明这是一个下拉菜单,下拉菜单上方显示`title`+`description`,下拉菜单的选项为`options``default_value`为下拉菜单默认值; 第三个参数,名称`allow_cache`,参数`type`声明这是一个下拉菜单,下拉菜单上方显示`title`+`description`,下拉菜单的选项为`options``default_value`为下拉菜单默认值;
""" """
urls = get_conf("SEARXNG_URLS")
url = random.choice(urls)
gui_definition = { gui_definition = {
"main_input": "main_input":
ArgProperty(title="输入问题", description="待通过互联网检索的问题,会自动读取输入框内容", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步 ArgProperty(title="输入问题", description="待通过互联网检索的问题,会自动读取输入框内容", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
@@ -30,16 +33,17 @@ class NetworkGPT_Wrap(GptAcademicPluginTemplate):
"optimizer": "optimizer":
ArgProperty(title="搜索优化", options=["关闭", "开启", "开启(增强)"], default_value="关闭", description="是否使用搜索增强。注意这可能会消耗较多token", type="dropdown").model_dump_json(), ArgProperty(title="搜索优化", options=["关闭", "开启", "开启(增强)"], default_value="关闭", description="是否使用搜索增强。注意这可能会消耗较多token", type="dropdown").model_dump_json(),
"searxng_url": "searxng_url":
ArgProperty(title="Searxng服务地址", description="输入Searxng的地址", default_value=get_conf("SEARXNG_URL"), type="string").model_dump_json(), # 主输入,自动从输入框同步 ArgProperty(title="Searxng服务地址", description="输入Searxng的地址", default_value=url, type="string").model_dump_json(), # 主输入,自动从输入框同步
} }
return gui_definition return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request): def execute(txt, llm_kwargs, plugin_kwargs:dict, chatbot, history, system_prompt, user_request):
""" """
执行插件 执行插件
""" """
if plugin_kwargs["categories"] == "网页": plugin_kwargs["categories"] = "general" if plugin_kwargs.get("categories", None) == "网页": plugin_kwargs["categories"] = "general"
if plugin_kwargs["categories"] == "学术论文": plugin_kwargs["categories"] = "science" elif plugin_kwargs.get("categories", None) == "学术论文": plugin_kwargs["categories"] = "science"
else: plugin_kwargs["categories"] = "general"
yield from 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request) yield from 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

View File

@@ -1,7 +1,9 @@
from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone, check_repeat_upload, map_file_to_sha256 from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone, check_repeat_upload, map_file_to_sha256
from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip_result, gen_time_str from toolbox import CatchException, report_exception, update_ui_latest_msg, zip_result, gen_time_str
from functools import partial from functools import partial
import glob, os, requests, time, json, tarfile from loguru import logger
import glob, os, requests, time, json, tarfile, threading
pj = os.path.join pj = os.path.join
ARXIV_CACHE_DIR = get_conf("ARXIV_CACHE_DIR") ARXIV_CACHE_DIR = get_conf("ARXIV_CACHE_DIR")
@@ -39,7 +41,7 @@ def switch_prompt(pfg, mode, more_requirement):
return inputs_array, sys_prompt_array return inputs_array, sys_prompt_array
def desend_to_extracted_folder_if_exist(project_folder): def descend_to_extracted_folder_if_exist(project_folder):
""" """
Descend into the extracted folder if it exists, otherwise return the original folder. Descend into the extracted folder if it exists, otherwise return the original folder.
@@ -128,7 +130,7 @@ def arxiv_download(chatbot, history, txt, allow_cache=True):
if not txt.startswith('https://arxiv.org/abs/'): if not txt.startswith('https://arxiv.org/abs/'):
msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}" msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}"
yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # 刷新界面 yield from update_ui_latest_msg(msg, chatbot=chatbot, history=history) # 刷新界面
return msg, None return msg, None
# <-------------- set format -------------> # <-------------- set format ------------->
arxiv_id = url_.split('/abs/')[-1] arxiv_id = url_.split('/abs/')[-1]
@@ -136,25 +138,43 @@ def arxiv_download(chatbot, history, txt, allow_cache=True):
cached_translation_pdf = check_cached_translation_pdf(arxiv_id) cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
url_tar = url_.replace('/abs/', '/e-print/')
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract') extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
os.makedirs(translation_dir, exist_ok=True) translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
# <-------------- download arxiv source file ------------->
dst = pj(translation_dir, arxiv_id + '.tar') dst = pj(translation_dir, arxiv_id + '.tar')
if os.path.exists(dst): os.makedirs(translation_dir, exist_ok=True)
yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history) # 刷新界面 # <-------------- download arxiv source file ------------->
def fix_url_and_download():
# for url_tar in [url_.replace('/abs/', '/e-print/'), url_.replace('/abs/', '/src/')]:
for url_tar in [url_.replace('/abs/', '/src/'), url_.replace('/abs/', '/e-print/')]:
proxies = get_conf('proxies')
r = requests.get(url_tar, proxies=proxies)
if r.status_code == 200:
with open(dst, 'wb+') as f:
f.write(r.content)
return True
return False
if os.path.exists(dst) and allow_cache:
yield from update_ui_latest_msg(f"调用缓存 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
success = True
else: else:
yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history) # 刷新界面 yield from update_ui_latest_msg(f"开始下载 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
proxies = get_conf('proxies') success = fix_url_and_download()
r = requests.get(url_tar, proxies=proxies) yield from update_ui_latest_msg(f"下载完成 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
with open(dst, 'wb+') as f:
f.write(r.content)
if not success:
yield from update_ui_latest_msg(f"下载失败 {arxiv_id}", chatbot=chatbot, history=history)
raise tarfile.ReadError(f"论文下载失败 {arxiv_id}")
# <-------------- extract file -------------> # <-------------- extract file ------------->
yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history) # 刷新界面
from toolbox import extract_archive from toolbox import extract_archive
extract_archive(file_path=dst, dest_dir=extract_dst) try:
extract_archive(file_path=dst, dest_dir=extract_dst)
except tarfile.ReadError:
os.remove(dst)
raise tarfile.ReadError(f"论文下载失败")
return extract_dst, arxiv_id return extract_dst, arxiv_id
@@ -178,7 +198,7 @@ def pdf2tex_project(pdf_file_path, plugin_kwargs):
if response.ok: if response.ok:
pdf_id = response.json()["pdf_id"] pdf_id = response.json()["pdf_id"]
print(f"PDF processing initiated. PDF ID: {pdf_id}") logger.info(f"PDF processing initiated. PDF ID: {pdf_id}")
# Step 2: Check processing status # Step 2: Check processing status
while True: while True:
@@ -186,12 +206,12 @@ def pdf2tex_project(pdf_file_path, plugin_kwargs):
conversion_data = conversion_response.json() conversion_data = conversion_response.json()
if conversion_data["status"] == "completed": if conversion_data["status"] == "completed":
print("PDF processing completed.") logger.info("PDF processing completed.")
break break
elif conversion_data["status"] == "error": elif conversion_data["status"] == "error":
print("Error occurred during processing.") logger.info("Error occurred during processing.")
else: else:
print(f"Processing status: {conversion_data['status']}") logger.info(f"Processing status: {conversion_data['status']}")
time.sleep(5) # wait for a few seconds before checking again time.sleep(5) # wait for a few seconds before checking again
# Step 3: Save results to local files # Step 3: Save results to local files
@@ -206,7 +226,7 @@ def pdf2tex_project(pdf_file_path, plugin_kwargs):
output_path = os.path.join(output_dir, output_name) output_path = os.path.join(output_dir, output_name)
with open(output_path, "wb") as output_file: with open(output_path, "wb") as output_file:
output_file.write(response.content) output_file.write(response.content)
print(f"tex.zip file saved at: {output_path}") logger.info(f"tex.zip file saved at: {output_path}")
import zipfile import zipfile
unzip_dir = os.path.join(output_dir, file_name_wo_dot) unzip_dir = os.path.join(output_dir, file_name_wo_dot)
@@ -216,7 +236,7 @@ def pdf2tex_project(pdf_file_path, plugin_kwargs):
return unzip_dir return unzip_dir
else: else:
print(f"Error sending PDF for processing. Status code: {response.status_code}") logger.error(f"Error sending PDF for processing. Status code: {response.status_code}")
return None return None
else: else:
from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_DOC2X_转Latex from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_DOC2X_转Latex
@@ -268,7 +288,7 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
return return
# <-------------- if is a zip/tar file -------------> # <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder) project_folder = descend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder -------------> # <-------------- move latex project away from temp folder ------------->
from shared_utils.fastapi_server import validate_path_safety from shared_utils.fastapi_server import validate_path_safety
@@ -318,11 +338,17 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
# <-------------- more requirements -------------> # <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg") if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "") more_req = plugin_kwargs.get("advanced_arg", "")
no_cache = more_req.startswith("--no-cache")
if no_cache: more_req.lstrip("--no-cache") no_cache = ("--no-cache" in more_req)
if no_cache: more_req = more_req.replace("--no-cache", "").strip()
allow_gptac_cloud_io = ("--allow-cloudio" in more_req) # 从云端下载翻译结果,以及上传翻译结果到云端
if allow_gptac_cloud_io: more_req = more_req.replace("--allow-cloudio", "").strip()
allow_cache = not no_cache allow_cache = not no_cache
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req) _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps -------------> # <-------------- check deps ------------->
try: try:
import glob, os, time, subprocess import glob, os, time, subprocess
@@ -339,7 +365,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
try: try:
txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache) txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache)
except tarfile.ReadError as e: except tarfile.ReadError as e:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
"无法自动下载该论文的Latex源码请前往arxiv打开此论文下载页面点other Formats然后download source手动下载latex源码包。接下来调用本地Latex翻译插件即可。", "无法自动下载该论文的Latex源码请前往arxiv打开此论文下载页面点other Formats然后download source手动下载latex源码包。接下来调用本地Latex翻译插件即可。",
chatbot=chatbot, history=history) chatbot=chatbot, history=history)
return return
@@ -349,6 +375,20 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return return
# #################################################################
if allow_gptac_cloud_io and arxiv_id:
# 访问 GPTAC学术云查询云端是否存在该论文的翻译版本
from crazy_functions.latex_fns.latex_actions import check_gptac_cloud
success, downloaded = check_gptac_cloud(arxiv_id, chatbot)
if success:
chatbot.append([
f"检测到GPTAC云端存在翻译版本, 如果不满意翻译结果, 请禁用云端分享, 然后重新执行。",
None
])
yield from update_ui(chatbot=chatbot, history=history)
return
#################################################################
if os.path.exists(txt): if os.path.exists(txt):
project_folder = txt project_folder = txt
else: else:
@@ -364,7 +404,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
return return
# <-------------- if is a zip/tar file -------------> # <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder) project_folder = descend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder -------------> # <-------------- move latex project away from temp folder ------------->
from shared_utils.fastapi_server import validate_path_safety from shared_utils.fastapi_server import validate_path_safety
@@ -386,14 +426,21 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
# <-------------- zip PDF -------------> # <-------------- zip PDF ------------->
zip_res = zip_result(project_folder) zip_res = zip_result(project_folder)
if success: if success:
if allow_gptac_cloud_io and arxiv_id:
# 如果用户允许我们将翻译好的arxiv论文PDF上传到GPTAC学术云
from crazy_functions.latex_fns.latex_actions import upload_to_gptac_cloud_if_user_allow
threading.Thread(target=upload_to_gptac_cloud_if_user_allow,
args=(chatbot, arxiv_id), daemon=True).start()
chatbot.append((f"成功啦", '请查收结果(压缩包)...')) chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history); yield from update_ui(chatbot=chatbot, history=history)
time.sleep(1) # 刷新界面 time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot) promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else: else:
chatbot.append((f"失败了", chatbot.append((f"失败了",
'虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux请检查系统字体见Github wiki ...')) '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux请检查系统字体见Github wiki ...'))
yield from update_ui(chatbot=chatbot, history=history); yield from update_ui(chatbot=chatbot, history=history)
time.sleep(1) # 刷新界面 time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot) promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
@@ -471,7 +518,7 @@ def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, h
# repeat, project_folder = check_repeat_upload(file_manifest[0], hash_tag) # repeat, project_folder = check_repeat_upload(file_manifest[0], hash_tag)
# if repeat: # if repeat:
# yield from update_ui_lastest_msg(f"发现重复上传,请查收结果(压缩包)...", chatbot=chatbot, history=history) # yield from update_ui_latest_msg(f"发现重复上传,请查收结果(压缩包)...", chatbot=chatbot, history=history)
# try: # try:
# translate_pdf = [f for f in glob.glob(f'{project_folder}/**/merge_translate_zh.pdf', recursive=True)][0] # translate_pdf = [f for f in glob.glob(f'{project_folder}/**/merge_translate_zh.pdf', recursive=True)][0]
# promote_file_to_downloadzone(translate_pdf, rename_file=None, chatbot=chatbot) # promote_file_to_downloadzone(translate_pdf, rename_file=None, chatbot=chatbot)
@@ -484,7 +531,7 @@ def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, h
# report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"发现重复上传,但是无法找到相关文件") # report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"发现重复上传,但是无法找到相关文件")
# yield from update_ui(chatbot=chatbot, history=history) # yield from update_ui(chatbot=chatbot, history=history)
# else: # else:
# yield from update_ui_lastest_msg(f"未发现重复上传", chatbot=chatbot, history=history) # yield from update_ui_latest_msg(f"未发现重复上传", chatbot=chatbot, history=history)
# <-------------- convert pdf into tex -------------> # <-------------- convert pdf into tex ------------->
chatbot.append([f"解析项目: {txt}", "正在将PDF转换为tex项目请耐心等待..."]) chatbot.append([f"解析项目: {txt}", "正在将PDF转换为tex项目请耐心等待..."])
@@ -496,7 +543,7 @@ def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, h
return False return False
# <-------------- translate latex file into Chinese -------------> # <-------------- translate latex file into Chinese ------------->
yield from update_ui_lastest_msg("正在tex项目将翻译为中文...", chatbot=chatbot, history=history) yield from update_ui_latest_msg("正在tex项目将翻译为中文...", chatbot=chatbot, history=history)
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)] file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0: if len(file_manifest) == 0:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}") report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
@@ -504,7 +551,7 @@ def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, h
return return
# <-------------- if is a zip/tar file -------------> # <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder) project_folder = descend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder -------------> # <-------------- move latex project away from temp folder ------------->
from shared_utils.fastapi_server import validate_path_safety from shared_utils.fastapi_server import validate_path_safety
@@ -512,7 +559,7 @@ def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, h
project_folder = move_project(project_folder) project_folder = move_project(project_folder)
# <-------------- set a hash tag for repeat-checking -------------> # <-------------- set a hash tag for repeat-checking ------------->
with open(pj(project_folder, hash_tag + '.tag'), 'w') as f: with open(pj(project_folder, hash_tag + '.tag'), 'w', encoding='utf8') as f:
f.write(hash_tag) f.write(hash_tag)
f.close() f.close()
@@ -524,7 +571,7 @@ def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, h
switch_prompt=_switch_prompt_) switch_prompt=_switch_prompt_)
# <-------------- compile PDF -------------> # <-------------- compile PDF ------------->
yield from update_ui_lastest_msg("正在将翻译好的项目tex项目编译为PDF...", chatbot=chatbot, history=history) yield from update_ui_latest_msg("正在将翻译好的项目tex项目编译为PDF...", chatbot=chatbot, history=history)
success = yield from 编译Latex(chatbot, history, main_file_original='merge', success = yield from 编译Latex(chatbot, history, main_file_original='merge',
main_file_modified='merge_translate_zh', mode='translate_zh', main_file_modified='merge_translate_zh', mode='translate_zh',
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder_original=project_folder, work_folder_modified=project_folder,

View File

@@ -30,6 +30,8 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步 default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
"allow_cache": "allow_cache":
ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="", type="dropdown").model_dump_json(), ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="", type="dropdown").model_dump_json(),
"allow_cloudio":
ArgProperty(title="是否允许从GPTAC学术云下载(或者上传)翻译结果(仅针对Arxiv论文)", options=["允许", "禁止"], default_value="禁止", description="共享文献,互助互利", type="dropdown").model_dump_json(),
} }
return gui_definition return gui_definition
@@ -38,9 +40,14 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
执行插件 执行插件
""" """
allow_cache = plugin_kwargs["allow_cache"] allow_cache = plugin_kwargs["allow_cache"]
allow_cloudio = plugin_kwargs["allow_cloudio"]
advanced_arg = plugin_kwargs["advanced_arg"] advanced_arg = plugin_kwargs["advanced_arg"]
if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"] if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"]
# 从云端下载翻译结果,以及上传翻译结果到云端;人人为我,我为人人。
if allow_cloudio == "允许": plugin_kwargs["advanced_arg"] = "--allow-cloudio " + plugin_kwargs["advanced_arg"]
yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request) yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

View File

@@ -1,6 +1,6 @@
from toolbox import update_ui, trimmed_format_exc, promote_file_to_downloadzone, get_log_folder from toolbox import update_ui, trimmed_format_exc, promote_file_to_downloadzone, get_log_folder
from toolbox import CatchException, report_exception, write_history_to_file, zip_folder from toolbox import CatchException, report_exception, write_history_to_file, zip_folder
from loguru import logger
class PaperFileGroup(): class PaperFileGroup():
def __init__(self): def __init__(self):
@@ -33,7 +33,7 @@ class PaperFileGroup():
self.sp_file_index.append(index) self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex") self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
print('Segmentation: done') logger.info('Segmentation: done')
def merge_result(self): def merge_result(self):
self.file_result = ["" for _ in range(len(self.file_paths))] self.file_result = ["" for _ in range(len(self.file_paths))]
for r, k in zip(self.sp_file_result, self.sp_file_index): for r, k in zip(self.sp_file_result, self.sp_file_index):
@@ -56,7 +56,7 @@ class PaperFileGroup():
def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en', mode='polish'): def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en', mode='polish'):
import time, os, re import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- 读取Latex文件删除其中的所有注释 ----------> # <-------- 读取Latex文件删除其中的所有注释 ---------->
@@ -122,7 +122,7 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
pfg.write_result() pfg.write_result()
pfg.zip_result() pfg.zip_result()
except: except:
print(trimmed_format_exc()) logger.error(trimmed_format_exc())
# <-------- 整理结果,退出 ----------> # <-------- 整理结果,退出 ---------->
create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md" create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md"

View File

@@ -1,6 +1,6 @@
from toolbox import update_ui, promote_file_to_downloadzone from toolbox import update_ui, promote_file_to_downloadzone
from toolbox import CatchException, report_exception, write_history_to_file from toolbox import CatchException, report_exception, write_history_to_file
fast_debug = False from loguru import logger
class PaperFileGroup(): class PaperFileGroup():
def __init__(self): def __init__(self):
@@ -33,11 +33,11 @@ class PaperFileGroup():
self.sp_file_index.append(index) self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex") self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
print('Segmentation: done') logger.info('Segmentation: done')
def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'): def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
import time, os, re import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- 读取Latex文件删除其中的所有注释 ----------> # <-------- 读取Latex文件删除其中的所有注释 ---------->
pfg = PaperFileGroup() pfg = PaperFileGroup()

View File

@@ -1,4 +1,5 @@
import glob, shutil, os, re, logging import glob, shutil, os, re
from loguru import logger
from toolbox import update_ui, trimmed_format_exc, gen_time_str from toolbox import update_ui, trimmed_format_exc, gen_time_str
from toolbox import CatchException, report_exception, get_log_folder from toolbox import CatchException, report_exception, get_log_folder
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
@@ -34,7 +35,7 @@ class PaperFileGroup():
self.sp_file_contents.append(segment) self.sp_file_contents.append(segment)
self.sp_file_index.append(index) self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.md") self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.md")
logging.info('Segmentation: done') logger.info('Segmentation: done')
def merge_result(self): def merge_result(self):
self.file_result = ["" for _ in range(len(self.file_paths))] self.file_result = ["" for _ in range(len(self.file_paths))]
@@ -51,7 +52,7 @@ class PaperFileGroup():
return manifest return manifest
def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'): def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- 读取Markdown文件删除其中的所有注释 ----------> # <-------- 读取Markdown文件删除其中的所有注释 ---------->
pfg = PaperFileGroup() pfg = PaperFileGroup()
@@ -64,7 +65,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
pfg.file_contents.append(file_content) pfg.file_contents.append(file_content)
# <-------- 拆分过长的Markdown文件 ----------> # <-------- 拆分过长的Markdown文件 ---------->
pfg.run_file_split(max_token_limit=2048) pfg.run_file_split(max_token_limit=1024)
n_split = len(pfg.sp_file_contents) n_split = len(pfg.sp_file_contents)
# <-------- 多线程翻译开始 ----------> # <-------- 多线程翻译开始 ---------->
@@ -106,7 +107,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
expected_f_name = plugin_kwargs['markdown_expected_output_path'] expected_f_name = plugin_kwargs['markdown_expected_output_path']
shutil.copyfile(output_file, expected_f_name) shutil.copyfile(output_file, expected_f_name)
except: except:
logging.error(trimmed_format_exc()) logger.error(trimmed_format_exc())
# <-------- 整理结果,退出 ----------> # <-------- 整理结果,退出 ---------->
create_report_file_name = gen_time_str() + f"-chatgpt.md" create_report_file_name = gen_time_str() + f"-chatgpt.md"
@@ -126,7 +127,7 @@ def get_files_from_everything(txt, preference=''):
proxies = get_conf('proxies') proxies = get_conf('proxies')
# 网络的远程文件 # 网络的远程文件
if preference == 'Github': if preference == 'Github':
logging.info('正在从github下载资源 ...') logger.info('正在从github下载资源 ...')
if not txt.endswith('.md'): if not txt.endswith('.md'):
# Make a request to the GitHub API to retrieve the repository information # Make a request to the GitHub API to retrieve the repository information
url = txt.replace("https://github.com/", "https://api.github.com/repos/") + '/readme' url = txt.replace("https://github.com/", "https://api.github.com/repos/") + '/readme'

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, check_packages, get_conf from toolbox import CatchException, check_packages, get_conf
from toolbox import update_ui, update_ui_lastest_msg, disable_auto_promotion from toolbox import update_ui, update_ui_latest_msg, disable_auto_promotion
from toolbox import trimmed_format_exc_markdown from toolbox import trimmed_format_exc_markdown
from crazy_functions.crazy_utils import get_files_from_everything from crazy_functions.crazy_utils import get_files_from_everything
from crazy_functions.pdf_fns.parse_pdf import get_avail_grobid_url from crazy_functions.pdf_fns.parse_pdf import get_avail_grobid_url
@@ -47,7 +47,7 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
yield from 解析PDF_基于DOC2X(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request) yield from 解析PDF_基于DOC2X(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request)
return return
except: except:
chatbot.append([None, f"DOC2X服务不可用现在将执行效果稍差的旧版代码{trimmed_format_exc_markdown()}"]) chatbot.append([None, f"DOC2X服务不可用请检查报错详细{trimmed_format_exc_markdown()}"])
yield from update_ui(chatbot=chatbot, history=history) yield from update_ui(chatbot=chatbot, history=history)
if method == "GROBID": if method == "GROBID":
@@ -57,9 +57,9 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url) yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url)
return return
if method == "ClASSIC": if method == "Classic":
# ------- 第三种方法,早期代码,效果不理想 ------- # ------- 第三种方法,早期代码,效果不理想 -------
yield from update_ui_lastest_msg("GROBID服务不可用请检查config中的GROBID_URL。作为替代现在将执行效果稍差的旧版代码。", chatbot, history, delay=3) yield from update_ui_latest_msg("GROBID服务不可用请检查config中的GROBID_URL。作为替代现在将执行效果稍差的旧版代码。", chatbot, history, delay=3)
yield from 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt) yield from 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
return return
@@ -77,7 +77,7 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
if grobid_url is not None: if grobid_url is not None:
yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url) yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url)
return return
yield from update_ui_lastest_msg("GROBID服务不可用请检查config中的GROBID_URL。作为替代现在将执行效果稍差的旧版代码。", chatbot, history, delay=3) yield from update_ui_latest_msg("GROBID服务不可用请检查config中的GROBID_URL。作为替代现在将执行效果稍差的旧版代码。", chatbot, history, delay=3)
yield from 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt) yield from 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
return return

View File

@@ -19,7 +19,7 @@ class PDF_Tran(GptAcademicPluginTemplate):
"additional_prompt": "additional_prompt":
ArgProperty(title="额外提示词", description="例如:对专有名词、翻译语气等方面的要求", default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步 ArgProperty(title="额外提示词", description="例如:对专有名词、翻译语气等方面的要求", default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
"pdf_parse_method": "pdf_parse_method":
ArgProperty(title="PDF解析方法", options=["DOC2X", "GROBID", "ClASSIC"], description="", default_value="GROBID", type="dropdown").model_dump_json(), ArgProperty(title="PDF解析方法", options=["DOC2X", "GROBID", "Classic"], description="", default_value="GROBID", type="dropdown").model_dump_json(),
} }
return gui_definition return gui_definition

View File

@@ -0,0 +1,153 @@
import os,glob
from typing import List
from shared_utils.fastapi_server import validate_path_safety
from toolbox import report_exception
from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_ui_latest_msg
from shared_utils.fastapi_server import validate_path_safety
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
RAG_WORKER_REGISTER = {}
MAX_HISTORY_ROUND = 5
MAX_CONTEXT_TOKEN_LIMIT = 4096
REMEMBER_PREVIEW = 1000
@CatchException
def handle_document_upload(files: List[str], llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request, rag_worker):
"""
Handles document uploads by extracting text and adding it to the vector store.
"""
from llama_index.core import Document
from crazy_functions.rag_fns.rag_file_support import extract_text, supports_format
user_name = chatbot.get_user()
checkpoint_dir = get_log_folder(user_name, plugin_name='experimental_rag')
for file_path in files:
try:
validate_path_safety(file_path, user_name)
text = extract_text(file_path)
if text is None:
chatbot.append(
[f"上传文件: {os.path.basename(file_path)}", f"文件解析失败无法提取文本内容请更换文件。失败原因可能为1.文档格式过于复杂2. 不支持的文件格式,支持的文件格式后缀有:" + ", ".join(supports_format)])
else:
chatbot.append(
[f"上传文件: {os.path.basename(file_path)}", f"上传文件前50个字符为:{text[:50]}"])
document = Document(text=text, metadata={"source": file_path})
rag_worker.add_documents_to_vector_store([document])
chatbot.append([f"上传文件: {os.path.basename(file_path)}", "文件已成功添加到知识库。"])
except Exception as e:
report_exception(chatbot, history, a=f"处理文件: {file_path}", b=str(e))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# Main Q&A function with document upload support
@CatchException
def Rag问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# import vector store lib
VECTOR_STORE_TYPE = "Milvus"
if VECTOR_STORE_TYPE == "Milvus":
try:
from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
except:
VECTOR_STORE_TYPE = "Simple"
if VECTOR_STORE_TYPE == "Simple":
from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
# 1. we retrieve rag worker from global context
user_name = chatbot.get_user()
checkpoint_dir = get_log_folder(user_name, plugin_name='experimental_rag')
if user_name in RAG_WORKER_REGISTER:
rag_worker = RAG_WORKER_REGISTER[user_name]
else:
rag_worker = RAG_WORKER_REGISTER[user_name] = LlamaIndexRagWorker(
user_name,
llm_kwargs,
checkpoint_dir=checkpoint_dir,
auto_load_checkpoint=True
)
current_context = f"{VECTOR_STORE_TYPE} @ {checkpoint_dir}"
tip = "提示输入“清空向量数据库”可以清空RAG向量数据库"
# 2. Handle special commands
if os.path.exists(txt) and os.path.isdir(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
# Extract file paths from the user input
# Assuming the user inputs file paths separated by commas after the command
file_paths = [f for f in glob.glob(f'{project_folder}/**/*', recursive=True)]
chatbot.append([txt, f'正在处理上传的文档 ({current_context}) ...'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
yield from handle_document_upload(file_paths, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request, rag_worker)
return
elif txt == "清空向量数据库":
chatbot.append([txt, f'正在清空 ({current_context}) ...'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
rag_worker.purge_vector_store()
yield from update_ui_latest_msg('已清空', chatbot, history, delay=0) # 刷新界面
return
# 3. Normal Q&A processing
chatbot.append([txt, f'正在召回知识 ({current_context}) ...'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 4. Clip history to reduce token consumption
txt_origin = txt
if len(history) > MAX_HISTORY_ROUND * 2:
history = history[-(MAX_HISTORY_ROUND * 2):]
txt_clip, history, flags = input_clipping(txt, history, max_token_limit=MAX_CONTEXT_TOKEN_LIMIT, return_clip_flags=True)
input_is_clipped_flag = (flags["original_input_len"] != flags["clipped_input_len"])
# 5. If input is clipped, add input to vector store before retrieve
if input_is_clipped_flag:
yield from update_ui_latest_msg('检测到长输入, 正在向量化 ...', chatbot, history, delay=0) # 刷新界面
# Save input to vector store
rag_worker.add_text_to_vector_store(txt_origin)
yield from update_ui_latest_msg('向量化完成 ...', chatbot, history, delay=0) # 刷新界面
if len(txt_origin) > REMEMBER_PREVIEW:
HALF = REMEMBER_PREVIEW // 2
i_say_to_remember = txt[:HALF] + f" ...\n...(省略{len(txt_origin)-REMEMBER_PREVIEW}字)...\n... " + txt[-HALF:]
if (flags["original_input_len"] - flags["clipped_input_len"]) > HALF:
txt_clip = txt_clip + f" ...\n...(省略{len(txt_origin)-len(txt_clip)-HALF}字)...\n... " + txt[-HALF:]
else:
i_say_to_remember = i_say = txt_clip
else:
i_say_to_remember = i_say = txt_clip
# 6. Search vector store and build prompts
nodes = rag_worker.retrieve_from_store_with_query(i_say)
prompt = rag_worker.build_prompt(query=i_say, nodes=nodes)
# 7. Query language model
if len(chatbot) != 0:
chatbot.pop(-1) # Pop temp chat, because we are going to add them again inside `request_gpt_model_in_new_thread_with_ui_alive`
model_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=prompt,
inputs_show_user=i_say,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=history,
sys_prompt=system_prompt,
retry_times_at_unknown_error=0
)
# 8. Remember Q&A
yield from update_ui_latest_msg(
model_say + '</br></br>' + f'对话记忆中, 请稍等 ({current_context}) ...',
chatbot, history, delay=0.5
)
rag_worker.remember_qa(i_say_to_remember, model_say)
history.extend([i_say, model_say])
# 9. Final UI Update
yield from update_ui_latest_msg(model_say, chatbot, history, delay=0, msg=tip)

View File

@@ -0,0 +1,167 @@
import pickle, os, random
from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_ui_latest_msg
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.select_tool import structure_output, select_tool
from pydantic import BaseModel, Field
from loguru import logger
from typing import List
SOCIAL_NETWORK_WORKER_REGISTER = {}
class SocialNetwork():
def __init__(self):
self.people = []
class SaveAndLoad():
def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
self.user_name = user_name
self.checkpoint_dir = checkpoint_dir
if auto_load_checkpoint:
self.social_network = self.load_from_checkpoint(checkpoint_dir)
else:
self.social_network = SocialNetwork()
def does_checkpoint_exist(self, checkpoint_dir=None):
import os, glob
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if not os.path.exists(checkpoint_dir): return False
if len(glob.glob(os.path.join(checkpoint_dir, "social_network.pkl"))) == 0: return False
return True
def save_to_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
with open(os.path.join(checkpoint_dir, 'social_network.pkl'), "wb+") as f:
pickle.dump(self.social_network, f)
return
def load_from_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if self.does_checkpoint_exist(checkpoint_dir=checkpoint_dir):
with open(os.path.join(checkpoint_dir, 'social_network.pkl'), "rb") as f:
social_network = pickle.load(f)
return social_network
else:
return SocialNetwork()
class Friend(BaseModel):
friend_name: str = Field(description="name of a friend")
friend_description: str = Field(description="description of a friend (everything about this friend)")
friend_relationship: str = Field(description="The relationship with a friend (e.g. friend, family, colleague)")
class FriendList(BaseModel):
friends_list: List[Friend] = Field(description="The list of friends")
class SocialNetworkWorker(SaveAndLoad):
def ai_socail_advice(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
pass
def ai_remove_friend(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
pass
def ai_list_friends(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
pass
def ai_add_multi_friends(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
friend, err_msg = structure_output(
txt=prompt,
prompt="根据提示, 解析多个联系人的身份信息\n\n",
err_msg=f"不能理解该联系人",
run_gpt_fn=run_gpt_fn,
pydantic_cls=FriendList
)
if friend.friends_list:
for f in friend.friends_list:
self.add_friend(f)
msg = f"成功添加{len(friend.friends_list)}个联系人: {str(friend.friends_list)}"
yield from update_ui_latest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=0)
def run(self, txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
prompt = txt
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
self.tools_to_select = {
"SocialAdvice":{
"explain_to_llm": "如果用户希望获取社交指导调用SocialAdvice生成一些社交建议",
"callback": self.ai_socail_advice,
},
"AddFriends":{
"explain_to_llm": "如果用户给出了联系人调用AddMultiFriends把联系人添加到数据库",
"callback": self.ai_add_multi_friends,
},
"RemoveFriend":{
"explain_to_llm": "如果用户希望移除某个联系人调用RemoveFriend",
"callback": self.ai_remove_friend,
},
"ListFriends":{
"explain_to_llm": "如果用户列举联系人调用ListFriends",
"callback": self.ai_list_friends,
}
}
try:
Explanation = '\n'.join([f'{k}: {v["explain_to_llm"]}' for k, v in self.tools_to_select.items()])
class UserSociaIntention(BaseModel):
intention_type: str = Field(
description=
f"The type of user intention. You must choose from {self.tools_to_select.keys()}.\n\n"
f"Explanation:\n{Explanation}",
default="SocialAdvice"
)
pydantic_cls_instance, err_msg = select_tool(
prompt=txt,
run_gpt_fn=run_gpt_fn,
pydantic_cls=UserSociaIntention
)
except Exception as e:
yield from update_ui_latest_msg(
lastmsg=f"无法理解用户意图 {err_msg}",
chatbot=chatbot,
history=history,
delay=0
)
return
intention_type = pydantic_cls_instance.intention_type
intention_callback = self.tools_to_select[pydantic_cls_instance.intention_type]['callback']
yield from intention_callback(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type)
def add_friend(self, friend):
# check whether the friend is already in the social network
for f in self.social_network.people:
if f.friend_name == friend.friend_name:
f.friend_description = friend.friend_description
f.friend_relationship = friend.friend_relationship
logger.info(f"Repeated friend, update info: {friend}")
return
logger.info(f"Add a new friend: {friend}")
self.social_network.people.append(friend)
return
@CatchException
def I人助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# 1. we retrieve worker from global context
user_name = chatbot.get_user()
checkpoint_dir=get_log_folder(user_name, plugin_name='experimental_rag')
if user_name in SOCIAL_NETWORK_WORKER_REGISTER:
social_network_worker = SOCIAL_NETWORK_WORKER_REGISTER[user_name]
else:
social_network_worker = SOCIAL_NETWORK_WORKER_REGISTER[user_name] = SocialNetworkWorker(
user_name,
llm_kwargs,
checkpoint_dir=checkpoint_dir,
auto_load_checkpoint=True
)
# 2. save
yield from social_network_worker.run(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
social_network_worker.save_to_checkpoint(checkpoint_dir)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

View File

@@ -5,8 +5,8 @@ from crazy_functions.crazy_utils import input_clipping
def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import os, copy import os, copy
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
summary_batch_isolation = True summary_batch_isolation = True
inputs_array = [] inputs_array = []

View File

@@ -1,12 +1,15 @@
import os, copy, time import os, copy, time
from toolbox import CatchException, report_exception, update_ui, zip_result, promote_file_to_downloadzone, update_ui_lastest_msg, get_conf, generate_file_link from toolbox import CatchException, report_exception, update_ui, zip_result, promote_file_to_downloadzone, update_ui_latest_msg, get_conf, generate_file_link
from shared_utils.fastapi_server import validate_path_safety from shared_utils.fastapi_server import validate_path_safety
from crazy_functions.crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.agent_fns.python_comment_agent import PythonCodeComment from crazy_functions.agent_fns.python_comment_agent import PythonCodeComment
from crazy_functions.diagram_fns.file_tree import FileNode from crazy_functions.diagram_fns.file_tree import FileNode
from crazy_functions.agent_fns.watchdog import WatchDog
from shared_utils.advanced_markdown_format import markdown_convertion_for_file from shared_utils.advanced_markdown_format import markdown_convertion_for_file
from loguru import logger
def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
@@ -24,12 +27,13 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
file_tree_struct.add_file(file_path, file_path) file_tree_struct.add_file(file_path, file_path)
# <第一步,逐个文件分析,多线程> # <第一步,逐个文件分析,多线程>
lang = "" if not plugin_kwargs["use_chinese"] else " (you must use Chinese)"
for index, fp in enumerate(file_manifest): for index, fp in enumerate(file_manifest):
# 读取文件 # 读取文件
with open(fp, 'r', encoding='utf-8', errors='replace') as f: with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read() file_content = f.read()
prefix = "" prefix = ""
i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence, the code is:\n```{file_content}```' i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence{lang}, the code is:\n```{file_content}```'
i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请用一句话对下面的程序文件做一个整体概述: {fp}' i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请用一句话对下面的程序文件做一个整体概述: {fp}'
# 装载请求内容 # 装载请求内容
MAX_TOKEN_SINGLE_FILE = 2560 MAX_TOKEN_SINGLE_FILE = 2560
@@ -37,7 +41,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
inputs_array.append(i_say) inputs_array.append(i_say)
inputs_show_user_array.append(i_say_show_user) inputs_show_user_array.append(i_say_show_user)
history_array.append([]) history_array.append([])
sys_prompt_array.append("You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear.") sys_prompt_array.append(f"You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear{lang}.")
# 文件读取完成,对每一个源代码文件,生成一个请求线程,发送到大模型进行分析 # 文件读取完成,对每一个源代码文件,生成一个请求线程,发送到大模型进行分析
gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency( gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array = inputs_array, inputs_array = inputs_array,
@@ -50,10 +54,20 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
) )
# <第二步,逐个文件分析,生成带注释文件> # <第二步,逐个文件分析,生成带注释文件>
tasks = ["" for _ in range(len(file_manifest))]
def bark_fn(tasks):
for i in range(len(tasks)): tasks[i] = "watchdog is dead"
wd = WatchDog(timeout=10, bark_fn=lambda: bark_fn(tasks), interval=3, msg="ThreadWatcher timeout")
wd.begin_watch()
from concurrent.futures import ThreadPoolExecutor from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=get_conf('DEFAULT_WORKER_NUM')) executor = ThreadPoolExecutor(max_workers=get_conf('DEFAULT_WORKER_NUM'))
def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct): def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct, index):
pcc = PythonCodeComment(llm_kwargs, language='English') language = 'Chinese' if plugin_kwargs["use_chinese"] else 'English'
def observe_window_update(x):
if tasks[index] == "watchdog is dead":
raise TimeoutError("ThreadWatcher: watchdog is dead")
tasks[index] = x
pcc = PythonCodeComment(llm_kwargs, plugin_kwargs, language=language, observe_window_update=observe_window_update)
pcc.read_file(path=fp, brief=gpt_say) pcc.read_file(path=fp, brief=gpt_say)
revised_path, revised_content = pcc.begin_comment_source_code(None, None) revised_path, revised_content = pcc.begin_comment_source_code(None, None)
file_tree_struct.manifest[fp].revised_path = revised_path file_tree_struct.manifest[fp].revised_path = revised_path
@@ -65,7 +79,8 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
with open("crazy_functions/agent_fns/python_comment_compare.html", 'r', encoding='utf-8') as f: with open("crazy_functions/agent_fns/python_comment_compare.html", 'r', encoding='utf-8') as f:
html_template = f.read() html_template = f.read()
warp = lambda x: "```python\n\n" + x + "\n\n```" warp = lambda x: "```python\n\n" + x + "\n\n```"
from themes.theme import advanced_css from themes.theme import load_dynamic_theme
_, advanced_css, _, _ = load_dynamic_theme("Default")
html_template = html_template.replace("ADVANCED_CSS", advanced_css) html_template = html_template.replace("ADVANCED_CSS", advanced_css)
html_template = html_template.replace("REPLACE_CODE_FILE_LEFT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(pcc.original_content)))) html_template = html_template.replace("REPLACE_CODE_FILE_LEFT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(pcc.original_content))))
html_template = html_template.replace("REPLACE_CODE_FILE_RIGHT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(revised_content)))) html_template = html_template.replace("REPLACE_CODE_FILE_RIGHT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(revised_content))))
@@ -73,17 +88,21 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
file_tree_struct.manifest[fp].compare_html = compare_html_path file_tree_struct.manifest[fp].compare_html = compare_html_path
with open(compare_html_path, 'w', encoding='utf-8') as f: with open(compare_html_path, 'w', encoding='utf-8') as f:
f.write(html_template) f.write(html_template)
print('done 1') tasks[index] = ""
chatbot.append([None, f"正在处理:"]) chatbot.append([None, f"正在处理:"])
futures = [] futures = []
index = 0
for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest): for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest):
future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct) future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct, index)
index += 1
futures.append(future) futures.append(future)
# <第三步,等待任务完成>
cnt = 0 cnt = 0
while True: while True:
cnt += 1 cnt += 1
wd.feed()
time.sleep(3) time.sleep(3)
worker_done = [h.done() for h in futures] worker_done = [h.done() for h in futures]
remain = len(worker_done) - sum(worker_done) remain = len(worker_done) - sum(worker_done)
@@ -92,14 +111,18 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
preview_html_list = [] preview_html_list = []
for done, fp in zip(worker_done, file_manifest): for done, fp in zip(worker_done, file_manifest):
if not done: continue if not done: continue
preview_html_list.append(file_tree_struct.manifest[fp].compare_html) if hasattr(file_tree_struct.manifest[fp], 'compare_html'):
preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
else:
logger.error(f"文件: {fp} 的注释结果未能成功")
file_links = generate_file_link(preview_html_list) file_links = generate_file_link(preview_html_list)
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
f"剩余源文件数量: {remain}.\n\n" + f"当前任务: <br/>{'<br/>'.join(tasks)}.<br/>" +
f"已完成的文件: {sum(worker_done)}.\n\n" + f"剩余源文件数量: {remain}.<br/>" +
f"已完成的文件: {sum(worker_done)}.<br/>" +
file_links + file_links +
"\n\n" + "<br/>" +
''.join(['.']*(cnt % 10 + 1) ''.join(['.']*(cnt % 10 + 1)
), chatbot=chatbot, history=history, delay=0) ), chatbot=chatbot, history=history, delay=0)
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面 yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
@@ -120,6 +143,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
@CatchException @CatchException
def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request): def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
history = [] # 清空历史,以免输入溢出 history = [] # 清空历史,以免输入溢出
plugin_kwargs["use_chinese"] = plugin_kwargs.get("use_chinese", False)
import glob, os import glob, os
if os.path.exists(txt): if os.path.exists(txt):
project_folder = txt project_folder = txt

View File

@@ -0,0 +1,36 @@
from toolbox import get_conf, update_ui
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
from crazy_functions.SourceCode_Comment import 注释Python项目
class SourceCodeComment_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
"""
gui_definition = {
"main_input":
ArgProperty(title="路径", description="程序路径(上传文件后自动填写)", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"use_chinese":
ArgProperty(title="注释语言", options=["英文", "中文"], default_value="英文", description="", type="dropdown").model_dump_json(),
# "use_emoji":
# ArgProperty(title="在注释中使用emoji", options=["禁止", "允许"], default_value="禁止", description="无", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
if plugin_kwargs["use_chinese"] == "中文":
plugin_kwargs["use_chinese"] = True
else:
plugin_kwargs["use_chinese"] = False
yield from 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

View File

@@ -0,0 +1,204 @@
import requests
import random
import time
import re
import json
from bs4 import BeautifulSoup
from functools import lru_cache
from itertools import zip_longest
from check_proxy import check_proxy
from toolbox import CatchException, update_ui, get_conf, promote_file_to_downloadzone, update_ui_latest_msg, generate_file_link
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
from request_llms.bridge_all import model_info
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.prompts.internet import SearchOptimizerPrompt, SearchAcademicOptimizerPrompt
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
from textwrap import dedent
from loguru import logger
from pydantic import BaseModel, Field
class Query(BaseModel):
search_keyword: str = Field(description="search query for video resource")
class VideoResource(BaseModel):
thought: str = Field(description="analysis of the search results based on the user's query")
title: str = Field(description="title of the video")
author: str = Field(description="author/uploader of the video")
bvid: str = Field(description="unique ID of the video")
another_failsafe_bvid: str = Field(description="provide another bvid, the other one is not working")
def get_video_resource(search_keyword):
from crazy_functions.media_fns.get_media import search_videos
# Search for videos and return the first result
videos = search_videos(
search_keyword
)
# Return the first video if results exist, otherwise return None
return videos
def download_video(bvid, user_name, chatbot, history):
# from experimental_mods.get_bilibili_resource import download_bilibili
from crazy_functions.media_fns.get_media import download_video
# pause a while
tic_time = 8
for i in range(tic_time):
yield from update_ui_latest_msg(
lastmsg=f"即将下载音频。等待{tic_time-i}秒后自动继续, 点击“停止”键取消此操作。",
chatbot=chatbot, history=[], delay=1)
# download audio
chatbot.append((None, "下载音频, 请稍等...")); yield from update_ui(chatbot=chatbot, history=history)
downloaded_files = yield from download_video(bvid, only_audio=True, user_name=user_name, chatbot=chatbot, history=history)
if len(downloaded_files) == 0:
# failed to download audio
return []
# preview
preview_list = [promote_file_to_downloadzone(fp) for fp in downloaded_files]
file_links = generate_file_link(preview_list)
yield from update_ui_latest_msg(f"已完成的文件: <br/>" + file_links, chatbot=chatbot, history=history, delay=0)
chatbot.append((None, f"即将下载视频。"))
# pause a while
tic_time = 16
for i in range(tic_time):
yield from update_ui_latest_msg(
lastmsg=f"即将下载视频。等待{tic_time-i}秒后自动继续, 点击“停止”键取消此操作。",
chatbot=chatbot, history=[], delay=1)
# download video
chatbot.append((None, "下载视频, 请稍等...")); yield from update_ui(chatbot=chatbot, history=history)
downloaded_files_part2 = yield from download_video(bvid, only_audio=False, user_name=user_name, chatbot=chatbot, history=history)
# preview
preview_list = [promote_file_to_downloadzone(fp) for fp in downloaded_files_part2]
file_links = generate_file_link(preview_list)
yield from update_ui_latest_msg(f"已完成的文件: <br/>" + file_links, chatbot=chatbot, history=history, delay=0)
# return
return downloaded_files + downloaded_files_part2
class Strategy(BaseModel):
thought: str = Field(description="analysis of the user's wish, for example, can you recall the name of the resource?")
which_methods: str = Field(description="Which method to use to find the necessary information? choose from 'method_1' and 'method_2'.")
method_1_search_keywords: str = Field(description="Generate keywords to search the internet if you choose method 1, otherwise empty.")
method_2_generate_keywords: str = Field(description="Generate keywords for video download engine if you choose method 2, otherwise empty.")
@CatchException
def 多媒体任务(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
user_wish: str = txt
# query demos:
# - "我想找一首歌里面有句歌词是“turn your face towards the sun”"
# - "一首歌,第一句是红豆生南国"
# - "一首音乐,中国航天任务专用的那首"
# - "戴森球计划在熔岩星球的音乐"
# - "hanser的百变什么精"
# - "打大圣残躯时的bgm"
# - "渊下宫战斗音乐"
# 搜索
chatbot.append((txt, "检索中, 请稍等..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if "跳过联网搜索" not in user_wish:
# 结构化生成
internet_search_keyword = user_wish
yield from update_ui_latest_msg(lastmsg=f"发起互联网检索: {internet_search_keyword} ...", chatbot=chatbot, history=[], delay=1)
from crazy_functions.Internet_GPT import internet_search_with_analysis_prompt
result = yield from internet_search_with_analysis_prompt(
prompt=internet_search_keyword,
analysis_prompt="请根据搜索结果分析,获取用户需要找的资源的名称、作者、出处等信息。",
llm_kwargs=llm_kwargs,
chatbot=chatbot
)
yield from update_ui_latest_msg(lastmsg=f"互联网检索结论: {result} \n\n 正在生成进一步检索方案 ...", chatbot=chatbot, history=[], delay=1)
rf_req = dedent(f"""
The user wish to get the following resource:
{user_wish}
Meanwhile, you can access another expert's opinion on the user's wish:
{result}
Generate search keywords (less than 5 keywords) for video download engine accordingly.
""")
else:
user_wish = user_wish.replace("跳过联网搜索", "").strip()
rf_req = dedent(f"""
The user wish to get the following resource:
{user_wish}
Generate research keywords (less than 5 keywords) accordingly.
""")
gpt_json_io = GptJsonIO(Query)
inputs = rf_req + gpt_json_io.format_instructions
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
analyze_res = run_gpt_fn(inputs, "")
logger.info(analyze_res)
query: Query = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
video_engine_keywords = query.search_keyword
# 关键词展示
chatbot.append((None, f"检索关键词已确认: {video_engine_keywords}。筛选中, 请稍等..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 获取候选资源
candidate_dictionary: dict = get_video_resource(video_engine_keywords)
candidate_dictionary_as_str = json.dumps(candidate_dictionary, ensure_ascii=False, indent=4)
# 展示候选资源
candidate_display = "\n".join([f"{i+1}. {it['title']}" for i, it in enumerate(candidate_dictionary)])
chatbot.append((None, f"候选:\n\n{candidate_display}"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 结构化生成
rf_req_2 = dedent(f"""
The user wish to get the following resource:
{user_wish}
Select the most relevant and suitable video resource from the following search results:
{candidate_dictionary_as_str}
Note:
1. The first several search video results are more likely to satisfy the user's wish.
2. The time duration of the video should be less than 10 minutes.
3. You should analyze the search results first, before giving your answer.
4. Use Chinese if possible.
5. Beside the primary video selection, give a backup video resource `bvid`.
""")
gpt_json_io = GptJsonIO(VideoResource)
inputs = rf_req_2 + gpt_json_io.format_instructions
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
analyze_res = run_gpt_fn(inputs, "")
logger.info(analyze_res)
video_resource: VideoResource = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
# Display
chatbot.append(
(None,
f"分析:{video_resource.thought}" "<br/>"
f"选择: `{video_resource.title}`。" "<br/>"
f"作者:{video_resource.author}"
)
)
chatbot.append((None, f"下载中, 请稍等..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if video_resource and video_resource.bvid:
logger.info(video_resource)
downloaded = yield from download_video(video_resource.bvid, chatbot.get_user(), chatbot, history)
if not downloaded:
chatbot.append((None, f"下载失败, 尝试备选 ..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
downloaded = yield from download_video(video_resource.another_failsafe_bvid, chatbot.get_user(), chatbot, history)
@CatchException
def debug(bvid, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
yield from download_video(bvid, chatbot.get_user(), chatbot, history)

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, ProxyNetworkActivate from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, ProxyNetworkActivate
from toolbox import report_exception, get_log_folder, update_ui_lastest_msg, Singleton from toolbox import report_exception, get_log_folder, update_ui_latest_msg, Singleton
from crazy_functions.agent_fns.pipe import PluginMultiprocessManager, PipeCom from crazy_functions.agent_fns.pipe import PluginMultiprocessManager, PipeCom
from crazy_functions.agent_fns.general import AutoGenGeneral from crazy_functions.agent_fns.general import AutoGenGeneral

View File

@@ -1,4 +1,5 @@
from crazy_functions.agent_fns.pipe import PluginMultiprocessManager, PipeCom from crazy_functions.agent_fns.pipe import PluginMultiprocessManager, PipeCom
from loguru import logger
class EchoDemo(PluginMultiprocessManager): class EchoDemo(PluginMultiprocessManager):
def subprocess_worker(self, child_conn): def subprocess_worker(self, child_conn):
@@ -7,7 +8,7 @@ class EchoDemo(PluginMultiprocessManager):
while True: while True:
msg = self.child_conn.recv() # PipeCom msg = self.child_conn.recv() # PipeCom
if msg.cmd == "user_input": if msg.cmd == "user_input":
# wait futher user input # wait father user input
self.child_conn.send(PipeCom("show", msg.content)) self.child_conn.send(PipeCom("show", msg.content))
wait_success = self.subprocess_worker_wait_user_feedback(wait_msg="我准备好处理下一个问题了.") wait_success = self.subprocess_worker_wait_user_feedback(wait_msg="我准备好处理下一个问题了.")
if not wait_success: if not wait_success:
@@ -16,4 +17,4 @@ class EchoDemo(PluginMultiprocessManager):
elif msg.cmd == "terminate": elif msg.cmd == "terminate":
self.child_conn.send(PipeCom("done", "")) self.child_conn.send(PipeCom("done", ""))
break break
print('[debug] subprocess_worker terminated') logger.info('[debug] subprocess_worker terminated')

View File

@@ -27,7 +27,7 @@ def gpt_academic_generate_oai_reply(
llm_kwargs=llm_config, llm_kwargs=llm_config,
history=history, history=history,
sys_prompt=self._oai_system_message[0]['content'], sys_prompt=self._oai_system_message[0]['content'],
console_slience=True console_silence=True
) )
assumed_done = reply.endswith('\nTERMINATE') assumed_done = reply.endswith('\nTERMINATE')
return True, reply return True, reply

View File

@@ -1,5 +1,6 @@
from toolbox import get_log_folder, update_ui, gen_time_str, get_conf, promote_file_to_downloadzone from toolbox import get_log_folder, update_ui, gen_time_str, get_conf, promote_file_to_downloadzone
from crazy_functions.agent_fns.watchdog import WatchDog from crazy_functions.agent_fns.watchdog import WatchDog
from loguru import logger
import time, os import time, os
class PipeCom: class PipeCom:
@@ -47,7 +48,7 @@ class PluginMultiprocessManager:
def terminate(self): def terminate(self):
self.p.terminate() self.p.terminate()
self.alive = False self.alive = False
print("[debug] instance terminated") logger.info("[debug] instance terminated")
def subprocess_worker(self, child_conn): def subprocess_worker(self, child_conn):
# ⭐⭐ run in subprocess # ⭐⭐ run in subprocess

View File

@@ -1,14 +1,16 @@
from toolbox import CatchException, update_ui
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from request_llms.bridge_all import predict_no_ui_long_connection
import datetime import datetime
import re import re
import os import os
from loguru import logger
from textwrap import dedent from textwrap import dedent
from toolbox import CatchException, update_ui
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
# TODO: 解决缩进问题 # TODO: 解决缩进问题
find_function_end_prompt = ''' find_function_end_prompt = '''
Below is a page of code that you need to read. This page may not yet complete, you job is to split this page to sperate functions, class functions etc. Below is a page of code that you need to read. This page may not yet complete, you job is to split this page to separate functions, class functions etc.
- Provide the line number where the first visible function ends. - Provide the line number where the first visible function ends.
- Provide the line number where the next visible function begins. - Provide the line number where the next visible function begins.
- If there are no other functions in this page, you should simply return the line number of the last line. - If there are no other functions in this page, you should simply return the line number of the last line.
@@ -57,7 +59,7 @@ OUTPUT:
revise_funtion_prompt = ''' revise_function_prompt = '''
You need to read the following code, and revise the source code ({FILE_BASENAME}) according to following instructions: You need to read the following code, and revise the source code ({FILE_BASENAME}) according to following instructions:
1. You should analyze the purpose of the functions (if there are any). 1. You should analyze the purpose of the functions (if there are any).
2. You need to add docstring for the provided functions (if there are any). 2. You need to add docstring for the provided functions (if there are any).
@@ -66,6 +68,7 @@ Be aware:
1. You must NOT modify the indent of code. 1. You must NOT modify the indent of code.
2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either, toggle qu. 2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either, toggle qu.
3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code. 3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code.
4. Besides adding a docstring, use the ⭐ symbol to annotate the most core and important line of code within the function, explaining its role.
------------------ Example ------------------ ------------------ Example ------------------
INPUT: INPUT:
@@ -114,10 +117,66 @@ def zip_result(folder):
''' '''
revise_function_prompt_chinese = '''
您需要阅读以下代码,并根据以下说明修订源代码({FILE_BASENAME}):
1. 如果源代码中包含函数的话, 你应该分析给定函数实现了什么功能
2. 如果源代码中包含函数的话, 你需要为函数添加docstring, docstring必须使用中文
请注意:
1. 你不得修改代码的缩进
2. 你无权更改或翻译代码中的非注释部分,也不允许添加空行
3. 使用 {LANG} 添加注释和文档字符串。不要翻译代码中已有的中文
4. 除了添加docstring之外, 使用⭐符号给该函数中最核心、最重要的一行代码添加注释,并说明其作用
------------------ 示例 ------------------
INPUT:
```
L0000 |
L0001 |def zip_result(folder):
L0002 | t = gen_time_str()
L0003 | zip_folder(folder, get_log_folder(), f"result.zip")
L0004 | return os.path.join(get_log_folder(), f"result.zip")
L0005 |
L0006 |
```
OUTPUT:
<instruction_1_purpose>
该函数用于压缩指定文件夹,并返回生成的`zip`文件的路径。
</instruction_1_purpose>
<instruction_2_revised_code>
```
def zip_result(folder):
"""
该函数将指定的文件夹压缩成ZIP文件, 并将其存储在日志文件夹中。
输入参数:
folder (str): 需要压缩的文件夹的路径。
返回值:
str: 日志文件夹中创建的ZIP文件的路径。
"""
t = gen_time_str()
zip_folder(folder, get_log_folder(), f"result.zip") # ⭐ 执行文件夹的压缩
return os.path.join(get_log_folder(), f"result.zip")
```
</instruction_2_revised_code>
------------------ End of Example ------------------
------------------ the real INPUT you need to process NOW ({FILE_BASENAME}) ------------------
```
{THE_CODE}
```
{INDENT_REMINDER}
{BRIEF_REMINDER}
{HINT_REMINDER}
'''
class PythonCodeComment(): class PythonCodeComment():
def __init__(self, llm_kwargs, language) -> None: def __init__(self, llm_kwargs, plugin_kwargs, language, observe_window_update) -> None:
self.original_content = "" self.original_content = ""
self.full_context = [] self.full_context = []
self.full_context_with_line_no = [] self.full_context_with_line_no = []
@@ -125,7 +184,13 @@ class PythonCodeComment():
self.page_limit = 100 # 100 lines of code each page self.page_limit = 100 # 100 lines of code each page
self.ignore_limit = 20 self.ignore_limit = 20
self.llm_kwargs = llm_kwargs self.llm_kwargs = llm_kwargs
self.plugin_kwargs = plugin_kwargs
self.language = language self.language = language
self.observe_window_update = observe_window_update
if self.language == "chinese":
self.core_prompt = revise_function_prompt_chinese
else:
self.core_prompt = revise_function_prompt
self.path = None self.path = None
self.file_basename = None self.file_basename = None
self.file_brief = "" self.file_brief = ""
@@ -157,7 +222,7 @@ class PythonCodeComment():
history=[], history=[],
sys_prompt="", sys_prompt="",
observe_window=[], observe_window=[],
console_slience=True console_silence=True
) )
def extract_number(text): def extract_number(text):
@@ -251,12 +316,12 @@ class PythonCodeComment():
def tag_code(self, fn, hint): def tag_code(self, fn, hint):
code = fn code = fn
_, n_indent = self.dedent(code) _, n_indent = self.dedent(code)
indent_reminder = "" if n_indent == 0 else "(Reminder: as you can see, this piece of code has indent made up with {n_indent} whitespace, please preseve them in the OUTPUT.)" indent_reminder = "" if n_indent == 0 else "(Reminder: as you can see, this piece of code has indent made up with {n_indent} whitespace, please preserve them in the OUTPUT.)"
brief_reminder = "" if self.file_brief == "" else f"({self.file_basename} abstract: {self.file_brief})" brief_reminder = "" if self.file_brief == "" else f"({self.file_basename} abstract: {self.file_brief})"
hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)" hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)"
self.llm_kwargs['temperature'] = 0 self.llm_kwargs['temperature'] = 0
result = predict_no_ui_long_connection( result = predict_no_ui_long_connection(
inputs=revise_funtion_prompt.format( inputs=self.core_prompt.format(
LANG=self.language, LANG=self.language,
FILE_BASENAME=self.file_basename, FILE_BASENAME=self.file_basename,
THE_CODE=code, THE_CODE=code,
@@ -268,7 +333,7 @@ class PythonCodeComment():
history=[], history=[],
sys_prompt="", sys_prompt="",
observe_window=[], observe_window=[],
console_slience=True console_silence=True
) )
def get_code_block(reply): def get_code_block(reply):
@@ -335,7 +400,7 @@ class PythonCodeComment():
return revised return revised
def begin_comment_source_code(self, chatbot=None, history=None): def begin_comment_source_code(self, chatbot=None, history=None):
# from toolbox import update_ui_lastest_msg # from toolbox import update_ui_latest_msg
assert self.path is not None assert self.path is not None
assert '.py' in self.path # must be python source code assert '.py' in self.path # must be python source code
# write_target = self.path + '.revised.py' # write_target = self.path + '.revised.py'
@@ -344,9 +409,10 @@ class PythonCodeComment():
# with open(self.path + '.revised.py', 'w+', encoding='utf8') as f: # with open(self.path + '.revised.py', 'w+', encoding='utf8') as f:
while True: while True:
try: try:
# yield from update_ui_lastest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0) # yield from update_ui_latest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0)
next_batch, line_no_start, line_no_end = self.get_next_batch() next_batch, line_no_start, line_no_end = self.get_next_batch()
# yield from update_ui_lastest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0) self.observe_window_update(f"正在处理{self.file_basename} - {line_no_start}/{len(self.full_context)}\n")
# yield from update_ui_latest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0)
hint = None hint = None
MAX_ATTEMPT = 2 MAX_ATTEMPT = 2
@@ -355,7 +421,7 @@ class PythonCodeComment():
try: try:
successful, hint = self.verify_successful(next_batch, result) successful, hint = self.verify_successful(next_batch, result)
except Exception as e: except Exception as e:
print('ignored exception:\n' + str(e)) logger.error('ignored exception:\n' + str(e))
break break
if successful: if successful:
break break

View File

@@ -1,4 +1,5 @@
import threading, time import threading, time
from loguru import logger
class WatchDog(): class WatchDog():
def __init__(self, timeout, bark_fn, interval=3, msg="") -> None: def __init__(self, timeout, bark_fn, interval=3, msg="") -> None:
@@ -13,7 +14,7 @@ class WatchDog():
while True: while True:
if self.kill_dog: break if self.kill_dog: break
if time.time() - self.last_feed > self.timeout: if time.time() - self.last_feed > self.timeout:
if len(self.msg) > 0: print(self.msg) if len(self.msg) > 0: logger.info(self.msg)
self.bark_fn() self.bark_fn()
break break
time.sleep(self.interval) time.sleep(self.interval)

View File

@@ -1,39 +1,47 @@
import ast import token
import tokenize
class CommentRemover(ast.NodeTransformer): import copy
def visit_FunctionDef(self, node): import io
# 移除函数的文档字符串
if (node.body and isinstance(node.body[0], ast.Expr) and
isinstance(node.body[0].value, ast.Str)):
node.body = node.body[1:]
self.generic_visit(node)
return node
def visit_ClassDef(self, node):
# 移除类的文档字符串
if (node.body and isinstance(node.body[0], ast.Expr) and
isinstance(node.body[0].value, ast.Str)):
node.body = node.body[1:]
self.generic_visit(node)
return node
def visit_Module(self, node):
# 移除模块的文档字符串
if (node.body and isinstance(node.body[0], ast.Expr) and
isinstance(node.body[0].value, ast.Str)):
node.body = node.body[1:]
self.generic_visit(node)
return node
def remove_python_comments(source_code): def remove_python_comments(input_source: str) -> str:
# 解析源代码为 AST source_flag = copy.copy(input_source)
tree = ast.parse(source_code) source = io.StringIO(input_source)
# 移除注释 ls = input_source.split('\n')
transformer = CommentRemover() prev_toktype = token.INDENT
tree = transformer.visit(tree) readline = source.readline
# 将处理后的 AST 转换回源代码
return ast.unparse(tree) def get_char_index(lineno, col):
# find the index of the char in the source code
if lineno == 1:
return len('\n'.join(ls[:(lineno-1)])) + col
else:
return len('\n'.join(ls[:(lineno-1)])) + col + 1
def replace_char_between(start_lineno, start_col, end_lineno, end_col, source, replace_char, ls):
# replace char between start_lineno, start_col and end_lineno, end_col with replace_char, but keep '\n' and ' '
b = get_char_index(start_lineno, start_col)
e = get_char_index(end_lineno, end_col)
for i in range(b, e):
if source[i] == '\n':
source = source[:i] + '\n' + source[i+1:]
elif source[i] == ' ':
source = source[:i] + ' ' + source[i+1:]
else:
source = source[:i] + replace_char + source[i+1:]
return source
tokgen = tokenize.generate_tokens(readline)
for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokgen:
if toktype == token.STRING and (prev_toktype == token.INDENT):
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
elif toktype == token.STRING and (prev_toktype == token.NEWLINE):
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
elif toktype == tokenize.COMMENT:
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
prev_toktype = toktype
return source_flag
# 示例使用 # 示例使用
if __name__ == "__main__": if __name__ == "__main__":

View File

@@ -1,141 +0,0 @@
from toolbox import CatchException, update_ui, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
import datetime, json
def fetch_items(list_of_items, batch_size):
for i in range(0, len(list_of_items), batch_size):
yield list_of_items[i:i + batch_size]
def string_to_options(arguments):
import argparse
import shlex
# Create an argparse.ArgumentParser instance
parser = argparse.ArgumentParser()
# Add command-line arguments
parser.add_argument("--llm_to_learn", type=str, help="LLM model to learn", default="gpt-3.5-turbo")
parser.add_argument("--prompt_prefix", type=str, help="Prompt prefix", default='')
parser.add_argument("--system_prompt", type=str, help="System prompt", default='')
parser.add_argument("--batch", type=int, help="System prompt", default=50)
parser.add_argument("--pre_seq_len", type=int, help="pre_seq_len", default=50)
parser.add_argument("--learning_rate", type=float, help="learning_rate", default=2e-2)
parser.add_argument("--num_gpus", type=int, help="num_gpus", default=1)
parser.add_argument("--json_dataset", type=str, help="json_dataset", default="")
parser.add_argument("--ptuning_directory", type=str, help="ptuning_directory", default="")
# Parse the arguments
args = parser.parse_args(shlex.split(arguments))
return args
@CatchException
def 微调数据集生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数如温度和top_p等一般原样传递下去就行
plugin_kwargs 插件模型的参数
chatbot 聊天显示框的句柄,用于显示给用户
history 聊天历史,前情提要
system_prompt 给gpt的静默提醒
user_request 当前用户的请求信息IP地址等
"""
history = [] # 清空历史,以免输入溢出
chatbot.append(("这是什么功能?", "[Local Message] 微调数据集生成"))
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
args = plugin_kwargs.get("advanced_arg", None)
if args is None:
chatbot.append(("没给定指令", "退出"))
yield from update_ui(chatbot=chatbot, history=history); return
else:
arguments = string_to_options(arguments=args)
dat = []
with open(txt, 'r', encoding='utf8') as f:
for line in f.readlines():
json_dat = json.loads(line)
dat.append(json_dat["content"])
llm_kwargs['llm_model'] = arguments.llm_to_learn
for batch in fetch_items(dat, arguments.batch):
res = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array=[f"{arguments.prompt_prefix}\n\n{b}" for b in (batch)],
inputs_show_user_array=[f"Show Nothing" for _ in (batch)],
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history_array=[[] for _ in (batch)],
sys_prompt_array=[arguments.system_prompt for _ in (batch)],
max_workers=10 # OpenAI所允许的最大并行过载
)
with open(txt+'.generated.json', 'a+', encoding='utf8') as f:
for b, r in zip(batch, res[1::2]):
f.write(json.dumps({"content":b, "summary":r}, ensure_ascii=False)+'\n')
promote_file_to_downloadzone(txt+'.generated.json', rename_file='generated.json', chatbot=chatbot)
return
@CatchException
def 启动微调(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数如温度和top_p等一般原样传递下去就行
plugin_kwargs 插件模型的参数
chatbot 聊天显示框的句柄,用于显示给用户
history 聊天历史,前情提要
system_prompt 给gpt的静默提醒
user_request 当前用户的请求信息IP地址等
"""
import subprocess
history = [] # 清空历史,以免输入溢出
chatbot.append(("这是什么功能?", "[Local Message] 微调数据集生成"))
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
args = plugin_kwargs.get("advanced_arg", None)
if args is None:
chatbot.append(("没给定指令", "退出"))
yield from update_ui(chatbot=chatbot, history=history); return
else:
arguments = string_to_options(arguments=args)
pre_seq_len = arguments.pre_seq_len # 128
learning_rate = arguments.learning_rate # 2e-2
num_gpus = arguments.num_gpus # 1
json_dataset = arguments.json_dataset # 't_code.json'
ptuning_directory = arguments.ptuning_directory # '/home/hmp/ChatGLM2-6B/ptuning'
command = f"torchrun --standalone --nnodes=1 --nproc-per-node={num_gpus} main.py \
--do_train \
--train_file AdvertiseGen/{json_dataset} \
--validation_file AdvertiseGen/{json_dataset} \
--preprocessing_num_workers 20 \
--prompt_column content \
--response_column summary \
--overwrite_cache \
--model_name_or_path THUDM/chatglm2-6b \
--output_dir output/clothgen-chatglm2-6b-pt-{pre_seq_len}-{learning_rate} \
--overwrite_output_dir \
--max_source_length 256 \
--max_target_length 256 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 16 \
--predict_with_generate \
--max_steps 100 \
--logging_steps 10 \
--save_steps 20 \
--learning_rate {learning_rate} \
--pre_seq_len {pre_seq_len} \
--quantization_bit 4"
process = subprocess.Popen(command, shell=True, cwd=ptuning_directory)
try:
process.communicate(timeout=3600*24)
except subprocess.TimeoutExpired:
process.kill()
return

View File

@@ -1,10 +1,10 @@
from toolbox import update_ui, get_conf, trimmed_format_exc, get_max_token, Singleton
from shared_utils.char_visual_effect import scolling_visual_effect
import threading
import os import os
import logging import threading
from loguru import logger
from shared_utils.char_visual_effect import scrolling_visual_effect
from toolbox import update_ui, get_conf, trimmed_format_exc, get_max_token, Singleton
def input_clipping(inputs, history, max_token_limit): def input_clipping(inputs, history, max_token_limit, return_clip_flags=False):
""" """
当输入文本 + 历史文本超出最大限制时,采取措施丢弃一部分文本。 当输入文本 + 历史文本超出最大限制时,采取措施丢弃一部分文本。
输入: 输入:
@@ -20,17 +20,20 @@ def input_clipping(inputs, history, max_token_limit):
enc = model_info["gpt-3.5-turbo"]['tokenizer'] enc = model_info["gpt-3.5-turbo"]['tokenizer']
def get_token_num(txt): return len(enc.encode(txt, disallowed_special=())) def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
mode = 'input-and-history' mode = 'input-and-history'
# 当 输入部分的token占比 小于 全文的一半时,只裁剪历史 # 当 输入部分的token占比 小于 全文的一半时,只裁剪历史
input_token_num = get_token_num(inputs) input_token_num = get_token_num(inputs)
original_input_len = len(inputs)
if input_token_num < max_token_limit//2: if input_token_num < max_token_limit//2:
mode = 'only-history' mode = 'only-history'
max_token_limit = max_token_limit - input_token_num max_token_limit = max_token_limit - input_token_num
everything = [inputs] if mode == 'input-and-history' else [''] everything = [inputs] if mode == 'input-and-history' else ['']
everything.extend(history) everything.extend(history)
n_token = get_token_num('\n'.join(everything)) full_token_num = n_token = get_token_num('\n'.join(everything))
everything_token = [get_token_num(e) for e in everything] everything_token = [get_token_num(e) for e in everything]
everything_token_num = sum(everything_token)
delta = max(everything_token) // 16 # 截断时的颗粒度 delta = max(everything_token) // 16 # 截断时的颗粒度
while n_token > max_token_limit: while n_token > max_token_limit:
@@ -43,10 +46,24 @@ def input_clipping(inputs, history, max_token_limit):
if mode == 'input-and-history': if mode == 'input-and-history':
inputs = everything[0] inputs = everything[0]
full_token_num = everything_token_num
else: else:
pass full_token_num = everything_token_num + input_token_num
history = everything[1:] history = everything[1:]
return inputs, history
flags = {
"mode": mode,
"original_input_token_num": input_token_num,
"original_full_token_num": full_token_num,
"original_input_len": original_input_len,
"clipped_input_len": len(inputs),
}
if not return_clip_flags:
return inputs, history
else:
return inputs, history, flags
def request_gpt_model_in_new_thread_with_ui_alive( def request_gpt_model_in_new_thread_with_ui_alive(
inputs, inputs_show_user, llm_kwargs, inputs, inputs_show_user, llm_kwargs,
@@ -116,7 +133,7 @@ def request_gpt_model_in_new_thread_with_ui_alive(
except: except:
# 【第三种情况】:其他错误:重试几次 # 【第三种情况】:其他错误:重试几次
tb_str = '```\n' + trimmed_format_exc() + '```' tb_str = '```\n' + trimmed_format_exc() + '```'
print(tb_str) logger.error(tb_str)
mutable[0] += f"[Local Message] 警告,在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n" mutable[0] += f"[Local Message] 警告,在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n"
if retry_op > 0: if retry_op > 0:
retry_op -= 1 retry_op -= 1
@@ -152,6 +169,7 @@ def can_multi_process(llm) -> bool:
def default_condition(llm) -> bool: def default_condition(llm) -> bool:
# legacy condition # legacy condition
if llm.startswith('gpt-'): return True if llm.startswith('gpt-'): return True
if llm.startswith('chatgpt-'): return True
if llm.startswith('api2d-'): return True if llm.startswith('api2d-'): return True
if llm.startswith('azure-'): return True if llm.startswith('azure-'): return True
if llm.startswith('spark'): return True if llm.startswith('spark'): return True
@@ -238,7 +256,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
# 【第一种情况】:顺利完成 # 【第一种情况】:顺利完成
gpt_say = predict_no_ui_long_connection( gpt_say = predict_no_ui_long_connection(
inputs=inputs, llm_kwargs=llm_kwargs, history=history, inputs=inputs, llm_kwargs=llm_kwargs, history=history,
sys_prompt=sys_prompt, observe_window=mutable[index], console_slience=True sys_prompt=sys_prompt, observe_window=mutable[index], console_silence=True
) )
mutable[index][2] = "已成功" mutable[index][2] = "已成功"
return gpt_say return gpt_say
@@ -266,7 +284,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
# 【第三种情况】:其他错误 # 【第三种情况】:其他错误
if detect_timeout(): raise RuntimeError("检测到程序终止。") if detect_timeout(): raise RuntimeError("检测到程序终止。")
tb_str = '```\n' + trimmed_format_exc() + '```' tb_str = '```\n' + trimmed_format_exc() + '```'
print(tb_str) logger.error(tb_str)
gpt_say += f"[Local Message] 警告,线程{index}在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n" gpt_say += f"[Local Message] 警告,线程{index}在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n"
if len(mutable[index][0]) > 0: gpt_say += "此线程失败前收到的回答:\n\n" + mutable[index][0] if len(mutable[index][0]) > 0: gpt_say += "此线程失败前收到的回答:\n\n" + mutable[index][0]
if retry_op > 0: if retry_op > 0:
@@ -308,7 +326,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
mutable[thread_index][1] = time.time() mutable[thread_index][1] = time.time()
# 在前端打印些好玩的东西 # 在前端打印些好玩的东西
for thread_index, _ in enumerate(worker_done): for thread_index, _ in enumerate(worker_done):
print_something_really_funny = f"[ ...`{scolling_visual_effect(mutable[thread_index][0], scroller_max_len)}`... ]" print_something_really_funny = f"[ ...`{scrolling_visual_effect(mutable[thread_index][0], scroller_max_len)}`... ]"
observe_win.append(print_something_really_funny) observe_win.append(print_something_really_funny)
# 在前端打印些好玩的东西 # 在前端打印些好玩的东西
stat_str = ''.join([f'`{mutable[thread_index][2]}`: {obs}\n\n' stat_str = ''.join([f'`{mutable[thread_index][2]}`: {obs}\n\n'
@@ -361,7 +379,7 @@ def read_and_clean_pdf_text(fp):
import fitz, copy import fitz, copy
import re import re
import numpy as np import numpy as np
from shared_utils.colorful import print亮黄, print亮绿 # from shared_utils.colorful import print亮黄, print亮绿
fc = 0 # Index 0 文本 fc = 0 # Index 0 文本
fs = 1 # Index 1 字体 fs = 1 # Index 1 字体
fb = 2 # Index 2 框框 fb = 2 # Index 2 框框
@@ -371,11 +389,11 @@ def read_and_clean_pdf_text(fp):
""" """
提取文本块主字体 提取文本块主字体
""" """
fsize_statiscs = {} fsize_statistics = {}
for wtf in l['spans']: for wtf in l['spans']:
if wtf['size'] not in fsize_statiscs: fsize_statiscs[wtf['size']] = 0 if wtf['size'] not in fsize_statistics: fsize_statistics[wtf['size']] = 0
fsize_statiscs[wtf['size']] += len(wtf['text']) fsize_statistics[wtf['size']] += len(wtf['text'])
return max(fsize_statiscs, key=fsize_statiscs.get) return max(fsize_statistics, key=fsize_statistics.get)
def ffsize_same(a,b): def ffsize_same(a,b):
""" """
@@ -415,11 +433,11 @@ def read_and_clean_pdf_text(fp):
############################## <第 2 步,获取正文主字体> ################################## ############################## <第 2 步,获取正文主字体> ##################################
try: try:
fsize_statiscs = {} fsize_statistics = {}
for span in meta_span: for span in meta_span:
if span[1] not in fsize_statiscs: fsize_statiscs[span[1]] = 0 if span[1] not in fsize_statistics: fsize_statistics[span[1]] = 0
fsize_statiscs[span[1]] += span[2] fsize_statistics[span[1]] += span[2]
main_fsize = max(fsize_statiscs, key=fsize_statiscs.get) main_fsize = max(fsize_statistics, key=fsize_statistics.get)
if REMOVE_FOOT_NOTE: if REMOVE_FOOT_NOTE:
give_up_fize_threshold = main_fsize * REMOVE_FOOT_FFSIZE_PERCENT give_up_fize_threshold = main_fsize * REMOVE_FOOT_FFSIZE_PERCENT
except: except:
@@ -578,7 +596,7 @@ class nougat_interface():
def nougat_with_timeout(self, command, cwd, timeout=3600): def nougat_with_timeout(self, command, cwd, timeout=3600):
import subprocess import subprocess
from toolbox import ProxyNetworkActivate from toolbox import ProxyNetworkActivate
logging.info(f'正在执行命令 {command}') logger.info(f'正在执行命令 {command}')
with ProxyNetworkActivate("Nougat_Download"): with ProxyNetworkActivate("Nougat_Download"):
process = subprocess.Popen(command, shell=False, cwd=cwd, env=os.environ) process = subprocess.Popen(command, shell=False, cwd=cwd, env=os.environ)
try: try:
@@ -586,15 +604,15 @@ class nougat_interface():
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
process.kill() process.kill()
stdout, stderr = process.communicate() stdout, stderr = process.communicate()
print("Process timed out!") logger.error("Process timed out!")
return False return False
return True return True
def NOUGAT_parse_pdf(self, fp, chatbot, history): def NOUGAT_parse_pdf(self, fp, chatbot, history):
from toolbox import update_ui_lastest_msg from toolbox import update_ui_latest_msg
yield from update_ui_lastest_msg("正在解析论文, 请稍候。进度:正在排队, 等待线程锁...", yield from update_ui_latest_msg("正在解析论文, 请稍候。进度:正在排队, 等待线程锁...",
chatbot=chatbot, history=history, delay=0) chatbot=chatbot, history=history, delay=0)
self.threadLock.acquire() self.threadLock.acquire()
import glob, threading, os import glob, threading, os
@@ -602,7 +620,7 @@ class nougat_interface():
dst = os.path.join(get_log_folder(plugin_name='nougat'), gen_time_str()) dst = os.path.join(get_log_folder(plugin_name='nougat'), gen_time_str())
os.makedirs(dst) os.makedirs(dst)
yield from update_ui_lastest_msg("正在解析论文, 请稍候。进度正在加载NOUGAT... 提示首次运行需要花费较长时间下载NOUGAT参数", yield from update_ui_latest_msg("正在解析论文, 请稍候。进度正在加载NOUGAT... 提示首次运行需要花费较长时间下载NOUGAT参数",
chatbot=chatbot, history=history, delay=0) chatbot=chatbot, history=history, delay=0)
command = ['nougat', '--out', os.path.abspath(dst), os.path.abspath(fp)] command = ['nougat', '--out', os.path.abspath(dst), os.path.abspath(fp)]
self.nougat_with_timeout(command, cwd=os.getcwd(), timeout=3600) self.nougat_with_timeout(command, cwd=os.getcwd(), timeout=3600)

View File

@@ -1,5 +1,6 @@
import os import os
from textwrap import indent from textwrap import indent
from loguru import logger
class FileNode: class FileNode:
def __init__(self, name, build_manifest=False): def __init__(self, name, build_manifest=False):
@@ -60,7 +61,7 @@ class FileNode:
current_node.children.append(term) current_node.children.append(term)
def print_files_recursively(self, level=0, code="R0"): def print_files_recursively(self, level=0, code="R0"):
print(' '*level + self.name + ' ' + str(self.is_leaf) + ' ' + str(self.level)) logger.info(' '*level + self.name + ' ' + str(self.is_leaf) + ' ' + str(self.level))
for j, child in enumerate(self.children): for j, child in enumerate(self.children):
child.print_files_recursively(level=level+1, code=code+str(j)) child.print_files_recursively(level=level+1, code=code+str(j))
self.parenting_ship.extend(child.parenting_ship) self.parenting_ship.extend(child.parenting_ship)
@@ -123,4 +124,4 @@ if __name__ == "__main__":
"用于加载和分割文件中的文本的通用文件加载器用于加载和分割文件中的文本的通用文件加载器用于加载和分割文件中的文本的通用文件加载器", "用于加载和分割文件中的文本的通用文件加载器用于加载和分割文件中的文本的通用文件加载器用于加载和分割文件中的文本的通用文件加载器",
"包含了用于构建和管理向量数据库的函数和类包含了用于构建和管理向量数据库的函数和类包含了用于构建和管理向量数据库的函数和类", "包含了用于构建和管理向量数据库的函数和类包含了用于构建和管理向量数据库的函数和类包含了用于构建和管理向量数据库的函数和类",
] ]
print(build_file_tree_mermaid_diagram(file_manifest, file_comments, "项目文件树")) logger.info(build_file_tree_mermaid_diagram(file_manifest, file_comments, "项目文件树"))

View File

@@ -1,4 +1,4 @@
from toolbox import CatchException, update_ui, update_ui_lastest_msg from toolbox import CatchException, update_ui, update_ui_latest_msg
from crazy_functions.multi_stage.multi_stage_utils import GptAcademicGameBaseState from crazy_functions.multi_stage.multi_stage_utils import GptAcademicGameBaseState
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
@@ -13,7 +13,7 @@ class MiniGame_ASCII_Art(GptAcademicGameBaseState):
else: else:
if prompt.strip() == 'exit': if prompt.strip() == 'exit':
self.delete_game = True self.delete_game = True
yield from update_ui_lastest_msg(lastmsg=f"谜底是{self.obj},游戏结束。", chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=f"谜底是{self.obj},游戏结束。", chatbot=chatbot, history=history, delay=0.)
return return
chatbot.append([prompt, ""]) chatbot.append([prompt, ""])
yield from update_ui(chatbot=chatbot, history=history) yield from update_ui(chatbot=chatbot, history=history)
@@ -31,12 +31,12 @@ class MiniGame_ASCII_Art(GptAcademicGameBaseState):
self.cur_task = 'identify user guess' self.cur_task = 'identify user guess'
res = get_code_block(raw_res) res = get_code_block(raw_res)
history += ['', f'the answer is {self.obj}', inputs, res] history += ['', f'the answer is {self.obj}', inputs, res]
yield from update_ui_lastest_msg(lastmsg=res, chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=res, chatbot=chatbot, history=history, delay=0.)
elif self.cur_task == 'identify user guess': elif self.cur_task == 'identify user guess':
if is_same_thing(self.obj, prompt, self.llm_kwargs): if is_same_thing(self.obj, prompt, self.llm_kwargs):
self.delete_game = True self.delete_game = True
yield from update_ui_lastest_msg(lastmsg="你猜对了!", chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg="你猜对了!", chatbot=chatbot, history=history, delay=0.)
else: else:
self.cur_task = 'identify user guess' self.cur_task = 'identify user guess'
yield from update_ui_lastest_msg(lastmsg="猜错了再试试输入“exit”获取答案。", chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg="猜错了再试试输入“exit”获取答案。", chatbot=chatbot, history=history, delay=0.)

View File

@@ -63,7 +63,7 @@ prompts_terminate = """小说的前文回顾:
""" """
from toolbox import CatchException, update_ui, update_ui_lastest_msg from toolbox import CatchException, update_ui, update_ui_latest_msg
from crazy_functions.multi_stage.multi_stage_utils import GptAcademicGameBaseState from crazy_functions.multi_stage.multi_stage_utils import GptAcademicGameBaseState
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
@@ -112,7 +112,7 @@ class MiniGame_ResumeStory(GptAcademicGameBaseState):
if prompt.strip() == 'exit' or prompt.strip() == '结束剧情': if prompt.strip() == 'exit' or prompt.strip() == '结束剧情':
# should we terminate game here? # should we terminate game here?
self.delete_game = True self.delete_game = True
yield from update_ui_lastest_msg(lastmsg=f"游戏结束。", chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=f"游戏结束。", chatbot=chatbot, history=history, delay=0.)
return return
if '剧情收尾' in prompt: if '剧情收尾' in prompt:
self.cur_task = 'story_terminate' self.cur_task = 'story_terminate'
@@ -137,8 +137,8 @@ class MiniGame_ResumeStory(GptAcademicGameBaseState):
) )
self.story.append(story_paragraph) self.story.append(story_paragraph)
# # 配图 # # 配图
yield from update_ui_lastest_msg(lastmsg=story_paragraph + '<br/>正在生成插图中 ...', chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=story_paragraph + '<br/>正在生成插图中 ...', chatbot=chatbot, history=history, delay=0.)
yield from update_ui_lastest_msg(lastmsg=story_paragraph + '<br/>'+ self.generate_story_image(story_paragraph), chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=story_paragraph + '<br/>'+ self.generate_story_image(story_paragraph), chatbot=chatbot, history=history, delay=0.)
# # 构建后续剧情引导 # # 构建后续剧情引导
previously_on_story = "" previously_on_story = ""
@@ -171,8 +171,8 @@ class MiniGame_ResumeStory(GptAcademicGameBaseState):
) )
self.story.append(story_paragraph) self.story.append(story_paragraph)
# # 配图 # # 配图
yield from update_ui_lastest_msg(lastmsg=story_paragraph + '<br/>正在生成插图中 ...', chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=story_paragraph + '<br/>正在生成插图中 ...', chatbot=chatbot, history=history, delay=0.)
yield from update_ui_lastest_msg(lastmsg=story_paragraph + '<br/>'+ self.generate_story_image(story_paragraph), chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=story_paragraph + '<br/>'+ self.generate_story_image(story_paragraph), chatbot=chatbot, history=history, delay=0.)
# # 构建后续剧情引导 # # 构建后续剧情引导
previously_on_story = "" previously_on_story = ""
@@ -204,8 +204,8 @@ class MiniGame_ResumeStory(GptAcademicGameBaseState):
chatbot, history_, self.sys_prompt_ chatbot, history_, self.sys_prompt_
) )
# # 配图 # # 配图
yield from update_ui_lastest_msg(lastmsg=story_paragraph + '<br/>正在生成插图中 ...', chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=story_paragraph + '<br/>正在生成插图中 ...', chatbot=chatbot, history=history, delay=0.)
yield from update_ui_lastest_msg(lastmsg=story_paragraph + '<br/>'+ self.generate_story_image(story_paragraph), chatbot=chatbot, history=history, delay=0.) yield from update_ui_latest_msg(lastmsg=story_paragraph + '<br/>'+ self.generate_story_image(story_paragraph), chatbot=chatbot, history=history, delay=0.)
# terminate game # terminate game
self.delete_game = True self.delete_game = True

View File

@@ -2,7 +2,7 @@ import time
import importlib import importlib
from toolbox import trimmed_format_exc, gen_time_str, get_log_folder from toolbox import trimmed_format_exc, gen_time_str, get_log_folder
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder
from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_lastest_msg from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_latest_msg
import multiprocessing import multiprocessing
def get_class_name(class_string): def get_class_name(class_string):

View File

@@ -24,8 +24,8 @@ class Actor(BaseModel):
film_names: List[str] = Field(description="list of names of films they starred in") film_names: List[str] = Field(description="list of names of films they starred in")
""" """
import json, re, logging import json, re
from loguru import logger as logging
PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below. PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below.
@@ -102,10 +102,10 @@ class GptJsonIO():
logging.info(f'Repairing json{response}') logging.info(f'Repairing json{response}')
repair_prompt = self.generate_repair_prompt(broken_json = response, error=repr(e)) repair_prompt = self.generate_repair_prompt(broken_json = response, error=repr(e))
result = self.generate_output(gpt_gen_fn(repair_prompt, self.format_instructions)) result = self.generate_output(gpt_gen_fn(repair_prompt, self.format_instructions))
logging.info('Repaire json success.') logging.info('Repair json success.')
except Exception as e: except Exception as e:
# 没辙了,放弃治疗 # 没辙了,放弃治疗
logging.info('Repaire json fail.') logging.info('Repair json fail.')
raise JsonStringError('Cannot repair json.', str(e)) raise JsonStringError('Cannot repair json.', str(e))
return result return result

View File

@@ -0,0 +1,26 @@
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
def structure_output(txt, prompt, err_msg, run_gpt_fn, pydantic_cls):
gpt_json_io = GptJsonIO(pydantic_cls)
analyze_res = run_gpt_fn(
txt,
sys_prompt=prompt + gpt_json_io.format_instructions
)
try:
friend = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
except JsonStringError as e:
return None, err_msg
err_msg = ""
return friend, err_msg
def select_tool(prompt, run_gpt_fn, pydantic_cls):
pydantic_cls_instance, err_msg = structure_output(
txt=prompt,
prompt="根据提示, 分析应该调用哪个工具函数\n\n",
err_msg=f"不能理解该联系人",
run_gpt_fn=run_gpt_fn,
pydantic_cls=pydantic_cls
)
return pydantic_cls_instance, err_msg

View File

@@ -1,15 +1,17 @@
from toolbox import update_ui, update_ui_lastest_msg, get_log_folder import os
from toolbox import get_conf, promote_file_to_downloadzone
from .latex_toolbox import PRESERVE, TRANSFORM
from .latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
from .latex_toolbox import reverse_forbidden_text_careful_brace, reverse_forbidden_text, convert_to_linklist, post_process
from .latex_toolbox import fix_content, find_main_tex_file, merge_tex_files, compile_latex_with_timeout
from .latex_toolbox import find_title_and_abs
from .latex_pickle_io import objdump, objload
import os, shutil
import re import re
import shutil
import numpy as np import numpy as np
from loguru import logger
from toolbox import update_ui, update_ui_latest_msg, get_log_folder, gen_time_str
from toolbox import get_conf, promote_file_to_downloadzone
from crazy_functions.latex_fns.latex_toolbox import PRESERVE, TRANSFORM
from crazy_functions.latex_fns.latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
from crazy_functions.latex_fns.latex_toolbox import reverse_forbidden_text_careful_brace, reverse_forbidden_text, convert_to_linklist, post_process
from crazy_functions.latex_fns.latex_toolbox import fix_content, find_main_tex_file, merge_tex_files, compile_latex_with_timeout
from crazy_functions.latex_fns.latex_toolbox import find_title_and_abs
from crazy_functions.latex_fns.latex_pickle_io import objdump, objload
pj = os.path.join pj = os.path.join
@@ -18,7 +20,7 @@ def split_subprocess(txt, project_folder, return_dict, opts):
""" """
break down latex file to a linked list, break down latex file to a linked list,
each node use a preserve flag to indicate whether it should each node use a preserve flag to indicate whether it should
be proccessed by GPT. be processed by GPT.
""" """
text = txt text = txt
mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM
@@ -83,14 +85,14 @@ class LatexPaperSplit():
""" """
break down latex file to a linked list, break down latex file to a linked list,
each node use a preserve flag to indicate whether it should each node use a preserve flag to indicate whether it should
be proccessed by GPT. be processed by GPT.
""" """
def __init__(self) -> None: def __init__(self) -> None:
self.nodes = None self.nodes = None
self.msg = "*{\\scriptsize\\textbf{警告该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成" + \ self.msg = "*{\\scriptsize\\textbf{警告该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成" + \
"版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \ "版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \
"项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。" "项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。"
# 请您不要删除或修改这行警告除非您是论文的原作者如果您是论文原作者欢迎加REAME中的QQ联系开发者 # 请您不要删除或修改这行警告除非您是论文的原作者如果您是论文原作者欢迎加README中的QQ联系开发者
self.msg_declare = "为了防止大语言模型的意外谬误产生扩散影响,禁止移除或修改此警告。}}\\\\" self.msg_declare = "为了防止大语言模型的意外谬误产生扩散影响,禁止移除或修改此警告。}}\\\\"
self.title = "unknown" self.title = "unknown"
self.abstract = "unknown" self.abstract = "unknown"
@@ -149,7 +151,7 @@ class LatexPaperSplit():
""" """
break down latex file to a linked list, break down latex file to a linked list,
each node use a preserve flag to indicate whether it should each node use a preserve flag to indicate whether it should
be proccessed by GPT. be processed by GPT.
P.S. use multiprocessing to avoid timeout error P.S. use multiprocessing to avoid timeout error
""" """
import multiprocessing import multiprocessing
@@ -298,7 +300,8 @@ def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin
write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder) write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder)
# <-------- 写出文件 ----------> # <-------- 写出文件 ---------->
msg = f"当前大语言模型: {llm_kwargs['llm_model']},当前语言模型温度设定: {llm_kwargs['temperature']}" model_name = llm_kwargs['llm_model'].replace('_', '\\_') # 替换LLM模型名称中的下划线为转义字符
msg = f"当前大语言模型: {model_name},当前语言模型温度设定: {llm_kwargs['temperature']}"
final_tex = lps.merge_result(pfg.file_result, mode, msg) final_tex = lps.merge_result(pfg.file_result, mode, msg)
objdump((lps, pfg.file_result, mode, msg), file=pj(project_folder,'merge_result.pkl')) objdump((lps, pfg.file_result, mode, msg), file=pj(project_folder,'merge_result.pkl'))
@@ -323,7 +326,7 @@ def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work
buggy_lines = [int(l) for l in buggy_lines] buggy_lines = [int(l) for l in buggy_lines]
buggy_lines = sorted(buggy_lines) buggy_lines = sorted(buggy_lines)
buggy_line = buggy_lines[0]-1 buggy_line = buggy_lines[0]-1
print("reversing tex line that has errors", buggy_line) logger.warning("reversing tex line that has errors", buggy_line)
# 重组,逆转出错的段落 # 重组,逆转出错的段落
if buggy_line not in fixed_line: if buggy_line not in fixed_line:
@@ -337,7 +340,7 @@ def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work
return True, f"{tex_name_pure}_fix_{n_fix}", buggy_lines return True, f"{tex_name_pure}_fix_{n_fix}", buggy_lines
except: except:
print("Fatal error occurred, but we cannot identify error, please download zip, read latex log, and compile manually.") logger.error("Fatal error occurred, but we cannot identify error, please download zip, read latex log, and compile manually.")
return False, -1, [-1] return False, -1, [-1]
@@ -348,7 +351,42 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
max_try = 32 max_try = 32
chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder}如果程序停顿5分钟以上请直接去该路径下取回翻译结果或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history) chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder}如果程序停顿5分钟以上请直接去该路径下取回翻译结果或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1]) # 刷新界面 chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1]) # 刷新界面
yield from update_ui_lastest_msg('编译已经开始...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg('编译已经开始...', chatbot, history) # 刷新Gradio前端界面
# 检查是否需要使用xelatex
def check_if_need_xelatex(tex_path):
try:
with open(tex_path, 'r', encoding='utf-8', errors='replace') as f:
content = f.read(5000)
# 检查是否有使用xelatex的宏包
need_xelatex = any(
pkg in content
for pkg in ['fontspec', 'xeCJK', 'xetex', 'unicode-math', 'xltxtra', 'xunicode']
)
if need_xelatex:
logger.info(f"检测到宏包需要xelatex编译, 切换至xelatex编译")
else:
logger.info(f"未检测到宏包需要xelatex编译, 使用pdflatex编译")
return need_xelatex
except Exception:
return False
# 根据编译器类型返回编译命令
def get_compile_command(compiler, filename):
compile_command = f'{compiler} -interaction=batchmode -file-line-error {filename}.tex'
logger.info('Latex 编译指令: ' + compile_command)
return compile_command
# 确定使用的编译器
compiler = 'pdflatex'
if check_if_need_xelatex(pj(work_folder_modified, f'{main_file_modified}.tex')):
logger.info("检测到宏包需要xelatex编译切换至xelatex编译")
# Check if xelatex is installed
try:
import subprocess
subprocess.run(['xelatex', '--version'], capture_output=True, check=True)
compiler = 'xelatex'
except (subprocess.CalledProcessError, FileNotFoundError):
raise RuntimeError("检测到需要使用xelatex编译但系统中未安装xelatex。请先安装texlive或其他提供xelatex的LaTeX发行版。")
while True: while True:
import os import os
@@ -358,36 +396,36 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
shutil.copyfile(may_exist_bbl, target_bbl) shutil.copyfile(may_exist_bbl, target_bbl)
# https://stackoverflow.com/questions/738755/dont-make-me-manually-abort-a-latex-compile-when-theres-an-error # https://stackoverflow.com/questions/738755/dont-make-me-manually-abort-a-latex-compile-when-theres-an-error
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译原始PDF ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译原始PDF ...', chatbot, history) # 刷新Gradio前端界面
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original) ok = compile_latex_with_timeout(get_compile_command(compiler, main_file_original), work_folder_original)
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译转化后的PDF ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译转化后的PDF ...', chatbot, history) # 刷新Gradio前端界面
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified) ok = compile_latex_with_timeout(get_compile_command(compiler, main_file_modified), work_folder_modified)
if ok and os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf')): if ok and os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf')):
# 只有第二步成功,才能继续下面的步骤 # 只有第二步成功,才能继续下面的步骤
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译BibTex ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译BibTex ...', chatbot, history) # 刷新Gradio前端界面
if not os.path.exists(pj(work_folder_original, f'{main_file_original}.bbl')): if not os.path.exists(pj(work_folder_original, f'{main_file_original}.bbl')):
ok = compile_latex_with_timeout(f'bibtex {main_file_original}.aux', work_folder_original) ok = compile_latex_with_timeout(f'bibtex {main_file_original}.aux', work_folder_original)
if not os.path.exists(pj(work_folder_modified, f'{main_file_modified}.bbl')): if not os.path.exists(pj(work_folder_modified, f'{main_file_modified}.bbl')):
ok = compile_latex_with_timeout(f'bibtex {main_file_modified}.aux', work_folder_modified) ok = compile_latex_with_timeout(f'bibtex {main_file_modified}.aux', work_folder_modified)
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译文献交叉引用 ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译文献交叉引用 ...', chatbot, history) # 刷新Gradio前端界面
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original) ok = compile_latex_with_timeout(get_compile_command(compiler, main_file_original), work_folder_original)
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified) ok = compile_latex_with_timeout(get_compile_command(compiler, main_file_modified), work_folder_modified)
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original) ok = compile_latex_with_timeout(get_compile_command(compiler, main_file_original), work_folder_original)
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified) ok = compile_latex_with_timeout(get_compile_command(compiler, main_file_modified), work_folder_modified)
if mode!='translate_zh': if mode!='translate_zh':
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history) # 刷新Gradio前端界面
print( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex') logger.info( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex', os.getcwd()) ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex', os.getcwd())
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 正在编译对比PDF ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 正在编译对比PDF ...', chatbot, history) # 刷新Gradio前端界面
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder) ok = compile_latex_with_timeout(get_compile_command(compiler, 'merge_diff'), work_folder)
ok = compile_latex_with_timeout(f'bibtex merge_diff.aux', work_folder) ok = compile_latex_with_timeout(f'bibtex merge_diff.aux', work_folder)
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder) ok = compile_latex_with_timeout(get_compile_command(compiler, 'merge_diff'), work_folder)
ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder) ok = compile_latex_with_timeout(get_compile_command(compiler, 'merge_diff'), work_folder)
# <---------- 检查结果 -----------> # <---------- 检查结果 ----------->
results_ = "" results_ = ""
@@ -397,13 +435,13 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
results_ += f"原始PDF编译是否成功: {original_pdf_success};" results_ += f"原始PDF编译是否成功: {original_pdf_success};"
results_ += f"转化PDF编译是否成功: {modified_pdf_success};" results_ += f"转化PDF编译是否成功: {modified_pdf_success};"
results_ += f"对比PDF编译是否成功: {diff_pdf_success};" results_ += f"对比PDF编译是否成功: {diff_pdf_success};"
yield from update_ui_lastest_msg(f'{n_fix}编译结束:<br/>{results_}...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'{n_fix}编译结束:<br/>{results_}...', chatbot, history) # 刷新Gradio前端界面
if diff_pdf_success: if diff_pdf_success:
result_pdf = pj(work_folder_modified, f'merge_diff.pdf') # get pdf path result_pdf = pj(work_folder_modified, f'merge_diff.pdf') # get pdf path
promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
if modified_pdf_success: if modified_pdf_success:
yield from update_ui_lastest_msg(f'转化PDF编译已经成功, 正在尝试生成对比PDF, 请稍候 ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'转化PDF编译已经成功, 正在尝试生成对比PDF, 请稍候 ...', chatbot, history) # 刷新Gradio前端界面
result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf') # get pdf path result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf') # get pdf path
origin_pdf = pj(work_folder_original, f'{main_file_original}.pdf') # get pdf path origin_pdf = pj(work_folder_original, f'{main_file_original}.pdf') # get pdf path
if os.path.exists(pj(work_folder, '..', 'translation')): if os.path.exists(pj(work_folder, '..', 'translation')):
@@ -419,7 +457,7 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
shutil.copyfile(concat_pdf, pj(work_folder, '..', 'translation', 'comparison.pdf')) shutil.copyfile(concat_pdf, pj(work_folder, '..', 'translation', 'comparison.pdf'))
promote_file_to_downloadzone(concat_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI promote_file_to_downloadzone(concat_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
except Exception as e: except Exception as e:
print(e) logger.error(e)
pass pass
return True # 成功啦 return True # 成功啦
else: else:
@@ -434,7 +472,7 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
work_folder_modified=work_folder_modified, work_folder_modified=work_folder_modified,
fixed_line=fixed_line fixed_line=fixed_line
) )
yield from update_ui_lastest_msg(f'由于最为关键的转化PDF编译失败, 将根据报错信息修正tex源文件并重试, 当前报错的latex代码处于第{buggy_lines}行 ...', chatbot, history) # 刷新Gradio前端界面 yield from update_ui_latest_msg(f'由于最为关键的转化PDF编译失败, 将根据报错信息修正tex源文件并重试, 当前报错的latex代码处于第{buggy_lines}行 ...', chatbot, history) # 刷新Gradio前端界面
if not can_retry: break if not can_retry: break
return False # 失败啦 return False # 失败啦
@@ -465,4 +503,71 @@ def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
promote_file_to_downloadzone(file=res, chatbot=chatbot) promote_file_to_downloadzone(file=res, chatbot=chatbot)
except: except:
from toolbox import trimmed_format_exc from toolbox import trimmed_format_exc
print('writing html result failed:', trimmed_format_exc()) logger.error('writing html result failed:', trimmed_format_exc())
def upload_to_gptac_cloud_if_user_allow(chatbot, arxiv_id):
try:
# 如果用户允许我们将arxiv论文PDF上传到GPTAC学术云
from toolbox import map_file_to_sha256
# 检查是否顺利,如果没有生成预期的文件,则跳过
is_result_good = False
for file_path in chatbot._cookies.get("files_to_promote", []):
if file_path.endswith('translate_zh.pdf'):
is_result_good = True
if not is_result_good:
return
# 上传文件
for file_path in chatbot._cookies.get("files_to_promote", []):
align_name = None
# normalized name
for name in ['translate_zh.pdf', 'comparison.pdf']:
if file_path.endswith(name): align_name = name
# if match any align name
if align_name:
logger.info(f'Uploading to GPTAC cloud as the user has set `allow_cloud_io`: {file_path}')
with open(file_path, 'rb') as f:
import requests
url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_upload'
files = {'file': (align_name, f, 'application/octet-stream')}
data = {
'arxiv_id': arxiv_id,
'file_hash': map_file_to_sha256(file_path),
'language': 'zh',
'trans_prompt': 'to_be_implemented',
'llm_model': 'to_be_implemented',
'llm_model_param': 'to_be_implemented',
}
resp = requests.post(url=url, files=files, data=data, timeout=30)
logger.info(f'Uploading terminate ({resp.status_code})`: {file_path}')
except:
# 如果上传失败,不会中断程序,因为这是次要功能
pass
def check_gptac_cloud(arxiv_id, chatbot):
import requests
success = False
downloaded = []
try:
for pdf_target in ['translate_zh.pdf', 'comparison.pdf']:
url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_exist'
data = {
'arxiv_id': arxiv_id,
'name': pdf_target,
}
resp = requests.post(url=url, data=data)
cache_hit_result = resp.text.strip('"')
if cache_hit_result.startswith("http"):
url = cache_hit_result
logger.info(f'Downloading from GPTAC cloud: {url}')
resp = requests.get(url=url, timeout=30)
target = os.path.join(get_log_folder(plugin_name='gptac_cloud'), gen_time_str(), pdf_target)
os.makedirs(os.path.dirname(target), exist_ok=True)
with open(target, 'wb') as f:
f.write(resp.content)
new_path = promote_file_to_downloadzone(target, chatbot=chatbot)
success = True
downloaded.append(new_path)
except:
pass
return success, downloaded

View File

@@ -6,12 +6,16 @@ class SafeUnpickler(pickle.Unpickler):
def get_safe_classes(self): def get_safe_classes(self):
from crazy_functions.latex_fns.latex_actions import LatexPaperFileGroup, LatexPaperSplit from crazy_functions.latex_fns.latex_actions import LatexPaperFileGroup, LatexPaperSplit
from crazy_functions.latex_fns.latex_toolbox import LinkedListNode from crazy_functions.latex_fns.latex_toolbox import LinkedListNode
from numpy.core.multiarray import scalar
from numpy import dtype
# 定义允许的安全类 # 定义允许的安全类
safe_classes = { safe_classes = {
# 在这里添加其他安全的类 # 在这里添加其他安全的类
'LatexPaperFileGroup': LatexPaperFileGroup, 'LatexPaperFileGroup': LatexPaperFileGroup,
'LatexPaperSplit': LatexPaperSplit, 'LatexPaperSplit': LatexPaperSplit,
'LinkedListNode': LinkedListNode, 'LinkedListNode': LinkedListNode,
'scalar': scalar,
'dtype': dtype,
} }
return safe_classes return safe_classes
@@ -22,8 +26,6 @@ class SafeUnpickler(pickle.Unpickler):
for class_name in self.safe_classes.keys(): for class_name in self.safe_classes.keys():
if (class_name in f'{module}.{name}'): if (class_name in f'{module}.{name}'):
match_class_name = class_name match_class_name = class_name
if module == 'numpy' or module.startswith('numpy.'):
return super().find_class(module, name)
if match_class_name is not None: if match_class_name is not None:
return self.safe_classes[match_class_name] return self.safe_classes[match_class_name]
# 如果尝试加载未授权的类,则抛出异常 # 如果尝试加载未授权的类,则抛出异常

View File

@@ -1,6 +1,8 @@
import os, shutil import os
import re import re
import shutil
import numpy as np import numpy as np
from loguru import logger
PRESERVE = 0 PRESERVE = 0
TRANSFORM = 1 TRANSFORM = 1
@@ -55,7 +57,7 @@ def post_process(root):
str_stack.append("{") str_stack.append("{")
elif c == "}": elif c == "}":
if len(str_stack) == 1: if len(str_stack) == 1:
print("stack fix") logger.warning("fixing brace error")
return i return i
str_stack.pop(-1) str_stack.pop(-1)
else: else:
@@ -166,7 +168,7 @@ def set_forbidden_text(text, mask, pattern, flags=0):
def reverse_forbidden_text(text, mask, pattern, flags=0, forbid_wrapper=True): def reverse_forbidden_text(text, mask, pattern, flags=0, forbid_wrapper=True):
""" """
Move area out of preserve area (make text editable for GPT) Move area out of preserve area (make text editable for GPT)
count the number of the braces so as to catch compelete text area. count the number of the braces so as to catch complete text area.
e.g. e.g.
\begin{abstract} blablablablablabla. \end{abstract} \begin{abstract} blablablablablabla. \end{abstract}
""" """
@@ -186,7 +188,7 @@ def reverse_forbidden_text(text, mask, pattern, flags=0, forbid_wrapper=True):
def set_forbidden_text_careful_brace(text, mask, pattern, flags=0): def set_forbidden_text_careful_brace(text, mask, pattern, flags=0):
""" """
Add a preserve text area in this paper (text become untouchable for GPT). Add a preserve text area in this paper (text become untouchable for GPT).
count the number of the braces so as to catch compelete text area. count the number of the braces so as to catch complete text area.
e.g. e.g.
\caption{blablablablabla\texbf{blablabla}blablabla.} \caption{blablablablabla\texbf{blablabla}blablabla.}
""" """
@@ -212,7 +214,7 @@ def reverse_forbidden_text_careful_brace(
): ):
""" """
Move area out of preserve area (make text editable for GPT) Move area out of preserve area (make text editable for GPT)
count the number of the braces so as to catch compelete text area. count the number of the braces so as to catch complete text area.
e.g. e.g.
\caption{blablablablabla\texbf{blablabla}blablabla.} \caption{blablablablabla\texbf{blablabla}blablabla.}
""" """
@@ -285,23 +287,23 @@ def find_main_tex_file(file_manifest, mode):
在多Tex文档中寻找主文件必须包含documentclass返回找到的第一个。 在多Tex文档中寻找主文件必须包含documentclass返回找到的第一个。
P.S. 但愿没人把latex模板放在里面传进来 (6.25 加入判定latex模板的代码) P.S. 但愿没人把latex模板放在里面传进来 (6.25 加入判定latex模板的代码)
""" """
canidates = [] candidates = []
for texf in file_manifest: for texf in file_manifest:
if os.path.basename(texf).startswith("merge"): if os.path.basename(texf).startswith("merge"):
continue continue
with open(texf, "r", encoding="utf8", errors="ignore") as f: with open(texf, "r", encoding="utf8", errors="ignore") as f:
file_content = f.read() file_content = f.read()
if r"\documentclass" in file_content: if r"\documentclass" in file_content:
canidates.append(texf) candidates.append(texf)
else: else:
continue continue
if len(canidates) == 0: if len(candidates) == 0:
raise RuntimeError("无法找到一个主Tex文件包含documentclass关键字") raise RuntimeError("无法找到一个主Tex文件包含documentclass关键字")
elif len(canidates) == 1: elif len(candidates) == 1:
return canidates[0] return candidates[0]
else: # if len(canidates) >= 2 通过一些Latex模板中常见但通常不会出现在正文的单词对不同latex源文件扣分取评分最高者返回 else: # if len(candidates) >= 2 通过一些Latex模板中常见但通常不会出现在正文的单词对不同latex源文件扣分取评分最高者返回
canidates_score = [] candidates_score = []
# 给出一些判定模板文档的词作为扣分项 # 给出一些判定模板文档的词作为扣分项
unexpected_words = [ unexpected_words = [
"\\LaTeX", "\\LaTeX",
@@ -314,19 +316,19 @@ def find_main_tex_file(file_manifest, mode):
"reviewers", "reviewers",
] ]
expected_words = ["\\input", "\\ref", "\\cite"] expected_words = ["\\input", "\\ref", "\\cite"]
for texf in canidates: for texf in candidates:
canidates_score.append(0) candidates_score.append(0)
with open(texf, "r", encoding="utf8", errors="ignore") as f: with open(texf, "r", encoding="utf8", errors="ignore") as f:
file_content = f.read() file_content = f.read()
file_content = rm_comments(file_content) file_content = rm_comments(file_content)
for uw in unexpected_words: for uw in unexpected_words:
if uw in file_content: if uw in file_content:
canidates_score[-1] -= 1 candidates_score[-1] -= 1
for uw in expected_words: for uw in expected_words:
if uw in file_content: if uw in file_content:
canidates_score[-1] += 1 candidates_score[-1] += 1
select = np.argmax(canidates_score) # 取评分最高者返回 select = np.argmax(candidates_score) # 取评分最高者返回
return canidates[select] return candidates[select]
def rm_comments(main_file): def rm_comments(main_file):
@@ -372,7 +374,7 @@ def find_tex_file_ignore_case(fp):
def merge_tex_files_(project_foler, main_file, mode): def merge_tex_files_(project_foler, main_file, mode):
""" """
Merge Tex project recrusively Merge Tex project recursively
""" """
main_file = rm_comments(main_file) main_file = rm_comments(main_file)
for s in reversed([q for q in re.finditer(r"\\input\{(.*?)\}", main_file, re.M)]): for s in reversed([q for q in re.finditer(r"\\input\{(.*?)\}", main_file, re.M)]):
@@ -427,7 +429,7 @@ def find_title_and_abs(main_file):
def merge_tex_files(project_foler, main_file, mode): def merge_tex_files(project_foler, main_file, mode):
""" """
Merge Tex project recrusively Merge Tex project recursively
P.S. 顺便把CTEX塞进去以支持中文 P.S. 顺便把CTEX塞进去以支持中文
P.S. 顺便把Latex的注释去除 P.S. 顺便把Latex的注释去除
""" """
@@ -601,7 +603,7 @@ def compile_latex_with_timeout(command, cwd, timeout=60):
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
process.kill() process.kill()
stdout, stderr = process.communicate() stdout, stderr = process.communicate()
print("Process timed out!") logger.error("Process timed out (compile_latex_with_timeout)!")
return False return False
return True return True
@@ -642,6 +644,216 @@ def run_in_subprocess(func):
def _merge_pdfs(pdf1_path, pdf2_path, output_path): def _merge_pdfs(pdf1_path, pdf2_path, output_path):
try:
logger.info("Merging PDFs using _merge_pdfs_ng")
_merge_pdfs_ng(pdf1_path, pdf2_path, output_path)
except:
logger.info("Merging PDFs using _merge_pdfs_legacy")
_merge_pdfs_legacy(pdf1_path, pdf2_path, output_path)
def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题把它放到子进程中运行从而方便内存的释放
from PyPDF2.generic import NameObject, TextStringObject, ArrayObject, FloatObject, NumberObject
Percent = 1
# raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.')
# Open the first PDF file
with open(pdf1_path, "rb") as pdf1_file:
pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
# Open the second PDF file
with open(pdf2_path, "rb") as pdf2_file:
pdf2_reader = PyPDF2.PdfFileReader(pdf2_file)
# Create a new PDF file to store the merged pages
output_writer = PyPDF2.PdfFileWriter()
# Determine the number of pages in each PDF file
num_pages = max(pdf1_reader.numPages, pdf2_reader.numPages)
# Merge the pages from the two PDF files
for page_num in range(num_pages):
# Add the page from the first PDF file
if page_num < pdf1_reader.numPages:
page1 = pdf1_reader.getPage(page_num)
else:
page1 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
# Add the page from the second PDF file
if page_num < pdf2_reader.numPages:
page2 = pdf2_reader.getPage(page_num)
else:
page2 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
# Create a new empty page with double width
new_page = PyPDF2.PageObject.createBlankPage(
width=int(
int(page1.mediaBox.getWidth())
+ int(page2.mediaBox.getWidth()) * Percent
),
height=max(page1.mediaBox.getHeight(), page2.mediaBox.getHeight()),
)
new_page.mergeTranslatedPage(page1, 0, 0)
new_page.mergeTranslatedPage(
page2,
int(
int(page1.mediaBox.getWidth())
- int(page2.mediaBox.getWidth()) * (1 - Percent)
),
0,
)
if "/Annots" in new_page:
annotations = new_page["/Annots"]
for i, annot in enumerate(annotations):
annot_obj = annot.get_object()
# 检查注释类型是否是链接(/Link
if annot_obj.get("/Subtype") == "/Link":
# 检查是否为内部链接跳转(/GoTo或外部URI链接/URI
action = annot_obj.get("/A")
if action:
if "/S" in action and action["/S"] == "/GoTo":
# 内部链接:跳转到文档中的某个页面
dest = action.get("/D") # 目标页或目标位置
# if dest and annot.idnum in page2_annot_id:
# if dest in pdf2_reader.named_destinations:
if dest and page2.annotations:
if annot in page2.annotations:
# 获取原始文件中跳转信息,包括跳转页面
destination = pdf2_reader.named_destinations[
dest
]
page_number = (
pdf2_reader.get_destination_page_number(
destination
)
)
# 更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
# “/D”:[10,'/XYZ',100,100,0]
if destination.dest_array[1] == "/XYZ":
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
FloatObject(
destination.dest_array[
2
]
+ int(
page1.mediaBox.getWidth()
)
),
destination.dest_array[3],
destination.dest_array[4],
]
) # 确保键和值是 PdfObject
}
)
else:
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
]
) # 确保键和值是 PdfObject
}
)
rect = annot_obj.get("/Rect")
# 更新点击坐标
rect = ArrayObject(
[
FloatObject(
rect[0]
+ int(page1.mediaBox.getWidth())
),
rect[1],
FloatObject(
rect[2]
+ int(page1.mediaBox.getWidth())
),
rect[3],
]
)
annot_obj.update(
{
NameObject(
"/Rect"
): rect # 确保键和值是 PdfObject
}
)
# if dest and annot.idnum in page1_annot_id:
# if dest in pdf1_reader.named_destinations:
if dest and page1.annotations:
if annot in page1.annotations:
# 获取原始文件中跳转信息,包括跳转页面
destination = pdf1_reader.named_destinations[
dest
]
page_number = (
pdf1_reader.get_destination_page_number(
destination
)
)
# 更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
# “/D”:[10,'/XYZ',100,100,0]
if destination.dest_array[1] == "/XYZ":
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
FloatObject(
destination.dest_array[
2
]
),
destination.dest_array[3],
destination.dest_array[4],
]
) # 确保键和值是 PdfObject
}
)
else:
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
]
) # 确保键和值是 PdfObject
}
)
rect = annot_obj.get("/Rect")
rect = ArrayObject(
[
FloatObject(rect[0]),
rect[1],
FloatObject(rect[2]),
rect[3],
]
)
annot_obj.update(
{
NameObject(
"/Rect"
): rect # 确保键和值是 PdfObject
}
)
elif "/S" in action and action["/S"] == "/URI":
# 外部链接跳转到某个URI
uri = action.get("/URI")
output_writer.addPage(new_page)
# Save the merged PDF file
with open(output_path, "wb") as output_file:
output_writer.write(output_file)
def _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path):
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题把它放到子进程中运行从而方便内存的释放 import PyPDF2 # PyPDF2这个库有严重的内存泄露问题把它放到子进程中运行从而方便内存的释放
Percent = 0.95 Percent = 0.95

View File

@@ -1,5 +1,6 @@
import time, logging, json, sys, struct import time, json, sys, struct
import numpy as np import numpy as np
from loguru import logger as logging
from scipy.io.wavfile import WAVE_FORMAT from scipy.io.wavfile import WAVE_FORMAT
def write_numpy_to_wave(filename, rate, data, add_header=False): def write_numpy_to_wave(filename, rate, data, add_header=False):
@@ -106,18 +107,14 @@ def is_speaker_speaking(vad, data, sample_rate):
class AliyunASR(): class AliyunASR():
def test_on_sentence_begin(self, message, *args): def test_on_sentence_begin(self, message, *args):
# print("test_on_sentence_begin:{}".format(message))
pass pass
def test_on_sentence_end(self, message, *args): def test_on_sentence_end(self, message, *args):
# print("test_on_sentence_end:{}".format(message))
message = json.loads(message) message = json.loads(message)
self.parsed_sentence = message['payload']['result'] self.parsed_sentence = message['payload']['result']
self.event_on_entence_end.set() self.event_on_entence_end.set()
# print(self.parsed_sentence)
def test_on_start(self, message, *args): def test_on_start(self, message, *args):
# print("test_on_start:{}".format(message))
pass pass
def test_on_error(self, message, *args): def test_on_error(self, message, *args):
@@ -129,13 +126,11 @@ class AliyunASR():
pass pass
def test_on_result_chg(self, message, *args): def test_on_result_chg(self, message, *args):
# print("test_on_chg:{}".format(message))
message = json.loads(message) message = json.loads(message)
self.parsed_text = message['payload']['result'] self.parsed_text = message['payload']['result']
self.event_on_result_chg.set() self.event_on_result_chg.set()
def test_on_completed(self, message, *args): def test_on_completed(self, message, *args):
# print("on_completed:args=>{} message=>{}".format(args, message))
pass pass
def audio_convertion_thread(self, uuid): def audio_convertion_thread(self, uuid):
@@ -248,14 +243,14 @@ class AliyunASR():
try: try:
response = client.do_action_with_exception(request) response = client.do_action_with_exception(request)
print(response) logging.info(response)
jss = json.loads(response) jss = json.loads(response)
if 'Token' in jss and 'Id' in jss['Token']: if 'Token' in jss and 'Id' in jss['Token']:
token = jss['Token']['Id'] token = jss['Token']['Id']
expireTime = jss['Token']['ExpireTime'] expireTime = jss['Token']['ExpireTime']
print("token = " + token) logging.info("token = " + token)
print("expireTime = " + str(expireTime)) logging.info("expireTime = " + str(expireTime))
except Exception as e: except Exception as e:
print(e) logging.error(e)
return token return token

View File

@@ -0,0 +1,43 @@
from toolbox import update_ui, get_conf, promote_file_to_downloadzone, update_ui_latest_msg, generate_file_link
from shared_utils.docker_as_service_api import stream_daas
from shared_utils.docker_as_service_api import DockerServiceApiComModel
import random
def download_video(video_id, only_audio, user_name, chatbot, history):
from toolbox import get_log_folder
chatbot.append([None, "Processing..."])
yield from update_ui(chatbot, history)
client_command = f'{video_id} --audio-only' if only_audio else video_id
server_urls = get_conf('DAAS_SERVER_URLS')
server_url = random.choice(server_urls)
docker_service_api_com_model = DockerServiceApiComModel(client_command=client_command)
save_file_dir = get_log_folder(user_name, plugin_name='media_downloader')
for output_manifest in stream_daas(docker_service_api_com_model, server_url, save_file_dir):
status_buf = ""
status_buf += "DaaS message: \n\n"
status_buf += output_manifest['server_message'].replace('\n', '<br/>')
status_buf += "\n\n"
status_buf += "DaaS standard error: \n\n"
status_buf += output_manifest['server_std_err'].replace('\n', '<br/>')
status_buf += "\n\n"
status_buf += "DaaS standard output: \n\n"
status_buf += output_manifest['server_std_out'].replace('\n', '<br/>')
status_buf += "\n\n"
status_buf += "DaaS file attach: \n\n"
status_buf += str(output_manifest['server_file_attach'])
yield from update_ui_latest_msg(status_buf, chatbot, history)
return output_manifest['server_file_attach']
def search_videos(keywords):
from toolbox import get_log_folder
client_command = keywords
server_urls = get_conf('DAAS_SERVER_URLS')
server_url = random.choice(server_urls)
server_url = server_url.replace('stream', 'search')
docker_service_api_com_model = DockerServiceApiComModel(client_command=client_command)
save_file_dir = get_log_folder("default_user", plugin_name='media_downloader')
for output_manifest in stream_daas(docker_service_api_com_model, server_url, save_file_dir):
return output_manifest['server_message']

View File

@@ -1,6 +1,6 @@
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing import List from typing import List
from toolbox import update_ui_lastest_msg, disable_auto_promotion from toolbox import update_ui_latest_msg, disable_auto_promotion
from toolbox import CatchException, update_ui, get_conf, select_api_key, get_log_folder from toolbox import CatchException, update_ui, get_conf, select_api_key, get_log_folder
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError

View File

@@ -1,4 +1,5 @@
from crazy_functions.ipc_fns.mp import run_in_subprocess_with_timeout from crazy_functions.ipc_fns.mp import run_in_subprocess_with_timeout
from loguru import logger
def force_breakdown(txt, limit, get_token_fn): def force_breakdown(txt, limit, get_token_fn):
""" 当无法用标点、空行分割时,我们用最暴力的方法切割 """ 当无法用标点、空行分割时,我们用最暴力的方法切割
@@ -76,7 +77,7 @@ def cut(limit, get_token_fn, txt_tocut, must_break_at_empty_line, break_anyway=F
remain_txt_to_cut = post remain_txt_to_cut = post
remain_txt_to_cut, remain_txt_to_cut_storage = maintain_storage(remain_txt_to_cut, remain_txt_to_cut_storage) remain_txt_to_cut, remain_txt_to_cut_storage = maintain_storage(remain_txt_to_cut, remain_txt_to_cut_storage)
process = fin_len/total_len process = fin_len/total_len
print(f'正在文本切分 {int(process*100)}%') logger.info(f'正在文本切分 {int(process*100)}%')
if len(remain_txt_to_cut.strip()) == 0: if len(remain_txt_to_cut.strip()) == 0:
break break
return res return res
@@ -119,7 +120,7 @@ if __name__ == '__main__':
for i in range(5): for i in range(5):
file_content += file_content file_content += file_content
print(len(file_content)) logger.info(len(file_content))
TOKEN_LIMIT_PER_FRAGMENT = 2500 TOKEN_LIMIT_PER_FRAGMENT = 2500
res = breakdown_text_to_satisfy_token_limit(file_content, TOKEN_LIMIT_PER_FRAGMENT) res = breakdown_text_to_satisfy_token_limit(file_content, TOKEN_LIMIT_PER_FRAGMENT)

View File

@@ -113,7 +113,7 @@ def translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_fi
return [txt] return [txt]
else: else:
# raw_token_num > TOKEN_LIMIT_PER_FRAGMENT # raw_token_num > TOKEN_LIMIT_PER_FRAGMENT
# find a smooth token limit to achieve even seperation # find a smooth token limit to achieve even separation
count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT)) count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT))
token_limit_smooth = raw_token_num // count + count token_limit_smooth = raw_token_num // count + count
return breakdown_text_to_satisfy_token_limit(txt, limit=token_limit_smooth, llm_model=llm_kwargs['llm_model']) return breakdown_text_to_satisfy_token_limit(txt, limit=token_limit_smooth, llm_model=llm_kwargs['llm_model'])

View File

@@ -1,6 +1,6 @@
import os import os
from toolbox import CatchException, report_exception, get_log_folder, gen_time_str, check_packages from toolbox import CatchException, report_exception, get_log_folder, gen_time_str, check_packages
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion from toolbox import update_ui, promote_file_to_downloadzone, update_ui_latest_msg, disable_auto_promotion
from toolbox import write_history_to_file, promote_file_to_downloadzone, get_conf, extract_archive from toolbox import write_history_to_file, promote_file_to_downloadzone, get_conf, extract_archive
from crazy_functions.pdf_fns.parse_pdf import parse_pdf, translate_pdf from crazy_functions.pdf_fns.parse_pdf import parse_pdf, translate_pdf

View File

@@ -5,6 +5,7 @@ from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import read_and_clean_pdf_text from crazy_functions.crazy_utils import read_and_clean_pdf_text
from shared_utils.colorful import * from shared_utils.colorful import *
from loguru import logger
import os import os
def 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
@@ -93,7 +94,7 @@ def 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwa
generated_html_files.append(ch.save_file(create_report_file_name)) generated_html_files.append(ch.save_file(create_report_file_name))
except: except:
from toolbox import trimmed_format_exc from toolbox import trimmed_format_exc
print('writing html result failed:', trimmed_format_exc()) logger.error('writing html result failed:', trimmed_format_exc())
# 准备文件的下载 # 准备文件的下载
for pdf_path in generated_conclusion_files: for pdf_path in generated_conclusion_files:

View File

@@ -4,126 +4,228 @@ from toolbox import promote_file_to_downloadzone, extract_archive
from toolbox import generate_file_link, zip_folder from toolbox import generate_file_link, zip_folder
from crazy_functions.crazy_utils import get_files_from_everything from crazy_functions.crazy_utils import get_files_from_everything
from shared_utils.colorful import * from shared_utils.colorful import *
from loguru import logger
import os import os
import requests
import time
def retry_request(max_retries=3, delay=3):
"""
Decorator for retrying HTTP requests
Args:
max_retries: Maximum number of retry attempts
delay: Delay between retries in seconds
"""
def decorator(func):
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt < max_retries - 1:
logger.error(
f"Request failed, retrying... ({attempt + 1}/{max_retries}) Error: {e}"
)
time.sleep(delay)
continue
raise e
return None
return wrapper
return decorator
@retry_request()
def make_request(method, url, **kwargs):
"""
Make HTTP request with retry mechanism
"""
return requests.request(method, url, **kwargs)
def doc2x_api_response_status(response, uid=""):
"""
Check the status of Doc2x API response
Args:
response_data: Response object from Doc2x API
"""
response_json = response.json()
response_data = response_json.get("data", {})
code = response_json.get("code", "Unknown")
meg = response_data.get("message", response_json)
trace_id = response.headers.get("trace-id", "Failed to get trace-id")
if response.status_code != 200:
raise RuntimeError(
f"Doc2x return an error:\nTrace ID: {trace_id} {uid}\n{response.status_code} - {response_json}"
)
if code in ["parse_page_limit_exceeded", "parse_concurrency_limit"]:
raise RuntimeError(
f"Reached the limit of Doc2x:\nTrace ID: {trace_id} {uid}\n{code} - {meg}"
)
if code not in ["ok", "success"]:
raise RuntimeError(
f"Doc2x return an error:\nTrace ID: {trace_id} {uid}\n{code} - {meg}"
)
return response_data
def refresh_key(doc2x_api_key):
import requests, json
url = "https://api.doc2x.noedgeai.com/api/token/refresh"
res = requests.post(
url,
headers={"Authorization": "Bearer " + doc2x_api_key}
)
res_json = []
if res.status_code == 200:
decoded = res.content.decode("utf-8")
res_json = json.loads(decoded)
doc2x_api_key = res_json['data']['token']
else:
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
return doc2x_api_key
def 解析PDF_DOC2X_转Latex(pdf_file_path): def 解析PDF_DOC2X_转Latex(pdf_file_path):
import requests, json, os zip_file_path, unzipped_folder = 解析PDF_DOC2X(pdf_file_path, format="tex")
DOC2X_API_KEY = get_conf('DOC2X_API_KEY') return unzipped_folder
def 解析PDF_DOC2X(pdf_file_path, format="tex"):
"""
format: 'tex', 'md', 'docx'
"""
DOC2X_API_KEY = get_conf("DOC2X_API_KEY")
latex_dir = get_log_folder(plugin_name="pdf_ocr_latex") latex_dir = get_log_folder(plugin_name="pdf_ocr_latex")
markdown_dir = get_log_folder(plugin_name="pdf_ocr")
doc2x_api_key = DOC2X_API_KEY doc2x_api_key = DOC2X_API_KEY
if doc2x_api_key.startswith('sk-'):
url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
else:
doc2x_api_key = refresh_key(doc2x_api_key)
url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
res = requests.post( # < ------ 第1步预上传获取URL然后上传文件 ------ >
url, logger.info("Doc2x 上传文件预上传获取URL")
files={"file": open(pdf_file_path, "rb")}, res = make_request(
data={"ocr": "1"}, "POST",
headers={"Authorization": "Bearer " + doc2x_api_key} "https://v2.doc2x.noedgeai.com/api/v2/parse/preupload",
headers={"Authorization": "Bearer " + doc2x_api_key},
timeout=15,
) )
res_json = [] res_data = doc2x_api_response_status(res)
if res.status_code == 200: upload_url = res_data["url"]
decoded = res.content.decode("utf-8") uuid = res_data["uid"]
for z_decoded in decoded.split('\n'):
if len(z_decoded) == 0: continue
assert z_decoded.startswith("data: ")
z_decoded = z_decoded[len("data: "):]
decoded_json = json.loads(z_decoded)
res_json.append(decoded_json)
else:
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
uuid = res_json[0]['uuid'] logger.info("Doc2x 上传文件:上传文件")
to = "latex" # latex, md, docx with open(pdf_file_path, "rb") as file:
url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to res = make_request("PUT", upload_url, data=file, timeout=60)
res.raise_for_status()
res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key}) # < ------ 第2步轮询等待 ------ >
latex_zip_path = os.path.join(latex_dir, gen_time_str() + '.zip') logger.info("Doc2x 处理文件中:轮询等待")
latex_unzip_path = os.path.join(latex_dir, gen_time_str()) params = {"uid": uuid}
if res.status_code == 200: max_attempts = 60
with open(latex_zip_path, "wb") as f: f.write(res.content) attempt = 0
else: while attempt < max_attempts:
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text))) res = make_request(
"GET",
import zipfile "https://v2.doc2x.noedgeai.com/api/v2/parse/status",
with zipfile.ZipFile(latex_zip_path, 'r') as zip_ref: headers={"Authorization": "Bearer " + doc2x_api_key},
zip_ref.extractall(latex_unzip_path) params=params,
timeout=15,
return latex_unzip_path
def 解析PDF_DOC2X_单文件(fp, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request):
def pdf2markdown(filepath):
import requests, json, os
markdown_dir = get_log_folder(plugin_name="pdf_ocr")
doc2x_api_key = DOC2X_API_KEY
if doc2x_api_key.startswith('sk-'):
url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
else:
doc2x_api_key = refresh_key(doc2x_api_key)
url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
chatbot.append((None, "加载PDF文件发送至DOC2X解析..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
res = requests.post(
url,
files={"file": open(filepath, "rb")},
data={"ocr": "1"},
headers={"Authorization": "Bearer " + doc2x_api_key}
) )
res_json = [] res_data = doc2x_api_response_status(res)
if res.status_code == 200: if res_data["status"] == "success":
decoded = res.content.decode("utf-8") break
for z_decoded in decoded.split('\n'): elif res_data["status"] == "processing":
if len(z_decoded) == 0: continue time.sleep(5)
assert z_decoded.startswith("data: ") logger.info(f"Doc2x is processing at {res_data['progress']}%")
z_decoded = z_decoded[len("data: "):] attempt += 1
decoded_json = json.loads(z_decoded)
res_json.append(decoded_json)
if 'limit exceeded' in decoded_json.get('status', ''):
raise RuntimeError("Doc2x API 页数受限,请联系 Doc2x 方面,并更换新的 API 秘钥。")
else: else:
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text))) raise RuntimeError(f"Doc2x return an error: {res_data}")
uuid = res_json[0]['uuid'] if attempt >= max_attempts:
to = "md" # latex, md, docx raise RuntimeError("Doc2x processing timeout after maximum attempts")
url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
chatbot.append((None, f"读取解析: {url} ...")) # < ------ 第3步提交转化 ------ >
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 logger.info("Doc2x 第3步提交转化")
data = {
"uid": uuid,
"to": format,
"formula_mode": "dollar",
"filename": "output"
}
res = make_request(
"POST",
"https://v2.doc2x.noedgeai.com/api/v2/convert/parse",
headers={"Authorization": "Bearer " + doc2x_api_key},
json=data,
timeout=15,
)
doc2x_api_response_status(res, uid=f"uid: {uuid}")
# < ------ 第4步等待结果 ------ >
logger.info("Doc2x 第4步等待结果")
params = {"uid": uuid}
max_attempts = 36
attempt = 0
while attempt < max_attempts:
res = make_request(
"GET",
"https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result",
headers={"Authorization": "Bearer " + doc2x_api_key},
params=params,
timeout=15,
)
res_data = doc2x_api_response_status(res, uid=f"uid: {uuid}")
if res_data["status"] == "success":
break
elif res_data["status"] == "processing":
time.sleep(3)
logger.info("Doc2x still processing to convert file")
attempt += 1
if attempt >= max_attempts:
raise RuntimeError("Doc2x conversion timeout after maximum attempts")
# < ------ 第5步最后的处理 ------ >
logger.info("Doc2x 第5步下载转换后的文件")
if format == "tex":
target_path = latex_dir
if format == "md":
target_path = markdown_dir
os.makedirs(target_path, exist_ok=True)
max_attempt = 3
# < ------ 下载 ------ >
for attempt in range(max_attempt):
try:
result_url = res_data["url"]
res = make_request("GET", result_url, timeout=60)
zip_path = os.path.join(target_path, gen_time_str() + ".zip")
unzip_path = os.path.join(target_path, gen_time_str())
if res.status_code == 200:
with open(zip_path, "wb") as f:
f.write(res.content)
else:
raise RuntimeError(f"Doc2x return an error: {res.json()}")
except Exception as e:
if attempt < max_attempt - 1:
logger.error(f"Failed to download uid = {uuid} file, retrying... {e}")
time.sleep(3)
continue
else:
raise e
# < ------ 解压 ------ >
import zipfile
with zipfile.ZipFile(zip_path, "r") as zip_ref:
zip_ref.extractall(unzip_path)
return zip_path, unzip_path
def 解析PDF_DOC2X_单文件(
fp,
project_folder,
llm_kwargs,
plugin_kwargs,
chatbot,
history,
system_prompt,
DOC2X_API_KEY,
user_request,
):
def pdf2markdown(filepath):
chatbot.append((None, f"Doc2x 解析中"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
md_zip_path, unzipped_folder = 解析PDF_DOC2X(filepath, format="md")
res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
md_zip_path = os.path.join(markdown_dir, gen_time_str() + '.zip')
if res.status_code == 200:
with open(md_zip_path, "wb") as f: f.write(res.content)
else:
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
promote_file_to_downloadzone(md_zip_path, chatbot=chatbot) promote_file_to_downloadzone(md_zip_path, chatbot=chatbot)
chatbot.append((None, f"完成解析 {md_zip_path} ...")) chatbot.append((None, f"完成解析 {md_zip_path} ..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return md_zip_path return md_zip_path
def deliver_to_markdown_plugin(md_zip_path, user_request): def deliver_to_markdown_plugin(md_zip_path, user_request):
@@ -137,77 +239,97 @@ def 解析PDF_DOC2X_单文件(fp, project_folder, llm_kwargs, plugin_kwargs, cha
os.makedirs(target_path_base, exist_ok=True) os.makedirs(target_path_base, exist_ok=True)
shutil.copyfile(md_zip_path, this_file_path) shutil.copyfile(md_zip_path, this_file_path)
ex_folder = this_file_path + ".extract" ex_folder = this_file_path + ".extract"
extract_archive( extract_archive(file_path=this_file_path, dest_dir=ex_folder)
file_path=this_file_path, dest_dir=ex_folder
)
# edit markdown files # edit markdown files
success, file_manifest, project_folder = get_files_from_everything(ex_folder, type='.md') success, file_manifest, project_folder = get_files_from_everything(
ex_folder, type=".md"
)
for generated_fp in file_manifest: for generated_fp in file_manifest:
# 修正一些公式问题 # 修正一些公式问题
with open(generated_fp, 'r', encoding='utf8') as f: with open(generated_fp, "r", encoding="utf8") as f:
content = f.read() content = f.read()
# 将公式中的\[ \]替换成$$ # 将公式中的\[ \]替换成$$
content = content.replace(r'\[', r'$$').replace(r'\]', r'$$') content = content.replace(r"\[", r"$$").replace(r"\]", r"$$")
# 将公式中的\( \)替换成$ # 将公式中的\( \)替换成$
content = content.replace(r'\(', r'$').replace(r'\)', r'$') content = content.replace(r"\(", r"$").replace(r"\)", r"$")
content = content.replace('```markdown', '\n').replace('```', '\n') content = content.replace("```markdown", "\n").replace("```", "\n")
with open(generated_fp, 'w', encoding='utf8') as f: with open(generated_fp, "w", encoding="utf8") as f:
f.write(content) f.write(content)
promote_file_to_downloadzone(generated_fp, chatbot=chatbot) promote_file_to_downloadzone(generated_fp, chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 生成在线预览html # 生成在线预览html
file_name = '在线预览翻译(原文)' + gen_time_str() + '.html' file_name = "在线预览翻译(原文)" + gen_time_str() + ".html"
preview_fp = os.path.join(ex_folder, file_name) preview_fp = os.path.join(ex_folder, file_name)
from shared_utils.advanced_markdown_format import markdown_convertion_for_file from shared_utils.advanced_markdown_format import (
markdown_convertion_for_file,
)
with open(generated_fp, "r", encoding="utf-8") as f: with open(generated_fp, "r", encoding="utf-8") as f:
md = f.read() md = f.read()
# # Markdown中使用不标准的表格需要在表格前加上一个emoji以便公式渲染 # # Markdown中使用不标准的表格需要在表格前加上一个emoji以便公式渲染
# md = re.sub(r'^<table>', r'.<table>', md, flags=re.MULTILINE) # md = re.sub(r'^<table>', r'.<table>', md, flags=re.MULTILINE)
html = markdown_convertion_for_file(md) html = markdown_convertion_for_file(md)
with open(preview_fp, "w", encoding="utf-8") as f: f.write(html) with open(preview_fp, "w", encoding="utf-8") as f:
f.write(html)
chatbot.append([None, f"生成在线预览:{generate_file_link([preview_fp])}"]) chatbot.append([None, f"生成在线预览:{generate_file_link([preview_fp])}"])
promote_file_to_downloadzone(preview_fp, chatbot=chatbot) promote_file_to_downloadzone(preview_fp, chatbot=chatbot)
chatbot.append((None, f"调用Markdown插件 {ex_folder} ...")) chatbot.append((None, f"调用Markdown插件 {ex_folder} ..."))
plugin_kwargs['markdown_expected_output_dir'] = ex_folder plugin_kwargs["markdown_expected_output_dir"] = ex_folder
translated_f_name = 'translated_markdown.md' translated_f_name = "translated_markdown.md"
generated_fp = plugin_kwargs['markdown_expected_output_path'] = os.path.join(ex_folder, translated_f_name) generated_fp = plugin_kwargs["markdown_expected_output_path"] = os.path.join(
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 ex_folder, translated_f_name
yield from Markdown英译中(ex_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request) )
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
yield from Markdown英译中(
ex_folder,
llm_kwargs,
plugin_kwargs,
chatbot,
history,
system_prompt,
user_request,
)
if os.path.exists(generated_fp): if os.path.exists(generated_fp):
# 修正一些公式问题 # 修正一些公式问题
with open(generated_fp, 'r', encoding='utf8') as f: content = f.read() with open(generated_fp, "r", encoding="utf8") as f:
content = content.replace('```markdown', '\n').replace('```', '\n') content = f.read()
content = content.replace("```markdown", "\n").replace("```", "\n")
# Markdown中使用不标准的表格需要在表格前加上一个emoji以便公式渲染 # Markdown中使用不标准的表格需要在表格前加上一个emoji以便公式渲染
# content = re.sub(r'^<table>', r'.<table>', content, flags=re.MULTILINE) # content = re.sub(r'^<table>', r'.<table>', content, flags=re.MULTILINE)
with open(generated_fp, 'w', encoding='utf8') as f: f.write(content) with open(generated_fp, "w", encoding="utf8") as f:
f.write(content)
# 生成在线预览html # 生成在线预览html
file_name = '在线预览翻译' + gen_time_str() + '.html' file_name = "在线预览翻译" + gen_time_str() + ".html"
preview_fp = os.path.join(ex_folder, file_name) preview_fp = os.path.join(ex_folder, file_name)
from shared_utils.advanced_markdown_format import markdown_convertion_for_file from shared_utils.advanced_markdown_format import (
markdown_convertion_for_file,
)
with open(generated_fp, "r", encoding="utf-8") as f: with open(generated_fp, "r", encoding="utf-8") as f:
md = f.read() md = f.read()
html = markdown_convertion_for_file(md) html = markdown_convertion_for_file(md)
with open(preview_fp, "w", encoding="utf-8") as f: f.write(html) with open(preview_fp, "w", encoding="utf-8") as f:
f.write(html)
promote_file_to_downloadzone(preview_fp, chatbot=chatbot) promote_file_to_downloadzone(preview_fp, chatbot=chatbot)
# 生成包含图片的压缩包 # 生成包含图片的压缩包
dest_folder = get_log_folder(chatbot.get_user()) dest_folder = get_log_folder(chatbot.get_user())
zip_name = '翻译后的带图文档.zip' zip_name = "翻译后的带图文档.zip"
zip_folder(source_folder=ex_folder, dest_folder=dest_folder, zip_name=zip_name) zip_folder(
source_folder=ex_folder, dest_folder=dest_folder, zip_name=zip_name
)
zip_fp = os.path.join(dest_folder, zip_name) zip_fp = os.path.join(dest_folder, zip_name)
promote_file_to_downloadzone(zip_fp, chatbot=chatbot) promote_file_to_downloadzone(zip_fp, chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
md_zip_path = yield from pdf2markdown(fp) md_zip_path = yield from pdf2markdown(fp)
yield from deliver_to_markdown_plugin(md_zip_path, user_request) yield from deliver_to_markdown_plugin(md_zip_path, user_request)
def 解析PDF_基于DOC2X(file_manifest, *args): def 解析PDF_基于DOC2X(file_manifest, *args):
for index, fp in enumerate(file_manifest): for index, fp in enumerate(file_manifest):
yield from 解析PDF_DOC2X_单文件(fp, *args) yield from 解析PDF_DOC2X_单文件(fp, *args)
return return

View File

@@ -14,17 +14,17 @@ def extract_text_from_files(txt, chatbot, history):
final_result(list):文本内容 final_result(list):文本内容
page_one(list):第一页内容/摘要 page_one(list):第一页内容/摘要
file_manifest(list):文件路径 file_manifest(list):文件路径
excption(string):需要用户手动处理的信息,如没出错则保持为空 exception(string):需要用户手动处理的信息,如没出错则保持为空
""" """
final_result = [] final_result = []
page_one = [] page_one = []
file_manifest = [] file_manifest = []
excption = "" exception = ""
if txt == "": if txt == "":
final_result.append(txt) final_result.append(txt)
return False, final_result, page_one, file_manifest, excption #如输入区内容不是文件则直接返回输入区内容 return False, final_result, page_one, file_manifest, exception #如输入区内容不是文件则直接返回输入区内容
#查找输入区内容中的文件 #查找输入区内容中的文件
file_pdf,pdf_manifest,folder_pdf = get_files_from_everything(txt, '.pdf') file_pdf,pdf_manifest,folder_pdf = get_files_from_everything(txt, '.pdf')
@@ -33,20 +33,20 @@ def extract_text_from_files(txt, chatbot, history):
file_doc,doc_manifest,folder_doc = get_files_from_everything(txt, '.doc') file_doc,doc_manifest,folder_doc = get_files_from_everything(txt, '.doc')
if file_doc: if file_doc:
excption = "word" exception = "word"
return False, final_result, page_one, file_manifest, excption return False, final_result, page_one, file_manifest, exception
file_num = len(pdf_manifest) + len(md_manifest) + len(word_manifest) file_num = len(pdf_manifest) + len(md_manifest) + len(word_manifest)
if file_num == 0: if file_num == 0:
final_result.append(txt) final_result.append(txt)
return False, final_result, page_one, file_manifest, excption #如输入区内容不是文件则直接返回输入区内容 return False, final_result, page_one, file_manifest, exception #如输入区内容不是文件则直接返回输入区内容
if file_pdf: if file_pdf:
try: # 尝试导入依赖,如果缺少依赖,则给出安装建议 try: # 尝试导入依赖,如果缺少依赖,则给出安装建议
import fitz import fitz
except: except:
excption = "pdf" exception = "pdf"
return False, final_result, page_one, file_manifest, excption return False, final_result, page_one, file_manifest, exception
for index, fp in enumerate(pdf_manifest): for index, fp in enumerate(pdf_manifest):
file_content, pdf_one = read_and_clean_pdf_text(fp) # 尝试按照章节切割PDF file_content, pdf_one = read_and_clean_pdf_text(fp) # 尝试按照章节切割PDF
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
@@ -72,8 +72,8 @@ def extract_text_from_files(txt, chatbot, history):
try: # 尝试导入依赖,如果缺少依赖,则给出安装建议 try: # 尝试导入依赖,如果缺少依赖,则给出安装建议
from docx import Document from docx import Document
except: except:
excption = "word_pip" exception = "word_pip"
return False, final_result, page_one, file_manifest, excption return False, final_result, page_one, file_manifest, exception
for index, fp in enumerate(word_manifest): for index, fp in enumerate(word_manifest):
doc = Document(fp) doc = Document(fp)
file_content = '\n'.join([p.text for p in doc.paragraphs]) file_content = '\n'.join([p.text for p in doc.paragraphs])
@@ -82,4 +82,4 @@ def extract_text_from_files(txt, chatbot, history):
final_result.append(file_content) final_result.append(file_content)
file_manifest.append(os.path.relpath(fp, folder_word)) file_manifest.append(os.path.relpath(fp, folder_word))
return True, final_result, page_one, file_manifest, excption return True, final_result, page_one, file_manifest, exception

View File

@@ -0,0 +1,138 @@
import atexit
from loguru import logger
from typing import List
from llama_index.core import Document
from llama_index.core.ingestion import run_transformations
from llama_index.core.schema import TextNode
from crazy_functions.rag_fns.vector_store_index import GptacVectorStoreIndex
from request_llms.embed_models.openai_embed import OpenAiEmbeddingModel
DEFAULT_QUERY_GENERATION_PROMPT = """\
Now, you have context information as below:
---------------------
{context_str}
---------------------
Answer the user request below (use the context information if necessary, otherwise you can ignore them):
---------------------
{query_str}
"""
QUESTION_ANSWER_RECORD = """\
{{
"type": "This is a previous conversation with the user",
"question": "{question}",
"answer": "{answer}",
}}
"""
class SaveLoad():
def does_checkpoint_exist(self, checkpoint_dir=None):
import os, glob
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if not os.path.exists(checkpoint_dir): return False
if len(glob.glob(os.path.join(checkpoint_dir, "*.json"))) == 0: return False
return True
def save_to_checkpoint(self, checkpoint_dir=None):
logger.info(f'saving vector store to: {checkpoint_dir}')
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
self.vs_index.storage_context.persist(persist_dir=checkpoint_dir)
def load_from_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if self.does_checkpoint_exist(checkpoint_dir=checkpoint_dir):
logger.info('loading checkpoint from disk')
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=checkpoint_dir)
self.vs_index = load_index_from_storage(storage_context, embed_model=self.embed_model)
return self.vs_index
else:
return self.create_new_vs()
def create_new_vs(self):
return GptacVectorStoreIndex.default_vector_store(embed_model=self.embed_model)
def purge(self):
import shutil
shutil.rmtree(self.checkpoint_dir, ignore_errors=True)
self.vs_index = self.create_new_vs(self.checkpoint_dir)
class LlamaIndexRagWorker(SaveLoad):
def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
self.debug_mode = True
self.embed_model = OpenAiEmbeddingModel(llm_kwargs)
self.user_name = user_name
self.checkpoint_dir = checkpoint_dir
if auto_load_checkpoint:
self.vs_index = self.load_from_checkpoint(checkpoint_dir)
else:
self.vs_index = self.create_new_vs()
atexit.register(lambda: self.save_to_checkpoint(checkpoint_dir))
def assign_embedding_model(self):
pass
def inspect_vector_store(self):
# This function is for debugging
self.vs_index.storage_context.index_store.to_dict()
docstore = self.vs_index.storage_context.docstore.docs
vector_store_preview = "\n".join([ f"{_id} | {tn.text}" for _id, tn in docstore.items() ])
logger.info('\n++ --------inspect_vector_store begin--------')
logger.info(vector_store_preview)
logger.info('oo --------inspect_vector_store end--------')
return vector_store_preview
def add_documents_to_vector_store(self, document_list: List[Document]):
"""
Adds a list of Document objects to the vector store after processing.
"""
documents = document_list
documents_nodes = run_transformations(
documents, # type: ignore
self.vs_index._transformations,
show_progress=True
)
self.vs_index.insert_nodes(documents_nodes)
if self.debug_mode:
self.inspect_vector_store()
def add_text_to_vector_store(self, text: str):
node = TextNode(text=text)
documents_nodes = run_transformations(
[node],
self.vs_index._transformations,
show_progress=True
)
self.vs_index.insert_nodes(documents_nodes)
if self.debug_mode:
self.inspect_vector_store()
def remember_qa(self, question, answer):
formatted_str = QUESTION_ANSWER_RECORD.format(question=question, answer=answer)
self.add_text_to_vector_store(formatted_str)
def retrieve_from_store_with_query(self, query):
if self.debug_mode:
self.inspect_vector_store()
retriever = self.vs_index.as_retriever()
return retriever.retrieve(query)
def build_prompt(self, query, nodes):
context_str = self.generate_node_array_preview(nodes)
return DEFAULT_QUERY_GENERATION_PROMPT.format(context_str=context_str, query_str=query)
def generate_node_array_preview(self, nodes):
buf = "\n".join(([f"(No.{i+1} | score {n.score:.3f}): {n.text}" for i, n in enumerate(nodes)]))
if self.debug_mode: logger.info(buf)
return buf
def purge_vector_store(self):
"""
Purges the current vector store and creates a new one.
"""
self.purge()

View File

@@ -0,0 +1,108 @@
import llama_index
import os
import atexit
from typing import List
from loguru import logger
from llama_index.core import Document
from llama_index.core.schema import TextNode
from request_llms.embed_models.openai_embed import OpenAiEmbeddingModel
from shared_utils.connect_void_terminal import get_chat_default_kwargs
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from crazy_functions.rag_fns.vector_store_index import GptacVectorStoreIndex
from llama_index.core.ingestion import run_transformations
from llama_index.core import PromptTemplate
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core import StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore
from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
DEFAULT_QUERY_GENERATION_PROMPT = """\
Now, you have context information as below:
---------------------
{context_str}
---------------------
Answer the user request below (use the context information if necessary, otherwise you can ignore them):
---------------------
{query_str}
"""
QUESTION_ANSWER_RECORD = """\
{{
"type": "This is a previous conversation with the user",
"question": "{question}",
"answer": "{answer}",
}}
"""
class MilvusSaveLoad():
def does_checkpoint_exist(self, checkpoint_dir=None):
import os, glob
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if not os.path.exists(checkpoint_dir): return False
if len(glob.glob(os.path.join(checkpoint_dir, "*.json"))) == 0: return False
return True
def save_to_checkpoint(self, checkpoint_dir=None):
logger.info(f'saving vector store to: {checkpoint_dir}')
# if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
# self.vs_index.storage_context.persist(persist_dir=checkpoint_dir)
def load_from_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if self.does_checkpoint_exist(checkpoint_dir=checkpoint_dir):
logger.info('loading checkpoint from disk')
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=checkpoint_dir)
try:
self.vs_index = load_index_from_storage(storage_context, embed_model=self.embed_model)
return self.vs_index
except:
return self.create_new_vs(checkpoint_dir)
else:
return self.create_new_vs(checkpoint_dir)
def create_new_vs(self, checkpoint_dir, overwrite=False):
vector_store = MilvusVectorStore(
uri=os.path.join(checkpoint_dir, "milvus_demo.db"),
dim=self.embed_model.embedding_dimension(),
overwrite=overwrite
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GptacVectorStoreIndex.default_vector_store(storage_context=storage_context, embed_model=self.embed_model)
return index
def purge(self):
self.vs_index = self.create_new_vs(self.checkpoint_dir, overwrite=True)
class MilvusRagWorker(MilvusSaveLoad, LlamaIndexRagWorker):
def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
self.debug_mode = True
self.embed_model = OpenAiEmbeddingModel(llm_kwargs)
self.user_name = user_name
self.checkpoint_dir = checkpoint_dir
if auto_load_checkpoint:
self.vs_index = self.load_from_checkpoint(checkpoint_dir)
else:
self.vs_index = self.create_new_vs(checkpoint_dir)
atexit.register(lambda: self.save_to_checkpoint(checkpoint_dir))
def inspect_vector_store(self):
# This function is for debugging
try:
self.vs_index.storage_context.index_store.to_dict()
docstore = self.vs_index.storage_context.docstore.docs
if not docstore.items():
raise ValueError("cannot inspect")
vector_store_preview = "\n".join([ f"{_id} | {tn.text}" for _id, tn in docstore.items() ])
except:
dummy_retrieve_res: List["NodeWithScore"] = self.vs_index.as_retriever().retrieve(' ')
vector_store_preview = "\n".join(
[f"{node.id_} | {node.text}" for node in dummy_retrieve_res]
)
logger.info('\n++ --------inspect_vector_store begin--------')
logger.info(vector_store_preview)
logger.info('oo --------inspect_vector_store end--------')
return vector_store_preview

View File

@@ -0,0 +1,22 @@
import os
from llama_index.core import SimpleDirectoryReader
supports_format = ['.csv', '.docx', '.epub', '.ipynb', '.mbox', '.md', '.pdf', '.txt', '.ppt',
'.pptm', '.pptx']
# 修改后的 extract_text 函数,结合 SimpleDirectoryReader 和自定义解析逻辑
def extract_text(file_path):
_, ext = os.path.splitext(file_path.lower())
# 使用 SimpleDirectoryReader 处理它支持的文件格式
if ext in supports_format:
try:
reader = SimpleDirectoryReader(input_files=[file_path])
documents = reader.load_data()
if len(documents) > 0:
return documents[0].text
except Exception as e:
pass
return None

View File

@@ -0,0 +1,58 @@
from llama_index.core import VectorStoreIndex
from typing import Any, List, Optional
from llama_index.core.callbacks.base import CallbackManager
from llama_index.core.schema import TransformComponent
from llama_index.core.service_context import ServiceContext
from llama_index.core.settings import (
Settings,
callback_manager_from_settings_or_context,
transformations_from_settings_or_context,
)
from llama_index.core.storage.storage_context import StorageContext
class GptacVectorStoreIndex(VectorStoreIndex):
@classmethod
def default_vector_store(
cls,
storage_context: Optional[StorageContext] = None,
show_progress: bool = False,
callback_manager: Optional[CallbackManager] = None,
transformations: Optional[List[TransformComponent]] = None,
# deprecated
service_context: Optional[ServiceContext] = None,
embed_model = None,
**kwargs: Any,
):
"""Create index from documents.
Args:
documents (Optional[Sequence[BaseDocument]]): List of documents to
build the index from.
"""
storage_context = storage_context or StorageContext.from_defaults()
docstore = storage_context.docstore
callback_manager = (
callback_manager
or callback_manager_from_settings_or_context(Settings, service_context)
)
transformations = transformations or transformations_from_settings_or_context(
Settings, service_context
)
with callback_manager.as_trace("index_construction"):
return cls(
nodes=[],
storage_context=storage_context,
callback_manager=callback_manager,
show_progress=show_progress,
transformations=transformations,
service_context=service_context,
embed_model=embed_model,
**kwargs,
)

View File

@@ -1,16 +1,17 @@
# From project chatglm-langchain # From project chatglm-langchain
import threading
from toolbox import Singleton
import os import os
import shutil
import os import os
import uuid import uuid
import tqdm import tqdm
import shutil
import threading
import numpy as np
from toolbox import Singleton
from loguru import logger
from langchain.vectorstores import FAISS from langchain.vectorstores import FAISS
from langchain.docstore.document import Document from langchain.docstore.document import Document
from typing import List, Tuple from typing import List, Tuple
import numpy as np
from crazy_functions.vector_fns.general_file_loader import load_file from crazy_functions.vector_fns.general_file_loader import load_file
embedding_model_dict = { embedding_model_dict = {
@@ -59,7 +60,7 @@ def similarity_search_with_score_by_vector(
self, embedding: List[float], k: int = 4 self, embedding: List[float], k: int = 4
) -> List[Tuple[Document, float]]: ) -> List[Tuple[Document, float]]:
def seperate_list(ls: List[int]) -> List[List[int]]: def separate_list(ls: List[int]) -> List[List[int]]:
lists = [] lists = []
ls1 = [ls[0]] ls1 = [ls[0]]
for i in range(1, len(ls)): for i in range(1, len(ls)):
@@ -81,7 +82,7 @@ def similarity_search_with_score_by_vector(
continue continue
_id = self.index_to_docstore_id[i] _id = self.index_to_docstore_id[i]
doc = self.docstore.search(_id) doc = self.docstore.search(_id)
if not self.chunk_conent: if not self.chunk_content:
if not isinstance(doc, Document): if not isinstance(doc, Document):
raise ValueError(f"Could not find document for id {_id}, got {doc}") raise ValueError(f"Could not find document for id {_id}, got {doc}")
doc.metadata["score"] = int(scores[0][j]) doc.metadata["score"] = int(scores[0][j])
@@ -103,12 +104,12 @@ def similarity_search_with_score_by_vector(
id_set.add(l) id_set.add(l)
if break_flag: if break_flag:
break break
if not self.chunk_conent: if not self.chunk_content:
return docs return docs
if len(id_set) == 0 and self.score_threshold > 0: if len(id_set) == 0 and self.score_threshold > 0:
return [] return []
id_list = sorted(list(id_set)) id_list = sorted(list(id_set))
id_lists = seperate_list(id_list) id_lists = separate_list(id_list)
for id_seq in id_lists: for id_seq in id_lists:
for id in id_seq: for id in id_seq:
if id == id_seq[0]: if id == id_seq[0]:
@@ -131,7 +132,7 @@ class LocalDocQA:
embeddings: object = None embeddings: object = None
top_k: int = VECTOR_SEARCH_TOP_K top_k: int = VECTOR_SEARCH_TOP_K
chunk_size: int = CHUNK_SIZE chunk_size: int = CHUNK_SIZE
chunk_conent: bool = True chunk_content: bool = True
score_threshold: int = VECTOR_SEARCH_SCORE_THRESHOLD score_threshold: int = VECTOR_SEARCH_SCORE_THRESHOLD
def init_cfg(self, def init_cfg(self,
@@ -150,17 +151,17 @@ class LocalDocQA:
failed_files = [] failed_files = []
if isinstance(filepath, str): if isinstance(filepath, str):
if not os.path.exists(filepath): if not os.path.exists(filepath):
print("路径不存在") logger.error("路径不存在")
return None return None
elif os.path.isfile(filepath): elif os.path.isfile(filepath):
file = os.path.split(filepath)[-1] file = os.path.split(filepath)[-1]
try: try:
docs = load_file(filepath, SENTENCE_SIZE) docs = load_file(filepath, SENTENCE_SIZE)
print(f"{file} 已成功加载") logger.info(f"{file} 已成功加载")
loaded_files.append(filepath) loaded_files.append(filepath)
except Exception as e: except Exception as e:
print(e) logger.error(e)
print(f"{file} 未能成功加载") logger.error(f"{file} 未能成功加载")
return None return None
elif os.path.isdir(filepath): elif os.path.isdir(filepath):
docs = [] docs = []
@@ -170,23 +171,23 @@ class LocalDocQA:
docs += load_file(fullfilepath, SENTENCE_SIZE) docs += load_file(fullfilepath, SENTENCE_SIZE)
loaded_files.append(fullfilepath) loaded_files.append(fullfilepath)
except Exception as e: except Exception as e:
print(e) logger.error(e)
failed_files.append(file) failed_files.append(file)
if len(failed_files) > 0: if len(failed_files) > 0:
print("以下文件未能成功加载:") logger.error("以下文件未能成功加载:")
for file in failed_files: for file in failed_files:
print(f"{file}\n") logger.error(f"{file}\n")
else: else:
docs = [] docs = []
for file in filepath: for file in filepath:
docs += load_file(file, SENTENCE_SIZE) docs += load_file(file, SENTENCE_SIZE)
print(f"{file} 已成功加载") logger.info(f"{file} 已成功加载")
loaded_files.append(file) loaded_files.append(file)
if len(docs) > 0: if len(docs) > 0:
print("文件加载完毕,正在生成向量库") logger.info("文件加载完毕,正在生成向量库")
if vs_path and os.path.isdir(vs_path): if vs_path and os.path.isdir(vs_path):
try: try:
self.vector_store = FAISS.load_local(vs_path, text2vec) self.vector_store = FAISS.load_local(vs_path, text2vec)
@@ -208,16 +209,16 @@ class LocalDocQA:
# query 查询内容 # query 查询内容
# vs_path 知识库路径 # vs_path 知识库路径
# chunk_conent 是否启用上下文关联 # chunk_content 是否启用上下文关联
# score_threshold 搜索匹配score阈值 # score_threshold 搜索匹配score阈值
# vector_search_top_k 搜索知识库内容条数默认搜索5条结果 # vector_search_top_k 搜索知识库内容条数默认搜索5条结果
# chunk_sizes 匹配单段内容的连接上下文长度 # chunk_sizes 匹配单段内容的连接上下文长度
def get_knowledge_based_conent_test(self, query, vs_path, chunk_conent, def get_knowledge_based_content_test(self, query, vs_path, chunk_content,
score_threshold=VECTOR_SEARCH_SCORE_THRESHOLD, score_threshold=VECTOR_SEARCH_SCORE_THRESHOLD,
vector_search_top_k=VECTOR_SEARCH_TOP_K, chunk_size=CHUNK_SIZE, vector_search_top_k=VECTOR_SEARCH_TOP_K, chunk_size=CHUNK_SIZE,
text2vec=None): text2vec=None):
self.vector_store = FAISS.load_local(vs_path, text2vec) self.vector_store = FAISS.load_local(vs_path, text2vec)
self.vector_store.chunk_conent = chunk_conent self.vector_store.chunk_content = chunk_content
self.vector_store.score_threshold = score_threshold self.vector_store.score_threshold = score_threshold
self.vector_store.chunk_size = chunk_size self.vector_store.chunk_size = chunk_size
@@ -233,14 +234,14 @@ class LocalDocQA:
prompt += "\n\n".join([f"({k}): " + doc.page_content for k, doc in enumerate(related_docs_with_score)]) prompt += "\n\n".join([f"({k}): " + doc.page_content for k, doc in enumerate(related_docs_with_score)])
prompt += "\n\n---\n\n" prompt += "\n\n---\n\n"
prompt = prompt.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars prompt = prompt.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
# print(prompt) # logger.info(prompt)
response = {"query": query, "source_documents": related_docs_with_score} response = {"query": query, "source_documents": related_docs_with_score}
return response, prompt return response, prompt
def construct_vector_store(vs_id, vs_path, files, sentence_size, history, one_conent, one_content_segmentation, text2vec): def construct_vector_store(vs_id, vs_path, files, sentence_size, history, one_content, one_content_segmentation, text2vec):
for file in files: for file in files:
assert os.path.exists(file), "输入文件不存在:" + file assert os.path.exists(file), "输入文件不存在:" + file
import nltk import nltk
@@ -262,7 +263,7 @@ def construct_vector_store(vs_id, vs_path, files, sentence_size, history, one_co
else: else:
pass pass
# file_status = "文件未成功加载,请重新上传文件" # file_status = "文件未成功加载,请重新上传文件"
# print(file_status) # logger.info(file_status)
return local_doc_qa, vs_path return local_doc_qa, vs_path
@Singleton @Singleton
@@ -278,7 +279,7 @@ class knowledge_archive_interface():
if self.text2vec_large_chinese is None: if self.text2vec_large_chinese is None:
# < -------------------预热文本向量化模组--------------- > # < -------------------预热文本向量化模组--------------- >
from toolbox import ProxyNetworkActivate from toolbox import ProxyNetworkActivate
print('Checking Text2vec ...') logger.info('Checking Text2vec ...')
from langchain.embeddings.huggingface import HuggingFaceEmbeddings from langchain.embeddings.huggingface import HuggingFaceEmbeddings
with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络 with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络
self.text2vec_large_chinese = HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese") self.text2vec_large_chinese = HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")
@@ -296,7 +297,7 @@ class knowledge_archive_interface():
files=file_manifest, files=file_manifest,
sentence_size=100, sentence_size=100,
history=[], history=[],
one_conent="", one_content="",
one_content_segmentation="", one_content_segmentation="",
text2vec = self.get_chinese_text2vec(), text2vec = self.get_chinese_text2vec(),
) )
@@ -318,19 +319,19 @@ class knowledge_archive_interface():
files=[], files=[],
sentence_size=100, sentence_size=100,
history=[], history=[],
one_conent="", one_content="",
one_content_segmentation="", one_content_segmentation="",
text2vec = self.get_chinese_text2vec(), text2vec = self.get_chinese_text2vec(),
) )
VECTOR_SEARCH_SCORE_THRESHOLD = 0 VECTOR_SEARCH_SCORE_THRESHOLD = 0
VECTOR_SEARCH_TOP_K = 4 VECTOR_SEARCH_TOP_K = 4
CHUNK_SIZE = 512 CHUNK_SIZE = 512
resp, prompt = self.qa_handle.get_knowledge_based_conent_test( resp, prompt = self.qa_handle.get_knowledge_based_content_test(
query = txt, query = txt,
vs_path = self.kai_path, vs_path = self.kai_path,
score_threshold=VECTOR_SEARCH_SCORE_THRESHOLD, score_threshold=VECTOR_SEARCH_SCORE_THRESHOLD,
vector_search_top_k=VECTOR_SEARCH_TOP_K, vector_search_top_k=VECTOR_SEARCH_TOP_K,
chunk_conent=True, chunk_content=True,
chunk_size=CHUNK_SIZE, chunk_size=CHUNK_SIZE,
text2vec = self.get_chinese_text2vec(), text2vec = self.get_chinese_text2vec(),
) )

View File

@@ -1,6 +1,6 @@
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing import List from typing import List
from toolbox import update_ui_lastest_msg, disable_auto_promotion from toolbox import update_ui_latest_msg, disable_auto_promotion
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
import copy, json, pickle, os, sys, time import copy, json, pickle, os, sys, time
@@ -9,14 +9,14 @@ import copy, json, pickle, os, sys, time
def read_avail_plugin_enum(): def read_avail_plugin_enum():
from crazy_functional import get_crazy_functions from crazy_functional import get_crazy_functions
plugin_arr = get_crazy_functions() plugin_arr = get_crazy_functions()
# remove plugins with out explaination # remove plugins with out explanation
plugin_arr = {k:v for k, v in plugin_arr.items() if ('Info' in v) and ('Function' in v)} plugin_arr = {k:v for k, v in plugin_arr.items() if ('Info' in v) and ('Function' in v)}
plugin_arr_info = {"F_{:04d}".format(i):v["Info"] for i, v in enumerate(plugin_arr.values(), start=1)} plugin_arr_info = {"F_{:04d}".format(i):v["Info"] for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)} plugin_arr_dict = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict_parse = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)} plugin_arr_dict_parse = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict_parse.update({f"F_{i}":v for i, v in enumerate(plugin_arr.values(), start=1)}) plugin_arr_dict_parse.update({f"F_{i}":v for i, v in enumerate(plugin_arr.values(), start=1)})
prompt = json.dumps(plugin_arr_info, ensure_ascii=False, indent=2) prompt = json.dumps(plugin_arr_info, ensure_ascii=False, indent=2)
prompt = "\n\nThe defination of PluginEnum:\nPluginEnum=" + prompt prompt = "\n\nThe definition of PluginEnum:\nPluginEnum=" + prompt
return prompt, plugin_arr_dict, plugin_arr_dict_parse return prompt, plugin_arr_dict, plugin_arr_dict_parse
def wrap_code(txt): def wrap_code(txt):
@@ -55,7 +55,7 @@ def execute_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prom
plugin_selection: str = Field(description="The most related plugin from one of the PluginEnum.", default="F_0000") plugin_selection: str = Field(description="The most related plugin from one of the PluginEnum.", default="F_0000")
reason_of_selection: str = Field(description="The reason why you should select this plugin.", default="This plugin satisfy user requirement most") reason_of_selection: str = Field(description="The reason why you should select this plugin.", default="This plugin satisfy user requirement most")
# ⭐ ⭐ ⭐ 选择插件 # ⭐ ⭐ ⭐ 选择插件
yield from update_ui_lastest_msg(lastmsg=f"正在执行任务: {txt}\n\n查找可用插件中...", chatbot=chatbot, history=history, delay=0) yield from update_ui_latest_msg(lastmsg=f"正在执行任务: {txt}\n\n查找可用插件中...", chatbot=chatbot, history=history, delay=0)
gpt_json_io = GptJsonIO(Plugin) gpt_json_io = GptJsonIO(Plugin)
gpt_json_io.format_instructions = "The format of your output should be a json that can be parsed by json.loads.\n" gpt_json_io.format_instructions = "The format of your output should be a json that can be parsed by json.loads.\n"
gpt_json_io.format_instructions += """Output example: {"plugin_selection":"F_1234", "reason_of_selection":"F_1234 plugin satisfy user requirement most"}\n""" gpt_json_io.format_instructions += """Output example: {"plugin_selection":"F_1234", "reason_of_selection":"F_1234 plugin satisfy user requirement most"}\n"""
@@ -74,13 +74,13 @@ def execute_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prom
msg += "请求的Prompt为\n" + wrap_code(get_inputs_show_user(inputs, plugin_arr_enum_prompt)) msg += "请求的Prompt为\n" + wrap_code(get_inputs_show_user(inputs, plugin_arr_enum_prompt))
msg += "语言模型回复为:\n" + wrap_code(gpt_reply) msg += "语言模型回复为:\n" + wrap_code(gpt_reply)
msg += "\n但您可以尝试再试一次\n" msg += "\n但您可以尝试再试一次\n"
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2) yield from update_ui_latest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2)
return return
if plugin_sel.plugin_selection not in plugin_arr_dict_parse: if plugin_sel.plugin_selection not in plugin_arr_dict_parse:
msg = f"抱歉, 找不到合适插件执行该任务, 或者{llm_kwargs['llm_model']}无法理解您的需求。" msg = f"抱歉, 找不到合适插件执行该任务, 或者{llm_kwargs['llm_model']}无法理解您的需求。"
msg += f"语言模型{llm_kwargs['llm_model']}选择了不存在的插件:\n" + wrap_code(gpt_reply) msg += f"语言模型{llm_kwargs['llm_model']}选择了不存在的插件:\n" + wrap_code(gpt_reply)
msg += "\n但您可以尝试再试一次\n" msg += "\n但您可以尝试再试一次\n"
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2) yield from update_ui_latest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2)
return return
# ⭐ ⭐ ⭐ 确认插件参数 # ⭐ ⭐ ⭐ 确认插件参数
@@ -90,7 +90,7 @@ def execute_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prom
appendix_info = get_recent_file_prompt_support(chatbot) appendix_info = get_recent_file_prompt_support(chatbot)
plugin = plugin_arr_dict_parse[plugin_sel.plugin_selection] plugin = plugin_arr_dict_parse[plugin_sel.plugin_selection]
yield from update_ui_lastest_msg(lastmsg=f"正在执行任务: {txt}\n\n提取插件参数...", chatbot=chatbot, history=history, delay=0) yield from update_ui_latest_msg(lastmsg=f"正在执行任务: {txt}\n\n提取插件参数...", chatbot=chatbot, history=history, delay=0)
class PluginExplicit(BaseModel): class PluginExplicit(BaseModel):
plugin_selection: str = plugin_sel.plugin_selection plugin_selection: str = plugin_sel.plugin_selection
plugin_arg: str = Field(description="The argument of the plugin.", default="") plugin_arg: str = Field(description="The argument of the plugin.", default="")
@@ -109,6 +109,6 @@ def execute_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prom
fn = plugin['Function'] fn = plugin['Function']
fn_name = fn.__name__ fn_name = fn.__name__
msg = f'{llm_kwargs["llm_model"]}为您选择了插件: `{fn_name}`\n\n插件说明:{plugin["Info"]}\n\n插件参数:{plugin_sel.plugin_arg}\n\n假如偏离了您的要求,按停止键终止。' msg = f'{llm_kwargs["llm_model"]}为您选择了插件: `{fn_name}`\n\n插件说明:{plugin["Info"]}\n\n插件参数:{plugin_sel.plugin_arg}\n\n假如偏离了您的要求,按停止键终止。'
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2) yield from update_ui_latest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2)
yield from fn(plugin_sel.plugin_arg, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, -1) yield from fn(plugin_sel.plugin_arg, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, -1)
return return

View File

@@ -1,6 +1,6 @@
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing import List from typing import List
from toolbox import update_ui_lastest_msg, get_conf from toolbox import update_ui_latest_msg, get_conf
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.pydantic_io import GptJsonIO from crazy_functions.json_fns.pydantic_io import GptJsonIO
import copy, json, pickle, os, sys import copy, json, pickle, os, sys
@@ -9,7 +9,7 @@ import copy, json, pickle, os, sys
def modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention): def modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention):
ALLOW_RESET_CONFIG = get_conf('ALLOW_RESET_CONFIG') ALLOW_RESET_CONFIG = get_conf('ALLOW_RESET_CONFIG')
if not ALLOW_RESET_CONFIG: if not ALLOW_RESET_CONFIG:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"当前配置不允许被修改如需激活本功能请在config.py中设置ALLOW_RESET_CONFIG=True后重启软件。", lastmsg=f"当前配置不允许被修改如需激活本功能请在config.py中设置ALLOW_RESET_CONFIG=True后重启软件。",
chatbot=chatbot, history=history, delay=2 chatbot=chatbot, history=history, delay=2
) )
@@ -30,7 +30,7 @@ def modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
new_option_value: str = Field(description="the new value of the option", default=None) new_option_value: str = Field(description="the new value of the option", default=None)
# ⭐ ⭐ ⭐ 分析用户意图 # ⭐ ⭐ ⭐ 分析用户意图
yield from update_ui_lastest_msg(lastmsg=f"正在执行任务: {txt}\n\n读取新配置中", chatbot=chatbot, history=history, delay=0) yield from update_ui_latest_msg(lastmsg=f"正在执行任务: {txt}\n\n读取新配置中", chatbot=chatbot, history=history, delay=0)
gpt_json_io = GptJsonIO(ModifyConfigurationIntention) gpt_json_io = GptJsonIO(ModifyConfigurationIntention)
inputs = "Analyze how to change configuration according to following user input, answer me with json: \n\n" + \ inputs = "Analyze how to change configuration according to following user input, answer me with json: \n\n" + \
">> " + txt.rstrip('\n').replace('\n','\n>> ') + '\n\n' + \ ">> " + txt.rstrip('\n').replace('\n','\n>> ') + '\n\n' + \
@@ -44,11 +44,11 @@ def modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
ok = (explicit_conf in txt) ok = (explicit_conf in txt)
if ok: if ok:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n新配置{explicit_conf}={user_intention.new_option_value}", lastmsg=f"正在执行任务: {txt}\n\n新配置{explicit_conf}={user_intention.new_option_value}",
chatbot=chatbot, history=history, delay=1 chatbot=chatbot, history=history, delay=1
) )
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n新配置{explicit_conf}={user_intention.new_option_value}\n\n正在修改配置中", lastmsg=f"正在执行任务: {txt}\n\n新配置{explicit_conf}={user_intention.new_option_value}\n\n正在修改配置中",
chatbot=chatbot, history=history, delay=2 chatbot=chatbot, history=history, delay=2
) )
@@ -57,25 +57,25 @@ def modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
from toolbox import set_conf from toolbox import set_conf
set_conf(explicit_conf, user_intention.new_option_value) set_conf(explicit_conf, user_intention.new_option_value)
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n配置修改完成,重新页面即可生效。", chatbot=chatbot, history=history, delay=1 lastmsg=f"正在执行任务: {txt}\n\n配置修改完成,重新页面即可生效。", chatbot=chatbot, history=history, delay=1
) )
else: else:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"失败,如果需要配置{explicit_conf},您需要明确说明并在指令中提到它。", chatbot=chatbot, history=history, delay=5 lastmsg=f"失败,如果需要配置{explicit_conf},您需要明确说明并在指令中提到它。", chatbot=chatbot, history=history, delay=5
) )
def modify_configuration_reboot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention): def modify_configuration_reboot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention):
ALLOW_RESET_CONFIG = get_conf('ALLOW_RESET_CONFIG') ALLOW_RESET_CONFIG = get_conf('ALLOW_RESET_CONFIG')
if not ALLOW_RESET_CONFIG: if not ALLOW_RESET_CONFIG:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"当前配置不允许被修改如需激活本功能请在config.py中设置ALLOW_RESET_CONFIG=True后重启软件。", lastmsg=f"当前配置不允许被修改如需激活本功能请在config.py中设置ALLOW_RESET_CONFIG=True后重启软件。",
chatbot=chatbot, history=history, delay=2 chatbot=chatbot, history=history, delay=2
) )
return return
yield from modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention) yield from modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention)
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n配置修改完成,五秒后即将重启!若出现报错请无视即可。", chatbot=chatbot, history=history, delay=5 lastmsg=f"正在执行任务: {txt}\n\n配置修改完成,五秒后即将重启!若出现报错请无视即可。", chatbot=chatbot, history=history, delay=5
) )
os.execl(sys.executable, sys.executable, *sys.argv) os.execl(sys.executable, sys.executable, *sys.argv)

View File

@@ -5,7 +5,7 @@ class VoidTerminalState():
self.reset_state() self.reset_state()
def reset_state(self): def reset_state(self):
self.has_provided_explaination = False self.has_provided_explanation = False
def lock_plugin(self, chatbot): def lock_plugin(self, chatbot):
chatbot._cookies['lock_plugin'] = 'crazy_functions.虚空终端->虚空终端' chatbot._cookies['lock_plugin'] = 'crazy_functions.虚空终端->虚空终端'

File diff suppressed because it is too large Load Diff

View File

@@ -1,17 +1,19 @@
import re, requests, unicodedata, os
from toolbox import update_ui, get_log_folder from toolbox import update_ui, get_log_folder
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
from toolbox import CatchException, report_exception, get_conf from toolbox import CatchException, report_exception, get_conf
import re, requests, unicodedata, os from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from loguru import logger
def download_arxiv_(url_pdf): def download_arxiv_(url_pdf):
if 'arxiv.org' not in url_pdf: if 'arxiv.org' not in url_pdf:
if ('.' in url_pdf) and ('/' not in url_pdf): if ('.' in url_pdf) and ('/' not in url_pdf):
new_url = 'https://arxiv.org/abs/'+url_pdf new_url = 'https://arxiv.org/abs/'+url_pdf
print('下载编号:', url_pdf, '自动定位:', new_url) logger.info('下载编号:', url_pdf, '自动定位:', new_url)
# download_arxiv_(new_url) # download_arxiv_(new_url)
return download_arxiv_(new_url) return download_arxiv_(new_url)
else: else:
print('不能识别的URL') logger.info('不能识别的URL')
return None return None
if 'abs' in url_pdf: if 'abs' in url_pdf:
url_pdf = url_pdf.replace('abs', 'pdf') url_pdf = url_pdf.replace('abs', 'pdf')
@@ -42,15 +44,12 @@ def download_arxiv_(url_pdf):
requests_pdf_url = url_pdf requests_pdf_url = url_pdf
file_path = download_dir+title_str file_path = download_dir+title_str
print('下载中') logger.info('下载中')
proxies = get_conf('proxies') proxies = get_conf('proxies')
r = requests.get(requests_pdf_url, proxies=proxies) r = requests.get(requests_pdf_url, proxies=proxies)
with open(file_path, 'wb+') as f: with open(file_path, 'wb+') as f:
f.write(r.content) f.write(r.content)
print('下载完成') logger.info('下载完成')
# print('输出下载命令:','aria2c -o \"%s\" %s'%(title_str,url_pdf))
# subprocess.call('aria2c --all-proxy=\"172.18.116.150:11084\" -o \"%s\" %s'%(download_dir+title_str,url_pdf), shell=True)
x = "%s %s %s.bib" % (paper_id, other_info['year'], other_info['authors']) x = "%s %s %s.bib" % (paper_id, other_info['year'], other_info['authors'])
x = x.replace('?', '')\ x = x.replace('?', '')\
@@ -63,19 +62,9 @@ def download_arxiv_(url_pdf):
def get_name(_url_): def get_name(_url_):
import os
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
print('正在获取文献名!') logger.info('正在获取文献名!')
print(_url_) logger.info(_url_)
# arxiv_recall = {}
# if os.path.exists('./arxiv_recall.pkl'):
# with open('./arxiv_recall.pkl', 'rb') as f:
# arxiv_recall = pickle.load(f)
# if _url_ in arxiv_recall:
# print('在缓存中')
# return arxiv_recall[_url_]
proxies = get_conf('proxies') proxies = get_conf('proxies')
res = requests.get(_url_, proxies=proxies) res = requests.get(_url_, proxies=proxies)
@@ -92,7 +81,7 @@ def get_name(_url_):
other_details['abstract'] = abstract other_details['abstract'] = abstract
except: except:
other_details['year'] = '' other_details['year'] = ''
print('年份获取失败') logger.info('年份获取失败')
# get author # get author
try: try:
@@ -101,7 +90,7 @@ def get_name(_url_):
other_details['authors'] = authors other_details['authors'] = authors
except: except:
other_details['authors'] = '' other_details['authors'] = ''
print('authors获取失败') logger.info('authors获取失败')
# get comment # get comment
try: try:
@@ -116,11 +105,11 @@ def get_name(_url_):
other_details['comment'] = '' other_details['comment'] = ''
except: except:
other_details['comment'] = '' other_details['comment'] = ''
print('年份获取失败') logger.info('年份获取失败')
title_str = BeautifulSoup( title_str = BeautifulSoup(
res.text, 'html.parser').find('title').contents[0] res.text, 'html.parser').find('title').contents[0]
print('获取成功:', title_str) logger.info('获取成功:', title_str)
# arxiv_recall[_url_] = (title_str+'.pdf', other_details) # arxiv_recall[_url_] = (title_str+'.pdf', other_details)
# with open('./arxiv_recall.pkl', 'wb') as f: # with open('./arxiv_recall.pkl', 'wb') as f:
# pickle.dump(arxiv_recall, f) # pickle.dump(arxiv_recall, f)

View File

@@ -1,4 +1,4 @@
from toolbox import CatchException, update_ui, update_ui_lastest_msg from toolbox import CatchException, update_ui, update_ui_latest_msg
from crazy_functions.multi_stage.multi_stage_utils import GptAcademicGameBaseState from crazy_functions.multi_stage.multi_stage_utils import GptAcademicGameBaseState
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection

View File

@@ -1,6 +1,5 @@
from toolbox import CatchException, update_ui from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
@CatchException @CatchException
def 交互功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request): def 交互功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):

View File

@@ -15,9 +15,9 @@ Testing:
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder
from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_lastest_msg from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_latest_msg
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg
from .crazy_utils import input_clipping, try_install_deps from crazy_functions.crazy_utils import input_clipping, try_install_deps
from crazy_functions.gen_fns.gen_fns_shared import is_function_successfully_generated from crazy_functions.gen_fns.gen_fns_shared import is_function_successfully_generated
from crazy_functions.gen_fns.gen_fns_shared import get_class_name from crazy_functions.gen_fns.gen_fns_shared import get_class_name
from crazy_functions.gen_fns.gen_fns_shared import subprocess_worker from crazy_functions.gen_fns.gen_fns_shared import subprocess_worker
@@ -27,7 +27,7 @@ import time
import glob import glob
import multiprocessing import multiprocessing
templete = """ template = """
```python ```python
import ... # Put dependencies here, e.g. import numpy as np. import ... # Put dependencies here, e.g. import numpy as np.
@@ -77,10 +77,10 @@ def gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history):
# 第二步 # 第二步
prompt_compose = [ prompt_compose = [
"If previous stage is successful, rewrite the function you have just written to satisfy following templete: \n", "If previous stage is successful, rewrite the function you have just written to satisfy following template: \n",
templete template
] ]
i_say = "".join(prompt_compose); inputs_show_user = "If previous stage is successful, rewrite the function you have just written to satisfy executable templete. " i_say = "".join(prompt_compose); inputs_show_user = "If previous stage is successful, rewrite the function you have just written to satisfy executable template. "
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive( gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=inputs_show_user, inputs=i_say, inputs_show_user=inputs_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history, llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
@@ -164,18 +164,18 @@ def 函数动态生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
if get_plugin_arg(plugin_kwargs, key="file_path_arg", default=False): if get_plugin_arg(plugin_kwargs, key="file_path_arg", default=False):
file_path = get_plugin_arg(plugin_kwargs, key="file_path_arg", default=None) file_path = get_plugin_arg(plugin_kwargs, key="file_path_arg", default=None)
file_list.append(file_path) file_list.append(file_path)
yield from update_ui_lastest_msg(f"当前文件: {file_path}", chatbot, history, 1) yield from update_ui_latest_msg(f"当前文件: {file_path}", chatbot, history, 1)
elif have_any_recent_upload_files(chatbot): elif have_any_recent_upload_files(chatbot):
file_dir = get_recent_file_prompt_support(chatbot) file_dir = get_recent_file_prompt_support(chatbot)
file_list = glob.glob(os.path.join(file_dir, '**/*'), recursive=True) file_list = glob.glob(os.path.join(file_dir, '**/*'), recursive=True)
yield from update_ui_lastest_msg(f"当前文件处理列表: {file_list}", chatbot, history, 1) yield from update_ui_latest_msg(f"当前文件处理列表: {file_list}", chatbot, history, 1)
else: else:
chatbot.append(["文件检索", "没有发现任何近期上传的文件。"]) chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
yield from update_ui_lastest_msg("没有发现任何近期上传的文件。", chatbot, history, 1) yield from update_ui_latest_msg("没有发现任何近期上传的文件。", chatbot, history, 1)
return # 2. 如果没有文件 return # 2. 如果没有文件
if len(file_list) == 0: if len(file_list) == 0:
chatbot.append(["文件检索", "没有发现任何近期上传的文件。"]) chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
yield from update_ui_lastest_msg("没有发现任何近期上传的文件。", chatbot, history, 1) yield from update_ui_latest_msg("没有发现任何近期上传的文件。", chatbot, history, 1)
return # 2. 如果没有文件 return # 2. 如果没有文件
# 读取文件 # 读取文件
@@ -183,7 +183,7 @@ def 函数动态生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
# 粗心检查 # 粗心检查
if is_the_upload_folder(txt): if is_the_upload_folder(txt):
yield from update_ui_lastest_msg(f"请在输入框内填写需求, 然后再次点击该插件! 至于您的文件,不用担心, 文件路径 {txt} 已经被记忆. ", chatbot, history, 1) yield from update_ui_latest_msg(f"请在输入框内填写需求, 然后再次点击该插件! 至于您的文件,不用担心, 文件路径 {txt} 已经被记忆. ", chatbot, history, 1)
return return
# 开始干正事 # 开始干正事
@@ -195,7 +195,7 @@ def 函数动态生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
code, installation_advance, txt, file_type, llm_kwargs, chatbot, history = \ code, installation_advance, txt, file_type, llm_kwargs, chatbot, history = \
yield from gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history) yield from gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history)
chatbot.append(["代码生成阶段结束", ""]) chatbot.append(["代码生成阶段结束", ""])
yield from update_ui_lastest_msg(f"正在验证上述代码的有效性 ...", chatbot, history, 1) yield from update_ui_latest_msg(f"正在验证上述代码的有效性 ...", chatbot, history, 1)
# ⭐ 分离代码块 # ⭐ 分离代码块
code = get_code_block(code) code = get_code_block(code)
# ⭐ 检查模块 # ⭐ 检查模块
@@ -206,11 +206,11 @@ def 函数动态生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
if not traceback: traceback = trimmed_format_exc() if not traceback: traceback = trimmed_format_exc()
# 处理异常 # 处理异常
if not traceback: traceback = trimmed_format_exc() if not traceback: traceback = trimmed_format_exc()
yield from update_ui_lastest_msg(f"{j+1}/{MAX_TRY} 次代码生成尝试, 失败了~ 别担心, 我们5秒后再试一次... \n\n此次我们的错误追踪是\n```\n{traceback}\n```\n", chatbot, history, 5) yield from update_ui_latest_msg(f"{j+1}/{MAX_TRY} 次代码生成尝试, 失败了~ 别担心, 我们5秒后再试一次... \n\n此次我们的错误追踪是\n```\n{traceback}\n```\n", chatbot, history, 5)
# 代码生成结束, 开始执行 # 代码生成结束, 开始执行
TIME_LIMIT = 15 TIME_LIMIT = 15
yield from update_ui_lastest_msg(f"开始创建新进程并执行代码! 时间限制 {TIME_LIMIT} 秒. 请等待任务完成... ", chatbot, history, 1) yield from update_ui_latest_msg(f"开始创建新进程并执行代码! 时间限制 {TIME_LIMIT} 秒. 请等待任务完成... ", chatbot, history, 1)
manager = multiprocessing.Manager() manager = multiprocessing.Manager()
return_dict = manager.dict() return_dict = manager.dict()

View File

@@ -1,6 +1,6 @@
from toolbox import CatchException, update_ui, gen_time_str from toolbox import CatchException, update_ui, gen_time_str
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
import copy, json import copy, json
@CatchException @CatchException

View File

@@ -6,13 +6,14 @@
""" """
import time
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, ProxyNetworkActivate from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, ProxyNetworkActivate
from toolbox import get_conf, select_api_key, update_ui_lastest_msg, Singleton from toolbox import get_conf, select_api_key, update_ui_latest_msg, Singleton
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg
from crazy_functions.crazy_utils import input_clipping, try_install_deps from crazy_functions.crazy_utils import input_clipping, try_install_deps
from crazy_functions.agent_fns.persistent import GradioMultiuserManagerForPersistentClasses from crazy_functions.agent_fns.persistent import GradioMultiuserManagerForPersistentClasses
from crazy_functions.agent_fns.auto_agent import AutoGenMath from crazy_functions.agent_fns.auto_agent import AutoGenMath
import time from loguru import logger
def remove_model_prefix(llm): def remove_model_prefix(llm):
if llm.startswith('api2d-'): llm = llm.replace('api2d-', '') if llm.startswith('api2d-'): llm = llm.replace('api2d-', '')
@@ -80,12 +81,12 @@ def 多智能体终端(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
persistent_key = f"{user_uuid}->多智能体终端" persistent_key = f"{user_uuid}->多智能体终端"
if persistent_class_multi_user_manager.already_alive(persistent_key): if persistent_class_multi_user_manager.already_alive(persistent_key):
# 当已经存在一个正在运行的多智能体终端时,直接将用户输入传递给它,而不是再次启动一个新的多智能体终端 # 当已经存在一个正在运行的多智能体终端时,直接将用户输入传递给它,而不是再次启动一个新的多智能体终端
print('[debug] feed new user input') logger.info('[debug] feed new user input')
executor = persistent_class_multi_user_manager.get(persistent_key) executor = persistent_class_multi_user_manager.get(persistent_key)
exit_reason = yield from executor.main_process_ui_control(txt, create_or_resume="resume") exit_reason = yield from executor.main_process_ui_control(txt, create_or_resume="resume")
else: else:
# 运行多智能体终端 (首次) # 运行多智能体终端 (首次)
print('[debug] create new executor instance') logger.info('[debug] create new executor instance')
history = [] history = []
chatbot.append(["正在启动: 多智能体终端", "插件动态生成, 执行开始, 作者 Microsoft & Binary-Husky."]) chatbot.append(["正在启动: 多智能体终端", "插件动态生成, 执行开始, 作者 Microsoft & Binary-Husky."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

View File

@@ -1,7 +1,7 @@
from toolbox import update_ui from toolbox import update_ui
from toolbox import CatchException, report_exception from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False fast_debug = False

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, report_exception, select_api_key, update_ui, get_conf from toolbox import CatchException, report_exception, select_api_key, update_ui, get_conf
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import write_history_to_file, promote_file_to_downloadzone, get_log_folder from toolbox import write_history_to_file, promote_file_to_downloadzone, get_log_folder
def split_audio_file(filename, split_duration=1000): def split_audio_file(filename, split_duration=1000):

View File

@@ -1,16 +1,18 @@
from loguru import logger
from toolbox import update_ui, promote_file_to_downloadzone, gen_time_str from toolbox import update_ui, promote_file_to_downloadzone, gen_time_str
from toolbox import CatchException, report_exception from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import read_and_clean_pdf_text from crazy_functions.crazy_utils import read_and_clean_pdf_text
from .crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
file_write_buffer = [] file_write_buffer = []
for file_name in file_manifest: for file_name in file_manifest:
print('begin analysis on:', file_name) logger.info('begin analysis on:', file_name)
############################## <第 0 步切割PDF> ################################## ############################## <第 0 步切割PDF> ##################################
# 递归地切割PDF文件每一块尽量是完整的一个section比如introductionexperiment等必要时再进行切割 # 递归地切割PDF文件每一块尽量是完整的一个section比如introductionexperiment等必要时再进行切割
# 的长度必须小于 2500 个 Token # 的长度必须小于 2500 个 Token
@@ -38,7 +40,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
last_iteration_result = paper_meta # 初始值是摘要 last_iteration_result = paper_meta # 初始值是摘要
MAX_WORD_TOTAL = 4096 * 0.7 MAX_WORD_TOTAL = 4096 * 0.7
n_fragment = len(paper_fragments) n_fragment = len(paper_fragments)
if n_fragment >= 20: print('文章极长,不能达到预期效果') if n_fragment >= 20: logger.warning('文章极长,不能达到预期效果')
for i in range(n_fragment): for i in range(n_fragment):
NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i]}" i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i]}"

View File

@@ -1,6 +1,7 @@
from loguru import logger
from toolbox import update_ui from toolbox import update_ui
from toolbox import CatchException, report_exception from toolbox import CatchException, report_exception
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
fast_debug = False fast_debug = False
@@ -57,7 +58,6 @@ def readPdf(pdfPath):
layout = device.get_result() layout = device.get_result()
for obj in layout._objs: for obj in layout._objs:
if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal): if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal):
# print(obj.get_text())
outTextList.append(obj.get_text()) outTextList.append(obj.get_text())
return outTextList return outTextList
@@ -66,7 +66,7 @@ def readPdf(pdfPath):
def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, glob, os import time, glob, os
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
print('begin analysis on:', file_manifest) logger.info('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest): for index, fp in enumerate(file_manifest):
if ".tex" in fp: if ".tex" in fp:
with open(fp, 'r', encoding='utf-8', errors='replace') as f: with open(fp, 'r', encoding='utf-8', errors='replace') as f:

View File

@@ -1,9 +1,9 @@
from toolbox import CatchException, report_exception, get_log_folder, gen_time_str from toolbox import CatchException, report_exception, get_log_folder, gen_time_str
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion from toolbox import update_ui, promote_file_to_downloadzone, update_ui_latest_msg, disable_auto_promotion
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import read_and_clean_pdf_text from crazy_functions.crazy_utils import read_and_clean_pdf_text
from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf
from shared_utils.colorful import * from shared_utils.colorful import *
import copy import copy
@@ -60,7 +60,7 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# 清空历史,以免输入溢出 # 清空历史,以免输入溢出
history = [] history = []
from .crazy_utils import get_files_from_everything from crazy_functions.crazy_utils import get_files_from_everything
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf') success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
if len(file_manifest) > 0: if len(file_manifest) > 0:
# 尝试导入依赖,如果缺少依赖,则给出安装建议 # 尝试导入依赖,如果缺少依赖,则给出安装建议

View File

@@ -1,4 +1,5 @@
import os import os
from loguru import logger
from toolbox import CatchException, update_ui, gen_time_str, promote_file_to_downloadzone from toolbox import CatchException, update_ui, gen_time_str, promote_file_to_downloadzone
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
@@ -34,10 +35,10 @@ def eval_manim(code):
return f'gpt_log/{time_str}.mp4' return f'gpt_log/{time_str}.mp4'
except subprocess.CalledProcessError as e: except subprocess.CalledProcessError as e:
output = e.output.decode() output = e.output.decode()
print(f"Command returned non-zero exit status {e.returncode}: {output}.") logger.error(f"Command returned non-zero exit status {e.returncode}: {output}.")
return f"Evaluating python script failed: {e.output}." return f"Evaluating python script failed: {e.output}."
except: except:
print('generating mp4 failed') logger.error('generating mp4 failed')
return "Generating mp4 failed." return "Generating mp4 failed."
@@ -165,7 +166,7 @@ class PointWithTrace(Scene):
``` ```
# do not use get_graph, this funciton is deprecated # do not use get_graph, this function is deprecated
class ExampleFunctionGraph(Scene): class ExampleFunctionGraph(Scene):
def construct(self): def construct(self):

View File

@@ -1,13 +1,12 @@
from loguru import logger
from toolbox import update_ui from toolbox import update_ui
from toolbox import CatchException, report_exception from toolbox import CatchException, report_exception
from .crazy_utils import read_and_clean_pdf_text from crazy_functions.crazy_utils import read_and_clean_pdf_text
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import tiktoken logger.info('begin analysis on:', file_name)
print('begin analysis on:', file_name)
############################## <第 0 步切割PDF> ################################## ############################## <第 0 步切割PDF> ##################################
# 递归地切割PDF文件每一块尽量是完整的一个section比如introductionexperiment等必要时再进行切割 # 递归地切割PDF文件每一块尽量是完整的一个section比如introductionexperiment等必要时再进行切割
@@ -36,7 +35,7 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
last_iteration_result = paper_meta # 初始值是摘要 last_iteration_result = paper_meta # 初始值是摘要
MAX_WORD_TOTAL = 4096 MAX_WORD_TOTAL = 4096
n_fragment = len(paper_fragments) n_fragment = len(paper_fragments)
if n_fragment >= 20: print('文章极长,不能达到预期效果') if n_fragment >= 20: logger.warning('文章极长,不能达到预期效果')
for i in range(n_fragment): for i in range(n_fragment):
NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {paper_fragments[i]}" i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {paper_fragments[i]}"
@@ -57,7 +56,7 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
chatbot.append([i_say_show_user, gpt_say]) chatbot.append([i_say_show_user, gpt_say])
############################## <第 4 步设置一个token上限防止回答时Token溢出> ################################## ############################## <第 4 步设置一个token上限防止回答时Token溢出> ##################################
from .crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
_, final_results = input_clipping("", final_results, max_token_limit=3200) _, final_results = input_clipping("", final_results, max_token_limit=3200)
yield from update_ui(chatbot=chatbot, history=final_results) # 注意这里的历史记录被替代了 yield from update_ui(chatbot=chatbot, history=final_results) # 注意这里的历史记录被替代了

View File

@@ -1,12 +1,12 @@
from loguru import logger
from toolbox import update_ui from toolbox import update_ui
from toolbox import CatchException, report_exception from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
def 生成函数注释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 生成函数注释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, os import time, os
print('begin analysis on:', file_manifest) logger.info('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest): for index, fp in enumerate(file_manifest):
with open(fp, 'r', encoding='utf-8', errors='replace') as f: with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read() file_content = f.read()
@@ -16,22 +16,20 @@ def 生成函数注释(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot.append((i_say_show_user, "[Local Message] waiting gpt response.")) chatbot.append((i_say_show_user, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if not fast_debug: msg = '正常'
msg = '正常' # ** gpt request **
# ** gpt request ** gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive( i_say, i_say_show_user, llm_kwargs, chatbot, history=[], sys_prompt=system_prompt) # 带超时倒计时
i_say, i_say_show_user, llm_kwargs, chatbot, history=[], sys_prompt=system_prompt) # 带超时倒计时
chatbot[-1] = (i_say_show_user, gpt_say) chatbot[-1] = (i_say_show_user, gpt_say)
history.append(i_say_show_user); history.append(gpt_say) history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
if not fast_debug: time.sleep(2)
if not fast_debug:
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
time.sleep(2)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui, report_exception from toolbox import CatchException, update_ui, report_exception
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.plugin_template.plugin_class_template import ( from crazy_functions.plugin_template.plugin_class_template import (
GptAcademicPluginTemplate, GptAcademicPluginTemplate,
) )
@@ -201,8 +201,7 @@ def 解析历史输入(history, llm_kwargs, file_manifest, chatbot, plugin_kwarg
MAX_WORD_TOTAL = 4096 MAX_WORD_TOTAL = 4096
n_txt = len(txt) n_txt = len(txt)
last_iteration_result = "从以下文本中提取摘要。" last_iteration_result = "从以下文本中提取摘要。"
if n_txt >= 20:
print("文章极长,不能达到预期效果")
for i in range(n_txt): for i in range(n_txt):
NUM_OF_WORD = MAX_WORD_TOTAL // n_txt NUM_OF_WORD = MAX_WORD_TOTAL // n_txt
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words in Chinese: {txt[i]}" i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words in Chinese: {txt[i]}"
@@ -325,16 +324,16 @@ def 生成多种Mermaid图表(
if os.path.exists(txt): # 如输入区无内容则直接解析历史记录 if os.path.exists(txt): # 如输入区无内容则直接解析历史记录
from crazy_functions.pdf_fns.parse_word import extract_text_from_files from crazy_functions.pdf_fns.parse_word import extract_text_from_files
file_exist, final_result, page_one, file_manifest, excption = ( file_exist, final_result, page_one, file_manifest, exception = (
extract_text_from_files(txt, chatbot, history) extract_text_from_files(txt, chatbot, history)
) )
else: else:
file_exist = False file_exist = False
excption = "" exception = ""
file_manifest = [] file_manifest = []
if excption != "": if exception != "":
if excption == "word": if exception == "word":
report_exception( report_exception(
chatbot, chatbot,
history, history,
@@ -342,7 +341,7 @@ def 生成多种Mermaid图表(
b=f"找到了.doc文件但是该文件格式不被支持请先转化为.docx格式。", b=f"找到了.doc文件但是该文件格式不被支持请先转化为.docx格式。",
) )
elif excption == "pdf": elif exception == "pdf":
report_exception( report_exception(
chatbot, chatbot,
history, history,
@@ -350,7 +349,7 @@ def 生成多种Mermaid图表(
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf```。", b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf```。",
) )
elif excption == "word_pip": elif exception == "word_pip":
report_exception( report_exception(
chatbot, chatbot,
history, history,

View File

@@ -1,6 +1,6 @@
from toolbox import CatchException, update_ui, ProxyNetworkActivate, update_ui_lastest_msg, get_log_folder, get_user from toolbox import CatchException, update_ui, ProxyNetworkActivate, update_ui_latest_msg, get_log_folder, get_user
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_files_from_everything from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_files_from_everything
from loguru import logger
install_msg =""" install_msg ="""
1. python -m pip install torch --index-url https://download.pytorch.org/whl/cpu 1. python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
@@ -40,9 +40,9 @@ def 知识库文件注入(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
except Exception as e: except Exception as e:
chatbot.append(["依赖不足", f"{str(e)}\n\n导入依赖失败。请用以下命令安装" + install_msg]) chatbot.append(["依赖不足", f"{str(e)}\n\n导入依赖失败。请用以下命令安装" + install_msg])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# from .crazy_utils import try_install_deps # from crazy_functions.crazy_utils import try_install_deps
# try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain']) # try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain'])
# yield from update_ui_lastest_msg("安装完成,您可以再次重试。", chatbot, history) # yield from update_ui_latest_msg("安装完成,您可以再次重试。", chatbot, history)
return return
# < --------------------读取文件--------------- > # < --------------------读取文件--------------- >
@@ -60,7 +60,7 @@ def 知识库文件注入(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# < -------------------预热文本向量化模组--------------- > # < -------------------预热文本向量化模组--------------- >
chatbot.append(['<br/>'.join(file_manifest), "正在预热文本向量化模组, 如果是第一次运行, 将消耗较长时间下载中文向量化模型..."]) chatbot.append(['<br/>'.join(file_manifest), "正在预热文本向量化模组, 如果是第一次运行, 将消耗较长时间下载中文向量化模型..."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
print('Checking Text2vec ...') logger.info('Checking Text2vec ...')
from langchain.embeddings.huggingface import HuggingFaceEmbeddings from langchain.embeddings.huggingface import HuggingFaceEmbeddings
with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络 with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络
HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese") HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")
@@ -68,7 +68,7 @@ def 知识库文件注入(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# < -------------------构建知识库--------------- > # < -------------------构建知识库--------------- >
chatbot.append(['<br/>'.join(file_manifest), "正在构建知识库..."]) chatbot.append(['<br/>'.join(file_manifest), "正在构建知识库..."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
print('Establishing knowledge archive ...') logger.info('Establishing knowledge archive ...')
with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络 with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络
kai = knowledge_archive_interface() kai = knowledge_archive_interface()
vs_path = get_log_folder(user=get_user(chatbot), plugin_name='vec_store') vs_path = get_log_folder(user=get_user(chatbot), plugin_name='vec_store')
@@ -93,9 +93,9 @@ def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
except Exception as e: except Exception as e:
chatbot.append(["依赖不足", f"{str(e)}\n\n导入依赖失败。请用以下命令安装" + install_msg]) chatbot.append(["依赖不足", f"{str(e)}\n\n导入依赖失败。请用以下命令安装" + install_msg])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# from .crazy_utils import try_install_deps # from crazy_functions.crazy_utils import try_install_deps
# try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain']) # try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain'])
# yield from update_ui_lastest_msg("安装完成,您可以再次重试。", chatbot, history) # yield from update_ui_latest_msg("安装完成,您可以再次重试。", chatbot, history)
return return
# < ------------------- --------------- > # < ------------------- --------------- >

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
import requests import requests
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
from request_llms.bridge_all import model_info from request_llms.bridge_all import model_info
@@ -23,8 +23,8 @@ def google(query, proxies):
item = {'title': title, 'link': link} item = {'title': title, 'link': link}
results.append(item) results.append(item)
for r in results: # for r in results:
print(r['link']) # print(r['link'])
return results return results
def scrape_text(url, proxies) -> str: def scrape_text(url, proxies) -> str:

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
import requests import requests
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
from request_llms.bridge_all import model_info from request_llms.bridge_all import model_info
@@ -22,8 +22,8 @@ def bing_search(query, proxies=None):
item = {'title': title, 'link': link} item = {'title': title, 'link': link}
results.append(item) results.append(item)
for r in results: # for r in results:
print(r['link']) # print(r['link'])
return results return results

View File

@@ -47,7 +47,7 @@ explain_msg = """
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing import List from typing import List
from toolbox import CatchException, update_ui, is_the_upload_folder from toolbox import CatchException, update_ui, is_the_upload_folder
from toolbox import update_ui_lastest_msg, disable_auto_promotion from toolbox import update_ui_latest_msg, disable_auto_promotion
from request_llms.bridge_all import predict_no_ui_long_connection from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
@@ -113,19 +113,19 @@ def 虚空终端(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt
# 用简单的关键词检测用户意图 # 用简单的关键词检测用户意图
is_certain, _ = analyze_intention_with_simple_rules(txt) is_certain, _ = analyze_intention_with_simple_rules(txt)
if is_the_upload_folder(txt): if is_the_upload_folder(txt):
state.set_state(chatbot=chatbot, key='has_provided_explaination', value=False) state.set_state(chatbot=chatbot, key='has_provided_explanation', value=False)
appendix_msg = "\n\n**很好,您已经上传了文件**,现在请您描述您的需求。" appendix_msg = "\n\n**很好,您已经上传了文件**,现在请您描述您的需求。"
if is_certain or (state.has_provided_explaination): if is_certain or (state.has_provided_explanation):
# 如果意图明确,跳过提示环节 # 如果意图明确,跳过提示环节
state.set_state(chatbot=chatbot, key='has_provided_explaination', value=True) state.set_state(chatbot=chatbot, key='has_provided_explanation', value=True)
state.unlock_plugin(chatbot=chatbot) state.unlock_plugin(chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=history) yield from update_ui(chatbot=chatbot, history=history)
yield from 虚空终端主路由(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request) yield from 虚空终端主路由(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
return return
else: else:
# 如果意图模糊,提示 # 如果意图模糊,提示
state.set_state(chatbot=chatbot, key='has_provided_explaination', value=True) state.set_state(chatbot=chatbot, key='has_provided_explanation', value=True)
state.lock_plugin(chatbot=chatbot) state.lock_plugin(chatbot=chatbot)
chatbot.append(("虚空终端状态:", explain_msg+appendix_msg)) chatbot.append(("虚空终端状态:", explain_msg+appendix_msg))
yield from update_ui(chatbot=chatbot, history=history) yield from update_ui(chatbot=chatbot, history=history)
@@ -141,7 +141,7 @@ def 虚空终端主路由(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# ⭐ ⭐ ⭐ 分析用户意图 # ⭐ ⭐ ⭐ 分析用户意图
is_certain, user_intention = analyze_intention_with_simple_rules(txt) is_certain, user_intention = analyze_intention_with_simple_rules(txt)
if not is_certain: if not is_certain:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n分析用户意图中", chatbot=chatbot, history=history, delay=0) lastmsg=f"正在执行任务: {txt}\n\n分析用户意图中", chatbot=chatbot, history=history, delay=0)
gpt_json_io = GptJsonIO(UserIntention) gpt_json_io = GptJsonIO(UserIntention)
rf_req = "\nchoose from ['ModifyConfiguration', 'ExecutePlugin', 'Chat']" rf_req = "\nchoose from ['ModifyConfiguration', 'ExecutePlugin', 'Chat']"
@@ -154,13 +154,13 @@ def 虚空终端主路由(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
user_intention = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn) user_intention = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 意图={explain_intention_to_user[user_intention.intention_type]}", lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 意图={explain_intention_to_user[user_intention.intention_type]}",
except JsonStringError as e: except JsonStringError as e:
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 失败 当前语言模型({llm_kwargs['llm_model']})不能理解您的意图", chatbot=chatbot, history=history, delay=0) lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 失败 当前语言模型({llm_kwargs['llm_model']})不能理解您的意图", chatbot=chatbot, history=history, delay=0)
return return
else: else:
pass pass
yield from update_ui_lastest_msg( yield from update_ui_latest_msg(
lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 意图={explain_intention_to_user[user_intention.intention_type]}", lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 意图={explain_intention_to_user[user_intention.intention_type]}",
chatbot=chatbot, history=history, delay=0) chatbot=chatbot, history=history, delay=0)

View File

@@ -64,7 +64,7 @@ def parseNotebook(filename, enable_markdown=1):
def ipynb解释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def ipynb解释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg") if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
enable_markdown = plugin_kwargs.get("advanced_arg", "1") enable_markdown = plugin_kwargs.get("advanced_arg", "1")

View File

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui, get_conf from toolbox import CatchException, update_ui, get_conf
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
import datetime import datetime
@CatchException @CatchException
def 同时问询(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request): def 同时问询(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):

View File

@@ -1,11 +1,13 @@
from toolbox import update_ui from toolbox import update_ui
from toolbox import CatchException, get_conf, markdown_convertion from toolbox import CatchException, get_conf, markdown_convertion
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.crazy_utils import input_clipping from crazy_functions.crazy_utils import input_clipping
from crazy_functions.agent_fns.watchdog import WatchDog from crazy_functions.agent_fns.watchdog import WatchDog
from request_llms.bridge_all import predict_no_ui_long_connection from crazy_functions.live_audio.aliyunASR import AliyunASR
from loguru import logger
import threading, time import threading, time
import numpy as np import numpy as np
from .live_audio.aliyunASR import AliyunASR
import json import json
import re import re
@@ -40,11 +42,11 @@ class AsyncGptTask():
MAX_TOKEN_ALLO = 2560 MAX_TOKEN_ALLO = 2560
i_say, history = input_clipping(i_say, history, max_token_limit=MAX_TOKEN_ALLO) i_say, history = input_clipping(i_say, history, max_token_limit=MAX_TOKEN_ALLO)
gpt_say_partial = predict_no_ui_long_connection(inputs=i_say, llm_kwargs=llm_kwargs, history=history, sys_prompt=sys_prompt, gpt_say_partial = predict_no_ui_long_connection(inputs=i_say, llm_kwargs=llm_kwargs, history=history, sys_prompt=sys_prompt,
observe_window=observe_window[index], console_slience=True) observe_window=observe_window[index], console_silence=True)
except ConnectionAbortedError as token_exceed_err: except ConnectionAbortedError as token_exceed_err:
print('至少一个线程任务Token溢出而失败', e) logger.error('至少一个线程任务Token溢出而失败', e)
except Exception as e: except Exception as e:
print('至少一个线程任务意外失败', e) logger.error('至少一个线程任务意外失败', e)
def add_async_gpt_task(self, i_say, chatbot_index, llm_kwargs, history, system_prompt): def add_async_gpt_task(self, i_say, chatbot_index, llm_kwargs, history, system_prompt):
self.observe_future.append([""]) self.observe_future.append([""])

View File

@@ -1,12 +1,11 @@
from toolbox import update_ui from toolbox import update_ui
from toolbox import CatchException, report_exception from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, glob, os import time, glob, os
print('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest): for index, fp in enumerate(file_manifest):
with open(fp, 'r', encoding='utf-8', errors='replace') as f: with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read() file_content = f.read()

View File

@@ -1,6 +1,6 @@
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import CatchException, report_exception, promote_file_to_downloadzone from toolbox import CatchException, report_exception, promote_file_to_downloadzone
from toolbox import update_ui, update_ui_lastest_msg, disable_auto_promotion, write_history_to_file from toolbox import update_ui, update_ui_latest_msg, disable_auto_promotion, write_history_to_file
import logging import logging
import requests import requests
import time import time
@@ -156,7 +156,7 @@ def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
history = [] history = []
meta_paper_info_list = yield from get_meta_information(txt, chatbot, history) meta_paper_info_list = yield from get_meta_information(txt, chatbot, history)
if len(meta_paper_info_list) == 0: if len(meta_paper_info_list) == 0:
yield from update_ui_lastest_msg(lastmsg='获取文献失败可能触发了google反爬虫机制。',chatbot=chatbot, history=history, delay=0) yield from update_ui_latest_msg(lastmsg='获取文献失败可能触发了google反爬虫机制。',chatbot=chatbot, history=history, delay=0)
return return
batchsize = 5 batchsize = 5
for batch in range(math.ceil(len(meta_paper_info_list)/batchsize)): for batch in range(math.ceil(len(meta_paper_info_list)/batchsize)):

View File

@@ -180,6 +180,7 @@ version: '3'
services: services:
gpt_academic_with_latex: gpt_academic_with_latex:
image: ghcr.io/binary-husky/gpt_academic_with_latex:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex) image: ghcr.io/binary-husky/gpt_academic_with_latex:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex)
# 对于ARM64设备请将以上镜像名称替换为 ghcr.io/binary-husky/gpt_academic_with_latex_arm:master
environment: environment:
# 请查阅 `config.py` 以查看所有的配置信息 # 请查阅 `config.py` 以查看所有的配置信息
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ' API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '

View File

@@ -1 +0,0 @@
# 此Dockerfile不再维护请前往docs/GithubAction+JittorLLMs

View File

@@ -5,6 +5,10 @@ FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
# edge-tts需要的依赖某些pip包所需的依赖 # edge-tts需要的依赖某些pip包所需的依赖
RUN apt update && apt install ffmpeg build-essential -y RUN apt update && apt install ffmpeg build-essential -y
RUN apt-get install -y fontconfig
RUN ln -s /usr/local/texlive/2023/texmf-dist/fonts/truetype /usr/share/fonts/truetype/texlive
RUN fc-cache -fv
RUN apt-get clean
# use python3 as the system default python # use python3 as the system default python
WORKDIR /gpt WORKDIR /gpt
@@ -30,7 +34,7 @@ RUN python3 -m pip install -r request_llms/requirements_qwen.txt
RUN python3 -m pip install -r request_llms/requirements_chatglm.txt RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
RUN python3 -m pip install -r request_llms/requirements_newbing.txt RUN python3 -m pip install -r request_llms/requirements_newbing.txt
RUN python3 -m pip install nougat-ocr RUN python3 -m pip install nougat-ocr
RUN python3 -m pip cache purge
# 预热Tiktoken模块 # 预热Tiktoken模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()' RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

View File

@@ -1,57 +0,0 @@
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacityBeta --network=host .
# docker run -it --net=host gpt-academic-all-capacity bash
# 从NVIDIA源从而支持显卡检查宿主的nvidia-smi中的cuda版本必须>=11.3
FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
# edge-tts需要的依赖某些pip包所需的依赖
RUN apt update && apt install ffmpeg build-essential -y
# use python3 as the system default python
WORKDIR /gpt
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# # 非必要步骤更换pip源 (以下三行,可以删除)
# RUN echo '[global]' > /etc/pip.conf && \
# echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
# echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
# 下载pytorch
RUN python3 -m pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# 准备pip依赖
RUN python3 -m pip install openai numpy arxiv rich
RUN python3 -m pip install colorama Markdown pygments pymupdf
RUN python3 -m pip install python-docx moviepy pdfminer
RUN python3 -m pip install zh_langchain==0.2.1 pypinyin
RUN python3 -m pip install rarfile py7zr
RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
# 下载分支
WORKDIR /gpt
RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
WORKDIR /gpt/gpt_academic
RUN git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss
RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install -r request_llms/requirements_moss.txt
RUN python3 -m pip install -r request_llms/requirements_qwen.txt
RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
RUN python3 -m pip install -r request_llms/requirements_newbing.txt
RUN python3 -m pip install nougat-ocr
# 预热Tiktoken模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# 安装知识库插件的额外依赖
RUN apt-get update && apt-get install libgl1 -y
RUN pip3 install transformers protobuf langchain sentence-transformers faiss-cpu nltk beautifulsoup4 bitsandbytes tabulate icetk --upgrade
RUN pip3 install unstructured[all-docs] --upgrade
RUN python3 -c 'from check_proxy import warm_up_vectordb; warm_up_vectordb()'
RUN rm -rf /usr/local/lib/python3.8/dist-packages/tests
# COPY .cache /root/.cache
# COPY config_private.py config_private.py
# 启动
CMD ["python3", "-u", "main.py"]

View File

@@ -7,6 +7,7 @@ RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
# edge-tts需要的依赖某些pip包所需的依赖 # edge-tts需要的依赖某些pip包所需的依赖
RUN apt update && apt install ffmpeg build-essential -y RUN apt update && apt install ffmpeg build-essential -y
RUN apt-get clean
# use python3 as the system default python # use python3 as the system default python
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
@@ -22,6 +23,7 @@ RUN python3 -m pip install -r request_llms/requirements_moss.txt
RUN python3 -m pip install -r request_llms/requirements_qwen.txt RUN python3 -m pip install -r request_llms/requirements_qwen.txt
RUN python3 -m pip install -r request_llms/requirements_chatglm.txt RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
RUN python3 -m pip install -r request_llms/requirements_newbing.txt RUN python3 -m pip install -r request_llms/requirements_newbing.txt
RUN python3 -m pip cache purge
# 预热Tiktoken模块 # 预热Tiktoken模块

View File

@@ -18,5 +18,7 @@ RUN apt update && apt install ffmpeg -y
# 可选步骤,用于预热模块 # 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()' RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
RUN python3 -m pip cache purge && apt-get clean
# 启动 # 启动
CMD ["python3", "-u", "main.py"] CMD ["python3", "-u", "main.py"]

View File

@@ -1,35 +1,36 @@
# 此Dockerfile适用于无本地模型的环境构建如果需要使用chatglm等本地模型请参考 docs/Dockerfile+ChatGLM # 此Dockerfile适用于"无本地模型"的环境构建如果需要使用chatglm等本地模型请参考 docs/Dockerfile+ChatGLM
# - 1 修改 `config.py` # - 1 修改 `config.py`
# - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex . # - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex .
# - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex # - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
FROM fuqingxu/python311_texlive_ctex:latest FROM menghuan1918/ubuntu_uv_ctex:latest
ENV PATH "$PATH:/usr/local/texlive/2022/bin/x86_64-linux" ENV DEBIAN_FRONTEND=noninteractive
ENV PATH "$PATH:/usr/local/texlive/2023/bin/x86_64-linux" SHELL ["/bin/bash", "-c"]
ENV PATH "$PATH:/usr/local/texlive/2024/bin/x86_64-linux"
ENV PATH "$PATH:/usr/local/texlive/2025/bin/x86_64-linux"
ENV PATH "$PATH:/usr/local/texlive/2026/bin/x86_64-linux"
# 指定路径
WORKDIR /gpt WORKDIR /gpt
RUN pip3 install openai numpy arxiv rich # 先复制依赖文件
RUN pip3 install colorama Markdown pygments pymupdf COPY requirements.txt .
RUN pip3 install python-docx pdfminer
RUN pip3 install nougat-ocr
# 装载项目文件
COPY . .
# 安装依赖 # 安装依赖
RUN pip3 install -r requirements.txt RUN pip install --break-system-packages openai numpy arxiv rich colorama Markdown pygments pymupdf python-docx pdfminer \
&& pip install --break-system-packages -r requirements.txt \
&& if [ "$(uname -m)" = "x86_64" ]; then \
pip install --break-system-packages nougat-ocr; \
fi \
&& pip cache purge \
&& rm -rf /root/.cache/pip/*
# edge-tts需要的依赖 # 创建非root用户
RUN apt update && apt install ffmpeg -y RUN useradd -m gptuser && chown -R gptuser /gpt
USER gptuser
# 最后才复制代码文件,这样代码更新时只需重建最后几层可以大幅减少docker pull所需的大小
COPY --chown=gptuser:gptuser . .
# 可选步骤,用于预热模块 # 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()' RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
RUN python3 -m pip cache purge
# 启动 # 启动
CMD ["python3", "-u", "main.py"] CMD ["python3", "-u", "main.py"]

View File

@@ -24,6 +24,8 @@ RUN apt update && apt install ffmpeg -y
# 可选步骤,用于预热模块 # 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()' RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
RUN python3 -m pip cache purge && apt-get clean
# 启动 # 启动
CMD ["python3", "-u", "main.py"] CMD ["python3", "-u", "main.py"]

26
docs/WindowsRun.bat Normal file
View File

@@ -0,0 +1,26 @@
@echo off
setlocal
:: 设置环境变量
set ENV_NAME=gpt
set ENV_PATH=%~dp0%ENV_NAME%
set SCRIPT_PATH=%~dp0main.py
:: 判断环境是否已解压
if not exist "%ENV_PATH%" (
echo Extracting environment...
mkdir "%ENV_PATH%"
tar -xzf gpt.tar.gz -C "%ENV_PATH%"
:: 运行conda环境激活脚本
call "%ENV_PATH%\Scripts\activate.bat"
) else (
:: 如果环境已存在,直接激活
call "%ENV_PATH%\Scripts\activate.bat"
)
echo Start to run program:
:: 运行Python脚本
python "%SCRIPT_PATH%"
endlocal
pause

View File

@@ -4,7 +4,7 @@ We currently support fastapi in order to solve sub-path deploy issue.
1. change CUSTOM_PATH setting in `config.py` 1. change CUSTOM_PATH setting in `config.py`
``` sh ```sh
nano config.py nano config.py
``` ```
@@ -35,9 +35,8 @@ if __name__ == "__main__":
main() main()
``` ```
3. Go! 3. Go!
``` sh ```sh
python main.py python main.py
``` ```

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More