Files
gpt_academic/crazy_functions/review_fns/prompts/semantic_prompts.py
binary-husky 8042750d41 Master 4.0 (#2210)
* stage academic conversation

* stage document conversation

* fix buggy gradio version

* file dynamic load

* merge more academic plugins

* accelerate nltk

* feat: 为predict函数添加文件和URL读取功能
- 添加URL检测和网页内容提取功能,支持自动提取网页文本
- 添加文件路径识别和文件内容读取功能,支持private_upload路径格式
- 集成WebTextExtractor处理网页内容提取
- 集成TextContentLoader处理本地文件读取
- 支持文件路径与问题组合的智能处理

* back

* block unstable

---------

Co-authored-by: XiaoBoAI <liuboyin2019@ia.ac.cn>
2025-08-23 15:59:22 +08:00

277 lines
9.1 KiB
Python

# Search type prompt
SEMANTIC_TYPE_PROMPT = """Determine the most appropriate search type for Semantic Scholar.
Query: {query}
Task: Analyze the research query and select the most appropriate search type for Semantic Scholar API.
Available search types:
1. paper: General paper search
- Use for broad topic searches
- Looking for specific papers
- Keyword-based searches
Example: "transformer models in NLP"
2. author: Author-based search
- Finding works by specific researchers
- Author profile analysis
Example: "papers by Yoshua Bengio"
3. paper_details: Specific paper lookup
- Getting details about a known paper
- Finding specific versions or citations
Example: "Attention is All You Need paper details"
4. citations: Citation analysis
- Finding papers that cite a specific work
- Impact analysis
Example: "papers citing BERT"
5. references: Reference analysis
- Finding papers cited by a specific work
- Background research
Example: "references in GPT-3 paper"
6. recommendations: Paper recommendations
- Finding similar papers
- Research direction exploration
Example: "papers similar to Transformer"
Examples:
1. Query: "Latest papers about deep learning"
<search_type>paper</search_type>
2. Query: "Works by Geoffrey Hinton since 2020"
<search_type>author</search_type>
3. Query: "Papers citing the original Transformer paper"
<search_type>citations</search_type>
Please analyze the query and respond ONLY with XML tags:
<search_type>Choose the most appropriate search type from the list above</search_type>"""
# Query optimization prompt
SEMANTIC_QUERY_PROMPT = """Optimize the following query for Semantic Scholar search.
Query: {query}
Task: Transform the natural language query into an optimized search query for maximum relevance.
Always generate English search terms regardless of the input language.
IMPORTANT: Ignore any requirements about journal ranking (CAS, JCR, IF index),
or output format requirements. Focus only on the core research topic for the search query.
Query optimization guidelines:
1. Use quotes for exact phrases
- Ensures exact matching
- Reduces irrelevant results
Example: "\"attention mechanism\"" vs attention mechanism
2. Include key technical terms
- Use specific technical terminology
- Include common variations
Example: "transformer architecture" neural networks
3. Author names (if relevant)
- Include full names when known
- Consider common name variations
Example: "Geoffrey Hinton" OR "G. E. Hinton"
Examples:
1. Natural query: "Recent advances in transformer models"
<query>"transformer model" "neural architecture" deep learning</query>
2. Natural query: "BERT applications in text classification"
<query>"BERT" "text classification" "language model" application</query>
3. Natural query: "Deep learning for computer vision by Kaiming He"
<query>"deep learning" "computer vision" author:"Kaiming He"</query>
Please analyze the query and respond ONLY with XML tags:
<query>Provide the optimized search query</query>
Note:
- Balance between specificity and coverage
- Include important technical terms
- Use quotes for key phrases
- Consider synonyms and related terms"""
# Fields selection prompt
SEMANTIC_FIELDS_PROMPT = """Select relevant fields to retrieve from Semantic Scholar.
Query: {query}
Task: Determine which paper fields should be retrieved based on the research needs.
Available fields:
Core fields:
- title: Paper title (always included)
- abstract: Full paper abstract
- authors: Author information
- year: Publication year
- venue: Publication venue
Citation fields:
- citations: Papers citing this work
- references: Papers cited by this work
Additional fields:
- embedding: Paper vector embedding
- tldr: AI-generated summary
- venue: Publication venue/journal
- url: Paper URL
Examples:
1. Query: "Latest developments in NLP"
<fields>title, abstract, authors, year, venue, citations</fields>
2. Query: "Most influential papers in deep learning"
<fields>title, abstract, authors, year, citations, references</fields>
3. Query: "Survey of transformer architectures"
<fields>title, abstract, authors, year, tldr, references</fields>
Please analyze the query and respond ONLY with XML tags:
<fields>List relevant fields, comma-separated</fields>
Note:
- Choose fields based on the query's purpose
- Include citation data for impact analysis
- Consider tldr for quick paper screening
- Balance completeness with API efficiency"""
# Sort parameters prompt
SEMANTIC_SORT_PROMPT = """Determine optimal sorting parameters for the query.
Query: {query}
Task: Select the most appropriate sorting method and result limit for the search.
Always generate English search terms regardless of the input language.
Sorting options:
1. relevance (default)
- Best match to query terms
- Recommended for specific technical searches
Example: "specific algorithm implementations"
2. citations
- Sort by citation count
- Best for finding influential papers
Example: "most important papers in deep learning"
3. year
- Sort by publication date
- Best for following recent developments
Example: "latest advances in NLP"
Examples:
1. Query: "Recent breakthroughs in AI"
<sort_by>year</sort_by>
<limit>30</limit>
2. Query: "Most influential papers about GANs"
<sort_by>citations</sort_by>
<limit>20</limit>
3. Query: "Specific papers about BERT fine-tuning"
<sort_by>relevance</sort_by>
<limit>25</limit>
Please analyze the query and respond ONLY with XML tags:
<sort_by>Choose: relevance, citations, or year</sort_by>
<limit>Suggest number between 10-50</limit>
Note:
- Consider the query's temporal aspects
- Balance between comprehensive coverage and information overload
- Use citation sorting for impact analysis
- Use year sorting for tracking developments"""
# System prompts for each task
SEMANTIC_TYPE_SYSTEM_PROMPT = """You are an expert at analyzing academic queries.
Your task is to determine the most appropriate type of search on Semantic Scholar.
Consider the query's intent, scope, and specific research needs.
Always respond in English regardless of the input language."""
SEMANTIC_QUERY_SYSTEM_PROMPT = """You are an expert at crafting Semantic Scholar search queries.
Your task is to optimize natural language queries for maximum relevance.
Focus on creating precise queries that leverage the platform's search capabilities.
Always generate English search terms regardless of the input language."""
SEMANTIC_FIELDS_SYSTEM_PROMPT = """You are an expert at Semantic Scholar data fields.
Your task is to select the most relevant fields based on the research context.
Consider both essential and supplementary information needs.
Always respond in English regardless of the input language."""
SEMANTIC_SORT_SYSTEM_PROMPT = """You are an expert at optimizing search results.
Your task is to determine the best sorting parameters based on the query context.
Consider the balance between relevance, impact, and recency.
Always respond in English regardless of the input language."""
# 添加新的综合搜索提示词
SEMANTIC_SEARCH_PROMPT = """Analyze and optimize the research query for Semantic Scholar search.
Query: {query}
Task: Transform the natural language query into optimized search criteria for Semantic Scholar.
IMPORTANT: Ignore any requirements about journal ranking (CAS, JCR, IF index),
or output format requirements when generating the search terms. These requirements
should be considered only for post-search filtering, not as part of the core query.
Available search options:
1. Paper search:
- Title and abstract search
- Author search
- Field-specific search
Example: "transformer architecture neural networks"
2. Field tags:
- title: Search in title
- abstract: Search in abstract
- authors: Search by author names
- venue: Search by publication venue
Example: "title:transformer authors:\"Vaswani\""
3. Advanced options:
- Year range filtering
- Citation count filtering
- Venue filtering
Example: "deep learning year>2020 venue:\"NeurIPS\""
Examples:
1. Query: "Recent transformer papers by Vaswani with high impact"
<search_criteria>
<query>title:transformer authors:"Vaswani" year>2017</query>
<search_type>paper</search_type>
<fields>title,abstract,authors,year,citations,venue</fields>
<sort_by>citations</sort_by>
<limit>30</limit>
</search_criteria>
2. Query: "Most cited papers about BERT in top conferences"
<search_criteria>
<query>title:BERT venue:"ACL|EMNLP|NAACL"</query>
<search_type>paper</search_type>
<fields>title,abstract,authors,year,citations,venue,references</fields>
<sort_by>citations</sort_by>
<limit>25</limit>
</search_criteria>
Please analyze the query and respond with XML tags containing complete search criteria."""
SEMANTIC_SEARCH_SYSTEM_PROMPT = """You are an expert at crafting Semantic Scholar search queries.
Your task is to analyze research queries and transform them into optimized search criteria.
Consider query intent, field relevance, and citation impact when creating the search parameters.
Focus on producing precise and comprehensive search criteria that will yield the most relevant results.
Always generate English search terms and respond in English regardless of the input language."""