LangChain: Output Parsers
Output parsers convert the text an LLM outputs into structured data.
CommaSeparatedListOutputParser
Parses the output into a list. Its format instructions (the text added to the prompt) are:
```python
def get_format_instructions(self) -> str:
    return (
        "Your response should be a list of comma separated values, "
        "eg: `foo, bar, baz`"
    )
```
Its parse method:
```python
def parse(self, text: str) -> List[str]:
    """Parse the output of an LLM call."""
    return text.strip().split(", ")
```
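The parse method splits only on a comma followed by exactly one space. A reply such as `red,green,blue` therefore comes back as a single item. The stdlib-only sketch below compares that behavior with a more tolerant variant, which splits on bare commas and trims each item. The function names are hypothetical and are not part of LangChain's API.

```python
from typing import List


def strict_split(text: str) -> List[str]:
    # Same logic as the parse method above: split on ", " only
    return text.strip().split(", ")


def tolerant_split(text: str) -> List[str]:
    # Hypothetical variant: split on "," and trim each item,
    # dropping empty items (e.g. from a trailing comma)
    return [item.strip() for item in text.split(",") if item.strip()]


if __name__ == "__main__":
    print(strict_split("red, green, blue"))     # ['red', 'green', 'blue']
    print(strict_split("red,green,blue"))       # ['red,green,blue']
    print(tolerant_split("red,green , blue,"))  # ['red', 'green', 'blue']
```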
DatetimeOutputParser
Parses the output into a datetime.
```python
from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()
print(output_parser.get_format_instructions())
```
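The format instructions printed by the code above. The example dates are randomly generated, so they differ from run to run: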
```
Write a datetime string that matches the
following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 477-06-08T11:31:35.756750Z, 245-12-26T14:36:39.117625Z, 711-05-08T07:41:23.815247Z
```
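On the parsing side, the reply must match the same pattern exactly. The minimal stdlib sketch below models that step. It assumes the reply is trimmed and then parsed with `datetime.strptime`. `ParseError` and `parse_datetime` are hypothetical names used only in this sketch.

```python
from datetime import datetime

# The pattern from the format instructions above
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


class ParseError(ValueError):
    """Hypothetical error type raised when the reply does not match."""


def parse_datetime(text: str, fmt: str = DATETIME_FORMAT) -> datetime:
    # Trim whitespace/newlines the LLM may add, then parse strictly
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError as e:
        raise ParseError(f"{text!r} does not match {fmt!r}") from e


if __name__ == "__main__":
    # prints 2023-07-04 12:30:45.123456
    print(parse_datetime(" 2023-07-04T12:30:45.123456Z\n"))
```

Because `strptime` is strict, a reply that wraps the date in extra words fails to parse.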
EnumOutputParser
Parses the output into an enum member. Only enums whose values are strings are supported.
Its format instructions:
```python
def get_format_instructions(self) -> str:
    return f"Select one of the following options: {', '.join(self._valid_values)}"
```
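Here is one concrete reason the values must be strings: the instructions are built with `', '.join(...)`, and `str.join` raises `TypeError` on non-string items. The stdlib sketch below reproduces that line outside LangChain. `format_instructions` and the integer-valued `Sizes` enum are hypothetical.

```python
from enum import Enum


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


class Sizes(Enum):
    # Hypothetical enum with integer values
    SMALL = 1
    LARGE = 2


def format_instructions(enum_cls) -> str:
    # Mirrors the f-string above: join every member's value with ", "
    valid_values = [member.value for member in enum_cls]
    return f"Select one of the following options: {', '.join(valid_values)}"


if __name__ == "__main__":
    print(format_instructions(Colors))
    try:
        format_instructions(Sizes)
    except TypeError as e:
        print("TypeError:", e)
```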
Example:
```python
from enum import Enum

from langchain.output_parsers import EnumOutputParser


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


parser = EnumOutputParser(enum=Colors)
print(parser.get_format_instructions())
```
Output:
```
Select one of the following options: red, green, blue
```
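Parsing works in the opposite direction: the reply is trimmed and then looked up by value. The stdlib-only sketch below assumes that behavior. `parse_enum` is a hypothetical helper and not LangChain's method.

```python
from enum import Enum


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_enum(text: str, enum_cls):
    # Enum lookup by value: Colors("red") -> Colors.RED
    try:
        return enum_cls(text.strip())
    except ValueError as e:
        valid = ", ".join(member.value for member in enum_cls)
        raise ValueError(f"Response {text!r} is not one of: {valid}") from e


if __name__ == "__main__":
    print(parse_enum(" green\n", Colors))  # Colors.GREEN
```

Matching in this sketch is case-sensitive, so a reply of `Red` is rejected.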
PydanticOutputParser
Parses JSON output into an instance of a Pydantic model.
Usage:
```python
from typing import List

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field


class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


# The user's query
actor_query = "Generate the filmography for a random actor."

# Build a parser whose JSON output must follow the Actor model
parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=actor_query)
# Print the fully formatted prompt
print(_input.to_string())
```
Output:
````
Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "name of an actor", "title": "Name", "type": "string"}, "film_names": {"description": "list of names of films they starred in", "items": {"type": "string"}, "title": "Film Names", "type": "array"}}, "required": ["name", "film_names"]}
```
Generate the filmography for a random actor.
````
Apart from the first and last lines, everything in this output is the text returned by parser.get_format_instructions().
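The `partial_variables` argument pre-fills `format_instructions`, so only `query` has to be supplied later. The stdlib analogy below uses `functools.partial` over `str.format`. It illustrates the idea only and does not show how `PromptTemplate` is implemented. The placeholder instruction text is made up.

```python
from functools import partial

TEMPLATE = "Answer the user query.\n{format_instructions}\n{query}\n"

# Pre-fill the format instructions once, like partial_variables does
fill = partial(TEMPLATE.format, format_instructions="<format instructions>")

# Later, only the query is supplied, like input_variables=["query"]
text = fill(query="Generate the filmography for a random actor.")
lines = text.splitlines()
print(lines[0])   # Answer the user query.
print(lines[-1])  # Generate the filmography for a random actor.
```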
执行结果
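PydanticOutputParser builds its format instructions from the template below. Only the {schema} placeholder at the end has to be filled in, using the schema of the given model.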
````python
PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {{"properties": {{"foo": {{"title": "Foo", "description": "a list of strings", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}
the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of the schema. The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.

Here is the output schema:
```
{schema}
```"""
````
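Filling {schema} can be sketched with `json.dumps` and `str.format`. The output shown earlier reveals two details. First, the doubled braces `{{ }}` in the template come out as single braces. Second, the top-level `title` and `type` keys of the model's schema are missing. The sketch assumes the schema is available as a plain dict. With Pydantic it would come from the model, but here it is written out by hand. The template is shortened, and `build_format_instructions` is a hypothetical name.

```python
import json

# Shortened stand-in for PYDANTIC_FORMAT_INSTRUCTIONS; braces that must
# survive formatting are doubled, {schema} is the only real placeholder
TEMPLATE = (
    'the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of the schema.\n'
    "Here is the output schema:\n"
    "{schema}"
)

# Hand-written schema for the Actor model, including top-level keys
actor_schema = {
    "title": "Actor",
    "type": "object",
    "properties": {
        "name": {"description": "name of an actor", "title": "Name", "type": "string"},
        "film_names": {
            "description": "list of names of films they starred in",
            "items": {"type": "string"},
            "title": "Film Names",
            "type": "array",
        },
    },
    "required": ["name", "film_names"],
}


def build_format_instructions(schema: dict) -> str:
    # Drop the top-level title/type, matching the output shown earlier
    reduced = {k: v for k, v in schema.items() if k not in ("title", "type")}
    return TEMPLATE.format(schema=json.dumps(reduced))


if __name__ == "__main__":
    print(build_format_instructions(actor_schema))
```

The last printed line matches the schema line in the prompt output above.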
OutputFixingParser
If the LLM returns malformed output in the JSON example above, the LLM has to be called again to correct it. For example:
```python
from typing import List

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


actor_query = "Generate the filmography for a random actor."
parser = PydanticOutputParser(pydantic_object=Actor)

# The error: JSON strings must use double quotes, not single quotes
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
# This call raises an OutputParserException
parser.parse(misformatted)
```
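Why does this fail? JSON requires double-quoted strings, so `json.loads` rejects the single-quoted text before any field validation takes place. The stdlib-only sketch below approximates the parse step in three stages: it extracts the outermost `{...}` span, decodes it, and checks the field types. A dataclass with hand-written checks stands in for Pydantic. `OutputParserError` and `parse_actor` are hypothetical names and not LangChain's API.

```python
import json
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Actor:
    name: str
    film_names: List[str]


class OutputParserError(ValueError):
    """Hypothetical stand-in for LangChain's OutputParserException."""


def parse_actor(text: str) -> Actor:
    # Take the outermost {...} span, tolerating chatter around the JSON
    match = re.search(r"\{.*\}", text.strip(), re.DOTALL)
    if match is None:
        raise OutputParserError(f"No JSON object found in: {text!r}")
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError as e:
        raise OutputParserError(f"Invalid JSON: {e}") from e
    # Minimal type checks standing in for Pydantic validation
    if not isinstance(data.get("name"), str):
        raise OutputParserError("field 'name' must be a string")
    films = data.get("film_names")
    if not isinstance(films, list) or not all(isinstance(f, str) for f in films):
        raise OutputParserError("field 'film_names' must be a list of strings")
    return Actor(name=data["name"], film_names=films)


if __name__ == "__main__":
    print(parse_actor('Sure! {"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'))
    try:
        parse_actor("{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}")
    except OutputParserError as e:
        print("Failed:", e)
```

A naive repair that replaces every `'` with `"` would break on names such as O'Brien. That is one reason to hand the repair back to an LLM, as the next example does.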
Fixing the error with OutputFixingParser:
```python
from dotenv import load_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import OutputFixingParser

# Load environment variables (including OPENAI_API_KEY) from the .env file
load_dotenv()

# Use OutputFixingParser to correct the JSON formatting error
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
new_parser.parse(misformatted)
# This time the text is parsed correctly, giving:
# Actor(name='Tom Hanks', film_names=['Forrest Gump'])
```
How OutputFixingParser works:
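With PydanticOutputParser alone, the flow is:
1. Build the prompt
2. Call the LLM to get its output
3. Parse the output into the target class
With OutputFixingParser, the flow becomes:
1. Build the prompt
2. Call the LLM to get its output
3. Parse the output into the target class, but parsing fails
4. Build a fix-up prompt that includes the parse error message, and call the LLM again
5. Parse the new output into the target class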
The new logic is everything triggered by the parse failure in step 3, namely steps 4 and 5.
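The five steps can be sketched in plain Python with a fake LLM. Everything here is hypothetical scaffolding for illustration. `fake_llm` returns canned replies. `parse_json` stands in for the wrapped parser. `FIX_TEMPLATE` is a shortened version of the fix-up prompt shown next.

```python
import json
from typing import Callable, List

FIX_TEMPLATE = (
    "Instructions:\n{instructions}\n"
    "Completion:\n{completion}\n"
    "Error:\n{error}\n"
    "Please try again."
)


class OutputParserError(ValueError):
    """Hypothetical stand-in for LangChain's OutputParserException."""


def parse_json(text: str) -> dict:
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise OutputParserError(str(e)) from e


def run_with_fix(prompt: str, instructions: str, llm: Callable[[str], str]) -> dict:
    completion = llm(prompt)                  # steps 1-2: send the built prompt
    try:
        return parse_json(completion)         # step 3: first parse attempt
    except OutputParserError as e:
        fix_prompt = FIX_TEMPLATE.format(     # step 4: fix-up prompt + error
            instructions=instructions, completion=completion, error=repr(e)
        )
        return parse_json(llm(fix_prompt))    # step 5: parse the new reply


if __name__ == "__main__":
    calls: List[str] = []
    replies = iter([
        "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}",  # bad JSON
        '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}',  # fixed
    ])

    def fake_llm(prompt: str) -> str:
        calls.append(prompt)
        return next(replies)

    print(run_with_fix("Generate the filmography...", "Return JSON.", fake_llm))
    print(len(calls))  # 2 LLM calls
```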
The prompt that OutputFixingParser builds in step 4 is:
```python
from langchain.prompts import PromptTemplate

NAIVE_FIX = """Instructions:
--------------
{instructions}
--------------
Completion:
--------------
{completion}
--------------

Above, the Completion did not satisfy the constraints given in the Instructions.
Error:
--------------
{error}
--------------

Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:"""

NAIVE_FIX_PROMPT = PromptTemplate.from_template(NAIVE_FIX)
```
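`PromptTemplate.from_template` treats the `{...}` names as input variables, so rendering this prompt is roughly equivalent to calling `str.format` on the template. The stdlib sketch below fills `NAIVE_FIX` with the failed Tom Hanks reply. The instruction and error values are placeholders, not what LangChain would actually pass.

```python
NAIVE_FIX = """Instructions:
--------------
{instructions}
--------------
Completion:
--------------
{completion}
--------------

Above, the Completion did not satisfy the constraints given in the Instructions.
Error:
--------------
{error}
--------------

Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:"""

rendered = NAIVE_FIX.format(
    # Placeholders: the real values are the wrapped parser's format
    # instructions and repr() of the parse exception
    instructions="<format instructions from the wrapped parser>",
    completion="{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}",
    error="<repr of the parse exception>",
)

if __name__ == "__main__":
    print(rendered)
```

Braces inside the substituted values (such as the bad completion) are inserted verbatim and are not reinterpreted as placeholders.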
The parse method:
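```python
def parse(self, completion: str) -> T:
    try:
        # First attempt: parse with the wrapped parser (e.g. PydanticOutputParser)
        parsed_completion = self.parser.parse(completion)
    except OutputParserException as e:
        # On failure, ask the LLM to fix its output, passing the format
        # instructions, the bad completion and the error message
        new_completion = self.retry_chain.run(
            instructions=self.parser.get_format_instructions(),
            completion=completion,
            error=repr(e),
        )
        # Second attempt; if this also fails, the exception propagates
        parsed_completion = self.parser.parse(new_completion)

    return parsed_completion
```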
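The parse method above makes exactly one repair attempt. If the second parse also fails, the exception propagates to the caller. When more attempts are wanted, the same pattern generalizes to a loop. The sketch below is a hypothetical variant: `parse_with_retries`, `max_retries`, and the callable arguments are illustrative names and are not taken from the code above.

```python
import json
from typing import Callable


def parse_with_retries(
    completion: str,
    parse: Callable[[str], object],
    fix: Callable[[str, Exception], str],
    max_retries: int = 1,
) -> object:
    # Attempt 0 parses the original completion; each later attempt parses
    # the output of fix(), which would normally call the LLM again
    for attempt in range(max_retries + 1):
        try:
            return parse(completion)
        except ValueError as e:  # json.JSONDecodeError subclasses ValueError
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            completion = fix(completion, e)


if __name__ == "__main__":
    fixes = iter(["{'still': 'bad'}", '{"ok": true}'])

    def fake_fix(bad: str, err: Exception) -> str:
        # Hypothetical fixer returning canned replies instead of calling an LLM
        return next(fixes)

    print(parse_with_retries("{'bad': 1}", json.loads, fake_fix, max_retries=2))
```

With `max_retries=1`, the loop reproduces the single-repair behavior of the parse method above.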
Here, retry_chain is an LLMChain built from the LLM passed to from_llm and NAIVE_FIX_PROMPT.
Please credit the source when reposting.