LangChain: Output Parsers

Date: 2024-10-28 07:06:45


Output parsers convert the text produced by an LLM into structured data.

CommaSeparatedListOutputParser

Parses the output into a list. Its format instructions (the prompt text) are:

def get_format_instructions(self) -> str:
    return (
        "Your response should be a list of comma separated values, "
        "eg: `foo, bar, baz`"
    )

The parse method is:

def parse(self, text: str) -> List[str]:
    """Parse the output of an LLM call."""
    return text.strip().split(", ")
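
To make the behavior of that one-line parse concrete, here is a minimal standalone sketch that does not need LangChain. The first function mirrors the quoted logic; parse_comma_separated_lenient is a hypothetical variant added only for comparison and is not part of the library.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Mirror of the quoted parse method: strip, then split on ', '."""
    return text.strip().split(", ")


def parse_comma_separated_lenient(text: str) -> List[str]:
    """Hypothetical variant: split on ',' and trim each item."""
    return [item.strip() for item in text.split(",") if item.strip()]


# Items separated by a comma plus a space are split as expected.
print(parse_comma_separated("foo, bar, baz\n"))

# Without the space after each comma, the whole string stays one item.
print(parse_comma_separated("foo,bar,baz"))

# The lenient variant tolerates both spacings.
print(parse_comma_separated_lenient("foo,bar , baz"))
```

Because the split is on the exact two-character separator, the parser relies on the model following the `foo, bar, baz` spacing given in the format instructions.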

DatetimeOutputParser

Parses the output into a datetime.

from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()
print(output_parser.get_format_instructions())

The code above prints the following format instructions:

Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 477-06-08T11:31:35.756750Z, 245-12-26T14:36:39.117625Z, 711-05-08T07:41:23.815247Z
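
A string in that pattern can be turned back into a datetime with the standard library. The sketch below assumes the parser does something equivalent to datetime.strptime with the same pattern; parse_datetime and DATETIME_FORMAT are illustrative names, not LangChain's API.

```python
from datetime import datetime

# The pattern shown in the format instructions above.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    """Strip surrounding whitespace and parse with the fixed pattern."""
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


parsed = parse_datetime(" 2023-07-04T14:30:00.000000Z\n")
print(parsed)

# The trailing "Z" is matched as a literal character, so the result
# carries no timezone information.
print(parsed.tzinfo)

# Anything that does not match the pattern raises ValueError.
try:
    parse_datetime("July 4th, 2023")
except ValueError as exc:
    print("parse failed:", exc)
```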

EnumOutputParser

Parses the output into an enum member. The enum's values must be strings.

Its format instructions are:

def get_format_instructions(self) -> str:
    return f"Select one of the following options: {', '.join(self._valid_values)}"

Example:

from enum import Enum
from langchain.output_parsers import EnumOutputParser

class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

parser = EnumOutputParser(enum=Colors)
print(parser.get_format_instructions())

Output:

Select one of the following options: red, green, blue
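
The mapping from the model's reply back to an enum member can be sketched with the standard library alone. This assumes the parser looks the stripped text up by enum value; format_instructions and parse_enum are hypothetical helper names used only for this illustration.

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def format_instructions(enum_cls: Type[Enum]) -> str:
    """Build the same style of instruction string from the enum's values."""
    return f"Select one of the following options: {', '.join(m.value for m in enum_cls)}"


def parse_enum(enum_cls: Type[E], text: str) -> E:
    """Look the stripped text up by value; unknown values raise ValueError."""
    return enum_cls(text.strip())


print(format_instructions(Colors))
print(parse_enum(Colors, " green\n"))

try:
    parse_enum(Colors, "purple")
except ValueError as exc:
    print("not a valid option:", exc)
```

Looking members up by value is why the values need to be strings: the model's reply is text, and it is compared against those values directly.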

PydanticOutputParser

Parses JSON output into an instance of a Pydantic model.

Usage:

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field
from typing import List

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

# The question to ask
actor_query = "Generate the filmography for a random actor."
# Build a parser whose output JSON follows the structure defined by Actor
parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=actor_query)

# Print the fully formatted prompt
print(_input.to_string())

Output:

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "name of an actor", "title": "Name", "type": "string"}, "film_names": {"description": "list of names of films they starred in", "items": {"type": "string"}, "title": "Film Names", "type": "array"}}, "required": ["name", "film_names"]}
```
Generate the filmography for a random actor.

Apart from the first and last lines, everything in between is the result of parser.get_format_instructions().

PydanticOutputParser uses the prompt template below. Only the {schema} placeholder at the end is filled in, using the JSON schema of the given model.

PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {{"properties": {{"foo": {{"title": "Foo", "description": "a list of strings", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}
the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of the schema. The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.

Here is the output schema:
```
{schema}
```
"""
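
The doubled braces in the template are there because it is filled with Python string formatting: `{{` and `}}` become literal braces, and only {schema} is substituted. The sketch below uses a shortened, hypothetical template (TEMPLATE) and a hand-written schema to show the effect. It is not the library's own code.

```python
import json

# Shortened stand-in for the real template. Doubled braces are literal
# braces for str.format; {schema} is the only placeholder.
TEMPLATE = (
    'The object {{"foo": ["bar", "baz"]}} is a well-formatted instance.\n'
    "Here is the output schema:\n"
    "{schema}"
)

# Hand-written schema for illustration; in the article it is derived
# from the Actor model.
schema = {"properties": {"name": {"type": "string"}}, "required": ["name"]}

filled = TEMPLATE.format(schema=json.dumps(schema))
print(filled)
```

Braces inside the substituted schema text are inserted as-is and are not interpreted a second time, which is why only the fixed part of the template needs escaping.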

OutputFixingParser

In the JSON parsing example above, if the LLM's output is malformed, it can be sent back to an LLM to be corrected. For example:

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
        
actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)
# The error: JSON strings must use double quotes, not single quotes.
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

parser.parse(misformatted)

# The call above raises an OutputParserException
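
The reason the single-quoted string fails can be reproduced without LangChain or an API key. The sketch below swaps in a plain dataclass for the Pydantic model and uses json.loads directly; parse_actor is a hypothetical helper, so treat this as an approximation of what the parser does internally.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Actor:
    """Plain-dataclass stand-in for the Pydantic model above."""
    name: str
    film_names: List[str]


def parse_actor(text: str) -> Actor:
    """Decode JSON text and build an Actor from it."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    return Actor(name=data["name"], film_names=list(data["film_names"]))


well_formed = '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

print(parse_actor(well_formed))

try:
    parse_actor(misformatted)
except json.JSONDecodeError as exc:
    print("parse failed:", exc)
```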

Use OutputFixingParser to fix the error:

from dotenv import load_dotenv

# Load environment variables from the .env file, including OPENAI_API_KEY
load_dotenv()

# Use OutputFixingParser to repair the JSON formatting error
from langchain.output_parsers import OutputFixingParser
from langchain.chat_models import ChatOpenAI
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

new_parser.parse(misformatted)
# The output is now parsed successfully into an Actor instance.

# Output:
# Actor(name='Tom Hanks', film_names=['Forrest Gump'])

How OutputFixingParser works:

With PydanticOutputParser alone, the flow is:

  1. Build the prompt
  2. Call the LLM to get its output
  3. Parse the output into the target class

With OutputFixingParser, the flow becomes:

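  1. Build the prompt
  2. Call the LLM to get its output
  3. Parse the output into the target class, but parsing fails
  4. Build a corrective prompt that includes the parse error message, call the LLM again, and get a new result
  5. Parse the new output into the target class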

The failure in step 3, together with steps 4 and 5, is the logic that OutputFixingParser adds.

The prompt that OutputFixingParser builds in step 4 is:

NAIVE_FIX = """Instructions:
--------------
{instructions}
--------------
Completion:
--------------
{completion}
--------------

Above, the Completion did not satisfy the constraints given in the Instructions.
Error:
--------------
{error}
--------------

Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:"""

NAIVE_FIX_PROMPT = PromptTemplate.from_template(NAIVE_FIX)

The parse code:

def parse(self, completion: str) -> T:
    try:
        parsed_completion = self.parser.parse(completion)
    except OutputParserException as e:
        new_completion = self.retry_chain.run(
            instructions=self.parser.get_format_instructions(),
            completion=completion,
            error=repr(e),
        )
        parsed_completion = self.parser.parse(new_completion)

    return parsed_completion

The retry_chain used here is an LLMChain.
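
The control flow of the parse method above can be tried out offline by replacing the parser and the retry chain with stand-ins. Everything in this sketch is hypothetical: ParseError stands in for OutputParserException, strict_parse for the wrapped parser, and fake_llm_fix for the LLM call, which here just swaps quote characters.

```python
import json
from typing import Callable, Dict


class ParseError(Exception):
    """Stand-in for OutputParserException."""


def strict_parse(text: str) -> Dict:
    """Stand-in for the wrapped parser: accept valid JSON only."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ParseError(str(exc)) from exc


def fake_llm_fix(instructions: str, completion: str, error: str) -> str:
    """Stand-in for the retry chain; a real one would send a prompt to an LLM."""
    return completion.replace("'", '"')


def parse_with_fix(completion: str, fix: Callable[[str, str, str], str]) -> Dict:
    """Try to parse; on failure ask `fix` for a new completion and parse once more."""
    try:
        return strict_parse(completion)
    except ParseError as exc:
        new_completion = fix("Return valid JSON.", completion, repr(exc))
        # Like the quoted code, the second parse is not guarded, so a
        # second failure propagates to the caller.
        return strict_parse(new_completion)


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
print(parse_with_fix(misformatted, fake_llm_fix))
```

As in the quoted method, there is a single retry: if the corrected completion still fails to parse, the exception reaches the caller.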
