Local LLM Programming in Practice (04): Automatically Tagging Text

A local large language model (LLM) can tag text according to your needs. This article shows how to tag text with langchain and a locally deployed LLM.

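This article uses llama3.1 as the local LLM. Its performance is somewhat behind that of closed-source models, but after some prompt tuning it largely meets our needs.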

Preparation

Before we start writing code, we need to set up the programming environment.

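Computer
All the code in this article can run on a machine without a GPU. My machine has the following configuration:
- CPU: Intel i5-8400 2.80GHz
- Memory: 16GB

Visual Studio Code and venv
Visual Studio Code is a very popular development tool, and the code for this series can be developed and debugged in it. We use Python's venv to create the virtual environment. For details, see:
Configuring venv in Visual Studio Code.

Ollama
Deploying local LLMs on Ollama is very convenient. With it, langchain can use llama3.1, qwen2.5 and various other local models. For details, see:
Using a locally deployed llama3.1 model in langchain.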
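
Before running the examples, it can help to confirm that the Ollama server is up and that the model has already been pulled. The sketch below is only an illustration: it assumes Ollama is listening on its default address http://localhost:11434 and reads its /api/tags endpoint, which lists the local models. The helper names has_model and list_local_models are my own and are not part of any library.

```python
import json
import urllib.request


def has_model(tags_payload: dict, name: str) -> bool:
    """Return True if `name` appears in an Ollama /api/tags payload.

    Ollama reports names with a tag suffix such as "llama3.1:latest",
    so a bare name matches any tag of that model.
    """
    for entry in tags_payload.get("models", []):
        full_name = entry.get("name", "")
        if full_name == name or full_name.split(":", 1)[0] == name:
            return True
    return False


def list_local_models(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from the Ollama server.

    The default base_url is an assumption (Ollama's default port);
    adjust it if your server runs elsewhere.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the server running, has_model(list_local_models(), "llama3.1") should tell you whether the model used in this article is available. If the server is not reachable, list_local_models raises an OSError subclass, which you can catch and report.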

Instantiating the local LLM

from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1",temperature=0.2,verbose=True)

Sentiment analysis

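The code below defines a class, Classification, that constrains the format of the tags the model returns. The model has to assign the following three tags to the text: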

sentiment: positive or negative
aggressiveness: a score from 1 to 10
language: the language the text is written in

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

def simple_control(s):

    tagging_prompt = ChatPromptTemplate.from_template(
        """
    Extract the desired information from the following passage.

    Only extract the properties mentioned in the 'Classification' function.

    Passage:
    {input}
    """
    )

    # Use a Pydantic model to control the format of the returned content
    class Classification(BaseModel):
        sentiment: str = Field(description="The sentiment of the text")
        aggressiveness: int = Field(
            description="How aggressive the text is on a scale from 1 to 10"
        )
        language: str = Field(description="The language the text is written in")


    llm_structured = llm.with_structured_output(Classification)
    prompt = tagging_prompt.invoke({"input": s})
    response = llm_structured.invoke(prompt)

    return response.model_dump()

Let's test it:

s = "I'm incredibly glad I met you! I think we'll be great friends!"
result = simple_control(s)
print(f'result:\n{result}')

{'sentiment': 'positive', 'aggressiveness': 1, 'language': 'English'}

s = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
result = simple_control(s)
print(f'result:\n{result}')

{'sentiment': 'negative', 'aggressiveness': 10, 'language': 'Spanish'}
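
In practice you will often tag many texts and then summarize the labels. The following is a minimal sketch of that idea. The helpers tag_many and sentiment_counts are names I made up, and fake_tagger is a hypothetical stand-in for simple_control so that the example runs without a model; its rule is invented purely for the demo.

```python
from collections import Counter
from typing import Callable, Iterable


def tag_many(tagger: Callable[[str], dict], texts: Iterable[str]) -> list[dict]:
    """Apply a tagging function to each text and keep the text with its tags."""
    return [{"text": t, **tagger(t)} for t in texts]


def sentiment_counts(tagged: list[dict]) -> Counter:
    """Count how often each sentiment label occurs (case-insensitive)."""
    return Counter(str(row["sentiment"]).strip().lower() for row in tagged)


def fake_tagger(text: str) -> dict:
    """Hypothetical stand-in for simple_control; not a real classifier."""
    negative = "enojado" in text.lower()
    return {
        "sentiment": "negative" if negative else "positive",
        "aggressiveness": 10 if negative else 1,
        "language": "Spanish" if negative else "English",
    }


rows = tag_many(fake_tagger, [
    "I'm incredibly glad I met you! I think we'll be great friends!",
    "Estoy muy enojado con vos! Te voy a dar tu merecido!",
])
print(sentiment_counts(rows))
```

To use the real model, pass simple_control in place of fake_tagger. Keep in mind that each text triggers one model call, so a long list will take a while on a CPU-only machine.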

Finer control

Next, we try to control the tagging results more precisely:

sentiment: one of happy, neutral, sad
aggressiveness: a score from 1 to 10
language: one of English, Spanish, Chinese

The prompt does not need to change; we only modify the field descriptions in Classification.

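def finer_control(s):
    """
    The official tutorial uses OpenAI, while we use a local LLM.
    Running the official code as-is works poorly: sentiment is not labeled as happy, neutral or sad as expected and still comes back only as positive or negative, and aggressiveness is always 0.
    """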

    # Use a Pydantic model to control the format of the returned content
    class Classification(BaseModel):
        sentiment: str = Field(description="The sentiment of the text,it must be one of happy,neutral,sad")
        aggressiveness: int = Field(description="The aggressive of the text,it must be one of 1,2,3,4,5,6,7,8,9,10,the higher the number the more aggressive")
        language: str = Field(description="The language the text is written in,it must be one of English,Spanish,Chinese")


    tagging_prompt = ChatPromptTemplate.from_template(
        """
        Extract the desired information from the following passage.

        Only extract the properties mentioned in the 'Classification' function.

        Passage:
        {input}
        """
    )


    llm_structured = llm.with_structured_output(Classification)

    prompt = tagging_prompt.invoke({"input": s})
    response = llm_structured.invoke(prompt)
    return response.model_dump()

Let's test it:

s = "I'm incredibly glad I met you! I think we'll be great friends!"
result = finer_control(s)
print(f'finer_control result:\n{result}')

{'sentiment': 'happy', 'aggressiveness': 1, 'language': 'English'}

s = "Weather is ok here, I can go outside without much more than a coat"
result = finer_control(s)
print(f'finer_control result:\n{result}')

{'sentiment': 'neutral', 'aggressiveness': 5, 'language': 'English'}

s="今天的天气糟透了，我什么都不想干！"
result = finer_control(s)
print(f'finer_control result:\n{result}')

{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'Chinese'}
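
Because the allowed values are only stated in the field descriptions, a local model can still return something outside them, such as the positive label or the aggressiveness of 0 mentioned in the docstring above. One possible safeguard, shown here only as a sketch, is to check each result after the call and retry a few times. The names validation_errors and tag_with_retry are hypothetical helpers of mine, not part of langchain.

```python
from typing import Callable

# Allowed values taken from the field descriptions in Classification
ALLOWED_SENTIMENTS = ("happy", "neutral", "sad")
ALLOWED_LANGUAGES = ("English", "Spanish", "Chinese")


def validation_errors(tags: dict) -> list[str]:
    """Return the problems found in a tagging result (empty list if valid)."""
    errors = []
    if tags.get("sentiment") not in ALLOWED_SENTIMENTS:
        errors.append(f"unexpected sentiment: {tags.get('sentiment')!r}")
    level = tags.get("aggressiveness")
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(level, int) or isinstance(level, bool) or not 1 <= level <= 10:
        errors.append(f"aggressiveness out of range: {level!r}")
    if tags.get("language") not in ALLOWED_LANGUAGES:
        errors.append(f"unexpected language: {tags.get('language')!r}")
    return errors


def tag_with_retry(tagger: Callable[[str], dict], text: str, attempts: int = 3) -> dict:
    """Call `tagger` until its result passes validation or attempts run out."""
    problems: list[str] = []
    for _ in range(attempts):
        tags = tagger(text)
        problems = validation_errors(tags)
        if not problems:
            return tags
    raise ValueError(f"no valid result after {attempts} attempts: {problems}")
```

With the function from this article, the call would look like tag_with_retry(finer_control, s). Retrying only helps when the model's output varies between calls; if the same invalid value keeps coming back, adjusting the field descriptions is probably the better fix.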

Summary

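As we can see, a locally deployed llama3.1 does a decent job of tagging text. I think this kind of local deployment can handle common text-tagging tasks such as sentiment analysis.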

Code

All the code and related resources for this article have been shared. See:

github
gitee

References:

Classify Text into Labels

🪐Good luck🪐