Prompt Engineering 101: The Principles of Writing Effective Prompts for Large Language Models

Viraj Kadam
4 min read · Jun 24, 2024


Introduction

Generally, there are two main types of LLMs.

  1. Base LLMs: Pre-trained models trained for next-word prediction. They are trained on large datasets and learn the general statistics of a language.
  2. Instruction-tuned LLMs: Base LLMs that are further trained on instruction datasets so that they can follow instructions and carry out specific tasks.

Essential parameters and message components

  1. Model parameters
    1. Temperature: Temperature controls the sampling probability of the predicted tokens; a higher temperature increases randomness in the response. It is a good idea to keep the temperature low for deterministic tasks like fact-based Q&A, and higher for creative tasks like writing a poem or an essay.
    2. Max length: The maximum number of tokens the model can generate in its response.
    3. Top P: A sampling parameter used together with temperature that controls how much of the top probability mass is sampled from (nucleus sampling). Keep the value of Top P low for more deterministic answers.
  2. Message components
    1. System message: The system message sets the behavior and persona of the model; we use a system prompt suited to the application. A system token is used to delimit the system message.
    2. User message: The instruction, the user's question, and any context for the question go into the user message. An instruction token is used to delimit the user message. A minimal sketch of how these parameters and messages come together in an API call is shown after this list.
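For example, here is a minimal sketch of a chat completion call, assuming the OpenAI Python client; the model name and parameter values are illustrative, not recommendations:

# A minimal sketch, assuming the OpenAI Python client (openai>=1.0).
# The model name and parameter values are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",   # any instruction-tuned chat model
    temperature=0.2,         # low temperature -> more deterministic output
    top_p=0.9,               # nucleus sampling: consider the top 90% of probability mass
    max_tokens=256,          # cap on the length of the generated response
    messages=[
        # The system message sets the model's behavior and persona.
        {"role": "system", "content": "You are a concise assistant for fact-based Q&A."},
        # The user message carries the instruction and any context.
        {"role": "user", "content": "In one sentence, what does the temperature parameter control?"},
    ],
)
print(response.choices[0].message.content)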

Basic Principles for Effective Prompts

  1. Write clear and specific prompts:
    Be specific about the task you want the model to perform, and write the prompt with a clear separation between its different elements. For example, state explicitly what you want the model to do, such as classify, translate, or summarize.
  2. Be precise:
    While being specific about the task, it is equally important to be precise: avoid ambiguous wording and state exactly what the model should and should not do.
  3. Specify the output format:
    In the prompt instructions, ask for the specific output format you need (JSON, XML, etc.).
  4. Use delimiters:
    Use delimiters such as hashes (###), triple quotes (''' or """), or triple backticks (```) to separate elements of the prompt, such as the input context in context-based Q&A (see the sketch after this list).
  5. Iterative development:
    Start with a simple, clear, specific, and precise prompt. Then analyze the results on test data to understand why they are not as expected, and refine the idea and the prompt. Building a good prompt is an iterative process.
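As an illustration of principles 1, 3, and 4, here is a minimal sketch of a prompt that states the task clearly, asks for JSON output, and wraps the input context in delimiters; the context text and JSON keys are illustrative:

# A minimal sketch of a clear, delimited prompt that asks for a specific output format.
# The context text and the JSON keys are illustrative.
context = "The Barn Owl hunts at night and nests in tree hollows and old barns."

prompt = f"""Summarize the text delimited by triple single quotes in one sentence,
then classify its topic as one of: birds, mammals, reptiles.
Return the result as a JSON object with the keys "summary" and "topic".

'''{context}'''"""

print(prompt)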

Example: A prompt to extract specific metadata from a paragraph describing a bird.

First, we set the system prompt, which defines the model's persona and the style of its answers.

system_prompt = "You are an information extraction model, which extracts the information that the user requests from the context that the user provides. Your output is always in JSON format."

Then we provide an example of the task to be done. This is called one-shot prompting; a prompt with multiple examples is called few-shot prompting.

example_0="""

### Example Description and JSON Output

**Example Bird Description**:
"The Red-winged Blackbird is known for its distinctive song, which is a loud, musical 'conk-la-ree!' It inhabits wetlands, marshes, and open fields. The bird measures about 7-9 inches in length with a wingspan of approximately 12-16 inches. It has striking black plumage with bright red and yellow shoulder patches. The bird's call is a sharp 'check!' similar to that of other blackbird species. It is often confused with the Tricolored Blackbird, which has similar shoulder patches but a different song pattern."

**JSON Output**:
```json
{
"song_vocalization": "A loud, musical 'conk-la-ree!'",
"habitat": "Wetlands, marshes, and open fields",
"size": "Measures about 7-9 inches in length with a wingspan of approximately 12-16 inches",
"appearance": "Striking black plumage with bright red and yellow shoulder patches",
"call_vocalization": "A sharp 'check!'",
"similar_species": "Tricolored Blackbird with similar shoulder patches but a different song pattern"
}
```
"""

Next, we define the output format we expect the LLM's response to follow.

output_json = """{
"song_vocalization": "Description of the bird's song vocalizations",
"habitat": "Description of the bird's habitat",
"size": "Description of the bird's size",
"appearance": "Description of the bird's appearance",
"call_vocalization": "Description of the bird's call vocalizations",
"similar_species": "Description of similar species and their attributes"
}"""

Finally, we combine all of the above into the instruction prompt.

extraction_template = """Extract the following details about the bird from the provided context, delimited by triple single quotes, and format them into a JSON object. The description will be provided in the Input section.

Note that:
- If the relevant information for a field does not exist in the given context, return an empty string for that field.
- Please extract relevant information strictly from the context provided.
- An example of the task to be done is provided in the Example section.

### Field Name - Description:
1. **song_vocalization**: Identify and describe the type of song or singing patterns the bird exhibits. Include details about the rhythm, pitch, and repetition.
2. **habitat**: Describe the bird's natural environment or specific locations where it is typically found. Include information about the vegetation, climate, and geographic regions.
3. **size**: Provide measurements or descriptions of the bird's physical dimensions. Include details about its length, wingspan, and weight if available.
4. **appearance**: Describe the bird's visual characteristics. Include information about its coloration, plumage patterns, distinctive markings, and overall shape.
5. **call_vocalization**: Identify and describe the types of calls the bird makes. Include specific sounds, frequencies, and purposes (e.g., alarms, communication).
6. **similar_species**: List species that are mentioned as similar to the bird being described. Include specific attributes or features that make them comparable.

### JSON Output Format
```json
{output_json}
```
### Example
{example}

### Input
Context: '''
{bird_info_context}
'''
"""

The prompt can then be run over a larger set of bird descriptions, producing one JSON object per description in the format defined above.
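Since LLM output is not guaranteed to be valid JSON, it is worth parsing and checking each response before using it. A minimal sketch follows; the response text shown is an illustrative placeholder for the model's actual output:

# A minimal sketch of parsing and checking the model's JSON output.
# `response_text` stands in for the text returned by the model; the value
# shown here is an illustrative placeholder so the snippet runs on its own.
import json

expected_keys = {
    "song_vocalization", "habitat", "size",
    "appearance", "call_vocalization", "similar_species",
}

response_text = '{"song_vocalization": "", "habitat": "Farmland and grassland", "size": "About 13-15 inches in length", "appearance": "Pale, long-winged", "call_vocalization": "A long, harsh screech", "similar_species": ""}'

try:
    record = json.loads(response_text)
except json.JSONDecodeError:
    print("The model did not return valid JSON; refine the prompt or retry.")
else:
    missing = expected_keys - set(record)
    if missing:
        print(f"Missing fields: {sorted(missing)}")
    else:
        print(record)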
