
LLM Settings | Prompt Engineering: Master the Language of AI

Estimated read time: 37 min

Chapter 9: LLM Settings and Hyperparameters

This module explores the key settings that control how Large Language Models generate text. Learn to optimize these parameters for different use cases and understand their effects on output quality.


1. Temperature

Temperature controls the randomness of predictions by scaling the logits before applying softmax. Lower values make outputs more deterministic, while higher values increase creativity.

How It Works:

  • 0.0-0.3: Very deterministic, repetitive but precise
  • 0.4-0.7: Balanced creativity and reliability
  • 0.8-1.2: Highly creative, less predictable
  • >1.2: Often incoherent, experimental only
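The scaling described above can be sketched in a few lines. This is a minimal illustration of temperature applied to raw logits, not how any particular library implements it internally:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.2))  # nearly all mass on one token
print(softmax_with_temperature(logits, 1.0))  # standard softmax
print(softmax_with_temperature(logits, 2.0))  # flatter, more random sampling
```

Running this shows why low temperature is "deterministic": the top token's probability approaches 1, so sampling almost always picks it.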

Temp: 0.2 (Precise)

"The capital of France is Paris. Paris is located in northern France and serves as the country's political and cultural center."

Temp: 0.7 (Balanced)

"Paris, the romantic capital of France, is known for its iconic Eiffel Tower and rich history. Situated along the Seine River, it's a global hub for art and fashion."

Temp: 1.0 (Creative)

"Ah, Paris! The City of Light twinkles along the Seine, where croissants crisp in bakeries and artists find muse in every cobblestone. France's beating heart pulses strongest here."


Python Implementation:

from transformers import pipeline

# Load the model once and reuse it for both calls
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

# Low temperature for factual responses
factual_response = generator(
  "Explain quantum computing basics",
  do_sample=True,      # sampling must be enabled for temperature to take effect
  temperature=0.3,
  max_new_tokens=200
)

# High temperature for creative writing
creative_response = generator(
  "Write a poetic description of a quantum computer",
  do_sample=True,
  temperature=0.9,
  max_new_tokens=200
)

print("Factual:", factual_response[0]['generated_text'])
print("\nCreative:", creative_response[0]['generated_text'])

2. Top-P (Nucleus Sampling)

Top-p sampling selects from the smallest set of tokens whose cumulative probability exceeds p, allowing dynamic vocabulary selection.

Key Characteristics:

  • 0.0-0.5: Very narrow selection, conservative outputs
  • 0.6-0.9: Balanced diversity and quality (recommended)
  • 1.0: Equivalent to no top-p filtering
  • Works well with temperature 0.7-1.0
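The "smallest set whose cumulative probability exceeds p" rule is easy to see on a toy distribution. A minimal sketch of the filtering step (real samplers work on sorted logits, but the logic is the same):

```python
def nucleus_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, then renormalize over that set."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append(idx)
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.1, 0.07, 0.03]
print(nucleus_filter(probs, 0.7))  # keeps tokens 0 and 1 only
print(nucleus_filter(probs, 1.0))  # keeps the full vocabulary
```

With p=0.7 the first two tokens (0.5 + 0.3 = 0.8 ≥ 0.7) survive and their probabilities are rescaled to sum to 1; everything else is excluded from sampling.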

Top-P: 0.3 (Narrow)

"The benefits of exercise include improved cardiovascular health, increased muscle strength, and better mental wellbeing."

Top-P: 0.7 (Balanced)

"Regular exercise offers numerous advantages: it boosts heart health, enhances mood through endorphin release, builds endurance, and may even extend lifespan."

Top-P: 0.95 (Broad)

"Moving your body does wonders! From the obvious perks like toned muscles to subtle benefits like neural growth factors that make your brain sparkle, exercise is nature's miracle drug."


Python Implementation:

from transformers import pipeline

# Load the model once and reuse it for both calls
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

# Conservative top-p for technical content
tech_response = generator(
  "Explain SSL encryption",
  do_sample=True,      # sampling must be enabled for top_p to take effect
  top_p=0.5,
  temperature=0.5,
  max_new_tokens=200
)

# Higher top-p for creative content
story_response = generator(
  "Write a short sci-fi premise",
  do_sample=True,
  top_p=0.9,
  temperature=0.8,
  max_new_tokens=200
)

print("Technical:", tech_response[0]['generated_text'])
print("\nCreative:", story_response[0]['generated_text'])

3. Max Tokens

Max tokens caps the number of tokens the model may generate, controlling response length. For English text, one token is roughly 3/4 of a word, so 100 tokens corresponds to about 75 words.
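The 3/4-of-a-word rule of thumb can be turned into a rough budget estimator. This is only the heuristic from the text, not a real tokenizer; actual counts vary by model and should be measured with the model's own tokenizer:

```python
def estimate_tokens(text, words_per_token=0.75):
    """Rough English-only heuristic: one token is about 3/4 of a word,
    so a text of N words needs roughly N / 0.75 tokens."""
    words = len(text.split())
    return round(words / words_per_token)

print(estimate_tokens("one two three"))            # 3 words -> ~4 tokens
print(estimate_tokens(" ".join(["word"] * 75)))    # 75 words -> ~100 tokens
```

This is handy for sanity-checking a max-tokens setting: a 200-word summary needs a budget of roughly 270 tokens, not 200.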

Example Prompt:

"Summarize the key events of the American Revolution"

Max Tokens: 50

"The American Revolution (1765-1783) was a colonial revolt where the 13 Colonies defeated Britain, established independence, and created the United States."

Max Tokens: 100

"The American Revolution began with colonial protests against British taxes (Stamp Act, Tea Act), escalated with the Boston Tea Party and Battles of Lexington/Concord (1775), and concluded with the Treaty of Paris (1783) after key battles like Saratoga and Yorktown."

Max Tokens: 200

"Spanning 1765-1783, the American Revolution emerged from colonial opposition to British taxation without representation. Key events include the Stamp Act protests (1765), Boston Massacre (1770), Boston Tea Party (1773), and the Continental Congresses. The war officially began in 1775 with battles at Lexington and Concord. The Declaration of Independence (1776) formalized the break. Turning points included the Continental Army's winter at Valley Forge and victory at Saratoga (1777), which secured French support. The war concluded with the British surrender at Yorktown (1781) and Treaty of Paris (1783)."

Python Implementation:

from transformers import pipeline

# Load the model once and reuse it for both calls
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

# Short summary
short_result = generator(
  "Explain photosynthesis in one sentence",
  do_sample=True,
  max_new_tokens=50,   # ~38 words
  temperature=0.3
)

# Detailed explanation
detailed_result = generator(
  "Explain the process of photosynthesis in detail",
  do_sample=True,
  max_new_tokens=300,  # ~225 words
  temperature=0.5
)

print("Short:", short_result[0]['generated_text'])
print("\nDetailed:", detailed_result[0]['generated_text'])

4. Top-K Sampling

Top-k limits sampling to the k most likely tokens at each step. Unlike top-p, it keeps a fixed count of tokens rather than a fixed probability mass.

Comparison with Top-P:

| Feature          | Top-K                  | Top-P                                |
|------------------|------------------------|--------------------------------------|
| Selection method | Fixed number of tokens | Dynamic probability mass             |
| Consistency      | Same k at every step   | Varies with the probability dist.    |
| Best for         | Stable vocabulary      | Varying model confidence             |
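The distinction in the table shows up clearly when the same filters run on a confident versus an uncertain distribution. A minimal sketch:

```python
def top_k_filter(probs, k):
    """Keep the k highest-probability tokens, regardless of their total mass."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    return [idx for idx, _ in ranked[:k]]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for idx, prob in ranked:
        kept.append(idx)
        mass += prob
        if mass >= p:
            break
    return kept

confident = [0.90, 0.05, 0.03, 0.02]   # model is sure of one token
uncertain = [0.30, 0.28, 0.22, 0.20]   # probability spread across several

print(top_k_filter(confident, 2), top_k_filter(uncertain, 2))    # always 2 tokens
print(top_p_filter(confident, 0.9), top_p_filter(uncertain, 0.9))  # 1 token vs. 4
```

Top-k keeps exactly two candidates in both cases; top-p keeps one token when the model is confident and all four when it is not. That adaptiveness is why top-p is usually preferred when confidence varies.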

Example Outputs:

Top-K=10

"The Renaissance was a cultural movement that began in 14th century Italy, characterized by renewed interest in classical art and learning."

Top-K=50

"Emerging from the medieval shadows, the Renaissance blossomed in Florence - a rebirth of Greco-Roman ideals that set Europe ablaze with humanist philosophy, breathtaking art, and scientific inquiry that would reshape the world."

5. Penalties (Frequency & Presence)

These parameters discourage repetition by penalizing tokens based on their occurrence.

Key Parameters:

  • Frequency Penalty (0-2): penalizes a token in proportion to how many times it has already appeared, reducing verbatim repetition
  • Presence Penalty (0-2): applies a flat penalty to any token that has appeared at all, encouraging the model to move to new topics
  • Values of 1.0+ are very aggressive; 0.1-0.5 is typical
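The mechanics can be sketched as an adjustment to the logits before sampling. The exact formula varies by provider; this follows the OpenAI-style definition, which is an assumption about how your API of choice implements it:

```python
def apply_penalties(logits, counts, freq_penalty=0.5, pres_penalty=0.3):
    """OpenAI-style penalties (assumed formula): the frequency penalty
    scales with how often a token has appeared so far; the presence
    penalty is a flat cost applied once a token has appeared at all."""
    adjusted = []
    for token_id, logit in enumerate(logits):
        count = counts.get(token_id, 0)
        logit -= freq_penalty * count              # grows with repetition
        logit -= pres_penalty * (1 if count > 0 else 0)  # flat, once seen
        adjusted.append(logit)
    return adjusted

# Token 0 has already appeared 3 times; token 1 is new.
print(apply_penalties([2.0, 1.0], {0: 3}))
```

With freq=0.5 and pres=0.3, a token repeated three times loses 0.5 * 3 + 0.3 = 1.8 from its logit, which is how the penalized example above avoids circling back to "hard work" a fourth time.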

No Penalties

"The key to success is hard work. Hard work leads to achievement. Achievement requires hard work. Work hard to succeed."

With Penalties (freq=0.5, pres=0.3)

"The key to success combines diligent effort, strategic planning, and continuous learning. Consistent application of these principles fosters achievement while maintaining adaptability."

6. Practical Use Cases

Recommended settings for common application scenarios.

| Use Case         | Temperature | Top-P    | Max Tokens | Other Settings |
|------------------|-------------|----------|------------|----------------|
| Factual Q&A      | 0.1-0.3     | 0.5-0.7  | 100-300    | freq=0.1       |
| Creative Writing | 0.7-1.0     | 0.8-0.95 | 300-600    | pres=0.2       |
| Code Generation  | 0.2-0.4     | 0.5-0.8  | 200-500    | freq=0.3       |
| Summarization    | 0.3-0.6     | 0.7-0.9  | 100-400    | pres=0.1       |
| Brainstorming    | 0.8-1.2     | 0.9-1.0  | 200-500    | pres=0.5       |
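In practice these recommendations often live in code as named presets. A hypothetical preset table (the preset names and midpoint values are illustrative choices, and the parameter names follow the OpenAI-style API):

```python
# Illustrative presets using midpoints of the recommended ranges above.
PRESETS = {
    "factual_qa":       {"temperature": 0.2,  "top_p": 0.6,  "max_tokens": 200, "frequency_penalty": 0.1},
    "creative_writing": {"temperature": 0.9,  "top_p": 0.9,  "max_tokens": 450, "presence_penalty": 0.2},
    "code_generation":  {"temperature": 0.3,  "top_p": 0.65, "max_tokens": 350, "frequency_penalty": 0.3},
    "summarization":    {"temperature": 0.45, "top_p": 0.8,  "max_tokens": 250, "presence_penalty": 0.1},
    "brainstorming":    {"temperature": 1.0,  "top_p": 0.95, "max_tokens": 350, "presence_penalty": 0.5},
}

def settings_for(use_case):
    """Return a copy of the preset so callers can tweak it safely."""
    return dict(PRESETS[use_case])
```

Returning a copy matters: if callers mutated the shared dictionary, one request's tweak would silently change the defaults for every later request.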

Settings Optimization Process

Workflow for Finding Optimal Settings:

  1. Define your success criteria (accuracy, creativity, length, etc.)
  2. Start with recommended defaults for your use case
  3. Test systematically - vary one parameter at a time
  4. Evaluate outputs against your criteria
  5. Document performance of different combinations
  6. Implement monitoring to detect drift over time
  7. Re-calibrate periodically as models and needs evolve
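Steps 2-5 of the workflow above amount to a one-parameter-at-a-time sweep. A minimal harness, where `generate` and `score` are stand-ins for your actual model call and evaluation function:

```python
def sweep_parameter(generate, base_settings, param, values, score):
    """Vary one parameter at a time, score each output, return the best.
    `generate(settings)` and `score(output)` are caller-supplied stand-ins
    for the model call and the success criteria from step 1."""
    results = []
    for value in values:
        settings = dict(base_settings, **{param: value})
        output = generate(settings)
        results.append({"settings": settings, "score": score(output)})
    return max(results, key=lambda r: r["score"])

# Toy example: pretend shorter outputs score higher.
best = sweep_parameter(
    generate=lambda s: "x" * int(s["temperature"] * 100),
    base_settings={"temperature": 0.7, "top_p": 0.9},
    param="temperature",
    values=[0.2, 0.5, 0.8],
    score=lambda out: -len(out),
)
print(best["settings"]["temperature"])  # 0.2 wins in this toy setup
```

Keeping the full `results` list (not just the winner) covers step 5: documenting how each combination performed makes later re-calibration a comparison rather than a restart.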
