Prompt Engineering for Coding Tasks

If you've ever used ChatGPT to help with a tedious Python script that you had been putting off, or to find the best way to approach a university coding assignment, you have likely realized that while Large Language Models (LLMs) can be helpful for some coding tasks, they often struggle to generate efficient, high-quality code.
We are not alone in our interest in having LLMs as coding assistants. Interest among companies in using LLMs for coding has been growing rapidly, leading to the development of LLM-powered coding assistants such as GitHub Copilot.
Using LLMs for coding poses significant challenges, as we discussed in the article "Why LLMs are Not Good for Coding". Nevertheless, there are prompt engineering techniques that can improve code generation for certain tasks.
In this article, we will introduce some effective prompt engineering techniques to enhance code generation.
Let's dive deep!
Prompt Engineering
Prompt engineering for LLMs involves carefully crafting prompts to maximize the quality and relevance of the model's output. This process is both an art and a science, as it requires an understanding of how the model interprets and processes language.
Articles often feature titles like "Top 10 Prompts for ChatGPT". However, any prompt could be the "best" if written to enhance the effectiveness of the model's Attention Mechanism. Prompt engineering optimizes this mechanism by guiding the model to focus on the most relevant aspects of a query. This reduces the noise of irrelevant information the model might otherwise consider, leading to more accurate outputs.
There are many prompt engineering strategies for natural language tasks. If you are interested in these, I recommend the article "Prompt Engineering Course by OpenAI – Prompting Guidelines". In the following section, we will discuss strategies specifically tailored to coding.
Coding Prompt Engineering
In essence, prompt engineering for coding with LLMs involves providing a deeper level of technical detail and strategically framing the prompt to align with the model's capabilities.
Let's start with a simple technique and advance to more complex ones!
Just as clarity and specificity guide the model toward the desired output in natural language generation, similar prompt engineering techniques can be applied to coding tasks using code comments.
This simple technique involves including comments that outline what the code should accomplish or specifying constraints and requirements step by step. It is an effective way to ensure that the model considers these elements during generation, potentially improving the relevance and quality of the code.
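As a rough sketch of this idea, the snippet below assembles a comment-driven prompt in Python. The helper name build_comment_prompt, the example task, and the "Constraint" and "Step" labels are all assumptions made for illustration; they do not come from any library or from the models cited in this article.

```python
from typing import Sequence


def build_comment_prompt(signature: str,
                         steps: Sequence[str],
                         constraints: Sequence[str] = ()) -> str:
    """Build a code prompt whose comments spell out constraints and steps.

    Hypothetical helper: the model is expected to complete the function
    body underneath the step comments.
    """
    # Constraints go above the function as top-level comments.
    lines = [f"# Constraint: {c}" for c in constraints]
    lines.append(signature)
    # Each step becomes an indented comment inside the function body.
    lines += [f"    # Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


prompt = build_comment_prompt(
    "def dedupe(items):",
    steps=["Keep the first occurrence of each item",
           "Preserve the original order"],
    constraints=["Use only the standard library"],
)
print(prompt)
```

The resulting text can be sent to the model as is, leaving it to fill in the function body beneath the step comments.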

Some LLMs tailored for coding, such as CodeGen [1] and AlphaCode [2], have already been trained to exploit the weak and noisy patterns that connect natural language and programming language within code. In these models, natural language comments serve as a bridge between the model's general linguistic capabilities and the specific requirements of the coding task.
Asking for Auxiliary Tasks
This second technique involves writing prompts that ask the model to perform additional tasks beyond straightforward code generation, such as debugging, explaining, writing tests, or documenting the generated code.
Requesting intermediary steps that lead to the desired code is also considered an auxiliary task. This prompt engineering approach is prevalent not only in coding but also in solving mathematical problems with LLMs. It has been shown that requiring the model to compute, or even simply explain or think about, the intermediary steps of a complex task encourages the model to spend more time and, therefore, more computational effort on the requested task.
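A minimal sketch of such a prompt is shown below. The helper build_auxiliary_prompt, the AUXILIARY_REQUESTS table, and the wording of each request are assumptions chosen for illustration, not a prescribed template.

```python
from typing import Sequence

# Hypothetical wording for each auxiliary task; adapt it to your own needs.
AUXILIARY_REQUESTS = {
    "plan": "Before writing any code, list the intermediate steps you will follow.",
    "tests": "After the code, write unit tests that cover the main edge cases.",
    "explain": "Finally, explain in a few sentences how the code works.",
}


def build_auxiliary_prompt(task: str,
                           extras: Sequence[str] = ("plan", "tests", "explain")) -> str:
    """Append the selected auxiliary requests to the main coding task."""
    parts = [task.strip()]
    parts += [AUXILIARY_REQUESTS[name] for name in extras]
    return "\n".join(parts)


print(build_auxiliary_prompt("Write a Python function that merges two sorted lists."))
```

Requesting only a subset, for example extras=("plan",), keeps the prompt short when intermediate steps are all that is needed.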

This technique has also been leveraged during the training of coding-specific LLMs, such as when using large language models for compiler optimization [3]. The auxiliary tasks force the model to process code at a deeper level, leading to improved code generation during inference.

Iterative Decoding
Iterative decoding is a powerful technique that involves refining the output of the model through multiple iterations, each time adjusting the input prompt based on feedback from previous outputs.
A general strategy for iterative decoding could be:
- Initial Code Generation: The process begins with the LLM generating an initial code snippet based on a prompt that describes the desired functionality.
- Evaluation: Once the initial code is generated, it can be evaluated either by ourselves, using an automated system such as unit tests or code analyzers, or by asking the model to review or optimize its own response.
- Refinement Prompt: The feedback is then used to refine the prompt or directly modify the code in the query to the LLM. This may involve clarifying the requirements, specifying additional constraints, or asking the model to rectify identified errors. The refined prompt might look like, "Optimize the function below for better memory usage", or "Can you make this code snippet more Pythonic?"
For example, asking the model to make our code more Pythonic, as just discussed, might turn an explicit index-based loop into a list comprehension.
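To make this concrete, here is a hypothetical before/after pair for such a request. Both functions were written by hand for illustration and are not actual model output; the names squares_of_evens_verbose and squares_of_evens are invented.

```python
def squares_of_evens_verbose(numbers):
    """Before: index-based loop, the kind of code we might start from."""
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] ** 2)
    return result


def squares_of_evens(numbers):
    """After: the same logic as a list comprehension."""
    return [n ** 2 for n in numbers if n % 2 == 0]


sample = [1, 2, 3, 4]
# Both versions should agree on any input.
print(squares_of_evens_verbose(sample) == squares_of_evens(sample))
```

Checking that both versions agree on a few inputs is a cheap way to confirm that a refinement changed the style and not the behavior.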

The evaluation and refinement steps can be repeated, iteratively refining the code until it meets all our requirements. Each iteration aims to move the model's output closer to the desired code, leveraging the model's ability to understand and incorporate feedback into its responses.
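The generate, evaluate, and refine cycle described above can be sketched roughly as follows. Everything here is an assumption for illustration: generate is a placeholder for whatever function sends a prompt to your LLM and returns code as a string, run_checks and refine are hypothetical helper names, and the evaluation is reduced to running assertion-based checks with exec.

```python
from typing import Callable, Optional


def run_checks(code: str, checks: str) -> Optional[str]:
    """Run the candidate code and then the checks.

    Returns an error description, or None if everything passed.
    Note: exec runs arbitrary code, so real use would need a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)    # define the generated functions
        exec(checks, namespace)  # run assertion-based checks against them
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def refine(task: str,
           checks: str,
           generate: Callable[[str], str],
           max_rounds: int = 3) -> Optional[str]:
    """Ask for code, evaluate it, and feed failures back into the prompt."""
    prompt = task
    for _ in range(max_rounds):
        code = generate(prompt)            # initial generation or a retry
        error = run_checks(code, checks)   # evaluation
        if error is None:
            return code
        # Refinement prompt: original task plus the failed attempt and feedback.
        prompt = (
            f"{task}\n\nThe previous attempt was:\n{code}\n\n"
            f"It failed with: {error}\nPlease fix the code."
        )
    return None  # no passing candidate within max_rounds
```

Capping the number of rounds avoids looping forever on a task the model cannot solve; the feedback could just as well come from a linter or from the model's own review.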
Personally, I find this technique essential for tailoring code to specific needs in software development. For example, when using an LLM to develop a web application backend, the initial prompt might ask for a basic user data API request. Subsequent iterative prompts would address any revealed security vulnerabilities or performance issues, continuously refining the code for better security and efficiency.
Prompt Perplexity
Clearer prompt understanding by the model is crucial for generating more accurate and functional code. Technically, the effectiveness of a prompt is linked to how familiar the model is with the language it contains.
The final and "more complex" prompt engineering technique of this article involves evaluating the model's understanding of a given prompt by using the prompt perplexity score.
Perplexity is a measure of how well a model understands the input prompt or, more accurately, how "surprised" the model is by the input text. In the context of LLMs, a lower perplexity score (less "surprised") indicates that the model finds the prompt more predictable and easier to understand based on its training data.
Indeed, the paper "Demystifying Prompts in Language Models via Perplexity Estimation" [4] reports exactly that: the lower the perplexity of a prompt, the better the model tends to perform the task.

To calculate perplexity, one typically needs access to the internals of the language model, specifically the probabilities it assigns to sequences of text.
When using the OpenAI API, we can retrieve the log probabilities for the outputs given a prompt by using the logprobs and top_logprobs input parameters. These probabilities can then be used to calculate the perplexity of the text.
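Once token log probabilities are available, turning them into a perplexity score is a short computation: the exponential of the negative mean log probability. The sketch below assumes the values are natural-log probabilities, one per token; the function name perplexity and the sample numbers are made up for illustration and are not real API output.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of per-token log probabilities).

    Assumes natural-log probabilities, one value per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    mean_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_neg_logprob)


# Made-up values: confident tokens (logprobs near 0) vs. uncertain ones.
confident = [-0.05, -0.10, -0.02]
uncertain = [-1.50, -2.30, -0.90]
print(perplexity(confident) < perplexity(uncertain))
```

A score close to 1 means the model assigned high probability to every token, while larger values indicate more "surprise".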
When iteratively refining prompts for coding, I usually evaluate the perplexity to ensure that the modifications lead to a lower prompt perplexity.
Let me know if you would like to see how to compute prompt perplexity in detail!
Final Thoughts
In this article, we have seen four different prompt engineering techniques that I personally find very useful in leveraging LLMs for my daily coding tasks.
We have explored methods ranging from the basic – such as embedding a structural code backbone in comments – to the advanced – like utilizing token probabilities for better understanding and prompt optimization.
While LLMs were not originally designed for code generation, programmers of all levels, from beginners to experts, have shown that these models can generalize to such tasks. Prompt engineering enhances the efficiency and accuracy of LLM-assisted code by carefully shaping the input prompts.
I hope the techniques discussed here help you to extend general natural language processing into specific code generation tasks.
Do you have other techniques up your sleeve?
That is all! Many thanks for reading!
I hope this article helps you when using LLMs for coding!
References
[1] Nijkamp, E., Pang, B., Hayashi, H., Tu, L., Wang, H., Zhou, Y., … & Xiong, C. (2022). CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
[2] Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., … & Vinyals, O. (2022). Competition-level code generation with AlphaCode. Science, 378(6624), 1092–1097.
[3] Cummins, C., Seeker, V., Grubisic, D., Elhoushi, M., Liang, Y., Roziere, B., … & Leather, H. (2023). Large language models for compiler optimization. arXiv preprint arXiv:2309.07062.
[4] Gonen, H., Iyer, S., Blevins, T., Smith, N. A., & Zettlemoyer, L. (2022). Demystifying prompts in language models via perplexity estimation. arXiv preprint arXiv:2212.04037.

