| ▲ | paodealho a day ago |
| This gets comical when there are people, on this site of all places, telling you that using curse words or "screaming" in ALL CAPS in your agents.md file makes the bot follow orders with greater precision. And these people have "engineer" on their resumes... |
|
| ▲ | electroglyph a day ago | parent | next [-] |
| There's actually quite a bit of research in this field; here are a couple: "ExpertPrompting: Instructing Large Language Models to be Distinguished Experts" (https://arxiv.org/abs/2305.14688) and "Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks" (https://arxiv.org/abs/2408.08631) |
| |
| ▲ | AdieuToLogic a day ago | parent [-] | | Those papers are really interesting, thanks for sharing them! Do you happen to know of any research papers which explore constraint programming techniques wrt LLM prompts? For example: Create a chicken noodle soup recipe.
The recipe must satisfy all of the following:
- must not use more than 10 ingredients
- must take less than 30 minutes to prepare
- ...
| | |
| ▲ | aix1 19 hours ago | parent | next [-] | | This is an area I'm very interested in. Do you have a particular application in mind? (I'm guessing the recipe example is just to illustrate the general principle.) | | |
| ▲ | AdieuToLogic 17 minutes ago | parent [-] | | > This is an area I'm very interested in. Do you have a particular application in mind? (I'm guessing the recipe example is just to illustrate the general principle.) You are right in identifying the recipe example as being illustrative and intentionally simple. A more realistic example of using constraint programming techniques with LLMs is: # Role
You are an expert Unix shell programmer who comments their code and organizes their code using shell programming best practices.
# Task
Create a bash shell script which reads from standard input text in Markdown format and prints all embedded hyperlink URL's.
The script requirements are:
- MUST exclude all inline code elements
- MUST exclude all fenced code blocks
- MUST print all hyperlink URL's
- MUST NOT print hyperlink label
- MUST NOT use Perl compatible regular expressions
- MUST NOT use double quotes within comments
- MUST NOT use single quotes within comments
In this exploration, the list of "MUST/MUST NOT" constraints was discovered iteratively (4 iterations), and at least the last three are reusable whenever the task involves generating shell scripts. Where this approach originates is in attempting to limit LLM token-generation variance by minimizing the use of English vocabulary and sentence-structure expressivity, such that document generation has a higher probability of being repeatable. The epiphany I experienced was that by interacting with LLMs as a "black box" whose results can only be influenced, and not anthropomorphizing them, the natural way to influence them is to leverage their NLP capabilities to produce restrictions (search-tree pruning) on a declarative query (the initial search space). |
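For concreteness, here is a minimal sketch of the kind of script such a prompt might yield; it is not the commenter's actual output, and the awk-based approach, patterns, and structure are assumptions:

```bash
#!/usr/bin/env bash
# Print all hyperlink URLs embedded in Markdown read from standard input.
# Inline code spans and fenced code blocks are excluded, and only POSIX
# extended regular expressions are used (no PCRE).

awk '
  # Toggle the fence flag on lines that open or close a fenced code block,
  # and skip everything while the flag is set.
  /^```/ { in_fence = !in_fence; next }
  in_fence { next }
  {
    line = $0
    # Remove inline code spans so links inside backticks are ignored.
    gsub(/`[^`]*`/, "", line)
    # Repeatedly extract [label](url) pairs and print only the url part.
    while (match(line, /\[[^]]*\]\([^)]*\)/)) {
      link = substr(line, RSTART, RLENGTH)
      url = link
      sub(/^\[[^]]*\]\(/, "", url)
      sub(/\)$/, "", url)
      print url
      line = substr(line, RSTART + RLENGTH)
    }
  }
'
```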
| |
| ▲ | Aeolun 10 hours ago | parent | prev | next [-] | | Anything involving numbers, or conditions like ‘less than 30 minutes’ is going to be really hard. | |
| ▲ | cess11 19 hours ago | parent | prev | next [-] | | I suspect LLM-like technologies will only rarely back out of contradictory or otherwise unsatisfiable constraints, so it might require intermediate steps where LLMs formalise the problem in some SAT, SMT or Prolog tool and report back about it. | |
| ▲ | llmslave2 a day ago | parent | prev [-] | | I've seen some interesting work going the other way: having LLMs generate constraint solvers (or whatever the term is) in Prolog and then feeding input to that. I can't remember the link, but it could be worthwhile searching for that. |
|
|
|
| ▲ | hdra a day ago | parent | prev | next [-] |
| I've been trying to stop the coding assistants from making git commits on their own and nothing has been working. |
| |
| ▲ | zmmmmm a day ago | parent | next [-] | | hah - i'm the opposite, I want everything done by the AI to be a discrete, clear commit so there is no human/AI entanglement. If you want to squash it later that's fine but you should have a record of what the AI did. This is Aider's default mode and it's one reason I keep using it. | | | |
| ▲ | Aurornis 11 hours ago | parent | prev | next [-] | | Which coding assistant are you using? I'm a mild user at best, but I've never once seen the various tools I've used try to make a git commit on their own. I'm curious which tool you're using that's doing that. | | |
| ▲ | jason_oster 4 hours ago | parent [-] | | Same here. Using Codex with GPT-5.2 and it has not once tried to make any git commits. I've only used it about 100 times over the last few months, though. |
| |
| ▲ | algorias a day ago | parent | prev | next [-] | | run them in a VM that doesn't have git installed. Sandboxing these things is a good idea anyways. | | |
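A lightweight variant of that idea uses a throwaway container rather than a full VM; the image and mount path below are assumptions:

```bash
# Throwaway sandbox: only the current project is mounted, and the slim base
# image ships without git, so the agent cannot commit even if it tries.
docker run --rm -it -v "$PWD":/work -w /work debian:stable-slim bash
```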
| ▲ | godelski a day ago | parent | next [-] | | > Sandboxing these things is a good idea anyways.
Honestly, one thing I don't understand is why agents aren't organized with unique user or group permissions. Like, if we're going to be lazy and not make a container for them, then why the fuck are we not doing basic security things like permission handling? We want to act like these programs are identical to a person on a system, but at the same time we're not treating them like we would another person on the system? Give me a fucking claude user and/or group. If I want to remove `git` or `rm` from that user, great! It also makes giving directory access a lot easier: you don't have to just trust that the program isn't going to go fuck with some other directory. | | |
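A rough sketch of what that per-agent user could look like on a stock Linux system; the user name, group, workspace path, and the crude trick of hiding the git binary are all illustrative assumptions:

```bash
# Dedicated account for the agent, with exactly one writable workspace.
sudo useradd --create-home --shell /bin/bash claude
sudo install -d -o claude -g claude -m 0750 /srv/agent-work

# Restrict the git binary to a group the agent user is not a member of.
# Crude, and a package upgrade may reset the permissions.
sudo groupadd gittools
sudo chgrp gittools /usr/bin/git
sudo chmod o-rx /usr/bin/git
sudo usermod -aG gittools "$USER"

# Start a login shell as the restricted user and run the agent from there.
sudo -u claude -i
```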
| ▲ | inopinatus 21 hours ago | parent | next [-] | | The agents are being prompted to vibe-code themselves by a post-Docker generation raised on node and systemd. So of course they emit an ad-hoc, informally-specified, bug-ridden, slow reimplementation of things the OS was already capable of. | |
| ▲ | apetresc a day ago | parent | prev | next [-] | | What's stopping you from `su claude`? | | |
| ▲ | godelski a day ago | parent [-] | | I think there's some misunderstanding... What's literally stopping me is "su: user claude does not exist or the user entry does not contain all the required fields"
Clearly you're not asking that... But if your question is more "what's stopping you from creating a user named claude, installing claude to that user account, and writing a program so that user godelski can message user claude and watch all of user claude's actions, and all that jazz" then... well... technically nothing. But if that's your question, then I don't understand what you thought my comment said. |
| |
| ▲ | immibis 15 hours ago | parent | prev [-] | | Probably because Linux doesn't really have a good model for ad-hoc permission restrictions. It has enough bits to make a Docker container out of, but that's a full new system. You can't really restrict a subprocess to only write files under this directory. | | |
| ▲ | newsoftheday 10 hours ago | parent [-] | | For plain Linux, chmod, chmod's sticky bit, and setfacl provide extensive ad hoc permission restriction. Your comment is 4 hours old; I'm surprised I'm the first person to help correct its inaccuracy. | | |
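For illustration, the kind of ad hoc restriction being described might look like the following; the agent user name and path are assumptions:

```bash
# Grant one extra user write access to a single directory tree via POSIX
# ACLs, without changing ownership.
setfacl -R -m u:claude:rwX /srv/agent-work
# Default ACL so newly created files and directories inherit the grant.
setfacl -R -d -m u:claude:rwX /srv/agent-work

# Sticky bit on a shared directory: users may only delete files they own.
chmod +t /srv/agent-work
```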
| ▲ | immibis an hour ago | parent [-] | | How can those be used to restrict a certain subprocess to only write in a certain directory? |
|
|
| |
| ▲ | zmmmmm a day ago | parent | prev [-] | | but then they can't open your browser to administer your account. What kind of agentic developer are you? |
| |
| ▲ | manwds a day ago | parent | prev | next [-] | | Why not use something like Amp Code, which doesn't do that? People seem to rage at CC or similar tools, but Amp Code doesn't go making random commits or dropping databases. | |
| ▲ | hdra a day ago | parent [-] | | Just because I haven't gotten around to trying it out, really. But what is it about Amp Code that makes it immune from doing that? From what I can tell, it's another CLI tool-calling client to an LLM, so I'd expect it to be subject to the same nondeterminism of the LLM calling a tool I don't want it to call, just like any other, no? |
| |
| ▲ | AstroBen a day ago | parent | prev | next [-] | | Are you using aider? There's a setting to turn that off | |
| ▲ | dust-jacket 13 hours ago | parent | prev | next [-] | | require commits to be signed. | |
| ▲ | SoftTalker a day ago | parent | prev [-] | | Don't give them a credential/permission that allows it? | | |
| ▲ | godelski a day ago | parent | next [-] | | Typically agents are not operating as a distinct user. So they have the same permissions, and thus credentials, as the user operating them. Don't get me wrong, I find this framework idiotic and personally I find it crazy that it is done this way, but I didn't write Claude Code/Antigravity/Copilot/etc | |
| ▲ | AlexandrB a day ago | parent | prev [-] | | Making a git commit typically doesn't require any special permissions or credentials since it's all local to the machine. You could do something like running the agent as a different user and carefully setting ownership on the .git directory vs. the source code, but I suspect this is not very straightforward to set up. | | |
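A rough sketch of that ownership split, assuming a shared group named agents for the human and the agent account and a repository under ~/project:

```bash
cd ~/project

# Working tree: group-writable so the agent account can edit source files
# (assumes both the human user and the agent are in the agents group).
sudo chgrp -R agents .
chmod -R g+rwX .

# Repository metadata: writable only by the human user, readable by the
# group, so git status and git diff still work for the agent but anything
# that writes under .git (like git commit) fails with a permission error.
sudo chown -R "$USER" .git
chmod -R go-w .git
```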
| ▲ | SoftTalker a day ago | parent [-] | | IMO it should be well within the capabilities of anyone who calls himself an engineer. |
|
|
|
|
| ▲ | neal_jones a day ago | parent | prev | next [-] |
| Wasn’t Cursor or someone using one of these horrifying types of prompts? Something about having to do a good job or they won’t be paid, and then they won’t be able to afford their mother’s cancer treatment, and then she’ll die? |
|
| ▲ | godelski a day ago | parent | prev | next [-] |
| How is this any different from the Apple "you're holding it wrong" argument? The critical reason that kind of response is so out of touch is that the same people praise Apple for its intuitive nature. How can any reasonable and rational person (especially an engineer!) not see that these two beliefs are in direct opposition? If "you're holding it wrong," then the tool is not universally intuitive. Sure, there'll always be some idiot trying to use a lightbulb to screw in a nail, but if your nail has threads on it and a notch on the head, then it's not the user's fault for picking up a screwdriver rather than a hammer. > And these people have "engineer" on their resumes...
What scares me about ML is that many of these people have "research scientist" in their titles. As a researcher myself, I'm constantly stunned at people not understanding something as basic as who has the burden of proof. Fuck off. You're the one saying we made a brain by putting lightning into a rock and shoving tons of data into it. There's so much about that that I'm wildly impressed by. But to call it a brain, in the same sense as a human brain, requires significant evidence. Extraordinary claims require extraordinary evidence. There's some incredible evidence here, but an incredible lack of scrutiny as to whether it's actually evidence for something else. |
|
| ▲ | CjHuber a day ago | parent | prev | next [-] |
| I'd say such hacks don't make you an engineer, but they are definitely part of engineering anything that has to do with LLMs. With overly long system prompts / agents.md files not working well, it definitely makes sense to optimize the existing prompt with minimal additions. And if swear words, screaming, shaming, or tipping works, well, that's the most token-efficient optimization of a brief, well-written prompt. Also, of course, current agents already have the ability to run endlessly if they are well instructed; steering them to avoid reward hacking in the long term definitely IS engineering. Or how about telling them they are working in an orphanage in Yemen that's struggling for money, but luckily they've got an MIT degree and are now programming to raise money. But their supervisor is a psychopath who doesn't like their effort and wants children to die, so work has to be done as diligently as possible and each step has to be viewed through the lens that their supervisor might find something with which to forbid programming. Look, as absurd as it sounds, a variant of that scenario works extremely well for me. Just because it's plain language doesn't mean it can't be engineering; at least I'm of the opinion that it definitely is if it has an impact on what use cases are possible. |
|
| ▲ | AstroBen a day ago | parent | prev | next [-] |
| > cat AGENTS.md WRITE AMAZING INCREDIBLE VERY GOOD CODE OR ILL EAT YOUR DAD ..yeah I've heard the "threaten it and it'll write better code" one too |
| |
| ▲ | CjHuber a day ago | parent [-] | | I know you're joking, but to contribute something constructive here: most models now have guardrails against being threatened. So if you threaten them, it has to be with something outside your control, like "… or the already depressed code-reviewing staffer might kill himself and his wife. We did everything in our control to take care of him, but do your best on your part to avoid the worst case." | | |
| ▲ | nemomarx a day ago | parent [-] | | How do those guardrails work? Does the system notice you doing it and not put that in the context, or do they just have something in the system prompt? | | |
| ▲ | CjHuber a day ago | parent [-] | | I suppose it's the latter, plus maybe some fine-tuning. It's definitely not like DeepSeek, where the model's answer gets replaced when you touch on something uncomfortable for China |
|
|
|
|
| ▲ | citizenpaul a day ago | parent | prev | next [-] |
| > makes the bot follow orders with greater precision. Gemini will ignore any directions to never reference or use YouTube videos, no matter how many ways you tell it not to. It may remove them if you ask, though. |
| |
| ▲ | rabf a day ago | parent [-] | | Positive reinforcement works better than negative reinforcement. If you read the prompt guidance from the companies themselves in their developer documentation, it often makes this point. It is more effective to tell them what to do rather than what not to do. | | |
| ▲ | sally_glance 19 hours ago | parent | next [-] | | This matches my experience. You mostly want to not even mention negative things because if you write something like "don't duplicate existing functionality" you now have "duplicate" in the context... What works for me is having a second agent or session to review the changes with the reversed constraint, i.e. "check if any of these changes duplicate existing functionality". Not ideal because now everything needs multiple steps or subagents, but I have a hunch that this is one of the deeper technical limitations of current LLM architecture. | | |
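If the main session is Claude Code, that reversed-constraint review pass might be as simple as piping the diff into a fresh non-interactive session; the exact diff range and the availability of the -p print mode are assumptions:

```bash
# Ask a fresh session to apply the reversed constraint to the diff, instead
# of putting the negative wording in the original prompt.
git diff main...HEAD | claude -p "Check whether any of these changes duplicate existing functionality in the repository."
```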
| ▲ | citizenpaul 6 hours ago | parent [-] | | Probably not related but it reminds me of a book I read where wizards had Additive and Subtractive magic but not always both. The author clearly eventually gave up on trying to come up with creative ways to always add something for solutions after the gimmick wore off and it never comes up again in the book. Perhaps there is a lesson here. |
| |
| ▲ | nomel a day ago | parent | prev [-] | | Could you describe what this looks like in practice? Say I don't want it to use a certain concept or function. What would "positive reinforcement" look like to exclude something? | | |
| ▲ | oxguy3 a day ago | parent [-] | | Instead of saying "don't use libxyz", say "use only native functions". Instead of "don't use recursion", say "only use loops for iteration". | | |
| ▲ | nomel a day ago | parent | next [-] | | This doesn't really answer my question, which was more about specific exclusions. Both of the answers show the same problem: if you limit your prompts to positive reinforcement, you're only allowed to "include" regions of a "solution space", which can only constrain the LLM to those small regions. With negative reinforcement, you just cut out a bit of the solution space, leaving the rest available. If you don't already know the exact answer, then leaving the LLM free to use solutions that you may not even be aware of seems like it would always be better. Specifically: "use only native functions" for "don't use libxyz" isn't really different from "rewrite libxyz since you aren't allowed to use any alternative library". I think this may be a bad example, since it massively constrains the LLM, preventing it from using an alternative library that you're not aware of. "only use loops for iteration" for "don't use recursion" is reasonable, but I think this falls into the category of "you already know the answer". For example, say you just wanted to avoid a single function for whatever reason (maybe it has a known bug or something); the only way to do this "positively" would be to already know the function to use: "use function x"! Maybe I misunderstand. | |
| ▲ | bdangubic a day ago | parent | prev [-] | | I 100% stopped telling them what not to do. I think even if “AGI” is reached telling them “don’t” won’t work | | |
| ▲ | nomel a day ago | parent [-] | | I have the most success when I provide good context, as in what I'm trying to achieve, in the most high level way possible, then guide things from there. In other words, avoid XY problems [1]. [1] https://xyproblem.info |
|
|
|
|
|
|
| ▲ | Applejinx 15 hours ago | parent | prev | next [-] |
| Works on human subordinates too, kinda, if you don't mind the externalities… |
|
| ▲ | soulofmischief a day ago | parent | prev | next [-] |
| Except that is demonstrably true. Two things can be true at the same time: I get value and a measurable performance boost from LLMs, and their output can be so stupid/stubborn sometimes that I want to throw my computer out the window. I don't see what is new, programming has always been like this for me. |
|
| ▲ | DANmode 19 hours ago | parent | prev | next [-] |
| Yes, tactics like front-loading important directives and emphasizing extra-important concepts (things that should be double- or even triple-checked for correctness because of their expected intricacy) make sense for human engineers as well as "AI" agents. |
|
| ▲ | llmslave2 a day ago | parent | prev [-] |
| "don't make mistakes" LMAO |