octo888 a day ago

What is "IsAdmin" doing in the "create user" request DTO in the first place? The examples seem to indicate inappropriate re-use of data models.

Would it not be better to:

  type CreateUserRequest struct {
    Username string
    Password string
  }

  type UserView struct {
    Username string
    IsAdmin bool
  }
etc?

No need to have just one model that maps 1:1 to your DB row. This applies to all languages.

liampulles a day ago | parent | next [-]

Yeah, it's a bad pattern. And we can expect to see it in a sizable number of production codebases.

diggan a day ago | parent [-]

ORMs love to put people and projects in a situation where it's much easier for the data types in your program to match the tables in your database exactly, even though reality often requires them to be decoupled. That the "create user" request contains "isAdmin" smells like someone mapped that exact request to a table: the table has "isAdmin", so the data type now has it too.

ljm a day ago | parent | prev | next [-]

Aside from security issues it presents, it’s just a really bad idea to tightly couple your data model to your API like that.

Any change to your DB schema is liable to become a breaking change on your API. If you need separate types for your requests and responses so be it.

matthew16550 a day ago | parent | prev | next [-]

It's a concept from Fielding's REST thesis that is important to the current meaning of REST:

Transfer Objects are not the Storage Objects.

db48x 19 hours ago | parent | prev | next [-]

Yes!

ajross a day ago | parent | prev | next [-]

Exactly. This isn't a flaw in the runtime or the metaphor, the bug here is that the app exposes private data in a public API. That's a mistake that doesn't care about implementation choice. You can make the same blunder with a RESTful interface or a CGI script.

At most, you can argue that simple serialization libraries (Go's is indeed one of the best) make it more tempting to "just send the data" in such a design, so if you squint really (really) hard, you can call this a "footgun" I guess.

But the rest of the headline is 100% nonsense. This is not about "Go" or "parsers". At all.

yencabulator 15 hours ago | parent | next [-]

Except in the case of encoding/xml garbage, that's really a limitation of the Go parser. It's a very non-strict parser and doesn't actually validate that the input conforms to XML. Think of it more as a query language for a stream of XML fragments.

tln a day ago | parent | prev [-]

What? It's about parsers in Go's standard library.

ajross a day ago | parent [-]

A "parser" is the piece of software that translates unstructured input (usually text) into structured output that better reflects the runtime of a programming language. A "security bug" in a parser is normally construed to be the existence of an input that causes the software to do something incorrect/undocumented/unexpected.

Nothing in the article discusses a parser or anything like a parser bug.

The article doesn't like that the semantics of the user-facing API wrapped around the parser is, I guess, "easy to make mistakes with". That's an article about API design, at most. But that's boring and specious and doesn't grab clicks, so they want you to think that Go's parsers are insecure instead.

woodruffw a day ago | parent | next [-]

I would consider the case-insensitive key handling in Go’s JSON parser and the malleability in the XML parser to both be “classic” examples of differential-prone behavior, which is the ultimate point made by the article.

(This is why the more formal definition of a parser is useful to ground on: a parser is a type of recognizer, and disagreements between recognizers that claim to recognize the same thing can be exploited. This doesn’t require a bug per se, only difference, which is why it’s a footgun.)

yencabulator 15 hours ago | parent [-]

> case-insensitive key handling in Go’s JSON parser

This was an explicit decision for convenience, because the Go struct fields will be Capitalized to export them but JSON tends to follow a lower case convention.

woodruffw 6 hours ago | parent [-]

Just because it’s convenient doesn’t make it a good idea! It’s also not what Go’s other parsers do per the rest of the post.

yencabulator 4 hours ago | parent [-]

The "other parsers" in stdlib are gob and encoding/xml.

Gob is a format made for Go, so it does not have a field naming convention mismatch; what's on the wire is the Go convention.

encoding/xml isn't really structured as a "convert between equivalent-shaped structs and a message" utility, its struct tags are more like a query language, like a simpler cousin of XPath. Most real-world code using it does things like

    type Person struct {
        FirstName string   `xml:"name>first"`
        LastName  string   `xml:"name>last"`
    }
Whereas with JSON there was a clear desire to have

    {"email": "jdoe@example.com", "name": "John Doe"}
parse with as little noise as possible:

    type Person struct {
        Email string
        Name  string
    }
woodruffw an hour ago | parent [-]

Again, I understand the rationale. I just don’t think it was a good idea.

wavemode a day ago | parent | prev | next [-]

> Nothing in the article discusses a parser or anything like a parser bug.

And it doesn't claim to. The article is titled "footguns" not "bugs". A footgun is just something that is easy to misuse due to unintuitive or unexpected behavior.

dimgl a day ago | parent [-]

> And it doesn't claim to.

Yes it does. The title is literally "Unexpected security footguns in Go's parsers". The article didn't identify a single footgun. This is just bad design.

mplanchard a day ago | parent | next [-]

The article links to CVEs directly caused by some of the less intuitive behavior here. I feel like that’s sufficient to qualify as a footgun

ajross a day ago | parent | prev [-]

That title is clearly constructed to imply that Go's parsers are insecure. The text of the article isn't about parsers at all, and only tangentially about Go[1].

It's deliberately misleading clickbait. You know it. I know it. We all know it.

If you want to have a considered discussion about pitfalls with the use of automatic serialization paradigms across trust boundaries, I'm here for it. If you just want to flame about Go, get better source material. This one isn't the hill to stand on.

[1] Which, again, has a really first rate serialization story; but not one fundamentally different from any of a zillion others. Cooking data from untrusted sources is just plain hard, and not something that anyone (much less the author of this awful blog post) is going to solve with a serialization API.

CactusRocket a day ago | parent | next [-]

I think you and I read a different article. For example, about treating JSON keys without case sensitivity:

> In our opinion, this is the most critical pitfall of Go’s JSON parser because it differs from the default parsers for JavaScript, Python, Rust, Ruby, Java, and all other parsers we tested.

It would be kind of difficult to argue that this is not about Go.

Don't get me wrong, I love Go just as much as the next person, but this article definitely lays bare some issues with serialization specific to Go.

ioasuncvinvaer a day ago | parent | prev | next [-]

I disagree.

It was a bad design choice to allow JSON keys with different capitalisation and to serialise all public struct members by default.

These decisions can easily result in the creation of an insecure system.

woodruffw a day ago | parent | prev | next [-]

I don’t think the title is clickbait. You’re fixating on just one example in the post, but the others demonstrate a clear pattern: there are footguns in Go’s various parsers, including forms of malleability or flexibility that enable parser differentials.

This isn’t intended to be a ding against Go; it’s a universal problem across programming languages (otherwise there’d be no differentials at all). But they’re worth noting, and I think the post amply demonstrates their security relevance.

x0x0 a day ago | parent | prev [-]

Of course it's about Go. They decided to build not just opt-out unmarshaling of objects, but fragile opt-out.

See eg rails' strong params for the opposite approach: opt-in.

sfvisser a day ago | parent | prev [-]

Exactly right. Better to have a domain layer with data types representing the domain objects 1:1, and one or more API layers on top for interacting with them in some modality: creation, deletion, verification, auth, etc.

The security failure is not the parsing library, but failing to model your application architecture properly.

delusional a day ago | parent | prev | next [-]

> No need to just have just 1 model that maps 1:1 to your DB row. This applies to all languages

One reason is to avoid copying data constantly. I don't just mean this from an efficiency perspective, but also (and maybe more so) from a simplicity one. If you have a library for shoving data into a struct mechanistically, but you then take the data from that struct and shove it into an additional struct, what's the point of the library? You're writing the code to move the data anyway.

In my dayjob I see this tendency constantly: a lot of different, very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs. Only to then do very little actually useful work with any of the data at the end. I generally think you'd be happier with fatter structs that integrate less with weird "struct-filling" libraries.

kgeist a day ago | parent | next [-]

1. As shown in the article, exposing the internal model to APIs directly is a security footgun. I've seen sites get hacked because the developer serialized internal objects without any oversight just like in the article and accidentally exposed secrets.

2. Exposing internal models to APIs directly also makes it hard to refactor code because refactoring would change APIs, which would require updating the clients (especially problematic when the clients are owned by other teams). I've seen this firsthand too in a large legacy project, people were afraid to refactor the code because whenever they tried, it broke the clients downstream. So instead of refactoring, they just added various complex hacks to avoid touching the old core code (and of course, their models also mapped directly to the UI).

In the end, codebases like that, with no layer separation, become really hard to maintain and full of security problems.

All because they thought it was "simpler" to skip writing ~10 lines of extra boilerplate per model to map models to DTOs.

Lack of layer separation becomes a problem in the long term. When you're just starting out, it may seem like overengineering, but it isn't

delusional a day ago | parent [-]

> Lack of layer separation becomes a problem in the long term. When you're just starting out, it may seem like overengineering, but it isn't

I actually agree, but you're setting up a false dichotomy. I do believe in strong layering at the interfaces, for exactly the reasons you lay out. What I don't believe in is what I might call "struct annotation based parsing" at those interfaces.

Typically, you don't want to pass DTOs around your code. Usually, you take in that struct and then immediately have some code to poke it into the appropriate places in your actual data structures. It's very often much easier to simply take a well-structured but more direct and generic interpretation of the input data, and write the code to poke it into the correct places directly.

It is not that you should define your inputs separately from your internal data storage. It's that the specification of your input structure shouldn't exist as a struct, it should exist as the consequence of your parsing code.

> When you're just starting out, it may seem like overengineering, but it isn't

It's a real shame that the internet has driven us to assume everybody is a novice.

kgeist a day ago | parent [-]

>It's a real shame that the internet has driven us to assume everybody is a novice.

Sorry, English is not my native language. I didn't mean to say you're a novice.

>Usually, you take in that struct, and then immediatly have some code to poke it into the appropriate places in your actual data structures

>It is not that you should define your inputs separately from your internal data storage. It's that the specification of your input structure shouldn't exist as a struct, it should exist as the consequence of your parsing code.

Can you give an example?

delusional a day ago | parent [-]

> Can you give an example?

Sure. I like taking Jackson (the Java library) as an example, since it actually supports both models. The way I've seen it used mostly is with jackson-databind. Here you define classes and annotate the fields with data that tells the library how to marshal them to/from JSON. Superficially, I find that similar to how Go or Serde (from Rust) suggest you handle it. In that programming model, I agree it makes total sense to declare some classes separately from your core structures, for all the reasons we've talked about.

The other model Jackson has is what they call the Jackson Tree Model. In this model you get back a representation of a JSON object. From that object you can get fields; those fields can themselves be objects, arrays, or immediate values. It's an AST, if you're a compiler person.

The first model might lead to code like this:

    public class Person {
        @JsonProperty
        String name;
        @JsonProperty
        String phoneNo;
    }
Usually, the annotations won't be able to fully specify the constraints of your code, so you'll see usage code like this:

    if(personDto.phoneNo.length() != 10) return Http(400, "Phone number must be 10 chars");
    person.phone = personDto.phoneNo;
With the Tree Model you'd instead get a representation of the raw JSON from the client and pull out the fields you care about yourself:

    var phoneNoObj = jsonBody.get("phoneNo");
    if(phoneNoObj == null) return Http(400, "Phone number is required");
    var phoneNoStr = phoneNoObj.asString();
    if(phoneNoStr == null) return Http(400, "Phone number must be a string");
    if(phoneNoStr.length() != 10) return Http(400, "Phone number must be 10 chars");
    person.phone = phoneNoStr;
Notice that we are now doing a single, application-specific parse of the JSON, and while we were at it we also got to surface a bunch more relevant errors. The Jackson Tree Model is obviously pretty inefficient, but there are ways to implement it that make it more efficient too.
kgeist a day ago | parent [-]

In Go, you can marshal JSON to a map, so you can extract fields manually, too.

masklinn a day ago | parent | prev | next [-]

> In my dayjob I see this tendency constantly to have a lot of different very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs.

Maybe that's the problem to solve, rather than exposing the entire internal world to the outside? Having different views of the same entities is pretty critical; otherwise it's way too easy to start e.g. returning PII to public endpoints just because some internal process needed it.

delusional a day ago | parent [-]

> exposing the entire internal world to the outside

That's not at all what I said.

You don't need a struct to avoid exposing internal data. If you're building a JSON object, you can just not write the code to format some fields out. You don't need a new data layout for that.

CactusRocket a day ago | parent | next [-]

That's the whole point of the first item in the article, and of the original comment you were replying to. In Go (and some other languages) that formatting is implicit and automatic, so you need to write code to NOT format the fields out. That leads to potential security issues where data is leaked, or where data can be injected into "hidden" fields. Since it's implicit and automatic, it's safer to, as a rule, define separate structs for the data input and map them, so that there is absolutely no chance (implicit or explicit) of leaking or injecting data.

masklinn a day ago | parent | prev [-]

> If you're building a JSON object, you can just not write the code to format some fields out.

Did you fail to read the article somehow?

Few if any modern languages require you to write the code to format individual fields out. Even Go does not, though when it comes to JSON most fields need to be annotated because of the choice to tie casing to visibility; it doesn't even require you to opt the type itself into being serializable: every struct is serializable by default.

Skipping fields on a serializable structure is what requires extra work.

eadmund a day ago | parent | prev | next [-]

> In my dayjob I see this tendency constantly to have a lot of different very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs. Only to then do very little actually useful work with any of the data at the end.

Don’t think of it as doing a little useful work at the end; think of it as doing all the useful work in the centre. Your core logic should be as close to a pure implementation without external libraries as possible (ideally zero, but that is often not easily achievable), but call out to external libraries and services to get its work done as appropriate. That does mean a fair amount of data copying, but IMHO it’s worth it. Testing copies is easy and localised, whereas understanding the implications of a JSON (or Protobuf, or Django, or whatever) object carried deep into one’s core logic and passed into other services and libraries is very very difficult.

There’s a common theme with the ORM trap here. The cost of a little bit of magic is often higher than the cost of a little bit of copying.

TeMPOraL a day ago | parent | prev | next [-]

> also (and maybe more so) from a simplicity one. If you have a library for shoving data into a struct mechanistically, but you then take the data from that struct and shove it into an additional struct, what's the point of the library? You're writing the code move the data anyway.

Super annoying if you need to do it by hand, and wastes compute and memory if you actually need to do copies of copies, but this is the mapping part of "object relational mapping", the M in ORM. Skipping it is a bad idea.

Your business/domain model should not be tied directly to your persistence model. It's a common mistake that's responsible for like half of the bad rep ORMs get. Data structures may look superficially similar, but they represent different concepts with different semantics and expectations. If you skip on that, you'll end up with tons of stupid mistakes like 'masklinn mentions, and more subtle bugs when the concepts being squashed together start pulling in opposite directions over time.

homebrewer a day ago | parent | prev [-]

I don't know about Go, but in Java and .NET world this is trivially solvable with libraries like MapStruct. If you have a model with 20 fields and need to create a tiny slice of it (with let's say three fields), you need a few lines of boilerplate: create a record with those fields (1-3 LOC):

  record Profile(int id, String login, boolean isAdmin) {}
create a mapper for it:

  interface UserMapper {
    // arrays are just one example, plain models and plenty
    // of other data structures are supported
    Profile[] usersToProfiles(User[] user);

    // other mappers...
  }
and then use it:

  class UserController {
    //
    @GET("/profiles")
    Profile[] getUserProfiles() {
      var users = userRepo.getUsers();
      return userMapper.usersToProfiles(users);
    }
  }
As long as field names match, everything is handled for you. Adding another "view" of your users requires creating that "view" (as a record or a plain class) and adding just one line to the mapper interface, even if that class contains all of User's fields but one. So there's no need to write and maintain 19+ lines of field-copying code.

It also handles nested/recursive entities, nulls, etc. It's also using codegen, not reflection, so performance is exactly the same as if you had written it by hand, and the code is easy to read.

https://mapstruct.org

Go developers usually "don't need these complications", so this is just another self-inflicted problem. Or maybe it's solved, look around.

jerf a day ago | parent | next [-]

Actually it's even easier in Go with struct embedding:

    type SmallerThing struct {
        Id      int
        Login   string
        IsAdmin bool
    }

    type UserController struct {
        SmallerThing
        OtherField  Whatever
        OtherField2 SomethingElse
    }
In principle this could break down if you need super, super complicated non-overlapping mappings; in practice I have yet to need that.
kgeist a day ago | parent | prev [-]

>you have a model with 20 fields and need to create a tiny slice of it (with let's say three fields), you need a few lines of boilerplate: create a record with those fields (1-3 LOC):

>create a mapper for it:

> ...

>Go developers usually "don't need these complications", so this is just another self-inflicted problem.

In Go:

  type DTO struct {
    A, B, C string
  }
Somewhere in your API layer:

  // copy the fields to the DTO
  return DTO{A: o.A, B: o.B, C: o.C}
I fail to see where the "self-inflicted problem" is, or why it requires a whole library, which at the end of the day seems to need around the same number of lines of code once you count the imports and the additional mapper interface.
21 hours ago | parent | prev [-]
[deleted]