mettamage 5 hours ago

> Reading code, mostly. Tracing data through five layers of someone else's design choices. Forming a hypothesis about why the bug is happening, then testing the hypothesis, then narrowing. Recognising that the function in front of you is too big and asking what part of it has its own reason to exist. Recognising that the schema in front of you encodes a decision someone made in 2019 and that the decision is now load-bearing for things they did not anticipate. Knowing which of the five tempting cleanups in the file is going to bite you in production and which is safe.

It always struck me as strange that universities never had a course that would teach open source code. As in: grab a repo of a popular open source project, read part of it and do your best to create a contribution in it.

The lectures should be about different open source projects and their design choices.

p4bl0 an hour ago | parent | next [-]

In the CS bachelor's degree I'm responsible for, we have exactly that in the third and last year (it's in France, so as in ~all of Europe the licence lasts three years and then students continue their studies with a two-year master's).

I've been teaching this course for ten years now, and it's been fantastic. A lot of open source contributions, mostly trivial but some more significant, have been made to a lot of different projects. It teaches students to actually work on a real code base, using a real workflow (fork, clone, branch, commits, PR, review, commits, review, … hopefully merge), talking (in English) with maintainers, having to update tests and documentation and not just code, and having to respect a lot of conventions that are not always explicitly listed anywhere (the first piece of work I always ask them to do is to present the project they have chosen, its tools, platforms, and languages, and to list all the programming conventions (indentation, naming, etc.) they can identify). At the end of it, it also makes them realize what they can do, because at the beginning of the semester most of them think they will never be able to actually make a contribution to a real project.

This year alone there were contributions to NewPipe, Cartes.app, Immich, Fossify apps, PyGameEngine, Jax, Shortcut, Wikimedia Commons App, Godot, …

Some years ago I even had students contributing to ls (yes, in GNU coreutils).

ranger207 3 hours ago | parent | prev | next [-]

CS degrees are about computer science, not software engineering. The fact that the best available degree for a software developer is generally a CS degree is a historical accident, but regardless, universities unfortunately don't tend to cater to what CS students are actually getting their degree for.

Gigachad 4 hours ago | parent | prev | next [-]

That would just result in projects being flooded with low quality submissions by students who don’t care but are forced to do it. And who get angry when you don’t merge them since they need you to for their course.

beej71 4 hours ago | parent | next [-]

We have such a class at our university. We used to have students issue PRs to real projects but then stopped for that very reason. Now we have our own big OSS project that they work on quarter-to-quarter. Seems like a decent compromise.

p4bl0 43 minutes ago | parent [-]

Oh, since you're here, I want to say thank you for all your guides! I learned from them yeaaars ago, and still recommend them to my own students to this day, especially your network programming guide which is linked from all of the network lab session sheets of my systems and networks course. Thanks!

p4bl0 an hour ago | parent | prev | next [-]

I teach such a course, and we don't have that problem. First, students must work on an existing issue of the project they choose and are only allowed (for my course) to submit an issue or a PR unrelated to an existing issue if they have already finished a first contribution that has been merged by the maintainers into the same project. The course grade is based on multiple factors and the code of the contribution itself is far from being the most important. The most important aspects are communication with the developers (and being respectful and polite certainly is significant) and the ability to identify and then respect the (often implicit) conventions of the project, as well as the proper use of the forge workflow for submitting a PR (fork, clone, branch, PR, discuss, etc.). Getting the contribution actually merged into the project is a neat bonus on the grade but is not required to pass the course.

Also, I totally ban the use of LLMs, and unmotivated students often choose to work on very simple issues like easy refactoring or cosmetic aspects of web projects. It's okay with me for two reasons: first because it keeps unmotivated students from working on important issues and giving useless review work to open source maintainers, but also because we have all the other courses to do complex projects; here the point is to teach them by practice the workflow of contributing to an actual project, discussing with actual people, etc.

For some students it's already a good thing to have been able to get a copy of the latest development version of a given project, to install all of its development dependencies and tools, to compile it, and to reproduce the bug they chose to work on. It's not enough to pass the course, but it's a necessary first step to contribute to any project and it's quite a different experience from what they're used to with small school projects that are designed for teaching or that they entirely wrote themselves.

greazy 4 hours ago | parent | prev [-]

The solution would seem obvious: the lecturer should fork the repo, students submit PRs to the fork, and the ones deemed worthy are pushed further upstream.

tnelsond4 5 hours ago | parent | prev | next [-]

You and I both know it's easier to rewrite a project from scratch than contribute to somebody else's project. Only half-joking.

tharkun__ 3 hours ago | parent | next [-]

You'd be (half-jokingly) amazed at how many people are entirely incapable of understanding and debugging an existing code base.

Like, literally: "Error message has string 'abdc-1234-something-whatever'". They can barely figure out to maybe search the code base for that error message. Unfortunately they can't find the full string. Now they're stuck and can't think of anything else to try.

So, effing, amazing. How do these people ever get through (coding) life? Ever heard of substring search coz error messages frequently have parts that are concatenated/variables inserted? Search for parts of it until you find something. It's not that hard dude, yes the 1234 probably is some dynamic id, so search for just something-whatever and you'll instantly find the relevant code and you can debug further.
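
A minimal sketch of that substring search in Python, for illustration only. It assumes that digit runs in the message are the dynamic parts and that the interesting sources are `.py` files; the function names (`static_fragments`, `find_in_tree`, `locate_error`) are made up for this example and do not come from any particular tool.

```python
import re
from pathlib import Path


def static_fragments(message, min_len=4):
    """Split an error message into the parts that are probably literal text.

    Assumption: digit runs (like the 1234 above) are dynamic ids, so the
    message is cut there and the leftover separators are trimmed off.
    Longest fragments come first because they are the most specific.
    """
    pieces = re.split(r"\d+", message)
    cleaned = [piece.strip(" -_:/.'\"") for piece in pieces]
    usable = [piece for piece in cleaned if len(piece) >= min_len]
    return sorted(usable, key=len, reverse=True)


def find_in_tree(root, fragment, suffixes=(".py",)):
    """Return (path, line number, line) for every line containing fragment."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if fragment in line:
                hits.append((path, lineno, line.strip()))
    return hits


def locate_error(root, message, suffixes=(".py",)):
    """Try each fragment in turn and stop at the first one that matches."""
    for fragment in static_fragments(message):
        hits = find_in_tree(root, fragment, suffixes)
        if hits:
            return fragment, hits
    return None, []
```

Calling `locate_error(".", "abdc-1234-something-whatever")` tries `something-whatever` first and only falls back to `abdc` if nothing matches. In practice `grep -rn something-whatever .` does the same job; the point is just to search for the parts once the full string is not found.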

But no, this "Senior" can't think of anything when not finding the full string anywhere in the codebase and would rather throw up their hands and let others figure it out.

Either a really dumb "Senior" that somehow got through so far at previous companies or they're silent quitting during probation period already.

If this continues it's not gonna be silent.

seba_dos1 4 hours ago | parent | prev [-]

I see it mentioned often, but it's a completely foreign stance to me. I'll take contributing to an existing project over writing one from scratch any day, even if it's shitty enough to require general renovation. It's so much easier to jump into the work when there's an existing skeleton already than to do all the boring grunt work of setting things up and deciding on a layout, which maybe was exciting when I was still only starting to learn how to program, but hasn't been for decades.

I could see LLMs affecting that though. Their ability to output shitty and yet somewhat functional skeletons to work on manually further is just spot on.

godelski 4 hours ago | parent | prev | next [-]

Having taught at a university I'll say that the general reason is because there's already too much to teach, so you do your best. It's extra hard since there's a million people saying "why don't they teach X?" and you have to accommodate them.

There's problems like do you teach Python or C? It sounds silly but the difference is not about languages but how much you teach about systems. Teaching Python you get people going and they can produce faster, which does help students get less discouraged. But teaching C forces learning about the computer system and enables students to dive deeper to teach themselves many different subtopics that no 4 year program can.

What I think is generally missing and would be good to implement is code review and teaching how to understand a large existing codebase (grep, find, profilers, traces, tags, and all that jazz). This often gets taught in parallel (e.g. have students review each other's code) but it's hit or miss, a lot of extra work, and not everyone does it.

Here's the shitty part: I was often told by peers and people higher up "don't look at students' code, just look at output and run tests." I always ignored that, because that advice is why we're failing so many students. But I also understand it because professors are overburdened. There's too much work to do and teaching isn't even half the job. Then with every new administrator or "office assistant" they hire, you have more work (seriously, it'll take days to book a flight because you have to use some code but it takes 2 days for someone to tell you the code and 5 more to tell you that it was the wrong code and it's clearly your fault because you clicked on "book flight" and not trips > booking > flights > schedules > trips > access code > flights > search available flights. Honestly, I think all this LLM agent stuff would sound silly if people actually just knew how to design things...)

enos_feedler 4 hours ago | parent | prev | next [-]

I believe when I was at utoronto the compsci dept had a course like this (2005ish).

mackeye 3 hours ago | parent | prev [-]

umich currently has a course like this (but it's a bit of a blowoff)