The impact of file position on code review

54 points by whatever3 12 hours ago

IMO file/folder structure in a project is a geographic mapping exercise. It's not that far off from city planning. I think it's so important to design the structure and naming conventions of a project with the ultimate goal of aesthetically-pleasing, human-understandable spatial organization of digital artifacts. You "go here" to "see that". This "lives here". You travel to the destination of the thing you're looking for, and gain a deeper understanding of how the world is constructed that you are inhabiting.

This is one of my pet peeves with any kind of cloud file storage -- oftentimes they try to reinvent or abstract away the idea of a deeply-nested filesystem tree, and it just feels anti-human to me lol

Edit: I hit submit and then acknowledged I was kind of ranting about something else entirely from the paper itself!

astrange 6 hours ago

This is my problem with almost every UI framework invented in the overengineered era of the 90s-2000s.
Try to find a feature in basically any open-source C++ or Java GUI app just by looking for the code that does it. You can't, because there isn't any code that does anything! Instead there's several different widely scattered pieces of boilerplate, none of which individually has any meat in it.
suzzer99 8 hours ago

Agreed. I always try to design apps so that renaming or changing the file structure causes minimal disruption. I want as little friction to refactoring as possible.
I'm a huge fan of files that just work wherever you put them (think pages on a website), and then relative import paths to all their subcomponents. For subcomponents shared throughout the app, I always use absolute paths so that reafactoring my page-level components into subfolders won't break the import.
stillpointlab 7 hours ago

I totally agree, and I use LLMs for this. I have a script that will output my directory structure (just the folders and filenames + the lines of code; aggregate for folders and per file). Then I ask the LLM to give me feedback. Surprisingly this has lead to some very useful feedback and improvements.

jsbg 9 hours ago

> We found files shown earlier in a PR to receive more comments than files shown later

I often stop reviewing a PR if I have too many comments on it already that need to be addressed so I wouldn't say this is necessarily what it sounds like (later files get less attention). Also, a comment on an early file might address a concern in a later file that will be fixed by the time I actually review the later files.

That said, keep your PRs short and you won't have to worry about stuff like that.

devnullbrain 7 hours ago

Read this as though it has a visibly bulging temple vein:
Does that mean you do the thing where I respond to your review, request a re-review, then get comments about code that was already there a week ago?!
- lmm 5 hours ago
  
  If your PR had lots of stuff that needs comments then I don't think you can reasonably expect me to check it fully in the first pass. E.g. if your first 3 functions are indented wrong, I'm going to write that they're indented wrong and not bother reading the rest of your PR until you've fixed that.
zoover2020 8 hours ago

Can't seem to get anyone else to Baird the train of small CRs. It's so obvious

analog31 8 hours ago

This is kind of a weird aside, but I attended many of my kids' music lessons. The teacher would ask them to play a selection, and then stop them after the first few measures, to work on some point of technique or interpretation. It meant that they never got to the end of anything, until the recital.

"Measure position" had a strong impact on review.

I realized that perfecting the entire selection isn't the point. Instead, the point is for the student to learn how to critique themselves, and improve their practicing.

semperos 10 hours ago

My coworker Toby Crawley recently developed this Firefox plugin to randomize the order of files in GitHub PR reviews:

https://addons.mozilla.org/en-US/firefox/addon/github-pr-fil...

progbits 9 hours ago

People review files top down?

I can't imagine doing that. I glance at the list, and try to spot the highest level change (schema changes, interfaces, etc) and start there. If the core idea is wrong no point wasting time on the rest.

Then I go up from there to implementation details and tests. Often jumping between modules or functions, coming back to some things later when I have more context.

devnullbrain 7 hours ago

I think people putting enough thought and effort into reviewing, and developing it as a skill, are a small minority.
FajitaNachos 8 hours ago

I would bet the vast majority of people do this. If they didn't, or preferred something else, it wouldn't be presented in the UI this way.
- ColonelPhantom 3 hours ago
  
  I disagree, because there is no 'easy' way to know what the most 'significant' changes are. I can think of a few heuristics (e.g. most changed, earliest changed, as well as prioritizing things like header files for C and friends), but nothing that would work universally or particularly well.

Brajeshwar 3 hours ago

This is wholesome. Now, I’m happy that there is a good reason for the way I name and structure files/folders, especially when working in a team.

I was part of a team that built a large web application with Pocket PC interaction capabilities for clinics and physicians in 2002-2003. I scaffolded the front-end, worked on it, while tinkering with the back-end (worked with a brilliant programmer), and built the entire Pocket PC App.

I met the team maintaining the application system after 10+ years around 2015-2016. The lead, who was once the junior we hired, talked about the naming convention and organization and how they followed it for a decade plus. I felt happy and proud of my work.

slongfield 9 hours ago

I've recently started using Github code reviews for a lot of C++, and one thing that I wish it would do is show the header (.h) files before the implementation (.cc) files.

Small PRs help, but I often end up just opening a handful of windows to have everything open at once.

rwmj 9 hours ago

You can't do that in github as far as I know, but you can get git to display diffs that way (eg. for local review or email based workflows). You have to use a git "orderfile". Example:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/scripts/git.o...
We found it useful to display header files, interface files and documentation first, but maybe the linked paper will make us review that!
- devnullbrain 6 hours ago
  
  Github similarly lacks the ability to override the diff algorithm. This is a shame, because `histogram` makes some diffs look much more like what the human intended.
  The default, Myers, is from 1986!
devnullbrain 7 hours ago

For this reason I usually either check it out locally or use the Github IDE (. or >)

aerostable_slug 10 hours ago

Note to self: bury malicious code in "later"-reviewed files.

kridsdale3 9 hours ago

Similar, if you're going up against a promotion committee, or parole board in jail, you want to go when the judges have been recently fed.
- teddyh 7 hours ago
  
  Wasn’t that debunked?
  
  gruez 7 hours ago
  
  yes https://nautil.us/impossibly-hungry-judges-236688/

taeric 8 hours ago

I'm curious if this actually tells us anything about the quality of the reviews, though? You could easily surmise that you make some basic comments on structure and some patterns in a file and send it back to the coder with a general "take that advice to the full change, let me know when I can look at it again."

That is, it isn't necessarily the case that the other files weren't reviewed. Just any overarching comments could have been left as an exercise for the coder. No?

jpalawaga 9 hours ago

One thing my colleagues and I do is expressly offer up a starting point when reviewing changes: "the meat of the change is in xxx.go; the rest is ancillary/supporting". This allows the reader to contextualize themselves more quickly in the change, and understand the rest of the changes with greater ease.

You could take an adversarial view of it: maybe it's better if people get dropped into the changes so they're left to workout if the changes in each file stand alone rather, decontextualized from their original change.

MarkusQ 7 hours ago

This is why I always review the files with bugs in them first.

IceDane 5 hours ago

I find it easiest to just leave out the bugs in the first place. I put them aside for a rainy day.
I now have a collection spanning a decade's worth of bugs. Just waiting for a reason to unleash them on the world. It will be biblical.

gavinhoward 10 hours ago

One of the few times I've read a paper before it was posted.

I like this paper. While it's not anything groundbreaking or sensational, it did help me with the design of something I am working on.

deckar01 10 hours ago

I have always preferred small, self-contained commits for this reason. The GUI tooling for line based commits has left me wanting though. Maybe an LLM can be trained to demux commits.

smohare 9 hours ago

[dead]

Daviey 10 hours ago

It's a angle I hadn't considered, but I'm aware that I likely do this so the impact of the paper is that I be more dilligent. Thank you for sharing.

breatheoften 9 hours ago

I think the whole concept of source files as unordered blobs on the filesystem is just _wrong_ -- and a serious drag on the ability of a reader to quickly digest a codebase and make inferences about changes in isolation ...

Codebases are _ordered_ constructs -- regardless of the ridiculous broken build system abstractions that do everything possible to obscure this truth ...

12_throw_away 10 hours ago

Heh, I've had certain modules where I ended up prefixing the files with `a_initialization.lang` `b_startup.lang` `c_run_the_thing.lang` [...] to make it easier to read for humans. Just like, in, say, a book, where the chapters appear in a specific order, the order of things matters! Probably this is a literate programming sort of thing?

loeg 10 hours ago

Makes sense. As a reviewer I take active steps to focus primarily on the main change by "minimizing" (Phabricator) test and other incidental file changes in my first pass.