avoiding coding "representation errors"

>-)
Posts: 527
Joined: Tue Apr 24, 2012 1:10 am UTC

avoiding coding "representation errors"

Postby >-) » Sun Oct 21, 2018 10:23 pm UTC

a class of programming bugs i often commit involves mistaking the "representation" of a variable -- such as adding degrees and radians, treating a coordinate in the camera-frame as a coordinate in the world-frame, or forgetting to swap the channels of an image from BGR to RGB.

this can be alleviated with a strong type system, if i carefully create a type for each unit or representation that a coordinate/image might have. but that approach seems pretty heavy-handed: i'd need to define a bunch of wrapper classes which do nothing besides check that the types of the arguments match.

i'm not sure if using a type system to do this is the right approach since i've never seen it in real world code. so what is the solution?
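for concreteness, here is a minimal sketch of what such wrapper classes might look like in Python. the class names `Degrees` and `Radians` and the `to_radians()` method are made up for illustration, not taken from any library, and the check happens at runtime rather than at compile time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Radians:
    value: float

    def __add__(self, other: "Radians") -> "Radians":
        # Refuse to add anything that is not also in radians.
        if not isinstance(other, Radians):
            return NotImplemented
        return Radians(self.value + other.value)


@dataclass(frozen=True)
class Degrees:
    value: float

    def __add__(self, other: "Degrees") -> "Degrees":
        # Refuse to add anything that is not also in degrees.
        if not isinstance(other, Degrees):
            return NotImplemented
        return Degrees(self.value + other.value)

    def to_radians(self) -> Radians:
        # Conversion is explicit; nothing converts behind the caller's back.
        return Radians(math.radians(self.value))


heading = Degrees(90.0)
turn = Radians(math.pi / 2)

total = heading.to_radians() + turn   # fine: both operands are Radians

mixed_units_rejected = False
try:
    heading + turn                    # mixing units raises TypeError
except TypeError:
    mixed_units_rejected = True
```

each wrapper is a few lines, but the boilerplate does grow with every operator and every unit, which is the "heavy handed" part.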

ucim
Posts: 6548
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: avoiding coding "representation errors"

Postby ucim » Sun Oct 21, 2018 10:55 pm UTC

Try incorporating the unit in the variable name: degrees_from_north, inches_to_target... stuff like that. It won't keep the computer from making a mistake, but it will hint to the programmer to not do so.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

Xanthir
My HERO!!!
Posts: 5327
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: avoiding coding "representation errors"

Postby Xanthir » Mon Oct 22, 2018 12:22 pm UTC

I also often name functions that *take* a particular value and convert it into another value to encode that information; `fooFromRadians()`, for example, is much harder to accidentally pass degrees to. ^_^
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 3644
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: avoiding coding "representation errors"

Postby Soupspoon » Mon Oct 22, 2018 1:24 pm UTC

It happens to the best of them.

I usually fall down on 'dimensionless' misreferencing, like misidentifying 'nth character' with 'element n' in the zero-indexed array of characters. Or trying to work out if whatever version of spreadsheet-like MID(string,startcharacter, length) formula matches the substr(EXPRESSION,OFFSET, LEN) of something more code-like in nature that I'm more used to. And whether I need to do things off-by-one to splice out something identified by a FIND(string,match) position return of a marker (maybe multicharacter; and maybe post-splice, pre-splice or even intended to be part of the splicing grab, so already there's a LEN(match) included or not).
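One way to keep the two conventions apart is to do the 1-based to 0-based conversion in exactly one place and name the parameters after the convention they use. A rough sketch, assuming hypothetical helpers `mid()` and `find()` that imitate the spreadsheet functions (the "return 0 if absent" behaviour of `find()` is a choice made for this example, not what any particular spreadsheet does):

```python
def mid(text: str, start_character: int, length: int) -> str:
    """Spreadsheet-style MID: start_character counts from 1."""
    if start_character < 1:
        raise ValueError("start_character is 1-based")
    offset = start_character - 1          # convert to 0-based once, here
    return text[offset:offset + length]


def find(text: str, match: str) -> int:
    """1-based position of match in text, or 0 if it is absent."""
    return text.find(match) + 1


line = "key=>value"
marker = "=>"
marker_position = find(line, marker)      # 1-based position of the marker

# Everything before the marker, and everything after it (marker excluded).
before_marker = mid(line, 1, marker_position - 1)
after_marker = mid(line, marker_position + len(marker), len(line))
```

Whether the marker belongs to the left piece, the right piece, or neither is then visible in the arithmetic on `marker_position` rather than buried in an off-by-one.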

But, as already said, variable names (variations upon Hungarian notation, even if not actually type-different) are one way. Going full-blown creating child types in an objective structure that reveals the 'correct' value through autoconversion when used in a sibling-type context is perhaps another method, if you have the time and inclination to create and validate all the cross-links, but beware of rounding errors creeping in as it cross-converts, especially when the conversion is between an RGB of a grey tone and an HSL with a technically undefined hue.

>-)
Posts: 527
Joined: Tue Apr 24, 2012 1:10 am UTC

Re: avoiding coding "representation errors"

Postby >-) » Wed Oct 24, 2018 5:09 pm UTC

yes, automatically converting representations is definitely a bad idea, especially since the conversion process is often not "lossless" as you gave an example of. (also it's usually impossible to do automatically, as you can't convert a camera frame coordinate to a world frame coordinate without knowing the camera pose)
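a small sketch of the camera/world case, in 2D to keep it short. the types `CameraPoint`, `WorldPoint` and `CameraPose` and the function `camera_to_world()` are invented for this example; the point is only that the conversion cannot be written without a pose argument, so it cannot happen implicitly.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPoint:
    x: float
    y: float


@dataclass(frozen=True)
class WorldPoint:
    x: float
    y: float


@dataclass(frozen=True)
class CameraPose:
    # Position of the camera in the world frame and its heading in radians.
    x: float
    y: float
    heading_rad: float


def camera_to_world(point: CameraPoint, pose: CameraPose) -> WorldPoint:
    # Rotate by the camera heading, then translate by the camera position.
    cos_h, sin_h = math.cos(pose.heading_rad), math.sin(pose.heading_rad)
    return WorldPoint(
        pose.x + cos_h * point.x - sin_h * point.y,
        pose.y + sin_h * point.x + cos_h * point.y,
    )


pose = CameraPose(x=10.0, y=5.0, heading_rad=math.pi / 2)
seen = CameraPoint(1.0, 0.0)
in_world = camera_to_world(seen, pose)
```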

hungarian notation seems to be the right answer. as joel spolsky points out (https://www.joelonsoftware.com/2005/05/ ... ook-wrong/) the original intended usage of hungarian notation is exactly to solve this problem, NOT to prefix variables with their type just for the sake of doing so.

elasto
Posts: 3563
Joined: Mon May 10, 2010 1:53 am UTC

Re: avoiding coding "representation errors"

Postby elasto » Thu Oct 25, 2018 6:58 pm UTC

>-) wrote:hungarian notation seems to be the right answer. as joel spolsky points out (https://www.joelonsoftware.com/2005/05/ ... ook-wrong/) the original intended usage of hungarian notation is exactly to solve this problem, NOT to prefix variables with their type just for the sake of doing so.

That's a really interesting bit of history, thanks for pointing to that. Explains an awful lot too.

Tub
Posts: 401
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: avoiding coding "representation errors"

Postby Tub » Fri Oct 26, 2018 3:02 pm UTC

On the topic of "making wrong code look wrong", the two things that look wrong when I read the article are:
* using string concatenation to generate html
* cleanup code outside of a destructor or finally block
Clean code is a rather subjective and debatable thing, but in this case I'd rather fix the root cause.

If you have strict requirements (like preventing XSS attacks, where a single issue is fatal), you need to use safe APIs. Naming conventions will not help unless you lint for them (but then it's usually easier to write a safe API than a lint rule). If you're writing guidance software for a $1B space probe, invest the time in a type system that requires explicit units and conversions.

If it's just about making code more readable, and spending a bit less time debugging, variable and function names often benefit from a bit more verbosity. There are more options than just using hungarian prefixes, and going strictly 100% hungarian is often more of a maintenance burden than actual help. Find a balance that works for you, and be more verbose on identifiers with a huge scope.

ucim
Posts: 6548
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: avoiding coding "representation errors"

Postby ucim » Fri Oct 26, 2018 3:16 pm UTC

Tub wrote:* using string concatenation to generate html
Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start? What would you recommend? There are times where components are self-contained:
$middle =<tags> stuff </tags>
but there are times when the "stuff" is computed separately from the tags surrounding it (formatting depends on one thing, content depends on another). Then what?

Jose

Tub
Posts: 401
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: avoiding coding "representation errors"

Postby Tub » Fri Oct 26, 2018 10:40 pm UTC

ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?

Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad. One leads to XSS attacks, the other to sql injection. The same problem exists for xml, json, filenames, command lines, urls and any other textual formats that require escaping.

Escaping a string once and calling it "safe" is simply wrong, because different parts of a html document have different escaping rules.
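To illustrate with Python's standard library (chosen here only as an example language), the same user-supplied text needs a different encoding depending on where it ends up. This is a sketch, not a complete list of contexts; embedding inside a script block, for instance, has further rules not shown here.

```python
import html
import json
from urllib.parse import quote

user_text = 'Tom & "Jerry" <3'

# Element content or a quoted attribute value: HTML entity escaping.
as_html_text = html.escape(user_text)

# A URL query component: percent-encoding, not entities.
as_url_part = quote(user_text, safe="")

# A JSON string literal: backslash escapes plus surrounding quotes.
as_json = json.dumps(user_text)
```

The three results are all different, so a single "escaped" flag on a string cannot say which of them it is safe to use as.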

For database queries, most databases provide a templated API, like db.query('SELECT * FROM foo WHERE a = ? AND b = ?', 42, "bar").
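As one concrete instance, Python's built-in `sqlite3` module takes the values separately from the query text. The table and the hostile string below are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b TEXT)")
conn.execute("INSERT INTO foo VALUES (?, ?)", (42, "bar"))

query = "SELECT * FROM foo WHERE a = ? AND b = ?"

# Values are passed alongside the query, never pasted into it.
rows_ok = conn.execute(query, (42, "bar")).fetchall()

# A classic injection attempt is treated as an ordinary string value.
hostile = "bar' OR '1'='1"
rows_hostile = conn.execute(query, (42, hostile)).fetchall()

conn.close()
```

The hostile value simply fails to match any row, because it is never interpreted as SQL.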

For HTML, there are tons of template engines of varying quality and features. A good one will parse and understand the html structure, reject invalid templates at compile time, and choose the proper escaping based on where your values are inserted.

Flumble
Yes Man
Posts: 2073
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: avoiding coding "representation errors"

Postby Flumble » Sat Oct 27, 2018 12:50 am UTC

>-) wrote:this can be alleviated with a strong type system, and if i carefully create a type for each unit or representation that a coordinate/image might have, but that approach seems to be pretty heavy handed: i'd need to define a bunch of wrapper classes which do nothing besides check that types of the arguments match.

i'm not sure if using a type system to do this is the right approach since i've never seen it in real world code. so what is the solution?

The bulk of "real world" code is written in languages that have shitty type systems that require a lot of clutter and wrapper code if you want to add more semantics to your types, so you barely see it. (IIRC C++'s time library has very specific time types, but that's one of few cases.) I do think it is the right approach, but it requires a language with a decent type system and good inference/little duplication so you don't have too much bloat.
I think languages like Agda have a type system that is expressive enough (and a syntax terse enough) to cram nearly all your semantics in the type level, so you don't need any Hungarian naming for your variables. And you still get your compiler to complain (rather than a runtime error or a silently crashing orbiter) when it fails to deduce relativeImpulse = lockheedImpulse-referenceImpulse, because lockheedImpulse is in lbf¹s¹ whereas referenceImpulse is in N¹s¹. Or it may complain about fruits = apples+oranges unless apples (in AppleCount) and oranges (in OrangeCount) can be autoconverted to a FruitCount type. Or it may even complain about inLeftHand = inBothHands-inRightHand because they're all FruitCount←NonnegativeNumber, while a-b may clearly be negative unless you can assure the type system that a≥b.
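For comparison, here is roughly what the impulse example looks like in a language without that kind of type system. This is a Python sketch with a made-up `Impulse` class, and unlike the Agda scenario the mismatch is only caught at runtime, when the subtraction is actually executed.

```python
from dataclasses import dataclass

LBF_IN_NEWTONS = 4.4482216152605  # exact by definition of the pound-force


@dataclass(frozen=True)
class Impulse:
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch

    def __sub__(self, other: "Impulse") -> "Impulse":
        if not isinstance(other, Impulse):
            return NotImplemented
        # Mixed units are an error, not something to convert silently.
        if self.unit != other.unit:
            raise TypeError(f"cannot subtract {other.unit} from {self.unit}")
        return Impulse(self.value - other.value, self.unit)

    def to_newton_seconds(self) -> "Impulse":
        if self.unit == "N*s":
            return self
        if self.unit == "lbf*s":
            return Impulse(self.value * LBF_IN_NEWTONS, "N*s")
        raise ValueError(f"unknown unit {self.unit}")


lockheed_impulse = Impulse(10.0, "lbf*s")
reference_impulse = Impulse(40.0, "N*s")

mismatch_caught = False
try:
    lockheed_impulse - reference_impulse
except TypeError:
    mismatch_caught = True

relative_impulse = lockheed_impulse.to_newton_seconds() - reference_impulse
```

The clutter is visible even in this small case: the unit is carried as data and checked by hand, which is the bloat a more expressive type system would move into the compiler.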

It sounds very rude to eat an apple with your right hand while holding 0 apples in total.

ucim
Posts: 6548
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: avoiding coding "representation errors"

Postby ucim » Sat Oct 27, 2018 3:51 am UTC

Tub wrote:
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?
Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad...
But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

Jose

phlip
Restorer of Worlds
Posts: 7556
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: avoiding coding "representation errors"

Postby phlip » Sat Oct 27, 2018 6:49 am UTC

ucim wrote:
Tub wrote:But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

But how does your clean() function tell the difference between a good tag that you want to be there, vs a bad tag that was injected from user input that you've accidentally forgotten to escape properly?
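A quick Python illustration of the problem, assuming a hypothetical `clean()` that escapes its whole input (about the only thing it can do once the pieces have been glued together and their origins forgotten):

```python
import html


def clean(raw_html: str) -> str:
    # Hypothetical clean(): with no record of where each piece came from,
    # it can only escape everything it is given.
    return html.escape(raw_html)


user_input = "<script>alert(1)</script>"

# Cleaning after concatenation also destroys the tags you wanted.
cleaned_after = clean("<b>" + user_input + "</b>")

# Escaping at the point where the user data enters keeps them intact.
escaped_at_source = "<b>" + html.escape(user_input) + "</b>"
```

The second form works only because the code still knows which part is user data at the moment of escaping; after concatenation that information is gone.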

Code: Select all

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

ucim
Posts: 6548
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: avoiding coding "representation errors"

Postby ucim » Sat Oct 27, 2018 2:59 pm UTC

phlip wrote:But how does your clean() function tell the difference between a good tag that you want to be there, vs a bad tag that was injected from user input that you've accidentally forgotten to escape properly?
It doesn't, universally. But in some restricted use cases it might. In the case where user-supplied HTML is simply not permitted (and all user input is cleaned before further processing) then it should be ok. All tags would be programmer-supplied (perhaps based on user hints, such as bbcode).

Yes, the programmer could use logic that is too convoluted, but even template engines have to put it together somewhere, and that's going to be string concatenation too.

Yes?

btw, you have a quote fail. String concatenation bug? :)

Jose

Flumble
Yes Man
Posts: 2073
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: avoiding coding "representation errors"

Postby Flumble » Sat Oct 27, 2018 3:04 pm UTC

ucim wrote:
Tub wrote:
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?
Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad...
But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

Surely $safestart+$safemiddle+$safeend is already $safehtml, because concatenation is a safe operation, right?
But even if all escaping is done right and all concatenation is safe (and type-correct), HTML is a tree structure, so gluing an html to another html as if it's a sequence is fundamentally wrong.
Alright, you'll need an asText function somewhere that converts an HTML node to text using string concatenation. But that's the one place where you disable your error reporting or write all the type annotations and conversions so you can build "HTML text" (which is, of course, fundamentally different from other text and other representations of HTML).

ucim
Posts: 6548
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: avoiding coding "representation errors"

Postby ucim » Sat Oct 27, 2018 7:03 pm UTC

Flumble wrote:HTML is a tree structure, so gluing an html to another html as if it's a sequence is fundamentally wrong.
Well, gluing HTML to HTML doesn't mean you get good HTML at the end ("enclosing HTML in HTML" is probably a better way to abstract it out, but enclosing employs concatenation anyway). That's why I specified beginning, middle, and end, none of which is necessarily (complete) HTML.

e.g.
$opentags = '<b><i>';
$text = 'Hello world';
$closetags = '</b></i>';
$output = $opentags+$text+$closetags;


It's still on you to not mess up the tag order (as I did here), because that is what is fundamentally wrong. But that's a different kind of bug, and not the fault of concatenation.

and...
$opentags = '<b><i>';
$userinput = getuserinput();
$text = clean($userinput);
$closetags = '</i></b>';
$output = $opentags+$text+$closetags;


should work fine, no?

In this simple example, it might be better to do:
$taglist = array('b', 'i');
$tags = makeHTMLtags ($taglist);
$opentags = $tags['open']...


but I'm not convinced that it's the concatenation itself that is the issue. Like an MP3 file, you can't glue pieces arbitrarily. The pieces have to be assembled correctly. But I'm not sure that is amenable to a simple "just use HH (HTML Hungarian) notation" solution.

Jose

elasto
Posts: 3563
Joined: Mon May 10, 2010 1:53 am UTC

Re: avoiding coding "representation errors"

Postby elasto » Sun Oct 28, 2018 7:47 pm UTC

Obviously you're just trying to give a simple example here, but surely you wouldn't be constructing it that way.

Off the top of my head, and assuming we are going for a really simple use-case here, wouldn't a better approach be something like:

Code: Select all

var rawUserInput = GetUserInput();
var cleanUserInput = Clean(rawUserInput);
var cleanFormattedUserInput = cleanUserInput.ApplyItalics().ApplyBold();


That way you can't mess up the tag order in the way you suggested..?
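Fleshing that idea out a little, here is a minimal Python sketch. The `SafeHtml` class and its `from_text()`, `italic()` and `bold()` methods are invented for this example and only cover the two tags discussed here.

```python
import html


class SafeHtml:
    """Hypothetical wrapper for markup built only from escaped text and known tags."""

    def __init__(self, markup: str):
        # Callers are expected to start from from_text(), not call this directly.
        self._markup = markup

    @classmethod
    def from_text(cls, raw_text: str) -> "SafeHtml":
        # The only entry point for untrusted text escapes it.
        return cls(html.escape(raw_text))

    def _wrap(self, tag: str) -> "SafeHtml":
        # Open and close tags are emitted together, so they cannot be mis-nested.
        return SafeHtml(f"<{tag}>{self._markup}</{tag}>")

    def italic(self) -> "SafeHtml":
        return self._wrap("i")

    def bold(self) -> "SafeHtml":
        return self._wrap("b")

    def __str__(self) -> str:
        return self._markup


formatted = SafeHtml.from_text("Hello <world>").italic().bold()
rendered = str(formatted)
```

There is still string concatenation inside `_wrap()`, but it lives in one small place that can be reviewed once, rather than being repeated at every call site.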

ucim
Posts: 6548
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: avoiding coding "representation errors"

Postby ucim » Sun Oct 28, 2018 9:40 pm UTC

elasto wrote:var cleanFormattedUserInput = cleanUserInput.ApplyItalics().ApplyBold();
Sure, but when you write the ApplyBold() method, aren't you going to do something like
$bolded = $openboldtag.$this.$closeboldtag;? (Please forgive my mangling together of php, OOP, and C++) :)

The point was that string concatenation isn't (or is it?) a bad way to create HTML. Ultimately, how else would you do it?

Yes, you can't stick things together willy nilly, but you do have to stick things together! The thing is to try to keep the pieces simple, so that it's harder to unknowingly stick the wrong things together. Sometimes though, there is a complicated piece of HTML for which you could either (here be dragons!) brute force it, or write a ch*rpton of fiddly little functions you'll only use once. And while it's tempting to say just write the functions to keep things clean, once those functions are written, it will be tempting to re-use them (nothing is ever used just once), so they will need to be made robust under all sorts of use cases you'll never use them for, in case you do use them for one of those other use cases you didn't think of.

Whenever you put a wrapper on something, it needs to be a good wrapper.

Jose

Tub
Posts: 401
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: avoiding coding "representation errors"

Postby Tub » Mon Oct 29, 2018 10:45 am UTC

Jose, if you really wish to make the point that string concatenation is fine for html creation, please start by implementing that clean() function that you keep using, such that it produces correct and safe html in any case. Multiple people have explained to you why that's problematic, but you've ignored them. Maybe you'll need to implement it to understand.

Next you'll need to show how you're going to test your html for mismatched tags and other invalid constructs before deployment.

Those are the basic requirements for any API: safe, correct and verifiable. Anything else is not suitable beyond small personal projects.

Once you've shown that it's a usable API, we can discuss whether it's a good API, and whether it's better than the alternatives. Is the code easily readable? Is the html structure you generate easily discernible from your code? Is your development environment capable of syntax highlighting your html, possibly highlighting invalid syntax as you type it?
You keep asking about alternatives, but you keep ignoring the answers. Go and research a few templating systems; most of them fare much better than the approaches you've posted here.

Sizik
Posts: 1221
Joined: Wed Aug 27, 2008 3:48 am UTC

Re: avoiding coding "representation errors"

Postby Sizik » Mon Oct 29, 2018 2:09 pm UTC

The real solution is to let go of the assumption that HTML should be stored and manipulated as plain strings, and only convert it to a string at the very end when the page is complete.
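As a rough sketch of that approach, Python's standard `xml.etree.ElementTree` can stand in for a proper HTML library: the fragment is built as a tree of nodes and serialized once at the end. The hostile input string is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

user_input = '</i><script>alert("x")</script>'

# Build the structure as nodes; nesting is enforced by the tree itself.
bold = ET.Element("b")
italic = ET.SubElement(bold, "i")
italic.text = user_input          # stored as text, never parsed as markup

# Convert to a string only once, when the fragment is complete.
page_fragment = ET.tostring(bold, encoding="unicode", method="html")
```

Mismatched tags cannot be expressed at all in this form, and the text node is escaped by the serializer rather than by whoever remembered to call an escaping function.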
gmalivuk wrote:
King Author wrote:If space (rather, distance) is an illusion, it'd be possible for one meta-me to experience both body's sensory inputs.
Yes. And if wishes were horses, wishing wells would fill up very quickly with drowned horses.

Tyndmyr
Posts: 11443
Joined: Wed Jul 25, 2012 8:38 pm UTC

Re: avoiding coding "representation errors"

Postby Tyndmyr » Mon Oct 29, 2018 5:22 pm UTC

ucim wrote:
Tub wrote:
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?
Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad...
But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

Jose


You can. However, in practice, complexity tends to build up. So long as complexity stays sufficiently low, it's fairly easy to keep track of, but if you scale upward, it'll eventually become less obvious what you're doing, and troubleshooting can be annoying.

There are some cases where you have to deal with complex strings in relation to html (Struts webapps can be one of them if you get creative with the framework), but generally, you want to avoid it as much as possible so you don't need to return to your html generation as your app grows.

Also, as implied when Tub was talking about attacks, if anything you're creating is coming from user input, you need to properly sanitize that. Yeah, you could roll your own security, but given that it's a pretty generic problem, you're usually better served by using existing methods. Same goes with string concatenation. Yeah, it's a string at the end, but that doesn't mean you benefit from treating it as a string at all other times. Your content will determine what exactly is most handy, but a tree structure is pretty normal. Likewise, if you're dealing with XML, you're largely going to use an existing parser rather than doing string concatenation on your own, unless you have to deal with very unusual input, in which case you're basically rolling your own parser to deal with probably-non-standard data that somehow needs to be cleaned and used. That's an edge case, though.

So, there's a little fuzziness where circumstances exist in which "bad" practices are needed, but as a general rule of thumb, it's definitely worth following existing standards. The existence of edge cases isn't a good reason to ignore standard practices in most cases.

