patch to goto optional semicolon changing opcodes

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

patch to goto optional semicolon changing opcodes

Patrick Donnelly
Sorry about vague subject. I'll try to clarify:

I'm using luac to confirm that the code formatted by [1] does not
change the behavior of a program. I've noticed that the optional
semicolon that terminates a goto or break statement [inside an if
block (I don't know why)] will influence the bytecode generation. If
there is a semicolon, an extra unreachable JMP is added:

$ luac -l -p - <<< "for i = 1, 2 do if true then break end end"

main <stdin:0,0> (9 instructions at 0x24a20d0)
0+ params, 5 slots, 1 upvalue, 4 locals, 2 constants, 0 functions
        1       [1]     LOADK           0 -1    ; 1
        2       [1]     LOADK           1 -2    ; 2
        3       [1]     LOADK           2 -1    ; 1
        4       [1]     FORPREP         0 3     ; to 8
        5       [1]     LOADBOOL        4 1 0
        6       [1]     TEST            4 1
        7       [1]     JMP             0 1     ; to 9
        8       [1]     FORLOOP         0 -4    ; to 5
        9       [1]     RETURN          0 1
$ luac -l -p - <<< "for i = 1, 2 do if true then break; end end"

main <stdin:0,0> (10 instructions at 0xca50d0)
0+ params, 5 slots, 1 upvalue, 4 locals, 2 constants, 0 functions
        1       [1]     LOADK           0 -1    ; 1
        2       [1]     LOADK           1 -2    ; 2
        3       [1]     LOADK           2 -1    ; 1
        4       [1]     FORPREP         0 4     ; to 9
        5       [1]     LOADBOOL        4 1 0
        6       [1]     TEST            4 1
        7       [1]     JMP             0 2     ; to 10
        8       [1]     JMP             0 0     ; to 9
        9       [1]     FORLOOP         0 -5    ; to 5
        10      [1]     RETURN          0 1


The patch to fix this is simple, similar to removing the optional
semicolon for the return statement.

Index: lua-5.2.0/src/lparser.c
===================================================================
--- lua-5.2.0/src/lparser.c     (revision 47)
+++ lua-5.2.0/src/lparser.c     (working copy)
@@ -1191,6 +1191,7 @@
     luaX_next(ls);  /* skip break */
     label = luaS_new(ls->L, "break");
   }
+  testnext(ls, ';');  /* skip optional semicolon */
   g = newlabelentry(ls, &ls->dyd->gt, label, line, pc);
   findlabel(ls, g);  /* close it if label already defined */
 }


I don't really consider this a bug (obviously) but hopefully Lua 5.2.1
can include this.


[1] http://lua-users.org/lists/lua-l/2012-04/msg01082.html

--
- Patrick Donnelly

Reply | Threaded
Open this post in threaded view
|

Re: patch to goto optional semicolon changing opcodes

Rena
On Fri, May 18, 2012 at 10:34 PM, Patrick Donnelly <[hidden email]> wrote:

> Sorry about vague subject. I'll try to clarify:
>
> I'm using luac to confirm that the code formatted by [1] does not
> change the behavior of a program. I've noticed that the optional
> semicolon that terminates a goto or break statement [inside an if
> block (I don't know why)] will influence the bytecode generation. If
> there is a semicolon, an extra unreachable JMP is added:
>
> $ luac -l -p - <<< "for i = 1, 2 do if true then break end end"
>
> main <stdin:0,0> (9 instructions at 0x24a20d0)
> 0+ params, 5 slots, 1 upvalue, 4 locals, 2 constants, 0 functions
>        1       [1]     LOADK           0 -1    ; 1
>        2       [1]     LOADK           1 -2    ; 2
>        3       [1]     LOADK           2 -1    ; 1
>        4       [1]     FORPREP         0 3     ; to 8
>        5       [1]     LOADBOOL        4 1 0
>        6       [1]     TEST            4 1
>        7       [1]     JMP             0 1     ; to 9
>        8       [1]     FORLOOP         0 -4    ; to 5
>        9       [1]     RETURN          0 1
> $ luac -l -p - <<< "for i = 1, 2 do if true then break; end end"
>
> main <stdin:0,0> (10 instructions at 0xca50d0)
> 0+ params, 5 slots, 1 upvalue, 4 locals, 2 constants, 0 functions
>        1       [1]     LOADK           0 -1    ; 1
>        2       [1]     LOADK           1 -2    ; 2
>        3       [1]     LOADK           2 -1    ; 1
>        4       [1]     FORPREP         0 4     ; to 9
>        5       [1]     LOADBOOL        4 1 0
>        6       [1]     TEST            4 1
>        7       [1]     JMP             0 2     ; to 10
>        8       [1]     JMP             0 0     ; to 9
>        9       [1]     FORLOOP         0 -5    ; to 5
>        10      [1]     RETURN          0 1
>
>
> The patch to fix this is simple, similar to removing the optional
> semicolon for the return statement.
>
> Index: lua-5.2.0/src/lparser.c
> ===================================================================
> --- lua-5.2.0/src/lparser.c     (revision 47)
> +++ lua-5.2.0/src/lparser.c     (working copy)
> @@ -1191,6 +1191,7 @@
>     luaX_next(ls);  /* skip break */
>     label = luaS_new(ls->L, "break");
>   }
> +  testnext(ls, ';');  /* skip optional semicolon */
>   g = newlabelentry(ls, &ls->dyd->gt, label, line, pc);
>   findlabel(ls, g);  /* close it if label already defined */
>  }
>
>
> I don't really consider this a bug (obviously) but hopefully Lua 5.2.1
> can include this.
>
>
> [1] http://lua-users.org/lists/lua-l/2012-04/msg01082.html
>
> --
> - Patrick Donnelly
>

I'd consider the generation of unnecessary, unreachable instructions a
bug, albeit a very minor one. Is there any reason not to fix it?

--
Sent from my Game Boy.

Reply | Threaded
Open this post in threaded view
|

Re: patch to goto optional semicolon changing opcodes

Matthew Wild
On 19 May 2012 08:11, Rena <[hidden email]> wrote:

> On Fri, May 18, 2012 at 10:34 PM, Patrick Donnelly <[hidden email]> wrote:
>> Sorry about vague subject. I'll try to clarify:
>>
>> I'm using luac to confirm that the code formatted by [1] does not
>> change the behavior of a program. I've noticed that the optional
>> semicolon that terminates a goto or break statement [inside an if
>> block (I don't know why)] will influence the bytecode generation. If
>> there is a semicolon, an extra unreachable JMP is added:
>>
>> $ luac -l -p - <<< "for i = 1, 2 do if true then break end end"
>>
>> main <stdin:0,0> (9 instructions at 0x24a20d0)
>> 0+ params, 5 slots, 1 upvalue, 4 locals, 2 constants, 0 functions
>>        1       [1]     LOADK           0 -1    ; 1
>>        2       [1]     LOADK           1 -2    ; 2
>>        3       [1]     LOADK           2 -1    ; 1
>>        4       [1]     FORPREP         0 3     ; to 8
>>        5       [1]     LOADBOOL        4 1 0
>>        6       [1]     TEST            4 1
>>        7       [1]     JMP             0 1     ; to 9
>>        8       [1]     FORLOOP         0 -4    ; to 5
>>        9       [1]     RETURN          0 1
>> $ luac -l -p - <<< "for i = 1, 2 do if true then break; end end"
>>
>> main <stdin:0,0> (10 instructions at 0xca50d0)
>> 0+ params, 5 slots, 1 upvalue, 4 locals, 2 constants, 0 functions
>>        1       [1]     LOADK           0 -1    ; 1
>>        2       [1]     LOADK           1 -2    ; 2
>>        3       [1]     LOADK           2 -1    ; 1
>>        4       [1]     FORPREP         0 4     ; to 9
>>        5       [1]     LOADBOOL        4 1 0
>>        6       [1]     TEST            4 1
>>        7       [1]     JMP             0 2     ; to 10
>>        8       [1]     JMP             0 0     ; to 9
>>        9       [1]     FORLOOP         0 -5    ; to 5
>>        10      [1]     RETURN          0 1
>>
>>
>> The patch to fix this is simple, similar to removing the optional
>> semicolon for the return statement.
>>
>> Index: lua-5.2.0/src/lparser.c
>> ===================================================================
>> --- lua-5.2.0/src/lparser.c     (revision 47)
>> +++ lua-5.2.0/src/lparser.c     (working copy)
>> @@ -1191,6 +1191,7 @@
>>     luaX_next(ls);  /* skip break */
>>     label = luaS_new(ls->L, "break");
>>   }
>> +  testnext(ls, ';');  /* skip optional semicolon */
>>   g = newlabelentry(ls, &ls->dyd->gt, label, line, pc);
>>   findlabel(ls, g);  /* close it if label already defined */
>>  }
>>
>>
>> I don't really consider this a bug (obviously) but hopefully Lua 5.2.1
>> can include this.
>>
>>
>> [1] http://lua-users.org/lists/lua-l/2012-04/msg01082.html
>>
>> --
>> - Patrick Donnelly
>>
>
> I'd consider the generation of unnecessary, unreachable instructions a
> bug, albeit a very minor one. Is there any reason not to fix it?

Lua has always done things like this. For example, functions always
have a RETURN opcode at the end, even if the function has an explicit
return in the code.

The position has always been that standard Lua is not an optimising
compiler, which is one (more) reason its source code is clean and
small. From this perspective I think an removing an extra opcode could
be considered adding a feature rather than removing a bug. It doesn't
really change any external behaviour.

Regards,
Matthew
Reply | Threaded
Open this post in threaded view
|

Re: patch to goto optional semicolon changing opcodes

Roberto Ierusalimschy
In reply to this post by Patrick Donnelly
> I don't really consider this a bug (obviously) but hopefully Lua 5.2.1
> can include this.

Note that the syntax in Lua 5.2 changed; now a 'break' does not need to
be the last statement in a block. So, there is no "optional semicolon"
after the break, but an empty statement. All the following examples are
valid now:

  for i = 1, 2 do if true then break;;; end end
  for i = 1, 2 do if true then break  a = 3 end end
  for i = 1, 2 do if true then break;  a = 3 end end
  for i = 1, 2 do if true then break;  ::label:: ; end end

Some of those cases need the extra JMP, some do not. Some people may
consider that "break;" is a common case to deserve a special treatment,
but it would be better to handle the general case.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: patch to goto optional semicolon changing opcodes

Patrick Donnelly
On May 19, 2012 9:44 AM, "Roberto Ierusalimschy" <[hidden email]> wrote:

>
> > I don't really consider this a bug (obviously) but hopefully Lua 5.2.1
> > can include this.
>
> Note that the syntax in Lua 5.2 changed; now a 'break' does not need to
> be the last statement in a block. So, there is no "optional semicolon"
> after the break, but an empty statement. All the following examples are
> valid now:
>
>  for i = 1, 2 do if true then break;;; end end
>  for i = 1, 2 do if true then break  a = 3 end end
>  for i = 1, 2 do if true then break;  a = 3 end end
>  for i = 1, 2 do if true then break;  ::label:: ; end end
>
> Some of those cases need the extra JMP, some do not. Some people may
> consider that "break;" is a common case to deserve a special treatment,
> but it would be better to handle the general case.

That makes sense. For now I'll just keep a patched version of Lua
while I look for an alternative solution for confirming source code
equivalence. It's a little disappointing luac can't make this easier.

--
- Patrick Donnelly