C# - ANTLR grammar for parsing C source code files and getting functions from them
I wrote an ANTLR grammar for parsing functions from C source code files:
    grammar newcfunctions;

    options
    {
        language = CSharp;
    }

    @parser::namespace { generated }
    @lexer::namespace { generated }

    func
        : function+ { Console.WriteLine("hello"); } // this is for debugging
        ;

    Name
        : [a-zA-Z]+[a-zA-Z0-9]*
        ;

    Typename
        : 'void'
        | [a-zA-Z]+
        | 'char'
        | 'short'
        | 'int'
        | 'long'
        | 'float'
        | 'double'
        | 'signed'
        | 'unsigned'
        | '_Bool'
        | '_Complex'
        | '__m128'
        | '__m128d'
        | '__m128i'
        | Name
        ;

    arguments
        : (Typename Name)*
        ;

    Newline
        : '\r'? '\n'
        ;

    Functionbody
        : ([a-zA-Z0-9]|Newline)*
        ;

    function
        : Typename ' ' Name '(' arguments ')' ' '? Newline? '{' Functionbody '}' Newline?
        ;
I generated the C# files and included them in a test project. Its main function:
    // Requires: using System; using Antlr4.Runtime;
    try
    {
        AntlrInputStream input = new AntlrInputStream(Console.In);
        newcfunctionsLexer lexer = new newcfunctionsLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        newcfunctionsParser parser = new newcfunctionsParser(tokens);
        parser.func();
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
    Console.ReadKey();
When I enter "void foo(int a){return a;}" it gives me an error: "line 1:0 mismatched input 'void' expecting Typename". Please help me with the grammar! I saw a C grammar on the internet, but it has 800+ lines and I don't know how to use it. If you know how to use it, please tell me. Thank you!
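As has been said, the Name rule should be placed after the Typename rule. When two lexer rules match the same amount of input, ANTLR uses the one defined first, so 'void' was being tokenized as Name. The lexeme Typename should also not contain the lexeme Name, i.e. [a-zA-Z]+.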
So, the final version:
    grammar newcfunctions;

    options
    {
        language = CSharp;
    }

    @parser::namespace { generated }
    @lexer::namespace { generated }

    func
        : function+ { Console.WriteLine("hello"); } // this is for debugging
        ;

    function
        : typename ' ' Name '(' arguments ')' ' '? Newline? '{' functionbody '}' Newline?
        ;

    arguments
        : (typename Name)*
        ;

    typename
        : Typename
        | Name
        ;

    functionbody
        : (Typename | Name | Newline)*
        ;

    Typename
        : 'void'
        | 'char'
        | 'short'
        | 'int'
        | 'long'
        | 'float'
        | 'double'
        | 'signed'
        | 'unsigned'
        | '_Bool'
        | '_Complex'
        | '__m128'
        | '__m128d'
        | '__m128i'
        ;

    Name
        : [a-zA-Z]+ [a-zA-Z0-9]*
        ;

    Newline
        : '\r'? '\n'
        ;
I also advise using channels for newlines and spaces, so that they are ignored during parsing.
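To illustrate the channel advice, here is a minimal sketch in ANTLR 4 syntax. It is not part of the answer above: the grammar name newcfunctionsws and the Whitespace rule are assumptions made for this example, and the list of type keywords is shortened. Whitespace is sent to the hidden channel, so the parser rules no longer need ' ' or Newline.

```antlr
// Hypothetical variant of the grammar above; the name is an assumption.
grammar newcfunctionsws;

// Parser rules: spaces and newlines never reach the parser,
// so they are not mentioned here at all.
func
    : function+ EOF
    ;

function
    : typename Name '(' arguments ')' '{' functionbody '}'
    ;

arguments
    : (typename Name)*
    ;

typename
    : Typename
    | Name
    ;

functionbody
    : (Typename | Name)*
    ;

// Lexer rules: Typename comes before Name so that keywords win ties.
Typename
    : 'void' | 'char' | 'short' | 'int' | 'long' | 'float' | 'double'
    ;

Name
    : [a-zA-Z]+ [a-zA-Z0-9]*
    ;

// Spaces, tabs and line breaks go to the hidden channel.
Whitespace
    : [ \t\r\n]+ -> channel(HIDDEN)
    ;
```

Using -> skip instead of -> channel(HIDDEN) would discard the whitespace tokens entirely; the hidden channel keeps them in the token stream in case they are needed later. The body rule here is as limited as in the answer: it accepts only names and type keywords, so a character such as ';' should still be reported by the lexer as a token recognition error.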