Talk:Comparison of parser generators

"Confusing"

I marked the article as confusing because I made a parser generator and would like to add it to the list, but some of the columns are so unclear that I'm not sure what to put in them!

I suspect "Grammar, code" answers the question "Are actions associated with the grammar mixed or separate from the grammar?" right?
I imagine that the "Lexer" column is supposed to indicate whether the parser generator generates lexers and parsers or just parsers, but perhaps it would make more sense to have two "yes/no" columns, "generates parsers?" and "generates lexers?". As it is, there are three values, "none", "generated" and "external" and I have no idea what the difference is between "generated" and "external".
I suppose that "IDE" either means "the parser generator has its own dedicated IDE" or "the parser generator integrates with one or more IDEs". Certainly this should not be a yes/no column: it should say whether the tool has a dedicated IDE, and if it integrates with an IDE it should specify which IDE.
The "input grammar notation" seems weird in that ANTLR is marked "EBNF", but ANTLR syntax is definitely different from the EBNF described on Wikipedia. Since ANTLR is so popular I'm inclined to say that we simply refer to the notation as "ANTLR".
The "parsing algorithm" seems to conflate two different issues: the input grammar type (e.g. LALR(1), LL(k), PEG) versus the parsing technique (e.g. table-driven, runtime-stack recursive-descent, packrat). In the "Deterministic context-free languages" section, "parsing algorithm" seems to state the input grammar type, but in the "Parsing expression grammars" section, the grammar type is presumably "PEG" so this column specifies the parsing technique instead. Perhaps it's reasonable to have both in a single column, but let's be clear that grammar type and parsing algorithm are two different things.

I request that someone who understands these tables well clarify them by using better column names (e.g. "IDE integration" instead of just "IDE") and by adding explanatory paragraphs or bullet points above or below the table. I'd also suggest merging the "Deterministic context-free languages" and "Parsing expression grammars" sections, since both are used for parsing computer languages. I'm guessing that "General context-free, conjunctive or boolean languages" is more for natural language processing, so it's probably reasonable to have a separate table; in any case, it is non-obvious what the difference between the sections is; descriptive text and links to other Wikipedia articles are needed. Finally, I think that either (A) there should be an extra column for special features of the parser generator, e.g. ANTLR's gated predicates or tree parsing, which go "above and beyond" the capabilities of grammar types like LL(k) in the theoretical literature, or (B) the "grammar type" column could mention the special features (e.g. LL(*) + syntactic/semantic predicates). 66.11.82.73 (talk) 23:18, 19 March 2014 (UTC)[reply]

Perl Parse::RecDescent

Where would Perl's Parse::RecDescent be in those tables? I'd add it but I'm not confident enough in my compilers chops. It is documented at http://search.cpan.org/~jtbraun/Parse-RecDescent-1.967009/lib/Parse/RecDescent.pm

JidGom (talk) 17:32, 4 December 2013 (UTC)[reply]

What is LR(*)?

Can any expert in the field confirm that "LR(*)" is a valid term? It is mentioned as a parsing algorithm for the "Hime Parser Generator". Rasha ROR 1 (talk) 13:20, 26 March 2013 (UTC)[reply]

It's similar to LL(*) or ALL(*). ANTLR 4 does ALL(*) parsing. The (*) means that the parser does look-aheads to make a parsing decision in an ambiguous situation.

LR(*) is superior to LR(1) because sometimes it's humanly impossible or undesirable to make an LR(k) grammar into an LR(1) grammar. At least one parser generator has implemented LR(*) parsing (i.e. LRSTAR). The challenge is to make LR(*) efficient so that it does not impact the performance of the parser. As long as k (the number of look-aheads) is small, LR(*) parsing is efficient. — Preceding unsigned comment added by Paulbmann (talk • contribs) 10:28, May 23, 2019 (UTC)
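
To make the "(*)" idea above concrete, here is a minimal Python sketch of lookahead that grows on demand. It shows only the top-down, LL(*)-style flavour of the idea on a toy declaration grammar. The grammar, the `MODIFIERS` set, and the `predict` function are assumptions made up for illustration, and nothing here describes how LRSTAR, Hime, or ANTLR actually implement their algorithms.

```python
# Toy grammar (hypothetical, for illustration only):
#   decl := modifier* "class" NAME
#         | modifier* "interface" NAME
# As written, a top-down parser cannot pick an alternative with any fixed
# k tokens of lookahead, because the modifier prefix can be arbitrarily long.

MODIFIERS = {"public", "static", "final", "abstract"}


def predict(tokens, start=0):
    """Scan ahead as far as needed; return (alternative, tokens_inspected)."""
    i = start
    while i < len(tokens) and tokens[i] in MODIFIERS:
        i += 1
    inspected = i - start + 1  # the modifiers plus the deciding token
    if i < len(tokens) and tokens[i] == "class":
        return "class_decl", inspected
    if i < len(tokens) and tokens[i] == "interface":
        return "interface_decl", inspected
    raise SyntaxError(f"expected 'class' or 'interface' at token {i}")


print(predict(["class", "A"]))
print(predict(["public", "static", "final", "interface", "B"]))
```

The cost of each decision is proportional to the number of tokens inspected, which fits the point made above that the approach stays efficient as long as the lookahead actually needed is small.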

Merge

Because the article now also contains a list of parser generators for regular languages, I suggest merging it with List of C Sharp lexer generators. --Hyperyl (talk) 22:54, 24 December 2008 (UTC)[reply]

I think the other article is much more objective than this 189.89.156.99 (talk) 21:08, 3 May 2010 (UTC)[reply]

Done--ebraminio^talk 09:23, 18 December 2010 (UTC)[reply]

Old Discussion

This discussion page needs to be cleaned up, the mentioned issues were resolved!--Hyperyl 13:49, 11 November 2007 (UTC)[reply]

Article Content

Let's decide on what information should be in the article: whether a description of the different columns is needed and whether the two charts should be combined.--DevinCook 13:43, 11 November 2007 (UTC)[reply]

Yes, please integrate an improved version of the old descriptions. But keep the current table, it was a lot of work to edit it!--Hyperyl 13:49, 11 November 2007 (UTC)[reply]

I'm sure it took a ton of time to edit it! I do think that the license columns need to state Open Source/Freeware/etc. Only a few programmers know the details of each license on quick inspection. Also, the term "free software", despite its origins in advocacy of ownerless code, is not a good term anymore. Nowadays it just refers to "open source + public domain". I admit that that's not good for the movement, but it's the reality of the art. --DevinCook 13:55, 11 November 2007 (UTC)[reply]

Free Software licenses use copyright to protect the software, thus they are not public domain. Open Source and Free Software requirements are the same. The code is not ownerless; Open Source is just a marketing term. Please inform yourself! Have a look at the GNU philosophy pages!--Hyperyl 14:02, 11 November 2007 (UTC)[reply]

I am very familiar with the terminology. Free software is more a philosophy nowadays than an actual licensing term. The open source community worries about the implementation of source-available software, while the Free Software community is more concerned with ethical issues. Open source has been extremely successful, in particular Linux. To use an old GNU quote: "For the Open Source movement, the issue of whether software should be open source is a practical question, not an ethical one. As one person put it, 'Open source is a development methodology; free software is a social movement.' For the Open Source movement, non-free software is a suboptimal solution. For the Free Software movement, non-free software is a social problem and free software is the solution." Whether one is a Stallman-ist or a Torvalds-ist is the big question. For the most part, Torvalds has prevailed and "free software" is basically seen as an absolute form of "open source". --DevinCook 14:31, 11 November 2007 (UTC)[reply]

Well, I am a Stallman-ist. There is a big difference between us: You write proprietary software and I reject it. So there will be no consensus between us, because you prefer restriction and try to subjugate the users and I prefer freedom and try to protect the users.

But have a look at:

--Hyperyl 15:32, 11 November 2007 (UTC)[reply]

Freeware vs. Free Software

I suggest that we keep the term "freeware" to describe the software available without cost. "Free software" is, for the most part, synonymous with "open source" - which is a logical subset of freeware.

Not all the software listed is open source. --DevinCook 23:59, 11 July 2007 (UTC)[reply]

Thus it is Proprietary Software and should be included in the appropriate section. Open Source is a marketing term with a different intention than Free Software. Open Source focuses on technical aspects and Free Software on philosophical problems and thus represents philosophical values. I think these values are far more important than technical aspects.

Additionally Freeware is gratis Proprietary Software. Thus it is not Free Software (Open Source Software). Thus Free Software programmes cannot be Freeware, because they respect our freedom instead of restricting us.

If a programme is proprietary and generates source code that is licensed under a Free Software license, it should be listed in the Proprietary Software section, because it still restricts us and does not respect our freedom.

Commercial means charging money for the programme or a service related to the programme. Proprietary and Free Software can be commercial, therefore the headline makes no sense.

I reverted your revisions due to the reasons mentioned above. Please inform yourself about the things you are writing about. —Preceding unsigned comment added by Hyperyl (talk • contribs) 21:32, 28 October 2007 (UTC)[reply]

hmm - that sort of remark belongs on slashdot, not in a neutral, informative context. Tedickey 21:52, 28 October 2007 (UTC)[reply]

Best to dispense with the labels and simply cite what license applies. That's neutral, and will only offend the people who want to claim that other people belong to their group Tedickey 21:58, 28 October 2007 (UTC)[reply]

Yes this is a good solution!--Hyperyl 22:11, 28 October 2007 (UTC)[reply]

The parser generators were originally listed in one chart. I broke it into two to allow developers to see which ones were commercial and which were available for free. As the revision stands now, it isn't that helpful for users. GOLD is not commercial. In fact, most of the items listed as "free software" are not what is implied by "free software". One possible solution is to once again combine the two tables into one.--DevinCook 07:46, 29 October 2007 (UTC)[reply]

I made some changes that should provide a good compromise.--DevinCook 07:46, 29 October 2007 (UTC)[reply]

Your argumentation makes no sense: Free Software is software that respects the user's freedom; all the listed programmes do and therefore are Free Software. If you want to filter commercial software (software that is not gratis), you could simply sort by license or add a price field. The section division you did makes no sense at all: you put Proprietary and Free Software into one section. This is unethical. Please merge the list as it has been democratically (based on consensus) decided. I think this is a good compromise (like in nearly all other software lists). Everyone will be happy. Maybe you could put the Proprietary programmes at the end of the list, so they are clearly distinguishable from Free Software. But this is of course optional. I would also be happy with an alphabetically sorted list.--Hyperyl 18:57, 29 October 2007 (UTC)[reply]

Free software is a logical subset of open source software. The section is designed to list software titles that the user can use without cost. Sadly, the terminology that we all use is a bit of a nightmare, and the cause of this debate. "Free software" is free, but not all free software titles are "free software". The main difference between "free software" and "freeware" is whether the source is open. By definition, "freeware" is proprietary. However, the divisions were based on commercial vs. free rather than source code availability.--DevinCook 19:25, 29 October 2007 (UTC)[reply]

Devin, this is my last attempt to explain to you what Free Software is. By definition Free Software and Open Source Software are the same. Their definitions describe the four essential freedoms. Open Source is a marketing term that was coined because business people confused Free Software with Open Source. Thus Free Software is not a subset of Open Source; it is equal to it. You shouldn't divide software into cost categories. This is useless. A gratis copy of a Proprietary programme is still unethical and useless, because you are not free and cannot properly use the software (in terms of usage, education, collaboration and development). Freeware is gratis Proprietary Software, and Free Software is not required to be gratis. Freedom is the difference, and the source code and attributed rights are a requirement for this freedom. You shouldn't define free as gratis. It is free as in freedom, not as in price. Free Software licenses never make any requirements about price. Their purpose is freedom, which is much more valuable than money. If you still think money is more important than freedom, I cannot help you, because you like to be a slave, you like to be subjugated. So please correct your mistakes!--Hyperyl 23:02, 29 October 2007 (UTC)[reply]

By the way - why do you act as if you're an authority on the matter? Tedickey 23:59, 29 October 2007 (UTC)[reply]

Definitions

Q: What is the definition of Grammar/code "Mixed" versus "Separated" here? Does "Separated" mean the grammar is kept apart from the code (with no code inside it), or that it lives in an independent file? 129.69.189.32 20:56, 12 October 2006 (UTC)[reply]

Explanation

This chart needs much more explanation, e.g. adding wiki-links to LALR etc., explaining what "separated" and "mixed" mean (per above), and so on. —The preceding unsigned comment was added by 71.70.210.188 (talk) 00:09, 11 December 2006 (UTC).[reply]

Misleading article title

Hi, I propose to change the title of this article to List of Parser Generators. This article is about parser generators and not about parsers, isn't it? eboy 15:55, 31 August 2007 (UTC)[reply]

Excellent point. Originally, the article was titled "List of Compiler-Compilers"; however, many items on the list are not compiler-compilers, including my project. I do agree that changing the title to "List of Parser Generators", or perhaps "List of Parser Development Software", would be useful. What are everyone's thoughts on the issue? --DevinCook 20:59, 1 September 2007 (UTC)[reply]

Agreed, I like "List of Parser Generators" best. // Sping 20:24, 14 September 2007 (UTC)[reply]

Column with external link

The list contains generators that have a Wikipedia page (ANTLR), generators that have only an external link (ACCENT), and generators that have neither a Wikipedia page nor an external link (Grammatica). My proposal is a new column with an external link; this gives two advantages:

readers can go directly to the generator's home page, irrespective of whether the generator has a Wikipedia page

it can be seen instantly (by link color) whether the generator has a Wikipedia page

--Borneq 08:11, 18 October 2007 (UTC)[reply]

Remove website links

I suggest removing the links to the websites of the parser generators, because they look ugly and duplicate the efforts of the article pages (if present), which already list them.

Instead, I propose linking the names of the parser generators that don't have a Wikipedia article to their websites. —Preceding unsigned comment added by Hyperyl (talk • contribs) 23:01, 24 December 2008 (UTC)[reply]

Sounds like a good idea. Most of the data on this page is from non-notable programs, mostly added by their respective developers Tedickey (talk) 17:59, 15 May 2010 (UTC)[reply]

non-notable Waxeye program

Recent edits have added "Waxeye", which (see google "waxeye parser") appears to be non-notable. Tedickey (talk) 10:27, 30 December 2008 (UTC)[reply]

For context, most of the google hits on "waxeye" itself refer to a bird. The program has a few hundred hits, which is a few orders of magnitude less common than one would expect for a Wikipedia topic. Tedickey (talk) 10:28, 30 December 2008 (UTC)[reply]

Is Yacc++ really LR(k)/LALR(k)?

How sure are you that Yacc++ is LR(k)/LALR(k)? The k seems to imply that it has unbounded lookahead, but as far as I can tell from the website's overview page, Yacc++ is LR(1)/LALR(1). Errantkid (talk) 21:47, 28 January 2009 (UTC)[reply]

I'm changing it. If anyone disagrees, please let us know. Errantkid (talk) 21:09, 11 June 2009 (UTC)[reply]

It actually is LR(k) and there are two ways of achieving that functionality.

First, when doing lookahead calculations the parser can actually consult a non-terminal and not just a terminal symbol, if the non-terminal represents something of unbounded length, the lookahead that it resolves is unbounded length--this is essentially a form of backtracking (and depends upon the fact that our implementation of the parser "stack" is actually more akin to a deque or maybe a pair of deques). This method is somewhat experimental and needs a command line option to enable and we have never found it useful in practice, e.g. we have never found a non-toy grammar that would compile with it turned on that wouldn't compile with normal techniques--it didn't magically resolve all the conflicts in our C++ grammar. I've always meant to write a technical article describing the technique, but my day-jobs have never left such time--no one can make a living writing parser generators.

Second, the user can also apply predicates anywhere within a rule, and again those predicates are regular expressions of non-terminals, which essentially allow the user to give an LR(k) language as the lookahead at that particular point in the grammar. This is essentially a parsing expression grammar (PEG) feature, and works similarly to the mechanism in ANTLR, where we borrowed the idea from, but did it using LR rather than LL techniques, using a variation on the idea of "tunnel grammars" that Josef Grölsch invented for Cocktail. Again, it is essentially a backtracking method, but in this case the user specifies exactly where it should occur in the parsing process, so you can bound its impact, as all backtracking techniques often defeat the linear promise of LR (and LL) parsing. Still, this is practical and someone, Boris Burshtien (sp?) or perhaps Quinn Tyler-Jackson, once showed how one could use the technique to emulate a Turing Machine, so it is also completely general.

93.40.177.50 (talk) 10:12, 1 January 2017 (UTC) Christopher F Clark[reply]

LL(1) = Recursive Descent

I think it would be the best to replace all "LL(1)"s and "Recursive Descent"s with LL(1). --Chricho (talk) 20:22, 9 March 2010 (UTC)[reply]
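
As background for this suggestion, the sketch below shows a hand-written recursive-descent parser in Python that chooses every alternative from a single token of lookahead, which is what makes it a predictive LL(1) parser. The grammar, the `Parser` class, and the `evaluate` helper are illustrative assumptions, not taken from any tool in the table. Recursive descent with backtracking can accept more than LL(1), so the two labels coincide only in the predictive case.

```python
import re

# Tokens: integers, the four operators, and parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    tokens.append("$")  # end-of-input marker
    return tokens


class Parser:
    """One method per nonterminal; each decision looks at exactly one token."""

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):  # factor := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(src):
    parser = Parser(src)
    value = parser.expr()
    if parser.peek() != "$":
        raise SyntaxError("trailing input")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Every `while` and `if` above branches on `peek()` or on the single token just consumed, so the parser never needs to back up.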

GLR is for CFGs

Should CFG be replaced with “deterministic CFG”? --Chricho (talk) 18:47, 3 December 2010 (UTC) However, there is an Earley-parser generator which can obviously handle non-deterministic CFGs, so I think it should be put together with the GLR generators. I think we can ignore the fact that backtracking-parsers can also handle non-deterministic/ambiguous grammars. Is there any special reason for the separation of Packrat-parser-generators? The grammars they can handle are a superset of LL(1), but not a superset of LR(1) or LL(k), so I do not think they play in another league. --Chricho (talk) 18:54, 3 December 2010 (UTC) Changed… --Chricho (talk) 19:21, 3 December 2010 (UTC)[reply]

What evidence do you have that Packrat-parser generators are a subset of LR(1)? Backtracking parsers (using predicates) have been shown to emulate Turing Machines, so I would assume that PEGs could do the same.

93.40.177.50 (talk) 10:33, 1 January 2017 (UTC) Christopher F Clark[reply]

Redlink information

Redlink cleanup removed the following information contained in the link name or non-existent article title. It might be reintegrated: for example, the fact that Narwhal is a parser generator, that Packrat is a library, and what GDK stands for.

Acronyms

Grammar Deployment Kit|GDK
LALR Parser Generator|LPG
Lexical Analyzer and Parser Generator|Lapg

Libraries

Parsing Expression Grammar Template Library|PEGTL
Parsnip Parser Library|Parsnip
Frisby (library)|Frisby
Packrat (library)|Packrat

Lexers

Turbo Pascal Lex/Yacc|TP Yacc
Alex (lexer generator)|Alex
Dolphin (lexer generator)|Dolphin
DFA (lexer generator)|LRSTAR

Parsers

Msta parser|Msta
Laja (parser and code generator)|Laja
APG (parser generator)|APG
Aurochs (parser generator)|Aurochs
Austenx (parser generator)|AustenX
AustenX (parser generator)|AustenX
Beaver (parser generator)|Beaver
CSP (parser generator)|CSP
CUP (parser generator)|CUP
Dragon (parser generator)|Dragon
Dypgen (parser generator)|Dypgen
eli (parser generator)|eli
Elkhound (parser generator)|Elkhound
jay (parser generator)|jay
Lime (parser generator)|Lime
LISA (parser generator)|LISA
Menhir (parser generator)|Menhir
Monkey (parser generator)|Monkey
Narwhal (parser generator)|Narwhal
peg (parser generator)|peg
SLK (parser generator)|SLK
SPARK (parser generator)|SPARK
Styx (parser generator)|Styx
Tap (parser generator)|Tap
Tom (parser generator)|Tom
Waxeye (parser generator)|Waxeye
Whale (parser generator)|Whale
Wisent (parser generator)|Wisent
Yard (parser generator)|Yard
Happy (Parser)|Happy
Parsec (parser)|Parsec
Sprache (parser)|Sprache

Programming Language and software

Katadhin (programming language)|Katahdin
Essence (software)|Essence
Treetop (software)|Treetop

— Cpiral Cpiral 16:51, 8 September 2013 (UTC)[reply]

That's problematic: most of the content for this topic lists nonnotable programs of little interest to anyone except those who added the external links (likely their developers). TEDickey (talk) 17:24, 8 September 2013 (UTC)[reply]

There are some parser generators I would consider important in your list.

LALR (and its descendant LRSTAR) by Paul Mann was one of the most popular parser generators on PCs in the 1980s and 90s.

ELI was (is?) a very extensive compiler-compiler system developed by a consortium of universities, including the University of Colorado. It is one of the few systems that I know of that attempt to have notations that solve other parts of the compiler construction problem beyond lexing and parsing.

Elkhound, Menhir, Happy, and Parsec also had (have?) adherents beyond their initial author if I recall correctly.

93.40.177.50 (talk) 11:15, 1 January 2017 (UTC) Christopher F Clark[reply]

PEG.js is not packrat

Packrat has O(N) execution time, but with PEG.js you can easily verify O(N^2) execution time with the grammar:

S = ( X / . )*
X = "a" [ab]* "b"

Running this with cache enabled with a long string of "a" characters will give O(N^2) execution time. There is a misconception that any recursive descent parser with memoization is a packrat parser. That's not the case: a packrat parser achieves O(N) execution time by memoizing the result of every subexpression at every character position, including recursively unrolling repetition operators. PEG.js merely caches the result of the entire rule X.

Probably a lot of the parsers labelled "packrat" in this table are not really packrat, but I will just change the PEG.js one for now. I will call it a recursive descent parser, which is a reasonable approximation; it's also the term the author uses [1] -- Tim Starling (talk) 00:53, 4 June 2015 (UTC)[reply]
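
The quadratic behaviour described above can be reproduced without PEG.js. The Python sketch below hand-codes the two rules and counts character inspections on a string of "a" characters. The names `steps_rule_cache` and `steps_packrat` are hypothetical, and the second function is only a stand-in for what memoizing the unrolled repetition at every position would achieve, not a real packrat implementation.

```python
def steps_rule_cache(text):
    """Parse S = ( X / . )*, X = "a" [ab]* "b", caching only whole-rule results of X."""
    n, steps, cache = len(text), 0, {}

    def match_x(pos):
        nonlocal steps
        if pos in cache:
            return cache[pos]
        end = None
        steps += 1                              # test for the leading "a"
        if pos < n and text[pos] == "a":
            i = pos + 1
            while i < n and text[i] in "ab":    # greedy [ab]*, never backtracks
                steps += 1
                i += 1
            if i < n and text[i] == "b":        # cannot succeed: [ab]* ate every "b"
                end = i + 1
        cache[pos] = end
        return end

    pos = 0
    while pos < n:
        end = match_x(pos)
        pos = end if end is not None else pos + 1   # X failed, fall back to "."
    return steps


def steps_packrat(text):
    """Same grammar, but also remember where [ab]* stops from every position."""
    n, steps, rep_end = len(text), 0, {}

    def match_rep(pos):
        nonlocal steps
        walked, i = [], pos
        while i not in rep_end and i < n and text[i] in "ab":
            steps += 1
            walked.append(i)
            i += 1
        end = rep_end.get(i, i)
        for j in walked:                        # memoize every position visited
            rep_end[j] = end
        rep_end.setdefault(pos, end)
        return end

    pos = 0
    while pos < n:
        steps += 1                              # test for the leading "a"
        end = None
        if text[pos] == "a":
            i = match_rep(pos + 1)
            if i < n and text[i] == "b":
                end = i + 1
        pos = end if end is not None else pos + 1
    return steps


for n in (100, 200, 400):
    text = "a" * n
    print(n, steps_rule_cache(text), steps_packrat(text))
```

Doubling the input length roughly quadruples the first count and roughly doubles the second. The rule-level cache is never hit, because X is attempted only once at each position.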

OMeta is not packrat

Along the same lines as what Tim Starling writes above, I have confirmed that OMeta is also not fully packrat. Cf. section 2.2.3, "A Note on Memoization", on page 41 of http://www.vpri.org/pdf/tr2008003_experimenting.pdf . It is currently labeled in the table as "Packrat (modified)". I propose that we standardize on a parenthetical, such as "(selective)", which is also used to describe several other parser generators, to describe these common variants on packrat parsing, rather than fall all the way back to the generic "Recursive descent". C. Scott Ananian (talk) 17:28, 18 June 2015 (UTC)[reply]

do they all lack a GUI (Graphical User Interface) ?

I think that modern parser generators should include a GUI; it would then be much easier to understand the parsing (and the parser generation) algorithms, and thus to create better parsers more easily.

So the big question is : do they ALL lack a (good, intuitive, helping) GUI ???

78.227.78.135 (talk) 21:54, 9 November 2015 (UTC)[reply]

for example: https://github.com/akshayhebbarys/lalr-parser the GUI is quite bad... they have only 4 examples, none of which is simply an arithmetic expression parser... we are in 2015, and the tools have barely evolved since 1990 (bison)! 78.227.78.135 (talk) 23:54, 9 November 2015 (UTC)[reply]

Most parser generators lack a "good" (or any) GUI. However, I have it from normally reliable sources that the one with Visual Parse++ was good. Unfortunately, that parser generator (and its GUI) is no longer available. I've been in contact with the author, but was never able to revive the product. The cost of creating (and maintaining) a parser generator, beyond creating one of "student" quality, is simply commercially prohibitive and the market vanishingly small.

93.40.177.50 (talk) 10:28, 1 January 2017 (UTC) Christopher F Clark[reply]

SLK parser generator's license is not MIT-like

The MIT license explicitly grants broad rights to use and redistribute the software:

   Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to *use, copy, modify, merge, publish, distribute, sublicense, and/or sell* copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

This broad grant of rights is what it means to most people for a license to be "MIT-like".

The SLK license doesn't do this:

   The License grants you the right to use the Product for the purposes of evaluating, designing, developing, and testing software products, provided that you agree to indemnify, hold harmless, and defend the supplier of the Product from and against any claims or lawsuits, including attorney's fees, that arise or result from the use or distribution of your software product.

I'm not sure what the authors of the SLK license really intended, but on the face of it this license appears to be far more restrictive than the MIT license. My (non-lawyer) interpretation of this license is that it probably intends to allow you to download the software and use it as-is but not to modify it or redistribute it. Thus, I think calling the SLK license "MIT-like" is misleading, and I'm removing that statement from the chart.

I'll note that the comparison with the MIT license doesn't seem to come from the SLK project. I can't find any statements about their license on their website aside from the text of the license itself. I found at least one other page calling it "MIT-like" but they may have been using Wikipedia as their source. — Preceding unsigned comment added by Nculwell (talk • contribs) 16:54, 27 March 2019 (UTC)[reply]

Notability review

The two tables in this article are very indiscriminate, largely consisting of non-notable projects which have seen very little adoption, if any. I've begun a review of the table for "Deterministic context-free languages". If anyone would like to assist, please continue working through this table, adding citations to reliable secondary sources, and deleting rows if none can be found. Entries don't necessarily need to meet the General Notability Guidelines and be worth standalone articles, but they should have at least one reliable secondary source describing them (and not in passing). StereoFolic (talk) 18:22, 16 September 2023 (UTC)[reply]

Is Carburetta Notable?

Just had my edit trying to fix things undone, but the current state is still inconsistent. Carburetta is a C/C++ scanner *and* parser generator (so both), ranks first for "C scanner generator", ranks 4th for "C parser generator" (above bison and lemon, but below "PackCC"), and similar, but lower, for C++ and so on.

Now I understand (grateful) that the temporal whims of google search rank might not be the best measure of notability, but it is a work of substance, manual and everything, which the edit linked, and this pertains to a single row in a table, not an independent article (which it might warrant at some point in the future, but not now.)

Can someone clarify how "Notability" is measured on a table row in this sense? Is it subjective or objective, and if objective, what are the criteria? And why does it apply to the one table, but not the other (where Carburetta still appears)? It's good as a scanner generator, but it's as a parser generator where it shines.

Also, reading the discussion, might I suggest simplifying the article to two tables? "scanners" and "parsers" (or lexers and parsers?) Aside from a few research projects, I think that's where most practitioners would draw the biggest line in the sand? (Arguably scanners perhaps even don't belong in a parser list.)

Am new here, learning as I go, so feel free to point things out to me, even things that should be obvious to most. Bravingtheabyss (talk) 21:24, 17 January 2024 (UTC)[reply]

WP:NOTABILITY and WP:RS are a good place to start reading TEDickey (talk) 00:00, 18 January 2024 (UTC)[reply]

Hi Bravingtheabyss, welcome to Wikipedia! The guidelines on notability for entries within list-like articles like this are a bit ambiguous. The General Notability Guidelines apply to article topics, not article contents, but are definitely a good read for understanding the general approaches to notability. The guidelines at WP:Stand-alone lists are also relevant here.

I have been conducting a (slow) notability review on this article for a bit now (see thread above), as I noticed that a decent majority of the entries are pretty clearly not notable. The rule of thumb I've been using is: "Can I find any third-party Reliable Source discussing this software in more than passing?" I haven't been able to find anything like that for Carburetta, so I just removed it from the other list too. I don't think there was any particular reason it was included in the one list but not the other, the distinctions are often pretty unclear and it's one of the many problems with this article. In any case, search rankings are definitely not a useful metric for notability, especially with services like Google that customize results for each user!

Please let us know if this isn't clear, I'm happy to answer any other questions! StereoFolic (talk) 02:19, 18 January 2024 (UTC)[reply]

@StereoFolic, @Tedickey thanks for taking the time to reply at length. Let me just say that everything you say makes sense to me. I worry, however, that this might bias the lists to works from academia, rather than "works from industry" if I may dub it that. Some independent mention might include its NetBSD package https://pkgsrc.se/wip/carburetta as a reliable second source, and some "mentions in passing" here: https://stackoverflow.com/questions/12768411/yyparse-not-declared-in-bison-flex-c-project-only-for-some-version-of-gcc-bi and here: https://stackoverflow.com/questions/76133442/parsing-a-wrapped-csv-file-with-only-one-pass-over-line-string

In summary, I think there are three issues, 1) is wikipedia better for it if it is included? 2) should it have the reference to its website?, and 3) should the article apply two different standards depending on the list we're discussing?

1) what's best for wikipedia? presumably the goal of the list is to inform researchers and/or engineers of prior work, and users of options. I feel the list would be better if substantial open source options are included.

2) reference to where? with the aim of the first, perhaps if not as a reference, the table should consider an additional column directing readers where to go. The best resource for this, I find, is the manual; for this is to inform, not persuade, of any claims.

3) carburetta, because of its ability to generate both scanner and parser independently and fused, is mentioned on this page in not one, but *two* lists. The page should be consistent.

With this in mind, and with the wellness of wikipedia as priority, I feel we should consider to:

A) Overturn the reversion with the above line of reasoning and the best interests in mind, or failing that,

B) Include Carburetta in both lists, but without the explicit reference, or with the reference set to its manual (as surrogate for a paper link for instance) or failing that,

C) Remove Carburetta from both lists (it doesn't make sense to consider it for its scanner generating prowess in isolation without considering it for its parser generating prowess,) to get the consistency of applied standards.

In any case, as I hope you'll agree, the current state of the article is not correct. The conflict of my bias and desire to be a good citizen means I'll leave it for you to decide. I'd be comfortable, albeit personally very disappointed, with option C). Bravingtheabyss (talk) 08:19, 18 January 2024 (UTC)[reply]

Requiring at least one third-party reliable source discussing a program could indeed result in some biases, but the alternative is including an overwhelming number of programs with dubious noteworthiness.

The detriment to Wikipedia by over-inclusion is that articles can become over-long, difficult to navigate, and impossible to maintain accurately. The indiscriminate length of this article is a key reason why it has so many inaccuracies. Wikipedia is not an archive or a substitute for a search engine, so we don't need to include everything under the sun. See also WP:INDISCRIMINATE. Major parser generators that are actively used in industry have plenty of reliable sources discussing them. Reliable Sources need not be academic - they can be from any established subject matter expert. Primary sources are also fine to include for basic factual information, but not notability.

Stackoverflow posts and package distro listings are not considered reliable sources because both are very easy for anyone to create, and there are clearly many published packages that are not notable. NPM alone has well over a million packages!

Regarding the inclusion of Carburetta, I already removed it completely from the article (your option C), on the basis described above. StereoFolic (talk) 17:26, 18 January 2024 (UTC)[reply]