Name>Struct is designed to be as complete, accurate, and fast as possible so that it can be used with confidence to interpret one name or a million.
"Yes" and "yes".
...however, this isn't really a good question for one very practical reason: many -- and perhaps most -- chemical names in actual use violate the nomenclature rules published by those organizations. IUPAC names and CAS names represent only a small fraction of the chemical names that are actually being used. Name>Struct does interpret IUPAC and CAS names, but it also recognizes many types of nomenclature usage (and misusage) that are discouraged or even forbidden by the published rules.
Name>Struct recognizes and correctly interprets:
...however, this also isn't really a good question. Published nomenclature recommendations are designed to be all-encompassing: IUPAC has as many rules describing the naming of amine oxides as it does for the naming of alcohols, even though the latter are vastly more common in practical use. The recommendations not supported by Name>Struct are, without exception, obscure.
Basically, "all of them". IUPAC offers this list of general nomenclature procedures:
Name>Struct handles all of the above procedures more or less completely. Name>Struct may still fail to interpret a given name, but when it does, the cause is an unrecognized name fragment (as in "3-unknownyl-2-chloro-propanol") rather than a failure to understand the principles of, say, a substitutive name.
The only general procedure that Name>Struct fails to support completely is subtractive nomenclature.
Accuracy is an extremely important question, especially when batch-converting thousands (or hundreds of thousands) of names. It's very important that you are able to trust the output of any algorithm designed to run without supervision, and with Name>Struct, you can. In our extensive testing of many databases, including our own ChemFinder/Webserver and ChemACX, as well as many user-provided databases, we have found that the structures produced by Name>Struct are
It would be nice if we could claim to be 100% accurate, but that's never going to be realistic. The last percent includes a lot of names that are ambiguous in a variety of ways. Name>Struct is designed to interpret names in the most common and reasonable way possible. That's usually the appropriate thing to do, but if someone intends to use a name in an unusual or unreasonable way, the structure generated by Name>Struct won't match the structure that was intended (although it likely will match a name that could have been intended). Rather than arguing about the correct behavior in these cases, we simply do not claim more than 99% accuracy.
We have looked at many different collections of chemical names from varied sources including chemical vendor catalogs, reference works, and published literature. No matter what source we examine, we have found that Name>Struct consistently interprets
In rare cases we have seen Name>Struct interpret as much as 95% or as little as 35% of a given source, but there were good explanations for those outliers. In most circumstances, you should expect a figure in the 70-90% range. The remaining 10-30% of names generally correspond to known limitations of Name>Struct, and most of those limitations are insurmountable. Even with an unlimited amount of effort, it will never be possible to generate structures for anywhere close to 100% of the names currently in use.
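To make the 70-90% range concrete, here is a minimal sketch of the arithmetic for a batch of names. The function name and the example batch size are hypothetical and are not part of Name>Struct. Only the 70% and 90% bounds come from the text above.

```python
def expected_interpreted(total_names, low_pct=70, high_pct=90):
    """Return the (low, high) number of names expected to yield a structure.

    The default bounds are the typical 70-90% range quoted above; outlier
    sources (as low as 35%, as high as 95%) fall outside this estimate.
    """
    low = total_names * low_pct // 100
    high = total_names * high_pct // 100
    return low, high


# Hypothetical batch of 250,000 names.
low, high = expected_interpreted(250_000)
print(f"Expect between {low} and {high} structures")
```

For a source known to be unusual, pass different bounds instead of relying on the defaults.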
It's possibly worth pointing out that Name>Struct can interpret literally an infinite number of names, but that's not a very useful observation, since it depends on the creation of trivially boring infinite series of related names:
By the same argument, it becomes obvious that Name>Struct will also fail to interpret an infinite number of names. Suppose that "foobaranol" represents some structure that Name>Struct cannot interpret. Accordingly, it will fail to generate structures for all of the following names as well:
So it's not really useful to ask "how many", but "what fraction of" is an excellent question with a good answer.
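The "infinite series" argument can be sketched with a generator that never runs out of names. The naming pattern below is purely illustrative and assumed for this example. It is not taken from Name>Struct, and it makes no claim to produce chemically sensible names.

```python
from itertools import count, islice


def derived_names(base):
    """Yield an unbounded series of names built on a single base name.

    Illustrative pattern only: each name adds a numbered chloro prefix.
    """
    for locant in count(1):
        yield f"{locant}-chloro-{base}"


# Take only the first few names; the series itself never ends.
for name in islice(derived_names("foobaranol"), 3):
    print(name)
```

Whether the base name can be interpreted or not, the series is unbounded, which is why a raw count of names says nothing useful while a fraction of a real-world source does.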
In short, very. Specifically, the batch version of Name>Struct can process roughly:
Of course, the actual speed will depend on the processor speed of the machine that is used to run the software, but that is a realistic speed for most modern machines (i.e. most machines produced in the last few years, or with at least a 2.5 GHz / Pentium 4 processor). That works out to less than 2 milliseconds per name, on average. In other words, it is possible to convert over a million names in the span of an hour. We know of only a few data sources with more than a million names, including the CAS and Beilstein databases themselves.
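The timing arithmetic can be checked with a short sketch. It treats the "less than 2 milliseconds per name" figure as a conservative upper bound. The function names are hypothetical, and real throughput depends on the hardware.

```python
def batch_seconds(name_count, ms_per_name=2.0):
    """Estimated upper bound on run time, in seconds, for a batch of names."""
    return name_count * ms_per_name / 1000.0


def names_per_hour(ms_per_name=2.0):
    """Estimated lower bound on the number of names converted in one hour."""
    return int(3600 * 1000 / ms_per_name)


print(batch_seconds(1_000_000))  # 2000.0 seconds, roughly 33 minutes
print(names_per_hour())          # 1800000
```

At 2 ms per name, a million names take about 33 minutes, which is consistent with converting over a million names in an hour.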