[Desktop_printing] CUPS code quality ...

Thu Feb 2 11:41:25 PST 2006

michael meeks wrote:
> Hi Michael,
> 
> 	Good name incidentally :-)
> 
> On Thu, 2006-02-02 at 08:35 -0500, Michael Sweet wrote:
>> I'll address these points in reverse:
> 
> 	Well, it seems you're at beginning to consider some of these issues in
> 1.2 that's great.
> 
>>      1. Thread safety; CUPS 1.2 is thread-safe, and threading support
>>         is enabled by default.  Previous releases were thread safe if
>>         you stayed away from the convenience functions that used shared
>>         global data.
> 
> 	Ok - well, I guess thread local storage is one way around returning
> const static internal buffers - although the gotcha for the unwary still
> remains. Clearly having a unified, standard calling convention for
> methods helps substantially wrt. learning the API, not leaking/crashing
> and auto-generated scripting bindings to name a few.

OK, I guess I'm having trouble understanding your point.  Are you OK
or not OK with having some convenience functions that use thread
local storage?

>>      2. Synchronous; only if you use the convenience APIs - we use the
>>         CUPS API as the basis for cupsd, and all of the HTTP and IPP
>>         APIs operate as FSMs so that we can handle multiple connections
>>         asynchronously.  Other apps can do this or use threads as they
>>         like.
> 
> 	So - can you write me a short code-fragment, with error checking,
> partial write support etc. that would asynchronously fetch a PPD ? (no
> commenting necessary ;-) [ also some idea of how to integrate that into
> a custom mainloop - just need a socket to poll on etc. ] [ I'm assuming
> here there is an API function to handle a partial read in response to a
> POLLIN|POLLPRI input, and O_NONBLOCK socket handling ? ]

OK, short and asynchronous do not go together.  In particular, the
specific implementation of read/write callbacks and registering file
descriptors for your toolkit of choice will have a dramatic effect
on things...

Anyways, the basic thing to do is setup a non-blocking HTTP connection:

     http_t *http = httpConnectEncrypt(cupsServer(), ippPort(),
                                       cupsEncryption());

     httpSetBlocking(http, 0);

Note that this initial connection is synchronous (there is no
async connect API at this time).  Once you are connected, you need to
do the httpPost(), and then put everything else from
cupsDoFileRequest() in your read/write callbacks.

Once you have the complete IPP response from the server, you can
notify your code that the work is complete and you can do what you
need to do with the data.

HTTP GET requests work similarly, but are much simpler; basically,
do:

     httpGet(http, resource);

and then setup a read callback which calls httpUpdate(http) until
a status is returned (anything but HTTP_CONTINUE or HTTP_ERROR)
and then start reading data with httpRead() until you get 0 bytes.

>>      2a. Variable time; them's the breaks, some people have fast
>>          systems, some slow, some people use local servers, some
>>          remote, some people have 1 printer, some have 10000.
> 
> 	Sure - so clearly synchronous APIs are a disaster waiting to happen -
> particularly without threading support; but of course this is now fixed
> in 1.2 :-) [ which I don't seem to have on my bleeding-edge system: odd,
> so if you don't mind I'll refer below to what I see in 1.1 ].

Right, 1.2 is just going into beta (next week!)

>>      4. Badly designed; also a matter of opinion, although I heartily
>>         agree that certain parts of the API are showing their age and
>>         could use a redesign.  This, unfortunately, is a challenge
>>         with the parallel goal of maintaining backwards compatibility
>>         in the 1.x series, but we welcome feedback in improving the
>>         existing APIs and adding new ones that match developer usage.
> 
> 	Of course back-compat (particularly with this legacy) is a pain; but -
> then the problems appear to be legion - an entirely new API (perhaps
> built on top of CUPS initially) would not have this problem of course.

Again, *what* are the problems you see?  Can you describe how you
want your new API to look like?

>> You *do* know that you can do a CUPS_GET_PRINTERS request with a
>> non-blocking HTTP connection to collect the data as you go along,
>> right?  And use requested-attributes to reduce the amount of data
>> that is processed?
> 
> 	No - I have no idea; it's not clear from the headers, it's not clear to
> me how to do it. When I read the code in days of yore & the IPP spec. it
> was not clear to me how to construct arbitrary
> strings/URIs/etc./whatever & use the raw CUPS http code to achieve
> anything.

How about the developer documentation, or even the CUPS book?

>> Comments like this are not useful.  Simple documentation = easier to
>> follow code (and clearly you are able to follow the CUPS code, even if
>> you don't like the design...)
> 
> 	I spend my life reading code of often very mixed quality - some of the
> worst is unfortunately my own :-) perhaps you'd accept a patch to CUPS
> from me ? ;-)

*If* the patch makes sense, sure!  You can submit them via the STR
page at:

     http://www.cups.org/str.php

(please don't email patches to me directly; as I literally handle
thousands of emails per day, any patches you send direct *will* get
lost)

>> We use static and dynamic buffers when they make sense.
>> Sockets are a requirement for using IPP over HTTP.  
> 
> 	When does a static buffer make sense ?

When the data you are representing has a maximum size, when you
have a predefined buffer size that allows for significant optimization
by you and the compiler, when the data is only used once and thrown
away, etc.

> seriously; this is my point -
> network traffic is slow, IPC is slow - strdup() is blindingly, amazingly
> fast.

Um, actually strdup() is *not* "blindingly, amazingly fast", and we
have gone so far as to add a super-simple string spool API (private)
in CUPS to avoid using it whenever possible.  In addition, embedded
developers (I used to be one in a former life) do not like dynamic
allocation because it can fail at unexpected times...

In any case, what does static/dynamic buffer usage have to do with
IPC speed?

> My point is - why use static buffers all over the place (which may
> give some marginal speed win) when they also tend to have well known
> security issues and of course introduce arbitrary fixed limits all over
> the show: thus either wasting memory unnecessarily or limiting
> longer-string corner cases.

Buffer overflows are not restricted to static buffers - you get
similar problems with dynamic buffers...

AFAIK, none of the fixed sizes we are using are unusual or
limiting, and in fact many of them come from the HTTP and IPP
specs that we work with...

Again, SPECIFIC examples would be most helpful/constructive right
now.

>> It is synchronous by design, but nothing prevents you from using the
>> HTTP and IPP APIs to do this asynchronously - you'll just need to
>> do the IPP queries and then the HTTP GET yourself.
> 
> 	So again; when I tried to do this some year ago - it was extremely
> unclear how to do that; but perhaps I was just missing something simple.

We don't provide examples of this mainly because it is not for the
faint of heart - you need to really understand FSM programming, and
it is typically easier to just use threading.

> ...
> 	So - lets take another example:
> 
> typedef struct
> {
>   int			fd;		/* File descriptor for this socket */
>   ...
>   char			fields[HTTP_FIELD_MAX][HTTP_MAX_VALUE];
>   ...
>   /**** New in CUPS 1.1.19 ****/
>   fd_set		*input_set;	/* select() set for httpWait() */
>   http_status_t		expect;		/* Expect: header */
>   char			*cookie;	/* Cookie value(s) */
>   /**** New in CUPS 1.1.20 ****/
>   char			authstring[HTTP_MAX_VALUE],
> 					/* Current Authentication value */
> 			userpass[HTTP_MAX_VALUE];
> 					/* Username:password string */
>   int			digest_tries;	/* Number of tries for digest auth */
> } http_t;
> 
> 	So several things to say here: exposing all this lot:

This is an example of the older APIs which expose the internal
implementation, and if you look at the newer APIs we are no longer
doing this... :)

Also, the CUPS 1.2 documentation will *not* be exposing this
(otherwise private) structure, and we are introducing accessors
to wean people away from directly accessing the private data.

> 	*  gives you no guarantees of encapsulation, anything can 
> 	   screw with any of the fields at any time in any thread,
> 	   in uncontrolled ways; resulting in inconsistent state.

OK, first of all a HTTP connection can only be used by a single
thread.  You can't share them between threads.

> 	*  gives you expansion problems: yes, I'm sure you're not
> 	   expecting people to stack/heap allocate this structure 
> 	   themselves (when compiling with some old headers) - but
> 	   someone, somewhere is going to do this if they can.

The ONLY supported APIs for creating a HTTP connection are
httpConnect() and httpConnectEncrypt().  The documentation is
(AFAIK) pretty clear on this, and there is no way to initialize
a http_t structure any other way.  If anyone chooses to hang
themselves with custom http_t initialization code, then they
deserve all of the problems they get...

Anyways, I otherwise agree with your arguments, which is why
the newer APIs all hide their structures.

> 	*  gives you hard-to-detect versioning problems; if I
> 	   compile vs. 1.1.20 and run with 1.1.19 - and try to
> 	   access 'authstring' which was there in the header when
> 	   I compiled: what happens ?
> 		* the alternative being a linker error when an 
> 		  non-available accessor is called; or a named / 
> 		  enumerated property is not available.

Presumably you will not be accessing the private data in this
structure - in 1.1 and earlier we provided macros to access
private data in the http_t structure, trusting that developers
would not abuse the system.

In 1.2 those are (slowly) being replaced by accessor functions
so the existing code will still compile and code that tries to
fiddle with http_t directly will not.

> 	A far better way is to have an opaque structure, and either
> named/enumerated properties - or individual accessors. If you're
> particularly 31337 you can not dup strings and return 'const char*'
> pointers to the internal representation, thus saving a handful of cycles
> and exposing yourself to a different set of problems later :-)

Not sure what this last bit means - an accessor will return the
value from the object, no duplication is ever required unless the
caller wants to fiddle with it (and even then const doesn't save
you...)

> 	Things like:
> 
>   char			fields[HTTP_FIELD_MAX][HTTP_MAX_VALUE];
> 
> 	are particularly odd; HTTP_FIELD_MAX is 28 for me ( the end of an
> enumeration ) of course - it's not really possible to extent that
> enumeration without breaking the entire http_t layout ;-) but that aside
> we have:

This was an implementation detail to minimize memory allocations;
today we'd do things differently using our array and string pool
APIs, but regardless http_t is meant for talking to CUPS, nothing
more, nothing less.

> #  define HTTP_MAX_URI		1024	/* Max length of URI string */
> #  define HTTP_MAX_HOST		256	/* Max length of hostname string */
> #  define HTTP_MAX_BUFFER	2048	/* Max length of data buffer */
> #  define HTTP_MAX_VALUE	256	/* Max header field value length */
> 
> 	So fields consumes 7kb, and other bits perhaps another 2 1/2k or so,
> size-wise not important, but most of that will be wasted most of the
> time.

Perhaps, but you avoid memory fragmentation which is a serious
problem in some environments, and memory is not always cheap.

>>> 	Is this symptomatic of a wider problem ? no idea. All I know is my
>>> experience of using the client code & trying to make it do what is
>>> necessary has been poor in the extreme: of course, it's entirely
>>> possible that is just down to my ignorance.
>> In my experience, too many developers jump to conclusions about
>> software before asking the people who wrote the software...
> 
> 	Well - fine; what is your ABI compatibility strategy here ? why is

The strategy is to move to more encapsulation while remaining
binary and source compatible with previous releases.  Trust the
developers using the APIs less, and discourage direct use of
private data whenever possible. Minimize memory usage and
allocations whenever possible.

You can also read the Configuration Management Plan document,
which provides a lot of our (evolving) guidelines...

> there no encapsulation/data-hiding ? why are there all these arbitrary
> limits around the place ? can a FIELD really not contain a URI ? ( field
> max 256chars, URI max 1024 chars ).

For the purposes of CUPS, the 255 character value limit (255 + nul =
256) is OK.  Again, the CUPS HTTP API is not meant as a general-purpose
API for any HTTP usage, just for use with CUPS.  I know that some
people use it as a general purpose API sometimes, however there are
better libraries available today (curl, neon, serf, w3c's stuff) to
do this...

(and yes, we've thought about doing a new API based on one of those
HTTP toolkits, but given our ABI compatibility requirements that
can't happen until CUPS 2.0)

> 	Of course perhaps my critique is unfair - I've only had to use the API
> in quite a minor way; but the above are my questions from reading the
> http.h header; from a quick scan similar problems are elsewhere too.
> 
>> There is already an API like this: PAPI.  However, you may be
>> disappointed to discover that PAPI's APIs are all synchronous, with no
>> possibility of asynchronous operation.  PAPI also suffers from the
>> "least common denominator" syndrome, but that will likely improve over
>> time...
> 
> 	Interesting; of course - as you see I'm almost totally ignorant of
> printing as a domain :-) perhaps PAPI is what I'm looking for. ie. if I
> was going to bless an API/ABI to have to support for the next 20 years &
> remain compatible with - I would not be picking CUPS in it's current
> state. Of course as you say, that's a personal opinion, and perhaps I'm
> deluded & you'll persuade me otherwise :-) I look forward to that
> process.

With all due respect (and I don't want this to sound like a flame),
your comments and opinions will get a lot more mileage if you become
less ignorant about printing and put together some specific ideas for
how you would like to see applications and printing work together.

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com