[Desktop_printing] CUPS code quality ...

Thu Feb 2 09:24:26 PST 2006

Hi Michael,

	Good name incidentally :-)

On Thu, 2006-02-02 at 08:35 -0500, Michael Sweet wrote:
> I'll address these points in reverse:

	Well, it seems you're at beginning to consider some of these issues in
1.2 that's great.

>      1. Thread safety; CUPS 1.2 is thread-safe, and threading support
>         is enabled by default.  Previous releases were thread safe if
>         you stayed away from the convenience functions that used shared
>         global data.

	Ok - well, I guess thread local storage is one way around returning
const static internal buffers - although the gotcha for the unwary still
remains. Clearly having a unified, standard calling convention for
methods helps substantially wrt. learning the API, not leaking/crashing
and auto-generated scripting bindings to name a few.

>      2. Synchronous; only if you use the convenience APIs - we use the
>         CUPS API as the basis for cupsd, and all of the HTTP and IPP
>         APIs operate as FSMs so that we can handle multiple connections
>         asynchronously.  Other apps can do this or use threads as they
>         like.

	So - can you write me a short code-fragment, with error checking,
partial write support etc. that would asynchronously fetch a PPD ? (no
commenting necessary ;-) [ also some idea of how to integrate that into
a custom mainloop - just need a socket to poll on etc. ] [ I'm assuming
here there is an API function to handle a partial read in response to a
POLLIN|POLLPRI input, and O_NONBLOCK socket handling ? ]

>      2a. Variable time; them's the breaks, some people have fast
>          systems, some slow, some people use local servers, some
>          remote, some people have 1 printer, some have 10000.

	Sure - so clearly synchronous APIs are a disaster waiting to happen -
particularly without threading support; but of course this is now fixed
in 1.2 :-) [ which I don't seem to have on my bleeding-edge system: odd,
so if you don't mind I'll refer below to what I see in 1.1 ].

>      4. Badly designed; also a matter of opinion, although I heartily
>         agree that certain parts of the API are showing their age and
>         could use a redesign.  This, unfortunately, is a challenge
>         with the parallel goal of maintaining backwards compatibility
>         in the 1.x series, but we welcome feedback in improving the
>         existing APIs and adding new ones that match developer usage.

	Of course back-compat (particularly with this legacy) is a pain; but -
then the problems appear to be legion - an entirely new API (perhaps
built on top of CUPS initially) would not have this problem of course.

> You *do* know that you can do a CUPS_GET_PRINTERS request with a
> non-blocking HTTP connection to collect the data as you go along,
> right?  And use requested-attributes to reduce the amount of data
> that is processed?

	No - I have no idea; it's not clear from the headers, it's not clear to
me how to do it. When I read the code in days of yore & the IPP spec. it
was not clear to me how to construct arbitrary
strings/URIs/etc./whatever & use the raw CUPS http code to achieve
anything.

> Comments like this are not useful.  Simple documentation = easier to
> follow code (and clearly you are able to follow the CUPS code, even if
> you don't like the design...)

	I spend my life reading code of often very mixed quality - some of the
worst is unfortunately my own :-) perhaps you'd accept a patch to CUPS
from me ? ;-)

> We use static and dynamic buffers when they make sense.
> Sockets are a requirement for using IPP over HTTP.  

	When does a static buffer make sense ? seriously; this is my point -
network traffic is slow, IPC is slow - strdup() is blindingly, amazingly
fast. My point is - why use static buffers all over the place (which may
give some marginal speed win) when they also tend to have well known
security issues and of course introduce arbitrary fixed limits all over
the show: thus either wasting memory unnecessarily or limiting
longer-string corner cases.

> It is synchronous by design, but nothing prevents you from using the
> HTTP and IPP APIs to do this asynchronously - you'll just need to
> do the IPP queries and then the HTTP GET yourself.

	So again; when I tried to do this some year ago - it was extremely
unclear how to do that; but perhaps I was just missing something simple.

> > 	Of course, what the contract is wrt. this file - whether you get to
> > unlink it later at your leisure, or what is being really passed to you
> > is (AFAICS) not explicit in the commenting. An opaque handle with
> > clearly defined & consistent lifecycle semantics would clearly be
> > better.
> 
> We're not trying to define a whole new object/abstraction interface
> here (i.e. no glib or similar frameworks), just trying to get the
> job done.

	Well; I'm not asking for rocket science here - just expecting some of
the hard-learned lessons of maintaining stable ABIs over long periods to
hit home.

	So - lets take another example:

typedef struct
{
  int			fd;		/* File descriptor for this socket */
  ...
  char			fields[HTTP_FIELD_MAX][HTTP_MAX_VALUE];
  ...
  /**** New in CUPS 1.1.19 ****/
  fd_set		*input_set;	/* select() set for httpWait() */
  http_status_t		expect;		/* Expect: header */
  char			*cookie;	/* Cookie value(s) */
  /**** New in CUPS 1.1.20 ****/
  char			authstring[HTTP_MAX_VALUE],
					/* Current Authentication value */
			userpass[HTTP_MAX_VALUE];
					/* Username:password string */
  int			digest_tries;	/* Number of tries for digest auth */
} http_t;

	So several things to say here: exposing all this lot:

	*  gives you no guarantees of encapsulation, anything can 
	   screw with any of the fields at any time in any thread,
	   in uncontrolled ways; resulting in inconsistent state.

	*  gives you expansion problems: yes, I'm sure you're not
	   expecting people to stack/heap allocate this structure 
	   themselves (when compiling with some old headers) - but
	   someone, somewhere is going to do this if they can.

	*  gives you hard-to-detect versioning problems; if I
	   compile vs. 1.1.20 and run with 1.1.19 - and try to
	   access 'authstring' which was there in the header when
	   I compiled: what happens ?
		* the alternative being a linker error when an 
		  non-available accessor is called; or a named / 
		  enumerated property is not available.

	A far better way is to have an opaque structure, and either
named/enumerated properties - or individual accessors. If you're
particularly 31337 you can not dup strings and return 'const char*'
pointers to the internal representation, thus saving a handful of cycles
and exposing yourself to a different set of problems later :-)

	Things like:

  char			fields[HTTP_FIELD_MAX][HTTP_MAX_VALUE];

	are particularly odd; HTTP_FIELD_MAX is 28 for me ( the end of an
enumeration ) of course - it's not really possible to extent that
enumeration without breaking the entire http_t layout ;-) but that aside
we have:

#  define HTTP_MAX_URI		1024	/* Max length of URI string */
#  define HTTP_MAX_HOST		256	/* Max length of hostname string */
#  define HTTP_MAX_BUFFER	2048	/* Max length of data buffer */
#  define HTTP_MAX_VALUE	256	/* Max header field value length */

	So fields consumes 7kb, and other bits perhaps another 2 1/2k or so,
size-wise not important, but most of that will be wasted most of the
time.

> > 	Is this symptomatic of a wider problem ? no idea. All I know is my
> > experience of using the client code & trying to make it do what is
> > necessary has been poor in the extreme: of course, it's entirely
> > possible that is just down to my ignorance.
> 
> In my experience, too many developers jump to conclusions about
> software before asking the people who wrote the software...

	Well - fine; what is your ABI compatibility strategy here ? why is
there no encapsulation/data-hiding ? why are there all these arbitrary
limits around the place ? can a FIELD really not contain a URI ? ( field
max 256chars, URI max 1024 chars ).

	Of course perhaps my critique is unfair - I've only had to use the API
in quite a minor way; but the above are my questions from reading the
http.h header; from a quick scan similar problems are elsewhere too.

> There is already an API like this: PAPI.  However, you may be
> disappointed to discover that PAPI's APIs are all synchronous, with no
> possibility of asynchronous operation.  PAPI also suffers from the
> "least common denominator" syndrome, but that will likely improve over
> time...

	Interesting; of course - as you see I'm almost totally ignorant of
printing as a domain :-) perhaps PAPI is what I'm looking for. ie. if I
was going to bless an API/ABI to have to support for the next 20 years &
remain compatible with - I would not be picking CUPS in it's current
state. Of course as you say, that's a personal opinion, and perhaps I'm
deluded & you'll persuade me otherwise :-) I look forward to that
process.

	Thanks for listening,

	Regards,

		Michael.

-- 
 michael.meeks at novell.com  <><, Pseudo Engineer, itinerant idiot