[PP-main] Oops.

kmself at ix.netcom.com
Thu Mar 9 04:27:51 CET 2000


On Thu, Mar 02, 2000 at 09:16:18AM -0600, Joakim Ziegler wrote:
> So, I'd been up for too long, and tabs to spaces conversion messed up. Here's
> the corrected mail:
> 
> 
> I see there are quite a few people subscribed already, and more are sure to
> come, so I think it's time to kick off the discussion. There are a lot of
> issues to straighten out, I'll try to list them all, and state my opinion on
> them, if I have one. If you're going to reply to several points here, it
> might be a good idea to split the reply up in several mails, so we avoid the
> monolithic mails that people can't be bothered to read. This is going to be
> one of those, though, I'm afraid.

My comments.  I've read several other responses, I'm trying to minimize
overlap.

Before launching into a long discussion of what would or wouldn't be
useful for the concept, it might help to flesh out the concept.  I
realize that some of this has been posted to Advogato, but quoting or
citing the reference would be useful.

> Political/philosophical:
> 
> * What are the advantages/disadvantages of a system like this in general?
>   I think we're pretty much agreed that it is a good idea, as a whole, but
>   it's probably a good idea if we go through the advantages and disadvantages
>   ourselves, rather than have others find them for us later.
>   
>   In particular, I'm concerned with the discussion I had with Paul Dunne over
>   at Kuro5hin, about who this benefits, and how. His opinion was that the
>   only ones to benefit would be the publishers, while the writers would
>   starve (more or less). I see his point, somewhat. That is, this system is
>   made for benefitting publishers most of all. Specifically, to make smaller
>   and more niche-oriented news and discussion sites possible, without the
>   administrator having to wear him or herself out looking for relevant news.
>   
>   Now, he states that journalists usually can get paid for every
>   republication in the system as it is today. This isn't really true. The
>   system of republication is a vastly smaller one than the one of wire
>   services, in which the writer gets a one-time fee, no matter how many times
>   things are published, and republication happens on an immense scale, world
>   wide.
>   
>   I think the recognition given to a writer by wide publication, especially a
>   young, up and coming writer, far outweighs whatever compensation she would
>   get from republication rights, especially in the long term.

I've been watching several news/information channels over the past
decade or so, and noting which ones seem to do well and which don't.
With no notable exceptions it's the channels which orient themselves
around an informational goal, rather than as an advertising or marketing
channel, which provide useful content.  Particularly successful models
IMO are the Wall Street Journal and the Economist Newspaper.  

Both of these publications act as "proof of concept" for significant
research arms: Dow Jones Publishing and the Economist Intelligence
Unit, respectively.  While both the WSJ and the Economist are expensive
relative to their mass-market competition, both provide a high value/price
ratio product.  Other alternatives include professional journals, and
various special interest journals whose primary interest is integrity
rather than mass market dominance.  I see the US "newsweeklies" -- TIME,
Newsweek, US N&R -- as greatly whitewashed rags which have sacrificed all
integrity and most content to their marketing and advertising interests.

In the case of the Economist, revenues break down to ~50% subscriptions,
~30% EIU, ~20% other operations.   Financial results are published on
their website, numbers above are cited from memory for last year and
probably reflect 1998 operations.

Point:  One function of a collectively moderated, peer-validated content 
syndicate could be to launch individuals and organizations as recognized
experts in a field.  Just a thought.

Other benefits:  as a content *reader* it would be helpful to have a
site or mechanism which does consolidate useful information, again,
without the perversions common with many commercial channels, and even
as we've seen with /., noncommercial ones. 


> 
> * What do we want the license to allow/disallow?
>   There are several issues that need to be sorted out. I think we all agree
>   on some sort of copyleft-like base principle, but how infectious should it
>   be, and what should it require in the way of returning the favour?

IMO infectiousness of the license isn't as central an issue in
human-readable expression as it is in software licensing.  The GPL is
written as it is because there are forms in which a program can be
distributed which provide utility while concealing informational
content.  This is much less the case with written works, images, and
other forms of expressive rather than functional content.  Functional
compatibility is what drives much direct copying in the software world.
Scribners have other options.

I also don't believe that there is as great a social benefit to be
derived from establishing a goal of "free content" in the same sense
that there is in the case of free software.  Expressive content is
already open to fair use exemptions.  Research and restatement are
avenues for disseminating material.  There is little benefit that I see
in insisting that derived works be subject to the same licensing terms
as original content, though I do believe that direct copies and
quotations exceeding fair-use exceptions should be treated as the
original work.  Beyond this, as has been pointed out, a service and
support model for expressive works isn't as appropriate a model as it is
in software, though a channeling and distribution chain might be a
possible revenue-generating model, along the lines of an
uptime/availability guarantee.

There are some troubling issues regarding recent US copyright
legislation -- DMCA anti-circumvention (17 USC 1201), and database
protection acts (proposed, IIRC, but not enacted).  Locking free content 
away under these or similar terms would IMO be a perversion of intent.

What I do feel ought to be allowed is incorporation of works in a
commercially distributed form -- though without restrictions on tertiary
copy right.  This is essentially the same spirit as the GNU GPL:  allow
remuneration for the act of copying and/or distributing, but do not
allow restrictions on the right to recopy or redistribute works.

There are two limiting factors in this:

 - There are limits to the amount and quality of content that will be
   provided free of charge.  While Usenet and similar discussion are
   typically mutually beneficial activities, there are certain types of
   content generation which simply won't occur in a reliable fashion
   without direct compensation to the provider, whether economic,
   social, spiritual, or whatever.

 - There are limits to the amount of effort a publisher will devote to
   preparing a work for publication which can be freely lifted by
   another distribution channel.  The publisher has their own business
   risk to address, and can't reap monopoly profits where no monopoly of
   distribution exists.  I'd expect some reasonable profits to be
   possible in some areas.  I can live with this.


>   There are some thing I feel strongly should be infected by use. All
>   directly derived works, translations (which could be an interesting
>   resource in themselves), and other things that obviously build on a story
>   from Peer Press.
>   
>   Now, should we require the site to return the favour? I think this is hard
>   to quantify. Should a site have to open up all their content just because
>   they use a single item from us? Sounds a bit dubious. In particular if the
>   item is used in a paper publication, which might not even publish all their
>   content on the net themselves. Also, I would consider it a good thing if
>   our content got picked up by more traditional sites, given that we would
>   get attribution.

No.  Explaining why is an essay in itself.

For the short version, see the OSI/Debian Open Source Definition.
Others have also covered the ground.

>   Which brings me to that. The attribution should ideally both link to the
>   site of origin, the writer, and to peer press, I think. That will make it
>   easy for the sites to profit in the way of user mass and publicity, for the
>   writer to get recognition, and for peer press to get some publicity as
>   well.

I think I agree with this, though I'm not quite sure what "this" is.
Attribution is a critical component.  In EU nations, this is a direct
right under "droit d'auteur".

>   In addition, do we want to place any limitation on commercial use? That is,
>   sites that have banners, are owned by big mean media syndicates, etc.? I
>   say we shouldn't. If we really want to change the way news and media works
>   (hey, we can dream, right?), we should aim for really wide distribution,
>   and make sure we protect the core principles, while not messing around too
>   much with the peripheral details. In addition, if some of the sites get
>   cash flow through banners, and thus perhaps can pay their writers a bit for
>   the stuff originating from them, it'll gain us all.

Addressed above.

> 
> * What can/should be syndicated?
>   Well, the news items/stories are the obvious thing. In addition, several
>   other aspects of sites have been mentioned as candidates for syndication.
>   Personally, I'm positive to this, but I strongly feel the other things
>   should be optional. Optional to export, and optional to import. In
>   addition, they should probably take second place in implementation, so they
>   wait until after the basic news wire service has been implemented, and
>   works. Things that have been mentioned are:
> 
>   * Discussions.
>     Could work, probably requires a bit of infrastructure. Also, the
>     bandwidth needs increase quite a bit, and it's heavily dependent on
>     syndicating things like user accounts and whatnot to work (see below).
>     Also, I'm not sure if the gain is *that* great, this is a thing that
>     might inflate the community sizes to beyond manageable levels again. But I
>     also see the interesting idea cross-pollination aspect of it... What do
>     you guys think?

IMO this will be the bulk of content.  To an extent this is a big
writers' club, we're hashing ideas.  Slashdot and OpenLaw might be two
points on a spectrum to consider.

>   * User accounts.
>     This is something I like. Perhaps we could/should do a central
>     authentication server, so that you log into one place, and get a
>     ticket/cookie thing, Kerberos style, that you carry with you around.
>     Would make it easy to set your preferences, user info, etc., etc. in one
>     place, and this feature is also fundamental to sharing discussions, trust
>     metrics, etc.

My thoughts have been leaning toward a PGP keyserver-type network.
Advantages would be a liberation from the current tyranny of tracking
user accounts and passwords for scores of sites I subscribe to, as well
as facilitating things like trust metrics (below).  It could also be
used to facilitate anonymity, or at least pseudonymity, to the extent
that I would like to see a system which allowed an arbitrary selection
of IDs, and a many:one mapping of IDs to individuals, if desired.  If
necessary (or desired), a person could establish a series of
short-lived, "anonymous" identities, never holding any one for a long
period of time.  Or longer-lived pseudonyms.  Or a "clear" identity.
Trust metrics would balance anonymous aspects, IMO -- the more anonymous
an identity, the less trust would be reflected in it, in general.
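A keyserver-style registry of pseudonyms might look roughly like the
following minimal Python sketch.  Everything here (class name, hashing
scheme, the salted-fingerprint trick) is an illustrative assumption, not
a proposal: the point is only the many:one mapping of public IDs to a
single holder, visible to the registry but not to the public.

```python
import hashlib
import secrets

class IdentityRegistry:
    """Toy sketch of a keyserver-style identity registry: many
    pseudonyms may map to one holder, who is never revealed publicly."""

    def __init__(self):
        self._pseudonyms = {}  # public fingerprint -> holder hash

    def new_pseudonym(self, holder_secret: str) -> str:
        """Mint a fresh pseudonym for a holder.  A random salt makes
        each fingerprint unlinkable to the others by outsiders."""
        salt = secrets.token_hex(8)
        fingerprint = hashlib.sha256(
            (salt + holder_secret).encode()).hexdigest()[:16]
        self._pseudonyms[fingerprint] = hashlib.sha256(
            holder_secret.encode()).hexdigest()
        return fingerprint

    def same_holder(self, fp_a: str, fp_b: str) -> bool:
        """Only the registry can tell two pseudonyms share a holder."""
        return self._pseudonyms[fp_a] == self._pseudonyms[fp_b]

reg = IdentityRegistry()
short_lived = reg.new_pseudonym("alice-secret")  # throwaway identity
long_lived = reg.new_pseudonym("alice-secret")   # durable pseudonym
# Two distinct public IDs; one holder behind both.
```

Trust metrics would then attach to the fingerprint, so a short-lived
identity starts from zero and a durable pseudonym can accumulate
standing without ever exposing the person behind it.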

>   * Trust metrics.
>     This one, I really like. It needs to be figured out, but I can see
>     several options. One is a basic shared trust metric, like Advogato's, but
>     shared between sites. Another is to do different trust metrics that
>     influence each other, so that for instance the admin of site A thinks
>     site B metrics have a relevance of 0.5 for site A, so they play in half
>     as strong as the native ratings. Or you could do entirely user-centered
>     metrics, where what you see and how people are rated is determined by
>     your own ratings only. Or a combination. The advantage is that it's
>     low-bandwidth, and could be a truly unique thing we could show off with a
>     lot. Raph, you're the trust metrics guy, what are your opinions? And
>     other people too, of course.

An area of significant possibilities.  <g>
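The cross-site weighting Joakim describes (site B counting half as
strong on site A) might reduce to something like the sketch below.  The
function, the user names, and the weighting-by-relevance scheme are all
invented for illustration; a real metric would need attack resistance
this toy average does not have.

```python
def blended_rating(native, foreign, relevance):
    """Blend a site's native user ratings with ratings imported from
    other sites, each discounted by an admin-set relevance weight
    (e.g. site B's metrics counting 0.5 on site A).  Returns a
    weighted-average rating per user."""
    scores = dict(native)
    weights = {user: 1.0 for user in native}  # native ratings at full weight
    for site, ratings in foreign.items():
        w = relevance.get(site, 0.0)
        for user, r in ratings.items():
            scores[user] = scores.get(user, 0.0) + w * r
            weights[user] = weights.get(user, 0.0) + w
    return {u: scores[u] / weights[u] for u in scores}

native = {"raph": 0.9}
foreign = {"siteB": {"raph": 0.5, "newbie": 0.8}}
ratings = blended_rating(native, foreign, {"siteB": 0.5})
# raph: (0.9 + 0.5*0.5) / (1.0 + 0.5); newbie: siteB data only.
```

The purely user-centered variant would be the same computation with the
relevance table set per-user rather than per-site.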

> * What should the Peer Press site itself do/have?
>   I mentioned on Advogato the possibility of making Peer Press itself a
>   discussion site, the target audience being the writers, publishers and
>   other people related to Peer Press. I don't think a forum like this really
>   exists anywhere (except for general journalism newsgroups/mailing lists),
>   and it could probably be a very interesting forum in itself. I wonder if
>   the people who admin/write for/contribute to web logs/news sites are as
>   varied a group as I imagine they could be. What are your opinions on this?
>   The site should also offer an index of member sites, info on how to make
>   your site a member, how to get a feed, etc., etc.

Specific focus on standards and implementation.  Possibly dispute
resolution.

> Technical/implementation matters:
> 
> * What format should news items be in?
>   I've given it some thought, and I think this should definitely be a rather
>   formal and verbose XML datatype, or something giving similar semantic
>   structure. It'll allow us to do a lot of things that aren't possible with
>   pure text exchange, HTML, etc. I have some particular ideas about meta data
>   and how that could be put to good use. In addition to the normal meta data
>   news items would obviously have, like author name, keywords, etc., we
>   should have things to aid quality control/fact checking. For instance, a
>   list of URLs that back up the story, phone numbers you can call to confirm
>   it, etc. I'd love to get more feedback on this, and I think we should get
>   the spec down soon, and then get some of the XML knowledgeable people around
>   here to work on making a DTD proposal.

There is a very strong resemblance between this concept and that of Usenet.
What's different, IMO, is that content can be richer and of varied
types, that there is significantly greater bandwidth, that there is an
inherent concept of history (and searchability).  A very significant key
distinction is that Usenet is a replication of content across many
sites.  PeerPress involves some content replication, but it is largely a
distributed indexing system (or several interrelated indexing systems).
Because of the greater bandwidth, content can be requested on an
as-needed basis.  While replication is possible, it is not necessitated
to the same extent it is on Usenet.

This is extending the same distributed discussion concept forward into a
world with far greater technical capabilities, and far less ability to 
rely on academic discipline as a corrective influence.
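For concreteness, a news item carrying the fact-checking metadata
Joakim suggests (backing URLs, confirmation phone numbers) might look
something like the following sketch, built with Python's xml.etree.
The element and attribute names are hypothetical placeholders, not a
DTD proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names -- no DTD has been proposed yet.
item = ET.Element("newsitem", id="pp-2000-0042")
ET.SubElement(item, "headline").text = "Example headline"
ET.SubElement(item, "author", email="writer@example.org").text = "A. Writer"
keywords = ET.SubElement(item, "keywords")
for kw in ("free-software", "syndication"):
    ET.SubElement(keywords, "keyword").text = kw
# Fact-checking aids, as suggested: sources that back up the story.
verify = ET.SubElement(item, "verification")
ET.SubElement(verify, "source", type="url").text = "http://example.org/press"
ET.SubElement(verify, "source", type="phone").text = "+1-555-0100"

print(ET.tostring(item, encoding="unicode"))
```

Because the verification block is structured rather than free text, a
receiving editor's tools could check it mechanically before promotion.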


> * How do we exchange them?
>   There are two or three basic ways to do it. First of all, should it be
>   centralized or not? That is, should there be a single point all the data
>   passes through? I think so. This is dependent on timely delivery, so a 100%
>   distributed system would be too slow, and would also consume a lot of
>   excess bandwidth. The other question is, do you poll for new items once in
>   a while, or do you open a connection and get all the stuff pushed to you as
>   it happens? Or maybe you do a combination. I don't really have an opinion
>   about this, input would be appreciated, as usual.


>   But I strongly believe the stuff should pass through a central server. To
>   make it scale, we can just add child servers that get their info from the
>   central one. One level of indirection is cheap, and you can scale *really*
>   high that way, especially given that the system won't really be used
>   directly by the general public.

I'd strongly prefer a distributed architecture, a la Usenet.  This
has potential legal benefits, from the PoV of spam and censorship as
well.  Running a public board for a fee and managing all content
centrally strongly approaches common carrier issues, including
prohibitions on prejudicial treatment.  A confederation of sites
operating in their own interest brings forward the "private resources"
defenses which have been used against spam and crack attacks; only the
wires themselves need be considered a common carrier, and nodes do as
they please.  Sometimes less organization is better.

>   Member sites should have a client system that lets them filter by keywords
>   and whatnot. Perhaps the filter should be possible to upload to the server
>   they use, so that they don't need to transfer the full feed first, and
>   filter on the local box. Should be possible.

I'd prefer separation of the filtering semantic from the distribution
network itself, though a reference implementation would be useful.
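A rough illustration of that separation: express the filter as a plain
predicate over item metadata, so the identical function can run on the
subscriber's box or be shipped to the feed server, leaving the
distribution mechanism untouched.  The field names and threshold below
are assumptions for the sketch.

```python
def make_keyword_filter(wanted, threshold=0.0):
    """Build a filter predicate over item metadata.  Because it is a
    pure function of the item, the filtering semantic stays independent
    of where it executes -- client side or uploaded to the server."""
    def accept(item):
        has_keyword = bool(wanted & set(item.get("keywords", ())))
        trusted = item.get("trust", 0.0) >= threshold
        return has_keyword and trusted
    return accept

accept = make_keyword_filter({"gnome", "licensing"}, threshold=0.4)
feed = [
    {"title": "GNOME update", "keywords": ["gnome"], "trust": 0.7},
    {"title": "Banner blitz", "keywords": ["ads"], "trust": 0.9},
]
selected = [i["title"] for i in feed if accept(i)]
# Only the keyword-matching, sufficiently trusted item survives.
```

Server-side evaluation of the same predicate is what saves transferring
the full feed before filtering, per the quoted suggestion.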

> * How does an origin site push a news item into the system?
>   This is also interesting. We will need some sort of interface system that
>   can plug in to the different systems people use to run their sites,
>   obviously. In addition, there should be a strict requirement on filling in
>   metadata and whatnot, so there needs to be an interface to do that, which
>   is also efficient. This needs some thought.

I'd prefer a pull to a push model.  Specific nodes could arrange
independently for guaranteed content delivery.  This opens up a
commercial angle which may prove useful.

> * How can we keep the quality level up?
>   Trust metrics are probably a good idea. That is, intra-site trust metrics.
>   The editor(s) of a given site are the ones that pick the items that go up
>   on their site, and also which ones are sent out over the wire. Thus,
>   editors of other sites can use trust metrics to certify them, and how good
>   they think the editorial policy of that site is. That will let you sort
>   things out quickly. Trust threshold should be one of the filter criteria,
>   obviously. Perhaps a combination of site and author metrics would be in
>   order, to get a more fine-grained mechanism (author first, if the author is
>   unknown or has a low rating, that can be rectified by the item coming from
>   a well-respected site).

This question is also indirectly the inverse of "who do we allow to use
content" -- it's "who do we allow to submit content".  Though a less
black-tape-and-scissors attitude might be "how do we make promotion
decisions" -- eg:  deciding to promote content from one node to another.
The downsides being spam, redundant content, noise, trolls, flamebait,
etc.  Interestingly, the issue of spam raises the question of using
remuneration as a promotion metric -- give me sufficiently high-quality
content, or enough revenue per hour, to accept your content on my site.
Would economic criteria damage the system, particularly in light of how
I opened this post?
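The author-first, site-as-fallback scheme from the quoted paragraph
might reduce to a toy function like this; the 0.3 cutoff and 0.8
site discount are arbitrary illustration, not recommendations.

```python
def item_trust(author_rating, site_rating, low=0.3):
    """Author-first trust: use the author's rating when it is known
    and decent; otherwise let a well-respected originating site
    rectify an unknown or low-rated author, at a discount."""
    if author_rating is None or author_rating < low:
        # Unknown/low-rated author: lean on the site's reputation,
        # discounted so it never fully substitutes for the author's own.
        return max(author_rating or 0.0, site_rating * 0.8)
    return author_rating

item_trust(0.9, 0.2)   # known good author dominates a weak site
item_trust(None, 0.9)  # roughly 0.72: strong site rectifies unknown author
```

The trust threshold in the feed filter would then operate on this
combined value rather than on either rating alone.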


> What else? I'm sure there's a lot. This mail is getting even longer than I
> intended already, though, so I'll stop here, and let the discussion begin. I
> have more opinions, and probably more ideas that I didn't think of at the
> moment, and so do all of you, so let's see what we come up with. Oh, as a
> side note, I've grabbed the relevant domains for the site (peerpress.org, com
> and net, just to be sure), and I'll be putting up some kind of preliminary
> site as soon as things take shape here on the list.
> 
> 
> -- 
> Joakim Ziegler - simplemente r&d director - joakim at simplemente.net
>  FIX sysop - free software coder - FIDEL & Conglomerate developer
>       http://www.avmaria.com/ - http://www.simplemente.net/
> 
> 

-- 
Karsten M. Self (kmself at ix.netcom.com)
    What part of "Gestalt" don't you understand?

Scope out Scoop:  http://scoop.kuro5hin.org/
Nothin' rusty about Kuro5hin:  http://www.kuro5hin.org/




