[PP-main] Technical/implementation matters

Joakim Ziegler joakim at simplemente.net
Fri Mar 3 15:38:48 CET 2000


On Thu, Mar 02, 2000 at 06:43:57PM -0500, Rusty Foster wrote:

>> Technical/implementation matters:

>> * What format should news items be in?
>> I've given it some thought, and I think this should definitely be a rather
>> formal and verbose XML datatype, or something giving similar semantic
>> structure. It'll allow us to do a lot of things that aren't possible with
>> pure text exchange, HTML, etc. I have some particular ideas about meta data
>> and how that could be put to good use. In addition to the normal meta data
>> news items would obviously have, like author name, keywords, etc., we
>> should have things to aid quality control/fact checking. For instance, a
>> list of URLs that back up the story, phone numbers you can call to confirm
>> it, etc. I'd love to get more feedback on this, and I think we should get
>> the spec down soon, and then get some of the XML knowledgeable people around
>> here to work on making a DTD proposal.
 
> XML XML XML! Definitely it should be XML. Here are the advantages:

> * Can be backwards-compatible with existing syndication schemes
> (my.netscape/my.userland)

Definitely. I think we should define simple transformations to these formats,
though (RDF/RSS, if I'm not mistaken; they have a somewhat different focus
and are not directly suited to this).
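
To show what I mean by a simple transformation, here is a rough sketch in
Python. The <newsitem> layout and its element names are assumptions made up
for illustration, since we have no DTD yet; the output keeps only the
title/link/description that an RSS-style item carries and drops the rest.

```python
import xml.etree.ElementTree as ET

# Hypothetical rich news item; none of these element names are agreed on yet.
SAMPLE = """<newsitem>
  <meta>
    <title>Example headline</title>
    <link>http://example.org/story/1</link>
    <author>A. Reporter</author>
    <keywords><keyword>software</keyword></keywords>
  </meta>
  <body>Story text goes here.</body>
</newsitem>"""


def to_rss_item(newsitem):
    """Reduce a rich news item to the title/link/description of an RSS item."""
    rss_item = ET.Element("item")
    ET.SubElement(rss_item, "title").text = newsitem.findtext("meta/title", default="")
    ET.SubElement(rss_item, "link").text = newsitem.findtext("meta/link", default="")
    ET.SubElement(rss_item, "description").text = newsitem.findtext("body", default="")
    return rss_item


rss_item = to_rss_item(ET.fromstring(SAMPLE))
print(ET.tostring(rss_item, encoding="unicode"))
```

The transformation is lossy by design: author, keywords and any other meta
data stay behind in the richer format.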


> * Libraries for creating and dealing with XML exist in many languages

Definitely an advantage. I expect we will supply a base set of tools, though,
to get people started.


> * Easy to transfer between master/child/client servers. Just write some
> CGI's and use HTTP for all the inter-server communication

I'm very much not in agreement with this, but more on that later.


> Basically, I think XML would be ideal for this. I think our first real
> job is to define a preliminary DTD from which to work. I'm not an XML
> guru by any means, so I'd much rather have someone who knows what
> they're talking about involved with this. But I will help work up a list
> of things the spec ultimately will need to include.

I have quite a bit of DTD experience, and I'll be happy to work on this. But
first, the requirements.
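
To make the requirements discussion concrete, here is a minimal sketch in
Python of the kind of item I have in mind: the normal meta data (author,
keywords) plus fact-checking aids (URLs that back up the story, phone numbers
to call). Every element and attribute name below is a placeholder I invented
for the example, not a proposal for the actual DTD.

```python
import xml.etree.ElementTree as ET


def build_news_item(title, author, keywords, body, sources, contacts):
    """Build a news item carrying ordinary and fact-checking meta data."""
    item = ET.Element("newsitem")
    meta = ET.SubElement(item, "meta")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "author").text = author
    keyword_list = ET.SubElement(meta, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    # Fact-checking aids: URLs backing up the story, phone numbers to call.
    verification = ET.SubElement(meta, "verification")
    for url in sources:
        ET.SubElement(verification, "source", {"href": url})
    for phone in contacts:
        ET.SubElement(verification, "contact", {"phone": phone})
    ET.SubElement(item, "body").text = body
    return item


item = build_news_item(
    title="Example headline",
    author="A. Reporter",
    keywords=["software", "xml"],
    body="Story text goes here.",
    sources=["http://example.org/press-release"],
    contacts=["+1-555-0100"],
)
print(ET.tostring(item, encoding="unicode"))
```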


>> * How do we exchange them?

> My recommendation would be:
> * Have a smallish group of "central servers" which communicate with each
> other, and their ring of "Child servers"
> * Child servers mediate between client sites and central servers.
> * The central servers store user auth information, and are central
> repositories for syndicated data. They all mirror each other
> * The children relieve some of the load from the central servers, and
> actually communicate with the client sites, accepting content (which
> they forward on to the central servers) and getting content out to the
> clients.

This sounds fair. It's more or less what I imagined.


>> Member sites should have a client system that lets them filter by keywords
>> and whatnot. Perhaps the filter should be possible to upload to the server
>> they use, so that they don't need to transfer the full feed first, and
>> filter on the local box. Should be possible.

> Should filtering be done automatically, manually, or some combination of
> both? Do we just trust clients to categorize things sanely, or do we
> accept their categorizations and add our own with automated keyword
> filtering?

Categorization should be trusted from the origin site. Basically, if you
trust the site enough to get their stuff in the first place, you should trust
them to put the right keywords on it.

As for filtering, you would need to set an automatic filter first of all, as
your most general sorting mechanism. Basically, that lets you remove the
categories and keywords you're definitely not interested in receiving. After
that, it's up to you (as editor of the site) to sort in a more detailed
manner. The main idea of the auto filter is to limit the amount of data you
have to receive. The full feed could theoretically get rather large at some
point, and you don't really want all of it when you really need maybe 5%.
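
The auto filter could be as simple as the following Python sketch. It assumes
each item carries one category and a list of keywords; the dictionary layout
and the make_filter name are made up for the example. The same predicate
could in principle run on the server side, so the rejected items never cross
the wire.

```python
def make_filter(blocked_categories, blocked_keywords):
    """Return a predicate that rejects items the site never wants to receive."""
    blocked_categories = set(blocked_categories)
    blocked_keywords = set(blocked_keywords)

    def wanted(item):
        if item["category"] in blocked_categories:
            return False
        # Reject the item if any of its keywords is on the blocked list.
        return not blocked_keywords.intersection(item["keywords"])

    return wanted


# Hypothetical feed; a real one would come from the server.
feed = [
    {"id": 1, "category": "sports", "keywords": ["football"]},
    {"id": 2, "category": "tech", "keywords": ["xml", "linux"]},
    {"id": 3, "category": "tech", "keywords": ["windows"]},
]

wanted = make_filter(blocked_categories=["sports"], blocked_keywords=["windows"])
kept = [item["id"] for item in feed if wanted(item)]
print(kept)
```

Whatever passes the filter still goes to the site's editor for the finer,
manual sorting.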


>> * How does an origin site push a news item into the system?
>> This is also interesting. We will need some sort of interface system that
>> can plug in to the different systems people use to run their sites,
>> obviously. In addition, there should be a strict requirement on filling in
>> metadata and whatnot, so there needs to be an interface to do that, which
>> is also efficient. This needs some thought.

> Basically, I think we need to define an XML standard that content must
> comply to. After that anyone can write their own client that fits in
> with the rest of their site's software. We can also develop "sample
> clients" but I think the important thing here is to have a public
> standard, and have master servers that only accept submissions
> conforming to that standard.

> I think http is the way to go for client<->server communication. It's
> cheap, easy, and everyone running a website already has it implemented. 

HTTP could work, but it's really sub-optimal for this sort of thing. If I had
to choose an existing protocol, I'd rather consider NNTP. But I don't see
what's wrong with brewing our own. Of course, I have an ulterior motive:
We're writing some software here that's made specifically for transferring
pre-parsed tree data (XML, for instance). It knows how to take XML as input
and output it on the other end, diff trees, and things like that. It's also
handy because it's extremely easy to use (it's a C library with very
high-level functions), and it has encrypted streams built in, without the
need for SSL libraries or anything of the sort.

I'm not a big fan of the recent trend of using HTTP for just about
everything. It's not what HTTP was made for, and it's showing.

My idea is to use the comm tool we're making at the bottom level, and then
supply some tools on top of it that let you do things like poll the server
for new messages from the command line. That's the most UNIXy and flexible
way of doing it, I think. Opinions?
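
As a rough illustration of the command-line polling idea, here is a sketch in
Python. The real transport isn't shown: InMemoryServer is a hypothetical
stand-in for it, and the serial numbers and the --since option are
assumptions of the example, not part of any existing tool.

```python
import argparse


class InMemoryServer:
    """Stand-in for the real transport; holds (serial, headline) pairs."""

    def __init__(self, messages):
        self._messages = list(messages)

    def fetch_since(self, last_seen):
        """Return every message with a serial greater than last_seen."""
        return [m for m in self._messages if m[0] > last_seen]


def poll(server, last_seen):
    """Return the new messages and the updated high-water mark."""
    new = server.fetch_since(last_seen)
    if new:
        last_seen = max(serial for serial, _ in new)
    return new, last_seen


def main(argv=None):
    parser = argparse.ArgumentParser(description="Poll a server for new items.")
    parser.add_argument("--since", type=int, default=0,
                        help="last serial already seen")
    args = parser.parse_args(argv)
    server = InMemoryServer([(1, "first"), (2, "second"), (3, "third")])
    new, last_seen = poll(server, args.since)
    for serial, headline in new:
        print(serial, headline)
    return last_seen


# Equivalent to passing "--since 1" on the command line.
main(["--since", "1"])
```

Because the tool only prints what's new and takes its state as an option, it
can be driven from cron or piped into whatever the site already uses.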


-- 
Joakim Ziegler - simplemente r&d director - joakim at simplemente.net
 FIX sysop - free software coder - FIDEL & Conglomerate developer
      http://www.avmaria.com/ - http://www.simplemente.net/




