Paste number 65229: fu-streams v4

Pasted by: dan_b
When: 1 year, 11 months ago
Share: http://paste.lisp.org/+1EBX
Channel: #lisp
Paste contents:
-*- Text -*- mode. 

fu-streams : Fast User-extensible Streams (honest).  v4

A sketch for a proposed ansi-stream/fd-stream/serve-event replacement.


Changes 
v2: begin to explore the requirement for a serve-event-like interface
v3: remove prohibition of use of stream funcs in handlers, not sure
    yet how exactly to deal with blocking
v4: considerable rearrangement


[ TO DO

1) this is all specified in approximately-clos syntax, but files
available in cold init will need to be implemented w/o clos.  Devise
some mapping to e.g. structs-with-closures-in-the-slots

2) No mention yet of how to /create/ a device; obviously this is the
OPEN method that you're looking for.  Ditto closing a device and
releasing resources etc.

3) If an mmapped stream returns a 128MB buffer, and we put a translator
on the front of it, it would be silly for that translator to allocate
a 128MB translation buffer.  The advice here needs review in such
circumstances.

]

* Some definitions 

In the rest of this document, the following terms have special
meanings:

end-user: person using the CL streams API

stream : as per the object defined by ansi cl

external device: anything with which an end-user might wish to
interact using the CL streams API (a file, a pipe, a network
connection, etc)

buffer : an array-like object used by a device to store elements in
the process of reading them from or writing them to a stream-like
object.

bidirectional : a stream which is open for both writing and reading,
and in which a read subsequent to a write will return the elements
written.  cf. two-way-stream

two-way-stream : as in ansi cl: can be opened for both reading and
writing, but reads come from a different place than writes go to.
Compare bidirectional

* device 

A device implements methods for the following

device-allocate-buffer device &key length initial-contents => buffer

 The returned buffer will be at least LENGTH elements in size.  If
 INITIAL-CONTENTS is supplied and is a vector of the appropriate
 element-type, the new buffer will be populated with its contents.
 For efficiency, DEVICE-ALLOCATE-BUFFER is allowed (but not required)
 to share structure between BUFFER and INITIAL-CONTENTS.

 If LENGTH is unsupplied, the device chooses one based on whatever
 makes sense (may be required by the external device, or chosen for
 efficiency purposes, etc).
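
 A minimal sketch of how a method might honour these rules.  The
 class OCTET-DEVICE, its PREFERRED-LENGTH slot and the choice of
 (unsigned-byte 8) elements are assumptions made for illustration,
 not part of the proposal:

```lisp
(defgeneric device-allocate-buffer (device &key length initial-contents)
  (:documentation "Return a buffer of at least LENGTH elements."))

;; Hypothetical device whose natural element type is (unsigned-byte 8).
(defclass octet-device ()
  ((preferred-length :initarg :preferred-length :initform 4096
                     :reader preferred-length)))

(defmethod device-allocate-buffer ((device octet-device)
                                   &key length initial-contents)
  (let* ((usable (typep initial-contents '(vector (unsigned-byte 8))))
         ;; With no LENGTH supplied, the device picks its own size.
         (size (max (or length (preferred-length device))
                    (if usable (length initial-contents) 0))))
    (if (and usable (= (length initial-contents) size))
        ;; Allowed but not required: share structure with the caller.
        initial-contents
        (let ((buffer (make-array size :element-type '(unsigned-byte 8)
                                       :initial-element 0)))
          (when usable
            (replace buffer initial-contents))
          buffer))))
```

 In this sketch, (device-allocate-buffer dev) on a device made with
 :preferred-length 16 returns a 16-element buffer, and passing a
 3-element octet vector with :length 3 hands the same vector back.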

device-input-element-type device
device-output-element-type device

 Readers for the element types that DEVICE understands/produces. 

device-encoded-length device buffer &optional (start 0) end  => number-of-elements

 Determine how many elements (the portion of) BUFFER will need after
 encoding by this device.  This is required only for devices
 with character input-element-type.

device-write-buffers device ((buffer1 s1 e1) ... (buffern sn en))
   => elements

 Write (portions of) the buffers passed as arguments.  For each buffer
 n, sn and en (if supplied and non-nil) are indices for the first
 element to be written and the first element following the last element
 to be written - i.e. this has the same fencepost rules as most cl
 sequence functions.

 This method should if at all possible perform a "short write" if the
 process would otherwise block.  It must return the number of
 elements actually written.

 The caller continues to 'own' the buffer after writing some or all of
 it and can reuse it as it wants to.

 We allow for multiple buffers in this call so that functions like
 writev() can be used for better efficiency
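
 The fencepost and short-write rules can be sketched with an
 in-memory device.  BOUNDED-SINK, its ROOM slot (the number of
 elements it accepts per call, standing in for an fd that would
 otherwise block) and passing the buffer specs as a single list are
 all assumptions for illustration:

```lisp
(defgeneric device-write-buffers (device buffer-specs)
  (:documentation "Write (portions of) buffers; return elements written."))

;; Hypothetical in-memory device that accepts at most ROOM elements
;; per call.
(defclass bounded-sink ()
  ((room  :initarg :room :reader sink-room)
   (store :initform (make-array 0 :element-type '(unsigned-byte 8)
                                  :adjustable t :fill-pointer 0)
          :reader sink-store)))

(defmethod device-write-buffers ((device bounded-sink) buffer-specs)
  (let ((written 0))
    ;; Each spec is (buffer start end); start and end may be omitted
    ;; or nil, as with most CL sequence functions.
    (loop for (buffer start end) in buffer-specs
          do (loop for i from (or start 0) below (or end (length buffer))
                   do (when (>= written (sink-room device))
                        ;; Short write: report progress instead of blocking.
                        (return-from device-write-buffers written))
                      (vector-push-extend (aref buffer i) (sink-store device))
                      (incf written)))
    written))
```

 With :room 5, writing the specs (#(1 2 3 4) 1 nil) and (#(5 6 7 8) 0 4)
 returns 5 and leaves 2 3 4 5 6 in the store; the caller still owns
 both buffers and must retry the remaining 7 and 8 itself.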

device-read-buffers device ((buffer1 s1 e1) ... (buffern sn en))
   => elements

 Read from the device into the indicated (portions of) the buffers
 passed as arguments.  For each buffer n, sn and en (if supplied and
 non-nil) are indices for the first element to be replaced and the
 first element following the last element to be replaced - i.e. this
 has the same fencepost rules as most cl sequence functions.

 This method should if at all possible perform a "short read" if the
 process would otherwise block.  It must return the number of
 elements actually read.

device-file-position device &optional position

 [Optional] This returns/sets the position on the underlying device,
 which may be different to the file position reported by the streams
 api if there is unhandled data in buffers.  The position obeys the cl
 streams api rules for a file position (i.e. it's an opaque integer
 for character streams, and measured in element-sized chunks for byte
 streams)

 Some devices do not support seeking and may therefore omit this
 method

device-flush device

 Ensure that any information cached internally in the device is pushed
 out to permanent storage/the network/wherever it's destined to end up
 ultimately

device-close device

 By calling this method the user indicates that no further IO or
 buffer allocation will be done on the device, that the existing
 buffers are no longer needed, and that any resources used by the
 device can be released.  The device-close method must also do
 whatever cleanup is done by device-flush.


* translating-device

A translating-device is a device which depends on another device to do
actual IO.  It usually does some kind of transformation on the stream
data before calling the underlying device.  For example, a
translating-device might be used to rot13, convert to uppercase, or
add HTTP chunking or ethernet headers to a stream.

[ TO DO: how to make a translating device.  It probably has a slot for
the underlying device instance ]

A translating-device therefore should implement all the methods that a real
device does, with the following provisions:

device-allocate-buffer translating-device &key length initial-contents
   => buffer

 Must also call the underlying device to allocate its buffer.  It may
 if appropriate pass the underlying device's buffer directly back to
 the caller.  Otherwise it is encouraged to allocate its own buffer to
 be sufficiently large that it can fill the device buffer on a single
 call (reasoning: the device chose that size as being the most natural
 for the external device, so we should try to take advantage)

device-file-position translating-device &optional position

 Method should call device-file-position on the underlying device, and
 adjust for buffers allocated by the translating device and
 unprocessed by the underlying device.  See also the note about file
 positions in the 'Notes for clients' section.

device-write-buffers ...

 Should do whatever transformations into the output buffer(s) are
 necessary, then if the output buffers are full, call
 device-write-buffers on the underlying device to flush it/them.
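
 A rough sketch of the rot13 case.  The UNDERLYING slot, the
 STRING-SINK device and ROT13-CHAR are hypothetical names, and for
 brevity this version translates into fresh strings and calls the
 underlying device on every write rather than waiting for an output
 buffer to fill:

```lisp
(defgeneric device-write-buffers (device buffer-specs))

;; Hypothetical terminal device: collects characters in a string.
(defclass string-sink ()
  ((out :initform (make-string-output-stream) :reader sink-out)))

(defmethod device-write-buffers ((device string-sink) buffer-specs)
  (let ((written 0))
    (loop for (buffer start end) in buffer-specs
          for s = (or start 0)
          for e = (or end (length buffer))
          do (write-string buffer (sink-out device) :start s :end e)
             (incf written (- e s)))
    written))

(defun rot13-char (ch)
  ;; Assumes an ASCII-compatible character set with contiguous letters.
  (let ((code (char-code ch)))
    (flet ((rotate (base)
             (code-char (+ base (mod (+ 13 (- code base)) 26)))))
      (cond ((char<= #\a ch #\z) (rotate (char-code #\a)))
            ((char<= #\A ch #\Z) (rotate (char-code #\A)))
            (t ch)))))

;; Assumption: the translating device keeps its underlying device in a slot.
(defclass rot13-device ()
  ((underlying :initarg :underlying :reader underlying-device)))

(defmethod device-write-buffers ((device rot13-device) buffer-specs)
  ;; rot13 is one-to-one, so the count returned by the underlying
  ;; device is also the number of caller elements consumed.
  (device-write-buffers
   (underlying-device device)
   (loop for (buffer start end) in buffer-specs
         collect (list (map 'string #'rot13-char
                            (subseq buffer (or start 0) end))))))
```

 Writing the spec ("Hello, world" 0 5) through a rot13-device on top
 of a string-sink returns 5 and leaves "Uryyb" in the sink.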

* Notes for clients of the device interface

Clients of the device interface are usually (a) translating-devices,
which need to call methods in the underlying device, and (b) the SBCL
io multiplexor.

** basic usage: output

Hold an index representing the current buffer offset.

Before writing data, check it will fit.  Resize the buffer or allocate
a new one (preference depends on how full the existing one is) if not.

After writing data, advance the offset by however much was written.
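
The three steps above might look like this on the client side.  The
WRITER struct and BUFFER-WRITE are hypothetical names, and this
sketch always resizes (doubling) rather than choosing between
resizing and allocating a new buffer:

```lisp
;; Hypothetical client-side state: a buffer plus the current offset.
(defstruct writer
  (buffer (make-array 4 :element-type 'character :adjustable t))
  (offset 0))

(defun buffer-write (writer string)
  "Copy STRING into WRITER's buffer, growing the buffer if needed."
  (let* ((needed (+ (writer-offset writer) (length string)))
         (buffer (writer-buffer writer)))
    ;; Before writing, check the data will fit; resize if not.
    (when (> needed (length buffer))
      (setf buffer (adjust-array buffer (max needed (* 2 (length buffer))))
            (writer-buffer writer) buffer))
    (replace buffer string :start1 (writer-offset writer))
    ;; After writing, advance the offset by however much was written.
    (setf (writer-offset writer) needed)
    (length string)))
```

After (buffer-write w "hello") and (buffer-write w " world") on a
fresh writer, the offset is 11 and the first 11 elements of the
buffer read "hello world".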

** basic usage: input

Hold an index representing the current unread-data buffer offset.

When requested to read data: if there is sufficient data in the
buffer, return some of it and move the buffer index.  If not, (1)
ensure the buffer is large enough for the requested size, (2) loop,
reading stuff into it until the request is satisfied.
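
A sketch of that loop against a device that only ever does short
reads.  TRICKLE-SOURCE, the READER struct and READ-ELEMENTS are
hypothetical names; treating a zero-element read as end of input is
also an assumption (a real non-blocking device would need to
distinguish "nothing yet" from "nothing more"):

```lisp
(defgeneric device-read-buffers (device buffer-specs))

;; Hypothetical device that yields at most CHUNK elements per call.
(defclass trickle-source ()
  ((data  :initarg :data  :reader source-data)
   (pos   :initform 0     :accessor source-pos)
   (chunk :initarg :chunk :reader source-chunk)))

(defmethod device-read-buffers ((device trickle-source) buffer-specs)
  (let ((count 0))
    (loop for (buffer start end) in buffer-specs
          do (loop for i from (or start 0) below (or end (length buffer))
                   do (when (or (>= count (source-chunk device))
                                (>= (source-pos device)
                                    (length (source-data device))))
                        (return-from device-read-buffers count))
                      (setf (aref buffer i)
                            (aref (source-data device) (source-pos device)))
                      (incf (source-pos device))
                      (incf count)))
    count))

(defstruct reader
  device
  (buffer (make-array 4 :adjustable t))
  (index 0)    ; offset of the first unread element
  (limit 0))   ; offset one past the last element read so far

(defun read-elements (reader n)
  "Return the next N elements as a vector, or fewer at end of input."
  (let ((buffer (reader-buffer reader)))
    ;; (1) Ensure the buffer is large enough for the request.
    (when (< (length buffer) (+ (reader-index reader) n))
      (setf buffer (adjust-array buffer (+ (reader-index reader) n))
            (reader-buffer reader) buffer))
    ;; (2) Loop, reading until the request is satisfied.
    (loop while (< (- (reader-limit reader) (reader-index reader)) n)
          do (let ((got (device-read-buffers
                         (reader-device reader)
                         (list (list buffer (reader-limit reader) nil)))))
               (when (zerop got) (return))
               (incf (reader-limit reader) got)))
    (let* ((start (reader-index reader))
           (end (min (reader-limit reader) (+ start n))))
      (setf (reader-index reader) end)
      (subseq buffer start end))))
```

With seven elements of data and :chunk 3, the first request for 5
elements takes two device calls, and a second request for 5 returns
only the remaining two.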

** file position 

 (1) ask the device for its position
 (2) add/subtract any unprocessed data in buffers.  The device must 
     be asked how long these will be after encoding

For devices with character input-element-type, file position is
measured in elements of the output type.

For byte streams, file position is measured in elements /before/
processing, otherwise it doesn't increment correctly on each
{read,write}-byte.  For most binary streams this is a matter of
arithmetic, but any translating stream with variable-length encoding
might have fun if it supports file-position.
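
As a worked example of steps (1) and (2) for a character device,
assume the encoding is UTF-8 and the underlying position is counted
in octets.  UTF-8-ENCODED-LENGTH stands in for whatever
device-encoded-length would compute, and CLIENT-FILE-POSITION is a
hypothetical helper:

```lisp
(defun utf-8-encoded-length (string &optional (start 0) end)
  "Octets needed to encode STRING[START, END) as UTF-8."
  (loop for i from start below (or end (length string))
        for code = (char-code (char string i))
        sum (cond ((< code #x80) 1)
                  ((< code #x800) 2)
                  ((< code #x10000) 3)
                  (t 4))))

(defun client-file-position (device-position pending direction)
  "Adjust DEVICE-POSITION for PENDING, the unprocessed buffered data."
  (ecase direction
    ;; Written by the client but not yet pushed to the device.
    (:output (+ device-position (utf-8-encoded-length pending)))
    ;; Read from the device but not yet consumed by the client.
    (:input  (- device-position (utf-8-encoded-length pending)))))
```

If the device reports position 100 and five characters are pending,
one of which is U+00E9 (two octets in UTF-8), the position seen by
the client is 106 for output and 94 for input.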

** bidirectional streams

The simplest way to implement a bidirectional device (note: not a
two-way device) may be to commit and flush buffers between writes and
reads and vice versa.

* The multiplexor

The IO multiplexor accepts readiness notification events from external
sources (initially likely: poll() on fds, signals) and arranges for
things to be done.  For input events, the notifications can be
cascaded "upwards" so that user callbacks can be registered to be run
when data is available to be read using a stream

** Input handlers that use the streams API

It should be possible for end-user code to be called when there is
data available for it to read.  It should not be possible for the
handler to block the system by reading data that's not there yet,
because then we end up in the multiplexor recursively, and that's just
bad.

User input handlers are registered with the stream they're waiting
for, a minimum number of elements on that stream, and a function to
call.  They'll be called when at least that number of elements are
available in a buffer for the stream, with the stream and the buffer
information as arguments.

Input handlers may use the standard CL stream functions, with the
following proviso: they cannot do blocking calls, and will be
terminated gracelessly if they try.  When an attempt to recursively
enter the multiplexor is detected, a THROW is made to the outer frame,
gracelessly terminating the handler.  When more input is available,
the handler will be called again: this cycle repeats until there's
enough input accumulated that the handler terminates normally.
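
The terminate-and-retry cycle can be sketched with CATCH and THROW.
Everything here is hypothetical: the WOULD-BLOCK tag, the *BUFFER*
and *INDEX* variables standing in for a stream's buffer, and in
particular the decision to rewind the read index when a handler is
terminated, which the text above does not specify:

```lisp
;; Stand-ins for a stream's input buffer and its unread offset.
(defvar *buffer* (make-array 0 :adjustable t :fill-pointer 0))
(defvar *index* 0)

(defun handler-read ()
  "Return the next buffered element, or terminate the handler."
  (if (< *index* (length *buffer*))
      (prog1 (aref *buffer* *index*)
        (incf *index*))
      ;; A blocking read would re-enter the multiplexor: bail out.
      (throw 'would-block :retry)))

(defun run-handler (handler)
  "Call HANDLER.  Return (:done value), or :retry if it was terminated."
  (let* ((saved *index*)
         (outcome (catch 'would-block
                    (list :done (funcall handler)))))
    (when (eq outcome :retry)
      ;; Assumption: input consumed by the aborted run is replayed.
      (setf *index* saved))
    outcome))
```

A handler that sums three elements returns :retry while only two are
buffered, and (:done 6) once 1, 2 and 3 have all arrived.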

** Output handlers

I can't figure out how to do these in a way that would make sense, nor
what they'd be used for.  Ideas welcome, otherwise I won't do them.

===============================================================
uncategorised stuff

* The buffer 

The buffer is some form of specialised array, but the element type is
up to the device implementor: should there be an external IO entity
that expects UCS-4 characters, for example, it would be silly for us
to translate everything down to (unsigned-byte 8) before letting it
get at the buffer.  This also lets us implement string output streams
by allocating a buffer of element type 'character, growing it, then
handing back displaced arrays to it every time device-allocate-buffer
is called.

For input streams, an upper layer may require a device to preserve
data in the buffer for longer than the device implementor otherwise
would have done, because it needs a bigger chunk of input to work on
and wants the buffer to keep accumulating until it's ready.  In
general, the client says "provide me at least n elements", and the
device can read and buffer more provided that it doesn't adversely
affect anything (this should be a non-blocking operation, therefore)
but should not return to the client until n are available.

================================================================


TO DO:

bidi streams: what does peek-char followed by write do?

Add stuff to help in the implementation of funny ansi stream types
(e.g. broadcast, concatenated streams)

unreading elements


Comments?  See http://sbcl-internals.cliki.net/fu-stream
