Bluesky Dev
Community discussion of the AT Protocol and Bluesky. (This room is not officially affiliated with the Bluesky team.)

    Bluesky Dev

    arcalinea created this room.

    This is the start of export of Bluesky Dev. Exported by extratoneI (@extratone:matrix.org) at 12/7/2023.


    Topic: Community discussion of the AT Protocol and Bluesky. (This room is not officially affiliated with the Bluesky team.)

  1. arcalinea joined the room
  2. arcalinea set the main address for this room to #bluesky-dev:matrix.org.
  3. arcalinea made the room public to whoever knows the link.
  4. arcalinea made future room history visible to all room members.
  5. arcalinea changed the room name to Bluesky Dev.
  6. arcalinea changed the room avatar to
  7. arcalinea invited pfrazee
  8. pfrazee joined the room
  9. pfrazee set a profile picture
  10. pfrazee
    👋
  11. arcalinea changed the power level of pfrazee from Default to Admin.
  12. Aaron Goldman joined the room
  13. Daniel Holmgren joined the room
  14. Aaron Goldman
    👋
  15. Daniel Holmgren
    👋
  16. arcalinea changed the power level of Aaron Goldman from Default to Admin.
  17. arcalinea changed the power level of Daniel Holmgren from Default to Admin.
  18. Aaron Goldman
    just what I needed another chat app on my phone
  19. pfrazee
    heh same
  20. I've recently been forced onto Telegram and Whatsapp, I think with everything else that's a royal flush
  21. arcalinea changed the topic to "Bluesky Dev Room Rules 1. Stay on topic. This chatroom is focused on the development of technologies related to the Bluesky decentralized social project. For all other topics, please find another room for discussion. 2. No spamming 3. No solicitation 4. No personal attacks or harassment. Criticize technologies not people. 5. Don't monopolize the conversation. We welcome engagement, but if you are always the noisiest poster, the mods may ask you to take a step back to make room for others. 6. Improve the quality of discourse. We welcome questions and comments that improve the quality of discourse in the room and help better inform participants. Mods will remove comments or participants who they feel are lowering the quality of discourse at their discretion. ".
  22. arcalinea changed the topic to "Discussion of technologies related to Bluesky decentralized social project. Room rules: https://docs.google.com/document/d/1bwjuTDiZ2hS54B2o5AI2RZVRRB2Mw1UutZoDa9P7TpE/edit ".
  23. arcalinea
    Can everyone see that google doc?
  24. Wanted to give mods guidelines
  25. pfrazee
  26. Daniel Holmgren
    i can 👁️
  27. pfrazee
    ah if I widen my app window I can
  28. Aaron Goldman
    I can see the doc
  29. arcalinea invited Matthew
  30. arcalinea
    Inviting Matthew!
  31. arcalinea invited rabble
  32. arcalinea invited Golda Velez
  33. arcalinea invited @timbray:matrix.org
  34. arcalinea
    Ok invited a few people. Hopefully we get the hang of running this room before too many people come over from Discord
  35. Matthew joined the room
  36. Matthew
    hey folks :)
  37. arcalinea
    Are there reputation "playlists" we can subscribe to for moderating this room? Would love to try it out
  38. For context, the Discord is renaming to become more representative of its big-tent nature encompassing many projects, so something like Dsocial Commons. And the bluesky specific dev chat is going to start being in this one channel that we're going to try to stay on top of.
    (edited)
  39. Matthew
    so decentralised rep in Matrix is in very active dev atm. the current thing in active use is basically shared blocklists as defined by
    MSC2313
  40. and an example one would be #matrix-org-coc-bl:matrix.org
  41. arcalinea
    ^ I joined that room, but how do I apply it to this room?
  42. Matthew
    so currently the tools which consume the replists are for moderators and server admins
  43. although our trust & safety team is currently adding in the UI to Element Web to let users play along too
  44. (but it's getting repeatedly delayed by the same team having to fight plain old abuse)
  45. arcalinea
    Server mods and admins, not room mods and admins, you mean?
  46. Matthew

    the tools which currently consume it are:

  47. In reply to this message

    servers just have admins. rooms have mods & admins (on a power spectrum of 0 to 100)
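The 0-to-100 power spectrum can be illustrated with the shape of a Matrix `m.room.power_levels` event. The 50/100 values for mods/admins follow common Matrix convention; the specific user entries here are invented:

```python
# Illustrative m.room.power_levels content: users default to 0,
# moderators are conventionally 50, admins 100. User IDs are examples.
power_levels = {
    "users_default": 0,
    "users": {
        "@arcalinea:matrix.org": 100,   # admin
        "@somemod:example.org": 50,     # moderator (hypothetical user)
    },
    "ban": 50,   # minimum power level required to ban
    "kick": 50,  # minimum power level required to kick
}

def can_ban(user_id):
    """Check whether a user's power level meets the room's ban threshold."""
    level = power_levels["users"].get(user_id, power_levels["users_default"])
    return level >= power_levels["ban"]
```

Anyone not listed falls back to `users_default`, so ordinary members cannot ban.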
  48. so the latter two tools are for server admins (i.e. people running homeservers)
  49. mjolnir is for mods/admins - but still has to be manually run for now.
  50. so we run a mjolnir (
    @abuse:matrix.org
    ) for rooms and communities our team moderates
  51. but so does mozilla, fosdem, redhat, etc
  52. mjolnir doesn't operate on a per-server basis though, as rooms aren't per-server in Matrix
  53. btw, thanks for giving Matrix another go :)
  54. Matthew is painfully aware that we still have a way to go in terms of matching Discord's UX, but we are literally going to win or die trying ;P
  55. (and hopefully having threads will be a plus).
  56. https://matrix.org/docs/guides/moderation gives an idea of the anti-abuse tooling in Matrix
  57. although https://matrix.org/docs/guides/moderation#moderation-tooling is the main relevant section there.
  58. https://matrix.org/blog/2020/10/19/combating-abuse-in-matrix-without-backdoors is another overview of how the decentralised rep stuff works in theory.
  59. arcalinea changed the topic to "Discussion of technologies related to Bluesky decentralized social project. Room rules: tinyurl.com/ye8btv38 ".
  60. whyrusleeping joined the room
  61. whyrusleeping
    Oh hello everyone
  62. Daniel Holmgren changed the topic to "Discussion of technologies related to Bluesky decentralized social project. Room rules: https://tinyurl.com/bluesky-rules ".
  63. Aaron Goldman set a profile picture
  64. hellobluesky joined the room
  65. Aaron Goldman
    Matthew: Are Mute, Remove, Ban time bound? e.g. can I Ban from the room for 8 hours?
  66. Matthew
    at the protocol level, we deliberately don't do fancier bans (e.g. timed ones), as we want to keep the merge resolution algorithm as simple and auditable as possible
  67. but tools like mjolnir could absolutely apply timed bans
  68. https://github.com/matrix-org/mjolnir/issues/62 for instance is the team doing a mental note to add timed mutes.
  69. arcalinea invited Jeremie Miller
  70. Matthew
    (similarly you can't do wildcard bans or regexp bans or anything - the expectation is that a higher level tool figures out the weird & wonderful logic to use on whether to ban/unban things)
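The "higher level tool" idea above can be sketched minimally: the protocol itself only has plain ban/unban, so a moderation bot tracks expiry times on its own. `FakeClient` is a stand-in for a real client API, invented for illustration:

```python
import time

# Sketch of a timed-ban wrapper around plain protocol-level ban/unban.
class FakeClient:
    """Stand-in for a real chat client API (names invented)."""
    def __init__(self):
        self.banned = set()
    def ban(self, user_id):
        self.banned.add(user_id)
    def unban(self, user_id):
        self.banned.discard(user_id)

class TimedBanBot:
    def __init__(self, client):
        self.client = client
        self.expiries = {}  # user_id -> unix time when the ban lapses

    def ban_for(self, user_id, seconds):
        self.client.ban(user_id)             # protocol-level permanent ban
        self.expiries[user_id] = time.time() + seconds

    def tick(self):
        """Run periodically; lifts any bans whose timer has expired."""
        now = time.time()
        for user_id, expiry in list(self.expiries.items()):
            if now >= expiry:
                self.client.unban(user_id)   # protocol-level unban
                del self.expiries[user_id]

client = FakeClient()
bot = TimedBanBot(client)
bot.ban_for("@spammer:example.org", seconds=-1)  # already expired, for demo
```

The merge-resolution algorithm never has to know about timers; it only ever sees ordinary ban and unban events.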
  71. @timbray:matrix.org joined the room
  72. @timbray:matrix.org
    Hello!
  73. Mark Foster SSI: @mfoster.io joined the room
  74. Aaron Goldman
    welcome
  75. Mark Foster SSI: @mfoster.io
    Hello everyone
  76. J. Ryan Stinnett joined the room
  77. Mark Foster SSI: @mfoster.io set a profile picture
  78. bengo joined the room
  79. bengo
  80. bengo changed their profile picture
  81. whyrusleeping
  82. whyrusleeping changed their profile picture
  83. Matthew
    as a random update from matrixland: these days we have
    arewep2pyet.com
    tracking our transition from decentralised-servers to decentralised-clients
  84. and last week the p2p folks went on a hackathon to figure out how to change matrix to be more p2p-friendly
  85. specifically to handle situations where the common case is that nodes aren't online at the same time
  86. and we've come up with v3 of the main state replication/merge resolution algorithm, which is basically:

    • proactively sends state whenever servers are visible
    • switches to connection-based federation transport semantics
    • switches to trying to replicate the whole state DAG (but skipping out "unimportant" state events)
  87. we'll be implementing it over the next few weeks, but if it works, in combination with pinecone it could be transformative for p2p matrix
  88. and would basically flip the crosses into green ticks for:

    • โŒ Implement comprehensive Store-and-Forward event routing capabilities in Matrix to allow nodes to talk to each other even if they arenโ€™t online at the same time. Matrix currently has limited support for this via backfilling events.
    • โŒ Improve event authentication rules when the majority of nodes in the room are unreachable.
    • โŒ Improve semantic delivery of old events to clients to ensure that when old nodes come online clients donโ€™t see lots of old messages.
  89. arcalinea
    Cool! Going forward, we're going to try to keep this channel specifically about the code and writing we're putting out for bluesky so people have a place where they can get questions answered when we put stuff out today.
  90. Know you probably don't use Discord, but what was formerly the "bluesky community" Discord has turned into a more general chat where people discuss a lot of protocols, and would probably be interested in these updates.
  91. Aaron Goldman
    #general-bsky:matrix.org
    is bridged to the Discord server so you can also reach the community there.
  92. eva joined the room
  93. eva set a profile picture
  94. Eva Zhang changed their display name to eva
  95. rodomonte joined the room
  96. @mx01xz:matrix.org joined the room
  97. willscott joined the room
  98. Gregory Klyushnikov joined the room
  99. whyrusleeping
    Hello everyone!
  100. Gregory Klyushnikov
    Hello!
  101. arcalinea
    👋
  102. @mx01xz:matrix.org joined the room
  103. @mx01xz:matrix.org
  104. Adam King joined the room
  105. oleksiidn joined the room
  106. nichoth joined the room
  107. nichoth
    👋
  108. nichoth

    I just read the blog post — https://blueskyweb.xyz/blog/5-4-2022-working-in-public

    It feels like the world is in similar shoes…

    Weโ€™re in R&D mode at the moment, experimenting with pieces that point in the right direction. We donโ€™t have a finished product or a fully-specified protocol, but weโ€™re putting together components

    At least in my own personal work I have the same feeling. All the pieces more or less exist; it's a matter of finding some time to put things together in the right way.

  109. In my own work in particular — https://nichoth.com/ssc/ , https://github.com/nichoth/ssc-server#what-is-this — I've been thinking of it as a remix of ssb. Meaning taking the existing parts and assembling them in a new way
  110. sorry I just wrote a lot when I first 'arrived' here. just some thoughts that came to mind I suppose
  111. pfrazee
    No worries. I agree - a lot of the people I'm talking to now seem to be trying to find the right formulation of these ideas
    (edited)
  112. craigo joined the room
  113. Bob Wyman joined the room
  114. nichoth
    It is kind of a big task to get a group of people organized around an idea like this, so that you are moving faster than as individuals. I feel like that is half the battle, more difficult than the programming
  115. whyrusleeping
    Yeah, 70% of the problem is getting everyone to agree and work together; building 'the right thing that's flexible enough for the majority of use cases' isn't hard (at least, not at the demo stage)
  116. dougfort joined the room
  117. fabrice joined the room
  118. Brad joined the room
  119. Matthew

    In reply to this message

    cool, sounds good :) wasn't sure what the charter was - was just braindumping in case of interest. will use the community room in future
    (edited)
  120. Matthew
    ADX looks incredibly cool :D
  121. congratulations on releasing the code & architecture
  122. Matthew grins at using a cli script as the UI
  123. TravisR joined the room
  124. npd joined the room
  125. npd
    I'm confused/curious about the architectural design to use both federation and sovereign identities/content-addressed data
  126. vNordwand joined the room
  127. npd
    I think there are lots of advantages to webfinger and easily remembered names that look like email addresses, and it's easy to use existing tools to look up a user and get their keys -- great
  128. ... but then I thought the purported advantages of the cryptographic self-sovereign identifiers were primarily that it didn't rely on a host
  129. vNordwand
    hello all ๐Ÿ‘‹
  130. npd
    it seems like if I'm
    nick@example.com
    , then the admin of
    example.com
    is my host and a potential point of failure
  131. and it's a lot of work to define these Merkle tree calculations if actually it's just, all my posts are available in a GET operation to my account at this host (a familiar style, well-developed in many, many existing applications)
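For contrast with a plain GET, here is what those tree calculations buy: one root hash commits to every post, so a mirror can prove it served the complete, untampered set. This is an illustrative binary Merkle tree, not ADX's actual repository format:

```python
import hashlib

# Illustrative sketch (not the real ADX repo layout): a binary Merkle
# tree over a user's posts, yielding a single root hash.

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the Merkle root of a list of byte-string leaves."""
    level = [h(leaf) for leaf in leaves]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

posts = [b"post one", b"post two", b"post three"]
root = merkle_root(posts)
```

Changing any single post changes the root, which is the property that lets data move between hosts without trusting the host.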
  132. arcalinea
    We're trying to get the best of both worlds. A disadvantage of existing federated systems is the lack of portability -- in the example above, your friends can't find you easily if you move from
    nick@example.com
    to a different homeserver. A key provides SSI, but no memorable name or location to easily find data. So we link the two, so you can rotate your keys or your username/homeserver (though not both at once).
    (edited)
  133. npd
    ah, yes, if the goal is to be able to link an identity so that you can move servers, that makes sense to me
  134. arcalinea
    And your posts, if content-addressed, are also easily moved around from server to server
  135. npd
    it seems like it would be easier to implement if we just had some nice signing operations to verify a move from one federated server to another, but at least I understand the motivation
  136. donaldfarmer joined the room
  137. npd
    sorry, I immediately dove into a technical discussion because I was just really curious about it and the topic of the room is discussion of technologies. but also, thank you for posting this work in progress! it's really interesting! I'm glad to have a federated, protocol-based room to talk with y'all rather than a Discord
  138. arcalinea
    That's exactly what this room is for -- technical discussions. So thanks for asking!
  139. @timbray:matrix.org
    Federation also acknowledges the fact that while there are people who want to be self-sovereign, there are others who can't imagine why they'd want to take care of a private key and anyhow wouldn't be capable of it, so a lot of people are going to outsource that to a service provider.
  140. npd
    yeah, I think there are strong practical and technical reasons to use federation, and single-user instances are possible for people who want that (or giant organizations or extremely influential users who want particular control)
  141. I'm more confused as to what we get from a Merkle directed graph of posts, as opposed to just signing your posts that are stored on your host
  142. like, what's actually so bad about client-server? given that this already accepts that users will be identified and discovered via a server
  143. arcalinea
    We've been talking about this as providing "live" exportable data, as opposed to getting a signed snapshot of your data exported from a server. Since the signed, content-addressed version is the canonical version, rather than the location-based (found at a URL) version, it reduces the amount of trust or cooperation you need from the server in order to move your data around. It's analogous to git and Github, hence why we call them "repositories". If you have a local repo of your data, Github can disappear from the face of the earth tomorrow and you could start over elsewhere -- although it's still a nice service to have to interact with your repos in a nice UI most of the time.
    (edited)
  144. npd
    I can see that in part, but if I'm
    nick@example.com
    , I think it'll be a problem for me if
    example.com
    disappears tomorrow, even if I have all the content addressed data
  145. arcalinea
    Which is why we have keys as well as usernames. If a friend looking for you at
    nick@example.com
    gets a 404 tomorrow, they go find what update your key has signed to discover your new host/username.
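The fallback described here can be sketched with in-memory dicts standing in for webfinger and the key-anchored directory (all identifiers invented). Followers cache the stable DID, so a dead host breaks only the name, not the identity:

```python
# Toy resolution flow: try the host-provided name mapping first, then
# fall back to the key-anchored directory via a cached stable DID.

webfinger = {}                                   # example.com is gone: no entry
did_directory = {                                # updated via a key-signed record
    "did:example:123": {"host": "newhost.org", "handle": "nick@newhost.org"},
}
cached_did_for = {"nick@example.com": "did:example:123"}  # saved at follow time

def find_user(old_handle):
    """Resolve a handle, falling back to the cached DID if the host 404s."""
    if old_handle in webfinger:                  # happy path: host still up
        return did_directory[webfinger[old_handle]]
    did = cached_did_for.get(old_handle)         # fall back to the stable DID
    return did_directory.get(did) if did else None
```

New followers who only ever knew the old handle are the harder case, which is the "non-trivial" part acknowledged below.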
  146. npd

    In reply to this message

    yeah, I definitely like that. at least for the people who already have followed me, it's possible (although still non-trivial) for them to search and find my newly signed record.
  147. that would also give me the possibility of moving my content somewhere else authenticated to me (without even relying on the data repository/merkle tree stuff that I don't understand yet)
  148. keys and signatures are great
  149. arcalinea
    Yep. There's definitely still edge cases to figure out here, like what if you lose both your username and your keys at the same time.
  150. The data repository stuff, ignoring the tree complexity, is basically a git repo with your social posts.
  151. npd

    In reply to this message

    sure, I can understand that concept, but it also requires another layer (or a few layers) of complexity in that you can't just fetch content using the client server model
  152. I know, I'm sorry, I'm getting into questions of the style why-didn't-you-do-X, but I guess it's at least worth noting for the authors that it makes it a lot more confusing to understand, and a lot more daunting to implement, compared to like the Web, which we know quite a bit about
  153. arcalinea
    This is partly why we put this out after just some prototyping, rather than trying to build it to completion. Is the extra complexity worth it -- we'll see. Another benefit of content addressing is how it could enable store-and-forward caching to help scale the network, or maybe we could dynamically move users/content around under load.
  154. arcalinea
    And it's definitely a goal for developer interfaces to be something like a clean "put/get" into the user repo, rather than having to inspect the trees. Git can get complicated when repos are multi-writer, and the same thing will probably happen here. A single-writer user profile, no problem. A big collaborative profile/application where you have to merge changes, and that gets complicated. Ask Martin about that :)
  155. Aaron Goldman
  155. You could just fetch client-server style. Think git: if I know the server and do a git pull, I get all the objects from the same server. But with git, if I move the upstream I need to update my local configuration; there is no automated way to discover that the upstream has moved
  156. pfrazee
    One thing to add to the webfinger vs DID discussion: we use the DIDs in records always, never the human readable name. This ensures that links between records don't break if you change your server/webfinger name
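This point can be sketched in a few lines (all identifiers invented): records embed the stable DID, while the handle-to-DID mapping lives outside the records and can change freely:

```python
# Records link by DID; the mutable handle mapping lives elsewhere.
handles = {"nick.example.com": "did:example:123"}

post = {"author": "did:example:123", "text": "hello world"}
reply = {"author": "did:example:456", "replyTo": post["author"]}  # links by DID

# The user renames / moves servers: only the mapping changes...
del handles["nick.example.com"]
handles["nick.newhost.org"] = "did:example:123"

# ...and the reply still points at the same author.
unchanged = reply["replyTo"] == post["author"]
```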
  157. Gregory Klyushnikov
    It blows my mind how needlessly complicated this thing is just for the sake of not trusting one single server to host someone's content.
  158. And despite there being a thing called "secure append-only log which can be monitored externally", this looks suspiciously like a blockchain. And there's monetary transactions involved, too!
  159. @miles.kinney:matrix.org joined the room
  160. Gregory Klyushnikov

    In reply to this message

    This, by the way, is a solved problem in ActivityPub (it's just that Mastodon breaks the spec in this particular case). You can change your username (the preferredUsername in the actor, and the part before the @ in webfinger queries) no problem. The one and only thing that must remain the same for an actor is its id. Which, conveniently, is the URL where the actor object lives. Different implementations do it differently. Mastodon does use the username there, making usernames effectively unchangeable. Smithereen (my ActivityPub server project) does not, it simply uses row IDs from the database.
  161. pfrazee
    Hey Gregory Klyushnikov do you mind staying positive and having an open mind? We want to hear feedback about complexity, but please be kind about it
  162. In reply to this message

    What are you referring to with the monetary transactions?
  163. Gregory Klyushnikov

    In reply to this message

    The (currently-unnamed) DID Consortium will securely host user IDs at a low cost to users. It will be operated by multiple different organizations who share ownership of the service.
  164. whyrusleeping
    You can use whatever DID provider you would like, im sure there will be entirely free ones
  165. pfrazee
    Ah, yeah we should look at that language. The last time Daniel Holmgren and I talked about it, we were discussing making the consortium free
  166. at this stage we just don't know how the economics of it will work, but I generally agree people won't pay to create accounts
  167. Daniel Holmgren

    In reply to this message

    free is a low cost 😉
  168. Gregory Klyushnikov

    In reply to this message

    some will if you promise them there won't ever be ads and tracking and recommendations ;)
  169. arcalinea
    I thought that "costs" referred to time and resources -- someone has to host the thing, and we've been discussing how waiting any number of minutes for a user ID to be confirmed is an unacceptable "cost" to many users. "sign up for our new social app! hang on, confirming your registration, come back in 10 min..." ok no thanks
    (edited)
  170. @miles.kinney:matrix.org joined the room
  171. pfrazee

    In reply to this message

    possibly! Interested to explore that. The DID Consortium is a separate piece of architecture from the application, so we just have to sort out how it stays sustainable
  172. @miles.kinney:matrix.org joined the room
  173. pfrazee

    In reply to this message

    Yeah that language got butchered during edits. Originally it was "low cost to users and consortium operators"
  174. Gregory Klyushnikov

    In reply to this message

    English isn't my native language so I still read that to mean monetary cost
  175. pfrazee
    it does, but it meant monetary costs of operating the system
  176. @miles.kinney:matrix.org left the room
  177. @mileskinney:matrix.org joined the room
  178. @mileskinney:matrix.org joined the room
  179. @mileskinney:matrix.org left the room
  180. @jeromu:matrix.org joined the room
  181. mikestaub joined the room
  182. mike.staub changed their display name to mikestaub
  183. mikestaub set a profile picture
  184. mikestaub
    Just grokked the adx repo - fantastic start! Clearly arcalinea has assembled the A-team.
  185. I will play with it more this weekend, if there are any good-first-issue tickets please add the label, I would be happy to help any way I can. Though it seems like the velocity is superb with so few cooks in the kitchen at this point.
  186. # joined the room
  187. #
    hello
  188. whyrusleeping
    👋
  189. mikestaub: I think the most helpful thing for the moment is just to play around with the code and get a feel for things
  190. and also discuss various components of the system, we've started some conversations on github already to kinda pre-seed some conversation, but feel free to engage however you want
  191. (oh, I see you already are on github, nice!)
  192. SelectSweet joined the room
  193. golda joined the room
  194. arcalinea changed the power level of whyrusleeping from Default to Admin.
  195. whyrusleeping
    💪
  196. Schwentker joined the room
  197. Harlan Wood joined the room
  198. Mark Foster SSI: @mfoster.io

    ADX is an excellent collaboration of protocols. Has there been any discussion of Encrypted Data Vaults (from the Confidential Data Storage work), local-first concepts, or Verifiable Credentials?

    https://identity.foundation/confidential-storage/

  199. whyrusleeping
    We have talked about it a bit, but aren't focusing on it too hard right now. I'm very interested in how to do this properly though, especially as you start thinking about "decentralized" ACLs
  200. I think the general pattern would be to define a schema for it inside the adx store, and sync encrypted graphs of data underneath that
  201. Mark Foster SSI: @mfoster.io

    In reply to this message

    Yes, there is a lot in those projects to consider. Distributed ACLs have been top of mind since I discovered UCAN and ZCAP. Here is the specific Encrypted Data Vault draft: https://digitalbazaar.github.io/encrypted-data-vaults/ - a lot to think about. I look forward to seeing how things formulate.
  202. whyrusleeping
    I'll read through that spec doc, the intro bit is interesting enough
  203. Mark Foster SSI: @mfoster.io

    In reply to this message

    Verifiable Credentials are an interesting idea for verifying real people via real-world third-party entities: https://www.w3.org/TR/vc-data-model/

    and last but not least there is FedCM
    http://wicg.github.io/FedCM
    to think about for browser adoption of identity hubs and SSO. That's everything I have at the moment.

  204. Schwentker set a profile picture
  205. Robert Schwentker changed their display name to Schwentker
  206. Scott Gavin joined the room
  207. lukakemon joined the room
  208. lukakuma changed their display name to lukakemon
  209. SJ joined the room
  210. Eamo joined the room
  211. yixin joined the room
  212. ratkins joined the room
  213. Mistie Felkner joined the room
  214. Helen Lin joined the room
  215. lucky2077 joined the room
  216. John Jiang joined the room
  217. Kuncle joined the room
  218. Kuncle
    cool
  219. king uncle changed their display name to Kuncle
  220. Kuncle set a profile picture
  221. John Jiang set a profile picture
  222. ratkins
    Hi everyone 👋🏻. I've been broadly wanting a "distributed, protocol-based Twitter" for ages (https://frabjousdei.net/post/108191939171/rfcs-not-ipos) so I've been following SSB and now Bluesky with interest. Tell me if I'm reading the ADX right: it proposes a DNS-like federated system for managing identity and falls back from SSB's "100% distributed" model to "mostly federated, but you can run your own node easily if you want", mainly to get around the problem that ISPs/mobile providers don't generally let people run servers on their phones.
  223. ratkins
    Has the message format been thought about much yet? All the stuff in the ADX could apply equally to a Twitter-like short message thing or a Usenet-like long message thing. (I want Usenet back too, I'm old)
  224. Anurag Kalia joined the room
  225. heylesterco joined the room
  226. flykiller900 joined the room
  227. caiiiyua joined the room
  228. Yuanqing Cai changed their display name to caiiiyua
  229. anoa joined the room
  230. @numero6:codelutin.com joined the room
  231. Hoan Do joined the room
  232. EtherTyper joined the room
  233. lidonghao.eth joined the room
  234. lidonghao.eth
    👍
  235. yoones Hosseinabadi joined the room
  236. @numero6:codelutin.com

    In reply to this message

    I've tried to build this big-tent in matrix as the #decentralised-social-networks:codelutin.com matrix space. It gathers dozens of rooms: we have all the official #activitypub-community:codelutin.com projects, same with #scuttlebutt-community:codelutin.com where pretty much all main projects are present, and some other official rooms for movim, OHN, OpenEngiadina, SOLID, #bsky:matrix.org.

    This space is part of a bigger community named #next-internet:codelutin.com which also gathers P2P/decentralization projects like GunDB, #yggdrasil-network:matrix.org, #ipfs-space:ipfs.io, #hypercore-community:matrix.org...

    I know that Discord is still popular but I really hope we can make matrix the place to be.

    Anyway, thank you arcalinea. You have built the dream team. I don't have time to follow everything that occurs in this space, but I've silently followed Whyrus, Paul and others for years now. I'm sure you will publish mind-blowing things (I've read your ecosystem review several times 🤯). Thank you for letting us be part of this experiment; it's such a privilege to read great minds at work. I feel like an amateur philosopher having dinner with both Aristotle and Descartes. I really hope the moderation will work so you'll keep building in the open without being trolled.

  237. Hongbo Wang joined the room
  238. sunnl joined the room
  239. Kito joined the room
  240. sunnl
    Hi everyone. Congratulations to the team on the first code release!
  241. sunnl
    Can anyone comment on why the "DID consortium" is needed and what its role is? Its role is not very clear in the architecture doc. I thought that DIDs were supposed to be self-issued? Is it to solve a discoverability problem?
    (edited)
  242. andrew (@andrew_chou:matrix.org) joined the room
  243. cryptorock joined the room
  244. cdy joined the room
  245. @geoah:nimona.io joined the room
  246. skyphen x joined the room
  247. vegas joined the room
  248. fallingrock joined the room
  249. tobbsn joined the room
  250. @jbschirtzinger:matrix.org joined the room
  251. pfrazee

    In reply to this message

    there are a couple of constraints that lead to this:

    1. We need keypairs to be rotatable, because people lose their private keys;
    2. We want DIDs to be stable, and with rotation support that means we can't use public keys as DIDs - instead we use a hash of the first DID Document;
    3. If using a hash DID then we need a strong & auditable guarantee of a specific history of the document;
    4. We also need DIDs to be discoverable as you say -- aka we always need to be able to resolve the document from the DID
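Constraint 2 above (a hash of the first DID Document) can be sketched as follows. The encoding and the `did:example` method name are invented; only the idea matters: the DID is bound to the genesis document, so it survives later key rotations:

```python
import hashlib
import json

# Hedged sketch: derive a stable DID by hashing the *first* (genesis)
# DID Document. Later rotations change the current document, not the DID.

def did_from_genesis(doc):
    """Hash a canonical JSON encoding of the genesis document into a DID."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:24]
    return f"did:example:{digest}"

genesis = {"primaryKey": "pk-original", "recoveryKey": "rk-original"}
did = did_from_genesis(genesis)

# A later rotation produces a new *current* document, but the DID stays
# pinned to the genesis document's hash.
current_doc = {"primaryKey": "pk-rotated", "recoveryKey": "rk-original"}
```

This is also why constraint 3 follows: once the DID is a hash of the first document, you need an auditable history linking that document to the current one.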
  252. the DID spec support lots of methods and then we choose which to enable. I think we'd like to have one that you can self-host on a website, if possible
  253. cactusneedle joined the room
  254. @geoah:nimona.io
    pfrazee: Weird and very specific question on the Unnamed DID Consortium: If I'm understanding this right, this is storing only the latest DID Doc right? Is there any consideration to support more complex DID methods such as KERI where it might make sense to hold on to the multiple "events" that make up the history of the DID?
  255. pfrazee

    In reply to this message

    I need to read up on KERI but the nameless consortium will probably retain history
  256. that's ultimately a question of costs (state growth) and mechanics (does the consortium's shared append-only log even reasonably support compactions)
  257. @geoah:nimona.io

    pfrazee: KERI is from what I can tell a kind of snowflake as it basically creates a "micro-ledger" (their words not mine hehe) for each DID in order to deal with key rotation etc.
    That means that the UDIDC (unnamed blah blah) would need to keep not the document and its history per se, but the events that make it up.
    I was mainly wondering if you had considered it to be honest, nothing more, thanks. :)

    ps. It's pretty nice as a concept, there are some pretty interesting presentations from the authors you might enjoy. ie https://www.youtube.com/watch?v=izNZ20XSXR0

  258. pfrazee
    creating a per-document ledger isn't crazy afaict, just need to build your consensus algorithm around that approach
  259. I'll give that a watch
  260. sunnl

    In reply to this message

    1 - If the consortium has key rotation ability then can it impersonate any user?
    2 - what is meant by "stable" in this context? Do you ensure the same human / company / bot always has the same DID?
  261. pfrazee

    In reply to this message

    1 - the consortium is run by multiple orgs and will have a consensus algorithm that runs validation, meaning you'd need multiple member-orgs to collude to issue false creds. The purpose of the log is also to enable external auditing
    2 - yes, ideally the DID should change very rarely because it is the canonical internet-wide ID for a user
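The collusion point in answer 1 reduces to a quorum rule. A toy model, with invented org names and threshold (the real consensus algorithm is unspecified in the source):

```python
# Toy quorum check: an update is accepted only if enough distinct member
# orgs independently validate it, so a single rogue org cannot issue
# false credentials. MEMBERS and QUORUM are invented for illustration.

MEMBERS = {"org-a", "org-b", "org-c", "org-d", "org-e"}
QUORUM = 3  # hypothetical threshold: majority of five members

def accepted(approvals):
    """Return True if a quorum of *member* orgs approved the update."""
    valid = approvals & MEMBERS     # ignore "approvals" from non-members
    return len(valid) >= QUORUM
```

Under this model, impersonation requires at least `QUORUM` member orgs to collude, and the shared log lets outsiders detect it after the fact.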
  262. triskellion joined the room
  263. sunnl

    In reply to this message

    Would that imply that on key rotation the user has to identify itself (provide government ID?) to multiple orgs? How can anyone audit that the consortium is not colluding to impersonate if the only person that has proof of the event was stripped of their online identity and ability to publish?
  264. pfrazee

    In reply to this message

    the consortium isn't proving real ID or anything. It's a database of DID -> DID Document, and DID Documents only contain public keys, your human-readable username, and a list of services you can be found on
  265. (the human-readable username isn't confirmed by the consortium though so that's more of a hint)
  266. you audit the log in the same way you audit certificate issuance for CT. You watch for updates to your DID Document that you didn't authorize
  267. also worth pointing out that the DID Document can only be modified by a signature from your primary or recovery key, so if the consortium issued a change without a valid signature it would be noticeably invalid
    (edited)
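A minimal sketch of what these two points imply, with illustrative (not spec) field names: a DID Document holding the declared keys, and the validity check any consortium member or external auditor could run against an update's signature:

```typescript
// Hypothetical shape of the consortium-held DID Document as described above:
// public keys, a human-readable username hint, and a list of services.
// Field names are illustrative, not the actual ADX spec.
interface DidDocument {
  id: string;          // e.g. "did:example:abc123"
  primaryKey: string;  // public key that signs normal updates
  recoveryKey: string; // public key used when the primary is lost
  username: string;    // human-readable hint, not attested by the consortium
  services: string[];  // hosts where the user's repo can be found
}

// The consortium (or any auditor watching the log) accepts an update only if
// it was signed by a key already on record, so an unauthorized change is
// noticeably invalid.
function isAuthorizedUpdate(current: DidDocument, signedBy: string): boolean {
  return signedBy === current.primaryKey || signedBy === current.recoveryKey;
}
```

The real check would verify a cryptographic signature rather than compare key identifiers, but the authorization rule is the same.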
  268. mccammon joined the room
  269. sunnl
    I think I might be misunderstanding what you meant by key rotation then. If you need a signature to modify the consortium-held DID Document, what key is used to sign the modification?
  270. pfrazee
    if the key rotation is to deal with primary key loss, you use the recovery key. Both of them are keypairs declared in your DID Document. If you lose both the primary and recovery, you're out of luck
    (edited)
  271. sunnl
    Got it, thanks!
  272. pfrazee
    you bet! It has a lot of nuances, hope that clears it up a bit
  273. ratkins
    If the reason for the DID is "people aren't competent to manage their own keys", that sounds incompatible with "people need to supply their key to make a change to their DID". What am I missing?
  274. (Or is it that some DID consortium participants might manage keys on behalf of their users, who identify via other means?)
  275. Aaron Goldman

    In reply to this message

    KERI would work if you had the DID document in the repo and wanted to verify the repo. The DID document can point the client at the witnesses to find bounded-staleness key validity. It is less clear that, given a DID, you will be able to find the document or the repo. If I have a resolvable DID then I can go from did -> did document#services -> repo host. If the DID is not universally resolvable, it is not clear what we would want to do. This would need to be answered for did:key, did:pkh, did:peer, and I think did:keri
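The resolution chain Aaron describes could be sketched as follows; the resolver callback stands in for whatever method-specific lookup applies (consortium query, KERI witnesses, etc.), and all names are illustrative:

```typescript
// Sketch of the chain: did -> did document -> document#services -> repo host.
// A DID whose method is not universally resolvable simply yields undefined.
interface ServiceEntry { type: string; endpoint: string }
type Resolve = (did: string) => { services: ServiceEntry[] } | undefined;

function findRepoHost(did: string, resolve: Resolve): string | undefined {
  const doc = resolve(did); // step 1: did -> did document (may fail)
  // step 2: document#services -> repo host
  return doc?.services.find((s) => s.type === "repoHost")?.endpoint;
}
```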
  276. pfrazee

    In reply to this message

    the assertion isn't "people cant manage their own keys," it's that "key rotations are necessary because keys get lost or compromised"
  277. ratkins
    People can't manage their own keys tho 😅
  278. pfrazee
    the key management layer lives separately (I'm gonna say it: orthogonally) to the DID consortium
  279. and for that, we're looking at both custodial and non-custodial systems -- to match the user's comfort level -- and are going to try to include all the typical mechanisms people need for recovery
  280. whyrusleeping
    Also one really nice thing is that most people have one or more devices with HSMs (like iphones) that can store backup keys
  281. One thought is that you can have users set multiple backup keys by enrolling their devices
  282. ratkins
    Right, so you can outsource your key management to someone you can call up and use some kind of out-of-band method to identify yourself when you drop your phone in the toilet.
  283. pfrazee
  284. Aaron Goldman

    In reply to this message

    I think of it as: key rotations enable use cases. I could choose to use a custodian to manage my keys for me. Then one day I either want to switch custodians or move to managing with my own wallet. I can do a key rotation and know it is final. I have moved control of my DID in a way that the old custodian can't reverse
  285. ratkins
    (As an aside, if Bluesky can solve this and only this key management problem for the world it's a massive win and incredibly valuable. The messaging system on top is gravy.)
  286. Aaron Goldman
    I expect a world where most users use a custodian like a LastPass or 1Password. Each time you get a new device, they use an existing device to authorize the new device. Almost all signing will be done with these delegated keys. The trunk keys would only be needed to bootstrap in an event where you have no device and prove your ID to the custodian to add a first device.
  287. others will want the only way to add a device to be via an existing one. They need to be careful not to lose their keys
  288. dev_phantom joined the room
  289. @ozanozdil:matrix.org joined the room
  290. @timbray:matrix.org
    Google is pushing hard in the direction of device-to-device bridging. Saw an announcement this morning that they're going to push to have more websites switch from passwords to wake-up-your-phone-and-say-OK.
  291. ratkins
    Apple are chasing something similar aren't they? I can't help but think that if either of their passwordless login systems are implemented as well as most other web standards, they'll be pretty much useless.
  292. @timbray:matrix.org
    I have quite a bit of experience with the Google thing, because they've been using it for gmail logins for a while. My Android phone wakes up and says "trying to log in?" with a bit of geographic context and so on, tap on yes (after unlocking your phone, obvs) and you're in. Really quite seductive.
  293. ratkins
    But it'll only work really well with Google websites and Android phones. And Apple's thing will only work well with iPhones and Apple websites (hahaha no it won't, Apple's websites are shite, it won't even work well with those.) And the long tail of everything else will still use plain email address/password combos -- with the sites breaking password managers either by ignorance or malice. And it's just one more thing to remember to have to use.
  294. @timbray:matrix.org
    Well, there are a whole lot of sites out there already doing "Sign in with Google" so I imagine G will be pressuring them to adopt this. I spent 2012-2014 in the Google Identity group, I used to be one of those people doing the squeezing 😉
  295. ratkins

    That will surely be true, but forgive me: what it is convenient or profitable for Google for "the web" to be, is probably not what the web should be.

    I mean, of course, yes, Sign In With Google will use Google's thing and it'll mostly be fine, but the iOS app will be ugly and annoying and not work like a proper iOS app, and Apple's thing will work incredibly smoothly for SIWA sites with Apple devices and be janky for everything else, and the other 70% of the web will use something else. So now I have yet another piece of cognitive load. And the problem ("I, as a user, want to not have to think about logging in to anything and still be secure") will not be solved.

    (edited)
  296. whyrusleeping
    Yeah, the device bridging thing is a really good direction
  297. sure, Google or Apple's specific implementation might be bad and proprietary, but using the user's own devices as multi-factor auth is a win
  298. @timbray:matrix.org
    Hey ratkins can't disagree but at least this is proving that the kind of identity-by-device-bridging discussed a little further up the thread is an idea that's mainstream not crazy. So it might be straightforward to convince people that your Bluesky identity is via your device, just like your Google or Apple identity.
  299. ratkins
    Sure, sorry, was getting off topic. Suffice it to say I'd be very happy if a DNS-like coalition of independent agents ran the Internet's identity layer, rather than one or other of the FAANGs.
  300. Daniel Holmgren

    In reply to this message

    to me, one of the big problems with key management right now is the notion that every key related to a user's account can do any action on their behalf, which means that every key needs to be treated with the opsec necessary for "full admin control".
    users have been handling key-like material for years now - OAuth tokens - they just don't realize it! This is one of the big benefits of our use of UCANs: in most situations, a user doesn't need a hardware wallet or HSM, or fancy key management solution. They need the private key equivalent of an OAuth token
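A rough sketch of that scoped-token idea (inspired by, but not implementing, the UCAN spec; all names here are hypothetical):

```typescript
// A delegated token carries only narrow capabilities and an expiry, so
// losing it never means losing "full admin control" of the account.
interface Capability { resource: string; action: string }
interface DelegatedToken { capabilities: Capability[]; expiresAt: number }

function canPerform(
  token: DelegatedToken,
  resource: string,
  action: string,
  now: number,
): boolean {
  if (now > token.expiresAt) return false; // expired tokens grant nothing
  // only the explicitly delegated (resource, action) pairs are permitted
  return token.capabilities.some(
    (c) => c.resource === resource && c.action === action,
  );
}
```

A real UCAN is a signed, chainable credential; this sketch only shows why a scoped token needs far less opsec than a root key.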
  301. pfrazee
    UCAN do it
  302. ngerakines joined the room
  303. Michael Mullins joined the room
  304. Michael Mullins
    Please be warned I am now lurking in the chat 👋
  305. whyrusleeping
  306. Michael Mullins
    Congrats on releasing ADX! It's a big milestone and I'm excited to dig in!
  307. Yuriy @html5cat joined the room
  308. Bob Wyman
    Is there any reason that a "Label" wouldn't be simply a Post which refers to some other Post and carries structured data of a type that is easily processed by Action routines? See: https://github.com/bluesky-social/adx/discussions/114
    (edited)
  309. haitao joined the room
  310. Bob Wyman set a profile picture
  311. Aaron Goldman
    labels are a type of reaction, much like a "like", "comment", "repost", or "emoji". Those reactions in turn are types of posts. It is not yet clear how, or even if, reactions need special treatment.
  312. @timbray:matrix.org
    One of the nice things about the Twitter API is that everything is a tweet. The Facebook and Reddit APIs, on the other hand, require you to know whether something is a post or a reaction or an image or, oh I forget. The point is that there is no chance of devising a universal hierarchy of kinds of things, so I would strongly recommend adopting the Twitter approach, except call them posts or something rather than tweets. A Like, then, is an attribute of a post.
    (edited)
  313. Bob Wyman

    In reply to this message

    So, does that mean that I can do with a "Label" all the same things I can do with a Post or comment? For instance, can I label a label? Can I react to a Label with a comment, etc.?
  314. If one wants to provide something like a "Count of Likes" along with a Post, or some other summary or aggregation of reactions, who does the aggregation and how is the aggregation represented or transmitted to the client for display? (Note: On a Post with 1 million likes, I don't want the client to count them all. Also, I don't want a new Post giving the count since that will change every few seconds...)
  315. Aaron Goldman

    In reply to this message

    as awkward as liking the fact that someone liked something, or labeling a label, the model is cleaner if it is allowed. Is there a reason to prevent this? It is possible that you might want to trust friends-of-friends' labels unless they are labeled by a friend as not a good labeler ¯\_(ツ)_/¯
  316. Bob Wyman

    In reply to this message

    When you said: "unless they [friends] are labeled by a friend as not a good labeler" does that mean that I will be able to label people as well as Posts?
  317. Bob Wyman
    @timbray:matrix.org: , when you said "A Like, then, is an attribute of a post" do you mean it is an attribute of the Post which is liked or that the Like itself is a Post that has a "Like" attribute? I prefer the second option. (i.e. A Like is a Post with a "Like" Attribute.)
    (edited)
  318. @timbray:matrix.org
    No quick answer. Hmm, if a Like is a Post, then probably you could Like a Like.
  319. Bob Wyman

    In reply to this message

    Yes, you could like a like, or dislike it, or even write a bunch of prose describing your deep analysis of the like and positioning it within the context of all other likes created by the like's author...
    (edited)
  320. @timbray:matrix.org
    I guess the tl;dr of what I'm saying is "The Reddit API is an anti-pattern, let's make sure Bluesky devs don't have to jump through those kinds of hoops". And I get nervous when a Post is anything but just a Post. And I think a content-less upvote is a native noun of the social-media ecosystem and should have a natural representation, and I'm not sure that that is as a Post.
  321. Bob Wyman
    @timbray:matrix.org: I understand your concern. What I'm wondering is: If I can do filtering and moderation as well as "like" as a label, can I filter and count instances of labels, such as labels which are Likes? Or, if a post can have a Label, how do I get the count of all labels that meet my filtering rules? If I want to filter you out of my stream, I also don't want to see your labels influencing my label aggregation or counting functions.
  322. @timbray:matrix.org
    Hm, I certainly want to be able to query posts by criteria which include associated Likes and their counts and sources.
  323. whyrusleeping
    A like is not a post
  324. "Interactions" are a separate abstraction in the system
  325. Bob Wyman
    whyrusleeping: But, Aaron Goldman said above that "likes" and "comments" are both "reactions." Is a "reaction" different from an "interaction?" It seems to me that a comment would be just a Post that refers to another Post. If not, how does a comment differ from a Post?
    (edited)
  326. whyrusleeping
    Likes and comments would be treated differently
  327. but the system is very flexible, we can define the schemas however we want to, and then its up to the indexers from there
  328. Bob Wyman
    whyrusleeping: Why are likes and comments treated differently? Are Likes labels? Can I comment on a like? Can I label a like? Can I count Likes given my own content filtering algorithms? Can I comment on, like, label, or count labels?
    (edited)
  329. @geoah:nimona.io
    Bob Wyman: What is the benefit of treating different concepts such as post/comment/like the same?
    If it's just for querying this can surely be left to whatever is doing the indexing and querying rather than influencing the underlying data structures right?
  330. Bob Wyman
    @geoah:nimona.io: I think treating as much as possible in a consistent manner produces a cleaner, more easily understood architecture even though I understand that doing so might introduce some implementation complexities. In this case, treating metadata (likes and labels) in the same way as data means that we'll have the same conversational power over metadata that we have over data (i.e. Posts.) I think this is valuable since it is clear that the distinction between metadata and data is often one of perspective. What is metadata given one perspective is data given a different perspective. It would be useful and valuable to be able to "dislike" a label or even discuss that label. Similarly, it is valuable to be able to label a label. (i.e. I'd like to label some label of yours as being either good or bad or credible, etc.)
    (edited)
  331. @geoah:nimona.io: I should also say that I tend to view much of what is handled by separate facilities in today's social networks as all just different kinds of annotations.
  332. testman joined the room
  333. Aaron Goldman
    I think the question of whether reactions can be kept generic, with a Safety Label as a specific use case, will come down to finding an acceptable solution to the materialized view problem.
  334. the "like" count should probably not require downloading the full set of likes and then counting them.
  335. sum(like) for like in posts
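One hedged sketch of that materialized-view approach: an indexer folds like/unlike events into a running per-post count, so no client ever downloads the full set of likes just to display a number (event shape is invented for illustration):

```typescript
// An indexer maintains the "sum(like) for like in posts" view incrementally:
// each like adds 1, each un-like subtracts 1, keyed by post ID.
interface LikeEvent { post: string; delta: 1 | -1 }

function materializeLikeCounts(events: LikeEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.post, (counts.get(e.post) ?? 0) + e.delta);
  }
  return counts;
}
```

The open question in the thread is who runs this fold and how clients trust its output, not the fold itself.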
  336. Bob Wyman
    Aaron Goldman: My hope is that even if the materialized view problem can't be solved well, the architecture would show all these things as being equal, and any necessary departure from that architecture would be a recognized limitation of the implementation. If there is a clean, consistent architecture, then we may find that methods are discovered at some later time that allow an implementation without the limitation. Ideally, we'll see many implementations over many decades of what should be a reasonably unchanging architecture. It would be unfortunate to allow today's implementation issues to define tomorrow's architecture.
  337. @geoah:nimona.io
    Bob Wyman: Treating everything the same way could result in some fun "post < comment < like < label (on like) < like (on like label) < comment (on like label like)" nesting.
    Unless you specify strict relationships between those I feel that the user experience could suffer, and it would also make it very annoying to query/process.
  338. moonrocket16 joined the room
  339. Bob Wyman
    @geoah:nimona.io: Yes, it is possible that such consistency in treating things could produce problems that we've yet to address in existing systems. But, that alone isn't a reason not to do this since I think we'll also find that such architectural consistency will enable our doing things which are very valuable. I suggest that it would be best to consider the possibility and only reject it, or accept an implementation limitation, if any problems it causes turn out to be intractable or if the result is somehow illogical.
  340. Daniel Holmgren

    but the system is very flexible, we can define the schemas however we want to, and then its up to the indexers from there

    as whyrusleeping said here, the schema system is very flexible. Different apps should be able to define different schemas in different manners. So at the abstract level: Posts, likes, comments, labels, etc are all the same sort of object - a content-addressed document that adheres to a schema published on the schema network. Some schema systems will likely allow the (somewhat) ridiculous chain of "post < comment < like < label < like < comment" that @geoah:nimona.io points out. Others will be a bit more restrained in the sort of interactions they allow.

    My opinion is we should call things what they are: Likes aren't posts. Labels aren't posts. But likes, posts, and labels are all social media documents that do share many things in common. So let's handle the shared abstraction at one level, but don't introduce complexity at the application layer by pretending that every object is the same "type" of thing. Currently, we split up the documents in a namespace by "posts" & "interactions". As Why said "A like is not a post", & as Aaron Goldman said "It is not yet clear how or even if reactions need special treatment." At the moment we are giving them special treatment, but we've had discussions about removing that divide & treating them all as the same "type" of object

    (edited)
  341. (ah enter button is too easy to press! doing some edits 😅)
  342. @timbray:matrix.org
    I mean, we're not building infrastructure to handle an entirely abstract set of networked resources. If we were, we could invite TimBL over to explain the Semantic Web earnestly at great length. This whole online conversation has had a couple decades to develop a nice familiar set of noun and verb semantics, and the shape of the API should not differ dramatically from the mental image held by a reasonably intelligent participant in online conversations.
  343. whyrusleeping
    Sure, and we can build the specific thing inside of ADX, but the nice thing is that we can abstract however we want, and still be compatible
  344. the main concept here is the separation of the data and authority layer from the application and indexing layer
  345. Bob Wyman
    @timbray:matrix.org: You're arguing for path-dependence and merely replicating what exists? I think we need to recognize that what is in people's mental images is largely limited to what they have seen before. That doesn't mean that what they've seen is all there could be or all that they would want to see. We've got an opportunity here to learn from the past while exploring new directions. I hope we'll do more than merely codify what we've seen before. (In any case, Tim, you, and I, and others have been changing people's mental images for going on 40 years now... Why stop now?)
    (edited)
  346. Bob Wyman
    Message deleted
  347. Jenkijo joined the room
  348. hexbang joined the room
  349. bayhican415 joined the room
  350. Taylor Ikari joined the room
  351. fi hunb joined the room
  352. nfk joined the room
  353. ricvolpe joined the room
  354. @davidprieto:envs.net joined the room
  355. @davidprieto:envs.net
    Hi everyone
  356. whyrusleeping
    Hello!
  357. ratkins
    The danger with attempting to over-specify and over-classify and model every little thing is that complexity will kill you. The "S" in SMTP stands for "Simple" and it was once, and it's why email is still around. Make the specified protocol stuff as simple as possible and people will build (-- be able to build) useful application-layer abstractions on top of it.
  358. (Without knowing that much about the state of the art, I'd have all the messages be the same, with some flag indicating "control message", indicating to the indexers that the body should be parsed as some agreed-upon format containing likes, tags, deletions, etc.)
  359. Aaron Goldman
    There's value in keeping things simple and flexible, but we get the most interoperability when we specify things. JSON is a subset of UTF-8 encoded Unicode, but it feels more flexible than a raw string. The spec meant being able to parse it with a common reader and treat it as maps, lists, strings, numbers, booleans, and nulls. Feels much more powerful. A common structure for posts that lets them be maps with specific keys to indicate concepts like "replaces", "responds to", "is caused by" could make a system feel more flexible even though you took away choice by reserving some keys
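Aaron's reserved-keys idea might look something like this sketch (the key names are made up for illustration, not proposed spec):

```typescript
// A post stays an open map, but a few well-known relationship keys get
// uniform treatment by every indexer, while all other keys stay free-form.
const RESERVED_KEYS = new Set(["replaces", "respondsTo", "causedBy"]);

function splitPost(post: Record<string, unknown>): {
  relations: Record<string, unknown>; // reserved keys: uniform semantics
  payload: Record<string, unknown>;   // everything else: app-defined
} {
  const relations: Record<string, unknown> = {};
  const payload: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(post)) {
    (RESERVED_KEYS.has(key) ? relations : payload)[key] = value;
  }
  return { relations, payload };
}
```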
  360. sheeple000 joined the room
  361. barisyilmaz joined the room
  362. yip yuanchao joined the room
  363. yip yuanchao set a profile picture
  364. @lee_baker:matrix.org joined the room
  365. dego11 joined the room
  366. Manuel Etchegaray joined the room
  367. Manwรซ changed their display name to Manuel Etchegaray
  368. Manuel Etchegaray set a profile picture
  369. Manuel Etchegaray
    Hi everyone! Congrats on the ADX release! Super excited to begin playing with this
  370. Manuel Etchegaray
    I'm just reading through DIDs and trying to wrap my head around it (it's a lot!), but if I might ask, is there any specific pro of using them over some more battle-tested web3 public/private key management methods like the ETH spec (using wallets)? Am I right to assume the goal of DIDs is only providing authentication, or is there more to it in Bluesky?
  371. @patricioc joined the room
  372. Aaron Goldman
    The pro of DIDs is that they keep the identifiers out of each other's namespace. We could decide to support did:ethr, did:ens, or did:pkh:eth for ETH keys on the main net, ENS short names, or eth keys that are in wallets but are not tied to the eth network, respectively. We are able to defer a lot of design decisions by having the DID method namespace keep identifiers with different resolution methods separate and unambiguous.
  373. @wclayferguson:matrix.org joined the room
  374. @wclayferguson:matrix.org
    Hi guys. Glad to see you have a github project up!
  375. ultraman joined the room
  376. pfrazee
    On the ontology question, I just want to point out that flexibility and evolvability are inherently good, but we have to include tools for devs to coordinate with each other. If you can't predict what your changes will do, you end up effectively unable to extend the schemas
    (edited)
  377. In SSB, we had a "it's just JSON with a type string" philosophy. A few patterns, like links between users, were auto indexed and thus queryable. Very flexible, didn't want to constrain the system.
  378. What happened was, you would add a new field to an existing schema and if that field wasn't understood by other clients, it was a disaster
  379. The common example of this is, I used EmbeddedMedia and you used MediaEmbeds and now our users dont see each others cat pics in their tweets
  380. But it happened on just about everything. What format is the rich text, HTML or markdown? Users see raw markup rather than rich text. What happens if I add a field saying this belongs under a topic? People who don't support topics see those subtopic tweets in their main feed, which is not what the author intended
  381. So as app developers, we got totally jammed up because we couldn't tell how our extensions would get interpreted. There wasn't any forum to discuss the schemas, there wasn't any tooling that would warn you about incompatibilities, there wasn't any mechanism for adding an extension "safely"
  382. So whatever we end up doing for schemas, we have to find the right balance of flexibility, ease of dev use, and tools that help us coordinate with each other. We've been discussing it a while and still need to do more rounds with the whiteboard
  383. Bob Wyman
    pfrazee: In HTTP, this issue, or at least part of it, is supposed to be handled by providing ACCEPT headers that describe the capabilities of the receiving system. Would a variant of that approach be useful here?
  384. pfrazee
    Yes! In fact, one of the ideas on our list is something we call Schema Negotiation
  385. In which a record will basically declare the schemas its using and indicate how to handle a lack of support
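A hypothetical sketch of such schema negotiation; the field names and fallback values are invented, not from the ADX design:

```typescript
// A record names the schemas it uses and declares what a reader lacking
// support should do with it, so extensions degrade predictably.
interface NegotiatedRecord {
  schemas: string[];                       // schema IDs the record relies on
  onUnsupported: "render-anyway" | "hide"; // declared fallback behavior
  body: unknown;
}

function shouldRender(record: NegotiatedRecord, supported: Set<string>): boolean {
  const allSupported = record.schemas.every((s) => supported.has(s));
  // fully supported records always render; otherwise honor the fallback
  return allSupported || record.onUnsupported === "render-anyway";
}
```

This is the opposite of the SSB failure mode described above: the record itself says what unsupporting clients should do, instead of every client guessing.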
  386. Bob Wyman
    pfrazee: Of course, there is a problem in determining whose "ACCEPT" constraints matter. It may be that an intermediary indexing system doesn't know how to index some experimental or unusual mime type, but at least some clients do. Thus, the ACCEPT constraints of the channel intermediaries and those of the clients may differ. Also, it is likely that clients will differ between themselves.
  387. pfrazee
    True
  388. Thib joined the room
  389. @geoah:nimona.io
    Would the "mime types" in this case be something human readable (ie reaction/slap, reaction/like) that will require people to register the schemas against a registry, or would it just be the CID of the schema?
  390. Bob Wyman
    pfrazee: The obvious thing to do is say that intermediaries can ignore what they don't understand and simply pass on whatever they see. But, such a flat rule could cause issues. I might, for instance, decide to do a denial of service attack by sending out large numbers of messages with unsupported content and thus hog bandwidth, processing, etc. Or, I could simply write an app that has large messages that are only intended for a small number of people. (I'll give example later) If the general rule is to pass on unsupported data, are there reasonable rules that can be employed to avoid abuse?
  391. pfrazee

    In reply to this message

    Not sure yet
  392. @geoah:nimona.io
    pfrazee: sorry this was just me wondering mostly, didn't expect an answer as I guess it's way too early for any of this :D
  393. Bob Wyman
    pfrazee: Example of large messages processed by a general system but intended to be received or "read" by only one or a few recipients: Back in the early 80's, when we first deployed email at DEC with ALL-IN-1, we found a customer complaining about email latency. After much investigation, it turned out that one of the sysops was sending daily system backups to "headquarters" through email, since we supported "store and forward," which was more often successful than trying to do a file copy of a backup tape over intermittently connected systems. Most of the email system's bandwidth was consumed by messages having backup tapes as attachments... Should ADX support folk sending backups to some single client?
  394. pfrazee
    Huh. I'm not sure I could answer that at this stage
  395. @timbray:matrix.org
    on "mime types", @geoah:nimona.io raises an interesting point: In HTTP, request/response bodies are not expected to be self-identifying, that's what Content-type is for. In much message-based infrastructure, they are self-identifying, for example look at the Avro wire format used on Kafka (4-byte int selects the schema) or the "detail-type" field in AWS Events. My initial guess is that Bluesky is more Web-like than Event-bus-like but haven't really been in any conversations around that.
    (edited)
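The Avro-style self-identifying framing Tim mentions can be sketched like this: a 4-byte big-endian schema ID prefixes the payload, so the message itself says which schema applies, with no out-of-band Content-Type (the real Kafka wire format has additional details, such as a leading magic byte):

```typescript
// Prefix a payload with a 4-byte big-endian schema ID.
function frame(schemaId: number, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(4 + payload.length);
  new DataView(out.buffer).setUint32(0, schemaId); // big-endian by default
  out.set(payload, 4);
  return out;
}

// A reader recovers the schema ID directly from the message bytes.
function readSchemaId(message: Uint8Array): number {
  return new DataView(message.buffer, message.byteOffset).getUint32(0);
}
```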
  396. Bob Wyman
    @timbray:matrix.org: I really hope that BlueSky is more event-bus-like than Web-like...
  397. aejaz joined the room
  398. Aaron Goldman

    In reply to this message

    definitely early to answer something like that, but one can imagine someone trying to share a large scientific dataset or large log file, and the indexers very much not wanting to full-text index that thing. If you think about it in an IPFS-like way, you can set a maximum blob size on the blob store, and any object larger than that is broken into segments in a merkle tree. Since most people don't care about that data set, it is unlikely to be in any of the content caches and you would need to get it from the home host.
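A minimal sketch of the segmentation Aaron describes, assuming a fixed maximum segment size (the merkle-tree linking of the segments is elided):

```typescript
// Split any blob over the store's maximum size into fixed-size segments,
// which a merkle tree would then link together, IPFS-style.
function segmentBlob(blob: Uint8Array, maxSegment: number): Uint8Array[] {
  const segments: Uint8Array[] = [];
  for (let offset = 0; offset < blob.length; offset += maxSegment) {
    segments.push(blob.slice(offset, offset + maxSegment)); // slice clamps at end
  }
  return segments;
}
```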
  399. @timbray:matrix.org
    Not sure how applicable this is, but: At AWS all of the services (SQS, Lambda, Kinesis, etc etc etc) had message size limits. These ranged from 32K to a few MB. No matter how big we made them people wanted more. So the only good solution to the problem was, for large messages, dump them into S3 and send a pointer through the service. There were open-source API add-ons that would hide this from the user and make it look like the big message was flowing through the service.
  400. loadedpunk joined the room
  401. @timbray:matrix.org
    (max size of an S3 object was I think 50G last time I checked)
  402. Bob Wyman
    @timbray:matrix.org: Yup, makes sense. However, if there is some idea of "supported" vs. "unsupported" data, you might want to allow different size limits for each.
  403. Aaron Goldman

    In reply to this message

    Is this a necessary duality? If the data is stored both sorted by time in a transaction log and indexed by entity, attribute, value you could follow it like a firehose or search it like the web. no?
  404. Bob Wyman
    @timbray:matrix.org: Do you really want to start up the "push" vs "pull" debate? I strongly suggest that the system should be capable of supporting WebSub-style subscriptions at some point, hopefully early, rather than relying on repeated polling. (See https://www.w3.org/TR/websub/ ) Of course, having spent so much time on prospective search and PubSub, one would expect such a comment from me.... 😀
  405. Aaron Goldman

    In reply to this message

    I'm just saying that the max object size may be much larger than the max segment size. I may make a 50 GiB object in S3, but it is still storing and retrieving 64 MiB blocks from the storage nodes.
  406. @timbray:matrix.org
    Sure. But it'd be nice to hide that from developers, present an illusion that you can move around arbitrarily large messages.
  407. Bob Wyman
    @timbray:matrix.org: I would expect client developers to want the ability to "page" in large objects -- either because they are on limited-memory devices, or because they only want to show the "first part" of some object to a user without downloading the whole thing. So, we may find that developers are happy to deal with assembling parts of objects. Even so, those developers should not be exposed to the service's decisions concerning the division of objects into parts. (i.e. The service might break things up into 10MB chunks, but the clients might want it in 1MB chunks and should not care or even know what the service wants or does for its own purposes.)
    (edited)
  408. Bob Wyman
    Is it assumed that all content is "READ:ALL"? i.e. That all content that flows through ADX can be read by all clients? The Architecture doesn't mention access-control. Is this an indication of intent, or, will access-control be dealt with later? (i.e. Can I make a Post that can only be read by some subset of all users or attach a Label that can only be seen by some subset?)
  409. whyrusleeping
    alright, this conversation feels like it's drifting a bit far away from the topic
  410. Let's try and keep things a bit closer to what's more near-term and relevant to ADX
  411. Bob Wyman
    whyrusleeping: Sorry. I've been trying to ask questions that I think are relevant. Where did I go wrong? Or, what's a better place to ask these questions?
  412. whyrusleeping
    It's all good, It just feels a bit like we are going down a rabbit hole is all and I want to direct things away from those. Not exactly sure how the problems of passing large messages around are exactly relevant to where we're at right now
  413. that said, i'm happy to be convinced that it is relevant
  414. pfrazee
    personally I was okay with it. We're still finding our feet on moderating the channel -- the discord flew out of control on us
  415. In reply to this message

    That's currently the case, but I have a TODO in my head to discuss this at some point
  416. we're scoping v1 to be "public conversation" and dont want scope creep to hit us, but there are 2 things that we have to consider
    1. how will ACLs work if/when we decide to do it,
    2. there's private state that you need even for public conversation, simple things like read/unread state on notifications
  417. @wclayferguson:matrix.org
    Regarding the ACCEPT headers, that applies to call/response systems, but in the IPFS world the data will be just sitting there, and things will have to know how to understand it, without the ability to request the format they want.
  418. whyrusleeping

    In reply to this message

    Yeah, self describing data definitely inverts that paradigm
  419. pfrazee
    Right yeah. Self-describing = putting the Content-Type part in the document itself
  420. And then when you read or process the record, you put the Accept header in there and metadata comes out informing you about the negotiation
  421. https://github.com/pfrazee/json-lz#jlzdetectsupportobj-schemaids this repo is more confusing than it needs to be, unfortunately, but this is an example of schema negotiation
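The flow pfrazee describes might look like this sketch; the `$type` field and `negotiate` helper are hypothetical illustrations of the idea, not json-lz's actual API:

```typescript
// Sketch: a self-describing record carries its schema id in the document
// itself; at read time the consumer supplies the schemas it accepts (its
// "Accept header") and gets negotiation metadata back. The $type field
// and negotiate() helper are hypothetical, not json-lz's actual API.

interface SelfDescribingRecord {
  $type: string; // schema id embedded in the document itself
  [key: string]: unknown;
}

interface NegotiationResult {
  supported: boolean; // does this reader understand the declared schema?
  schema: string;     // the schema the record declared
}

function negotiate(record: SelfDescribingRecord, accepted: string[]): NegotiationResult {
  return { supported: accepted.includes(record.$type), schema: record.$type };
}

// Example record: the schema id travels with the data, like an embedded
// Content-Type. The schema id here is made up.
const post: SelfDescribingRecord = {
  $type: "example.com/microblog/post",
  text: "hello world",
};
```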
  422. Bob Wyman
    pfrazee: At PubSub, Google, etc. I developed a mode of search, called "cross-matching," which is a combination of retrospective and prospective search, and which allows the search system to provide the function of ACL's as a native capability without a need for an explicit ACL system. I'd like you folk to consider it. How would I best present you with the method for consideration? (You seem to be supporting both "search" and "subscriptions," so, it seems like you already intend to develop the necessary components for a cross-matching system.)
  423. pfrazee
    That sounds quite interesting. A short writeup that explains the core concepts would be wonderful, I'd love to see it
  424. Aaron Goldman

    In reply to this message

    not clear yet at what level(s) we want access-control. probably yes at a repo level; less clear you need it at a record level. Also not clear where you would want access-control as opposed to encryption. something like Peergos BATs
  425. whyrusleeping
    I imagine at least one form of access control will be at the schema level, and involve encryption of the content in that subtree of the user store
  426. Bob Wyman
    pfrazee: Okay. I'll write it up. The basic idea is that all objects (both "queries" and "documents") are composed of two parts: Data and Constraints. For a Query, the Constraints define what you'd normally think of as the query and the data part describes the entity that issued the query (i.e. the ACL data). For a document, the Data part is the thing of most interest and the Constraints are essentially queries on the "data" parts of queries. Then, for each retrospective search, you match the constraints of the query against the data of the documents (find docs with word "foo") and then match the constraints of the documents against the data of the query (i.e. match queries with age > 16 or group = "dev"). The prospective (subscription) search is essentially the reverse. This allows you to do all the ACL stuff, as well as handle things like age-limitations, porn-filters based on labels, etc. with a single, consistent system. Basically, queries constrain which docs they match and documents constrain which queries are allowed to match them. That's a cross-match. I'll try to get you a better write-up later.
  427. pfrazee
    okay great, just jumped in a meeting, will read your overview here in a sec
  428. Bob Wyman
    whyrusleeping: It is best to try to avoid thinking of "access control" as a distinct function. If you use a cross-matching strategy, "access-control" can be naturally provided as a side-effect of search. Given that you'll need to support both retrospective (traditional search) and prospective search (subscriptions), you're already building all the needed capabilities. You just need to tie them together. Consider the example of data in a dating app. A man may seek women in NYC, but not all women in NYC are interested in men. So, a man might construct a "query" with the parts (Data = "Man", Constraint = "Women in NYC"). Processing the constraint against the data parts of people's records might produce a list of "Women in NYC"; however, the constraints for each woman are then matched against the man's data to find the cross-matches. This general pattern can be used to implement ACL as well as many, many other useful social media content filtering functions.
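Bob's cross-matching pattern, including the dating-app example, can be sketched as follows; the attribute shapes and the `crossMatch` function are invented for illustration:

```typescript
// Sketch of cross-matching: both queries and documents carry a Data part
// (attributes describing themselves) and a Constraint part (a predicate
// over the *other* side's attributes). A match requires both directions
// to succeed. All shapes here are invented for illustration.

type Attributes = Record<string, string | number>;
type Constraint = (attrs: Attributes) => boolean;

interface CrossMatchable {
  data: Attributes;       // describes this object (e.g. the ACL data)
  constraint: Constraint; // what the other side must satisfy
}

// A document matches a query only if the query's constraint accepts the
// document's data AND the document's constraint accepts the query's data.
function crossMatch(query: CrossMatchable, doc: CrossMatchable): boolean {
  return query.constraint(doc.data) && doc.constraint(query.data);
}

// The dating-app example: a man seeking women in NYC, matched against a
// profile that in turn only accepts men.
const seeker: CrossMatchable = {
  data: { gender: "man", location: "NYC" },
  constraint: (d) => d.gender === "woman" && d.location === "NYC",
};
const profile: CrossMatchable = {
  data: { gender: "woman", location: "NYC" },
  constraint: (q) => q.gender === "man",
};
```

The same shape covers ACL-style checks: a document's constraint can test `group` or `age` attributes on the query's data part, which is how access control falls out as a side-effect of search.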
  429. @wclayferguson:matrix.org

    Fancy stuff is great but hopefully the baseline least-common-denominator requirement would be something like just having in your Personal IPFS Repository stuff like this:

    12345: {
       type: "Note",
       title: "My First Post",
       content: "Hello Everyone"
    }
    

    I think if we focus on designing "types" (the properties), and don't create a ton of complexity right off the bat (at least not as a requirement) then this is going to catch on fast, and apps that can "browse" the content will spring up overnight. My platform is already poised and waiting. :)

    (edited)
  430. @wclayferguson:matrix.org
    And for those not aware: Even the existing IPFS Companion and IPFS Desktop would be able to do just the very most basic browsing of the data, more as a diagnostic tool than anything else, but I'm just stating this for people not that conversant in IPFS yet.
  431. @sutorinfo:matrix.org joined the room
  432. @sutorinfo:matrix.org left the room
  433. pfrazee

    In reply to this message

    okay this makes sense. In a way you need your queries to be composable -- you have the application's query, and then you have a permissioning query that needs to merge into it
  434. Bob Wyman
    pfrazee: The key thing is that "documents" and "queries" have essentially the same structure: (Attributes + Constraints). If the system makes that assumption, then all sorts of things become much simpler than they might otherwise be.
  435. pfrazee

    In reply to this message

    Yeah I gave that a read. I played a lot with the idea of using evolvable JSON Schemas like that, back when I worked on CTZN and Atek. Some real pros, a few cons. I do find using document-oriented schemas like that to be more intuitive than JSON-LD and RDF
  436. Bob Wyman
    pfrazee: You could think of it as "ACL on every object," but that will tend to limit your vision to the stuff that is usually done in ACL systems. (i.e. finding intersections in lists of users or groups) In a social system, the need to match is much richer: i.e. age, sex, location, labels used, etc... and we'll probably need boolean combinations of constraints on these objects (i.e. Age >18 AND location=(USA OR Europe)) Access control systems don't normally need that query power. What we do in access control systems is just a tiny subset of the uses for the general cross-matching pattern. So, given that access control can be implemented by cross-matching and since cross-matching is useful in doing other social stuff, why not just implement access control as a use of general cross-matching?
    (edited)
  437. csshsh joined the room
  438. pfrazee
    Makes sense! I'll keep that in my mental hot-cache when the topic comes up. Seems elegant and flexible
  439. whyrusleeping
    Raise your hand if you've tried out the demo code
  440. Michael Mullins
    planning on playing around with it this weekend, so not yet
  441. whyrusleeping
    i've been thinking of running the server portion somewhere publicish for people to mess around with
  442. @numero6:codelutin.com

    In reply to this message

    Observing how W3C, XMPP, SSB deal with that, I think Matrix found a good compromise between "freedom for developers to hack fast and iterate quickly on product" and "we need a spec that everybody implements for interoperability". The content types are namespaced: if you develop an app about chess, you introduce a concept of a chess move which will be named "com.mycompany.chess.mychessmovetype". The Matrix spec allows that. When the chess move becomes normalized and enters the spec, it gets its proper name "m.chess.move" (usually an "m.msc1234.move", 1234 being the N° of the Matrix spec change that introduced this content).
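The namespacing convention described above could be checked with a tiny classifier; the function and its rules are an illustration of the convention as stated in the message, not actual Matrix spec logic:

```typescript
// Sketch: classify a Matrix-style namespaced event type, following the
// convention described above ("m."-prefixed types belong to the spec,
// reversed-domain names are custom, msc-numbered names are in between).
// Illustrative only -- this is not actual Matrix spec logic.

type TypeKind = "spec" | "proposal" | "custom";

function classifyEventType(type: string): TypeKind {
  if (/^m\.msc\d+\./.test(type)) return "proposal"; // e.g. m.msc1234.move
  if (type.startsWith("m.")) return "spec";         // e.g. m.chess.move
  return "custom"; // e.g. com.mycompany.chess.mychessmovetype
}
```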
  443. Michael Mullins
    whyrusleeping: If you publish it, I'd be happy to give it a try. At the very least that might give a better feel of how things work remotely. Either way though I think right now I'd want to try to pull the whole thing down and try to get it running locally just to get a better idea of the system's complexity as a whole (since that seems to be a talking point).
    (edited)
  444. pfrazee

    In reply to this message

    Yeah their approach is included in our research. Like I mentioned, we're gonna deep dive it as a team soon and try to apply some rigor to all the options
  445. @numero6:codelutin.com

    In reply to this message

    I would be very interested if you publish something about the team's tables/pros/cons/reflections. It looks to me that this question is quite decisive for the success/failure of a protocol.
  446. pfrazee
    It's a pretty important piece, yeah. We'll take detailed notes
  447. Michael Mullins
    When people do dig into demos and code, are there any specific areas you would like us to focus on? Any specific type of feedback you're looking for?
  448. @wclayferguson:matrix.org

    In reply to this message

    I'll be writing a browser/indexer to view the data, before I start doing any posting. I'll be doing it from Java too (via the IPFS API), so I can get started once I have an example Root CID to start exploring from. Let me know once there's something out on IPFS already existing to play with.
  449. @wclayferguson:matrix.org
    Even a single repository with a single post that says "Hi everyone" is enough for me to write the crawler, because I just need to know the data format.
  450. And i can obviously reverse-engineer JSON without even having to see any spec. :)
  451. Or somebody can even send me a CAR file full of the proper data structures and objects, and I'll reverse engineer that instead.
  452. whyrusleeping
    The data isn't easily available over the public ipfs network, the data server we wrote doesn't expose bitswap/libp2p
  453. So you'll have to import a car file for inspection
  454. Daniel Holmgren
    i meant to include a note in the README (i'll add it in). you can browse CAR files at https://explore.ipld.io/
  455. to get the CAR file containing your entire repo just run yarn cli export
    fun for poking around the actual repo structure 🙂
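For anyone poking at the exported file by hand: a CARv1 file is a sequence of length-prefixed sections, where each length is an unsigned LEB128 varint. A minimal varint decoder, enough to walk those prefixes, might look like this sketch:

```typescript
// Sketch: decode the unsigned LEB128 varints that prefix each section of
// a CARv1 file (first a DAG-CBOR header with the roots, then CID + block
// pairs). Not a full CAR parser -- just enough to step over the length
// prefixes when inspecting an export by hand.

function readVarint(buf: Uint8Array, offset: number): { value: number; bytesRead: number } {
  let value = 0;
  let shift = 0;
  let bytesRead = 0;
  for (;;) {
    if (offset + bytesRead >= buf.length) throw new Error("truncated varint");
    const byte = buf[offset + bytesRead];
    bytesRead += 1;
    value += (byte & 0x7f) * 2 ** shift; // multiply, not <<, to avoid 32-bit overflow
    if ((byte & 0x80) === 0) return { value, bytesRead };
    shift += 7;
  }
}
```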
  457. @wclayferguson:matrix.org
    I'm slightly confused then. It doesn't sound like the data will be available on IPFS, is that right? Wouldn't we just want the Repository Roots to be available using IPFS?
  458. I go a step further too and say the root needs to be IPNS named (optionally).
    (edited)
  459. Isn't IPNS absolutely the perfect fit for this use case? I mean it's what it was designed for.
  460. @wclayferguson:matrix.org
    IPNS, however, is of course just a "nice to have" thing, while I think we'd be 100% unanimous if we took a vote about the IPFS part.
  461. Aaron Goldman

    In reply to this message

    It's never unanimous 😞
  462. @wclayferguson:matrix.org
    Syncing a CAR to IPFS is trivial, so I'll be doing it. No one else needs to. It just seems bizarre not to.
  463. How is a Post available via CID if not thru IPFS? lol.
  464. "Protocols over Platforms"
  465. Aaron Goldman
    It is nice to be able to use IPFS but if an organization wants to host the repo a different way that's fine
  466. @wclayferguson:matrix.org
    The way I read the spec it was IPFS storage.
  467. I didn't see the IPFS part as "one type of plugin" or something like that.
  468. seemed completely integral to me in the doc.
  469. Aaron Goldman
    Certainly a cid-keyed store. I don't know whether a cid-keyed store over http, s3, GCS, ... counts as IPFS
  470. @wclayferguson:matrix.org
    It's definitely a good feature to have someone's entire personal archive in one single file and I've actually been pushing that for years myself, so that's good. After thinking about it more I like it. Those archives can be unzipped directly onto IPFS as needed.
  471. CAR is of course just the import/export format for IPFS, like a ZIP file kinda thing. It's good.
  472. But there should definitely be an example CAR file sitting in the github imo.
  473. whyrusleeping
    Having the project start with everything baked into ipfs causes a lot of concerns around the actual scalability of the project. Using the DHT and bitswap for the entirety of a social graph isn't really tenable, so we are designing things with 'twitter scale' in mind
  474. For now, this means using ipld data structures in a way that allows portability, but having the data be moved through data servers using potentially custom protocols (maybe server<>server makes sense to use bitswap)
  475. but announcing billions of tweets through the DHT is just not tenable
  476. You can easily have your own implementation of the bluesky dataserver that makes content available via dht/bitswap, and we actually want to see people doing that
  477. (I actually plan on building that into the go implementation as an optional flag ;) )
  478. arcalinea
    Message deleted
  479. arcalinea
    Stepping in as a mod here -- as someone who's been watching the Discord, I've noticed a lot of bad-faith engagement from @wclayferguson:matrix.org in the past weeks, including the propagation of blatant untruths, so consider this a preemptive warning that I will not be allowing this chat to descend into harassment of the devs around design choices or other decisions.
  480. Screenshot for reference -- I understand there may have been frustration with our lack of public engagement, and there might be frustration now with design choices or other aspects of what we're doing. But our main priority as a small team is to stay focused and keep shipping code that people can someday use, so I will be aggressively moderating this chat to prevent this kind of demoralizing critique and harassment from cropping up here as it has in the Discord.
  481. Aaron joined the room
  482. @wclayferguson:matrix.org

    In reply to this message

    Right, I had considered the possibility of broadcasting each post using PubSub, was always skeptical that this would ever work at scale, and had therefore abandoned that idea in my own platform months ago. However, what had made me think all the data was going to be stored on IPFS rather than CAR files was the fact that the docs say everything is identified by a CID.
  483. CAR definitely has a performance advantage in that a zipped file can be pulled in one single HTTP request to get the entire content of a user.
  484. Daniel Holmgren
    Yup, content in both IPFS & CAR files is addressed by CIDs. I think of CAR files as one possible transport. another is bitswap + DHT. If you know the location of your data & you know what you need, CAR files are much more efficient. even if it isn't the entire content of the user & just a diff, transferring via CAR will be faster than bitswap + repeated runs to the DHT
  485. CIDs are just self-describing hashes
  486. to me, the most interesting parts of IPFS land are things such as CIDs, multiformats, IPLD, etc: tools that allow self-describing data
  487. p2p bitswap & DHT lookups are just one thing enabled by it
  488. the data is self-describing & trust is pushed down into the data layer itself. which allows for p2p interactions, but also allows for trustless interactions with servers
  489. which is what we're after :) but like why said, these two approaches aren't mutually exclusive. If the data is already in a format that abstracts away from the need for a host, the transport doesn't really matter! this enables portable data through familiar client-server transports or through more experimental p2p transports
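"CIDs are just self-describing hashes" can be made concrete: a multihash prefixes the digest with a code identifying the hash function and the digest length, so any reader can tell from the bytes themselves how the data was hashed. A sketch using Node's built-in crypto (this builds just the multihash, not a full CID):

```typescript
import { createHash } from "node:crypto";

// Sketch: build a multihash, the self-describing hash at the core of a
// CID. Layout: <hash-fn code> <digest length> <digest bytes>. sha2-256's
// code in the multicodec table is 0x12 and its digest is 32 (0x20) bytes.
// A full CID additionally prefixes a version and a content-type codec.
function multihashSha256(data: Uint8Array): Uint8Array {
  const digest = createHash("sha256").update(data).digest();
  const out = new Uint8Array(2 + digest.length);
  out[0] = 0x12;          // hash function: sha2-256
  out[1] = digest.length; // digest length: 0x20
  out.set(digest, 2);
  return out;
}
```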
  490. arcalinea
    whyrusleeping: you need to write that blog post breaking down the pieces that make up the IPFS stack and how they can be composed/decomposed
  491. whyrusleeping
    it's coming, i just need to figure out where to publish
  492. arcalinea
    Bluesky blog? :)
  493. whyrusleeping
  494. Daniel Holmgren
    adx?
  495. (jk don't do that... yet)
  496. David Chou joined the room
  497. David Chou
    Hi guys, just got to know this project. Hope I'll be able to put some code here 😃 (later)
  498. whyrusleeping
    👋
  499. Jonathan joined the room
  500. @christian:c2bo.net joined the room
  501. guijie wang joined the room
  502. Serhii Khoma joined the room
  503. NO_FLAC joined the room
  504. hereforcuriosity changed their display name to NO_FLAC
  505. NO_FLAC
    hello
  506. Is it ok to ask questions here?
  507. I'm testing the prototype adx
  508. I'm getting errors
    (edited)
  509. All I did was follow the steps shown in the repo
    (edited)
  510. Stephan joined the room
  511. Harlan Wood

    Really happy to see this first launch from this awesome team. 😎 Congrats and best wishes! I think you are uniquely positioned to unify the somewhat fragmented decentralized social network ecosystem.

    Really cool that you're using IPLD, such a powerful tech, which didn't seem ready when I tried it via JS a year ago, so great to see it is working in ADX now.

    I dropped a PR to run tests and builds in CI, about which I am passionate, as some may remember from early IPFS days.

    (edited)
  512. Adelaida (💊,⚡️) joined the room
  513. Harlan Wood

    In reply to this message

    What errors are you getting?
  514. cj joined the room
  515. Adelaida (💊,⚡️) changed their profile picture
  516. ๅˆ˜ๆŸ joined the room
  517. Adelaida changed their display name to Adelaida 🍭💊
  518. Adelaida ๐Ÿญ๐Ÿ’Š changed their display name to Adelaida (๐Ÿญ, ๐Ÿ’Š)
  519. Genbuchan joined the room
  520. Aaron Goldman

    In reply to this message

    "unify the fragmented decentralized social network ecosystem" is a phrase that hurts my head a little.
  521. Aaron Goldman

    In reply to this message

    So we know you could get an error if there are no posts and thus no root for a user. Which should be info and not error. What other errors are you seeing?
  522. i was joined the room
  523. powderpanda joined the room
  524. Harlan Wood

    In reply to this message

    To clarify, what I mean by that is: just as IPLD aspires to be a "thin waist" protocol (see e.g. https://github.com/ipfs/notes/issues/148#issuecomment-234797610 ), I believe that such a unifying layer is both possible and desirable in the decentralized social network space. And that the tech stack you have chosen (IPLD + object capability security model + DIDs) is a combination with immense promise to be able to do exactly that.
  525. whyrusleeping
    +1 to targeting being a thin waist protocol for social
  526. That's definitely the idea, i want to have a schema for long form blog posts as well as tweets, all under the same adx user store
  527. jayasimhaprasad joined the room
  528. artixlinux joined the room
  529. hidaruma joined the room
  530. io token joined the room
  531. io token
    good
  532. rkrux joined the room
  533. Henri Carnot joined the room
  534. @rimuru:gentoo.chat joined the room
  535. @richard:huangyunsong.com joined the room
  536. Gerben joined the room
  537. Nad joined the room
  538. margincall joined the room
  539. Golda Velez joined the room
  540. Golda Velez
    hey guys, we are thinking about implementing the reputation feed stuff we have so far over ADX, like parallel with microblogging. Is that encouraged? And is there a sense of defining schemas/datamodels, or do we kinda just write the code following the pattern in https://github.com/bluesky-social/adx/tree/main/common/src/microblog ? And hm, we are thinking about whether there is a way for the reputation "posts" to be consumable by other critters. More general question: if folks write a UI or an indexer or whatever, we should do it as a separate repo and include ADX as a dependency, right?
  541. Golda Velez
    do we have an IPLD datamodel schema for the posts anywhere? we were looking for it in the repo right now but we're still kinda new to all this!
  542. @bigpoppaken:matrix.org joined the room
  543. nicktorba joined the room
  544. nicktorba
    Hey everyone! Was excited to see that Bluesky is starting to share their work in public! I'm very interested in work on deso protocols, especially as it relates to my work building tweetscape: https://twitter.com/TweetscapeHQ Wanted to join the group and say hello. Does the community ever host calls or events where you can get to know others?
  545. pfrazee

    In reply to this message

    Hey Golda! We're busy in team week meetings this week so here's a quick answer: it's probably too early for that right now
  546. In reply to this message

    Hi Nick! Not yet but we may in the future
  547. Baiwu Zhang joined the room
  548. Golda Velez
    Great job on all this by the way, guys! Super well structured and impressive execution!
  549. nicktorba: the closest we have are the popup events in the dSocialCommons discord/matrix servers, we're holding a few ADX hacks - want to join those? Or if there's a kind of community event you think would be good we can host it - just the core devs are probably too busy to join most of them
  550. nicktorba

    In reply to this message

    do you have the link for those servers? I'd be interested to join the hacks! Seems like those types of things would be a good way to get caught up on who's doing what in this space
  551. arcalinea

    In reply to this message

    I would say you should work out how reputation feeds would work in a generalizable way without going to the effort of building on ADX right now, because it's still an experiment -- everything can and will change. Wouldn't want you to have to rewrite everything later
  552. @geoah:nimona.io

    In reply to this message

    if you don't mind me asking, what's the "reputation feed stuff" you're talking about? Any links/literature you could share?
  553. Michael Mullins

    Hey Bluesky team, I spent some time yesterday playing with the demo. I'm going to put together a few notes from my experience once I finish going through all the documentation. One thing I thought worth bringing up now though: I was unable to install ADX with node 15.

    Looks like the @typescript-eslint/eslint-plugin errors out if you try to use node 15. Is this a known issue? Worth creating a ticket for?

  554. Here's a clip with the install error ^
  555. Harlan Wood

    In reply to this message

    Some loose info is scattered in the reputation feed channel in this server ( https://discord.gg/qS2CDURh ). The trustgraph channel also has a cluster of links at the very beginning. Also:
    trustgraph.net
  556. pinocast joined the room
  557. Wei Duan joined the room
  558. dashus joined the room
  559. ensocoatl joined the room
  560. Daniel Holmgren

    In reply to this message

    Hey Michael, thanks for the report. I bumped the required node version up to 16 instead of 15 (in README & package.json). I'd like to eliminate our WebCrypto dependency, which would let us stop requiring >=15 (and just require stable versions, as eslint does, which is reasonable)
  561. cj
    Could the link to the demo and its source be added to the room's topic?
  562. Oh, Ok. I found the project (
    link
    ) but it would be good to link it in the room's topic for easy access
  563. marfl joined the room
  564. @jan:cloudcheck.io joined the room
  565. Michael Mullins
    Daniel Holmgren: Ok awesome, thanks for the update. I think for the purpose of a demo strict engine requirements are fine. Just wanted to make sure people weren't getting blocked on that.
  566. Mark Foster SSI: @mfoster.io

    In reply to this message

    In reply to
    arcalinea
    I would say you should work out how reputation feeds would work in a generalizable way without going to the effort of building on ADX right now, because it's still an experiment -- everything can and will change. Wouldn't want you to have to rewrite everything later
    arcalinea: Thanks for the heads up. When we were hacking away at ADX we noticed the SQLite in-memory store while we were searching for the schema. Are there plans for testing out Linked Vocabularies for distributed cross-network/domain interoperability like Activity Streams, FOAF, and
    Schema.org
    ? We noticed the mention of ActivityStreams in the indexing section. Any pointers you can give us on potential methods of extending/expanding/interoperating with ADX data sets would be very much appreciated. I could see some expanding methods using Linked Data Vocabularies. We started an OpenClaim context here: https://github.com/blueskyCommunity/aozora/blob/gvelez17-open-trust-claims/CODEYARDS/reputation_feed/OpenClaim.jsonld but paused until we could discover methods of interoperating with ADX. Thanks.
    (edited)
  567. Golda Velez

    In reply to this message

    Sure, the events are all listed at https://join.whatscookin.us/circle/dSocialCommons and they're announced every week in the discord/matrix rooms that are linked from https://dsocialcommons.org - there are at least 2 popups a week generally for hacking and discussing things, and anyone can add one
  568. while I'm being noisy in here - quick note, I'm sure whyrusleeping is deeply familiar with this - but go-ipfs >> js-ipfs in terms of reliability & stability, right? no worries as this is just an experiment - just noting that 3box had to move everything recently to the go version
  569. arcalinea

    In reply to this message

    So we haven't settled several questions around schemas yet -- currently working through them in a deep dive this week. Will circle back soon
  570. whyrusleeping

    In reply to this message

    Yes, go-ipfs is the one to use
  571. Anton Kent - Anytype joined the room
  572. letonique changed their display name to Anton Kent
  573. Anton Kent - Anytype

    Greetings Bluesky :-)
    Anton Kent, p2p engineer from Anytype here. Would like to thank you for your awesome market research (you even mentioned us :-)) and for the recent ADX repo.

    I am going to do a review of ADX soon.

    p.s.
    If you are interested: we are using a heavily patched and optimized Go-Threads/Textile + our own CRDTs + p2plib.

    (edited)
  574. Anton Kent - Anytype set a profile picture
  575. Mark Foster SSI: @mfoster.io

    In reply to this message

    Thank you, no rush, any feedback or input around JSON-LD/RDF, I am always willing to help.
  576. Wael Bettayeb joined the room
  577. grateful-dev joined the room
  578. ใ‹ใŸใŽใ‚Šใ‚ใพใญ joined the room
  579. @jigojisho:matrix.org joined the room
  580. @numero6:codelutin.com

    In reply to this message

    👋 Anton Kent - Anytype: Anytype is a great project, does it have a Matrix room I can join?
    (edited)
  581. Anton Kent - Anytype

    In reply to this message

    Thx! No, unfortunately. We use Discord
  582. deadwoody joined the room
  583. tdelaselle joined the room
  584. whyrusleeping
  585. @numero6:codelutin.com

    (first try reading the spec 😓) Making speech orthogonal to reach is 👍. There is freedom for indexers to choose content. But is there something for authors to choose/consent to indexers?

    Three cases of me wanting to control indexing of my content:

    • Google using my content as a clue to page-rank the web
    • things like ThreadReaderApp unroll (I don't want my content to be printed on a website I don't trust)
    • Clearview feeding its database

    Mastodon "solves" that by having search only on hashtags, so people can choose which terms their content can be found under. There is also a #nobot convention (you can add #nobot to your bio and a benevolent bot won't follow you), but it lacks the ability to choose which bots I'm OK with or not.

  586. another thing I don't grasp is who you imagine as "The (currently-unnamed) DID Consortium" members. Are those members some kind of authorities (like certificate authorities), governments, foundations, random people, token holders of a DAO, any person who runs an indexer, data repositories?
  587. whyrusleeping

    In reply to this message

    Yeah, I think adding some sort of 'robots.txt' equivalent to your adx store will happen. Still need to figure out what exactly that looks like, but it's really necessary as you point out
  588. Even further, we probably want to think about ACLs in the context of your data server, maybe your data server can enforce who can read your data?
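One shape such a 'robots.txt' equivalent could take is a policy record in the user's store that honest indexers consult before crawling. This is entirely speculative; none of these names come from the actual ADX code, and the DIDs are made up:

```typescript
// Entirely speculative sketch of a robots.txt-style consent record an adx
// store could carry, and the check an honest indexer would run before
// crawling. None of these names come from the actual ADX code.

interface IndexingPolicy {
  defaultAllow: boolean; // what to do for indexers not listed below
  allow: string[];       // indexer DIDs explicitly permitted
  deny: string[];        // indexer DIDs explicitly refused
}

function mayIndex(policy: IndexingPolicy, indexerDid: string): boolean {
  if (policy.deny.includes(indexerDid)) return false;
  if (policy.allow.includes(indexerDid)) return true;
  return policy.defaultAllow;
}

// Example: allow indexing by default, but refuse one named indexer
// (mirroring the Clearview case raised above; DIDs are hypothetical).
const policy: IndexingPolicy = {
  defaultAllow: true,
  allow: [],
  deny: ["did:example:clearview"],
};
```

As with robots.txt, this only expresses consent; it cannot technically stop a hostile indexer, which is exactly the limitation discussed below.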
  589. In reply to this message

    I think an important thing to point out is that the DID Consortium will be somewhat separate from the protocol itself. It will just be a really good DID document host, but you can use a different DID provider if you like
  590. the entities that will run the consortium are definitely undefined as of yet, but the idea is that orgs that are interested in the general health of the network would run nodes, for example, major social media companies that adopt bluesky could run a consortium node to support the network
  591. @numero6:codelutin.com

    In reply to this message

    it's fun you make a parallel with robots.txt, because IIRC Google no longer honors it (and many people don't know that). IIRC it lets you remove your content from search results, but the content will still be crawled (= bandwidth and CPU cost for your server) and used (your links will be processed to rank the pages you link to). The hard thing with that is that you may "consent" to an indexer's stated intent, but the indexer may have another undisclosed intent.
  592. In reply to this message

    🤔 so... there will be multiple consortiums?
  593. whyrusleeping

    In reply to this message

    Yeah, that gets into weird territory, but is not meaningfully different from the problems the web has today.
  594. In reply to this message

    I mean, if someone else wanted to run a different one, i guess. We don't plan on running multiple consortiums
  595. like, people could use Microsoft's ION instead of the consortium, but it would be expensive and slow for them to make document changes
  596. Michael Mullins

    One thing I was still confused on after reading through the docs: Who hosts the DID documents? At first I thought it might be the consortium, but y'all mentioned the consortium potentially only keeping track of the keypair changes of DID documents. Does the consortium host them or do they serve purely as an auditable archive?

    I'm assuming understanding this might also require me to understand how the did:web method (and its potential replacement) works. I'm still wrapping my head around DIDs in general.

    (edited)
  597. @numero6:codelutin.com

    In reply to this message

    yes, that's how the mobile spyware data-broker industry works: you install an app, you get trackers. The thing is that, even if you are aware of that, you need the app. So I'm thinking that "hostile" indexers would be quickly banned, but an indexer providing a good service (relevant content suggestions) may provide that service only if you accept your data being used for another (negative) purpose, and you don't truly have the choice. If an indexer gets a major market share (à la Google for the web), one may be "obligated" to accept indexation or be de-facto shadow-banned. I may be freaking out too fast, but ADX's design is sound regarding data hosting, while the indexers are a more dangerous power
  598. cryptofounder joined the room
  599. whyrusleeping

    In reply to this message

    the consortium would host DID documents that its responsible for (i.e. its not hosting Ion documents). It will also be entirely content addressed and easily cacheable, so I see the consortium proper being a host of last resort, while CDNs pick up the day to day hosting
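    A toy illustration of why content addressing makes those DID documents easy for CDNs to cache (the hash scheme and document shape here are invented for illustration, not the actual did:bluesky format):

    ```python
    import hashlib
    import json

    def content_address(did_doc: dict) -> str:
        """Derive a stable content address by hashing a canonical encoding.

        A simplified stand-in for a real CID: any cache or CDN holding the
        bytes can serve them, because the address commits to the exact content.
        """
        canonical = json.dumps(did_doc, sort_keys=True, separators=(",", ":"))
        return "sha256-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    doc = {
        "id": "did:example:alice",
        "services": [{"type": "PersonalDataServer",
                      "endpoint": "https://pds.example.com"}],
    }

    # The same document always yields the same address, regardless of key order,
    # so a "host of last resort" and any number of caches stay interchangeable.
    assert content_address(doc) == content_address(json.loads(json.dumps(doc)))
    ```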
  600. In reply to this message

    Yeah, I don't disagree that there is a potential centralization risk with the indexers, but part of the idea is to make a significantly more open and structured set of data to be indexed, so that the barrier for other indexers to compete is much lower
  601. right now doing web scale indexing is hard even just because of the scale and difficulty of parsing every web page into a useful dataset
  602. with bluesky adx, you at least start with common formats and protocols over the data to be indexed, the schemas here become a really important tool to help level the playing field on indexers
  603. Michael Mullins

    In reply to this message

    DID documents that its responsible for

    Which DID documents is it responsible for? Any DID document referenced in the did:bluesky method? And following that logic, could apps/servers potentially support other DID methods that define different hosting methods (e.g. ion) and bypass the DID consortium?

    (edited)
  604. whyrusleeping
  605. Bob Wyman

    In reply to this message

    One of the things that we learned back in the early days of blogs is that, since blogs were scattered all over the web, it was very hard to discover what blogs existed. The same lesson had been learned for the web itself in earlier days and led to TBL's list of new websites as well as to the use of Usenet's comp.infosystems.www.announce to announce new websites.

    You might have all the crawling and indexing capacity in the world, but if you can't discover Personal Data Servers, you've got nothing to crawl. For blogs, we once attempted to address this problem with FeedMesh, PubSubHubbub (now WebSub), and a variety of other systems.

    How will an ADX crawler discover what Personal Data Servers exist?

    I see that the architecture says that "Personal Data Servers can actively push updates to Crawling Indexers." So, as with WebSub, etc. a PDS can "announce" its presence. But, while a PDS "can" push updates, is it expected or required that they will do so? Also, if PDS's are notifying specific crawlers of their updates, doesn't that build in a systemic "first-mover" or "dominant provider" advantage since PDS's will tend to only update the most well-known crawler(s), but not new entrants, or more specialized crawlers? To avoid such system bias, and the resulting barrier to entry, I suggest that it would make sense to look at the development of shared receivers for updates that can then redistribute those updates to a variety of crawlers.

    In essence, the suggestion here is that the aggregation of raw content streams should be considered as something distinct from crawling or indexing. (The current architecture seems to consider them to be the same thing.) An aggregation server might receive and redistribute updates, or, it might simply notify subscribing crawlers of which PDS's have been updated. An aggregation service might also provide a list of PDS's that exist (e.g. All PDS's that have ever pushed an update, or perhaps just the subset that have updated "recently," etc.).

    (edited)
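    The shared-receiver idea above can be sketched roughly like this (all names are hypothetical; a real aggregation service would be networked rather than in-process):

    ```python
    class UpdateAggregator:
        """Hypothetical shared 'ping' receiver: PDSs announce updates once,
        and the aggregator fans them out to every subscribed crawler, so new
        entrants see the same stream as dominant ones."""

        def __init__(self):
            self.subscribers = []       # crawler callbacks
            self.last_update = {}       # pds_url -> latest timestamp seen

        def subscribe(self, callback):
            """Any crawler, big or small, gets the same notifications."""
            self.subscribers.append(callback)

        def ping(self, pds_url, timestamp):
            """A PDS announces that it has updated."""
            self.last_update[pds_url] = max(timestamp,
                                            self.last_update.get(pds_url, 0))
            for notify in self.subscribers:
                notify(pds_url, timestamp)

        def updated_since(self, since):
            """List PDSs updated at or after a given time, for late joiners."""
            return [p for p, t in self.last_update.items() if t >= since]
    ```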
  606. Michael Mullins
    whyrusleeping: Awesome, that clears things up for me a bit. Thanks for taking the time to explain.
  607. Michael Mullins

    How will an ADX crawler discover what Personal Data Servers exist?

    Bob Wyman I'm guessing the indexers could maybe crawl the consortium for this information (since the DID document also stores all of a user's services)?

    (edited)
  608. @numero6:codelutin.com

    In reply to this message

    I guess one can ping an indexer (just like people do with Google)
  609. iboriginaldesi joined the room
  610. Bob Wyman

    In reply to this message

    I think a robots.txt equivalent wouldn't offer the semantic richness necessary to support the kind of control that @numero6:codelutin.com requests in his post.

    A robots.txt file is limited to saying only whether or not crawling is permitted, either in general or by an identified crawler. However, it seems that @numero6:codelutin.com 's request might be more accurately described as wanting to also limit the privilege to use the fruits of crawling. (e.g. He might allow Google to crawl and index, but wish to block Google from using the crawled data as "a clue to page-rank the web." Or, he might permit crawling in general, but seek to prohibit the construction and publishing of an "unroll.")

    Certainly, one could define a semantically richer "robots.txt" file and make it visible to crawlers, but then you'd have to trust crawlers to respect it -- just as you do with robots.txt. Also, unless that data were carried along with the crawled data and passed on to indexers, etc., the use of aggregators or crawlers that served more than one indexer would be a problem, since one indexer might do something forbidden while another would not. Alternatively, you could require that crawlers announce their intended use of data while crawling. Having the PDS evaluate crawlers' statements of intent would allow it to reject crawlers based on that intent. (If you didn't trust crawlers to accurately describe their intent, you could build a "Crawler Reputation" system that attempted to describe what various crawlers did with what they find.) But, once again, the use of shared crawlers would complicate matters. (You'd still want the "usage restrictions" to be passed on, by a crawler to its indexers, etc. But then you'd have to trust those downstream indexers to respect the rights limitations.)

    It's an interesting problem.

  611. Bob Wyman

    In reply to this message

    Pinging an indexer, like Google, would only be useful if that indexer was willing to notify potential "competitors" about what pings it had received. Google did, in fact, do this for FeedMesh, back in the early days of the web. So did PubSub and a couple other blog services. However, some blog services felt that they "owned" the data they had discovered and refused to share their work (although they did read from FeedMesh for their own purposes... Grumble...)

    If it becomes understood that pinging some specific dominant indexer is the way to be discovered by other less-dominant crawlers, then that makes that indexer an essential and privileged component of the entire system's infrastructure and creates a barrier to entry for anyone else. Hopefully, those offering competitive services will compete based on the quality of the service they provide, not based on unearned privileges due to their merely having been first movers.

  612. Bob Wyman

    In reply to this message

    This would only work if all DID documents could, in fact, be crawled as a collection. Is it expected that all DID docs would, in fact, be kept in one place? Why? The normal expectation for a DID document is that while it can be dereferenced via the DID, its actual storage location can be just about anywhere. Would ADX want to change this? Also, some DID methods allow for privacy in that they require authentication and authorization in order to retrieve the DID doc. Would that not be permitted for DID docs used with ADX? In any case, if there was a way to discover all DID docs, you'd want to have the ability to do things like fetch only DID docs updated since some date, etc.

    A method more in line with the way DIDs normally work would be for someone somewhere, perhaps the consortium, to offer a list of DIDs that identified PDS's. Then, the list might be fetched and each DID dereferenced to discover the location of the PDS. But, that is likely to be a pretty expensive operation -- that's a great deal of DID doc dereferencing.

    (edited)
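    A rough sketch of the "fetch the list, dereference each DID" flow described above (the resolver and document shapes are invented stand-ins, not a real DID method):

    ```python
    # Hypothetical in-memory stand-in for DID resolution; a real resolver
    # would fetch documents over the network per DID method.
    DID_DOCS = {
        "did:example:alice": {"services": [{"type": "PersonalDataServer",
                                            "endpoint": "https://alice-pds.example"}]},
        "did:example:bob": {"services": [{"type": "PersonalDataServer",
                                          "endpoint": "https://bob-pds.example"}]},
    }

    def resolve_did(did: str) -> dict:
        return DID_DOCS[did]

    def discover_pds(did_list):
        """Dereference each DID and pull out its PDS endpoint. As noted above,
        doing this for every DID is expensive, so a real crawler would cache
        results and only re-resolve documents updated since its last pass."""
        endpoints = []
        for did in did_list:
            doc = resolve_did(did)
            for svc in doc.get("services", []):
                if svc.get("type") == "PersonalDataServer":
                    endpoints.append(svc["endpoint"])
        return endpoints
    ```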
  613. iboriginaldesi
    Hi! Super excited to be here. I've read the section on personal data repositories here (https://github.com/bluesky-social/adx/blob/main/architecture.md#personal-data-repositories) and wanted to discuss data structure ideas. I work on an open-source project called Dolt (https://github.com/dolthub/dolt). It's Git for Data. Our Prolly Tree structure has some very interesting parallels with the merkle semantics you are looking for (https://docs.dolthub.com/architecture/storage-engine/prolly-tree) in terms of content addressability, hash-like authenticity, as well as diff and merge. I would love to trade architecture notes and learn more from your team!
  614. Bob Wyman

    In reply to this message

    One nice side effect of having someone, somewhere keep a list of PDS DIDs is that one could then easily create private or specialized systems by simply providing an alternative list of DIDs somewhere else. For instance, if I wanted to have a private ADX system used only by members of my company, or of some club, I could post a short list of DIDs that would define the PDS's that should be crawled, indexed, etc. without having to expose those PDS's to the full universe of crawlers, etc.
  615. @numero6:codelutin.com
    👋 😲 I had Dolt in my bookmarks and still hadn't had time to take a look
  616. psabert joined the room
  617. @gnu_ponut:matrix.org joined the room
  618. EtomGM joined the room
  619. whyrusleeping
  620. @numero6:codelutin.com

    In reply to this message

    I thought about that. We could make a "decentralized cryptographically-signed proof of consent": the indexer publishes a document explaining its intent (= legal text). The user may be asked for consent, and a message (cryptographically signed by the user and containing a reference / DID / hash of the signed "contract") would be stored somewhere (by both the indexer and the user's data host). An indexer giving / selling / offering a service or API on a dataset can then prove that consent was obtained (a merkle tree of all the consent documents serves as a small, verifiable proof that x thousands of consents were obtained to build the dataset). Of course, this does not reduce the risk of an indexer obtaining consent to access data and then making a secondary, opaque, non-consented use of it. But, on the legal side, a legal entity would be able to force an indexer to provide proof of consent for a dataset. Since the "contract" of consent (signed by the user) is an immutable public document (content-addressed), one can prove whether data has been misused or consent was given and the indexer's activity is legitimate.

    I'm just dropping my brain output. Please ignore if it's stupid / off-topic 😅
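    The merkle-tree-of-consents idea can be sketched minimally like this (plain SHA-256 over raw bytes; real consent records would be signed documents, and the tree layout here is just one common construction):

    ```python
    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(leaves):
        """One small root hash commits to the whole ordered list of consents."""
        level = [h(leaf) for leaf in leaves]
        if not level:
            return h(b"")
        while len(level) > 1:
            if len(level) % 2:                   # duplicate last node on odd levels
                level.append(level[-1])
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    def merkle_proof(leaves, index):
        """Sibling hashes showing leaves[index] is included under the root."""
        level = [h(leaf) for leaf in leaves]
        proof = []
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            sibling = index ^ 1
            proof.append((level[sibling], sibling < index))  # (hash, sibling-is-left)
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            index //= 2
        return proof

    def verify(leaf, proof, root):
        """An auditor checks one consent against the published root."""
        node = h(leaf)
        for sibling, is_left in proof:
            node = h(sibling + node) if is_left else h(node + sibling)
        return node == root
    ```

    The point of the tree is that the proof stays logarithmic in size: an indexer can publish one root for thousands of consents and still prove any single one on demand.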

  621. ebraathen joined the room
  622. @numero6:codelutin.com
    TL;DR: store the ToS as immutable documents on IPFS and let the indexer store the user's cryptographically signed acceptance
  623. @wclayferguson:matrix.org
    Here's what I believe is the simplest way to achieve ACLs using encryption to share data, where users are identified by public keys... https://quanta.wiki/n/social-media-acls
  624. TL;DR: encryption keys for the data are packed into the data itself. That is in line with the "Self Contained" (self-described, self-controlled) goals for posted data in Bluesky, afaik.
  625. Manuel Etchegaray

    In reply to this message

    There is no other solution for encryption in big groups of people, right? Say you have a room with 100 people and you need to encrypt each message with 99 other keys; it becomes unsustainable very quickly...
  626. I think Matrix was also doing encryption I wonder how they do it
  627. @wclayferguson:matrix.org
    Somebody please correct me if this is wrong, but I think for sharing to a group the only "improvement" I know of is doing the same kind of thing, but where the group itself has a keypair that identifies it, and everyone in the group has the group's private key and can read the message. In the real world, however, you normally have either "public" access or a small number of people being shared with.
  628. Maybe there's some "Hierarchical System" that can do better, but then we end up making this rocket science when the goal is a simple "Protocol" and not even so much as a "Platform".
  629. It will be very very tempting to over-engineer the heck outta this if there's no one in charge able to veto complexity (I hope it comes across politely for me to say that)
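    For the 100-person room, the usual pattern is envelope (hybrid) encryption: encrypt the payload once with a random content key, then wrap only that small key for each recipient, i.e. O(n) key wraps rather than n encryptions of the whole message. A structural sketch (the XOR "cipher" is a deliberately insecure stand-in for real AES-GCM and a real public-key wrap, so it runs with only the stdlib):

    ```python
    import hashlib
    import os

    def keystream_xor(key: bytes, data: bytes) -> bytes:
        """Toy cipher: XOR with a SHA-256-derived keystream. NOT secure;
        a stand-in for AES-GCM / an RSA or ECDH key wrap."""
        out = bytearray()
        counter = 0
        while len(out) < len(data):
            out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
            counter += 1
        return bytes(a ^ b for a, b in zip(data, out))

    def seal(message: bytes, recipient_keys: dict) -> dict:
        content_key = os.urandom(32)               # one key for the payload
        return {
            # the message body is encrypted exactly once...
            "ciphertext": keystream_xor(content_key, message),
            # ...and only the 32-byte content key is wrapped per recipient
            "wrapped_keys": {rid: keystream_xor(k, content_key)
                             for rid, k in recipient_keys.items()},
        }

    def open_envelope(envelope: dict, rid: str, key: bytes) -> bytes:
        content_key = keystream_xor(key, envelope["wrapped_keys"][rid])
        return keystream_xor(content_key, envelope["ciphertext"])
    ```

    This is also how the keys can be "packed into the data" as described above: the wrapped keys travel with the ciphertext, so the record stays self-contained.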
  630. Manuel Etchegaray
    I think its worth exploring! not as much to create a new encryption haha but maybe some standarized way within the protocol to allow such schemas would be pretty neat and useful :)
  631. [pioneer] 🇺🇦 joined the room
  632. RoseByte joined the room
  633. @lynn:the-apothecary.club joined the room
  634. @lynn:the-apothecary.club

    In reply to this message

    Hi, I'm struggling a little bit with understanding how formal schemas solve the problems you have described with regards to SSB and extending message content. Is it that schemas are supposed to encourage coordination between developers by providing an easy way to find a formal description of what they're trying to do?
  635. whyrusleeping
    @lynn:the-apothecary.club: instead of devs just adding random fields to their data structures and hoping other people do something with them, they define the schema for their application and use it. That means it's a separate type, requiring explicit opt-in from consumers to make use of it. So in order to have other people use your schema, you have to explicitly coordinate with them on it
  636. you can support multiple different schemas if you want, but being clear about what your data structure looks like is important
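    The "explicit opt-in" point can be illustrated with a toy validator (the schema names and field shapes are invented for illustration):

    ```python
    # Hypothetical schema registry keyed by an explicit $type field.
    SCHEMAS = {
        "example.com/post": {"required": {"text": str, "createdAt": str}},
        "example.com/poll": {"required": {"question": str, "options": list}},
    }

    def validate(record: dict) -> bool:
        """A consumer opts in to schemas explicitly: a record with an unknown
        $type is rejected rather than guessed at from stray fields."""
        schema = SCHEMAS.get(record.get("$type"))
        if schema is None:
            return False
        return all(isinstance(record.get(field), t)
                   for field, t in schema["required"].items())
    ```

    Supporting another app's schema means adding its entry to your registry, which is exactly the coordination step being described.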
  637. @timbray:matrix.org
    I think schemas are good and I think it's also good to build systems that allow schemaless operation. They work way better than most people think.
  638. @lynn:the-apothecary.club
    ok thanks. I see what you mean, though nothing stops the same situation from happening with random schemas, other than that now you cannot add fields to schemas that are already "established". For us this is a disappointing solution that avoids the bigger problem, though I know there are other problems you are trying to solve. The bigger problem is that there should be no need for coordination to extend my application; I should just be able to do that. I should not even have to coordinate an implementation of my extension onto every client in existence, yet this is what is still being proposed.
  639. @wclayferguson:matrix.org
    Regarding typing: actual types are good, and even type inheritance, but there's also potential to "fall back" onto duck typing (what TypeScript calls the "shape" of an interface object), where a type can be gleaned by looking at what properties are present.
  640. The default could be: if an object contains a "text" property, it's considered Markdown, and that would be the only required property for a totally public post.
  641. All this, imo, should automatically fall back to the least common denominator, where participants in the network can opt not to implement any of the fancy complex bells and whistles and still participate. That's where ActivityPub went off the rails: you must do A LOT of work to even have baseline compliance with the ActivityPub API.
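    The duck-typing fallback described above, as a toy (the property names other than "text" are invented):

    ```python
    def infer_type(record: dict) -> str:
        """Duck-typing fallback: glean a type from which properties are
        present, so minimal clients can still participate without
        implementing the full schema machinery."""
        if "$type" in record:
            return record["$type"]          # explicit type wins when present
        if "text" in record:
            return "markdown-post"          # least-common-denominator default
        return "unknown"
    ```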
  642. @lynn:the-apothecary.club

    In reply to this message

    To be clear, I'm saying that schemaless free JSON also has this problem. I don't think either is a good solution; they're missing something that other data types in applications would have if they weren't being used in a networked context. We really need to think about what that is and bring it to the stage
  643. pfrazee

    In reply to this message

    My priority is entirely about enabling extension. What I'm investigating is what mechanisms we can use to make extensions and evolvability as successful as possible. My experience with very loosely defined schemas was that developers had a hard time adding fields because they had no good way to coordinate with other developers
  644. There are a lot of possible solutions here and the trick is finding something that gives real utility without needless downside. An overly constrained system can get in the way, but an overly loose one can leave developers to struggle with compat and create lots of buggy experiences
  645. I'm going to make an initial proposal soon so we can get some good discussion going, so we can suss out what's useful and what's just slowing devs down
  646. @gnu_ponut:matrix.org
    do you mean code?
  647. NeoDB joined the room
  648. Golda Velez
    (btw, we are working on importing adx from an external simple node project) Cannot find package '@adx/common' imported from /Users/gv/use-adx/index.js
  649. no need to answer we are newbies to js/typescript/node :-)
  650. Golda Velez
    ah I think they are not supposed to be exported to external users yet!
  651. whyrusleeping
    Hrm… you should be able to import that. I think Daniel Holmgren will have to answer you though
  652. I don't know typescript very well either
  653. @lynn:the-apothecary.club

    In reply to this message

    in the words of Alan Kay

    For important negotiations we don't send telegrams, we send ambassadors. This is what objects are all about, and it continues to be amazing to me that the real necessities and practical necessities are still not at all understood. Bundling an interpreter for messages doesn't prevent the message from being submitted for other possible interpretations, but there simply has to be a process that can extract signal from noise.

    schemas and free JSON, or data, are both noise in the dark. It's useless for me to make an extension if I can't also send the means to present it.
