[META] Project discussion #4

Open
opened 2024-11-13 04:14:07 +00:00 by sometimesuseful · 8 comments
Collaborator

An issue for discussing the project at large. Feel free to close this issue or create your own instead.

An issue for discussing the project at large. Feel free to close this issue or create your own instead.
Author
Collaborator

The bulk of the code ought to be written in Python, it will be much simpler to structure it well, write an argument parser, add concurrency and so on.

It looks like Whonix has requests and aiohttp installed by default meaning either asyncio or threading is fine. I haven't checked Tails yet.

I do not plan on contributing the bulk of this code myself, but I am able to help. An ideal scenario would be for individual contributors to be able to write the code necessary to upload to a file host and for it to easily plug into the system. See contributing to yt-dlp for inspiration.

Also see @ebassi's critique of MAD for further arguments.

The bulk of the code ought to be written in Python, it will be much simpler to structure it well, write an argument parser, add concurrency and so on. It looks like Whonix has requests and aiohttp installed by default meaning either asyncio or threading is fine. I haven't checked Tails yet. I do not plan on contributing the bulk of this code myself, but I am able to help. An ideal scenario would be for individual contributors to be able to write the code necessary to upload to a file host and for it to easily plug into the system. See [contributing to yt-dlp](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#adding-support-for-a-new-site) for inspiration. Also see @ebassi's critique of MAD for further arguments.

The bulk of the code ought to be written in Python, it will be much simpler to structure it well, write an argument parser, add concurrency and so on.

I agree, Python is generally better for a large project like this. We should use Python whenever possible, and Bash only when necessary. I wrote the archive script in Bash because as far as I am aware, Tails does not come with any Python library capable of creating 7z archives.

It looks like Whonix has requests and aiohttp installed by default meaning either asyncio or threading is fine. I haven't checked Tails yet.

For networking it seems we have a lot of choices. I will make a separate issue about that.

An ideal scenario would be for individual contributors to be able to write the code necessary to upload to a file host and for it to easily plug into the system

Agreed. This is one of the failings of the MAD script, each host seems to require hundreds of lines of code when it should be as simple as configuring a few variables as in your yt-dlp example.

>The bulk of the code ought to be written in Python, it will be much simpler to structure it well, write an argument parser, add concurrency and so on. I agree, Python is generally better for a large project like this. We should use Python whenever possible, and Bash only when necessary. I wrote the archive script in Bash because as far as I am aware, Tails does not come with any Python library capable of creating 7z archives. >It looks like Whonix has requests and aiohttp installed by default meaning either asyncio or threading is fine. I haven't checked Tails yet. For networking it seems we have a lot of choices. I will make a separate issue about that. >An ideal scenario would be for individual contributors to be able to write the code necessary to upload to a file host and for it to easily plug into the system Agreed. This is one of the failings of the MAD script, each host seems to require hundreds of lines of code when it should be as simple as configuring a few variables as in your yt-dlp example.
Author
Collaborator

An idea I had was to have multiple stages, prep, upload, post, update and all. Names can be changed.

prep would create previews and package the file and could be used as a standalone program if one wishes to.

upload would upload previews and archives to hosts and create a machine readable "post" file in something like YAML or JSON.

post would parse the post file and create the template ready to be copy pasted to forums.

update would parse the post file and re-upload any links that are down.

all would run steps prep to post.

Let me know what you think, or what you have in mind.


Tails does not come with any Python library capable of creating 7z archives.

subprocess can be used for this: subprocess.run(["7z", *args]).


I am excited to continue, but will wait for you to expand on what this project will look like (in actual code). If you work on the main glue, it will be easier for me to develop "modules".

As a side note, would it be safe to spoof the commit dates to hide when we're working on this project except for when pushing to the repository? I'm not sure if this would leak times anyway.

GIT_COMMITTER_DATE="Jan 1 1970 00:00:00" GIT_AUTHOR_DATE="Jan 1 1970 00:00:00" git commit ...

An idea I had was to have multiple stages, `prep`, `upload`, `post`, `update` and `all`. Names can be changed. `prep` would create previews and package the file and could be used as a standalone program if one wishes to. `upload` would upload previews and archives to hosts and create a machine readable "post" file in something like YAML or JSON. `post` would parse the post file and create the template ready to be copy pasted to forums. `update` would parse the post file and re-upload any links that are down. `all` would run steps prep to post. Let me know what you think, or what you have in mind. ___ > Tails does not come with any Python library capable of creating 7z archives. subprocess can be used for this: `subprocess.run(["7z", *args])`. ___ I am excited to continue, but will wait for you to expand on what this project will look like (in actual code). If you work on the main glue, it will be easier for me to develop "modules". As a side note, would it be safe to spoof the commit dates to hide when we're working on this project except for when pushing to the repository? I'm not sure if this would leak times anyway. `GIT_COMMITTER_DATE="Jan 1 1970 00:00:00" GIT_AUTHOR_DATE="Jan 1 1970 00:00:00" git commit ...`

As a side note, would it be safe to spoof the commit dates to hide when we're working on this project except for when pushing to the repository? I'm not sure if this would leak times anyway.

Good idea. I've added these lines to my .zshrc so it will automatically apply to all git commits:

d="$(date -d '@0')"
export GIT_COMMITTER_DATE="$d"
export GIT_AUTHOR_DATE="$d"

I've tested it and it seems to work.

An idea I had was to have multiple stages, prep, upload, post, update and all. Names can be changed.

Agreed, this is the best way to do it. Especially the 'update' which automatically checks and re-uploads links. I believe that will be one of the most useful features of this program.

subprocess can be used for this: subprocess.run(["7z", *args])

While I am aware of this and I know it was used, for example, by Fylippsi in his cppacker script, to me it feels like code smell. Calling various programs from the command line is the 'wrong' way to use Python compared to using Python-specific libraries. It also makes the script no longer cross-platform, though in our case that may not be a big problem since we are only targeting Tails and Whonix (with possible support for other Linux distros). I would only use the subprocess as a last resort if there is no way to achieve it with Python itself.

>As a side note, would it be safe to spoof the commit dates to hide when we're working on this project except for when pushing to the repository? I'm not sure if this would leak times anyway. Good idea. I've added these lines to my `.zshrc` so it will automatically apply to all git commits: ``` d="$(date -d '@0')" export GIT_COMMITTER_DATE="$d" export GIT_AUTHOR_DATE="$d" ``` I've tested it and it seems to work. >An idea I had was to have multiple stages, prep, upload, post, update and all. Names can be changed. Agreed, this is the best way to do it. Especially the 'update' which automatically checks and re-uploads links. I believe that will be one of the most useful features of this program. >subprocess can be used for this: `subprocess.run(["7z", *args])` While I am aware of this and I know it was used, for example, by Fylippsi in his cppacker script, to me it feels like code smell. Calling various programs from the command line is the 'wrong' way to use Python compared to using Python-specific libraries. It also makes the script no longer cross-platform, though in our case that may not be a big problem since we are only targeting Tails and Whonix (with possible support for other Linux distros). I would only use the subprocess as a last resort if there is no way to achieve it with Python itself.
Author
Collaborator

Good idea. I've added these lines to my .zshrc so it will automatically apply to all git commits

I'll do the same. Let's hope it doesn't break something or still leak file modification dates and whatnot. There's a lot that could be going on that's not obvious at first.

Calling various programs from the command line is the 'wrong' way to use Python compared to using Python-specific libraries.

Yes, I agree. I think we should make the decision now to support only things that are already shipped with Tails and Whonix, or pull libraries that we need ourselves for development such as pillow, or PyYAML. Consider the target audience, how many people will run a CLI suite like autoshare but couldn't pip install -r requirements.txt?

We should keep external dependencies to a minimum, but some things would be helpful. However if we do pull external dependencies, I think we should skip Bash altogether. We're early enough in the development process that throwing everything out isn't a huge loss.


About YAML, we could use it as a config file and for the data file that will be parsed later. It's much more pleasant to work with than JSON in my opinion.

I'm reluctant to take the lead right now, but given a few more weeks I could write an argument parser, stubs for loading a config file and all the necessary functionality myself, then we implement what's needed together. But as stated, I'm too busy to do this right now.

We could use a config file for choosing hosts, filtering files (exclude files with preview in the name, Thumbs.db, etc), choosing output directory and more without supplying endless command line arguments.


I would only use the subprocess as a last resort if there is no way to achieve it with Python itself.

Worst case scenario, we catch ImportError and fallback to Bash, but we cannot easily use a Python logger in Bash modules.

> Good idea. I've added these lines to my .zshrc so it will automatically apply to all git commits I'll do the same. Let's hope it doesn't break something or still leak file modification dates and whatnot. There's a lot that could be going on that's not obvious at first. > Calling various programs from the command line is the 'wrong' way to use Python compared to using Python-specific libraries. Yes, I agree. I think we should make the decision now to support only things that are already shipped with Tails and Whonix, or pull libraries that we need ourselves for development such as pillow, or PyYAML. Consider the target audience, how many people will run a CLI suite like autoshare but couldn't `pip install -r requirements.txt`? We should keep external dependencies to a minimum, but some things would be helpful. However if we do pull external dependencies, I think we should skip Bash altogether. We're early enough in the development process that throwing everything out isn't a huge loss. ___ About YAML, we could use it as a config file and for the data file that will be parsed later. It's much more pleasant to work with than JSON in my opinion. I'm reluctant to take the lead right now, but given a few more weeks I could write an argument parser, stubs for loading a config file and all the necessary functionality myself, then we implement what's needed together. But as stated, I'm too busy to do this right now. We could use a config file for choosing hosts, filtering files (exclude files with preview in the name, Thumbs.db, etc), choosing output directory and more without supplying endless command line arguments. ___ > I would only use the subprocess as a last resort if there is no way to achieve it with Python itself. Worst case scenario, we catch `ImportError` and fallback to Bash, but we cannot easily use a Python logger in Bash modules.

I'll do the same. Let's hope it doesn't break something or still leak file modification dates and whatnot. There's a lot that could be going on that's not obvious at first.

I would imagine file modification times will still be visible, and obviously we can't hide what time we push to Topic Git. But I do not think it's that bad. After all, it is not much different from post times being visible every time we post on a forum or even in the issue tracker here. Even in a worst case scenario where commit times are visible, it is still not that bad.

Yes, I agree. I think we should make the decision now to support only things that are already shipped with Tails and Whonix, or pull libraries that we need ourselves for development such as pillow, or PyYAML. Consider the target audience, how many people will run a CLI suite like autoshare but couldn't pip install -r requirements.txt?
We should keep external dependencies to a minimum, but some things would be helpful. However if we do pull external dependencies, I think we should skip Bash altogether. We're early enough in the development process that throwing everything out isn't a huge loss.

My original plan was to make it work in Tails without the need for additional software, and I still believe this is a very desirable goal. Some people prefer to use Tails without persistent storage, and even with persistent storage enabled it is inconvenient to install new software - I know from personal experience that the installation process tends to fail randomly. I also do not know whether the Tails Additional Software works if installing packages with pip. On the other hand, installing new software in Whonix is as easy as any other distro, so I don't have a problem forcing Whonix users to install additional packages.

So I would suggest making it work by default in Tails, but not necessarily in Whonix.

I'm reluctant to take the lead right now, but given a few more weeks I could write an argument parser, stubs for loading a config file and all the necessary functionality myself, then we implement what's needed together. But as stated, I'm too busy to do this right now.

That is fine. Personally I was planning to develop the modules first, and then simply 'glue' them together with a lightweight main program afterwards. All the main program needs to do is parse a config file, parse the user's input, and check for dependencies. Then it delegates the work to all the modules.

What I definitely want to avoid is another cppacker or MAD situation - a single script containing thousands of lines of code. In my opinion that is a good example of what not to do, which is one of the reasons I chose the modular approach. Keep each module and the main script to a manageable size (i.e., less than 400 LOC) so that users can easily audit them. In fact, one of the reasons I started this project is in response to MAD and ebassi's criticism of it - I wanted to lead by example and show that useful scripts can be kept to a reasonable size.

The other reason I chose the modular approach is because I want users to be able to replace any part of the script with their own version. For example, there are many preview scripts around and each user seems to have a script that he personally prefers. I would like to make it easy to use any preview script as a drop-in replacement for the default. Then our program will provide an acceptable preview script for completeness, but it does not have to be perfect or meet every single possible use case.

I will keep working on the modules as originally planned, and when you have time you can work on the main script which seems to be the part you're most interested in.

>I'll do the same. Let's hope it doesn't break something or still leak file modification dates and whatnot. There's a lot that could be going on that's not obvious at first. I would imagine file modification times will still be visible, and obviously we can't hide what time we push to Topic Git. But I do not think it's that bad. After all, it is not much different from post times being visible every time we post on a forum or even in the issue tracker here. Even in a worst case scenario where commit times are visible, it is still not *that* bad. >Yes, I agree. I think we should make the decision now to support only things that are already shipped with Tails and Whonix, or pull libraries that we need ourselves for development such as pillow, or PyYAML. Consider the target audience, how many people will run a CLI suite like autoshare but couldn't pip install -r requirements.txt? >We should keep external dependencies to a minimum, but some things would be helpful. However if we do pull external dependencies, I think we should skip Bash altogether. We're early enough in the development process that throwing everything out isn't a huge loss. My original plan was to make it work in Tails without the need for additional software, and I still believe this is a very desirable goal. Some people prefer to use Tails without persistent storage, and even with persistent storage enabled it is inconvenient to install new software - I know from personal experience that the installation process tends to fail randomly. I also do not know whether the Tails Additional Software works if installing packages with pip. On the other hand, installing new software in Whonix is as easy as any other distro, so I don't have a problem forcing Whonix users to install additional packages. So I would suggest making it work by default in Tails, but not necessarily in Whonix. >I'm reluctant to take the lead right now, but given a few more weeks I could write an argument parser, stubs for loading a config file and all the necessary functionality myself, then we implement what's needed together. But as stated, I'm too busy to do this right now. That is fine. Personally I was planning to develop the modules first, and then simply 'glue' them together with a lightweight main program afterwards. All the main program needs to do is parse a config file, parse the user's input, and check for dependencies. Then it delegates the work to all the modules. What I definitely want to avoid is another cppacker or MAD situation - a single script containing thousands of lines of code. In my opinion that is a good example of what *not* to do, which is one of the reasons I chose the modular approach. Keep each module and the main script to a manageable size (i.e., less than 400 LOC) so that users can easily audit them. In fact, one of the reasons I started this project is in response to MAD and ebassi's criticism of it - I wanted to lead by example and show that useful scripts can be kept to a reasonable size. The other reason I chose the modular approach is because I want users to be able to replace any part of the script with their own version. For example, there are many preview scripts around and each user seems to have a script that he personally prefers. I would like to make it easy to use any preview script as a drop-in replacement for the default. Then our program will provide an acceptable preview script for completeness, but it does not have to be perfect or meet every single possible use case. I will keep working on the modules as originally planned, and when you have time you can work on the main script which seems to be the part you're most interested in.
Author
Collaborator

You've convinced me that it should work in Tails without having to install any additional software. Can you confirm that python3-requests is installed by default? I will use TOML for the config file and JSON for the machine-readable data both of which are in the standard library. However, requests will be necessary for uploads.

I don't have a LOC approximation yet, but I think what I have in mind will end up being a fairly large project from the "meta" side, although certainly smaller than 3000 LOC, probably less than 2000 too. There will be zero repeated code, unnecessary functionality can be kept to a minimum, the project structure will be sensible. Good software development practices will be adhered, simply put. However, I still imagine something akin to a build system but for content posting which will require at minimum a few hundred lines of code.


I've decided that I will develop this under a different name and create my own repo when I have something put together. You can decided then if it fits your vision, and we can work together from there. Worst case, we end up with two new projects and we can both contribute to each other's and steal bits of code as needed. I know that I would want help with the preview scripts and image/file hosts.

When I've finished a base I plan on developing a plugin system that makes already existing Bash scripts useful similar to what you're saying. However, slight modification will likely be needed for them to fit with what I have in mind. I imagine that they must at least echo some parsable values by the end. That is all for later, but when I'm finished with that there should be a foundation strong enough to start collaborating with other people.

With that said, I want to reiterate that this is going to be far simpler than MAD or similar projects but not simple enough that someone could eye it over for a few minutes and tell that it's safe due to the nature of what I have in mind. Perhaps "eye it over for a few minutes" is a good goal to aim for with each individual file.


Unrelated to this project, but would you like to start a "community-scripts" repo and upload all the scripts you have collected presently? I would like to study their code and I only have very few saved myself.

You've convinced me that it should work in Tails without having to install any additional software. Can you confirm that python3-requests is installed by default? I will use TOML for the config file and JSON for the machine-readable data both of which are in the standard library. However, requests will be necessary for uploads. I don't have a LOC approximation yet, but I think what I have in mind will end up being a fairly large project from the "meta" side, although certainly smaller than 3000 LOC, probably less than 2000 too. There will be zero repeated code, unnecessary functionality can be kept to a minimum, the project structure will be sensible. Good software development practices will be adhered, simply put. However, I still imagine something akin to a build system but for content posting which will require at minimum a few hundred lines of code. ___ I've decided that I will develop this under a different name and create my own repo when I have something put together. You can decided then if it fits your vision, and we can work together from there. Worst case, we end up with two new projects and we can both contribute to each other's and steal bits of code as needed. I know that I would want help with the preview scripts and image/file hosts. When I've finished a base I plan on developing a plugin system that makes already existing Bash scripts useful similar to what you're saying. However, slight modification will likely be needed for them to fit with what I have in mind. I imagine that they must at least echo some parsable values by the end. That is all for later, but when I'm finished with that there should be a foundation strong enough to start collaborating with other people. With that said, I want to reiterate that this is going to be far simpler than MAD or similar projects but not simple enough that someone could eye it over for a few minutes and tell that it's safe due to the nature of what I have in mind. Perhaps "eye it over for a few minutes" is a good goal to aim for with each individual file. ___ Unrelated to this project, but would you like to start a "community-scripts" repo and upload all the scripts you have collected presently? I would like to study their code and I only have very few saved myself.

Can you confirm that python3-requests is installed by default?

Yes, Tails comes with the requests library with socks support by default.

I've decided that I will develop this under a different name and create my own repo when I have something put together. You can decided then if it fits your vision, and we can work together from there. Worst case, we end up with two new projects and we can both contribute to each other's and steal bits of code as needed. I know that I would want help with the preview scripts and image/file hosts.

That is probably for the best. I also began to feel that we have strongly different visions on what this project should be, so separate projects seems like the right way forward. But yes, certainly you're free to use any of my code and I might borrow code from your project too. I'm aiming for a hard limit of 2000 lines of code for my entire program, which I think is a reasonable target for any on-topic script.

Unrelated to this project, but would you like to start a "community-scripts" repo and upload all the scripts you have collected presently? I would like to study their code and I only have very few saved myself.

Unfortunately I don't normally collect other people's scripts. But there are plenty of scripts available on Kitty & Mimmy's Playground, Naughty Kids 2 and Olympus.

>Can you confirm that python3-requests is installed by default? Yes, Tails comes with the requests library with socks support by default. >I've decided that I will develop this under a different name and create my own repo when I have something put together. You can decided then if it fits your vision, and we can work together from there. Worst case, we end up with two new projects and we can both contribute to each other's and steal bits of code as needed. I know that I would want help with the preview scripts and image/file hosts. That is probably for the best. I also began to feel that we have strongly different visions on what this project should be, so separate projects seems like the right way forward. But yes, certainly you're free to use any of my code and I might borrow code from your project too. I'm aiming for a hard limit of 2000 lines of code for my entire program, which I think is a reasonable target for any on-topic script. >Unrelated to this project, but would you like to start a "community-scripts" repo and upload all the scripts you have collected presently? I would like to study their code and I only have very few saved myself. Unfortunately I don't normally collect other people's scripts. But there are plenty of scripts available on Kitty & Mimmy's Playground, Naughty Kids 2 and Olympus.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
PedoDeveloper/autoshare#4
No description provided.