# go-fastdfs is a distributed file system based on the HTTP protocol. It follows the design philosophy that the greatest truths are the simplest: keeping the design simple makes it easier to operate and extend. It offers high performance, high reliability, a decentralized architecture, and maintenance-free operation.
### Everyone worries whether such a simple file system is reliable enough for a production environment. The answer is yes: it is efficient because it is simple, and it is stable because it is simple. If you are worried about functionality, run the unit tests; if you are worried about performance, run the stress tests. Both ship with the project, so run them and use it with more confidence ^_^.
Note: Please read this document carefully before use, especially the [wiki](https://github.com/sjqzhang/go-fastdfs/wiki)
- Supports upload with the curl command
- Supports upload from the browser
- Supports HTTP download
- Supports automatic synchronization across multiple machines
- Supports resumable download
- Supports automatic configuration generation
- Supports automatic merging of small files (reduces inode usage)
- Supports instant upload
- Supports cross-origin access
- Supports one-click migration
- Supports a parallel trial run alongside an existing file system
- Supports resumable upload ([tus](https://tus.io/))
- Supports Docker deployment
- Supports self-monitoring and alerting
- Supports image scaling
- Supports Google Authenticator codes
- Supports custom authentication
- Supports viewing cluster file information
- Uses the standard HTTP protocol
- No dedicated client needed (works with wget, curl, etc.)
- Similar to fastdfs
- High performance (uses leveldb as the key-value store)
- High reliability (extremely simple design built on mature components)
- Decentralized design (all nodes can read and write at the same time)
# Advantages
- No dependencies (single file)
- Automatic synchronization
- Automatic repair after failures
- Files are stored in per-day directories, which makes maintenance convenient
- Supports different scenarios
- Automatic file deduplication
- Supports custom directories
- Supports keeping the original file name
- Supports automatic generation of unique file names
- Supports upload from the browser
- Supports viewing cluster file information
- Supports cluster monitoring with email alerts
- Supports automatic merging of small files (reduces inode usage)
- Supports instant upload
- Supports image scaling
- Supports Google Authenticator codes
- Supports custom authentication
- Supports cross-origin access
- Very low resource overhead
- Supports resumable upload ([tus](https://tus.io/))
- Supports Docker deployment
- Supports one-click migration (from other file systems)
- Supports a parallel trial run (run it alongside the existing file system, confirm everything works, then migrate with one click)
- Supports token-protected download, where token=md5(file_md5+timestamp)
- Easy operation and maintenance: there is only one role (unlike fastdfs, which has three roles: Tracker Server, Storage Server, and Client), and the configuration is generated automatically
- Peer-to-peer nodes (simplifies operation and maintenance)
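
The download token formula in the list above, token=md5(file_md5+timestamp), can be sketched in Python as follows. This is a minimal illustration only: the function name `download_token` is hypothetical, and the sketch assumes that "+" means plain string concatenation of the hex digest and the decimal timestamp. Check the wiki for the exact request parameters the server expects.

```python
import hashlib
import time


def download_token(file_md5: str, timestamp: int) -> str:
    """Return md5(file_md5 + timestamp) as a hex string.

    Assumption: file_md5 is the hex digest of the file and the timestamp
    is appended as a decimal string before hashing.
    """
    raw = f"{file_md5}{timestamp}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


# Example: compute a token for the current time.
file_md5 = hashlib.md5(b"example file content").hexdigest()
token = download_token(file_md5, int(time.time()))
```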
First, for mass storage, do not enable file token authentication, because it reduces performance.
Second, prefer the standard upload. Have the business system save the returned path, and prepend the domain name only when the file is accessed (this makes later migration and scaling easier).
Third, if you use resumable upload, replace the stored path with the file id after the upload finishes (see the QA/API documentation for how to do this), so that subsequent access does not pay a performance penalty.
Fourth, prefer deployment on physical servers, because the main load and the performance bottleneck come from IO.
Fifth, online services should use the nginx+gofastdfs deployment architecture (with ip_hash as the load-balancing algorithm), which leaves room for later functional extension (nginx+lua).
Sixth, it is best not to use container deployment in the online environment; containers are suitable for testing and functional verification.
Summary: When the business system saves the file path, less path conversion is needed on later access, and when the business system handles file access permissions, you get the best performance and the greatest flexibility (files can be served directly by other web servers).
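
The second recommendation (store only the path, join the domain at access time) might look like the sketch below. The function name `access_url`, the domain, and the stored path are hypothetical examples, not values defined by go-fastdfs.

```python
def access_url(domain: str, stored_path: str) -> str:
    """Join a domain with a path that the business system saved at upload time.

    Keeping the domain out of the stored value means a later migration only
    needs a new domain, not a rewrite of every stored record.
    """
    return domain.rstrip("/") + "/" + stored_path.lstrip("/")


# Hypothetical stored path and domain, for illustration only.
url = access_url("http://files.example.com", "/group/default/20190117/report.pdf")
```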
Because so many people ask about this, here is one answer for everyone.
File location in go-fastdfs differs from other distributed systems: a file is located directly from its path, without passing through any intermediate component, so locating a file takes approximately O(1) time.
There is basically no performance loss. The project also includes a stress test script, so you can run the test yourself. Please do not discuss this question too much in the group: people end up answering the same question every time, and everyone starts to find the group boring.
- Can files already stored in fastdfs be migrated to go-fastdfs (other systems can be migrated in a similar way, and a quick trial works the same way)?
The answer is yes. The usual worry is that file paths will change; go-fastdfs takes care of this for you.
Steps:
First, download the latest version of go-fastdfs.
Second, copy the original fastdfs file directory into the files directory of go-fastdfs (if there are a lot of files, do it the other way round and copy fileserver over instead, but keep the fileserver directory structure).
Third, set enable_migrate to true.
Note: All files in the files directory are scanned during migration, which is slow. Set enable_migrate back to false after the migration is complete.
Note: The files directory of go-fastdfs cannot be changed, because the synchronization mechanism depends on it. Many people in the group ask whether the files directory can be customized; the answer is no.
As for whether a symbolic link can be used, I have not tested it; you can test it yourself.
Note: When the support_group_manage parameter in the configuration is set to true, the group name is automatically added to all URLs.
For example: http://10.1.5.9:8080/group/status
Default: http://10.1.5.9:8080/status
The difference is the extra group segment, which corresponds to the group parameter in the configuration. It exists mainly so that a single Nginx reverse proxy can serve multiple groups (clusters).
Please refer to the deployment diagram for details.
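
A client that has to work with both settings could build its URLs as in the sketch below. The helper `api_url` and its parameters are hypothetical; only the two URL shapes come from the examples above.

```python
def api_url(host: str, endpoint: str, group: str = "",
            support_group_manage: bool = False) -> str:
    """Build a go-fastdfs URL, adding the group segment only when enabled."""
    parts = [host.rstrip("/")]
    if support_group_manage and group:
        parts.append(group.strip("/"))
    parts.append(endpoint.lstrip("/"))
    return "/".join(parts)


default_url = api_url("http://10.1.5.9:8080", "status")
grouped_url = api_url("http://10.1.5.9:8080", "status",
                      group="group", support_group_manage=True)
```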
First, use version 1.2.6 of go-fastdfs.
Second, set the auth_url parameter (the URL is provided by your application).
Third, the application implements the authentication interface (the URL from the second step). The parameter is auth_token; a response of ok means authentication passed, and any other response means it failed.
Fourth, once authentication has passed, you can upload or download.
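
An application-side authentication endpoint for the steps above could be sketched with the Python standard library as follows. This is not the project's reference implementation. It assumes the token arrives as an auth_token query parameter or form field (the sketch accepts both because the text does not say which), and the token store `VALID_TOKENS` and the failure body `fail` are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical token store; a real application would check its own sessions.
VALID_TOKENS = {"secret-token"}


def is_authorized(token: str) -> bool:
    """Return True when the token is known to the application."""
    return token in VALID_TOKENS


class AuthHandler(BaseHTTPRequestHandler):
    def _reply(self, token: str) -> None:
        # "ok" means the check passed; any other body means it failed.
        body = b"ok" if is_authorized(token) else b"fail"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        query = parse_qs(urlparse(self.path).query)
        self._reply(query.get("auth_token", [""])[0])

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        self._reply(form.get("auth_token", [""])[0])


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Create the server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), AuthHandler)
```

With this running, auth_url would point at the address passed to `make_server`.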
No. High availability of the cluster was considered from the start of the design. To make the cluster genuinely available, every node must have a different IP address, and 127.0.0.1 cannot be used.
Under normal circumstances, the cluster automatically synchronizes and repairs files every hour. (This performs poorly, so it is recommended to turn off automatic repair when storing massive numbers of files.)
What about abnormal situations?
Answer: Synchronize manually (preferably during off-peak hours)
http://172.16.70.123:7080/sync?date=20190117&force=1 (Note: run this on the server that holds more files, because synchronization works by pushing files to the other servers)
Parameter description: date is the day whose data is synchronized. force=1 forces synchronization of all files from that day (poor performance); force=0 synchronizes only the files that failed.
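
The manual synchronization URL can be built programmatically, for example from a scheduled job. The helper `sync_url` below is a hypothetical sketch; only the `/sync` path and the `date` and `force` parameters come from the description above.

```python
from datetime import date
from urllib.parse import urlencode


def sync_url(host: str, day: date, force: bool = False) -> str:
    """Build the manual sync URL for one day (date is formatted as YYYYMMDD)."""
    query = urlencode({"date": day.strftime("%Y%m%d"),
                       "force": 1 if force else 0})
    return f"{host.rstrip('/')}/sync?{query}"


url = sync_url("http://172.16.70.123:7080", date(2019, 1, 17), force=True)
```

Requesting the resulting URL (for example with curl) on the server that holds more files triggers the synchronization.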
Second, copy gen_file.py into the files folder and generate a large number of files with python gen_file.py
Third, put benchmark.py outside the files directory (that is, at the same level as the files directory) and run the stress test with python benchmark.py (remember to change the IP in benchmark.py)
Use gen_file.py first to generate a large number of files (note: to generate large files, multiply the content by a large number)
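
The idea of multiplying the content to get larger files can be illustrated as below. This is not the project's gen_file.py, whose contents are not shown here; the function `generate_files`, the file naming, and the content are hypothetical.

```python
import os


def generate_files(directory: str, count: int, repeat: int = 1) -> list:
    """Write `count` text files; a larger `repeat` produces larger files."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"test_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            # The content differs per file so the files are not identical.
            f.write(f"file {i}\n" * repeat)
        paths.append(path)
    return paths
```

For example, `generate_files("files", 10000, repeat=1)` would create many small files, while a large `repeat` would create fewer, bigger ones.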
First, the current code is still very simple, and there is no need to make it more complicated.
Second, in my view, splitting code across multiple files does not by itself make it modular. If you look at the code structure in an IDE, you will see that it is in fact modular.
Chunked upload generally needs client support, and because clients are written in many languages they are hard to maintain. The feature is still necessary, so here are some simple ways to implement it.
Option 1:
Split and merge with the Linux split and cat commands; see the help for split and cat.
Split: split -b 1M filename # 1M per chunk
Merge: cat x* > filename # merge
Option 2:
Use hjsplit
http://www.hjsplit.org/
Implement the details yourself.
Option 3:
It is recommended to implement the hjsplit split and merge function in Go, which would make it cross-platform. (Not implemented yet, waiting for your contribution....)
Option 4:
Use the built-in resumable upload function (it uses the tus resumable upload protocol, [details](https://tus.io/))
Note: With Option 4 you can specify only one upload server, simultaneous writes are not supported, and the upload URL is different.
Original upload url: http://10.1.5.9:8080/<group>/upload
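
For clients without the split and cat commands, Option 1 can be approximated in Python. This is only a sketch of the same split-then-merge idea; the function names and the `.partNNNN` suffix are hypothetical and are not part of go-fastdfs.

```python
import shutil


def split_file(path: str, chunk_size: int = 1024 * 1024) -> list:
    """Split `path` into numbered chunk files of at most `chunk_size` bytes."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = f"{path}.part{index:04d}"
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            index += 1
    return parts


def merge_files(parts: list, target: str) -> None:
    """Concatenate the chunk files, in the given order, into `target`."""
    with open(target, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
```

The zero-padded suffix keeps the parts in the right order when they are sorted by name, which is what `cat x*` relies on with split's default naming.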
md5=sum(file): the digest algorithm used for the file must match the algorithm configured on the server (md5 and sha1 are supported). For a resumable upload, you can use the file id instead, which is the id returned after the upload.
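
Computing the file digest on the client could look like the sketch below. The function name `file_digest` is hypothetical; the algorithm choice must match the server configuration, and reading in chunks avoids loading a large file into memory.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5",
                chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of a file using md5 or sha1."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError("algorithm must be 'md5' or 'sha1'")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```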
During early planning, it is recommended to buy large-capacity machines as storage servers. If you want two copies, form a cluster of two machines; if you want three copies, form a cluster of three. (Note: it is best to keep the configuration of every server identical and to use RAID 5 disk arrays.)
For simplicity and reliability, you can simply build a new cluster when you need more capacity (building one means starting the ./fileserver process and setting the IP addresses of the peers, which takes three to five minutes).
In an issue, chengyuansen suggested adding a capacity expansion feature. I feel it would increase the complexity of both the code logic and operations, so I have not added it for the time being.