
The Application of AI Spark Big Model in Natural Language Processing (NLP)


Introduction

Text analysis is one of the most fundamental processes in Natural Language Processing (NLP): it entails extracting valuable insights and information from text data (Cecchini, 2023). As text data grows in volume and complexity, the efficiency and scalability of the methods used become critical concerns. Cecchini (2023) describes Spark NLP as a high-performance Python library built on Apache Spark that provides a complete solution for text data processing. Apache Spark itself is an open-source framework for managing and processing data in machine-learning tasks, with several properties that make it well suited to that role (Tiwari, 2023). This paper discusses the main features and applications of the AI Spark Big Model that allow meaningful information to be extracted from text, focusing explicitly on Apache Spark as its robust, distributed computing foundation.

Key Features of Apache Spark

Apache Spark is an open-source cluster computing framework used for big data workloads. It was designed to address the shortcomings of MapReduce by processing data in memory, minimizing the number of phases in a job, and reusing data across parallel operations (Tang et al., 2020). According to the Survey Point Team (2023), Apache Spark is more effective than MapReduce because it promotes efficient use of resources and lets tasks run concurrently, resulting in accelerated data processing. Spark reuses data through an in-memory cache, which significantly accelerates machine learning algorithms that invoke a function on the same data repeatedly (Adesokan, 2020). Data reuse is achieved by creating DataFrames, an abstraction over the Resilient Distributed Dataset (RDD): a collection of objects cached in memory and reused across multiple Spark operations. This greatly reduces latency and makes Spark several times faster than MapReduce, especially for machine learning and interactive analysis.
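
To make the data-reuse point concrete, here is a minimal PySpark sketch of caching a DataFrame so that iterative passes read it from memory rather than recomputing it from disk; the input file and its `label` column are hypothetical.

```python
# A minimal sketch of Spark's in-memory data reuse; the parquet file
# and the "label" column are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

df = spark.read.parquet("events.parquet")    # hypothetical input file

# cache() keeps the DataFrame in executor memory, so the repeated
# passes below hit RAM instead of re-reading and re-parsing the file.
df.cache()

for _ in range(10):                          # e.g. iterations of an ML loop
    df.filter(df["label"] == 1.0).count()    # each pass reuses the cache

spark.stop()
```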

Apache Spark provides high-level application programming interfaces in Java, Scala, Python, and R, and, beyond in-memory caching, it heavily optimizes query execution for fast analytic queries over data of any size (Gour, 2018). Spark also has an optimized engine that executes general computation graphs, together with a set of high-level tools for working with structured data, machine learning, graphs, and streaming data. The Apache Spark model accordingly comprises several primary components: Spark Core, Spark SQL, Spark Streaming, MLlib for machine learning, GraphX for graph processing, and SparkR (Stan et al., 2019).
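
As a small illustration of these high-level APIs, the following sketch (with made-up data) uses the Python interface and Spark SQL over a DataFrame; the same optimized engine executes both the SQL and the DataFrame code.

```python
# A short sketch of the high-level APIs named above; the data is
# invented for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("components-demo").getOrCreate()

df = spark.createDataFrame(
    [("spark is fast", 1), ("mapreduce is slower", 0)],
    ["text", "label"],
)
df.createOrReplaceTempView("docs")

# Spark SQL: queries against the registered view run on the same
# optimized engine as DataFrame operations.
spark.sql("SELECT label, COUNT(*) AS n FROM docs GROUP BY label").show()
```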

Apache Spark has considerable features that make it stand out among big data processing tools. First, the tool is fault-tolerant, so it continues to produce correct results when a worker node fails (Stan et al., 2019). Spark achieves this fault tolerance through Directed Acyclic Graphs (DAGs) and Resilient Distributed Datasets (RDDs): every transformation and action applied to a task is recorded in the DAG, so if a worker node fails, the same transformations can be replayed from the DAG to reproduce the results (Rajpurohit et al., 2023). Second, the model is constantly evolving: Salloum et al. (2016) explain that Spark is dynamic in nature, with over 80 high-level operators that assist in developing parallel applications. Another distinctive property of Spark is lazy evaluation: a transformation does not execute immediately but is merely recorded in the DAG, and computation only occurs when an action is called (Salloum et al., 2016). Lazy evaluation lets Spark make optimization decisions over its transformations, since the whole chain of operations is visible to the engine before any action runs, which is beneficial for optimizing data processing tasks.
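
A brief sketch of lazy evaluation in PySpark: the two transformations below only extend the DAG, and nothing executes until the action is called.

```python
# Lazy evaluation in practice: transformations build the DAG; the
# action at the end triggers the whole (optimized) chain at once.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

rdd = spark.sparkContext.parallelize(range(1_000_000))

squared = rdd.map(lambda x: x * x)            # transformation: no work yet
evens = squared.filter(lambda x: x % 2 == 0)  # still no work

# Only this action triggers execution; the recorded lineage is also
# what lets Spark recompute partitions if a worker node fails.
print(evens.count())
```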

Another important aspect of this tool is real-time stream processing, which enables users to write streaming jobs the same way they write batch jobs (Sahal et al., 2020). This real-time capability, together with Spark's speed, lets applications run up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk by avoiding disk read/write operations for intermediate results (Sahal et al., 2020). Moreover, Spark's reusability allows the same code to serve batch processing, joining a stream against historical data, and running ad-hoc queries on stream state. Spark also offers strong analytical tooling: its machine learning and graph processing libraries are applied across industries to solve complex problems, with the help of platforms such as Databricks (Stan et al., 2019). In-memory computing further improves performance by executing tasks in memory and retaining results for iterative computations. Spark provides interfaces in Java, Scala, Python, and R for data analysis, and Spark SQL for SQL operations (Stan et al., 2019). It can be combined with Hadoop, reading and writing data to HDFS in various file formats, which makes it suitable for a wide range of inputs and outputs. Finally, Spark is open-source software with no license fees, so it is cheaper to adopt; it integrates stream processing, machine learning, and graph processing in one system and carries no vendor lock-in.
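
As an illustration, the following Structured Streaming sketch expresses a streaming word count with the same DataFrame operations a batch job would use; the socket source, host, and port are assumptions for demonstration.

```python
# A minimal Structured Streaming sketch: streaming code reads like
# batch code. The socket source on localhost:9999 is an assumption.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines["value"], " ")).alias("word"))
counts = words.groupBy("word").count()        # identical to batch code

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```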

Spark NLP is the fastest open-source NLP library. Steller (2024) states that Spark NLP is 38 to 80 times faster than spaCy while achieving the same accuracy for training custom models. Spark NLP is also the only open-source library that can use a distributed Spark cluster. Because Spark NLP is a native Spark ML library operating on DataFrames, Spark's native data structure, running on a cluster yields a further order-of-magnitude performance improvement (Steller, 2024). Beyond raw performance, Spark NLP delivers strong accuracy for a growing range of NLP applications: the Spark NLP team tracks the current literature and regularly releases the best available models.
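
A minimal Spark NLP pipeline, as a sketch: it assumes the `spark-nlp` package is installed (`pip install spark-nlp`) and follows the library's DocumentAssembler/annotator pattern, running as a native Spark ML pipeline on a DataFrame.

```python
# A hedged sketch of a basic Spark NLP pipeline; requires the
# spark-nlp package, which bundles its own Spark session starter.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from pyspark.ml import Pipeline

spark = sparknlp.start()   # SparkSession with the Spark NLP jars loaded

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

pipeline = Pipeline(stages=[document, tokenizer])

df = spark.createDataFrame(
    [("Spark NLP runs natively on Spark ML.",)], ["text"])
result = pipeline.fit(df).transform(df)   # an ordinary Spark ML transform
result.select("token.result").show(truncate=False)
```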

The Application of Spark Big Model in NLP

1. Sentiment Analysis

One of the tasks the Apache Spark model performs in sentiment analysis is data processing and preparation. Zucco et al. (2020) assert that sentiment analysis has become one of the most effective tools for companies to leverage social sentiment about their brand, product, or service. Humans identify the emotional tone of a text naturally; for large-scale text preprocessing, however, Apache Spark is the best fit for the job because of its efficiency in handling big data (Verma et al., 2020). This capability is critical in AI and machine learning, where preprocessing is a major step. Spark's distributed computing framework tokenizes text data, breaking it into manageable units of words or tokens. Stemming can then be carried out in Spark after tokenization to reduce words to their base or root form, which helps normalize the text. The other significant preprocessing task is feature extraction, which converts text into formats that machine learning algorithms can consume. Because Spark distributes these operations across a cluster, the preprocessing tasks run in parallel, improving scalability and performance (Shetty, 2021). This parallelism saves time and makes it possible to handle data sets far larger than conventional single-node processing frameworks can manage. Applying Spark to text preprocessing therefore ensures organizations have their data ready before feeding it to machine learning and AI models for training, especially as applications increasingly deal with very large volumes of data.
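
A minimal sketch of this preprocessing stage using Spark MLlib's built-in transformers (tokenization, stop-word removal, and feature extraction); the toy reviews are invented, and stemming is omitted since MLlib does not ship a stemmer.

```python
# Preprocessing with MLlib feature transformers; the reviews are toy data.
from pyspark.sql import SparkSession
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, HashingTF

spark = SparkSession.builder.appName("preprocess-demo").getOrCreate()

df = spark.createDataFrame(
    [("the product is great",), ("terrible support, would not buy",)],
    ["review"],
)

# Tokenize on non-word characters, drop stop words, hash to features.
tokens = RegexTokenizer(inputCol="review", outputCol="words",
                        pattern="\\W+").transform(df)
cleaned = StopWordsRemover(inputCol="words",
                           outputCol="filtered").transform(tokens)
features = HashingTF(inputCol="filtered",
                     outputCol="features").transform(cleaned)
features.select("filtered", "features").show(truncate=False)
```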

The second activity the Apache Spark model carries out in sentiment analysis is feature engineering. Dey (2024) notes that PySpark is an open-source, large-scale data processing framework built on Apache Spark, providing many functions and classes for data cleaning, summarization, transformation, normalization, feature engineering, and model construction. Apache Spark's MLlib likewise offers a stable environment for feature extraction and transformation for its ML algorithms and is important for NLP feature engineering. The first of these techniques is TF-IDF (Term Frequency-Inverse Document Frequency), which transforms text into numeric vectors based on how often a word occurs in a document relative to how often it occurs across the whole document set (Sintia et al., 2021). This weighting captures the significance of each word and is particularly useful for reducing the impact of stop words, that is, words that appear very frequently but contribute little to meaningful analysis. In addition, models such as Word2Vec generate dense word vectors that encode the semantics defined by a word's surrounding text: Word2Vec maps similar words close together in vector space, enhancing the model's general knowledge of the language. Spark's MLlib assists in converting raw text into such vectors, supporting more accurate machine learning models, particularly for tasks such as sentiment analysis of textual data.
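
Both techniques are available in MLlib; the sketch below, on a toy two-document corpus, derives TF-IDF weights and Word2Vec embeddings from the same tokenized text.

```python
# TF-IDF and Word2Vec with Spark MLlib; the corpus is a toy example.
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF, IDF, Word2Vec

spark = SparkSession.builder.appName("features-demo").getOrCreate()

df = spark.createDataFrame(
    [("I love this phone",), ("I hate this phone",)], ["text"])
words = Tokenizer(inputCol="text", outputCol="words").transform(df)

# TF-IDF: term frequencies reweighted by inverse document frequency,
# which downweights words that appear in almost every document.
tf = HashingTF(inputCol="words", outputCol="tf").transform(words)
tfidf = IDF(inputCol="tf", outputCol="tfidf").fit(tf).transform(tf)
tfidf.select("tfidf").show(truncate=False)

# Word2Vec: dense vectors placing semantically similar words close
# together; minCount=1 so the tiny corpus keeps all its words.
w2v = Word2Vec(vectorSize=50, minCount=1,
               inputCol="words", outputCol="w2v").fit(words)
w2v.transform(words).select("w2v").show(truncate=False)
```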

The Apache Spark model is also applied to training and evaluation for sentiment analysis. Apache Spark is particularly appropriate for training sentiment analysis models because many algorithms are available, from basic ones such as logistic regression and decision trees to complex ones like LSTM networks (Raviya & Vennila, 2021). These models can be trained in parallel across multiple nodes with Spark's distributed computing, removing the time constraints of single-machine computation. This parallelization is most useful when the training set is large, because it fully utilizes the available computational capacity and shortens training time. Spark's MLlib provides reliable implementations of these algorithms, and data scientists can switch between models based on the problem's complexity and the task's requirements (Raviya & Vennila, 2021). Spark also provides cross-validation and other evaluation utilities as integrated tools for model checking, enabling models to be estimated and refined for high accuracy and good generalizability. Spark has thus been shown to support training and testing of large-scale sentiment analysis models, which benefits organizations because Spark is distributed by nature.
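
A hedged sketch of this training-and-evaluation loop: logistic regression in an MLlib pipeline, tuned with cross-validation; the tiny labeled dataset is invented for illustration.

```python
# Distributed training + evaluation sketch: MLlib pipeline with
# cross-validated logistic regression on toy labeled data.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("train-demo").getOrCreate()

train = spark.createDataFrame(
    [("good product", 1.0), ("awful service", 0.0),
     ("really good value", 1.0), ("broke after a day", 0.0)],
    ["text", "label"],
)

lr = LogisticRegression(maxIter=20)
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features"),
    lr,
])

# Grid search + k-fold cross-validation: the candidate models are
# trained and scored in parallel across the cluster.
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=2)
model = cv.fit(train)
print(model.avgMetrics)   # mean AUC for each grid point
```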

2. Machine Translation

Apache Spark remains very useful for managing the large-scale bilingual corpora required for machine translation tasks and model training, with the added advantage that complex tasks run in a distributed computing environment. Spark aligns bilingual sentence pairs so that they correspond, a vital step in corpus preparation that machine translation models rely on to learn correct translations (Cutrona, 2021). Notably, these alignment tasks can be parallelized using Spark's distributed DataFrames and RDDs, significantly accelerating the process. Tokenization, segmenting text into words or subwords, is likewise made faster by Spark's ability to partition data and distribute it across nodes, especially for extensive datasets. All cleaning procedures, such as lowercasing text and handling special characters, are performed with Spark's functions and utilities. Spark distributes these preprocessing operations so that the data is prepared as well and as quickly as possible for subsequent training of machine translation models in frameworks such as TensorFlow or PyTorch, integrated with Spark through libraries such as Apache Spark MLlib and TensorFlowOnSpark.
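
As a sketch of such distributed cleaning, the following PySpark fragment lowercases, strips punctuation from, and tokenizes both sides of a hypothetical two-row parallel corpus.

```python
# Distributed cleaning of sentence pairs; the parallel corpus and its
# column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mt-prep-demo").getOrCreate()

pairs = spark.createDataFrame(
    [("Hello, world!", "Hallo, Welt!"),
     ("Good morning.", "Guten Morgen.")],
    ["src", "tgt"],
)

# Lowercase, strip punctuation, whitespace-tokenize both sides; each
# step is a narrow transformation that runs in parallel per partition.
for c in ("src", "tgt"):
    pairs = (pairs
             .withColumn(c, F.lower(F.col(c)))
             .withColumn(c, F.regexp_replace(F.col(c), r"[^\w\s]", ""))
             .withColumn(c + "_tok", F.split(F.col(c), r"\s+")))

pairs.show(truncate=False)
```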

Apache Spark also enhances the training of NMT models and other complicated architectures, such as sequence-to-sequence models with attention mechanisms, through distributed computing (Prats et al., 2020). Spark can be interfaced with deep learning frameworks like TensorFlow, Keras, and PyTorch, dividing computations among the nodes of a cluster. This distribution is made possible by Spark's RDDs and DataFrames, which host and process the large datasets involved. During training, input sequences, gradients, and model parameters are distributed across the nodes, which is faster than using one machine and makes it feasible to train on datasets that a single machine could not hold. Furthermore, Spark can be connected to GPU clusters through libraries such as TensorFlowOnSpark or BigDL, improving training with hardware acceleration (Lunga et al., 2020). Organizations can thus cut training time while refining models toward higher translation accuracy. This capability is essential for building accurate NMT systems that generate correct translations, which matters in communication applications and document translation.
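
The exact integration depends on the chosen framework, but one common pattern, sketched below rather than taken from the cited systems, is to apply a trained translation model in parallel with a pandas UDF; `load_model` and `translate_batch` are hypothetical stand-ins for real TensorFlow or PyTorch calls.

```python
# A sketch of distributed model application via a pandas UDF; the
# model functions are hypothetical stand-ins, not a real NMT system.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("mt-infer-demo").getOrCreate()

def load_model():
    # Hypothetical stand-in for loading a trained TensorFlow/PyTorch model.
    return None

def translate_batch(model, sentences):
    # Hypothetical stand-in for batched model inference.
    return [s.upper() for s in sentences]

@pandas_udf("string")
def translate(src: pd.Series) -> pd.Series:
    model = load_model()   # in real code, cache the model per executor
    return pd.Series(translate_batch(model, src.tolist()))

df = spark.createDataFrame([("hello world",), ("good morning",)], ["src"])
df.withColumn("tgt", translate("src")).show()
```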

3. Text Generation

Apache Spark is used to train many language generation models for text generation tasks, from RNNs to the latest transformer models such as GPT (Myers et al., 2024). The first benefit of using Spark is that its distributed computing accelerates training, since computations run in parallel across the nodes of the cluster. This distributed approach significantly cuts the time required to train large, complex models and allows processing of datasets too large for a single machine. According to Myers et al. (2024), Spark's solid foundation and effectiveness ensure efficient use of resources and make it possible to scale up the training of language models that are contextually appropriate and capable of generating semantically coherent, meaningful text.

Further, Apache Spark's distributed computing is also beneficial for processing the enormous quantities of data needed to train language models. This efficiency starts with data loading: Spark can read extensive text data in parallel from different sources, shortening load times (Myers et al., 2024). Operations performed before the text reaches the model, such as tokenization, normalization, and feature extraction, likewise run in parallel across the nodes, preparing the text data for modeling efficiently. During the training phase, Spark's DataFrame abstraction distributes the computations, enabling the management of large data: one can train complex language models, for example RNNs and Transformers, without being blocked by memory limits or wasted processing time. Spark's framework also allows distributed model assessment, so performance metrics and validation checks are computed over the distributed data at once, keeping evaluation correct. Spark can thus scale the entire text generation workflow, from data loading through preprocessing to model evaluation, making it fit for large-scale NLP tasks.
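
A short sketch of the data-loading and preprocessing side of such a pipeline; the input path is hypothetical, and whitespace tokenization stands in for a real subword tokenizer.

```python
# Parallel corpus loading and simple statistics for LM training data;
# the input glob is a hypothetical placeholder.
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, size, explode, col

spark = SparkSession.builder.appName("lm-data-demo").getOrCreate()

# spark.read.text loads many files in parallel, one row per line.
corpus = spark.read.text("corpus/*.txt")      # hypothetical input glob

tokens = corpus.withColumn("tokens", split(col("value"), r"\s+"))

# Corpus statistics computed across the cluster, not on one machine:
tokens.select(size("tokens").alias("len")).summary("mean", "max").show()
print("vocab size:", tokens.select(explode("tokens")).distinct().count())
```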

Conclusion

Apache Spark has proven to be an effective tool for managing and processing data compared to other tools. It supports language models that generate text in real time, enabling functions such as chatbots, content generation, and automatic report generation. This is well supported by Spark's in-memory computing, which allows models to read and process data without the delay of disk I/O, and by its memory optimizations, which cache intermediate results and other frequently used data so that text generation tasks complete with fast response times and give users a smooth experience. This high-performance environment suits the real-time needs of interactive applications, providing timely and relevant text outputs. With these capabilities, Spark enables the practical application of state-of-the-art text generation technologies across many use cases. Finally, Spark NLP provides Python, Java, and Scala libraries that contain all the features of traditional NLP libraries such as spaCy, NLTK, Stanford CoreNLP, and OpenNLP, along with further features like spell checking, sentiment analysis, and document categorization. Spark NLP advances beyond these earlier efforts by offering the best accuracy, speed, and scalability.

References

  1. Adesokan, A. (2020). Performance analysis of Hadoop MapReduce and Apache Spark for big data.
  2. Cecchini, D. (2023). Scaling up text analysis: Best practices with Spark NLP n-gram generation. Medium. https://medium.com/john-snow-labs/scaling-up-text-analysis-best-practices-with-spark-nlp-n-gram-generation-b8292b4c782d
  3. Cutrona, V. (2021). Semantic table annotation for large-scale data enrichment.
  4. Dey, R. (2024). Feature engineering in PySpark: Techniques for data transformation and model improvement. Medium. https://medium.com/@roshmitadey/feature-engineering-in-pyspark-techniques-for-data-transformation-and-model-improvement-30c0cda4969f
  5. Gour, R. (2018). Apache Spark ecosystem — Complete Spark components guide. Medium. https://medium.com/@rinu.gour123/apache-spark-ecosystem-complete-spark-components-guide-f3b57893173e
  6. Lunga, D., Gerrand, J., Yang, L., Layton, C., & Stewart, R. (2020). Apache Spark accelerated deep learning inference for large-scale satellite image analytics. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 271–283.
  7. Myers, D., Mohawesh, R., Chellaboina, V. I., Sathvik, A. L., Venkatesh, P., Ho, Y. H., ... & Jararweh, Y. (2024). Foundation and large language models: Fundamentals, challenges, opportunities, and social impacts. Cluster Computing, 27(1), 1–26.
  8. Prats, D. B., Marcual, J., Berral, J. L., & Carrera, D. (2020). Sequence-to-sequence models for workload interference. arXiv preprint arXiv:2006.14429.
  9. Rajpurohit, A. M., Kumar, P., Kumar, R. R., & Kumar, R. (2023). A review on Apache Spark. Kilby, 100, 7th.
  10. Raviya, K., & Vennila, M. (2021). An implementation of hybrid enhanced sentiment analysis system using Spark ML pipeline: An extensive data analytics framework. International Journal of Advanced Computer Science and Applications, 12(5).
  11. Sahal, R., Breslin, J. G., & Ali, M. I. (2020). Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case. Journal of Manufacturing Systems, 54, 138–151.
  12. Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1, 145–164.
  13. Shetty, S. D. (2021, March). Sentiment analysis, tweet analysis, and visualization of big data using Apache Spark and Hadoop. In IOP Conference Series: Materials Science and Engineering (Vol. 1099, No. 1, p. 012002). IOP Publishing.
  14. Sintia, S., Defit, S., & Nurcahyo, G. W. (2021). Product codification accuracy with cosine similarity and weighted term frequency and inverse document frequency (TF-IDF). Journal of Applied Engineering and Technological Science, 2(2), 14–21.
  15. Stan, C. S., Pandelica, A. E., Zamfir, V. A., Stan, R. G., & Negru, C. (2019, May). Apache Spark and Apache Ignite performance analysis. In 2019 22nd International Conference on Control Systems and Computer Science (CSCS) (pp. 726–733). IEEE.
  16. Steller, M. (2024). Large-scale custom natural language processing (NLP). Microsoft. https://learn.microsoft.com/en-us/azure/architecture/ai-ml/idea/large-scale-custom-natural-language-processing
  17. Survey Point Team (2023). 7 powerful benefits of choosing Apache Spark: Supercharge your data. https://surveypoint.ai/knowledge-center/benefits-of-apache-spark/
  18. Tang, S., He, B., Yu, C., Li, Y., & Li, K. (2020). A survey on Spark ecosystem: Big data processing infrastructure, machine learning, and applications. IEEE Transactions on Knowledge and Data Engineering, 34(1), 71–91.
  19. Tiwari, R. (2023). Simplifying data handling in machine learning with Apache Spark. Medium. https://medium.com/@NLPEngineers/simplifying-data-handling-for-machine-learning-with-apache-spark-e09076d0256e
  20. Verma, D., Singh, H., & Gupta, A. K. (2020). A study of big data processing for sentiment analysis.
  21. Zucco, C., Calabrese, B., Agapito, G., Guzzi, P. H., & Cannataro, M. (2020). Sentiment analysis for mining texts and social networks data: Methods and tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(1), e1333.
